Data Are Data, Not Objects

A small post for the new year (in which I have resolved to write more).

The standard introduction to object-oriented programming teaches you to create a class for each type of thing you want to deal with in your program. So if you’re writing a payroll program, you would have an Employee class, a Department class, and so on. The methods of those classes are supposed to model some sort of real-world behavior. You quickly realize that real-world objects are not so easily separable, so you learn about inheritance, abstract classes, polymorphism, virtual methods, overloading, and all the other gobbledygook that is supposed to bring object-oriented programming closer to the real world, but usually just confuses programmers.

Like many others, I became suspicious of this technique after trying to use library classes that did not provide the methods I needed. Subclassing is supposed to provide a way to add new behavior to classes, but it is often thwarted by private variables, final methods, and the admonition that modifying the internals of a class is “breaking encapsulation.”

The joy I experienced when I first discovered Perl was due in large part to the realization that it provided just a few simple data structures (scalar, array, hash) that were sufficient for any sort of object I wanted to model. Moreover, every CPAN library used those same data structures, so it was easy to link them together and extend them where I needed to. Even Perl “objects” are just data structures with some added functionality. I was similarly delighted by Clojure’s abstract data structures (list, vector, map, set), all of which are manipulated with a few generic functions.

The problem with defining your own classes is that every class you define has its own, unique semantics. Someone who wants to use your class has to learn those semantics, which may not be suitable for how they want to use it. I once read (I don’t remember where) that classes are good for modeling abstract, mathematical entities like sets, but they fall apart when trying to model the real world.

So here’s a slightly radical notion: don’t use classes to model the real world. Treat data as data. Every modern programming language has at least a few built-in data structures that usually provide all the semantics you need. Even Java, the prince of “everything is a class” languages, has an excellent collections library. If your program has a list of names, you don’t need to invent a NameList object, just use a List<String>. Don’t hide it behind a specialized interface. The interface is already there: it’s a List. If somebody wants to sort the list, they already know how to do it, and you never have to write a SortedNameList class.
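As a rough sketch of that last point (the class and method names here are hypothetical, made up for illustration), sorting a list of names needs nothing beyond the standard collections API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; nothing here is a special "NameList" type.
final class NameListDemo {

    // Returns a sorted copy of the names using only the standard List API.
    static List<String> sortedNames(List<String> names) {
        List<String> copy = new ArrayList<>(names); // leave the caller's list untouched
        Collections.sort(copy);                     // natural (lexicographic) String order
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Mallory", "Alice", "Bob");
        System.out.println(sortedNames(names)); // prints [Alice, Bob, Mallory]
    }
}
```

Anyone who knows `List` can read, filter, or re-sort the result without learning a new interface.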

This is an important idea behind JSON (and YAML) — the semantics are deliberately limited, so you know what to expect. That’s also why JSON is popular for sharing data between programs written in different languages — the semantics are simple, so they’re easy to implement. The point is, don’t create new semantics when you don’t need to — you’re only making it harder to understand, extend, and reuse your code.
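To make the JSON comparison concrete, here is a minimal sketch of payroll records kept as plain maps and lists, much as they would look after parsing JSON. The field names ("name", "salary") and the `totalSalary` helper are assumptions invented for this example, not part of any library:

```java
import java.util.List;
import java.util.Map;

// Hypothetical demo: employee records as generic maps, not an Employee class.
final class PlainDataDemo {

    // Sums the "salary" field across all records.
    // Assumes every record has a numeric "salary" entry.
    static long totalSalary(List<Map<String, Object>> employees) {
        long total = 0;
        for (Map<String, Object> employee : employees) {
            total += ((Number) employee.get("salary")).longValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Object> alice = Map.of("name", "Alice", "salary", 5000);
        Map<String, Object> bob = Map.of("name", "Bob", "salary", 4000);
        System.out.println(totalSalary(List.of(alice, bob))); // prints 9000
    }
}
```

The trade-off is that the compiler no longer checks field names or types for you; what you get in return is data that any generic map or list function can work with.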

5 Replies to “Data Are Data, Not Objects”

  1. Your argument against object orientation is a fairly common one in the world of single dispatch languages such as Java, Python and Ruby.

    When you use languages which support multiple dispatch, you see that the argument for keeping data and functionality separate is strengthened, but at the same time there’s a clarity on your objects that lets them know which functions can be applied to them.

    This may seem unclear but it’s difficult to explain other than you’re right, given the constraints of your language, but if you look past it, you will come around again.

  2. Great article.

    My personal opinion is that OOP was used to make programming accessible, for better or worse, to a larger group of people. When building applications, it often makes sense to have your code conform more closely to exactly what you want it to do, instead of spending massive amounts of time defining objects and refactoring your inheritance hierarchy.

    I think your example of JSON is a perfect representation of getting a lot of functionality out of a simple standard.

  3. emacsen said “When you use languages which support multiple dispatch, you see that the argument for keeping data and functionality separate is strengthened”.

    You’re right, multiple dispatch (as in CLOS) makes things better. But keeping data and functionality separate is exactly my point — data should not be tied to a particular implementation.

    Clojure supports CLOS-style multiple dispatch without classes. Each “multimethod” has an arbitrary “dispatch function” that determines which method gets called. It’s very flexible, and very powerful. http://clojure.org/multimethods

  4. Yes, but isn’t a class supposed to model the behavior of a real-world object? For example, a class for a CD player will have methods such as play, stop, ffw, etc. Inside the class the songs might be stored in a hash, but the user never sees them. The class methods are supposed to represent only the semantics and behavior of the real-world object, and have nothing to do with data structures…

  5. David Moon’s new language, PLOT (Programming Language for Old Timers), carefully separates data from program at its lowest level, while providing for object-oriented programming as a higher level facility.

    At the Lisp conference (ilc09.org), in his invited talk, he mainly talked about macros, but he first did an introduction to the basic PLOT concepts, and this is one of the things he said. He has not released general info about PLOT since he feels it’s not ready yet, but I’m pretty sure he intends to. He insists that he’s just doing this as a hobby. It does support multimethods.

    I don’t know enough about PLOT or Clojure yet to compare them. So many technologies to learn, so little time.
