Data formats are annoying. As much as half the code in any large software project consists of translating from one data representation — objects, SQL tables, files, XML, RDF, JSON, YAML, CSV, Protocol Buffers, Avro, XML-RPC — to another.
Each format has its own strengths and weaknesses. Often, no single representation is complete enough to be considered “canonical.” The only canonical representation is an abstract one, a platonic ideal in the mind of some developer. Since this platonic ideal cannot be implemented in code, different people have different expectations for how a particular model is supposed to work.
There are two options: Either you re-implement the model, with all its features and constraints, for each format, and hand-code all the translations; or you use a “smart” library that automatically translates between different representations. ActiveRecord and Hibernate are popular examples of the latter.
The problem with “smart” libraries is that they can never be smart enough. At some point you always have to dig into the generated SQL or whatever to make them work efficiently, or even correctly. Frequently this is impossible without hacking the library sources, a daunting tangle of generated and meta-programmed code. The library that was supposed to make your life easier instead makes it hell.
Do these “smart” libraries really save any time? Would it be easier to just write the translation code in the first place? We’ll never know, because programmers can’t resist “smart” systems and the myth that you can “do more with less code.” You can never do more with less, unless what you’re doing is the lowest common denominator of what everyone else is doing. And if that is what you’re doing, then why bother?
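For a sense of scale, here is a rough sketch of the first option: hand-written translation code for one tiny model across three representations (a Python object, JSON, and a SQL row). Everything in it is an assumption made up for illustration: the `User` model, its fields, the `users` table, and the function names. It uses only the Python standard library and is not meant as a recommended design.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical model, invented for this illustration.
    id: int
    name: str
    email: str


def user_to_json(user: User) -> str:
    # Object -> JSON, written out field by field.
    return json.dumps({"id": user.id, "name": user.name, "email": user.email})


def user_from_json(text: str) -> User:
    # JSON -> object. The type expectations are restated here by hand.
    data = json.loads(text)
    return User(id=int(data["id"]), name=str(data["name"]), email=str(data["email"]))


def create_table(conn: sqlite3.Connection) -> None:
    # The same model, described a third time as a SQL schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
    )


def save_user(conn: sqlite3.Connection, user: User) -> None:
    # Object -> SQL row.
    conn.execute(
        "INSERT OR REPLACE INTO users (id, name, email) VALUES (?, ?, ?)",
        (user.id, user.name, user.email),
    )


def load_user(conn: sqlite3.Connection, user_id: int):
    # SQL row -> object, or None when no row matches.
    row = conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return None if row is None else User(*row)
```

Even at this size the field list shows up in five places, and every new field or constraint has to be added to each of them by hand. The upside is that all of the SQL and all of the JSON handling is right there to read.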
