Where Does the XML Go?

Here’s a question that’s been bugging me for a while: what’s the best way to store information that is a mixture of highly- and loosely-structured data? For example, a collection of documents like Project Posner. Certain attributes of each document like the title, date, and citation fit easily into a normalized relational database model. But the body can only be described with some kind of markup.

I could just use HTML, except for one problem: my documents have to handle footnotes, for which HTML does not provide a tag. (As an aside, footnotes are a pain whether you’re doing web design or typesetting.)

On Project Posner, I compromised: everything is stored in a MySQL database, and the documents table has a “body” column that contains my own made-up XML syntax.

I could, in theory, normalize everything, even individual paragraphs. But that would be a nightmare to code and deadly slow. I could also store everything as XML documents. But then I’d have to reinvent all the facilities that MySQL (and ActiveRecord) provide, like transaction handling, auto-incrementing IDs, and so on.

For another project, I’m trying to create a pseudo-database that stores everything as XML files and uses Ferret for searching. I was going to use Ferret for full-text search anyway, so my original thought was to save overhead by not bothering with MySQL indexes. It works, but looking over it I realize that most of the data could be normalized to fit into the standard relational model. I’d still need a blob of XML data somewhere, but it could be in the database as easily as a file. What have I really gained, besides an impressively large and complex pile of code?

3 Replies to “Where Does the XML Go?”

  1. If the only thing keeping you from using HTML is the lack of a footnote tag, you might create a microformat for HTML footnotes. It sounds like defining a “footnote” class for DIV should solve your problem.

  2. Leonard Richardson Says: “It sounds like defining a ‘footnote’ class for DIV should solve your problem.”

    I’d like to do that, but I couldn’t come up with a general-purpose representation for footnotes in HTML. What’s the best way to do it, anyway? Put them at the bottom of a section with internal links (what I do now)? JavaScript popups? Margin notes? Any one of these is more than I can accomplish with a simple DIV and CSS. Maybe some JavaScript rewriting could do it, but I’m not much of a JS hacker. So I took the complicated route and made my own FOOTNOTE tag that gets used twice—once for the linked reference, once for the footnote itself—when the page is displayed.

Comments are closed.