Where Does the XML Go? – Digital Digressions by Stuart Sierra

Here’s a question that’s been bugging me for a while: what’s the best way to store information that mixes highly structured and loosely structured data? For example, a collection of documents like Project Posner. Certain attributes of each document, like the title, date, and citation, fit easily into a normalized relational database model. But the body can only be described with some kind of markup.

I could just use HTML, except for one problem: my documents have to handle footnotes, for which HTML does not provide a tag. (As an aside, footnotes are a pain whether you’re doing web design or typesetting.)

On Project Posner, I compromised: everything is stored in a MySQL database, and the documents table has a “body” column that contains my own made-up XML syntax.
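
As a rough illustration of that compromise, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL, and the column names, the sample row, and the `<body>`, `<p>`, and `<footnote>` tags are all assumptions made up for the example; the post does not show the real schema or markup.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite stands in for MySQL here; the schema is a guess.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        title    TEXT NOT NULL,
        date     TEXT NOT NULL,      -- ISO 8601 date string
        citation TEXT,
        body     TEXT NOT NULL       -- XML markup stored as an opaque string
    )
    """
)

# Hypothetical body markup; the real tag names are not given in the post.
body_xml = (
    "<body>"
    "<p>The court affirmed.<footnote>See the earlier ruling.</footnote></p>"
    "<p>Costs were awarded.</p>"
    "</body>"
)

with conn:  # commits on success, rolls back on error
    conn.execute(
        "INSERT INTO documents (title, date, citation, body) VALUES (?, ?, ?, ?)",
        ("Example v. Sample", "2007-06-08", "1 Ex. 1", body_xml),  # placeholder values
    )

# The structured attributes are queried relationally...
row = conn.execute(
    "SELECT id, title, body FROM documents WHERE date >= ? ORDER BY date",
    ("2007-01-01",),
).fetchone()
doc_id, title, body = row

# ...and the loosely structured body is parsed only when it is needed.
root = ET.fromstring(body)
paragraph_count = len(root.findall("p"))
footnote_texts = [fn.text for fn in root.iter("footnote")]
```

The database handles IDs, transactions, and lookups by title or date, while the XML stays a single blob that only the display code has to understand.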

I could, in theory, normalize everything, even individual paragraphs. But that would be a nightmare to code and deadly slow. I could also store everything as XML documents. But then I’d have to reinvent all the facilities that MySQL (and ActiveRecord) provide, like transaction handling, auto-incrementing IDs, and so on.

For another project, I’m trying to create a pseudo-database that stores everything as XML files and uses Ferret for searching. I was going to use Ferret for full-text search anyway, so my original thought was to save overhead by not bothering with MySQL indexes. It works, but looking it over, I realize that most of the data could be normalized to fit the standard relational model. I’d still need a blob of XML data somewhere, but it could live in the database as easily as in a file. What have I really gained, besides an impressively large and complex pile of code?

3 Replies to “Where Does the XML Go?”

Leonard Richardson says:

June 8, 2007 at 9:55 am

If the only thing keeping you from using HTML is the lack of a footnote tag, you might create a microformat for HTML footnotes. It sounds like defining a “footnote” class for DIV should solve your problem.

Stuart says:

June 14, 2007 at 9:26 am

Leonard Richardson Says: “It sounds like defining a ‘footnote’ class for DIV should solve your problem.”

I’d like to do that, but I couldn’t come up with a general-purpose representation for footnotes in HTML. What’s the best way to do it, anyway? Put them at the bottom of a section with internal links (what I do now)? JavaScript popups? Margin notes? Any one of these is more than I can accomplish with a simple DIV and CSS. Maybe some JavaScript rewriting could do it, but I’m not much of a JS hacker. So I took the complicated route and made my own FOOTNOTE tag that gets used twice—once for the linked reference, once for the footnote itself—when the page is displayed.
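
The "used twice" approach described in this comment could be sketched roughly as follows, in Python with the standard library's ElementTree. The `<footnote>` tag name, the `fn`/`ref` id scheme, and the `render_with_footnotes` function are hypothetical; this is not the site's actual code. It assumes footnotes are not nested inside one another.

```python
import xml.etree.ElementTree as ET


def render_with_footnotes(body_xml: str) -> str:
    """Render a <body> fragment to HTML, emitting each footnote twice:
    once as a numbered link in the text, once in a list at the bottom."""
    root = ET.fromstring(body_xml)
    notes = ET.Element("ol", {"class": "footnotes"})

    # Snapshot the footnotes first, since the tree is modified in the loop.
    for number, fn in enumerate(list(root.iter("footnote")), start=1):
        note_text = "".join(fn.itertext())
        tail = fn.tail  # text that follows the footnote in the paragraph

        # First use: turn the element in place into a linked reference mark.
        fn.clear()  # clear() also drops the tail, so restore it below
        fn.tag = "sup"
        fn.tail = tail
        link = ET.SubElement(fn, "a", {"href": f"#fn{number}", "id": f"ref{number}"})
        link.text = str(number)

        # Second use: the footnote itself, collected at the bottom.
        item = ET.SubElement(notes, "li", {"id": f"fn{number}"})
        item.text = note_text

    root.tag = "div"
    if len(notes):
        root.append(notes)
    return ET.tostring(root, encoding="unicode", method="html")


html = render_with_footnotes(
    "<body><p>The court affirmed.<footnote>See the earlier ruling.</footnote>"
    " Costs were awarded.</p></body>"
)
```

Because the stored markup records only that a footnote exists at a given point, the same body could later be rendered as margin notes or popups by changing this one function.
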

Digital Digressions by Stuart Sierra » Blog Archive » HTML Footnotes says:

June 15, 2007 at 11:40 am

[…] comment on my post about XML and footnotes got me thinking about representing footnotes in HTML. Not the visual presentation — there are […]
