Org Mode for Blogging

On the impossibility of separating content from presentation

I like writing in Emacs’ Org mode, not because it’s an especially good means of writing prose, but because I already use Org so heavily for notes and source code. My last post was written in Org mode. But my blog remains, as it always has been, WordPress.

I don’t make heavy use of WordPress features or plugins, but it’s kept my blog working for almost as long as I’ve been using Emacs. I’m reluctant to replace it simply for the quantity of content that would have to be ported, and the standard blog-like features — comments, RSS, search — that I would have to recreate if I wanted to switch to, say, a static-site generator.

There are a couple of ways to get content from Org into WordPress. The first is to use Org’s built-in HTML export, then copy and paste the result into the WordPress editor. That works, but the HTML that Org exports is not optimal for WordPress: it introduces extra markup that can clash with WordPress’ built-in formatting.

In recent versions, WordPress introduced a WYSIWYG-like editor, which I am using to write this post. It produces cleaner markup, and the editor isn’t bad, really, but it’s not ideal for a post with lots of embedded code snippets.

Then there’s org2blog, an Emacs package that has a full interface to the WordPress API in Emacs Lisp. It has its own export, based on Org’s HTML export but adding enhancements specifically for WordPress. It’s impressive work, but as with anything that passes through so many layers of transformation, when something goes wrong it’s hard to know where to start looking.

For example, when I first exported my last post from Org to WordPress, there was a caption like “Listing 1” attached to each source code block. I spent many minutes in frustration, trying to find where that addition came from. Was it in WordPress, or in Org’s HTML export? Eventually, I found it in the custom export code of org2blog.

And therein lies the problem. Org mode wasn’t developed as a formal language. I’m not even sure if it has a formal grammar. Instead, it’s a collection of conventions that evolved over time. HTML isn’t much better, despite all the efforts at standardization. And, of course, WordPress has its own conventions and idioms about HTML. The process of converting from one format to another is ad hoc at each stage, subject to the whims of the programmer who implemented that stage.
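
To make that concrete, here is a toy sketch in Python. It is not how Org or org2blog actually export anything; the function names, the regular expression, and the caption markup are all made up for illustration. It only shows how two exporters can read the same source block and each make a different, equally defensible presentation decision.

```python
import html
import re

# Matches one Org source block: "#+begin_src LANG" ... "#+end_src".
# A deliberately naive pattern; real Org syntax has many more cases.
SRC_BLOCK = re.compile(
    r"^#\+begin_src[ \t]+(\w+)[ \t]*\n(.*?)\n#\+end_src[ \t]*$",
    re.DOTALL | re.IGNORECASE | re.MULTILINE,
)


def export_plain(org_text: str) -> str:
    """Hypothetical exporter A: a source block becomes a bare <pre>."""
    def render(match: re.Match) -> str:
        lang, body = match.group(1), match.group(2)
        return f'<pre class="src src-{lang}">{html.escape(body)}</pre>'

    return SRC_BLOCK.sub(render, org_text)


def export_captioned(org_text: str) -> str:
    """Hypothetical exporter B: same <pre>, plus a numbered caption."""
    count = 0

    def render(match: re.Match) -> str:
        nonlocal count
        count += 1
        lang, body = match.group(1), match.group(2)
        return (
            f'<p class="caption">Listing {count}</p>\n'
            f'<pre class="src src-{lang}">{html.escape(body)}</pre>'
        )

    return SRC_BLOCK.sub(render, org_text)


ORG = "#+begin_src python\nprint(1 < 2)\n#+end_src"

if __name__ == "__main__":
    print(export_plain(ORG))
    print("---")
    print(export_captioned(ORG))
```

Neither output is wrong. The source says nothing about captions, so each programmer filled the gap in their own way, and the reader of the final page has no way to tell which layer made the choice.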

Every conversion between syntaxes or formats suffers from this problem. Markdown is just as bad, if not worse.

The ideal of almost every markup language has been to separate content from presentation. But in reality it’s almost impossible to separate the two.