Despite all of the AJAX/Web 2.0 hype, the fact remains that most web pages are mostly static. The most efficient way to serve static pages is unquestionably to store them as static files on a file-based web server such as Apache. I add new pages to this site once every few days at most, but I’m still using a framework (WordPress) that requires the server to execute dozens of lines of PHP code and make several database calls for every page request. This seems like a tremendous waste, even though it makes it very easy for me, the maintainer, to add new content whenever I want to.
In the past, I built this site and others by writing programs (usually shell scripts calling an XSLT processor on a series of stylesheets) to generate static HTML from content stored in XML source files. Unfortunately, this method made it difficult or impossible to update only the content that had changed, so I ended up regenerating the entire site every time I updated one page. For a small site this was not a problem, but as the site grew larger it was cumbersome.
On a commercial site I experimented with a variation on this process: I still stored content in XML source files, but did not generate any HTML until it was requested. If an HTTP request came in for a file that did not exist on the server, an .htaccess directive would call a Perl script that generated the requested page and then saved it as a file at the original requested URL. Then on the next request for that URL, Apache would simply serve that file. To update a page, all I had to do was modify the source file and delete the “cached” HTML file.
This caching technique proved very reliable, and meant I did not have to worry much about the efficiency of my HTML-generation code. I could simulate “dynamic” pages that changed on a schedule by setting up a cron job that would delete the cached HTML file on a regular basis. By adding some directory prefixes and URL rewriting, I was even able to simulate a kind of session tracking without cookies or hidden form fields.
So getting back to my blog, why can’t it use the same technique? Store the content in a database, yes, but never render anything more than once. (In programming, this would be called memoization.) If one change would affect many pages, simply flag those pages as out-of-date and regenerate them when they are requested.
On a related note, someone on comp.lang.lisp suggested that Kenny Tilton’s Cells dataflow extension to Common Lisp might be useful for web applications. I hope to have some time to explore something along these lines using Cells as a front-end to output static HTML files.
On the web, memoization is known as “caching”. Just place mod_cache in front of your application write code to cons like mad ;)