Archive for June, 2006

So it turns out that nearly every binary search algorithm is broken. Why? Fixed-sized integers, a holdover from machine programming that still lingers in supposedly modern languages like Java. LtU’ers are haggling to decide if it’s a type error, a bad data structure, a language defect, a failure of correctness proofs, or just a bug. Lispers are talking smugly about their automatic fixnum-to-bignum conversion.

Of course, even Common Lisp arbitrarily limits the size of arrays. On my 64-bit SBCL, it’s a little over one quintillion, or (assuming one-byte array elements) one exabyte. That’s a bigger array than even Google is likely to need in the near future. But I think it’s valid to ask why code should be limited by anything except the hardware it’s running on.

So here’s my take on the whole issue: Yes, Java/C++/C suck. But a mathmatically perfect computer will never be built. Every premature optimization is just a concession to the reality of chips and electrons. Computer Science likes to believe it’s pure Mathematics, but Engineering will always be there to keep it humble. Double-check those proofs, and then start running tests.

Comments No Comments »

Despite all of the AJAX/Web 2.0 hype, the fact remains that most web pages are mostly static. The most efficient way to serve static pages is unquestionably to store them as static files on a file-based web server such as Apache. I add new pages to this site once every few days at most, but I’m still using a framework (WordPress) that requires the server to execute dozens of lines of PHP code and make several database calls for every page request. This seems like a tremendous waste, even though it makes it very easy for me, the maintainer, to add new content whenever I want to.

In the past, I built this site and others by writing programs (usually shell scripts calling an XSLT processor on a series of stylesheets) to generate static HTML from content stored in XML source files. Unfortunately, this method made it difficult or impossible to update only the content that had changed, so I ended up regenerating the entire site every time I updated one page. For a small site this was not a problem, but as the site grew larger it was cumbersome.

On a commercial site I experimented with a variation on this process: I still stored content in XML source files, but did not generate any HTML until it was requested. If an HTTP request came in for a file that did not exist on the server, an .htaccess directive would call a Perl script that generated the requested page and then saved it as a file at the original requested URL. Then on the next request for that URL, Apache would simply serve that file. To update a page, all I had to do was modify the source file and delete the “cached” HTML file.

This caching technique proved very reliable, and meant I did not have to worry much about the efficiency of my HTML-generation code. I could simulate “dynamic” pages that changed on a schedule by setting up a cron job that would delete the cached HTML file on a regular basis. By adding some directory prefixes and URL rewriting, I was even able to simulate a kind of session tracking without cookies or hidden form fields.

So getting back to my blog, why can’t it use the same technique? Store the content in a database, yes, but never render anything more than once. (In programming, this would be called memoization.) If one change would affect many pages, simply flag those pages as out-of-date and regenerate them when they are requested.

On a related note, someone on comp.lang.lisp suggested that Kenny Tilton’s Cells dataflow extension to Common Lisp might be useful for web applications. I hope to have some time to explore something along these lines using Cells as a front-end to output static HTML files.

Comments 1 Comment »

Fiddling with some Common Lisp code a few weeks ago, I needed to push items onto the end of a list. After thinking for a moment, I wrote this:

(defun push-end (object place)
  (nconc place (list object))

Since I was fiddling in LispWorks at the time, I discovered that LispWorks already defines its own push-end as a macro which works like this:

CL-USER> (macroexpand '(push-end object place))
(LET* ((#:ITEM563 OBJECT))
  (LET* ((#:NEW-VALUE562 (APPEND PLACE (LIST #:ITEM563))))
    (SETQ PLACE #:NEW-VALUE562)))

Leaving aside the peculiar nested let* and gensyms, this is still very different. Redefining my function as my-push-end, here are the differences:

Both destructively modify their second argument:

(defvar *x* '(a b c))

(push-end 'd *x*)  -->  (A B C D)
*x*  -->  (A B C D)

(my-push-end 'e *x*)  -->  (A B C D E)
*x*  -->  (A B C D E)

My version also works on quoted lists, but LispWorks’ version does not:

(my-push-end 'd '(a b c))  -->  (A B C D)

(push-end 'd '(a b c))
  -->  Error: Couldn't find a setf expansion for (QUOTE (A B C)).

But LispWorks’ version correctly handles temporary bindings, whereas mine does not:

(let ((temp-x *x*))
  (push-end 'f temp-x))  -->  (A B C D E F)
*x*  -->  (A B C D E)

(let ((temp-x *x*))
  (my-push-end 'f temp-x))  -->  (A B C D E F)
*x*  -->  (A B C D E F)

This explains why push is implemented using setf instead of nconc, and why LispWorks’ push-end is implemented the same way. The only way to ensure that these macros handle temporary variables correctly is to operate only on bindings and not touch their list values directly.

Of course, operating directly on lists with nconc is slightly more efficient, so in situations without global variables it might be preferable. However, in complex code, subtle bugs might emerge from modifying lists. A good compiler should optimize away the extra assignments in the push-end macro when they aren’t needed. In LispWorks, I ran:

(disassemble #'my-push-end)  -->  16 lines of assembler

;; must wrap the macro in a lambda to disassemble:
(disassemble #'(lambda (obj place) (push-end obj place)))
  -->  20 lines of assembler

So my function requires four fewer assembly instructions than LispWorks’ macro. Perhaps in some extremely list-intensive code that would make a difference, but for anything else I’d go with the slightly safer macro.

Comments 2 Comments »

Every programming language has idioms–patterns used so frequently that they are almost considered part of the language itself. (See these C idioms for examples.) Common Lisp, with almost unlimited possibilities for syntax, depends more heavily on idioms to remain comprehensible. Some examples are defsomething or define-something for definitions, something-p for predicate tests, and make-something for constructors.

Marco Antoniotti created a clever macro package called DEFINER that codifies the most common of these, definitions, into a single macro with consistent syntax for any object that is being defined. With DEFINER, all definitions can be made with a single macro: def, which is followed by a symbol specifying what is being defined. So defun becomes def function, defvar becomes def var, and define-condition becomes def condition.

Of course, this is Lisp, so def can be turned on itself to produce new definition forms with def definer. New def forms can then be created that follow the same syntactic pattern of:

(def type-of-thing name properties...)

As Marco writes, DEFINER “makes it possible to homogeneize a natural programming practice followed by several CL programmers.”

I say, why not extend this practice to other idioms? Here’s a list of other Common Lisp idioms I think could be well-served by the same sort of homogeneization:

with
Use with thing to delimit the scope of certain things–local variables, special variables, functions, open files, etc. Then let becomes with lexicals, labels becomes with functions, and with-open-file becomes with file.
make
A generic constructor for any anonymous (unnamed) object. This could ensure that all object constructors follow a predictable pattern after make instance, and turns lambda into the infinitely more readable make function. Also begs the question of why one can’t create anonymous classes.
is
A generic predicate, borrowed from the FiveAM testing framework. Should alway return a boolean true/false value. Gives us the more readable is even instead of evenp.
equal
Replace Lisp’s scattering of equality predicates–eq, eql, equal, equalp, =, string=, etc.–with forms like equal object, equal contents, equal number, and equal string. One could alternately use is equal, although that looks odd in English grammar.
as
A generic type converter. Makes it more obvious, for example, when string is being used as a type specifier and when as a coercion function. Also makes it very easy to define new converters between types, assuming these are implemented as generic functions.
get
Generic accessor. Replaces both primitives like car as well as object-related functions like slot-value.
set
Replacement for setf. Every get form should have a corresponding set. Primitives like rplacd become set cdr.

And of course, to top it off we would want def with, def maker, def is, def equality, and def as. And while we’re at it, def idiom!

The point of all this is to make code more readable. One can reduce the 978 Common Lisp symbols to pairs of symbols, the first naming the general category of operation and the second specifying the exact operation within that category.

When examining unfamiliar code that uses these idiomatic macros, a programmer can look at the categorical names to get a rough idea of what is happening even without knowing about the specific functions being used. This can make Common Lisp more manageable without sacrificing any of its expressiveness.

Comments 1 Comment »