Blog

JavaScript-like Objects in Ruby (or Lisp)

As part of my exploration of Ruby, I attended Francis Hwang’s presentation to the New York Linux Users’ Group. One feature that caught my interest in his talk was the OpenStruct class, which lets you assign values to arbitrary “slots” within an object.

require 'ostruct'

me = OpenStruct.new
me.name = "Stuart Sierra"
me.net_worth = -500

Now me.name returns “Stuart Sierra” and me.net_worth returns -500. The clever thing about this is that nowhere did I define name or net_worth as member variables or methods of a class. They just spring into being in the OpenStruct instance as soon as I use them. This is great for developing new data structures before you’re entirely sure what slots you’ll need.

Objects in JavaScript always work this way. JavaScript is prototype-based, so there are no classes, just objects. Each object effectively has its own unique namespace in which you can create new slots simply by using them.

Common Lisp can do this too. Every CL symbol has a property list that can be accessed with get.

CL-USER> (defvar me)
ME
CL-USER> (setf (get 'me 'name) "Stuart Sierra")
"Stuart Sierra"
CL-USER> (get 'me 'name)
"Stuart Sierra"

What I find curious is that this feature of Common Lisp is totally distinct from the object system. Object slots are not symbol properties, just as symbols are not objects (they are bound to objects). But one could make a passable prototype-based object system using just symbol properties and function closures. I think it’s one of those places that shows how Common Lisp grew from many different sources, resulting in similar but distinct ways of doing the same thing. Peter Seibel demonstrates that it can be useful to have both, by using symbol properties to store bookkeeping information needed by macros that generate new classes.
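
Just to sketch the idea (proto-get, proto-set, and clone are names I made up for illustration; this is not CLOS, and not anyone's real library), a toy prototype system on top of symbol plists might look like this:

;; Toy prototype-style objects built on symbol property lists.
;; Each "object" is just a symbol; slot lookup falls back to the
;; symbol stored under the PARENT property, JavaScript-style.

(defun proto-get (object slot)
  "Look up SLOT on OBJECT, then along its chain of parents."
  (cond ((get object slot))
        ((get object 'parent) (proto-get (get object 'parent) slot))
        (t nil)))

(defun proto-set (object slot value)
  (setf (get object slot) value))

(defun clone (parent child)
  "Make CHILD inherit from PARENT."
  (setf (get child 'parent) parent)
  child)

;; Example:
;; (proto-set 'person 'greet
;;            (lambda (self) (format nil "Hi, I'm ~A" (proto-get self 'name))))
;; (clone 'person 'me)
;; (proto-set 'me 'name "Stuart Sierra")
;; (funcall (proto-get 'me 'greet) 'me)  ;=> "Hi, I'm Stuart Sierra"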

Ruby: Python for Lisp Programmers

A popular game on comp.lang.lisp is comparing Lisp with Python. Lispers complain that Python has a crippled lambda and no lexical closures, and they hiss and boo whenever Python’s development tends in a non-functional direction.

I’ve recently been playing with Ruby. Lo and behold, it has real lambdas, closures, and a generally more functional style. The syntax is almost as clean as Python’s, without that significant whitespace that raises so many hackles.

One thing I don’t yet understand about Ruby is the way it treats functional arguments. A function or method in Ruby can have only one functional argument, which must be the last argument. Functional arguments, or “blocks,” as they’re called in Ruby, appear outside the argument list of the function when it is called. So it looks like this for short blocks:

object.method(arg1, arg2) { |var| block code }

Or like this for multi-line blocks:

object.method(arg1, arg2) do |var|
   block code . . .
end

In both examples, |var| is the block’s own argument list, so the Lispy equivalent of the above would be:

(method object arg1 arg2 (lambda (var) . . . ))

The main advantage to this syntax comes with iterators, which I take it were the reason Ruby got blocks in the first place. Ruby’s iterators are just methods of container objects that take a block as their only argument. Then one can omit the parentheses to get something like:

list.each do |item|
   do stuff with item
end

This looks a bit like Smalltalk, one of Ruby’s inspirations. The Smalltalk approach has the advantage that most control constructs are just object methods, so they can be extended and modified as needed. (Unlike Smalltalk, Ruby has a few primitive controls that are not methods, such as if.)

So the special syntax for blocks isn’t strictly necessary, but it does make the most common use for blocks syntactically very simple. In short, a compromise. Ruby also provides the Proc class when general-purpose lambdas are needed.
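
For comparison, here is roughly how the list.each example above reads in Common Lisp, where the lambda stays inside the call (do-stuff-with is just a placeholder):

;; roughly the CL equivalent of:  list.each do |item| ... end
(mapc (lambda (item)
        (do-stuff-with item))
      list)

;; or with the built-in iteration macro:
(dolist (item list)
  (do-stuff-with item))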

I won’t advocate Lisp, Ruby, or Python over any other language; each has its place. But Lispers looking for a compromise with Python may want to give Ruby a spin.

List Processing and the Efficiency of CONS

One big difference between Lisp and most other programming languages is its use of recursion instead of iteration.

So while I was working on some text-parsing code, I fell into this simple pattern:

(defun process (list)
  (if list
      (cons (do-something (first list))
            (process (rest list)))))

Ah, the joys of a Lisp-2: I can name my parameter list without clobbering the function of the same name.

If do-something were actually a function, this example would be better written:

(mapcar #'do-something list)

But the CL Hyperspec explicitly states, “For list traversal operations, the cdr chain of the list is not allowed to be destructively modified.” Stepping through the list myself means I can modify its structure along the way. I can insert new CONS cells into the list, or even rebind list to something else. This makes it easy to convert a flat list like this:

  • Heading 1
  • Paragraph
  • Paragraph
  • Heading 2
  • Paragraph
  • Paragraph

Into a structured tree like this:

  • Heading 1
    • Paragraph
    • Paragraph
  • Heading 2
    • Paragraph
    • Paragraph
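
Here is a rough, consing sketch of that grouping pass. The item representation is invented purely for illustration, but it shows the shape of the pattern:

;; Items are assumed to look like (:heading "...") or (:paragraph "...").
(defun heading-p (item)
  (eq (first item) :heading))

(defun group-under-headings (items)
  "Group each heading with the paragraphs that follow it, consing a new tree."
  (when items
    (let* ((heading (first items))
           (rest    (rest items))
           (body    (loop while (and rest (not (heading-p (first rest))))
                          collect (pop rest))))
      (cons (cons heading body)
            (group-under-headings rest)))))

;; (group-under-headings '((:heading "Heading 1") (:paragraph "a") (:paragraph "b")
;;                         (:heading "Heading 2") (:paragraph "c")))
;;   => (((:heading "Heading 1") (:paragraph "a") (:paragraph "b"))
;;       ((:heading "Heading 2") (:paragraph "c")))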

Here I encounter the Lisper’s Dilemma: to CONS or not to CONS. Articles and books about Lisp frequently say “CONS is expensive, avoid it where possible.” The pattern above does a lot of CONSing, effectively rebuilding its input. But when I tried to modify my code to use destructive list operations like rplaca and nconc I quickly ran into problems. Destructive operations are tricky at the best of times. Using them while experimenting with new code is just plain asking for trouble. The CONSing code I ended up with was plenty fast enough for my purposes.

I think the constant repetition of “CONS is expensive” does Lisp a disservice, especially when it comes to teaching new programmers about its advantages. Yes, CONS is computationally inefficient, but its benefits in readability and maintainability make it very programmer-efficient.

Do Engines and the Future of Web Applications

Or, What I Have In Common With Craig Silverstein.

I’ve been enjoying John Battelle’s The Search, a history of the search engine business from Archie to Google. He quotes Google’s first employee, Craig Silverstein, as saying, “I would like to see the search engines become like the computers in Star Trek. You talk to them and they understand what you’re asking.”

This is exactly what I’ve often said I want, except that I would extend the concept beyond search engines to computers in general. This leads me to a Grand Prediction On The Future Of Computing. I call it “Do Engines.” No, seriously, stay with me here. Google has gotten pretty close to the Star Trek computer when it comes to one specific task, namely, searching for information. The Google search box lets you tell the computer “find this” and it gets what you want.

The next stage must be the ability to say “do this” and have the computer know what you want, as in “email my résumé to Craig Silverstein.” Natural-language processing has so far had little success at this. (Battelle cites GNP Development, which added natural-language spreadsheet formulas to Lotus 1-2-3 but didn’t catch on.) I believe that now, with the advent of web-based applications for desktop tasks such as word processing and spreadsheets, the “do engine” can become a reality. Instead of digging through menus and dialog boxes to find a command or setting to achieve the effect you want, you just type in what you want and click “do.”

This can leverage the vastness of the web much like open-source software. If I want to accomplish a specific task with my computer, I might search on Freshmeat, CPAN, or the Common Lisp Directory to find a piece of code that does what I want. If a web-based application has a programmable API and that kind of user community, the same advantages come to everyone who uses it. Therefore, the successful web-based applications will be the ones that make it possible for users to extend them beyond what the original designers imagined. Google Maps mashups are a perfect example of this, but the trend at the moment seems to be simply porting traditional desktop applications to JavaScript, e.g. Google Spreadsheets. By leveraging the input of millions of users, a web application can “know” how to do common tasks the same way a search engine “knows” how to find things.

Goodbye Toolbar, Hello “Ribbon”

Microsoft Office 12 will feature a new interface called the ribbon. I’m not usually a fan of Microsoft interfaces, but this one shows some potential. Office’s hierarchical menus are definitely overloaded, and “Task Panes” are clunky. Moving controls that were formerly buried in modal dialogs out into the ribbon should also make working with the interface faster. My only worry here is accessibility: plain menus and dialogs are very easy to access via the keyboard. Looking at screenshots of the ribbon makes me think using it with a keyboard would be awkward.

Broken Binary Searches Oh My!

So it turns out that nearly every binary search implementation is broken. Why? Fixed-size integers, a holdover from machine programming that still lingers in supposedly modern languages like Java: compute the midpoint as (low + high) / 2 and the addition can silently overflow on a large enough array. LtU’ers are haggling to decide if it’s a type error, a bad data structure, a language defect, a failure of correctness proofs, or just a bug. Lispers are talking smugly about their automatic fixnum-to-bignum conversion.
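
Here is the sort of thing the smug Lispers mean, sketched over a simple vector (nobody's production code, just an illustration):

;; In CL, (+ low high) below simply promotes to a bignum when it must,
;; so the naive midpoint computation stays correct.
(defun binary-search (item vector)
  "Return the index of ITEM in the sorted VECTOR of numbers, or NIL."
  (let ((low 0)
        (high (1- (length vector))))
    (loop while (<= low high)
          do (let ((mid (floor (+ low high) 2)))   ; no overflow possible
               (cond ((< item (aref vector mid)) (setf high (1- mid)))
                     ((> item (aref vector mid)) (setf low (1+ mid)))
                     (t (return-from binary-search mid)))))
    nil))

;; (binary-search 7 #(1 3 5 7 9))  ;=> 3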

Of course, even Common Lisp arbitrarily limits the size of arrays. On my 64-bit SBCL, it’s a little over one quintillion, or (assuming one-byte array elements) one exabyte. That’s a bigger array than even Google is likely to need in the near future. But I think it’s valid to ask why code should be limited by anything except the hardware it’s running on.

So here’s my take on the whole issue: Yes, Java/C++/C suck. But a mathematically perfect computer will never be built. Every premature optimization is just a concession to the reality of chips and electrons. Computer Science likes to believe it’s pure Mathematics, but Engineering will always be there to keep it humble. Double-check those proofs, and then start running tests.

Static-Dynamic Pages

Despite all of the AJAX/Web 2.0 hype, the fact remains that most web pages are mostly static. The most efficient way to serve static pages is unquestionably to store them as static files on a file-based web server such as Apache. I add new pages to this site once every few days at most, but I’m still using a framework (WordPress) that requires the server to execute dozens of lines of PHP code and make several database calls for every page request. This seems like a tremendous waste, even though it makes it very easy for me, the maintainer, to add new content whenever I want to.

In the past, I built this site and others by writing programs (usually shell scripts calling an XSLT processor on a series of stylesheets) to generate static HTML from content stored in XML source files. Unfortunately, this method made it difficult or impossible to update only the content that had changed, so I ended up regenerating the entire site every time I updated one page. For a small site this was not a problem, but as the site grew larger it was cumbersome.

On a commercial site I experimented with a variation on this process: I still stored content in XML source files, but did not generate any HTML until it was requested. If an HTTP request came in for a file that did not exist on the server, an .htaccess directive would call a Perl script that generated the requested page and then saved it as a file at the original requested URL. Then on the next request for that URL, Apache would simply serve that file. To update a page, all I had to do was modify the source file and delete the “cached” HTML file.

This caching technique proved very reliable, and meant I did not have to worry much about the efficiency of my HTML-generation code. I could simulate “dynamic” pages that changed on a schedule by setting up a cron job that would delete the cached HTML file on a regular basis. By adding some directory prefixes and URL rewriting, I was even able to simulate a kind of session tracking without cookies or hidden form fields.

So getting back to my blog, why can’t it use the same technique? Store the content in a database, yes, but never render anything more than once. (In programming, this would be called memoization.) If one change would affect many pages, simply flag those pages as out-of-date and regenerate them when they are requested.
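
In code, the idea is nothing fancier than this (a generic sketch; render-page is a stand-in for whatever actually builds a page's HTML):

;; Memoize a rendering function: each page is rendered at most once,
;; and "flagging a page as out of date" is just dropping its cache entry.
(defparameter *page-cache* (make-hash-table :test #'equal))

(defun rendered-page (url)
  (multiple-value-bind (html hit) (gethash url *page-cache*)
    (if hit
        html
        (setf (gethash url *page-cache*) (render-page url)))))

(defun expire-page (url)
  (remhash url *page-cache*))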

On a related note, someone on comp.lang.lisp suggested that Kenny Tilton’s Cells dataflow extension to Common Lisp might be useful for web applications. I hope to have some time to explore something along these lines using Cells as a front-end to output static HTML files.

Pushing to Lists

Fiddling with some Common Lisp code a few weeks ago, I needed to push items onto the end of a list. After thinking for a moment, I wrote this:

(defun push-end (object place)
  (nconc place (list object)))

Since I was fiddling in LispWorks at the time, I discovered that LispWorks already defines its own push-end as a macro which works like this:

CL-USER> (macroexpand '(push-end object place))
(LET* ((#:ITEM563 OBJECT))
  (LET* ((#:NEW-VALUE562 (APPEND PLACE (LIST #:ITEM563))))
    (SETQ PLACE #:NEW-VALUE562)))

Leaving aside the peculiar nested let* and gensyms, this is still very different from my version. With my function renamed to my-push-end, here are the differences:

Both end up modifying their second argument:

(defvar *x* '(a b c))

(push-end 'd *x*)  -->  (A B C D)
*x*  -->  (A B C D)

(my-push-end 'e *x*)  -->  (A B C D E)
*x*  -->  (A B C D E)

My version also works on quoted lists, but LispWorks’ version does not:

(my-push-end 'd '(a b c))  -->  (A B C D)

(push-end 'd '(a b c))
  -->  Error: Couldn't find a setf expansion for (QUOTE (A B C)).

But LispWorks’ version correctly handles temporary bindings, whereas mine does not:

(let ((temp-x *x*))
  (push-end 'f temp-x))  -->  (A B C D E F)
*x*  -->  (A B C D E)

(let ((temp-x *x*))
  (my-push-end 'f temp-x))  -->  (A B C D E F)
*x*  -->  (A B C D E F)

This explains why push is implemented using setf instead of nconc, and why LispWorks’ push-end is implemented the same way. The only way to ensure that these macros handle temporary variables correctly is to operate only on bindings and not touch their list values directly.
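
For reference, a minimal portable version of the setf-based approach looks something like this (the names are mine, and it ignores the multiple-evaluation problem that the gensym-filled LispWorks expansion exists to solve):

;; Naive sketch: rebinds the place instead of mutating list structure,
;; like LispWorks' macro, but evaluates PLACE twice.
(defmacro naive-push-end (object place)
  `(setf ,place (append ,place (list ,object))))

;; A tidier portable route is DEFINE-MODIFY-MACRO, though it puts the
;; place first, i.e. (append-end place object):
(defun append1 (list object)
  (append list (list object)))

(define-modify-macro append-end (object) append1)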

Of course, operating directly on lists with nconc is slightly more efficient, so in situations without global variables it might be preferable. However, in complex code, subtle bugs might emerge from modifying lists. A good compiler should optimize away the extra assignments in the push-end macro when they aren’t needed. In LispWorks, I ran:

(disassemble #'my-push-end)  -->  16 lines of assembler

;; must wrap the macro in a lambda to disassemble:
(disassemble #'(lambda (obj place) (push-end obj place)))
  -->  20 lines of assembler

So my function requires four fewer assembly instructions than LispWorks’ macro. Perhaps in some extremely list-intensive code that would make a difference, but for anything else I’d go with the slightly safer macro.