Beyond Syntax

From a 1995 paper on intentional programming: “Present day syntax had [sic] been predicated on a character stream that could have been input from punch cards, or teletypes.” Exactly! Why are we still working in a punch-card manner on million-pixel displays? Why are we still arguing about curly brackets versus parentheses when Unicode offers more than a million code points? In short, why do we have syntax at all?

Abandoning syntax wouldn’t mean abandoning “real” programming. Visual programming probably didn’t catch on because designing software with a mouse was too slow. But that’s a fault of the interface, not the method. There’s no reason why a tree-like editor in an Intentional Programming system couldn’t be as slick as Emacs with Paredit. Assuming the editor were also developed with Intentional Programming, it would be just as extensible as Emacs.

From the same paper: “Meta-work (i.e. consistent rewriting of a program) is not expressible in languages even though it comprises a substantial part of programmer’s workload. Lisp is an exception in this regard, but only at the cost of failing most other tests for usability.” I might argue with the usability statement, but the rest is true.

Sadly, Intentional Programming was originally developed at Microsoft, patented, and then seemingly abandoned. On the upside, the author of that paper now has his own company, Intentional Software, complete with patent deals with Microsoft and a blog.

Ruby vs. Lisp

I’m certainly not the first to do this, but I felt like writing it up anyway. Comparing Ruby and Common Lisp:

Syntax: Advantage, Common Lisp. No contest here. Ruby’s syntax is ugly, with all those ends hanging around and the { |var| ... } block syntax. The one thing Ruby has going for it is conciseness. The block syntax, ugly though it may be, is shorter than (lambda (var) ...), which may explain why Ruby uses blocks everywhere while CL programmers go out of their way to avoid lambdas.
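
To make the conciseness point concrete, here’s a trivial, made-up illustration of the block-heavy style that cheap block syntax encourages:

# Blocks are cheap enough in Ruby to chain one onto every call.
words = %w[lisp ruby scheme]
words.select { |w| w.length > 4 }.map { |w| w.upcase }.each { |w| puts w }
# prints SCHEME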

Libraries: Advantage, Ruby. CL has some really interesting high-level libraries, but it’s lacking in bread-and-butter utilities like date/time, file handling, and string munging.

Speed: Everybody knows Ruby is slow. CL with native compilation is not. On the other hand, a lot of Ruby libraries are written in hand-optimized C, so they’re plenty fast. But just try to decipher that C code when you want to modify the behavior of a library. Slight advantage, CL.

Resource usage: Slight advantage, Ruby. Most CL implementations carry the baggage of a 20MB runtime. Ruby by itself is small, but some major libraries (e.g. Rails) are memory hogs.

Web development: Advantage, Ruby. Rails rocks. The CL web frameworks are complex and not well tested in mainstream production use.

Testing: Dead heat. Both languages have several excellent testing frameworks. There’s been some particularly innovative work on Behavior-Driven Development in Ruby with RSpec.

Metaprogramming: Advantage, CL. Although Ruby is famous for its metaprogramming abilities, it can’t compete with CL’s macro system and the Meta-Object Protocol. I find the Ruby metaprogramming methods confusing, so I fall back on evaluating template strings, which is error-prone.
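
To illustrate what I mean — the Widget class and its attributes here are invented for the example — compare defining methods with define_method and a block against class_eval on a template string, the style I tend to fall back on:

class Widget
  ATTRIBUTES = [:color, :size]

  # Block style: define_method keeps everything as ordinary Ruby code.
  ATTRIBUTES.each do |attr|
    define_method("#{attr}?") { !instance_variable_get("@#{attr}").nil? }
  end

  # String style: class_eval on a template works, but a typo inside the
  # string only surfaces when the generated method is finally called.
  ATTRIBUTES.each do |attr|
    class_eval %{
      def reset_#{attr}
        @#{attr} = nil
      end
    }
  end
end

w = Widget.new
w.color?       #  =>  false
w.reset_size   #  =>  nil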

Difficulty: Ruby is definitely easier to learn, especially for someone with some C/Perl background. Common Lisp is really different, and shows its age in areas like file handling.

Code organization: This can’t be quantified, but I find it easier to structure my code in Ruby. Knowing at the outset that everything will go into classes and methods makes it more obvious how code is divided up. Lisp code, the “big ball of mud” as some call it, grows more organically but leaves me with a disorganized mess of individual functions scattered across a bunch of source files. I think writing well-organized code in Common Lisp requires more discipline and attention to detail than it does in a class-oriented language like Ruby.

Conclusion: Both languages have their problems. I feel more affection for Common Lisp, and I’m glad I learned it, but Ruby will probably continue to be my primary working language for a while. If I were starting something really unique that I would have to build from the ground up, Lisp would be my choice. But for building dynamic web sites, Ruby gets the job done right now. I hope Ruby will continue to evolve in a Lispy direction, with an abstract syntax tree and a macro system. Throw in an optimizing compiler and it might almost be perfect.

The Virtues of Static Typing

When I first discovered dynamically-typed languages like Perl and Ruby, I was convinced of their superiority to statically-typed languages like C++. No longer did I have to waste hours typing redundant type declarations or adding casts just to make the compiler happy. Dynamic typing allowed me to work quickly and unencumbered in what felt like a natural manner.

Lately, however, I’ve been experiencing some of the drawbacks. Many times I have started a long batch process running in Ruby, only to come back hours later to find that it crashed before getting halfway through because I made a mistake. Often, the mistakes were simple typos or misspellings. Sometimes they were the result of sloppy code, like not checking if a returned value is nil.
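
Here’s a contrived sketch of the pattern: nothing complains about the bad record until it finally comes up, long after the run has started.

records = [{ :name => 'alpha' }, { :name => 'beta' }, nil]   # imagine thousands of these

records.each_with_index do |record, i|
  # Works fine until the nil record is reached, hours into the run.
  puts "#{i}: #{record[:name].upcase}"   # NoMethodError when record is nil
end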

Because my code isn’t error-checked before it runs — in Rails, some code doesn’t even get syntax-checked before it runs, thanks to lazy autoloading — my mistakes don’t manifest until hours or days into a process. Then I have to fix them and start the process over.

Now I realize why statically-typed and -checked languages were so important in the pre-PC era. If you had a turnaround time of several hours between submitting your program for execution and receiving the results, you would want to be certain that it would work correctly. The instant feedback of the write-compile-run (or, with an interpreted language, just write-run) cycle makes it easier to be sloppy.

A static compiler or syntax checker would probably catch most of the mistakes I just described. So would proper unit tests. But the ability to constantly restructure large sections of code as I explore the problem — an ability granted by an interpreted, dynamic language — also makes it hard to keep tests up-to-date. Writing tests comes to seem as tedious and unhelpful as type declarations.

What I really want is a “magic” compiler that can look through my code and point out all the mistakes. A combination of good unit tests and a test coverage tool would come close, but that requires a lot of extra effort on my part.

The Weirdness of C++

I’ve been dredging up my C++ for a class recently, and I’m struck by just how weird it feels now that I spend most of my time with Ruby.

I was all proud of myself for remembering how to write a copy constructor. Then I ran into a situation like this:

MyClass a = foo;
MyClass c;
c = foo;

The first line was fine; the last segfaulted. What the heck?

I had hit upon the subtle difference between initialization, which merely looks like assignment, and actual assignment. The former calls the copy constructor; the latter calls operator=.

MyClass a = foo;  // calls copy constructor MyClass(foo)
MyClass c;
c = foo;  // calls operator=

I had neglected to provide an operator= for MyClass, so the compiler generated a default one that just copies each member — pointers included — verbatim. Since MyClass contained pointers to other structures, that shallow copy naturally led to problems pretty quickly.

Had I not known to look up the specific behavior of operator=, and then implement a correct one for MyClass, I would have been really confused. This sort of subtlety is what makes me think of C++ as a “hard” language and Ruby as an “easy” language.

To be sure, Ruby has its subtle quirks too, but they occur less frequently and usually around “advanced” topics like metaprogramming. In C++, even a fundamental operation like assignment can have strange, unpredictable properties.

Paragraph Numbering and the Semantics of BLOCKQUOTE

Continuing on the theme of HTML’s flaws, consider the humble BLOCKQUOTE. While long used simply to indent text, it has a recognizable semantic meaning: a long quotation from another work.

A block quote may contain multiple paragraphs, so BLOCKQUOTE logically enough is a block-level element that contains other block-level elements like P.

But suppose I want to number the paragraphs in my document? I don’t want to count the P elements within the BLOCKQUOTE, because they’re part of a larger paragraph that contains the quote. Furthermore, I don’t want to count the P immediately after the BLOCKQUOTE either, because it is actually just the continuation of the paragraph before the quote. Except in those rare cases when a paragraph ends with a block quote.

So really, what I’d like to do is this:

<p>
  This is the beginning of my body paragraph.
  <blockquote>
    <p>First paragraph of the quotation.</p>
    <p>Second paragraph of the quotation.</p>
  </blockquote>
  Here the body paragraph continues.
</p>

But I can’t, because BLOCKQUOTE can’t appear inside a P. (Looser forms of HTML will allow it, but only by inserting an assumed </P> before the BLOCKQUOTE.)

To be fair, word processors don’t handle this correctly either. Even LaTeX requires you to flag the continuing paragraph with \noindent. The definition of BLOCKQUOTE that I consider “correct” is admittedly awkward: BLOCKQUOTE itself would be an inline element, but one allowed to contain block-level elements.

If HTML had a FOOTNOTE tag, I would want it to work the same way.

Learning to Cook With Ruby

I don’t much like programming language tutorials. They’re useful for getting the general sense of what a language is all about, but they inevitably elide too many crucial details to teach you how to write a real program.

When I got interested in Ruby, I read the on-line version of Programming Ruby: The Pragmatic Programmer’s Guide and floated in a confused, dreamlike state through Why’s (Poignant) Guide to Ruby.

I didn’t feel like either one gave me a complete grasp of the language, though. I started digging into the standard library documentation — my usual second stop when learning a new language — but found it pretty brief and confusing for a newcomer to Ruby.

Then I picked up a copy of The Ruby Cookbook. This was the perfect programming book for someone who likes to dive right in and start writing programs. It’s not just about Ruby the language, nor is it a tour of the standard library. Rather, it’s a comprehensive list of everyday programming tasks — ranging from “Building a string from parts” to “Managing Windows services” — with solutions for how to do them in Ruby. Each solution is followed by a discussion of the whys and wherefores, the language features or external libraries involved. Reading the discussions gives insight into how Ruby works and why it works the way it does.

The Ruby Cookbook was an enjoyable way to learn Ruby, which I now use for the majority of my work. It’s even got some geek humor embedded in its code examples, like this one on string templates:

template = 'Oceania has always been at war with %s.'
template % 'Eurasia'
#  =>  "Oceania has always been at war with Eurasia."
template % 'Eastasia'
#  =>  "Oceania has always been at war with Eastasia."

Not long after reading the book, I got to know one of the authors, Leonard Richardson, and his wife Sumana. Nice people, and Leonard’s a good cook of the non-Ruby variety as well.

I’ll end with my favorite code example in the book, in a section about the IMAP email client library:

From: jabba@thehuttfoundation.org
Subject: Bwah!
---
From: jabba@thehuttfoundation.org
Subject: Go to do wa IMAP

HTML Footnotes

Leonard’s comment on my post about XML and footnotes got me thinking about representing footnotes in HTML. Not the visual presentation — there are lots of options for that, using CSS, JavaScript, and internal links — but the semantic one. In other words, using nothing but semantically-meaningful HTML tags (DIV, SPAN, P, A), how should one mark up a footnote in a document?

I believe it’s a failing of HTML that it does not include a footnote tag. I’ve heard that earlier drafts did include it, but it was dropped for lack of interest. Clearly, early HTML users — math and comp. sci. types — weren’t as fond of footnotes as those in the humanities. Lawyers, for instance, are fanatical about footnotes. There are entire books on proper citation for legal documents.

So if you’re building, as I am now, a web-based database for legal documents, what do you do? An article called Scholarship on the Web: Managing & Presenting Footnotes and Endnotes lists several possibilities. Basically, if you don’t want to use internal links, you can put the entire footnote inside a SPAN and use CSS and/or JavaScript to display it as a pop-up or a side note. This is great for on-screen reading, but there’s an edge case: multi-paragraph footnotes. Lawyers love footnotes so much that a single one can go on for many paragraphs, even including block quotations.

It’s not valid XHTML to put a block-level element like P inside of a SPAN. So put the footnote in a DIV instead of a SPAN, I thought. No dice: you can’t put a DIV inside of a P. So in the end I have to put the footnote in a DIV entirely outside the P that references it. I need some way to connect the footnote with the place in the text where it’s referenced, so at this point I might as well go back to internal links.

Perhaps I’m making this more complicated than it really is.

Defining Eval … In a Library

I was at LispNYC last night listening to Anton van Straaten discuss his work on R6RS, the new Scheme standard. One surprising change from R5RS is that eval is defined in a library.

Eval, in a library? Holy scopes! The Common Lispers in the audience were aghast. Even the Schemers were a tad confused. Anton explained. The goal of Scheme, he said, has always been to incorporate as much dynamic behavior as possible without sacrificing efficient compilation. Towards this end, the R6RS eval is more limited than Common Lisp’s eval.

As I understand it, eval was central to McCarthy’s original design for LISP. Eval is LISP, and LISP is eval. Of course, as others reminded me later, LISP’s definition of eval with dynamic scope led to the 30-year “funarg” problem. Eval is also a thorn in the side of anyone trying to generate a standalone Common Lisp program — the possibility of a call to eval means the compiler has to include the entire Common Lisp runtime (up to 20 MB, depending on the implementation) in the final executable.

This got me thinking about Ruby, too. While Common Lisp and Scheme actually discourage the use of eval, Ruby is pretty casual about it. In a purely interpreted, dynamic language, that’s not a problem. But it would make it tough to implement a static Ruby compiler.
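
A trivial sketch of why: the code handed to eval is assembled as a string and doesn’t exist until runtime, so there’s nothing for an ahead-of-time compiler to analyze.

greeting = 'hello'
method_name = ['upcase', 'reverse'][rand(2)]   # chosen at runtime
puts eval("greeting.#{method_name}")           # prints "HELLO" or "olleh"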

Where Does the XML Go?

Here’s a question that’s been bugging me for a while: what’s the best way to store information that is a mixture of highly- and loosely-structured data? For example, a collection of documents like Project Posner. Certain attributes of each document like the title, date, and citation fit easily into a normalized relational database model. But the body can only be described with some kind of markup.

I could just use HTML, except for one problem: my documents have to handle footnotes, for which HTML does not provide a tag. (As an aside, footnotes are a pain whether you’re doing web design or typesetting.)

On Project Posner, I compromised: everything is stored in a MySQL database, and the documents table has a “body” column that contains my own made-up XML syntax.
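
In rough outline, it looks something like the sketch below — a hypothetical version, not the actual Project Posner schema. The column names, the footnote element, and the database name are all invented, and it assumes a MySQL database with such a table already exists.

require 'rubygems'
require 'active_record'
require 'rexml/document'

ActiveRecord::Base.establish_connection(
  :adapter  => 'mysql',
  :database => 'opinions_example'
)

class Document < ActiveRecord::Base
  # Structured columns: title (string), decided_on (date), citation (string).
  # The body column (text) holds the blob of homegrown XML markup.

  # Pull the footnotes back out of the XML body when they're needed.
  def footnotes
    REXML::Document.new(body).get_elements('//footnote').map { |e| e.text }
  end
end

doc = Document.find(:first)
puts doc.title
puts doc.footnotes.size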

I could, in theory, normalize everything, even individual paragraphs. But that would be a nightmare to code and deadly slow. I could also store everything as XML documents. But then I’d have to reinvent all the facilities that MySQL (and ActiveRecord) provide, like transaction handling, auto-incrementing IDs, and so on.

For another project, I’m trying to create a pseudo-database that stores everything as XML files and uses Ferret for searching. I was going to use Ferret for full-text search anyway, so my original thought was to save overhead by not bothering with MySQL indexes. It works, but looking over it I realize that most of the data could be normalized to fit into the standard relational model. I’d still need a blob of XML data somewhere, but it could be in the database as easily as a file. What have I really gained, besides an impressively large and complex pile of code?

Back to Blogging, Elsewhere

I have achieved the dream of every geek: I have become a professional blogger. Well, sort of. In February I was hired as the Assistant Director of the new Program on Law and Technology at Columbia Law School. I’ll be doing a mixture of programming, web design, and administration for … whatever we decide to do. I’ve spent the past two months putting together a new website for the program itself, including a blog covering news of interest in the intersection of law and technology.

Currently I’m the only one writing for the blog. Since I’m not a lawyer myself, I’m mostly limiting myself to linking to news stories and blogs by law professors. But I’m trying to encourage some of the faculty at Columbia to try blogging as well. Hopefully they’ll get excited about it and we’ll have a great new forum for discussing tech law issues.

As you can see, of course, I haven’t had much time to contribute to my own personal blog, but I have a few ideas that I want to write up some time soon.