Archive for May, 2006

A common complaint about Object-Oriented Programming (OOP) is that classes can make simple data hard to deal with: “I don’t want a DirectoryList object, I just want a list of strings.” I would say this is not a flaw inherent in OOP but rather in the way it is commonly used. “Encapsulation” and “data hiding” encourage creation of abstract interfaces completely divorced from their underlying implementation, even when that underlying implementation is in fact very simple. The problem is that abstract interfaces usually only provide a tiny subset of the operations one might potentially want to perform on the data hiding behind them.

What we have, essentially, are interfaces that abstract “down” to increasing levels of speficity, farther and farther away from the data themselves. Wouldn’t it be better if we could abstract “up,” choosing the representation of the data based on what we want to do with it? For example, take a very common, very general case: a collection of data elements with similar structure. There are only a few basic operations we are likely to perform on such a collection:

  1. Add an item
  2. Remove an item
  3. Modify an item
  4. Sort the items based on some criteria
  5. Find one or more items that satisfy certain criteria
  6. Perform agregate calculations on the items (max, min, etc.)

This is the genius of SQL: it provides the first three with simple commands, INSERT, UPDATE, and DELETE; and the last three with the multipurpose SELECT. An SQL query looks the same regardless of how big the data set is or what algorithm is used to index it. Unfortunately, this idea has never quite made it into general purpose programming languages.

Ultimately, it shouldn’t matter whether data are stored in an array in local memory or on a cluster of relational database servers located across the internet. The things we want to do with the data remain the same. A compiler should be able to translate abstract operations on collections into the actual work needed to perform them in a specific implementation. Ideally, we could just start feeding data into the system and let the compiler choose the most efficient implementation, switching over automatically when the collection becomes too big for a particular method. This is similar to the way Common Lisp (and other languages) automatically switch to arbitrary-precision integers when a calculation grows too large for fixed-width integers.

For an example of this, I turn, strangely enough, to one of the hairier aspects of Perl: tied variables. Basically, a tied variable looks and behaves exactly like one of Perl’s built-in data types (scalar, array, or hash table) when you use it, but it is actually an object that calls methods in response to the operations performed on it. For example, the Tie::File module represents a file as an array of strings, one per line. Delete an item from the array, and you remove a line from the file. Same with insertion, modification, and sorting. Presto! A complete random-access file library that doesn’t require the user to learn a new interface. The implementation does not load the whole file into memory, so it is efficient even for very large files. Iterating over lines in a file used to require a special syntax unique to files; with Tie::File it can be done with standard Perl control structures that iterate over lists.

The even more ambitious Data::All module attempts the same thing in a more general way: tying any external data format (CSV, RDBMS, Excel spreadsheets…) to standard Perl data structures. It’s still alpha code, but the idea is brilliant.

In short, I would like object-oriented interfaces that abstract data towards its simplest possible representation rather than the most specific representation. I do not want objects that limit what I can do with the data they contain.

Comments 1 Comment »

In Common Lisp, you can’t do this:

(defgeneric g (x))
(defmethod g ((x (and integer (satisfies evenp))) ...)

Because (and integer ...) is not a class. It is a valid type specifier and thus could be used in declarations, but it cannot be used in a method argument list.

It’s fairly easy to see why this is the case: allowing compound type specifiers in method argument lists effectively means creating anonymous classes on-the-fly, which would play havoc with the type hierarchies on which method dispatch depends.

But why can’t we unify types and classes? Every CLOS class has a corresponding type of the same name, and the built-in types are also classes. Furthermore, it is fairly obvious that (and integer (satisfies evenp)) is a sub-type of integer, so why not go ahead and treat it like a class that inherits from integer?

Comments No Comments »

Playing with the first in a (hopefully ongoing) series of Common Lisp Quizzes, I wrote a simple text-only CAPTCHA (completely automated public Turing test to tell computers and humans apart). My solution and others are posted at CL Quiz #1.

CAPTCHA> (generate-captcha)
"You started out with forty-nine Lisp Machines. Not through any fault of your own, you lost three Lisp Machines. Years later, when you were reflecting on this whole sordid process, you counted up how many you had. What was the result?"
"46"

As per the suggestion in the quiz problem, I used CL’s format "~r" to make up simple arithmetic questions. To make it harder for a computer to parse, I embedded the questions in complex sentences selected from stored lists. The results are at least hard enough that the Google Calculator can’t solve them, but even a modest AI with language capabilities would likely have little trouble.

This and discussions on the CL-Quiz mailing list got me thinking about CAPTCHAs in general. They’re really an unwinnable proposition. In order to check that a given answer is correct, it must be possible to solve the test algorithmically. Given that, it’s only a matter of time before someone writes an algorithm to solve it. Most CAPTCHAs in common use rely on adding enough “noise” to the image, sound, or text to confuse a computer, but as computer pattern-recognition abilities get better, the necessary noise level reaches the point that humans get confused as well.

Comments No Comments »