Archive for January, 2009

Up at Cornell, Tom Bruce has a post about the problem of funding open access to legal materials. This brings to mind a conversation I had with a doctor friend recently about AltLaw. My friend, accustomed to the open-access requirements of NIH grants, was frankly shocked that there are no comparable rules for legal decisions.

NIH Public Access Policy

Screenshot: PubMed home page

A related problem is how to make people aware of what free services are available. AltLaw has been around for two years, and while traffic has grown steadily, it has not gotten as much attention as commercial startups operating similar services. Admittedly, we have done no advertising at all, and that’s our fault. “If you build it, they will come,” we thought, naïvely. But how would we advertise? I’m a programmer; the people I work with are law professors. None of us knows the first thing about marketing, and quite frankly, none of us cares. Seen in that light, Cornell’s recent partnership with Justia.com is a smart move that will benefit everyone working on open-access law, since it will expose more lawyers to the idea.

It’s interesting to see the first signs of rebellion against RSpec. I jumped on the RSpec bandwagon when it first appeared, mostly so I wouldn’t have to write “assert_equals” all the time. But while I liked and used RSpec, I don’t think it made my tests any better. If anything, they were a little bit worse. I found myself testing things that were not really relevant to the code I was writing, asserting obvious things like “a newly-created record should be empty.”

When I got interested in Clojure, one of the first things I wrote was a testing library called “test-is”. I borrowed from a lot of Common Lisp testing frameworks, especially the idea of a generic assertion macro called “is”. It looks like this:

(deftest test-my-function
  (is (= 7 (my-function 3 4)))
  (is (even? (my-function 10 2))))

This is pretty basic, but it’s sufficient for low-level unit testing. So far, I think that’s how the library has typically been used. There have been occasional requests, however, for RSpec-style syntax. I can see how this would be useful for testing at a level higher than individual functions, but I have come to believe that the added semantics of RSpec are not really necessary.

Right now, the test-is library is built on the same abstractions as Clojure itself. Tests are functions, so you can apply all the same tools that already exist for handling functions. Tests can be called by name, organized into namespaces, and composed. There is almost no extra bookkeeping code that I need to write to make all of this work.
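As a rough sketch of what “tests are functions” means in practice, the example below uses clojure.test, the namespace under which test-is was later bundled with Clojure (an assumption about your setup; at the time of writing the library lives in clojure.contrib.test-is). The function my-function and the namespace name are made up for illustration.

```clojure
(ns example.compose-tests
  (:require [clojure.test :refer [deftest is run-tests successful?]]))

;; Hypothetical function under test.
(defn my-function [a b]
  (+ a b))

(deftest test-sum
  (is (= 7 (my-function 3 4))))

(deftest test-parity
  (is (even? (my-function 10 2))))

;; A test is an ordinary function, so one test can call others.
(deftest test-arithmetic
  (test-sum)
  (test-parity))

;; Call a single test by name...
(test-sum)

;; ...or run every test in the namespace and inspect the summary map.
(def summary (run-tests 'example.compose-tests))
(println (successful? summary))
```

Note that running the whole namespace this way executes test-sum and test-parity twice, once directly and once through test-arithmetic. That is harmless here, but worth knowing if you compose tests heavily.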

In contrast, if I were to adopt the RSpec style, I would have to write code to call, store, and organize tests. That’s more work for me, and ultimately restricts the flexibility of the library for people who use it. Furthermore, RSpec has its own set of semantics, above and beyond the language itself, which must be learned.

This is my first experience supporting a library for anyone other than myself, and I don’t want to force anyone into a particular style. A library like RSpec is a complete environment that attempts to anticipate all possible usage scenarios, so it’s grown correspondingly complicated. I want to provide a set of small tools that can be combined with other tools to do interesting things.

Of course, by making that decision I’m already dictating, to some extent, how the library can be used. But really, what I’m trying to do is set limits for myself. I will commit to providing a flexible, extensible set of functions and macros for writing tests. I am explicitly not trying to provide a complete testing framework. If someone wants to build an RSpec-style framework on top of test-is, more power to them. I will happily try to make test-is easier to integrate into that framework.

But there’s one other thing that struck me about that article that I linked to at the beginning — the idea of putting tests and code in the same file. I think that’s a great idea, and Clojure comes ready-made to implement it. Clojure supports the idea of “metadata” on definitions. You can attach a set of arbitrary properties to any object, without affecting the value of that object.
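A minimal sketch of metadata, using the core functions with-meta and meta. The vector and the :source key are invented for the example.

```clojure
;; Attach metadata to a value; the :source key is made up for illustration.
(def names (with-meta ["Ada" "Grace"] {:source "hypothetical-file.txt"}))

;; The metadata travels with the value...
(println (meta names))

;; ...but equality ignores it, so the value itself is unchanged.
(println (= names ["Ada" "Grace"]))  ; prints true
```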

It’s easy to attach a test function as metadata on a definition in Clojure, but the syntax is a little ugly, and there is no easy way to remove the tests from production code. So I added a “with-test” macro to the library. It lets you wrap any definition in a set of tests. It looks like this:

(with-test
 (defn add-numbers [a b]
   (+ a b))
 (is (= 7 (add-numbers 3 4)))
 (is (= -4 (add-numbers -6 2))))

This is equivalent to adding metadata to the function, but the syntax is a little cleaner. I’ve also added a global variable, “*load-tests*”, which can be set to false to omit tests when loading production code.
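For comparison, here is a sketch that puts the hand-written metadata form next to the with-test form. It assumes clojure.test (the later name of test-is), where with-test stores the assertions under the :test key of the var's metadata. The function names are hypothetical.

```clojure
(ns example.with-test
  (:require [clojure.test :refer [with-test is test-var]]))

;; Hand-written form: the test function goes in the var's :test metadata.
(defn add-by-hand
  {:test (fn [] (is (= 7 (add-by-hand 3 4))))}
  [a b]
  (+ a b))

;; with-test form: the macro stores the assertions under the same key.
(with-test
  (defn add-numbers [a b]
    (+ a b))
  (is (= 7 (add-numbers 3 4))))

;; Both vars now carry a test function, and test-var runs either one.
(println (fn? (:test (meta #'add-by-hand))))
(println (fn? (:test (meta #'add-numbers))))
(test-var #'add-by-hand)
(test-var #'add-numbers)
```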

I like having each function right next to its tests. It makes it easier to remember to write tests, and easier to see how the function is supposed to behave. So to the extent that test-is will promote a testing style, this is it. But it’s a pretty radical departure from the traditional style of testing, so I’m not sure how others will react to it.

A small post for the new year (in which I have resolved to write more).

The standard introduction to object-oriented programming teaches you to create a class for each type of thing you want to deal with in your program. So if you’re writing a payroll program, you would have an Employee class, a Department class, and so on. The methods of those classes are supposed to model some sort of real-world behavior. You quickly realize that real-world objects are not so easily separable, so you learn about inheritance, abstract classes, polymorphism, virtual methods, overloading, and all the other gobbledygook that is supposed to bring object-oriented programming closer to the real world, but usually just confuses programmers.

Like many others, I became suspicious of this technique after trying to use library classes that did not provide the methods I needed. Subclassing is supposed to provide a way to add new behavior to classes, but it is often thwarted by private variables, final methods, and the admonition that modifying the internals of a class is “breaking encapsulation.”

The joy I experienced when I first discovered Perl was due in large part to the realization that it provided just a few simple data structures (scalar, array, hash) that were sufficient for any sort of object I wanted to model. Moreover, every CPAN library used those same data structures, so it was easy to link them together and extend them where I needed to. Even Perl “objects” are just data structures with some added functionality. I was similarly delighted by Clojure’s abstract data structures (list, vector, map, set), all of which are manipulated with a few generic functions.

The problem with defining your own classes is that every class you define has its own, unique semantics. Someone who wants to use your class has to learn those semantics, which may not be suitable for how they want to use it. I once read (I don’t remember where) that classes are good for modeling abstract, mathematical entities like sets, but they fall apart when trying to model the real world.

So here’s a slightly radical notion: don’t use classes to model the real world. Treat data as data. Every modern programming language has at least a few built-in data structures that usually provide all the semantics you need. Even Java, the prince of “everything is a class” languages, has an excellent collections library. If your program has a list of names, you don’t need to invent a NameList object, just use a List<String>. Don’t hide it behind a specialized interface. The interface is already there: it’s a List. If somebody wants to sort the list, they already know how to do it, and you never have to write a SortedNameList class.
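The same point can be sketched in Clojure, where the generic data structures mentioned earlier do all the work. The sample names, departments, and salaries are invented; nothing here defines a class.

```clojure
;; A list of names is just a vector of strings.
(def names ["Grace" "Ada" "Linus"])

;; Anyone who knows `sort` already knows how to sort it.
(println (sort names))

;; An employee is just a map. Keys and values are invented for the example.
(def employees
  [{:name "Grace" :department "Research"    :salary 120}
   {:name "Ada"   :department "Engineering" :salary 100}
   {:name "Linus" :department "Engineering" :salary 110}])

;; The same generic functions work on every collection of maps.
(println (map :name (sort-by :salary employees)))
(println (reduce + (map :salary employees)))
(println (count (get (group-by :department employees) "Engineering")))
```

Because the data is plain vectors and maps, there is no SortedNameList or Payroll class to learn. The vocabulary is the language's own.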

This is an important idea behind JSON (and YAML) — the semantics are deliberately limited, so you know what to expect. That’s also why JSON is popular for sharing data between programs written in different languages — the semantics are simple, so they’re easy to implement. The point is, don’t create new semantics when you don’t need to — you’re only making it harder to understand, extend, and reuse your code.
