Dirty Necessary Money

Up at Cornell, Tom Bruce has a post about the problem of funding open access to legal materials. This brings to mind a conversation I had with a doctor friend recently about AltLaw. My friend, accustomed to the open-access requirements of NIH grants, was frankly shocked that there are no comparable rules for legal decisions.

NIH Public Access Policy
Screenshot: PubMed home page

A related problem is how to make people aware of what free services are available. AltLaw has been around for two years, and while traffic has grown steadily, it has not gotten as much attention as commercial startups operating similar services. Admittedly, we have done no advertising at all, and that’s our fault. “If you build it, they will come,” we thought, naïvely. But how would we advertise? I’m a programmer; the people I work with are law professors. None of us knows the first thing about marketing, and quite frankly, none of us cares. Seen in that light, Cornell’s recent partnership with Justia.com is a smart move that will benefit everyone working on open-access law, since it will expose more lawyers to the idea.

Fragmentation and the Failure of the Web

What makes an on-line community?  In the past two weeks I have received announcements of three new “communities” all interested in using open-source software to retrieve, share, and analyze data from or about governments.  Most of these announcements say the same thing: “A lot of people seem to be working on this, but they aren’t talking to each other.”  Each group has a slightly different slant, but in my mind I lump them all under the heading “Semantic Government,” i.e. building the semantic web for government data.

I started casting out a few search queries, and quickly compiled a list of eight different mailing lists and/or wikis devoted to this subject.  That doesn’t include for-profits like Justia.com or larger non-profits like the Sunlight Foundation.

This is a problem.  Not only do I have to subscribe to half a dozen mailing lists to keep abreast of what others are doing, I also have to cross-post to several lists when I want to announce something myself.  So far, nothing I have posted to these lists has garnered as much response as private emails sent directly to people who I know are subscribed to the lists.

Perhaps the very idea of a “web-based community” has become a victim of its own success.  Back in the olden days, when I was still learning how to type, creating an on-line community was hard.  You had to wrangle with BBS software, mailing list managers, or content management systems.  It took dedicated individuals willing to invest considerable time and money.  Now?  Just go to Google / Yahoo / Facebook / whatever flavor-of-the-month service, type in the name of your group and presto, you’re a “community.”

The problem is that it’s now easier to start a group than to join one.  Every project wants to be the center of its own community, but what most projects actually get is a lonely soapbox in the wilderness from which to cry, “Announcing version 0.x…”

I’m equally guilty in this trend, having founded one of the sites I referred to above (LawCommons) and built a wiki for another (IGOTF).  Once you’ve started a site it’s easier to leave it there than to formally announce “I am shutting down X and throwing my lot in with Y.”  It’s also a hedge against the (very likely) possibility that group Y won’t be around in a year.  But I worry that a broad movement (Semantic Government) fragmented into so many tiny sub-groups will never gather enough momentum to succeed.  The very thing we all want — to share information better — is lost through the scattered efforts to achieve it.

Lawyers and Engineers

Part of what I hope to do with the Program on Law & Technology at Columbia is bridge the communication gap between lawyers and engineers.  The two groups think completely differently.

To take a recent example, a new P2P file-sharing system has emerged called the Owner-Free Filesystem (OFF).  It stores and transmits only chunks of random binary data.  To retrieve a file, one uses a special URL that provides a mathematical formula for combining those random chunks into the original file.
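To make the mechanism concrete, here is a minimal sketch of the general idea.  It assumes the "mathematical formula" is a bytewise XOR of equal-sized blocks, which is my assumption for illustration and not a description of OFF's actual format; the function names are hypothetical.

```python
import os
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_into_blocks(data: bytes, n_random: int = 2) -> List[bytes]:
    """Turn `data` into blocks that each look like random noise.

    Assumption: the combining formula is XOR. The first `n_random`
    blocks are truly random; the last block is the data XORed with
    all of them, so it is also indistinguishable from noise.
    """
    randoms = [os.urandom(len(data)) for _ in range(n_random)]
    mixed = data
    for r in randoms:
        mixed = xor_bytes(mixed, r)
    return randoms + [mixed]


def recombine(blocks: List[bytes]) -> bytes:
    """Recover the original data by XORing every block together."""
    result = bytes(len(blocks[0]))  # all-zero starting value
    for block in blocks:
        result = xor_bytes(result, block)
    return result


blocks = split_into_blocks(b"some file contents")
original = recombine(blocks)
```

No single block reveals anything about the file; only the recipe naming which blocks to combine does.  In this sketch that recipe plays the role of the special URL.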

Random bytes can’t be copyrighted, the system’s designers say, therefore the network is immune to copyright liability, although users are still liable for any infringement they themselves commit.  The idea is that the RIAA/MPAA/etc. cannot take legal action to shut down the OFF network, and will be forced to search individuals’ hard drives if they want to prove infringement.

It’s a typical engineer’s conception of how the law works — technicalities and loopholes.  They think they can get around the law with cleverness.  But it doesn’t work that way.  True, there are many technical loopholes in legal statutes, but statutes only form a small part of what “The Law” actually is.  The rest of the story comes in judicial decisions, or common law.

In this instance, the Supreme Court decision in MGM v. Grokster clearly comes out against OFF.  In Grokster, the court said that merely distributing “a device with the object of promoting its use to infringe copyright” (case syllabus, my emphasis) makes the distributor liable for copyright infringement.  Furthermore, a court would likely not care that the network only transmits random bytes.  The end result — you get a copy of a copyrighted work without the owner’s permission — is still infringement.

Therefore, the OFF would lose if it got sued by copyright owners.  The point is this: it doesn’t matter what the statute says, it matters what a judge would decide if presented with the case.

Here’s another, simpler example: There is a federal statute prohibiting flag burning.  Law professors ask first-year students: Is this a law?  The answer is no, because any federal judge would decide that the statute violates the First Amendment and would not enforce it.

I hope that services like AltLaw and KeepYourCopyrights will encourage engineers and other non-lawyers to take a closer look at how law actually works.  If engineers stopped wasting their time trying to engineer clever technological solutions to legal problems, and instead advocated for legislative reform to solve those problems, they might have a better chance of getting what they want.

Disclaimer: I am not a lawyer, and this is not legal advice.  If you need legal advice, get an attorney.


Finally, my blog has a new theme.  I liked the old one, but it was getting a bit… old.  And the serif-font-on-a-translucent-background-image was never a great idea.  I didn’t have time for a complete redesign, so I settled on the excellent Mandigo theme by onehertz.  The header image is my own — the view from the North end of the Central Park Reservoir, around dusk.


I need a new laptop.  My current machine, a rebranded ASUS that I bought on the cheap a few years ago, has developed a crack in the screen hinge, so it’s only a matter of time.

I have to admit, I’m sorely tempted by the MacBook Air.  It’s a beautiful machine — sleek, light, even elegant.  But I’ve been 100% Linux for some time, and I’m reluctant to turn my life over to the tyranny of Steve Jobs.  Of course, all my favorite apps — Emacs, Firefox, OpenOffice — are still there, and I’ve read one can even install Ubuntu on an Air with only the usual Linux-on-a-laptop aggravations.

My alternate choice is a Raven from Emperor Linux, a.k.a. the ThinkPad X61.  It’s more expensive and heavier, but it’s a tablet.  I’ve always wanted a tablet, but pen-based software is barely functional even in the commercial software world, so I don’t expect much from open-source equivalents.

Any gentle readers out there with experience using a Linux tablet?  Is it worth it?

New York Neanderthals

Paul Graham writes, “Cambridge seems to be the intellectual capital of the world. … And what US city has a stronger claim? New York? A fair number of smart people, but diluted by a much larger number of neanderthals in suits.” Harsh but true.

I’ve never been to Cambridge, and never lived in any city but New York, but I’ll accept Graham’s casual portrayals as plausible. New York is obsessed with money, although I believe that’s influenced more by the ridiculous cost of living than by Wall Street. But it is also, I would argue, a city that values achievement, of any kind, above all else. Whether you’re a dancer, fashion designer, diplomat, programmer, or stock broker, New York is where you come to be the best at whatever it is you do. There’s a reason all the city services define their members in terms of superlatives — police (New York’s Finest), firefighters (Bravest), corrections officers (Boldest) and sanitation workers (Strongest).

Privacy, Open Access, and the Law

Since we started putting court cases on the interwebs, first with Project Posner and then with AltLaw, we’ve had the occasional angry email from someone who Googles himself/herself and finds a court case from 20 years ago that reveals embarrassing and career-damaging facts.  They usually want the page taken down.

Now, sometimes I’m sympathetic with the people making these requests — sexual harassment plaintiffs, asylum-seekers, and so on.  Other times, I’m not — usually when the person writing to us was convicted of sexual harassment, fraud, etc.

We’ve canvassed for opinions on what we should do.  The responses generally fall into 3 categories:

  1. Lawyers say: Tough, it’s public record.  Without public case law the American legal system would cease to function.  Only the courts can (and do) decide to anonymize a case.
  2. Techies say: Tough, information wants to be free.  If you don’t like what the web says about you, make your own web site to tell your side of the story.
  3. Others say: I don’t know.  It’s wrong to censor public records, but it’s also wrong to make people suffer for something that happened 20 years ago.

There are also suggested solutions:

  1. Refuse to take anything down.  Don’t answer the phone.
  2. Anonymize names in “sensitive” cases.  Provide a protected link to a non-censored version.  The problem is, cases are routinely identified by the names of the parties.  If you take out the names, you don’t know what case it is anymore.
  3. Block search engines from the entire site, either with robots.txt or free registration.  And say goodbye to 50% of our traffic.
  4. Refuse to modify or take down cases, but block individual cases in robots.txt on request.
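
As a rough illustration of option 4, a site could regenerate its robots.txt from a list of cases whose subjects have asked to be hidden from search engines.  This is only a sketch: the paths and function names below are hypothetical and are not AltLaw's actual URLs or code.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_paths):
    """Build robots.txt text that disallows each requested case page.

    `blocked_paths` holds URL paths (hypothetical examples below).
    Disallow rules are prefix matches, so "/cases/1204" would also
    block "/cases/12045"; real paths need to be chosen with care.
    """
    lines = ["User-agent: *"]
    for path in sorted(set(blocked_paths)):
        lines.append("Disallow: " + path)
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, url):
    """Check whether a well-behaved crawler would skip `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", url)


robots = build_robots_txt(["/cases/1204", "/cases/877"])
```

The case stays on the site and remains reachable by direct link; only compliant crawlers are asked to skip it.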

Our current policy is #4.  But is that good enough?  For the appeals and supreme court cases we currently host, probably.  But we hope to expand AltLaw to every U.S. court, down to the state level.  What happens when we start hosting, say, bankruptcy court decisions?

This gets into bigger questions of open access versus individual privacy.  We’re not the only ones struggling with the issue — our friends at Justia and public.resource.org have similar problems.  Ultimately, it’s a question for society at large.  Perhaps, as legal research on the web expands, courts will develop stricter standards for how they publish cases containing sensitive information.  But legal institutions are extremely resistant — and slow — to change.  The web of free legal information is growing fast — in the eight months since AltLaw launched, at least three commercial competitors have appeared.

URI Templates for RDF

There’s a school of thought that URIs should be opaque identifiers with no inherent meaning or structure. I think this is clearly a bad idea on the human-facing web, but it is more reasonable for computer-facing web services.

However, I’ve been generating a lot of RDF lately, trying to organize piles of metadata in AltLaw. I use URIs that have meaning to me, although they aren’t formally specified anywhere. I realized that a URI can represent a lot of information — not just the identity of a resource, but also its type, version, date, etc. — in a compact form. I can write code that manipulates URIs to get key information about a resource more quickly than I could by querying the RDF database.

Unfortunately, RDF query languages like SPARQL assume that URIs themselves are just pointers that do not contain any information. I could easily generate additional RDF statements containing all the same information encoded in the URI, but that would triple the size of my database and slow down my queries. (My experience so far with Sesame 2.0 is that complex queries are slow.)

What I need is a rule-based language for describing the structure of a URI and what it means. This would be similar to URI templates, and would map parts of the URI to RDF statements about that URI.

So if I make a statement about a resource at
(note: not a real URI), the RDF database automatically infers the following (in Turtle syntax):

        rdf:type        <http://altlaw.org/rdf/Case> ;
        dc:identifier   "1204" ;
        dc:jurisdiction "US" ;
        dc:date         "2008-02-14" .
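
Lacking such a rule language, the same mapping can be approximated in ordinary code.  The sketch below is one possible approach, not what AltLaw runs: the URI layout (example.org host, jurisdiction/date/identifier path) and the function name are hypothetical, chosen only so the output matches the statements above.

```python
import re

# Hypothetical URI layout: /cases/<jurisdiction>/<date>/<identifier>
CASE_URI = re.compile(
    r"^http://example\.org/cases/"
    r"(?P<jurisdiction>[A-Z]+)/"
    r"(?P<date>\d{4}-\d{2}-\d{2})/"
    r"(?P<identifier>\d+)$"
)


def infer_statements(uri):
    """Map the parts of a matching URI to (subject, predicate, object)
    triples. URIs that do not fit the template yield no statements."""
    match = CASE_URI.match(uri)
    if match is None:
        return []
    return [
        (uri, "rdf:type", "<http://altlaw.org/rdf/Case>"),
        (uri, "dc:identifier", match.group("identifier")),
        (uri, "dc:jurisdiction", match.group("jurisdiction")),
        (uri, "dc:date", match.group("date")),
    ]


triples = infer_statements("http://example.org/cases/US/2008-02-14/1204")
```

Because the statements are derived on demand from the URI, nothing extra has to be stored in the RDF database.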