Lawyers and Engineers

Part of what I hope to do with the Program on Law & Technology at Columbia is bridge the communication gap between lawyers and engineers. The two groups think completely differently.

To take a recent example, a new P2P file-sharing system has emerged called the Owner-Free Filesystem (OFF). It stores and transmits only chunks of random binary data. To retrieve a file, one uses a special URL that provides a mathematical formula for combining those random chunks into the original file.
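For the curious, here is a minimal sketch of how such a scheme can work. It assumes the "mathematical formula" is a plain byte-wise XOR against a random block; the class and method names are mine, for illustration only, and are not OFF's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class XorBlocksSketch {

    // XOR two equal-length blocks byte by byte.
    static byte[] xor(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("blocks must be the same length");
        }
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = "some file contents".getBytes(StandardCharsets.UTF_8);

        // A block of random bytes, the same length as the original.
        byte[] randomBlock = new byte[original.length];
        new SecureRandom().nextBytes(randomBlock);

        // What would be stored: on its own it looks like random data.
        byte[] stored = xor(original, randomBlock);

        // Anyone holding both blocks can rebuild the original.
        byte[] recovered = xor(stored, randomBlock);
        System.out.println(Arrays.equals(original, recovered)); // prints "true"
    }
}
```

Neither block is readable by itself, but the two together are exactly the file.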

Random bytes can’t be copyrighted, the system’s designers say, therefore the network is immune to copyright liability, although users are still liable for any infringement they themselves commit. The idea is that the RIAA/MPAA/etc. cannot take legal action to shut down the OFF network, and will be forced to search individuals’ hard drives if they want to prove infringement.

It’s a typical engineer’s conception of how the law works — technicalities and loopholes. They think they can get around the law with cleverness. But it doesn’t work that way. True, there are many technical loopholes in legal statutes, but statutes only form a small part of what “The Law” actually is. The rest of the story comes in judicial decisions, or common law.

In this instance, the Supreme Court decision in MGM v. Grokster clearly comes out against OFF. In Grokster, the court said that merely distributing “a device with the object of promoting its use to infringe copyright” (case syllabus, my emphasis) makes the distributor liable for copyright infringement. Furthermore, a court would likely not care that the network only transmits random bytes. The end result — you get a copy of a copyrighted work without the owner’s permission — is still infringement.

Therefore, OFF would lose if copyright owners sued it. The point is this: what matters is not what the statute says, but what a judge would decide if presented with the case.

Here’s another, simpler example: There is a federal statute prohibiting flag burning. Law professors ask first-year students: Is this a law? The answer is no, because any federal judge would decide that the statute violates the First Amendment and would not enforce it.

I hope that services like AltLaw and KeepYourCopyrights will encourage engineers and other non-lawyers to take a closer look at how law actually works. If engineers stopped wasting their time trying to engineer clever technological solutions to legal problems, and instead advocated for legislative reform to solve those problems, they might have a better chance of getting what they want.

Disclaimer: I am not a lawyer, and this is not legal advice. If you need legal advice, get an attorney.

Aesthetics

Finally, my blog has a new theme. I liked the old one, but it was getting a bit… old. And the serif-font-on-a-translucent-background-image was never a great idea. I didn’t have time for a complete redesign, so I settled on the excellent Mandigo theme by onehertz. The header image is my own — the view from the North end of the Central Park Reservoir, around dusk.

Temptation

I need a new laptop. My current machine, a rebranded ASUS that I bought on the cheap a few years ago, has developed a crack in the screen hinge, so it’s only a matter of time.

I have to admit, I’m sorely tempted by the MacBook Air. It’s a beautiful machine: sleek, light, even elegant. But I’ve been 100% Linux for some time, and I’m reluctant to turn my life over to the tyranny of Steve Jobs. Of course, all my favorite apps (Emacs, Firefox, OpenOffice) are still there, and I’ve read one can even install Ubuntu on an Air with only the usual Linux-on-a-laptop aggravations.

My alternate choice is a Raven from Emperor Linux, a.k.a. the ThinkPad X61. It’s more expensive and heavier, but it’s a tablet. I’ve always wanted a tablet, but pen-based software is barely functional even in the commercial software world, so I don’t expect much from open-source equivalents.

Any gentle readers out there with experience using a Linux tablet? Is it worth it?

More Clojure Love

I dropped by the Java Users’ Group meeting last week since Rich Hickey was there to talk about Clojure.

I expected a bit of carping from the Java guys, and at first they were all “efficiency this” and “security that.” But midway through the talk I think they were getting it. A few even got excited about macros.

If I didn’t make it clear in my first post about Clojure, I like this language. Here are some more reasons why:

  1. All binding constructs — let, defn, and the like — perform destructuring.
  2. Universal data structures — vector, list, map, set.
  3. Built-in java.util.regex support.
  4. Sixty squintillion Java libraries.
  5. A small but growing number of Clojure libraries, some written by yours truly.
  6. You can generate Java .class files that run Clojure code.
  7. Lightning-fast bug fixes from the author.

New York Neanderthals

Paul Graham writes, “Cambridge seems to be the intellectual capital of the world. … And what US city has a stronger claim? New York? A fair number of smart people, but diluted by a much larger number of neanderthals in suits.” Harsh but true.

I’ve never been to Cambridge, and never lived in any city but New York, but I’ll accept Graham’s casual portrayals as plausible. New York is obsessed with money, although I believe that’s influenced more by the ridiculous cost of living than by Wall Street. But it is also, I would argue, a city that values achievement, of any kind, above all else. Whether you’re a dancer, fashion designer, diplomat, programmer, or stock broker, New York is where you come to be the best at whatever it is you do. There’s a reason all the city services define their members in terms of superlatives: police (New York’s Finest), firefighters (Bravest), corrections officers (Boldest) and sanitation workers (Strongest).

EC2 Authorizations for Hadoop

I just did my first test-run of a Hadoop cluster on Amazon EC2. It’s not as tricky as it appears, although I ran into some snags, which I’ll document here. I also found these pages helpful: EC2 on Hadoop Wiki and manAmplified.

First, make sure the EC2 API tools are installed and on your path. Also make sure the EC2 environment variables are set. I added the following to my ~/.bashrc:

export EC2_HOME=$HOME/ec2-api-tools-1.3-19403
export EC2_PRIVATE_KEY=$HOME/.ec2/MY_PRIVATE_KEY_FILE
export EC2_CERT=$HOME/.ec2/MY_CERT_FILE
export PATH=$PATH:$EC2_HOME/bin

I also copied my generated SSH key to ~/.ec2/id_rsa-MY_KEY_NAME.

You need authorizations for the EC2 security group that Hadoop uses. The scripts in hadoop-*/src/contrib/ec2 are supposed to do this for you, but they didn’t for me. I had to do:

ec2-add-group hadoop-cluster-group -d "Group for Hadoop clusters."
ec2-authorize hadoop-cluster-group -p 22
ec2-authorize hadoop-cluster-group -o hadoop-cluster-group -u YOUR_AWS_ACCOUNT_ID
ec2-authorize hadoop-cluster-group -p 50030
ec2-authorize hadoop-cluster-group -p 50060

The first line creates the security group. The second line lets you SSH into it. The third line lets the individual nodes in the cluster communicate with one another. The fourth and fifth lines are optional; they let you monitor your MapReduce jobs through Hadoop’s web interface. (If you have a fixed IP address, you can be slightly more secure by adding -s YOUR_ADDRESS to the commands above.)

These authorizations are permanently tied to your AWS account, not to any particular group of instances, so you only need to do this once. You can see your current EC2 authorization settings with ec2-describe-group; the output should look something like this:

GROUP   YOUR_AWS_ID    hadoop-cluster-group    Group for Hadoop clusters.
PERMISSION      YOUR_AWS_ID    hadoop-cluster-group    ALLOWS  all                     FROM    USER    YOUR_AWS_ID    GRPNAME hadoop-cluster-group
PERMISSION      YOUR_AWS_ID    hadoop-cluster-group    ALLOWS  tcp     22      22      FROM    CIDR    0.0.0.0/0

With additional lines for ports 50030 and 50060, if you enabled those.

Stop Your Java SAX Parser from Downloading DTDs

Back in February, in a slightly plaintive post, the W3C sysadmins asked that people stop hammering their servers with requests for XHTML DTDs. Everyone said yes, this is a stupid problem that wouldn’t have happened if a) the XML spec were less dumb, or b) XML libraries were less dumb.

After that post, I spent two whole days fighting with XML catalogs — possibly the worst-documented XML spec ever — to make sure my Java code wasn’t downloading a DTD every time it read an XHTML document.

To my annoyance, no one seems to have posted any cut-and-paste solutions to this problem. Setting properties on the SAX parser is no help, and the XML catalogs solution is a pain to set up.

So what if someone wrote a “dummy” XML entity resolver that does nothing? Here’s what I came up with:

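import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Answers every external-entity request (including the DTD) with an empty document.
public class DummyEntityResolver implements EntityResolver {
    public InputSource resolveEntity(String publicID, String systemID)
        throws SAXException {
        // Must not return null: null tells the parser to fetch the entity itself.
        return new InputSource(new StringReader(""));
    }
}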

Lo and behold, it works! The key is the return line — if you return null, the SAX parser reverts to its default behavior and downloads the DTD.

Use it like this:

XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setEntityResolver(new DummyEntityResolver());
reader.setContentHandler(new YourContentHandler());
reader.parse(your_xml_source);
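
If you want to try it without touching your own code, here is a self-contained sketch. A few things in it are my assumptions rather than part of the recipe above: the sample document, the example.invalid DTD address (a reserved host name that can never resolve, so the parse can only succeed if nothing is fetched), and the use of SAXParserFactory in place of XMLReaderFactory, which newer JDKs mark as deprecated. The resolver is repeated as a nested class only so the file compiles on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NoDtdDownloadDemo {

    // Same idea as DummyEntityResolver: hand back an empty document.
    static class EmptyResolver implements EntityResolver {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    }

    // Hypothetical sample document. The ".invalid" host cannot be resolved,
    // so parsing only succeeds if the parser never tries to fetch the DTD.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
        + "\"http://example.invalid/xhtml1-strict.dtd\">"
        + "<html><head><title>t</title></head><body><p>hi</p></body></html>";

    // Parses the XML and returns the element names in document order.
    // Checked exceptions are wrapped to keep the demo short.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(new EmptyResolver());
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(SAMPLE));
    }
}
```

Running it should list the five element names from the sample document, with no network access needed.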

The catch is that this will break any externally-defined entities, including standard XHTML entities like &copy;. The built-in XML entities such as &amp;, and numeric character references like &#x43;, will still work.
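
Here is a small sketch of the part that keeps working. The sample text and the class name are made up for illustration, and the resolver is written inline as a lambda that does the same thing as DummyEntityResolver. I have left out an example of an externally-defined entity because what the parser does with an undeclared one may vary.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntitySurvivalDemo {

    // Hypothetical sample: one built-in entity and one numeric character reference.
    static final String SAMPLE =
        "<!DOCTYPE p PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
        + "\"http://example.invalid/xhtml1-strict.dtd\">"
        + "<p>AT&amp;T &#x43;</p>";

    // Parses the XML with an empty-DTD resolver and returns all character data.
    static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Inline equivalent of DummyEntityResolver.
            reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // Text may arrive in several chunks, so accumulate it.
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(textOf(SAMPLE)); // prints "AT&T C"
    }
}
```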

You can check that you’re not downloading any DTDs by watching the output of ngrep -q DTD while running your XML parser. If it doesn’t print anything, you’re good.