Year: 2008


When storing any large collection of data, one of the most critical decisions one has to make is when to normalize and when to denormalize.  Normalized data is good for flexibility — you can write queries to recombine things in any combination.  Denormalized data is more efficient when you know, in advance, what the queries…


Whither RDF?

RDF is seductive.  I can’t get away from it.  Something about the ability to represent anything and everything in one consistent model just tugs at my engineer’s heartstrings. The problem with RDF, as I’ve discovered through painful experience, is that the ability to represent everything sacrifices the ability to represent anything efficiently.  Certainly that is…


Clojure for the Semantic Web

I dropped in to hear Rich Hickey talk about Clojure at the New York Semantic Web meetup group.  Some highlights:

• Some programs, like compilers or theorem provers, are themselves functions.  They take input and produce output.  Purely functional languages like Haskell are good for these kinds of programs.  But other programs, like GUIs or…


The Document-Blob Model

Update September 22, 2008: I have abandoned this model.  I’m still using Hadoop, but with a much simpler data model.  I’ll post about it at some point. … Gosh darn, it’s hard to get this right.  In my most recent work on AltLaw, I’ve been building an infrastructure for doing all my back-end data processing…


Thrift vs. Protocol Buffers

Google recently released its Protocol Buffers as open source. About a year ago, Facebook released a similar product called Thrift. I’ve been comparing them; here’s what I’ve found:

Backers
Thrift: Facebook, Apache (accepted for incubation)
Protocol Buffers: Google

Bindings
Thrift: C++, Java, Python, PHP, XSD, Ruby, C#, Perl, Objective C, Erlang, Smalltalk, OCaml, and Haskell
Protocol Buffers: C++,…


Moving the ‘C’ in MVC

I’m sure I’m not the first to suggest this, but here goes. Ever since somebody first thought of applying the Model-View-Controller paradigm to the web, we’ve had this: The View is a conflation of HTML and JavaScript.  JavaScript is an afterthought, a gimmick to make pages “dynamic.”  All the real action is in the Controller,…


Lawyers and Engineers

Part of what I hope to do with the Program on Law & Technology at Columbia is bridge the communication gap between lawyers and engineers.  The two groups think completely differently. To take a recent example, a new P2P file-sharing system has emerged called the Owner-Free Filesystem (OFF).  It stores and transmits only chunks of…


Strange Referrers

The web is a strange beast.  Server logs reveal just how strange.  Someone’s crawling, sending an HTTP Referer header of “” with a User-Agent identified as “MSIE 5.00; Windows 98”.  What the heck?

AI Lives!

Just when you thought the A.I. Winter would last forever, up pops Brainhat with an open-source inference engine that uses natural language as its primary interface.  Cool!