When storing any large collection of data, one of the most critical decisions one has to make is when to normalize and when to denormalize. Normalized data is good for flexibility — you can write queries to recombine things in any combination. Denormalized data is more efficient when you know, in advance, what the queries… Continue reading Antidenormalizationism
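The trade-off in the excerpt above can be illustrated with a toy sketch. This is not code from the post: plain Python dicts stand in for database tables, and every name (`authors`, `posts`, `titles_by_author_normalized`, `build_denormalized_index`) is hypothetical. The normalized form answers the query by joining at read time. The denormalized form precomputes the one query we know in advance, so reads are cheaper, but the index goes stale when the underlying data changes.

```python
# Hypothetical toy data: dicts standing in for two normalized "tables".
authors = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
posts = [
    {"id": 10, "author_id": 1, "title": "Hello"},
    {"id": 11, "author_id": 2, "title": "Thrift"},
    {"id": 12, "author_id": 1, "title": "RDF"},
]

def titles_by_author_normalized(name):
    """Join posts to authors at query time: flexible, but does the join on every call."""
    ids = {aid for aid, a in authors.items() if a["name"] == name}
    return [p["title"] for p in posts if p["author_id"] in ids]

def build_denormalized_index():
    """Precompute the one query we expect to run: author name -> list of titles."""
    index = {}
    for p in posts:
        name = authors[p["author_id"]]["name"]
        index.setdefault(name, []).append(p["title"])
    return index

titles_by_author = build_denormalized_index()

def titles_by_author_denormalized(name):
    """A single dict lookup, valid only as of the last rebuild."""
    return titles_by_author.get(name, [])

print(titles_by_author_normalized("Alice"))    # ['Hello', 'RDF']
print(titles_by_author_denormalized("Alice"))  # ['Hello', 'RDF']

# Rename an author: the normalized query sees the change immediately,
# while the precomputed index is stale until it is rebuilt.
authors[1]["name"] = "Alicia"
print(titles_by_author_normalized("Alicia"))    # ['Hello', 'RDF']
print(titles_by_author_denormalized("Alicia"))  # []
```

The denormalized index also answers only the question it was built for. A query such as "posts whose title starts with R" still needs the normalized data.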
What makes an on-line community? In the past two weeks I have received announcements of three new “communities” all interested in using open-source software to retrieve, share, and analyze data from or about governments. Most of these announcements say the same thing: “A lot of people seem to be working on this, but they aren’t talking… Continue reading Fragmentation and the Failure of the Web
RDF is seductive. I can’t get away from it. Something about the ability to represent anything and everything in one consistent model just tugs at my engineer’s heartstrings. The problem with RDF, as I’ve discovered through painful experience, is that the ability to represent everything sacrifices the ability to represent anything efficiently. Certainly that is… Continue reading Whither RDF?
I dropped in to hear Rich Hickey talk about Clojure at the New York Semantic Web meetup group. Some highlights:
• Some programs, like compilers or theorem provers, are themselves functions. They take input and produce output. Purely functional languages like Haskell are good for these kinds of programs. But other programs, like GUIs or… Continue reading Clojure for the Semantic Web
Update September 22, 2008: I have abandoned this model. I’m still using Hadoop, but with a much simpler data model. I’ll post about it at some point.
… Gosh darn, it’s hard to get this right. In my most recent work on AltLaw, I’ve been building an infrastructure for doing all my back-end data processing… Continue reading The Document-Blob Model
Google recently released its Protocol Buffers as open source. About a year ago, Facebook released a similar product called Thrift. I’ve been comparing them; here’s what I’ve found:
Backers. Thrift: Facebook, Apache (accepted for incubation). Protocol Buffers: Google.
Bindings. Thrift: C++, Java, Python, PHP, XSD, Ruby, C#, Perl, Objective C, Erlang, Smalltalk, OCaml, and Haskell. Protocol Buffers: C++,… Continue reading Thrift vs. Protocol Buffers
Part of what I hope to do with the Program on Law & Technology at Columbia is bridge the communication gap between lawyers and engineers. The two groups think completely differently. To take a recent example, a new P2P file-sharing system called the Owner-Free Filesystem (OFF) has emerged. It stores and transmits only chunks of… Continue reading Lawyers and Engineers
The web is a strange beast. Server logs reveal just how strange. Someone’s crawling AltLaw.org, sending an HTTP Referer header of “http://www.nero.com/enu/downloads-nero8-trial.php” and a User-Agent identifying itself as “MSIE 5.00; Windows 98”. What the heck?
Just when you thought the A.I. Winter would last forever, up pops Brainhat with an open-source inference engine that uses natural language as its primary interface. Cool!