JDK Version Survey Results

After a month and about 175 responses, here are the results of my JDK Version Usage Survey (now closed):

Versions: Almost everyone uses 1.6. A few are still using 1.5, and a few are trying out 1.7. Only a handful are still on 1.4. Fortunately, no one is on a version older than 1.4.

Reasons: These are more varied. The most common reason for not upgrading is lack of time, with “it just works” running a close second. A little less than half of respondents are limited by external forces: either operations/management or third-party dependencies.

Not much came out in the comments. Banks and other large institutions seem to be the most resistant to upgrades, especially if they’ve been bitten by past JDK changes.

Design Philosophies of Developer Tools

I’ve been thinking about some of the tools that I use every day, and about the different design philosophies they reflect.

Git

First and foremost, Git. We use Git on every single project, internal and external. Git is a great example of the Unix design philosophy: many small programs — 153 of them by my count — each of which does exactly one thing and does it well. But this is not “loose coupling.” The components of Git are tightly integrated: they all depend on the same repository structure and file formats.

One of the nice things about Git is how its internals are both exposed for the world to see and thoroughly documented. We can easily write scripts to automate common tasks or create different workflows. With a bit more effort, we could even write new tools that integrate with the Git suite. These tools can do things that Git’s authors never intended, as long as they follow the documented repository structure. Git isn’t so much a version control system as the means to construct one.

Still, Git is one project with many components, not many separate projects. All 153 executables in Git are governed by a single release cycle, tested and known to work together. We never have to worry about incompatible versions of, say, git-branch and git-merge on the same machine. Older versions of Git can read repositories created with newer versions even if they don’t provide all the same features.

Maven

In stark contrast to Git, we have tools from the Java world like Ant and Maven. The JVM has no fork/exec of its own, and launching a fresh JVM for every small task is slow, so the many-small-programs design is a non-starter. Instead, the Java tools usually favor some sort of plug-in architecture, which is a great idea in theory but hard to get right in practice.

I’ve tried writing a Maven plugin. Hacking up a one-off for a single project is not too difficult, but designing a general-purpose plugin that works everywhere is maddeningly complicated. Maven plugins are just Java code, so they can do whatever they want, but the APIs for interacting with the rest of the Maven system are woefully underdocumented. The contract of a Maven plugin, what it can and cannot do, is not well-defined. The internals of Maven itself are largely a black box.

The core Maven plugins have independent release cycles, so unexpected incompatibilities are possible, but I’ve never encountered any. On the whole, the Maven ecosystem is quite stable. The struggle comes once you venture outside the realm of what the standard plugins provide. Maven plugins are not designed to be composed, so adding new capabilities is rarely as simple as scripting plugins that already exist. You have to start from scratch every time.

Ruby / Rubygems / RVM / Bundler

Finally, we have tools from the Ruby world, the ever-changing cornucopia of Ruby implementations, libraries, and tools to manage it all. The problem with the Ruby tools is that they are both tightly coupled and uncoordinated. Although there is a separate tool for each task, each tool reaches into at least one of the others: Rubygems modifies the behavior of the Ruby interpreter, Bundler modifies the behavior of Rubygems, RVM modifies the behavior of the shell, and so on. Each one adds another layer of indirection, making debugging harder.

All of the Ruby development tools have independent release cycles, and they don’t seem to plan or coordinate with one another in advance of each release. Integration testing is left up to the users.

I admire the speed and eagerness with which the Ruby community produces new tools. But on almost every Ruby project I’ve worked on, we’ve spent hours or days sorting out incompatibilities among some combination of libraries, language implementations, and development tools. Our internal mailing list is littered with advice like “Don’t use Bundler version X with RVM version Y.” The speed of development comes with its own cost.

Thoughts

So what do I take from all this? Just a few principles to keep in mind when writing software tools:

  1. Plan for integration
  2. Rigorously specify the boundaries and extension points of your system
  3. Do not depend on unspecified behavior

And a couple of ideas if you’re starting a new project from scratch:

  1. The filesystem is the universal integration point
  2. Fork/exec is the universal plugin architecture
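As a rough sketch of that last idea: a host program can treat any executable as a plugin by launching it, writing data to its standard input, and reading its standard output. The example is in Clojure, using clojure.java.shell, which launches a new process rather than forking the JVM. The name run-plugin is made up for illustration, and the usage lines assume a Unix-like system with cat and tr on the PATH.

```clojure
(require '[clojure.java.shell :as shell])

(defn run-plugin
  "Runs the executable at path as a plugin (hypothetical helper).
  input is written to the plugin's stdin and its stdout is returned.
  Throws if the plugin exits with a non-zero status."
  [path input & args]
  (let [{:keys [exit out err]} (apply shell/sh path (concat args [:in input]))]
    (if (zero? exit)
      out
      (throw (ex-info (str "plugin failed: " path)
                      {:exit exit :err err})))))

;; Any program that reads stdin and writes stdout can act as a plugin:
(run-plugin "cat" "hello")
(run-plugin "tr" "hello" "a-z" "A-Z")
```

The contract here is nothing more than stdin, stdout, and the exit status, which is why a plugin can be written in any language. In a standalone script, a call to (shutdown-agents) at the end lets the JVM exit promptly, since sh reads the process output on background threads.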

Update 8/31: More comments at Hacker News.

The Naming of Namespaces

From time to time I’m asked, “How do you organize namespaces in Clojure projects?” The question surprised me at first, because I hadn’t thought about it much. But then I was using Clojure back when the only way to load code was “load-file.”

Most programming languages, especially object-oriented languages, provide strong hints on how to structure your source files. Everything is a class, and (almost) every class is a file. In Clojure, everything (almost) is a function. Functions are much smaller units than classes. So how do we group them?

The important thing to remember about namespaces is that, from the compiler’s point of view, they don’t matter. Namespaces are a convenience for the programmer, to help you avoid name clashes without having to write longer names. There’s no reason why an entire application can’t be defined in a single namespace. Most of Clojure itself is defined in one namespace with over 500 symbols. (Common Lisp has 978 symbols in a single namespace.)

You can think of namespaces as a tool to express something about your application. Here are some ideas to get you started:

  • Group functions into namespaces based on the type of data they manipulate. For example, functions to manipulate customer data go in the “customer” namespace. This technique is familiar from object-oriented languages, but it has the same limitations: where do you put functions concerning relationships among two or more types? The OO answer would be to make a new name for the relationship. This style leads to a proliferation of small namespaces, which can become a burden.
  • Divide a library into a public API namespace and an internal implementation namespace. Or define a high-level API for common cases and a low-level API for more advanced usage.
  • Divide an application into namespaces representing architectural layers. You can examine “ns” declarations to prove that each layer calls functions only from the layer below it.
  • Divide an application into namespaces representing functional modules, with well-defined contracts for communication between modules.
  • Try to separate decision-making code from the code that carries out those decisions. That is, keep your business logic purely functional and free of side-effects, so it is easy to test. You don’t necessarily have to put side-effect code in a separate namespace, but doing so may help keep it cleanly separated.
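To make the last idea concrete, here is a minimal sketch with two namespaces in one file. The namespace names (example.orders.logic, example.orders.effects) and the order fields are invented for illustration, and println stands in for real side effects.

```clojure
;; Decision-making code: pure functions only.
(ns example.orders.logic)

(defn next-action
  "Decides what to do with an order. Performs no I/O."
  [{:keys [paid? in-stock?]}]
  (cond
    (not paid?) :send-reminder
    in-stock?   :ship
    :else       :backorder))

;; Side-effecting code lives in its own namespace. It depends on the
;; logic namespace, never the other way around.
(ns example.orders.effects
  (:require [example.orders.logic :as logic]))

(defn process!
  "Carries out the decision made by logic/next-action.
  println is a stand-in for email, database writes, and so on."
  [order]
  (let [action (logic/next-action order)]
    (println "order" (:id order) "->" action)
    action))

(process! {:id 42 :paid? true :in-stock? false})
```

Because next-action takes a map and returns a keyword, it can be tested without any mocking. Only process! needs a real environment.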

With all these techniques, the point to remember is that namespaces are there to help you, not to get in your way. If you have a large namespace, you can still divide it up into multiple files.

The one hard-and-fast rule is that you cannot have a circular dependency between namespaces. That is, if namespace A needs to call functions defined in namespace B, then namespace B cannot call functions in namespace A. There is no workaround; it simply can’t be done. In practice, this is rarely a problem. If you encounter a situation where two namespaces are mutually dependent, it’s probably a sign that they should be merged into a single namespace.
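The no-cycles rule is easy to check mechanically. The sketch below assumes you have already collected each namespace’s requirements into a map by hand. The function name find-cycle and the app.* namespace names are hypothetical, and this is not how the Clojure loader itself detects cycles.

```clojure
(defn find-cycle
  "deps maps each namespace (a symbol) to the set of namespaces it requires.
  Returns a path of namespaces that ends by revisiting one already on the
  path, or nil if no cycle is reachable."
  [deps]
  (letfn [(visit [path node]
            (if (some #{node} path)
              (conj path node)                       ; node seen before: cycle
              (some #(visit (conj path node) %)      ; otherwise keep walking
                    (get deps node))))]
    (some #(visit [] %) (keys deps))))

;; A layered application: main depends on everything, nothing depends on main.
(find-cycle {'app.main #{'app.db 'app.ui}
             'app.ui   #{'app.db}
             'app.db   #{}})

;; Two mutually dependent namespaces.
(find-cycle {'app.a #{'app.b}
             'app.b #{'app.a}})
```

The first call returns nil. The second returns a path such as [app.a app.b app.a], which names the namespaces you would probably want to merge.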

In client projects at Relevance (home of Clojure/core) we often end up with one namespace for each aspect of an application — data access, UI, logging, and so on. Then there’s one “main” namespace that depends on all the others and ties it all together.

Update 9/5/2011: Chris Houser wrote a nice answer on StackOverflow about how to split a Clojure namespace over several files.

ClojureScript Launch, New York

As you may have heard, last night we (Clojure/core) announced ClojureScript at the Clojure NYC Meetup. Rich Hickey gave a talk, which was streamed live over the web, while we monitored Twitter and IRC for feedback.

The event was a great success, with loads of excitement expressed by both the local New York crowd and the Internet at large.

Screenshot of IRC / Twitter during the ClojureScript announcement

Video was also recorded, which will be posted soon.

Thanks to Google New York for hosting.

Dependency Management First-Aid Kit

This article attempts to unravel some of the mysteries of dependency management with Maven and Maven-based tools.

Help, something’s missing!

Say you have a project named “my-new-project” which declares a dependency on version 3 of the “awesome-sauce” library by the Example.com corporation. You add the dependency to your pom.xml, project.clj, or whatever configuration file your build tool uses. You take a deep breath and start a build. And it fails!

If you’re using Maven 2, you see something like this:

[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

Missing:
----------
1) com.example:awesome-sauce:jar:3.0.0

  Try downloading the file manually from the project website.

  Then, install it using the command: 
      mvn install:install-file -DgroupId=com.example -DartifactId=awesome-sauce -Dversion=3.0.0 -Dpackaging=jar -Dfile=/path/to/file

  Alternatively, if you host your own repository you can deploy the file there: 
      mvn deploy:deploy-file -DgroupId=com.example -DartifactId=awesome-sauce -Dversion=3.0.0 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

  Path to dependency: 
        1) my.group:my-new-project:jar:1.0.0-SNAPSHOT
        2) com.example:awesome-sauce:jar:3.0.0

----------
1 required artifact is missing.

for artifact: 
  my.group:my-new-project:jar:1.0.0-SNAPSHOT

from the specified remote repositories:
  central (http://repo1.maven.org/maven2),
  clojars (http://clojars.org/repo/)

Leiningen, which uses Maven 2 under the covers, produces similar output, but it mistakenly prints the current project name as org.apache.maven:super-pom:jar:2.0.

Maven 3 prints a less verbose (and less informative) error message, but the gist is the same.

What happened?

What is all this verbosity saying? Well, obviously, the build failed because something was missing. What was missing? Maven tells you:

Missing:
----------
1) com.example:awesome-sauce:jar:3.0.0

The JAR file for the project “awesome-sauce”, version 3.0.0, published in the “com.example” group, is missing. That just means Maven didn’t find it in any of the places it looked.

Where did it look? Maven tells you that too:

from the specified remote repositories:
  central (http://repo1.maven.org/maven2),
  clojars (http://clojars.org/repo/)

These are the public repositories where Maven searched for the file. Each repository has an ID (“central” and “clojars” in this case) and a URL. Both are specified in the configuration of:

  1. Your project, in pom.xml or project.clj
  2. Your build tool’s global configuration file
    • settings.xml for Maven
    • N/A for Leiningen
  3. Your build tool’s built-in defaults

If you visit http://repo1.maven.org/maven2/com/example/awesome-sauce or http://clojars.org/repo/com/example/awesome-sauce in a browser you will see that those directories do not, in fact, exist.

Although it’s not listed, the first place Maven checks for a dependency is your local Maven repository. The local repository is just a big cache of everything Maven has downloaded in the past. It’s typically located at $HOME/.m2/repository.

What to do next

You have two options at this point:

  1. Find a public repository containing “awesome-sauce”
  2. Install “awesome-sauce” in your local repository

The first option is generally less work, and more repeatable if you ever build your project on another machine.

Finding a repository

Odds are, if the library you are looking for is free, open-source, and popular, it will already be in a public Maven repository somewhere. Start with the source: who released the library? Large organizations with a lot of open-source projects often host their own repositories, like Google and Codehaus. Failing that, search engines such as Mvnbrowser may help you find it.

Once you’ve found a repository, you need to add it to your build. For example, to add the Codehaus repository to a Maven project, add these lines to pom.xml inside the <project> tag:

<repositories>
  <repository>
    <id>codehaus</id>
    <name>Codehaus</name>
    <url>http://repository.codehaus.org/</url>
  </repository>
</repositories>

(You can pick your own <id> and <name>.)

For Leiningen, add the following lines inside the (defproject ...) block:

  :repositories {"codehaus" "http://repository.codehaus.org/"}

Installing locally

If the library you want is not available in any public repository, you’re not stuck; you just have to do a bit more work. You need to get the JAR file for the library, either by downloading it manually or building from source. Then you need to install that JAR file in your local Maven repository. That’s easy, because Maven has already told you exactly how to do it:

  Then, install it using the command: 
      mvn install:install-file -DgroupId=com.example -DartifactId=awesome-sauce \
 -Dversion=3.0.0 -Dpackaging=jar -Dfile=/path/to/file

Copy that command verbatim, changing only /path/to/file to the path to the library’s JAR file. Maven will copy the file to $HOME/.m2/repository/com/example/awesome-sauce/3.0.0/awesome-sauce-3.0.0.jar. The next time you build your project, Maven knows exactly where to find it.
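The path above follows a fixed pattern: the dots in the group ID become directory separators, followed by the artifact ID, the version, and a file named artifact-version.jar. Here is a small sketch of that mapping. The function name local-repo-path and the /home/me home directory are made up for illustration, and it only covers plain JARs without a classifier.

```clojure
(require '[clojure.string :as str])

(defn local-repo-path
  "Builds the path of a JAR inside a Maven repository rooted at repo-root.
  Hypothetical helper: handles plain JAR artifacts only."
  [repo-root group-id artifact-id version]
  (str/join "/" [repo-root
                 (str/replace group-id "." "/")   ; com.example -> com/example
                 artifact-id
                 version
                 (str artifact-id "-" version ".jar")]))

(local-repo-path "/home/me/.m2/repository" "com.example" "awesome-sauce" "3.0.0")
;; => "/home/me/.m2/repository/com/example/awesome-sauce/3.0.0/awesome-sauce-3.0.0.jar"
```

Checking whether that file exists is a quick way to confirm that the install step worked.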

Installing remotely

If you want others to be able to build your project without having to go through these manual steps, you need your own public Maven repository to which you can upload files. Hosting a Maven repository isn’t hard: all you need is a web server.

If you work with a team, consider setting up a shared repository for everyone to use. A repository manager such as Nexus can help you take care of user accounts and authentication.

If you publish open-source libraries, I strongly encourage you to get an account on Sonatype OSS, a free service provided by the makers of Nexus. Releasing your projects to Sonatype OSS gives them a path to get added to the Maven Central Repository. While the requirements for projects in Maven Central are more stringent than just tossing code into your own repository, it’s worth the effort. In Maven Central, your project will have greater visibility and will be easier for anyone in the world to use.

But what if I don’t want that dependency?

Maven dependencies are transitive: if your project depends on project X, which depends on projects Y and Z, then your build will try to download X, Y, and Z.

But sometimes projects declare dependencies that aren’t strictly necessary. Or they declare dependencies on something you want, but the wrong version. How can you avoid including those extra dependencies in your build?

Maven supports dependency exclusions for these cases. For example, suppose the “awesome-sauce” library declares a dependency on “com.example:stupidity:0.0.1”. You know that you don’t need “stupidity” in your project, so you want to prevent the build from including it. In pom.xml, you write:

<dependencies>
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>awesome-sauce</artifactId>
    <version>3.0.0</version>
    <exclusions>
      <exclusion>
        <groupId>com.example</groupId>
        <artifactId>stupidity</artifactId>
      </exclusion>
    </exclusions> 
  </dependency>
</dependencies>

Or in Leiningen’s project.clj, you write:

  :dependencies [[com.example/awesome-sauce "3.0.0"
                  :exclusions [com.example/stupidity]]]

Note that once you start using exclusions, you’re on your own. It’s up to you to make sure you still have the correct versions of all the libraries your project needs.
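To see why an exclusion can remove more than you expect, here is a toy model of transitive resolution. It is a deliberate simplification, not Maven’s algorithm: the function name resolve-deps and the “helper” artifact are invented, exclusions are applied globally instead of per dependency, and version conflicts are ignored.

```clojure
(defn resolve-deps
  "graph maps an artifact name to the artifacts it depends on directly.
  Returns the set of everything reachable from root, skipping excluded
  artifacts and anything reachable only through them. Toy model only."
  [graph root exclusions]
  (loop [pending (vec (get graph root))
         seen    #{}]
    (if-let [dep (peek pending)]
      (if (or (seen dep) (exclusions dep))
        (recur (pop pending) seen)
        (recur (into (pop pending) (get graph dep)) (conj seen dep)))
      seen)))

(def graph
  {"my-new-project" ["awesome-sauce"]
   "awesome-sauce"  ["stupidity" "y"]
   "stupidity"      ["helper"]})   ; "helper" is a made-up transitive dependency

(resolve-deps graph "my-new-project" #{})
;; => #{"awesome-sauce" "stupidity" "y" "helper"}

(resolve-deps graph "my-new-project" #{"stupidity"})
;; => #{"awesome-sauce" "y"}
```

Excluding “stupidity” also drops “helper”, because nothing else pulls it in. If “awesome-sauce” actually needs “helper” at runtime, you now have to declare it yourself.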

On rare occasions, a project’s dependencies cannot be resolved at all. In particular, if you need two different versions of the same library with the same class names but incompatible APIs, you’re pretty much stuck. Time to refactor, or investigate multiple-Classloader schemes like OSGi. But that’s a whole ’nother story.

Edit Quotient

Remember Levenshtein Distance: the number of changes to turn one string into another? Here’s a naïve implementation in Clojure:

(defn levenshtein [s t]
  (let [m (count s)
        n (count t)
        ;; d[i][j] is the distance between the first i chars of s
        ;; and the first j chars of t
        d (make-array Integer/TYPE (inc m) (inc n))]
    ;; Fill the first column and first row, inclusive of m and n
    (dotimes [i (inc m)] (aset-int d i 0 i))
    (dotimes [j (inc n)] (aset-int d 0 j j))
    (doseq [j (range 1 (inc n))]
      (doseq [i (range 1 (inc m))]
        (if (= (.charAt s (dec i)) (.charAt t (dec j)))
          (aset-int d i j (aget d (dec i) (dec j)))
          (aset-int d i j (min (inc (aget d (dec i) j))
                               (inc (aget d i (dec j)))
                               (inc (aget d (dec i) (dec j))))))))
    (aget d m n)))

(assert (= 0 (levenshtein "" "")))
(assert (= 3 (levenshtein "foo" "foobar")))
(assert (= 3 (levenshtein "kitten" "sitting")))
(assert (= 3 (levenshtein "Saturday" "Sunday")))

I ripped that straight off Wikipedia’s pseudocode. Not functional at all, and probably not particularly efficient either.

But I’ve never seen this: the number of changes to turn one string into another, scaled by the lengths of the strings. Call it the “edit quotient,” computed as the Levenshtein Distance divided by the mean of the lengths of the two strings. The edit quotient of two empty strings is zero.

(defn edit-quotient [s t]
  (let [sum (+ (count s) (count t))]
    (if (pos? sum)
      (/ (levenshtein s t)
         (/ sum 2))
      0)))

(assert (= 0 (edit-quotient "" "")))
(assert (= 2 (edit-quotient "foof" "")))
(assert (= 1 (edit-quotient "foo" "bar")))
(assert (= 8/7 (edit-quotient "foof" "bar")))
(assert (= 2/3 (edit-quotient "foo" "faa")))
(assert (= 2/7 (edit-quotient "foof" "foo")))
(assert (= 1/2 (edit-quotient "foobar" "faabir")))

This has some interesting properties. The edit quotient is zero if the two strings are identical, and one if they are the same length and completely different. When the lengths differ it can exceed one, reaching its maximum of two when exactly one of the strings is empty. Values in between give some idea of how different the strings are.
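For comparison, here is one way the same distance might be written without mutable arrays, building each row of the table from the previous one with reduce. The name levenshtein-fn is mine, and this is only a sketch; I have not measured whether it is faster or slower than the array version.

```clojure
(defn levenshtein-fn
  "Levenshtein distance computed row by row with reduce.
  Each row is a vector; no arrays are mutated."
  [s t]
  (let [first-row (vec (range (inc (count t))))
        next-row  (fn [prev [i sc]]
                    ;; prev is the row for the first i chars of s;
                    ;; build the row for the first i+1 chars.
                    (reduce (fn [row [j tc]]
                              (conj row
                                    (min (inc (peek row))          ; insertion
                                         (inc (nth prev (inc j)))  ; deletion
                                         (+ (nth prev j)           ; match or substitution
                                            (if (= sc tc) 0 1)))))
                            [(inc i)]
                            (map-indexed vector t)))]
    (peek (reduce next-row first-row (map-indexed vector s)))))

(levenshtein-fn "kitten" "sitting") ;; => 3
(levenshtein-fn "foof" "")          ;; => 4
```

Only two rows are alive at any time, so memory use grows with the length of t rather than with the product of the two lengths.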