I was lucky enough to see a talk by Barbara Liskov, the grande dame of computer science. The talk was titled “The Power of Abstraction,” and it covered Liskov’s work on programming languages in the 1970s and 1980s, primarily a language called CLU.
Update 1/14/2010: Video of the same talk is available here: OOPSLA Keynote: The Power Of Abstraction
CLU had a number of interesting features that were ahead of its time — heap-based garbage collection, typed exceptions, and iterators. Many of these features made their way into object-oriented languages such as Java. But CLU itself is not object-oriented.
Object-oriented languages, Liskov said, tend to conflate the concrete representation of a type with the interface used to access it. Think of a classic Java class in an introductory OOP text. The class contains both instance fields and methods to manipulate them. Even though the fields are private, the interface is tied to a specific implementation. You can’t substitute a different implementation, not even by subclassing.
CLU provides separate structures for fields and methods. Fields are defined in types, which are more or less like C structs. Methods are defined in clusters, from which the name CLU derives. A cluster is a named set of method implementations, associated with one particular type. Users only work with clusters, not types. A cluster may be replaced by another cluster that implements the same methods.
Why is this interesting now? Because we’re just catching up to where Liskov was in the seventies. Modern Java designs often favor interface-based APIs with no concrete inheritance and no public constructors.
This is even more interesting to me, because my favorite programming language will soon have features very similar to CLU’s types and clusters. The “new” branch of Clojure defines two new abstractions: datatypes and protocols.
A protocol is a set of function signatures, with no implementation. Conceptually, it’s similar to a Java interface. You could use a protocol to define an API to model some real-world object, such as Employee, Department, etc.
A datatype is a set of named fields, with optional type declarations. Conceptually, it’s similar to a C struct. However, a datatype can also declare support for any number of protocols, and supply methods to implement those protocols. For example, Clojure will probably have a Countable protocol with a single method count. Clojure datatypes like Lists and Vectors can provide their own implementations of count. At that level, the datatype is like a concrete class implementing several interfaces.
What’s really cool is that you can extend protocols for existing types, even Java classes. So, for example, we could implement Countable for java.lang.String by writing a count method that calls String.length(). This means you can create new protocols for Java classes that you do not control. This is like interface injection, a proposed but as-yet unimplemented feature for Java.
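As a rough sketch of the ideas above, here is how a protocol, a datatype, and an extension to java.lang.String might look, using the defprotocol, deftype, and extend-protocol forms as they appear in released versions of Clojure (1.2 and later). The Countable protocol and its item-count method follow the hypothetical example in the text; they are not Clojure's real counting interface, and Basket is an invented datatype.

```clojure
;; Hypothetical protocol, named after the example in the text.
(defprotocol Countable
  (item-count [this] "Returns the number of items in this."))

;; A datatype with one field, implementing the protocol inline.
(deftype Basket [items]
  Countable
  (item-count [this] (count items)))

;; Extend the protocol to an existing Java class we do not control.
(extend-protocol Countable
  String
  (item-count [s] (.length s)))

(item-count (Basket. [:apple :pear]))  ; => 2
(item-count "hello")                   ; => 5
```

Both calls go through the same item-count function; which implementation runs depends only on the type of the first argument.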
Protocol method calls are dispatched dynamically based on the type of their first argument, very similar to (and at the same speed as) Java method calls.
Data formats are annoying. As much as half the code in any large software project consists of translating from one data representation — objects, SQL tables, files, XML, RDF, JSON, YAML, CSV, Protocol Buffers, Avro, XML-RPC — to another.
Each format has its own strengths and weaknesses. Often, no single representation is complete enough to be considered “canonical.” The only canonical representation is an abstract one, a platonic ideal in the mind of some developer. Since this platonic ideal cannot be implemented in code, different people have different expectations for how a particular model is supposed to work.
There are two options: Either you re-implement the model, with all its features and constraints, for each format, and hand-code all the translations; or you use a “smart” library that automatically translates between different representations. ActiveRecord and Hibernate are popular examples of the latter.
The problem with “smart” libraries is that they can never be smart enough. At some point you always have to dig into the generated SQL or whatever to make them work efficiently, or even correctly. Frequently this is impossible without hacking the library sources, a daunting tangle of generated and meta-programmed code. The library that was supposed to make your life easier instead makes it hell.
Do these “smart” libraries really save any time? Would it be easier to just write the translation code in the first place? We’ll never know, because programmers can’t resist “smart” systems, the myth that you can “do more with less code.” You can never do more with less, unless what you’re doing is the lowest common denominator of what everyone else is doing. And if that is what you’re doing, then why bother?
I’ve been fascinated with RDF for years, but I always end up frustrated when I try to use it. How do you read/write/manipulate RDF data in code? Sure, there are lots of libraries, but they all represent RDF data as its primitive structures: statements, resources, literals, etc. Working with data through these APIs feels like using a glovebox. To get anything useful done, you have to define mappings between RDF properties/classes and normal data structures in your programming language — classes, maps, lists, whatever. In effect, you have to define everything twice.
Some Java APIs allow one to add annotation properties to classes and methods, with the annotations defining the mapping between Java objects and RDF triples. It’s convenient, and familiar if you’ve used Java persistence frameworks like Hibernate, but you still have to define everything twice: once in your RDF schema, once in Java code.
Other libraries generate Java source code from RDFS or OWL ontologies. This means you don’t have to define everything twice, but adds another step to the write-compile-run cycle, and limits you to the semantics that the code generator can understand. In particular, certain features of RDFS/OWL — multiple inheritance, sub-properties — do not map well into Java.
What I really wanted was a way to create and work with RDF data in Clojure, using the same map/set/sequence APIs that I use for any other Clojure data structure. I flirted with implementing RDF in Clojure but lost interest when I realized that 1) there’s a lot more to implementing RDF than datatype conversions; and 2) my Clojure library suffered from the same glovebox problem as the Java RDF libraries.
The solution, however, was staring me in the face all along. Clojure is a Lisp. I can generate functions directly, without any intermediate “source” representation. I can use my own customized validation and type-checking functions. Furthermore, I can extend the definitions in my RDF schema with new Clojure functions.
Here’s what I ended up with: I designed a simple OWL ontology using Protege 4 and saved it as RDF/XML. Then I used the Sesame 2 library to find all the RDF classes and properties defined in my ontology, and create the appropriate getter, setter, and constructor functions in Clojure. It looks something like this:
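;; Define a zero-argument constructor function for every class in the ontology.
(defn intern-classes []
  (doseq [cls (find-all-classes *ontology*)]
    (let [name (resource-to-symbol cls)]
      ;; The new function returns a map tagged with the class symbol.
      (intern *ns* name (fn [] {:type name})))))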
The resource-to-symbol function creates a symbol named for the local name of the RDF class, with the full URI of its XML namespace in the symbol’s metadata. The call to intern defines a new function that takes no arguments and returns a Clojure map with the symbol as its :type.
Suppose I have a class named Document in my ontology. I now have a Clojure function named Document that creates a new instance of that class, represented as a Clojure map. Furthermore, using Clojure hierarchies and the isa? function, I can generate Clojure code that implements the subclass relationships defined in the ontology. Whee!
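To make the hierarchy idea concrete, here is a minimal sketch using Clojure's built-in make-hierarchy, derive, and isa? functions. The subclass-of map is a hypothetical stand-in for the subclass statements that would really come out of the ontology, and the class names other than Document are invented for illustration.

```clojure
;; Hypothetical stand-in for the subclass statements in an ontology:
;; each key is a class, each value is its direct superclass.
(def subclass-of
  {::Report   ::Document
   ::Memo     ::Document
   ::Document ::Resource})

;; Build a private hierarchy instead of mutating the global one.
(def ontology-hierarchy
  (reduce (fn [h [child parent]] (derive h child parent))
          (make-hierarchy)
          subclass-of))

;; A constructor in the style of the generated functions above.
(defn make-report [] {:type ::Report})

(isa? ontology-hierarchy (:type (make-report)) ::Document) ; => true
(isa? ontology-hierarchy ::Report ::Resource)              ; => true (transitive)
(isa? ontology-hierarchy ::Document ::Report)              ; => false
```

Because the hierarchy is an ordinary value, it can be rebuilt whenever the ontology changes, and it can be passed to defmulti through the :hierarchy option if dispatch on RDF classes is wanted.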
I don’t entirely know where I’m headed with this, but I like the way it’s going. I can define my own data types, decide how they map to Clojure data structures, and have code that’s always up-to-date with my RDF vocabulary.
My Hadoop World NYC talk went off well; here are my slides [PDF]
Hello, everyone.
I’ll be performing my Clojure+Hadoop magic tricks at the following events:
Friday, October 2: Hadoop World NYC. Use the code hadoopworld_friend for 10% off the registration fee.
Monday, October 5: NoSQL NYC Meetup. Free!
At both events I’ll be talking about:
- Why Clojure and Hadoop are a perfect fit.
- How to write Hadoop jobs in Clojure.
- My clojure-hadoop library.
- Storage options for Clojure data structures.
Will post slides after, and recordings if they are available.
I’ve said It’s About the Libraries, and indeed, one of the major selling points of Clojure is that it can call Java libraries directly.
But there’s more to it than that. Libraries are just one benefit to building Clojure on top of Java, or, more accurately, on top of Java the platform.
Look around you, and you’ll see that 99% of all the software in the world runs on just three platforms:
- Unix/C
- Java Virtual Machine
- .NET Common Language Runtime
Where did these platforms come from? Let’s see:
- AT&T
- Sun
- Microsoft
Notice something? All three were developed by huge corporations.
Building a new platform isn’t just about writing the code. In fact, very little of it is about code. You need books, articles, conferences, workshops, and university courses. You need multinational corporations to trust their entire business to your platform. It takes millions of dollars and tens of thousands of hours of labor to create a new platform. Think of the massive ad campaigns Sun ran for Java. Can you do that? Of course not.
So when you’re designing a new language, you have to build on an existing platform. Most of the so-called “scripting” languages grew up on Unix, so they’re written in C. Now, Unix/C is a great platform, still going strong after 40 years. It provides powerful tools and standardized interfaces such as files, sockets, and pipes.
The problem is that each of the “scripting” languages has developed into its own mini-platform. Perl, Python, and Ruby each define their own set of data structures for fundamental types like strings, lists, and maps. The only “types” that Unix recognizes are text and binary. You can’t exchange data between two languages without serializing everything to some agreed-upon format. And you can’t do callbacks between languages below the level of a whole process.
The other problem with languages written in C is, well, C. Pointers are hard. Memory management is hard. I know from bitter experience that Ruby libraries can have segfaults or memory leaks. That just doesn’t happen in Java.
Clojure was created to leverage capabilities of Java-the-platform — garbage collection, dynamic code generation, JIT compilation, threads, locks — some of which are difficult to use effectively in Java-the-language. To implement Clojure in C, for example, you would first have to build your own platform with these features. That’s effectively what most Common Lisp implementations do, and they suffer because the Common Lisp world is too small to sustain its own platform.
The brilliant thing about Java-the-platform is that it allows many languages to coexist. I can mix code written in Java, Clojure, JRuby, Jython, etc. and it’s pretty easy, because they all implement the same fundamental interfaces like java.util.List and java.lang.Runnable. For example, right now I have Hadoop (Java code) calling Clojure code calling JRuby code. It all just works.
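A small illustration of those shared interfaces, as a sketch that should run at any Clojure REPL: a Clojure vector is already a java.util.List, and a Clojure function is already a java.lang.Runnable, so both can be handed to plain Java APIs without conversion. The names v, sorted-copy, and result are made up for the example.

```clojure
(import '(java.util ArrayList Collections))

;; A Clojure vector implements java.util.List (read-only).
(def v [3 1 2])
(instance? java.util.List v)   ; => true
(Collections/max v)            ; => 3

;; Copy it into a mutable Java list and sort that with Java's own sort.
(def sorted-copy (doto (ArrayList. v) (Collections/sort)))
(vec sorted-copy)              ; => [1 2 3]

;; A Clojure function implements java.lang.Runnable,
;; so it can be passed directly to a Java Thread.
(instance? Runnable (fn []))   ; => true
(def result (promise))
(doto (Thread. (fn [] (deliver result :done)))
  (.start)
  (.join))
@result                        ; => :done
```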
(The .NET CLR provides similar capabilities, and there is a Clojure CLR port.)
Posted by: Stuart in Programming, tags: Maven
I hope I’ve demonstrated in the last few posts that Maven is pretty cool, not so scary. But the public Maven repositories sometimes leave a bit to be desired. They don’t have entries for every possible library, and occasionally they have incorrect dependencies or other metadata. Also, the process of adding new libraries to the central repositories is somewhat involved.
Maybe you want to depend on a project that isn’t in the public repos. Or maybe you want to publish development snapshots of your own projects. In either case, you need your own Maven repository.
Fortunately, running your own Maven repository is dirt simple. All you need is a web server where you can upload files. Maven understands FTP, SCP, WebDAV, and more exotic protocols like Subversion. Here I’m going to describe the simplest one, plain old FTP. If you have a web site on a cheap, shared web host, odds are you can use FTP to manage the files on the server.
Step 1: Get Some Web Space
I’m assuming you have a web site somewhere. Let’s say it’s at http://www.example.net/. Furthermore, let’s say you can publish files on this web site by uploading them to the FTP server ftp.example.net. Your FTP user name is samiam with the password greeneggs. You put the files for your web site in the directory /home/samiam/public_html on the server.
Using your favorite FTP client, create a directory named maven2 inside your web site directory (public_html in our example).
Congratulations, you just created a Maven 2 repository!
Check that you can visit http://www.example.net/maven2 in a web browser. You should see a directory listing; it’s empty, because we haven’t added any files yet.
Step 2: Configure Your Project Deployment
Now you’re ready to deploy a project to your Maven repository. If you’ve been following along with my Maven blog posts, you know that each Maven project has a project description file named pom.xml.
Open up your awesome software project and edit the pom.xml file. You’re going to add two new sections, ending up with a file that looks like this:
<project ...>
  ...
  <groupId>net.example</groupId>
  <artifactId>awesome</artifactId>
  <name>The Awesome Library</name>
  <version>1.0-SNAPSHOT</version>
  ...
  <build>
    ...
    <extensions>
      <extension>
        <groupId>org.apache.maven.wagon</groupId>
        <artifactId>wagon-ftp</artifactId>
        <version>1.0-alpha-6</version>
      </extension>
    </extensions>
    ...
  </build>
  ...
  <distributionManagement>
    <repository>
      <id>example-ftp</id>
      <url>ftp://ftp.example.net/home/samiam/public_html/maven2</url>
    </repository>
  </distributionManagement>
</project>
The <extensions> section loads the Maven plugin that handles FTP uploads. The <distributionManagement> section tells Maven where to publish the project. Here we specified a <url> with the full URL path to our maven2 directory. The <id> tag in the <repository> is a name that you choose — just remember it for the next step.
Step 3: Configure Your Server Credentials
Remember that the pom.xml will be part of the public distribution of your project. It’s OK to put the name of the FTP server there, but you wouldn’t want to include private information like your user name and password. Those go in a special Maven configuration file called settings.xml that you keep private.
You can find settings.xml in your personal Maven cache directory. On Unix-like systems, it should be at ~/.m2/settings.xml. You may have to create the file if it doesn’t already exist. Here’s what it should contain:
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <servers>
    <server>
      <id>example-ftp</id>
      <username>samiam</username>
      <password>greeneggs</password>
    </server>
  </servers>
</settings>
The <id> is the same as in our pom.xml. The user name and password are the credentials for your FTP server.
Step 4: Deploy!
Now all you have to do is run this command in your project directory:
mvn deploy
Maven builds your project and uploads it to your public repository. That’s all there is to it!
Take a look with your web browser at http://www.example.net/maven2/net/example/awesome/1.0-SNAPSHOT and you’ll see the JAR and POM files there.
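The URL above follows the standard Maven 2 repository layout: the dots in the groupId become directory separators, followed by the artifactId and the version. As a minimal sketch (the function name artifact-dir is made up, and clojure.string assumes Clojure 1.2 or later), the mapping can be written like this:

```clojure
(require '[clojure.string :as str])

(defn artifact-dir
  "Returns the directory URL where a Maven 2 repository keeps the files
  for one version of an artifact."
  [base-url group-id artifact-id version]
  (str/join "/" [base-url
                 (str/replace group-id "." "/") ; net.example -> net/example
                 artifact-id
                 version]))

(artifact-dir "http://www.example.net/maven2"
              "net.example" "awesome" "1.0-SNAPSHOT")
;; => "http://www.example.net/maven2/net/example/awesome/1.0-SNAPSHOT"
```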
Step 5: Tell the World
Now, anyone who wants to use your awesome library can just add a dependency to their pom.xml, like this:
...
<dependencies>
  ...
  <dependency>
    <groupId>net.example</groupId>
    <artifactId>awesome</artifactId>
    <version>1.0-SNAPSHOT</version>
  </dependency>
  ...
</dependencies>
...
<repositories>
  ...
  <repository>
    <id>example-net</id>
    <name>The Example Repository</name>
    <url>http://www.example.net/maven2</url>
  </repository>
  ...
</repositories>
...
The <repository> section is necessary, since your repository is not on the list of “central” repositories that Maven searches by default.
In general, if your project has a stable release that is widely used, then it’s worth the effort to get it into the central Maven repository. This is easier to do when you already have your own personal repository to point to. Note that the central repository does not accept SNAPSHOT releases, nor does it allow any changes to a release after it has been uploaded.
I promised, in my previous post, that I would show you how to use the latest-and-greatest versions of Clojure and clojure-contrib in your Maven projects. Here’s that post.
Formos Software maintains a Maven server with nightly builds of Clojure and contrib at http://tapestry.formos.com/maven-snapshot-repository/
Here’s a complete pom.xml file with dependencies on both Clojure and clojure-contrib:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>YOUR.GROUP.ID</groupId>
  <artifactId>YOUR-PROJECT-NAME</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>YOUR-PROJECT-NAME</name>
  <dependencies>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>clojure-lang</artifactId>
      <version>1.1.0-alpha-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>clojure-contrib</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>tapestry.formos.com</id>
      <name>Formos Software snapshot repository</name>
      <url>http://tapestry.formos.com/maven-snapshot-repository</url>
    </repository>
  </repositories>
</project>
Yes, that’s a pile of XML. But it’s not that complicated once you break it down. Here’s what’s going on:
Dependencies
The <dependencies> section lists the libraries our project depends on. We have one <dependency> for Clojure (called clojure-lang in the Formos repository) and one for clojure-contrib. We’re depending on SNAPSHOT versions, which tells Maven to follow the most recent version on a particular branch.
The current development branch of clojure-lang is called 1.1.0-alpha-SNAPSHOT. The development branch of contrib, which has never had a formal release, is just 1.0-SNAPSHOT.
How did I find these version numbers? I just looked at the repository in a web browser. In the org/clojure/clojure-lang directory I found directories named for each development branch, 1.0-SNAPSHOT, 1.0.0-RC1-SNAPSHOT, and 1.1.0-alpha-SNAPSHOT. I chose the latest one, 1.1.0-alpha-SNAPSHOT. Then I did the same with clojure-contrib.
If you look inside a branch directory like 1.1.0-alpha-SNAPSHOT, you’ll find hundreds of files, one for each daily snapshot, named with timestamps.
Repositories
The <repositories> section tells Maven where to look for JAR files to download. We added the Formos repository by specifying its URL.
The <id> and <name> tags inside <repository> are purely for our own reference. Maven only cares about the URL. We could have used any id and name to describe the Formos repository; those names will be used in Maven’s console logging.
Managing Dependency Versions
The problem with tracking the latest snapshot is that sometimes there’s a release that breaks your code. It might be a bug, or it might just be a change in behavior that makes the library incompatible with previous versions.
The Versions Maven Plugin can help to alleviate this problem by “locking” dependencies to specific releases and updating them in a controlled way.
First, we have to make the Versions plugin available to our project. Do this by adding the following lines just before the final </project> in your pom.xml:
<pluginRepositories>
  <pluginRepository>
    <id>Codehaus</id>
    <name>Codehaus Maven Plugin Repository</name>
    <url>http://repository.codehaus.org/org/codehaus/mojo</url>
  </pluginRepository>
</pluginRepositories>
We’ve added a “plugin repository,” which is just a Maven repository that happens to contain Maven plugins. (Technically, you don’t need to add this if your local Maven cache already has a copy of the Versions plugin, but putting it in pom.xml ensures that other developers coming to your project have access to all the same plugins.)
Now we can use the following mvn command:
mvn versions:lock-snapshots
This modifies your pom.xml file, setting the version string of every SNAPSHOT dependency to the current snapshot timestamp. For example, when I run this command, I end up with the following:
<dependencies>
  <dependency>
    <groupId>org.clojure</groupId>
    <artifactId>clojure-lang</artifactId>
    <version>1.1.0-alpha-20090904.093041-38</version>
  </dependency>
  <dependency>
    <groupId>org.clojure</groupId>
    <artifactId>clojure-contrib</artifactId>
    <version>1.0-20090904.093531-59</version>
  </dependency>
</dependencies>
Now, whenever you build the project, you know exactly which Clojure release you’re getting.
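To see how a locked version string relates to its SNAPSHOT form, here is a small sketch that splits one apart. The pattern (base version, then a date.time stamp, then a build number) is inferred from the two version strings shown above rather than taken from Maven's documentation, and the function names are made up for the example.

```clojure
(defn parse-locked-version
  "Splits a timestamped snapshot version into its parts,
  or returns nil if the string does not look like one."
  [version]
  (when-let [[_ base timestamp build]
             (re-matches #"(.+)-(\d{8}\.\d{6})-(\d+)" version)]
    {:base base
     :timestamp timestamp
     :build (Long/parseLong build)}))

(defn unlocked-version
  "Returns the SNAPSHOT version that a locked version was derived from;
  any other version string is returned unchanged."
  [version]
  (if-let [parsed (parse-locked-version version)]
    (str (:base parsed) "-SNAPSHOT")
    version))

(parse-locked-version "1.1.0-alpha-20090904.093041-38")
;; => {:base "1.1.0-alpha", :timestamp "20090904.093041", :build 38}
(unlocked-version "1.0-20090904.093531-59")
;; => "1.0-SNAPSHOT"
```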
So you work with a particular release for a while, then you want to upgrade to the latest one. No problem! Just run:
mvn versions:unlock-snapshots
mvn -U install
The first command modifies pom.xml, replacing all the timestamped version numbers with SNAPSHOT versions.
The -U option on the second command forces Maven to check for updated versions of all the snapshot dependencies.
Note: Both lock-snapshots and unlock-snapshots create a backup file called pom.xml.versionsBackup. To remove this file (and accept the Versions Plugin’s changes to your pom.xml) run:
mvn versions:commit
Likewise, to go back to the pom.xml file you had before the Versions Plugin messed with it, run:
mvn versions:revert
Explore
There’s a lot more to the Versions plugin, and to Maven dependency management in general. Check the documentation for details.
Of course, the example here can be combined with the clojure-maven-plugin demonstrated in my previous post. The syntax of the combined pom.xml file is left as an exercise for the reader.
One other thing: if you really want the absolute latest, up-to-the-second, still-hot-from-the-github version of something, there’s nothing for it but to use Git submodules. I’ll demonstrate that in another post.
Update Sept. 4: How to get the latest builds of Clojure & Contrib
Maven is a touchy subject. People tend to have strong opinions about it. But like it or not, it’s the de-facto standard for dependency management in the Java world. Clojure lives in the Java world, so that means we have to live with Maven.
Here are some good things about Maven:
- “Convention over configuration.”
- Plugins are downloaded & installed automatically.
- Handles dependencies of dependencies.
- Declarative configuration, not imperative like Ant.
- Only stores one copy of each JAR, shared by all projects.
Here are some bad things about Maven:
- XML configuration file.
- Verbose command line options.
- Doesn’t track latest source code of projects. (Correction: it does; see comments. Thanks, Tim!)
- First run takes forever to download all the plugins.
- Verbose console output.
In my estimation, the good outweigh the bad. And nothing outweighs the huge fact that Maven is already there.
So let’s develop a Clojure app using Maven.
Step 1: Install Maven.
If you don’t already have it, that is. This is pretty easy, just visit maven.apache.org and follow the instructions.
Step 2: Create a new project.
Type the following at the command line:
mvn archetype:generate
Maven will ask a series of questions:
- archetype: At the “Choose a number” prompt, press enter to accept the default project type, maven-archetype-quickstart.
- groupId: Enter a name to identify yourself in the global Maven namespace. All your Maven projects will use the same groupId. This is typically a reverse domain name in the style of Java package names. For example, I could use the groupId com.stuartsierra
- artifactId: Enter a name to identify this specific project in the Maven repository. For example, my-great-clojure-library
- version: Press enter to accept the default, 1.0-SNAPSHOT
- package: Press enter to accept the default, which is the same as your groupId.
- Confirmation: Press enter.
Now you have a directory named my-great-clojure-library containing the skeleton of a new Java project.
Step 3: Configure your pom.xml.
Go into your new project directory and edit the pom.xml file. The basic information will already be filled out.
We can remove the dependency on JUnit, so delete these lines:
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>3.8.1</version>
  <scope>test</scope>
</dependency>
Then we need to add the Clojure Maven plugin, which tells Maven how to compile Clojure source code. Before the final </project> tag, add these lines:
<build>
  <plugins>
    <plugin>
      <groupId>com.theoryinpractise</groupId>
      <artifactId>clojure-maven-plugin</artifactId>
      <version>1.0</version>
      <executions>
        <execution>
          <id>compile-clojure</id>
          <phase>compile</phase>
          <goals>
            <goal>compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
We also need to add Clojure itself as a dependency of our project (the Clojure Maven plugin does not do this automatically). Inside the <dependencies> tag, add the following lines:
<dependency>
  <groupId>org.clojure</groupId>
  <artifactId>clojure</artifactId>
  <version>1.0.0</version>
</dependency>
That’s if you want the official Clojure 1.0.0 release. If you want a cutting-edge version, I’ll explain how in a later post.
Step 4: Delete Java sources.
Your project directory comes pre-equipped with two Java source directories at src/main/java and src/test/java. You can delete both of them, unless, of course, you’re developing a mixed Clojure-Java project.
Step 5: Add dependencies.
If your project does anything interesting, chances are it’s going to depend on some external Java libraries. You can find libraries in the public Maven repositories at mvnrepository.com. Search for a library name, and it will show you the code to put in your pom.xml file.
For example, say we want to use the Apache Commons IO library. At mvnrepository.com, we find the dependency code for this library:
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>1.4</version>
</dependency>
And we can add that inside the <dependencies> section of pom.xml.
Step 6: Start coding!
Create the directory src/main/clojure. This is where all your .clj source files will go.
Follow the standard Clojure/Java convention for file names. That is, if you have a Clojure namespace called my.great.library, it should be in a file named src/main/clojure/my/great/library.clj
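The naming convention can be sketched as a small function: dots in the namespace become directory separators, and hyphens become underscores, because the file name has to be a legal Java package path. The function name ns->path is made up for this example, and clojure.string assumes Clojure 1.2 or later.

```clojure
(require '[clojure.string :as str])

(defn ns->path
  "Returns the source file path, relative to the project root,
  for a Clojure namespace in a Maven project."
  [ns-name]
  (str "src/main/clojure/"
       (-> (str ns-name)
           (str/replace "-" "_")   ; hyphens become underscores on disk
           (str/replace "." "/"))  ; dots become directories
       ".clj"))

(ns->path 'my.great.library)
;; => "src/main/clojure/my/great/library.clj"
(ns->path 'my.great-library.core)
;; => "src/main/clojure/my/great_library/core.clj"
```

The file itself would then begin with a matching declaration such as (ns my.great.library).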
Step 7: Compile and install.
Run the following command:
mvn install
That will compile all your .clj source files into Java .class files, package them into a JAR, and install that JAR in your local Maven cache. On Unix-like systems, the cache should be at ~/.m2/repository/
Step 8: Live and learn.
There’s a whole lot more to learn about Maven. It’s a very flexible tool, and it can do almost anything. Yes, you will have to write some XML, but it’s really not that much.
Things I hope to cover in future posts:
- Using git submodules to track development versions of Clojure libraries.
- Running tests written in Clojure.
- Including .clj source files in your JAR.
- Creating a stand-alone JAR including all dependencies.
- Setting up a private Maven repository.
I hope this was a reasonable introduction to developing with Maven and Clojure, and that I have shown that Maven isn’t nearly as scary as people make it out to be. I think Maven suffered for a long time from poor documentation, but that’s changing rapidly. I found the (free) book Maven: The Definitive Guide extremely helpful.
Appendix: Complete pom.xml
Here’s the complete pom.xml file for the project I developed in this post:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.stuartsierra</groupId>
  <artifactId>my-great-clojure-library</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>my-great-clojure-library</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>org.clojure</groupId>
      <artifactId>clojure</artifactId>
      <version>1.0.0</version>
    </dependency>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>1.4</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>com.theoryinpractise</groupId>
        <artifactId>clojure-maven-plugin</artifactId>
        <version>1.0</version>
        <executions>
          <execution>
            <id>compile-clojure</id>
            <phase>compile</phase>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
Update Sept. 3: Maven’s Not So Bad.
A lot of Ruby types come to Clojure and ask, “Where’s the package manager?” The answer is usually, “Maven or Ivy,” which isn’t really an answer.
I discussed this in the latter half of my Philly Lambda talk (PDF slides). The problem is that Clojure is built on Java, and any Clojure library that does something interesting is going to need some Java libraries beyond what the JDK provides.
Java has only one established dependency management system, Maven. (Ivy is an alternative, but it uses the Maven repositories.) Maven works, but it’s a big, complicated beast, built in the best giant-XML-configuration-file Java tradition. It’s also slow to accept new libraries into the public repositories. The central Maven 2 repository contains fewer than 700 libraries. Rubyforge, by contrast, lists over 8,000.
Maven seems to work well for large organizations that can benefit from setting up their own, private repositories, but it’s kind of a headache for the independent developer.
There’s a Clojure Maven plugin, some shell-based hacks like Corkscrew, and some Ivy-related code floating around, but none really provides what people want: one simple command to download and install all the dependencies for a project, without needing any XML.
What everyone wants, of course, is CPAN. Thousands of documented, tested modules for just about any task you could imagine, and quite a few you couldn’t (e.g., Acme::Buffy).
But CPAN was not created in a day. Most of its imitators (Rubygems, PEAR, Python Eggs) have failed to reach the same level of quality. Perl is also much older, and therefore more stable, than Python or Ruby. 10-year-old Perl code probably still works.
Part of CPAN’s success, I think, has to do with the environment in which it evolved. When Perl was the hot new language, running a web server was an expensive proposition. Even domain names weren’t cheap. If you were going to publish code on the web, there was a cost to doing so, either in time or money, so you wanted to make sure that it was worth publishing.
These days, when everyone has a blog and a Github account, sharing code is easy. Doing “git push” requires almost no thought, no investment of time. Why not release everything, even when it’s untested, undocumented, or unfinished?
So this weekend I started working on a package repository for Clojure. I modeled it after CPAN, but designed it to support anything that could be packaged in a JAR file, including compiled Java libraries and Clojure source code.
I got started. Then I thought, who would actually use this? Of the few dozen Clojure libraries that have been published on Github, only a handful are “production-ready.” Most aren’t even finished. Very few have been thoroughly tested. (I’m equally guilty in this regard.)
I concluded that it’s just too early. Clojure is scarcely two years old. It just released “1.0” this year, and is still developing rapidly. The libraries are evolving equally rapidly. If you want to build a project using, say, Compojure, the best way to do it is with Git submodules.
The one place a package manager would really be useful is in downloading and installing the standard Java packages that get used in almost every project, like the Apache Commons libraries. For this, Maven/Ivy works, if not brilliantly.
Update: another Maven helper: Clojure-POM