End of the Free Lunch

I’m part of that awkward, in-between cohort, a little too young to fit in with Gen Xers but — although we grew up with computers like our younger siblings, the much-loathed Millennials — still old enough to recall life before the Internet. The Oregon Trail Generation still remembers, dimly, the screech of a dial-up modem on a phone line that went into the wall.

When I was a kid, my family subscribed to three daily newspapers. My favorite was the Washington Post, because it had the most comic strips.

We didn’t buy newspapers because we wanted to read the news — that came free from radio or television. Buying newspapers was just what you did if you wanted to know what was going on: movie showtimes, restaurant hours, job listings, or who was selling a boat trailer.

Sure, there were other ways to get that information. With the Yellow Pages (remember those?) and some patience (remember that?) you could find just about anything. But it would take hours to gather up all the information conveniently dropped on your doorstep every morning. So we bought newspapers. And because we bought newspapers, so did advertisers.

Nobody minded if a little journalism happened along the way. It gave the whole enterprise a kind of public-spirited sheen, as if you were fulfilling a civic duty by skimming the “World” headlines before turning with a sigh to “Arts & Leisure” (in my case, for the comics).

As an Oregon Trail adult, I have never subscribed to a newspaper. I don’t recall having bought a newspaper since about 2003. In that time, newspaper revenues dropped by half while Google grew 1,452%. You, as my generation used to say, do the math.

Whenever I read another article (on the web) bemoaning the decline of newspapers, the end of journalism, and the collapse of civilization — these are, of course, written by journalists — I wonder what business they thought newspapers were in. Nobody ever paid for news. News is a commodity. Being the sole gatekeeper for information, now, that’s a good business to be in.

Today Google is the gatekeeper. So we go to Google, so go the advertisers, so no more subsidized journalism. Instead, we get subsidized email, calendars, word processors, spreadsheets, maps, directions, translations, web browsers, operating systems, and self-driving cars. All we had to give up was a little privacy.

And it’s not as if journalism is going to disappear completely. It will just have to get used to surviving like all the other unprofitable civic institutions: on the philanthropic whims of rich people.

Apathy of the Commons

Eight years ago, I filed a bug on an open-source project.

HADOOP-3733 appeared to be a minor problem with special characters in URLs. I hadn’t bothered to examine the source code, but I assumed it would be an easy fix. Who knows, maybe it would even give some eager young programmer the opportunity to make their first contribution to open-source.

I moved on; I wasn’t using Hadoop day-to-day anymore. About once a year, though, I got a reminder email from JIRA when someone else stumbled across the bug and chimed in. Three patches were submitted, with a brief discussion around each, but the bug remained unresolved. A clumsy workaround was suggested.

Linus’s Law decrees that, given enough eyeballs, all bugs are shallow. But there’s a corollary: given enough hands, all bugs are trivial. Which is not the same as easy.

The bug I reported clearly affected other people: It accumulated nine votes, making it the fourth-most-voted-on Hadoop ticket. And it seemed like something easy to fix: just a simple character-escaping problem, a missed edge case. A beginning Java programmer should be able to fix it, right?

Perhaps that’s why no one wanted to fix it. HADOOP-3733 is not going to give anyone the opportunity to flex their algorithmic muscles or show off to their peers. It’s exactly the kind of tedious, persistent bug that programmers hate. It’s boring. And hey, there’s an easy workaround. Somebody else will fix it, right?

Eventually it was fixed. The final patch touched 12 files and added 724 lines: clearly non-trivial work requiring knowledge of Hadoop internals, a “deep” bug rather than a shallow one.

One day later, someone reported a second bug for the same issue with a different special character.

If there’s a lesson to draw from this, it’s that programming is not just hard; it’s often slow, tedious, and boring. It’s work. When programmers express a desire to contribute to open-source software, they imagine grand designs, flashy new tools, and cheering crowds at conferences.

A reward system based on ego satisfaction and reputation optimizes for interesting, novel work. Everyone wants to be the master architect of the groundbreaking new framework in the hip new language. No one wants to dig through dozens of Java files for a years-old parsing bug.

But sometimes that’s the work that needs to be done.

* * *

Edit 2016-07-19: The author of the final patch, Steve Loughran, wrote up his analysis of the problem and its solution: Gardening the Commons. He deserves a lot of credit for being willing to take the (considerable) time needed to dig into the details of such an old bug and then work out a solution that addresses the root cause.

Fixtures as Caches

I am responsible — for better or for worse — for the library which eventually became clojure.test. It has remained largely the same since it was first added to the language distribution back in the pre-1.0 days. While there are many things about clojure.test which I would do differently now — dynamic binding, var metadata, side effects — it has held up remarkably well.

I consider fixtures to be one of the less-well-thought-out features of clojure.test. A clojure.test fixture is a function which wraps a test function, typically for the purpose of setting up and tearing down the environment in which the test should run. Because test functions do not take arguments, the only way for a fixture to pass state to the test function is through dynamic binding. A typical fixture looks like this:

 (ns fixtures-example
   (:require [clojure.test :as test :refer [deftest is]]))
 
 (def ^:dynamic *fix*)
 
 (defn my-fixture [test-fn]
   (println "Set up *fix*")
   (binding [*fix* 42]
     (test-fn))
   (println "Tear down *fix*"))
 
 (test/use-fixtures :each my-fixture)
 
 (deftest t1
   (println "Do test t1")
   (is (= *fix* 42)))
 
 (deftest t2
   (println "Do test t2")
   (is (= *fix* (* 7 6))))

There are two kinds of fixtures in clojure.test:

:each fixtures run once per test, for every test in the namespace.

:once fixtures run once per namespace, wrapped around all tests in that namespace.

I think the design of fixtures has a lot of problems. Firstly, attaching them to namespaces was a bad idea, since namespaces typically contain many different tests, only some of which actually need the fixture. This increases the likelihood of unintended coupling between fixtures and test code.

Secondly, :each fixtures are redundant. If you need to wrap every test in some piece of shared code, all you need to do is put the shared code in a function or macro and call it in the body of each test function. There’s a small amount of duplication, but you gain flexibility to add tests which do not use the same shared code.
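For example, in place of an :each fixture, the shared code can live in an ordinary helper function (a hypothetical with-setup here, sketched in a namespace that refers deftest and is), called explicitly by just the tests that need it:

 (defn with-setup
   "Runs body-fn after performing shared setup. Just a
   function; each test that needs it calls it explicitly."
   [body-fn]
   (println "Shared setup")
   (body-fn))
 
 (deftest t3
   (with-setup
     (fn []
       (is (= 4 (+ 2 2))))))
 
 (deftest t4  ; opts out of the shared code entirely
   (is (= 1 1)))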

(Another common complaint about fixtures is that they make it difficult to execute single tests in isolation, although the addition of test-vars in Clojure 1.6 ameliorated that problem.)

So :once fixtures are the only ones that matter. But if you want true isolation between your tests then they should not share any state at all. The only reason for sharing fixtures across tests is when the fixture does something expensive or time-consuming. Here again, namespaces are often the wrong level of granularity. If some resource is expensive to prepare, you may only want to pay the cost of preparing it once for all tests in your project, not once per namespace.

So the purpose of :once fixtures is to cache their initialized state in between tests. What if we were to use fixtures only for caching? It might look something like this:

 (ns caching-example
   (:require [clojure.test :refer [deftest is]]))
 
 (def ^:dynamic ^:private *fix* nil)
 
 (defn new-fix
   "Computes a new 'fix' value for tests."
   []
   (println "Computing fixed value")
   42)
 
 (defn fix
   "Returns the current 'fix' value for
   tests, creating one if needed."
   []
   (or *fix* (new-fix)))
 
 (defn fix-fixture
   "A fixture function to provide a reusable
   'fix' value for all tests in a namespace."
   [test-fn]
   (binding [*fix* (new-fix)]
     (test-fn)))
 
 (clojure.test/use-fixtures :once fix-fixture)
 
 (deftest t1
   (is (= (fix) 42)))
 
 (deftest t2
   (is (= (fix) (* 7 6))))

This still avoids repeated computation of the fix value, but clearly shows exactly which tests use it. The :once fixture is just an optimization: You could remove it and the tests would still work, perhaps more slowly. Best of all, you can run the individual test functions in the REPL without any additional setup.
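For example, in a REPL with the caching-example namespace loaded, each deftest var can be invoked as a zero-argument function:

 (comment
   (t1)  ; prints "Computing fixed value", then passes
   (t2)) ; computes a fresh value, also passes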

The same idea works even if the fixture requires tear-down after tests are finished:

 (ns resource-example
   (:require [clojure.test :refer [deftest is]]))
 
 (defn acquire-resource []
   (println "Acquiring resource")
   :the-resource)
 
 (defn release-resource [resource]
   (println "Releasing resource"))
 
 (def ^:dynamic ^:private *resource* nil)
 
 (defmacro with-resource
   "Acquires resource and binds it locally to
   symbol while executing body. Ensures resource
   is released after body completes. If called in
   a dynamic context in which *resource* is
   already bound, reuses the existing resource and
   does not release it."
   [symbol & body]
   `(let [~symbol (or *resource*
                      (acquire-resource))]
      (try ~@body
           (finally
             (when-not *resource*
               (release-resource ~symbol))))))
 
 (defn resource-fixture
   "Fixture function to acquire a resource for all
   tests in a namespace."
   [test-fn]
   (with-resource r
     (binding [*resource* r]
       (test-fn))))
 
 (clojure.test/use-fixtures :once resource-fixture)
 
 (deftest t1
   (with-resource r
     (is (keyword? r))))
 
 (deftest t2
   (with-resource r
     (is (= "the-resource" (name r)))))
 
 (deftest t3
   (with-resource r
     (is (nil? (namespace r)))))

Again, each of these tests can be run individually at the REPL with no extra ceremony. If you don’t want to keep paying the resource-setup cost in the REPL, you can temporarily redefine the *resource* var to hold an already-initialized resource.
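A minimal sketch of that workaround, using the resource-example namespace above:

 (comment
   ;; Give *resource* an initialized root value so that
   ;; with-resource reuses it instead of re-acquiring:
   (alter-var-root #'*resource*
                   (constantly (acquire-resource)))
   (t1)  ; reuses the resource, does not release it
   ;; When finished, release it and restore the nil root:
   (release-resource *resource*)
   (alter-var-root #'*resource* (constantly nil)))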

The key in both cases is that the “fixtures” are designed to nest without duplicating effort. Each test function specifies exactly what state or resources it needs, but only creates them if they do not already exist. Some of those resources may be shared among multiple tests, but that fact is hidden from the individual tests.

With this in mind, it becomes possible to share a resource across all tests in a project, not just within a namespace. All you need is an “entry point” which kicks off all the tests. clojure.test provides run-tests for specifying individual namespaces and run-all-tests to search for namespaces by regex. All you have to do is make sure your test namespaces are loaded, either via direct require or a utility such as tools.namespace. Then you can run a full test suite that only executes the expensive setup/teardown code once:

 (ns main-test
   (:require [clojure.test :as test]
             [my.app.a-test]))
 
 (defn -main [& _]
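   ;; with-resource-1 and with-resource-2 stand in for
   ;; resource-binding wrappers like with-resource above.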
   (with-resource-1
     (with-resource-2
       ;;; ... more fixture wrappers ...
       (test/run-all-tests #"^my\.app\..+-test$"))))

Open-source Bundling

Cast your mind back to the halcyon days of the late ’90s. Windows 95/98. Internet Explorer 4. Before you laugh, consider that IE4 included some pretty cutting-edge technology for the time: Dynamic HTML, TLS 1.0, single sign-on, streaming media, and “Channels” before RSS. IE4 even pioneered — unsuccessfully — the idea of “web browser as operating system” a decade before Google Apps.

But if you remember anything about IE in the ’90s, it’s probably the word bundling. United States v. Microsoft centered on the tight integration of IE with Windows. If you had Windows, you had to have IE. By the time the lawsuit reached a settlement, IE was entrenched as the dominant browser.

Fast forward to the present. What an enlightened age we live in. Open-source has won and the browser market has fragmented. Firefox broke the IE hegemony, and Chrome killed it. The web browser really is an operating system.

But if you look around at software today, “bundling” is still with us, even in open-source software, that champion of choice and touchstone of tinkering.

To take an example (and to get the taste of IE out of your brain) let’s look at Hystrix, a Java fault-tolerance framework written at Netflix. First let me say that Hystrix is a fantastic piece of engineering. Netflix has given a great gift to the open-source community by releasing, for free, an essential part of their software infrastructure. I’ve learned a lot by studying the Hystrix documentation and source code.

But if you want to use Hystrix in your application, you have to use RxJava and Netflix’s Archaius configuration management framework. Via transitive dependencies, you also have to use Google’s Guava, the Jackson JSON processor, SLF4J, and Apache’s Commons Configuration, Commons Lang, and Commons Logging. For those of you keeping score at home, that’s two different logging APIs, two configuration APIs, and two grab-bag “utility” libraries.

There’s nothing wrong with these library choices. They may be suitable for your application or they may not. But either way, you don’t get a choice. If you want Hystrix, you have to have RxJava and all the rest. Even if you choose to ignore, say, Archaius, it’s still there, linked into your application code, with whatever bugs and security holes it might carry.

I don’t mean to pick on Netflix here either. As I said, Hystrix is a fantastic piece of engineering, and I’m very happy that Netflix released it. But it points to a mismatch between the goals of “internal-use” software and “open-source” software.

If you’re developing a tool or library for internal use within an organization, it makes sense to integrate closely with other software internal to that organization. It saves time, reduces development effort, and makes the software organization more efficient. When software is tightly integrated, each new tool or library multiplies the value of all the other software which came before it. That’s how technology companies like Netflix or Google can deliver consistently high-quality products and rapid innovation at scale.

The downside to this approach, from the open-source point of view, is that each new tool or library released by a software organization tends to be tightly coupled to the software which preceded it. More dependencies mean more opportunities for bugs, security holes, and misconfiguration. For the application developer using open-source libraries, each new dependency multiplies the cost of development and maintenance.

It’s not just corporate-sponsored open source that suffers from this problem — just look at the dependency tree of any Apache project.

The root problem is that great, hairy Minotaur which stalks the labyrinthine passages of any large code base: cross-cutting concerns. Almost any piece of code in an application will need, at some point, to deal with at least some of:

  • Logging
  • Configuration
  • Error handling & recovery
  • Process/thread management
  • Resource management
  • Startup/shutdown
  • Network communication
  • Filesystems
  • Data persistence
  • (De)serialization
  • Caching
  • Internationalization/translation
  • Build/provisioning/deployment

It’s much easier to write code if you know how each of these cross-cutting concerns will be handled. So when you’re developing something in-house, obviously you use the tools and libraries your organization has standardized on. Even if you’re writing something which you plan to make open-source, it’s easier to rely on the tools and patterns you already know.

It’s difficult to avoid coupling library code to one or more of these concerns. Take logging, for example. Java has had a built-in logging framework since 1.4. But many developers preferred Log4j or one of a handful of others. To avoid coupling libraries to a single logging framework, there is Apache Commons Logging, which tries to abstract over different logging frameworks with clever class-loading tricks. That turned out to be a brittle solution, so we got SLF4J, which puts responsibility for linking the correct logging APIs back in the hands of the application developer.

But no one wants to take an entire day to slog through the SLF4J manual in the middle of building an application. Throw in the mysterious interactions of transitive dependencies in Maven-style build tools, and it’s no wonder every Java app starts up with an error message about logging.

And logging is the easy case — most programmers could probably agree on what, broadly speaking, a logging framework needs to do. But still we have half a dozen widely-used, slightly-different logging APIs.

Developing a library which avoids making decisions about cross-cutting concerns is possible, but it takes painstaking attention to detail, with lots of extra extension points. (See Chris Houser’s talk on Exception Handling for an example.) Unfortunately, the resulting library is often less-than-satisfying to potential users because it has so many “holes” that need to be filled in. Who wants to spend half a day writing “glue” code and callbacks before you can even try out a new library? Busy application developers have an incentive to choose libraries that work “out of the box,” so library creators have an incentive to make arbitrary decisions about cross-cutting concerns. We justify this with the oxymoron “sensible defaults.”

The conclusion I draw from all this is that modern programming languages have succeeded at making software out of reusable parts, but have largely failed at making software out of interchangeable parts. You cannot just “swap in,” say, a different thread-management library. Hystrix itself exists to solve a problem with libraries and cross-cutting concerns in a services architecture. Quoting from the Hystrix docs:

Applications in complex distributed architectures have dozens of dependencies, each of which will inevitably fail at some point. If the host application is not isolated from these external failures, it risks being taken down with them.

These issues are exacerbated when network access is performed through a third-party client — a “black box” where implementation details are hidden and can change at any time, and network or resource configurations are different for each client library and often difficult to monitor and change.

Even worse are transitive dependencies that perform potentially expensive or fault-prone network calls without being explicitly invoked by the application.

Netflix has so many “API client” libraries, each making their own network calls with unpredictable behavior, that to make their systems robust they have to isolate each library in its own thread pool. Again, this is amazing engineering, but it was necessary precisely because too many libraries came bundled with their own networking, error handling, and resource management decisions.

A robust solution would seem to require everyone to agree on standards for every possible cross-cutting concern. That will obviously never happen. Even a so-called batteries-included language cannot keep the same batteries forever. This is a hard problem, and like all truly hard problems in software, it’s more about people than code.

I wish I had a perfect solution, but the best I can offer is some guidance. If you’re writing an open-source library, do everything in your power to avoid dependencies. Use only the features of the core language, and use those conservatively. Don’t pull in a library that deals with some cross-cutting concern just because it might be more convenient for your users. Build your API around plain functions and standard data structures.

Some examples, specific to Clojure:

  • Don’t depend on a logging framework unless it’s SLF4J.

  • Don’t use an error-handling framework: Throw ex-info with enough data for a handler to decide what to do. (A sketch follows this list.)

  • If you need to do something asynchronous, use callbacks instead of core.async. Callbacks are easily integrated with core.async if that’s what the user wants to do. Likewise, if you need some kind of inversion of control, use function callbacks or protocols.

  • Don’t depend on any state-management framework or “ambient” state. Pass everything needed by an API function in its arguments. Provide operations for resource initialization and termination as part of your API. Same for configuration: pass a Clojure map as an argument.

  • Network communication and serialization: these are, admittedly, almost impossible to avoid if you’re writing a library for some network API. But you can at least give users the option of controlling their own networking by providing APIs to prepare requests and parse responses independently of making the actual network calls.
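To make a few of these concrete, here is a minimal sketch of a library entry point written this way. Every name in it (fetch-widget, the config keys, the callback contract) is hypothetical:

(defn fetch-widget
  "Requests the widget named by id. Configuration comes in
  as a plain map from the caller; the result is delivered
  to callback as a plain map. No global state, no logging
  framework, no async framework."
  [{:keys [endpoint] :as config} id callback]
  (when-not endpoint
    ;; ex-info carries data for the caller's handler to
    ;; inspect; the library imposes no error framework.
    (throw (ex-info "Missing :endpoint in config"
                    {:config config :id id})))
  ;; Stand-in for real I/O. A caller using core.async can
  ;; pass #(clojure.core.async/put! ch %) as the callback.
  (callback {:id id :endpoint endpoint :status :ok}))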

On the other hand, some “libraries” really are more like “embeddable services,” with their own internal state. Large frameworks like Hystrix fall into this category, as do a few sophisticated “client” libraries. These libraries might be expected to manage their own resources and state “under the hood.” That’s a reasonable design choice, but at least be clear about which goal you’re pursuing and what trade-offs you’re making. In most language runtimes, the behavior and dependencies of these libraries cannot be fully isolated from the rest of the code. As an application developer, I might be willing to invest time and effort arranging my code to accommodate one or two embedded services that offer significant power in exchange for the added complexity. For everything else, when I need a library, just give me some ordinary functions.

How to Name Clojure Functions

This is a guide on naming Clojure functions. There are exceptions to every rule. When you’re defining something based on natural language, there are more exceptions than rules. I break these rules more often than I follow them. This guide is just a starting point for thinking about how to name things.

Pure functions

Pure functions which return values are named with nouns describing the value they return.

If I have a function to compute a user’s age based on their birthdate, it is called age, not calculate-age or get-age.

Think of the definition: a pure function is one which can be replaced with its value without affecting the result. So why not make that evident in the name?

This is particularly good for constructors and accessors. No need to clutter up your function names with meaningless prefixes like get- and make-.
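For example, here is a minimal sketch using java.time. Passing the as-of date as an argument keeps the function pure:

(import '(java.time LocalDate Period))

(defn age
  "Returns the whole-year age of a person born on
  birthdate, as of the given date. Both arguments are
  java.time.LocalDate values."
  [birthdate as-of]
  (.getYears (Period/between birthdate as-of)))

;; (age (LocalDate/of 1980 5 1) (LocalDate/of 2016 5 1))
;; => 36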

Don’t repeat the name of the namespace

Function names should not repeat the name of the namespace.

(ns products)

;; Bad, redundant:
(defn product-price [product]
  ;; ...
  )

;; Good:
(defn price [product]
  ;; ...
  )

Assume that consumers of a function will use it with a namespace alias.
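Seen from a hypothetical call site, the alias already supplies the context:

(ns store
  (:require [products :as product]))

;; product/price reads naturally; a product/product-price
;; would say "product" twice.
(defn line-item [p]
  {:price (product/price p)})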

Conversions and coercions

I don’t much like -> arrows in function names, and I try to avoid them.

If the function is a coercion, that is, it is meant to convert any of several input types into the desired output type, then name it for the output type. For example, in clojure.java.io the functions file, reader, and writer are all coercions.

If there are different functions for different input types, then each one is a conversion. In that case, use input-type->output-type names.
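Here is a sketch of the distinction, with invented date functions:

(import '(java.time Instant LocalDate ZoneId))

;; A coercion: accepts several input types, so it is named
;; for its output type, like clojure.java.io/file.
(defn local-date
  "Coerces x (a LocalDate, Instant, or ISO-8601 string)
  to a java.time.LocalDate."
  [x]
  (cond
    (instance? LocalDate x) x
    (instance? Instant x) (-> ^Instant x
                              (.atZone (ZoneId/systemDefault))
                              (.toLocalDate))
    (string? x) (LocalDate/parse x)
    :else (throw (ex-info "Cannot coerce to LocalDate"
                          {:value x}))))

;; A conversion: one specific input type, so it gets an
;; input-type->output-type name.
(defn epoch-millis->local-date
  [millis]
  (local-date (Instant/ofEpochMilli millis)))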

Functions with side effects

Functions which have side-effects are named with verbs describing what they do.

Constructor functions with side-effects, such as adding a record to a database, have names starting with create-. (I borrowed this idea from Stuart Halloway’s Datomic tutorials.)

Functions which perform side-effects to retrieve some information (e.g. query a web service) have names starting with get-.

For words which could be either nouns or verbs, assume noun by default then add words to make verb phrases. E.g. message constructs a new object representing a message, send-message transmits it.
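A quick sketch of these conventions, with hypothetical functions and an atom standing in for a real database:

;; Noun: a pure constructor, no make- prefix needed.
(defn message
  [sender text]
  {:sender sender :text text})

;; Verb phrase: transmits the message, a side effect.
;; (println stands in for real I/O.)
(defn send-message
  [msg]
  (println "Sending:" (pr-str msg)))

;; create-: a side-effecting constructor, like inserting
;; a record into a database.
(defn create-message
  [db sender text]
  (let [msg (message sender text)]
    (swap! db conj msg)
    msg))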

I don’t use the exclamation-mark convention (e.g. swap!) much. Different people use it to mean different things (side effect, state change, transaction-unsafe) so the meaning is vague at best. If I do use an exclamation mark, it’s to signal a change to a mutable reference, not other side-effects such as I/O.

Local name clashes

One problem I find is in let blocks, when the obvious name for a local is the same as the function which computes it. If you’re not careful, this can lead to clashes:

(defn shipping-label
  "Returns a new label to ship product to customer."
  [customer product]
  (let [address (address customer)
        weight (weight product)
        supplier (supplier product)]
    {:from (address supplier)  ; oops, 'address' clashes!
     :to address
     :weight weight}))

This is less of a problem when the functions are defined in a different namespace and referenced via an alias:

(defn shipping-label
  "Returns a new label to ship product to customer."
  [customer product]
  (let [address (mailing/address customer)
        weight (product/weight product)
        supplier (product/supplier product)]
    {:from (mailing/address supplier)  ; OK
     :to address
     :weight weight}))

If name-clashes become a problem, add prefixes to the function names, new- for constructors and get- for accessors. If you are bothered that this contradicts the previous section, re-read the first paragraph of this article.

Function returning functions

In general, I try to avoid defining top-level functions which return functions when I can make the intent clearer with anonymous functions instead.

For example, writing something like this makes me feel clever:

(defn foo
  "Returns a function to compute foo of value."
  [option]
  (fn [value]
    ;;... do stuff with value ...
    ))

(defn computation
  "Does stuff with values."
  [option values]
  (->> values
       (map (foo option))  ; look at me!
       ;; ...
       ))

But it’s easier for someone else to read when the closure is created close to where it’s used:

(defn foo
  "Returns the foo of value"
  [value option]
  ;; ... 
  )

(defn computation [option values]
  (->> values
       (map #(foo % option))  ; I see what this does
       ;; ...
       ))

I allow an exception to this rule when returning functions is part of a repeated pattern. For example, the transducer versions of map, filter, and other sequence functions all return functions, but that’s a standard part of the language since Clojure 1.7 so users can be expected to know about it. Occasionally I discover a similar pattern in my own code.

When functions returning functions are not part of a repeated pattern but for some reason I want them anyway, I call them out with a suffix -fn, like:

(defn foo-fn
  "Returns a function to compute the foo of a value."
  [param]
  (fn [value]
    ;; ...
    ))

Clojure 2015 Year in Review

Another year, another year-in-review post. To be honest, I feel like any attempt I make to summarize what happened in the Clojure world this year is largely moot. Clojure has gotten so big, so — dare I say it? — mainstream that I can’t even begin to keep up with all the interesting things that are happening. But it’s a tradition, so I’ll stick to it. Once again, here is my incomplete, thoroughly-biased list of notable Clojurey things this year.

As I said of JVM Clojure in 2012, I think I can safely say that 2015 was the year ClojureScript grew up. It got a real release number, improved REPL support, and the ability to compile itself. But you don’t have to take my word for it: David Nolen has written his own ClojureScript Year in Review.

Clojure in the World

We already knew Clojure was being used at big companies like Walmart and Amazon. Based on public job postings, we’ve also seen places like Reuters, Capital One, and Oracle interested in Clojure developers.

Big corporations tend to be cagey about their technology choices, but Walmart’s Anthony Marcar came to Clojure/West to talk about how they do Clojure at Scale.

In other big-tech news, Facebook acquired Wit.ai, a Clojure startup that released an open-source library to parse structured data from text. Clojure early-adopter Prismatic pivoted away from its popular news-recommendation app to focus full-time on the A.I. business as well.

Language, Tools, and Libraries

Clojure 1.7 was released, bringing Transducers and the much-anticipated Reader Conditionals to support mixed Clojure-ClojureScript projects. Writing cross-platform libraries suddenly got easier. A bunch of popular Clojure libraries were ported to ClojureScript, including test.check, tools.reader, and my Component.

core.async got a major new release, with the added features promise-chan, offer!, and poll!.

The big news on the tooling front was the 1.0 release of Cursive, the first commercial IDE for Clojure. On the open-source side, both Light Table and CIDER got major new releases.

In the ClojureScript tooling world, Figwheel and Devcards really took off this year.

Clojars started getting financial support from the community, and CLJSJS started offering JavaScript libraries conveniently packaged for ClojureScript and the Google Closure Compiler.

Books and Docs

clojure.org went open-source, opening the site to contributions from the community.

New books: Clojure Applied (my review), Clojure for the Brave and True in print, Living Clojure, Clojure Recipes, and many more.

Events and Community

The Clojurians Slack community rocketed from just an idea to over four thousand members. If you don’t care for Slack, the #clojure IRC channel on Freenode is still going.

The Clojure mailing list hit ten thousand members.

At Clojure/conj this year, we had the first-ever Datomic conference. You can binge-watch Clojure conference videos (Clojure/conj, EuroClojure, and Clojure/West) on the ClojureTV YouTube channel. Also check out Clojure eXchange and :clojureD.

Clojure is attracting some interest from academic computer science, including a new paper on optimizing immutable hash maps.

Summary

There’s not much more to say. Or rather, there is very much more to say than what I can capture in a single post. Clojure is here to stay. Let’s enjoy it.

Thanks to David Nolen, Alex Miller, Timothy Baldridge, Carin Meier, and Daemian Mack for their help preparing this post.

An Opinionated Review of Clojure Applied

Why write a book about open-source software? (Not for the money. Trust me.) I’ve seen far too many “technical books” that merely regurgitate the documentation of a bunch of open source libraries. I’m happy to say that Clojure Applied, by my friends and colleagues Alex Miller and Ben Vandgrift, is not in this category. They sent me a free copy in return for review, so here it is.

As my other colleague Luke VanderHart pointed out in his recent talk at ClojuTRE, the biggest gap in written material about Clojure — and a good candidate for the source of most “documentation” complaints — has been the lack of narrative. Clojure has great reference documentation, and lots of Clojure libraries come with great tutorials, but there aren’t many comprehensive stories about building complete applications.

Clojure Applied tells a story about how to write Clojure applications. This is not just a book about Clojure, it’s a book about how to write software which assumes you’re going to use Clojure to do it. Clojure Applied would probably not be a good choice as your first Clojure book, although it would be an excellent book to read while you are learning Clojure.

The narrative develops like most successful Clojure programs: from the bottom up, starting with data. Instead of primitives or abstract concepts, the first chapter begins with modeling domain data using maps and records. This is the right place to start, and I could wish this chapter went into even more detail. For example, I would have liked to see a comparison of nested versus “flat” map structures (“flatter” is easier to work with).

Domain modeling is a difficult concept to describe, so I can’t fairly criticize Ben & Alex’s efforts here, but I do think they lose their way slightly by introducing Prismatic’s Schema library very early on. To be sure, Schema is a powerful library that a lot of Clojure developers find useful. But placed here, along with discussions of type-based dispatch, it leaves the reader with the idea that types are a central feature of domain modeling in Clojure. I disagree. Thinking in terms of types early in the design process often leads to unnecessarily restrictive choices, producing inflexible designs.

Further compounding the error of focusing on types, this chapter wanders into more advanced topics such as protocols and multimethods. There’s even a technique for dynamically extending protocols at runtime, an advanced tactic I would not recommend under most circumstances.

Embracing the possibilities of design in a dynamically-typed language requires a willingness to work with values whose types may be unknown at certain points. For example, the type of the “accumulator” value in a transducer is hard to define, because for most transducers its type is irrelevant. The ability to ignore types when they are not needed is what makes dynamic languages so expressive.

On the other hand, I have seen many large Clojure programs drift into incomprehensibility by failing to constrain their types in any way, passing complex nested structures everywhere. In this case, the lack of validation leads to a kind of inside-out spaghetti code in which it’s impossible to deduce the type of something by reading the code which uses it. Given the choice between these two extremes, “over-typed” code will be easier to untangle than “under-typed,” so perhaps introducing validation early is a good idea.

Moving along, Chapter Two covers the other Clojure collection types. This is beginner material, but presented in terms of how and why you might want to use each type of collection. This chapter also covers some common beginner questions, such as how to search a sequential collection. Another advanced topic which I would not recommend — defining a new collection type implementing Clojure’s interfaces — sneaks in here, but I’ll give it a pass because it helps you understand the collection interfaces.

Chapter Three zeroes in on the sequential collections, in particular the sequence library. Here the narrative is about combining sequence functions, especially the pattern of filter-map-reduce. This pattern is so fundamental that an experienced programmer in Clojure, Lisp, or any functional language might not even think about it, but internalizing it is a critical step to becoming an effective user of Clojure. This chapter also introduces Transducers. Even though Transducers might be considered an “advanced” topic, I think they belong here alongside sequences. The concepts are the same, and Transducers are really quite straightforward if you’re only looking, as this chapter does, at how to use them and not how they are implemented.
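To make the filter-map-reduce pattern concrete, its smallest form looks something like this:

;; Sum the squares of the even numbers: filter, map, reduce.
(->> (range 10)
     (filter even?)
     (map #(* % %))
     (reduce +))
;; => 120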

Part II, “Applications,” is probably the best section in the book. This is the critical piece missing from the first-round books about Clojure (including my own). How do you start combining all these little functional pieces into a working program?

The first chapter in this section describes mutable references with the most comprehensible real-world example I have seen, and also includes an excellent explanation of identity versus state. Another chapter describes all the various techniques for using multiple cores, including an important discussion of pipelines and core.async go blocks as processes.

Then there’s an entire chapter devoted to components as they are expressed in namespaces. Ben & Alex introduce all the concepts necessary to design and implement components without using my Component library, which I think is a smart choice. The Component library arrives in the following chapter, along with the idea of composing an application out of many components.

Part III, “Practices,” covers testing, output formats, and deployment.

The chapter on testing has a good description of the trade-offs between example-based and property-based testing, but doesn’t delve into the more difficult areas of integration or whole-system testing. Advanced testing techniques really deserve an entire book of their own.

“Formatting data” covers the usual suspects: JSON, EDN, and Transit. In my experience, the choice of data format is usually dictated by external constraints, but this chapter at least makes the trade-offs clear.

Finally, the chapter on deployment is a high-level overview of everything from GitHub to Elastic Beanstalk. There’s even a discussion of open-source licensing and contributor agreements. Heroku gets the most attention, which makes sense for a book targeted at mostly-beginners, but at least this chapter introduces some of the concerns one might want to think about when choosing a deployment platform.

After the last chapter, there’s a bonus pair of appendices. The first briefly covers the “roots” of Clojure, with links to source material. The second summarizes some principal motivations behind Clojure’s design as a guide to “Thinking in Clojure.” This latter section might have been more usefully incorporated into the text of the book, but that’s harder to write and can tend toward the preachy, so I can’t complain.

Circling back to where I started, Clojure Applied is a great book to read while learning Clojure. It’s not a language tutorial. It’s not stuffed with revolutionary ideas. Most importantly, it doesn’t try to do too much. It’s just solid, practical advice. Even the recommendations I disagree with are not bad ideas, just different preferences. Follow Ben & Alex’s advice while building your first Clojure program, and you’ll have a solid foundation to explore your own ideas and preferences.

Clojure Don’ts: Lazy Effects

This is probably my number one Clojure Don’t.

Laziness is often useful. It allows you to express “infinite” computations, and only pay for as much of the computation as you need.

Laziness also allows you to express computations without specifying when they should happen. And that’s a problem when you add side-effects.

By definition, a side-effect is something that changes the world outside your program. You almost certainly want it to happen at a specific time. Laziness takes away your control of when things happen.

So the rule is simple: Never mix side effects with lazy operations.

For example, if you need to do something to every element in a collection, you might reach for map. If the thing you’re doing is a pure function, that’s fine. But if it has side effects, map can lead to very unexpected results.

For example, this is a common new-to-Clojure mistake:

(take 5 (map prn (range 10)))

which prints

0
1
2
3
4
5
6
7
8
9

This is the old “chunked sequence” conundrum. Like many other lazy sequence functions, map has an optimization which allows it to evaluate batches of 32 elements at a time.

Then there’s the issue of lazy sequences not being evaluated at all. For example:

(do (map prn [0 1 2 3 4 5 6 7 8 9 10])
    (println "Hello, world!"))

which prints only:

Hello, world!

You might get the advice that you can “force” a lazy sequence to be evaluated with doall or dorun. There are also snippets floating around that purport to “unchunk” a sequence.

In my opinion, the presence of doall, dorun, or even “unchunk” is almost always a sign that something never should have been a lazy sequence in the first place.

Only use pure functions with the lazy sequence operations like map, filter, take-while, etc. When you need side effects, use one of these alternatives, sketched below:

  • doseq: good default choice, clearly indicates side effects
  • run!: new in Clojure 1.7, can take the place of (dorun (map ...))
  • reduce, transduce, or something built on them
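For example, the first two alternatives print every element, eagerly and with the intent stated up front:

;; doseq: explicitly for side effects, returns nil.
(doseq [x (range 5)]
  (prn x))

;; run!: eager, replaces (dorun (map prn (range 5))).
(run! prn (range 5))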

The last requires some more explanation. reduce and transduce are both non-lazy ways of consuming sequences or collections. As such, they are technically safe to use with side-effecting operations.

For example, this composition of take and map:

(transduce (comp (take 5)
                 (map prn))
           conj
           []
           (range 10))

only prints 5 elements of the sequence, as requested:

0
1
2
3
4

The single-argument version of map returns a transducer which calls its function once for each element. The map transducer can’t control when the function gets evaluated — that’s in the hands of transduce, which is eager (non-lazy). The single-argument take limits the reduction to the first five elements.

As a general rule, I would not recommend using side-effecting operations in transducers. But if you know that the transducer will be used only in non-lazy operations — such as transduce, run!, or into — then it may be convenient.

(defn operation [input]
  ;; do something with input, return result
  (str "Result for " input))

(prn (into #{}
           (comp (take 3)
                 (map operation))
           (range 100)))

reduce, transduce, and into are useful when you need to collect the return value of the side-effecting operation.

Clojure Don’ts: Redundant map

Today’s Clojure Don’t is the opposite side of the coin to the heisenparameter.

If you have an operation on a single object, you don’t need to define another version just to operate on a collection of those objects.

That is, if you have a function like this:

(defn process-thing [thing]
  ;; process one thing
  )

There is no reason to also write this:

(defn process-many-things [things]
  (map process-thing things))

The idiom “map a function over a collection” is so universal that any Clojure programmer should be able to write it without thinking twice.

Having a separate definition for processing a group of things implies that there is something special about processing a group instead of a single item. (For example, a more efficient batch implementation.) If that’s the case, then by all means write the batch version as well. But if not, then a function like process-many-things just clutters up your code while providing no benefit.

Clojure Don’ts: Single-branch if

A short Clojure don’t for today. This one is my style preference.

You have a single expression which should run if a condition is true, otherwise return nil.

Most Clojure programmers would probably write this:

(when (condition? ...)
  (then-expression ...))

But you could also write this:

(if (condition? ...)
  (then-expression ...)
  nil)

Or even this, because the “else” branch of if defaults to nil:

(if (condition? ...)
  (then-expression ...))

There’s an argument to be made for any one of these.

The second variant, if ... nil, makes it very explicit that you want to return nil. The nil might be semantically meaningful in this context instead of just a “default” value.

Some people like the third variant, if with no “else” branch, because they think when is only for side-effects, leaving the single-branch if for “pure” code.

But for me it comes down, as usual, to readability.

The vast majority of the time, if contains both “then” and “else” expressions.

Sometimes a long “then” branch leaves the “else” branch dangling below it. I’m expecting this, so when I read an if my eyes automatically scan down to find the “else” branch.

If I see an if but don’t find an “else” branch, I get momentarily confused. Maybe a line is missing or the code is mis-indented.

Likewise, if I see an if explicitly returning nil, it looks like a mistake because I know it could be written as when. This is a universal pattern in Clojure: lots of expressions (cond, get, some) return nil as their default case, so it’s jarring to see a literal nil as a return value.

So my preferred style is the first version. In general terms:

An if should always have both “then” and “else” branches.
Use when for a condition which should return nil in the negative case.