On the Perils of Dynamic Scope

Common Lisp the Language (CLtL) devotes an entire chapter to the subject of Scope and Extent. It defines scope as the textual region of a program in which an entity may be used, where “entity” could be a symbol, a value, or something more abstract like a variable binding.

So scope is about where you can use something. CLtL defines two different kinds of scope:

Lexical scope is usually the body of a single expression like let or defun.

Indefinite scope is everything else, effectively the global set of symbols that exist in a program.

In contrast, extent is about when: it’s the interval of time during which something may be used. CLtL defines two different kinds of extent:

Dynamic extent refers to things that exist for a fixed period of time and are explicitly “destroyed” at the end of that period, usually when control returns to the code that created the thing.

Indefinite extent is everything else: things that get created, passed around, and eventually garbage-collected.

In any language with a garbage collector, most things have indefinite extent. You can create strings, lists, or hash tables and pass them around with impunity. When the garbage collector determines that you are done using something, it reclaims the memory. The process is, in most cases, completely transparent to you.

But what about the so-called dynamic scope? The authors of CLtL have this to say:

The term “dynamic scope” is a misnomer. Nevertheless it is both traditional and useful.

They also define “dynamic scope” to be the combination of indefinite scope and dynamic extent. That is, things with dynamic scope are valid in any place in a program, but only for a limited time. In Common Lisp, these are called “special variables,” and are created with the macros defparameter and defvar.

Vars

So what does this have to do with Clojure? Clojure has these things called Vars. Every time you write def or defn or one of their variants in Clojure, you’re creating a Var.

Vars have indefinite scope: no matter where you def a Var, it’s visible everywhere in the program.1

Vars usually have indefinite extent as well. Usually. This is where things get tricky. Clojure, unlike Common Lisp, was designed for multi-threaded programs.2 The meaning of extent gets a lot muddier in the face of multiple threads. Each thread has its own timeline, its own view of “now” which may or may not conform to any other thread’s view.

In Clojure versions 1.2 and earlier, all Vars had dynamic scope by default, which imposed a performance cost: the current dynamic binding of a Var had to be looked up on every function call. Leading up to 1.3, Rich Hickey experimented with allowing Vars to be declared ^:static, before settling on static by default with ^:dynamic as an option. You can still find ^:static declarations littered through the Clojure source code. Maybe someday they’ll be useful again.
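
To make this concrete in Clojure terms: a dynamic Var has indefinite scope (it is visible everywhere), but each binding has dynamic extent (it lasts only until control leaves the binding form, and only on the current thread). A minimal illustration, with made-up names:

(def ^:dynamic *verbose* false)

(defn log [msg]
  ;; Sees whatever binding of *verbose* is in effect on this thread.
  (when *verbose* (println msg)))

(binding [*verbose* true]
  (log "inside the binding"))  ; prints
(log "outside the binding")    ; prints nothing; the binding's extent has ended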

The definition of “dynamic scope” in Clojure is even fuzzier than it is in Common Lisp. How do we define “extent” in the face of multiple threads, each potentially with its own thread-local binding? If a resource can be shared across multiple threads, we have to coordinate the cleanup of that resource. For example, if I open a socket and then hand it off to another piece of code, who is responsible for closing the socket?

Resource management is one concurrency bugaboo that Clojure developers have not managed to crack. Various attempts have been made: you can see the artifacts in wiki and mailing list discussions of Resource Scopes. So far, no solution has been found that doesn’t just shift the problem somewhere else.

It ends up looking a bit like garbage collection: how do I track the path of every resource used by my program and ensure that it gets cleaned up at the appropriate time? But it’s even harder than that, because resources like file handles and sockets are much scarcer than memory: they need to be reclaimed as soon as possible. In a modern runtime like the JVM, garbage collection is stochastic: there’s no guarantee that it will happen at any particular time, or even that it will happen at all.

To make matters worse, Clojure has laziness to contend with. It’s entirely possible to obtain a resource, start consuming it via a lazy sequence, and never finish consuming it.

The Wrong Solution

This brings me to one of my top anti-patterns in Clojure: the Dynamically-Scoped Singleton Resource (DSSR).

The DSSR is popular in libraries that depend on some external resource such as a socket, file, or database connection. It typically looks like this:

(ns com.example.library)

(def ^:dynamic *resource*)

(defn- internal-procedure []
  ;; ... uses *resource* ...
  )

(defn public-api-function [arg]
  ;; ... calls internal-procedure ...
  )

That is, there is a single dynamic Var holding the “resource” on which the rest of the API operates. The DSSR is often accompanied by a with-* macro:

(defmacro with-resource [src & body]
  `(binding [*resource* (acquire src)]
     (try ~@body
       (finally
         (dispose *resource*)))))

This looks harmless enough. It’s practically a carbon copy of Clojure’s with-open macro, and it ensures that the resource will get cleaned up even if body throws an exception.

The problem with this pattern, especially in libraries, is the constraints it imposes on any code that wants to use the library. The with-resource macro severely constrains what you can do in the body:

You can’t dispatch to another thread. Say goodbye to Agents, Futures, thread pools, non-blocking I/O, or any other kind of asynchrony. The resource is only valid on the current thread.3

You can’t return a lazy sequence backed by the resource because the resource will be destroyed as soon as body returns (see the sketch after this list).

You can’t have more than one resource at a time. Hence the “singleton” in the name of this pattern. Using a thread-bound Var throughout the API means that you can never operate on more than one instance of the resource in a single thread. Lots of apps need to work with multiple databases, which makes this kind of library painful to use.
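
To make the lazy-sequence problem concrete, here is a sketch using the hypothetical *resource* and with-resource from above, plus a made-up fetch-line accessor:

(defn lines []
  ;; Lazily reads from *resource*, one line per step.
  (lazy-seq
   (when-let [line (fetch-line *resource*)]
     (cons line (lines)))))

(def result
  (with-resource "log.txt"
    (lines)))   ; returns an unrealized lazy sequence

(first result)  ; fails: the thread-local binding is gone and the
                ; resource has already been disposed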

The last problem with this pattern is a more subtle one: hidden dependencies. The public API functions, which have global scope, depend on the state (thread-local binding) of another Var with global scope. This dependency isn’t explicitly stated anywhere in the definition of those functions. That might not seem like such a big deal in small examples, and it isn’t. But as programs (and development teams) grow larger, it’s one additional piece of implicit knowledge that you have to keep in your head. If there are seventeen layers of function calls between the resource binding and its usage, how certain are you going to be that the resource has the right extent?

Friends Don’t Let Friends Use Dynamic Scope

The alternative is easy: don’t do it. Don’t try to “solve” resource management in every library.

By all means, provide the functions to acquire and dispose of resources, but then let the application programmer decide what to do with them. Define API functions to take the resource as an argument.
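
A sketch of what that looks like, using the same hypothetical library (acquire and dispose are made-up names):

(ns com.example.library)

(defn acquire [src]
  ;; ... open and return a new resource ...
  )

(defn dispose [resource]
  ;; ... release the resource ...
  )

(defn public-api-function [resource arg]
  ;; ... operates on the resource it was given; no hidden state ...
  )

;; The application, not the library, chooses the extent:
(let [r (acquire "some-source")]
  (try (public-api-function r 42)
       (finally (dispose r))))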

Applications can manage their own resources, and only the application programmer knows what the extent of those resources should be. Maybe you can pass it around as a value. Maybe you want to use dynamic binding after all. Maybe you want to stash it in a global state Var.4 That’s for you to decide.

Datomic is a good example to follow: it creates connection objects that have a lot of state attached to them — sockets, queues, and threads. But it says nothing about how you should manage the extent of those connections.5 Most functions in the Datomic API take either a connection or a database (a value obtained from the connection) as an argument.

Safe Dynamic Scope

So dynamic scope is totally evil, right? Not totally. There are situations where dynamic scope can be helpful without causing the cascade of problems I described above.

Remember that dynamic scope in Clojure is really thread-local binding. Therefore, it’s best suited to operations that are confined to a single thread. There are plenty of examples of this: many algorithms are fundamentally single-threaded, after all. Consider the classic recursive-descent parser: you start with one function call at the top and you’re not done until that function returns. The entire operation happens on a single thread, in a single call stack. It has dynamic extent.

I took advantage of this fact in a Clojure JSON parser. There were a number of control flags that I needed to make available to all the functions. Rather than pass around extra arguments all over the place, I created private dynamic Vars to hold them. Those Vars get bound in the entry-point to the parser, based on options passed in as arguments to the public API function. The thread-local state never leaks out of the initial function call.
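
The shape of that parser, reduced to a sketch with made-up names (this is not the parser’s actual code):

(def ^:private ^:dynamic *keywordize* false)

(defn- parse-object [stream]
  ;; ... reads *keywordize* like any other internal function ...
  )

(defn parse
  "Public entry point; all dynamic state is bound here and
  never escapes this call."
  [stream & {:keys [keywordize]}]
  (binding [*keywordize* (boolean keywordize)]
    (parse-object stream)))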

As another example, the Clojure compiler, although written in Java, uses dynamic Vars to keep track of internal state.

And what about our friend with-open? I said that the example with-resource macro was nearly a copy of it, but only nearly. clojure.core/with-open creates lexical (i.e. local) bindings. It still suffers from some limitations around what you can do in the body, but at least it doesn’t limit you to one resource at a time.
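
Because the bindings are lexical, each resource gets its own name, and you can open several at once:

(require '[clojure.java.io :as io])

(with-open [in  (io/reader "in.txt")
            out (io/writer "out.txt")]
  (doseq [line (line-seq in)]
    (.write out line)))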

Global state is the zombie in the closet of every Clojure program, about which I’ll have more to say in future posts. For now, I hope I’ve convinced you that dynamic scope is easily abused and has a lot of unintended consequences.

Footnotes:

1 Technically, a Var is visible only after the point at which it was defined. This is significant with regard to the order of definitions, but Vars are still globally visible once they have been defined.

2 CLtL has this note embedded in the chapter on scope and extent: “Behind the assertion that dynamic extents nest properly is the assumption that there is only a single program or process. Common Lisp does not address the problems of multiprogramming (timesharing) or multiprocessing (more than one active processor) within a single Lisp environment.” Modern Common Lisp implementations have added multi-threading, but it remains absent from the language specification.

3 You can use bound-fn to capture the bindings and pass them to another thread, but you still have the problem that the resource may be destroyed before the other thread is finished with it.

4 Not recommended, to be discussed in a future post.

5 There’s some caching of connection objects under the hood, but this is not relevant to the consumer.

Affordance and Concision

Quick, Clojure programmers, what does the following expression do?

(get x k)

If you answered, It looks up the key k in an associative data structure x and returns its associated value, you’re right, but only partially.

What if x is not an associative data structure? In every released version of Clojure up to and including 1.5.0, get will return nil in that case.

Is that a bug or a feature? It can certainly lead to some hard-to-find bugs, such as this one which I’ve often found in my own code:

(def person (ref {:name "Stuart" :job "Programmer"}))

(get person :name)
;;=> nil

Spot the bug? person is not a map but rather a Ref whose state is a map. I should have written (get @person :name). One character between triumph and defeat! To make matters worse, that nil might not show up until it triggers a NullPointerException several pages of code later.

It turns out that several core functions in Clojure behave this way: if called on an object which does not implement the correct interface, they return nil rather than throwing an exception.
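
For example, all of these quietly return nil in Clojure 1.5.0 and earlier:

(get 42 :answer)       ;;=> nil (a long is not associative)
(get "abc" :answer)    ;;=> nil (a string only supports integer keys)
(get (ref {:a 1}) :a)  ;;=> nil (the Ref bug shown above)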

The contains? function is a more bothersome example. Not only is the name difficult to remember — it’s an associative function that checks for keys, not a linear search of values like java.util.Collection#contains — but it also returns nil on objects which do not implement clojure.lang.Associative. Or at least it did up through Clojure 1.4.0. I submitted a patch (CLJ-932), included in Clojure 1.5.0, which changed contains? to throw an exception instead.[1]

I submitted a similar patch (CLJ-1107) to do the same thing for get, but not in time for consideration in the 1.5.0 release.

A few weeks later, I was writing some code that looked like this:

(defn my-type [x]
  (or (get x :my-namespace/type)
      (get (meta x) :my-namespace/type)
      (get x :type)
      (clojure.core/type x)))

I wanted a flexible definition of “type” which worked on maps or records with different possible keys, falling back on the clojure.core/type function, which looks for a :type key in metadata before falling back to clojure.core/class.

Before the patch to get in CLJ-1107, this code works perfectly well. After the patch, it won’t. I would have to write this instead:

(defn my-type [x]
  (or (when (associative? x)
        (get x :my-namespace/type))
      (get (meta x) :my-namespace/type)
      (when (associative? x)
        (get x :type))
      (clojure.core/type x)))

But wait! The meta function also returns nil for objects which do not support metadata. Maybe that should be “fixed” too. Then I would have to write this:

(defn my-type [x]
  (or (when (associative? x)
        (get x :my-namespace/type))
      (when (instance? clojure.lang.IMeta x)
        (get (meta x) :my-namespace/type))
      (when (associative? x)
        (get x :type))
      (clojure.core/type x)))

And so on.

Every language decision means trade-offs. Clojure accepts nil as a logical false value in boolean contexts, like Common Lisp (and also many scripting languages). This “nil punning” enables a concise style in which nil stands in for an empty collection or missing data.[2] For example, Clojure 1.5.0 introduces two new macros some-> and some->>, which keep evaluating expressions until one of them returns nil.
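
For example, some-> threads like -> but stops at the first nil:

(some-> {:a {:b 41}} :a :b inc)  ;;=> 42
(some-> {:a {:b 41}} :c :b inc)  ;;=> nil (stops when :c returns nil)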

Is Clojure’s get wrong? It depends on what you think get should mean. If you’re a fan of more strictly-typed functional languages you might think get should be defined to return an instance of the Maybe monad:

;; made-up syntax:
get [Associative⟨K,V⟩, K] → Maybe⟨V⟩

You can implement the Maybe monad in Clojure, but there’s less motivation to do so without the support of a static type checker. You could also argue that, since Clojure is dynamically-typed, get can have a more general type:

;; made-up syntax:
get [Any, Any] → Any | nil

This latter definition is effectively the type of get in Clojure right now.

Which form is better is a matter of taste. What I do know is that the current behavior of get doesn’t give much affordance to a Clojure programmer, even an experienced one.[3]

Again, tradeoffs. Clojure’s definition of get is flexible but can lead to subtle bugs. The stricter version would be safer but less flexible.

An even stricter version of get would throw an exception if the key is not present instead of returning nil. Sometimes that’s what you want. The Simulant testing framework defines a utility function getx that does just that.
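
A minimal sketch of that idea (not Simulant’s actual implementation):

(defn getx
  "Like get, but throws if the key is absent."
  [m k]
  (let [v (get m k ::not-found)]
    (if (identical? v ::not-found)
      (throw (ex-info "Missing required key" {:key k :map m}))
      v)))

(getx {:a 1} :a)  ;;=> 1
(getx {:a 1} :b)  ;; throws ExceptionInfo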

Over the past five years, Rich Hickey has gradually led Clojure in the direction of “fast but correct by default.” This is particularly evident in the numeric primitives since release 1.3.0, which throw an exception on overflow (correct) but do not automatically promote from fixed- to arbitrary-precision types (fast).

I believe the change to get in CLJ-1107 will ultimately be more help than hindrance. But it might also be useful to have a function which retains the “more dynamic” behavior. We might call it get' in the manner of the auto-promoting arithmetic functions such as +'. Or perhaps, with some cleverness, we could define a higher order function that transforms any function into a function that returns nil when called on a type it does not support. This would be similar in spirit to fnil but harder to define.[4]
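
The naive version of that higher-order function might look like the following sketch, with exactly the caveats of footnote [4]: it catches IllegalArgumentException, which is slow and possibly too broad.

(defn nil-on-unsupported
  "Wraps f to return nil instead of throwing
  IllegalArgumentException. See footnote [4] for the caveats."
  [f]
  (fn [& args]
    (try (apply f args)
         (catch IllegalArgumentException _ nil))))

;; One way to define a more forgiving get':
(def get' (nil-on-unsupported get))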

Update #1: changed (instance? x clojure.lang.Associative) to (associative? x), suggested by Luke VanderHart.

Update #2: Some readers have pointed out that I could make my-type polymorphic, thereby avoiding the conditional checks. But that would be even longer and, in my opinion, more complicated than the conditional version. The get function is already polymorphic, a fact which I exploited in the original definition of my-type. It’s a contrived example anyway, not a cogent design.

Footnotes:

[1] We can’t do anything about the name of contains? without breaking a lot more code. This change, at least, is unlikely to break any code that wasn’t already broken.

[2] There’s a cute poem about nil-punning in Common Lisp versus Scheme or T.

[3] I am slightly abusing the definition of affordance here, but I think it works to convey what I mean: the implementation of get in the Clojure runtime does not help me to write my code correctly.

[4] I don’t actually know how to do it without catching IllegalArgumentException, which would be bad for performance and potentially too broad. Left as an exercise for the reader!

A Brief Rant About Versioning

Version numbers are meaningless. By that, I mean they convey no useful information. Oh sure, there are conventions: major.minor.patch, even/odd for stable/development versions, and designations like release candidate. But they’re just conventions. Version numbers are chosen by people, so they are subject to all the idiosyncrasies and whims of individuals.

Semantic Versioning, you say? Pshaw. Nobody does semantic versioning. If they did, we’d see dozens of libraries and applications with major-version numbers in the double or triple digits. It’s almost impossible to change software without breaking something. Even a change which is technically a bugfix can easily break a downstream consumer that relied, intentionally or not, on the buggy behavior.

That’s not to say you shouldn’t try to follow semantic versioning. It’s a good idea, and even its author admits that some versioning decisions boil down to Use your best judgment.

The trouble with semantic versioning is that everyone wants others to follow it, but no one wants to follow it themselves. Everyone thinks there’s room for one more quick fix, or this change isn’t big enough to warrant a major-version bump, or simply my project is special. It’s a slippery slope from there to redesigning your entire API between versions 2.7.4-RC13 and 2.7.4-RC14.

Everybody does it. I could name names, but that would be redundant. I’m not sitting in a glass house here, either. I caught major flak for breaking the API of a JSON parser – a JSON parser! – between two 0.x releases. People don’t like change, even improvements, if it means the tiniest bit more work for them. Even if the new API is cleaner and more logical, even if you change things that were never explicitly promised by the old API, there will be grumbles and calls for your resignation. It’s enough to make you want to stop releasing things altogether, or to throw up your hands and just number all your releases sequentially, or to go completely rogue and have your version numbers converge towards an irrational constant.

Did I mention this was a rant? Please don’t take it too seriously.

The Reluctant Dictator

I have a confession to make. I’m bad at open-source. Not writing the code. I’m pretty good at that. I can even write pretty good documentation. I’m bad at all the rest: patches, mailing lists, chat rooms, bug reports, and anything else that might fall under the heading of “community.” I’m more than bad at it: I don’t like doing it and generally try to avoid it.

I write software to scratch an itch. I release it as open-source in the vague hope that someone else might find it useful. But once I’ve scratched the itch, I’m no longer interested. I don’t want to found a “community” or try to herd a bunch of belligerent, independent-minded cats. I’m not in it for the money. I’m not even in it for the fame and recognition. (OK, maybe a little bit for the fame.)

But this age of “social” insists that everything be a community. Deodorant brands beg us to “like” their Facebook pages and advertising campaigns come accessorized with Twitter hashtags. In software, you can’t just release a bit of code as open-source. You have to create a Google Group and a blog and an IRC channel and a novelty Twitter account too.

The infrastructure of “social coding” has codified this trend into an expectation that every piece of open-source software participate in a world-wide collaboration / popularity contest. The only feature of GitHub that can’t be turned off is the pull request.

Don’t get me wrong, I love GitHub and use it every day. On work projects, I find pull requests to be an efficient tool for doing code reviews. GitHub’s collaboration tools are great when you’re only trying to collaborate with a handful of people, all of whom are working towards a common, mutually-understood goal.

But when it comes to open-source work, I use GitHub primarily as a hosting platform.[1] I put code on GitHub because I want people to be able to find it, and use it if it helps them. I want them to fork it, fix it, and improve it. But I don’t want to be bothered with it. If you added something new to my code, great! It’s open-source – have at it!

I’m puzzled by people who write to me saying, “If I were to write a patch for your library X to make it do Y, would you accept it?” First of all, you don’t need my or anybody else’s permission to modify my code. That’s the whole point of open-source! Secondly, how can I decide whether or not I’ll accept a patch I haven’t seen yet? Finally, if you do decide to send me a pull request, please don’t be offended if I don’t accept it, or if I ignore it for six months and then take the idea and rewrite it myself.

Why didn’t I accept your pull request? Not because I want to hog all the glory for myself. Not because I want to keep you out of my exclusive open-source masters’ club. Not even because I can find any technical fault with your implementation. I’ve just got other things to do, other itches to scratch.

If everyone thought that way, would open-source still work? Probably. Maybe not as well.

To be sure, there’s a big difference between one-off utilities written in a weekend and major projects sustained for years by well-funded organizations. Managing a world-wide collaborative open-source project is a full-time job. The benevolent-dictator-for-life needs an equally-benevolent corporate-sponsor-for-life.[2] You can’t expect the same kind of support from individuals working in their spare time, for free.

I sometimes dream of an open-source collaboration model that is truly pull-based instead of GitHub’s they-should-have-called-it-push request. I don’t want to be forced to look at anything on any particular schedule. Don’t give me “notifications” or send me email. Instead, and only when I ask for it, allow me to browse the network of forks spawned by my code. Let me see who copied it, how they used it, and how they modified it. Be explicit about who owns the modifications and under what terms I can copy them back into my own project. And not just direct forks — show me all the places where my code was copied-and-pasted too.

Imagine if you could free open-source developers from all the time spent on mailing lists, IRC, bug trackers, wikis, pull requests, comment threads, and patches and channel all that energy into solving problems. Who knows? We might even solve the hard problems, like dependency management.

Update Jan 17, 8:52am EST: I should mention that I have nothing but admiration and respect for people who are good at the organizational/community aspects of open-source software. I’m just not one of them.

Footnotes:

[1] I’m not the only one. Linus Torvalds famously pointed out flaws in the GitHub pull-request model, in particular its poor support for more rigorous submission/signoff processes.

[2] Even with a cushy corporate sponsor, accepting patches is far more work than the authors of those patches typically realize. See The story with #guava and your patches.

Playing the Obstacle

When I was in acting school (yes, I was in acting school, see my bio) one of my teachers had an expression: playing the obstacle. When studying for a role, one of an actor’s most important jobs is to determine the character’s overall objective: What’s my motivation? The plot of any play or movie typically centers around how the character overcomes obstacles to achieve that objective.

What my teacher had noticed was a tendency of young actors to focus too much on the obstacles themselves, attempting to build characters out of what they can’t do rather than what they want to do.

I think there’s a similar tendency in programmers. We start out with a clear objective, but when we encounter an obstacle to that objective we obsess over it. How many times has a programmer said, “I wanted to do X, but I couldn’t because Y got in the way,” followed by a 10-minute rant about how much language / framework / library / tool Y sucks? That’s playing the obstacle.

If you’re lucky enough to make software that real people (not programmers) actually use, then Y is irrelevant. No one cares how many ugly hacks you had to put in to make Y do something it wasn’t quite designed to do. All that matters is X.

Clojure 2012 Year in Review

I signed off my Clojure 2011 Year in Review with the words You ain’t seen nothing yet. Coming back for 2012, all I can think of is Wow, what a year! I’m happy to say that Clojure in 2012 exceeded even my wildest dreams.

2012 was the year when Clojure grew up. It lost the squeaky voice of adolescence and gained the confident baritone of a professional language. The industry as a whole took notice, and people started making serious commitments to Clojure in both time and money.

There was so much Clojure news in 2012 that I can’t even begin to cover it all. I’m sure I’ve missed scores of important and exciting projects. But here are the ones that came to mind:

Software & Tools

  • The big news, of course, was the release of Datomic, a radical new database from Rich Hickey and Relevance, in March. Codeq, a new way to look at source code repositories, followed in October.

  • Light Table, a new IDE oriented towards Clojure, rocketed to over $300,000 in pledges on Kickstarter and entered the Summer 2012 cohort of YCombinator.

  • Speaking of tooling, what a bounty! Leiningen got a major new version, as did nREPL and tools.namespace. Emacs users finally escaped the Common Lisp SLIME with nrepl.el.

  • Red Hat’s Immutant became the first comprehensive application server for Clojure.

  • ClojureScript One demonstrated techniques for building applications in ClojureScript.

I have no idea what 2013 is going to bring. But if I were to venture a guess, I’d say it’s going to be a fantastic time to be working in Clojure.

When (Not) to Write a Macro

The Solution in Search of a Problem

A few months ago I wrote an article called Syntactic Pipelines, about a style of programming (in Clojure) in which each function takes and returns a map with similar structure:

(defn subprocess-one [data]
  (let [{:keys [alpha beta]} data]
    (-> data
        (assoc :epsilon (compute-epsilon alpha))
        (update-in [:gamma] merge (compute-gamma beta)))))

;; ...

(defn large-process [input]
  (-> input
      subprocess-one
      subprocess-two
      subprocess-three))

In that article, I defined a pair of macros that allow the preceding example to be written like this:

(defpipe subprocess-one [alpha beta]
  (return (:set :epsilon (compute-epsilon alpha))
          (:update :gamma merge (compute-gamma beta))))

(defpipeline large-process
  subprocess-one
  subprocess-two
  subprocess-three)

I wanted to demonstrate the possibilities of using macros to build abstractions out of common syntactic patterns. My example, however, was poorly chosen.

The Problem with the Solution

Every choice we make while programming has an associated cost. In the case of macros, that cost is usually borne by the person reading or maintaining the code.

In the case of defpipe, the poor sap stuck maintaining my code (maybe my future self!) has to know that it defines a function that takes a single map argument, despite the fact that it looks like a function that takes multiple arguments. That’s readily apparent if you read the docstring, but the docstring still has to be read and understood before the code makes sense.

The return macro is even worse. First of all, the fact that return is only usable within defpipe hints at some hidden coupling between the two, which is exactly what it is. Secondly, the word return is commonly understood to mean an immediate exit from a function. Clojure does not support early (non-tail) returns, and my macro does not add them, so the name return is confusing.

Using return correctly requires that the user first understand the defpipe macro, then understand the “mini language” I have created in the body of return, and also know that return only works in tail position inside of defpipe.
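
For example, this hypothetical misuse compiles without complaint, but the first return is not an early exit; its value is silently discarded:

(defpipe validate [alpha]
  (when (neg? alpha)
    (return (:set error true)))  ; not in tail position: result discarded
  (return (:set ok true)))       ; this is what the function returns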

Is it Worth It?

Confusion, lack of clarity, and time spent reading docs: Those are the costs. The benefits are comparatively meager. Using the macros, my example is shorter by a couple of lines, one let, and some destructuring.

In short, the costs outweigh the benefits. Code using the defpipe macro is actually worse than code without the macro because it requires more effort to read. That’s not to say that the pipeline pattern I’ve described isn’t useful: It is. But my macros haven’t improved on that pattern enough to be worth their cost.

That’s the crux of the argument about macros. Whenever you think about writing one, ask yourself, “Is it worth it?” Is the benefit provided by the macro – in brevity, clarity, or power – worth the cost, in time, for you or someone else to understand it later? If the answer is anything but a resounding “yes” then you probably shouldn’t be writing a macro.

Of course, the same question can (and should) be asked of any code we write. Macros are a special case because they are so powerful that the cost of maintaining them is higher than that of “normal” code. Functions and values have semantics that are specified by the language and universally understood; macros can define their own languages. Buyer beware.

I still got some value out of the original post as an intellectual exercise, but it’s not something I’m going to put to use in my production code.

Why I’m Using ClojureScript

Elise Huard wrote about why she’s not using ClojureScript. To quote her essential point, “The browser doesn’t speak clojure, it speaks javascript.”

This is true. But the CPU doesn’t speak Clojure either, or JavaScript. This argument against ClojureScript is similar to arguments made against any high-level language which compiles down to a lower-level representation. Once upon a time, I feel sure, the same argument was made against FORTRAN.

A new high-level language has to overcome a period of skepticism from those who are already comfortable programming in the lower-level representation. A young compiler struggles to produce code as efficient as that hand-optimized by an expert. But compilers tend to get better over time, and some smart folks are working hard on making ClojureScript fast. ClojureScript applications can get the benefit of improvements in the compiler without changing their source code, just as Clojure applications benefit from years of JVM optimizations.

To address Huard’s other points in order:

1. Compiled ClojureScript code is hard to read, therefore hard to debug.

This has not been an issue for me. In development mode (no optimizations, with pretty-printing) ClojureScript compiles to JavaScript which is, in my opinion, fairly readable. Admittedly, I know Clojure much better than I know JavaScript. The greater challenge for me has been working with the highly-dynamic nature of JavaScript execution in the browser. For example, a function called with the wrong number of arguments will not trigger an immediate error. Perhaps ClojureScript can evolve to catch more of these errors at compile time.
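
A ClojureScript sketch of what I mean (behavior as of this writing; no optimizations involved):

(defn area [w h] (* w h))

(area 3)  ;;=> NaN: no error at the call site; h is simply undefined,
          ;; and the bad value surfaces somewhere downstream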

2. ClojureScript forces the inclusion of the Google Closure Library.

This is mitigated by the Google Closure Compiler’s dead-code elimination and aggressive space optimizations: you only pay, in download size, for what you use. “Hello World” in ClojureScript, optimized and gzipped, is 18K; for comparison, jQuery 1.7.2 is 33K, minified and gzipped. And caching doesn’t always save you.

3. Hand-tuning performance is harder in a higher-level language.

This is true, as per my comments above about high-level languages. Again, this has not been an issue for me, but you can always “drop down” to JavaScript for specialized optimizations.

4. Cross-browser compatibility is hard.

This is, as Huard admits, unavoidable in any language. The Google Closure Libraries help with some of the basics, and ClojureScript libraries such as Domina are evolving to deal with other browser-compatibility issues. You also have the entire world of JavaScript libraries to paper over browser incompatibilities.

* * *

Overall, I think I would agree with Elise Huard when it comes to browser programming “in the small.” If you just want to add some dynamic behavior to an HTML form, then ClojureScript has little advantage over straight JavaScript, jQuery, and whatever other libraries you favor.

What ClojureScript allows you to do is tackle browser-based programming “in the large.” I’ve found it quite rewarding to develop entire applications in ClojureScript, something I would have been reluctant to attempt in JavaScript.

It’s partially a matter of taste and familiarity. Clojure programmers such as myself will likely prefer ClojureScript over JavaScript. Experienced JavaScript programmers will have less to gain — and more work to do, learning a new language — by adopting ClojureScript. JavaScript is indeed “good enough” for a lot of applications, which means ClojureScript has to work even harder to prove its worth. I still believe that ClojureScript has an edge over JavaScript in the long run, but that edge will be less immediately obvious than the advantage that, say, Clojure on the JVM has over Java.

Syntactic Pipelines

Lately I’ve been thinking about Clojure programs written in this “threaded” or “pipelined” style:

(defn large-process [input]
  (-> input
      subprocess-one
      subprocess-two
      subprocess-three))

If you saw my talk at Clojure/West (video forthcoming) this should look familiar. The value being “threaded” by the -> macro from one subprocess- function to the next is usually a map, and each subprocess can add, remove, or update keys in the map. A typical subprocess function might look something like this:

(defn subprocess-two [data]
  (let [{:keys [alpha beta]} data]
    (-> data
        (assoc :epsilon (compute-epsilon alpha))
        (update-in [:gamma] merge (compute-gamma beta)))))

Most subprocess functions, therefore, have a similar structure: they begin by destructuring the input map and end by performing updates to that same map.

This style of programming tends to produce slightly longer code than would be obtained by writing larger functions with let bindings for intermediate values, but it has some advantages. The structure is immediately apparent: someone reading the code can get a high-level overview of what the code does simply by looking at the outer-most function, which, due to the single-pass design of Clojure’s compiler, will always be at the bottom of a file. It’s also easy to insert new functions into the process: as long as they accept and return a map with the same structure, they will not interfere with the existing functions.

The only problem with this code from a readability standpoint is the visual clutter of repeatedly destructuring and updating the same map. (It’s possible to move the destructuring into the function argument vector, but it’s still messy.)
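
Here is what moving the destructuring into the argument vector looks like; :as keeps the whole map, but the pattern is still cluttered:

(defn subprocess-two [{:keys [alpha beta] :as data}]
  (-> data
      (assoc :epsilon (compute-epsilon alpha))
      (update-in [:gamma] merge (compute-gamma beta))))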

defpipe

What if we could clean up the syntax without changing the behavior? That’s exactly what macros are good for. Here’s a first attempt:

(defmacro defpipe [name argv & body]
  `(defn ~name [arg#]
     (let [{:keys ~argv} arg#]
       ~@body)))

(macroexpand-1 '(defpipe foo [a b c] ...))
;;=> (clojure.core/defn foo [arg__47__auto__]
;;     (clojure.core/let [{:keys [a b c]} arg__47__auto__] ...))

That doesn’t quite work: we’ve hidden the :keys destructuring, but we’ve also lost access to the original input map, which the body needs in order to return it.

return

What if we make a second macro specifically for updating the input map?

(def ^:private pipe-arg (gensym "pipeline-argument"))

(defmacro defpipe [name argv & body]
  `(defn ~name [~pipe-arg]
     (let [{:keys ~argv} ~pipe-arg]
       ~@body)))

(defn- return-clause [spec]
  (let [[command sym & body] spec]
    (case command
      :update `(update-in [~(keyword (name sym))] ~@body)
      :set    `(assoc ~(keyword (name sym)) ~@body)
      :remove `(dissoc ~(keyword (name sym)) ~@body)
      spec)))

(defmacro return [& specs]
  `(-> ~pipe-arg
       ~@(map return-clause specs)))

This requires some more explanation. The return macro works in tandem with defpipe, and provides a mini-language for threading the input map through a series of transformations. So it can be used like this:

(defpipe foo [a b]
  (return (:update a + 10)
          (:remove b)
          (:set c a)))

;; which expands to:
(defn foo [input]
  (let [{:keys [a b]} input]
    (-> input
        (update-in [:a] + 10)
        (dissoc :b)
        (assoc :c a))))

As a fallback, we can put any old expression inside the return, and it will be just as if we had used it in the -> macro. The rest of the code inside defpipe, before return, is a normal function body. The return can appear anywhere inside defpipe, as long as it is in tail position.

The symbol used for the input argument has to be the same in both defpipe and return, so we define it once and use it again. This is safe because that symbol is not exposed anywhere else, and the gensym ensures that it is unique.

defpipeline

Now that we have the defpipe macro, it’s trivial to add another macro for defining the composition of functions created with defpipe:

(defmacro defpipeline [name & body]
  `(defn ~name [arg#]
     (-> arg# ~@body)))

This macro does so little that I debated whether or not to include it. The only thing it eliminates is the argument name. But I like the way it expresses intent: a pipeline is purely the composition of defpipe functions.

Further Possibilities

One flaw in the “pipeline” style is that it cannot express conditional logic in the middle of a pipeline. Some might say this is a feature: the whole point of the pipeline is that it defines a single thread of execution. But I’m toying with the idea of adding syntax for predicate dispatch within a pipeline, something like this:

(defpipeline name
  pipe1
  ;; Map signifies a conditional branch:
  {predicate-a pipe-a
   predicate-b pipe-b
   :else       pipe-c}
  ;; Regular pipeline execution follows:
  pipe2
  pipe3)
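
One possible implementation sketch: expand a map into a call to a helper (branch here is hypothetical) that tries each predicate in turn. Note that map entry order is not guaranteed, so the predicates would need to be disjoint:

(defn- branch [data dispatch-map]
  ;; Apply the first pipe whose predicate matches, else the :else pipe.
  (let [pipe (or (some (fn [[pred pipe]]
                         (when (pred data) pipe))
                       (dissoc dispatch-map :else))
                 (get dispatch-map :else identity))]
    (pipe data)))

;; e.g. (branch data {predicate-a pipe-a, predicate-b pipe-b, :else pipe-c})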

The Whole Shebang

The complete implementation follows. I’ve added doc strings, metadata, and some helper functions to parse the arguments to defpipe and defpipeline in the same style as defn.

(def ^:private pipe-arg (gensym "pipeline-argument"))

(defn- req
  "Required argument"
  [pred spec message]
  (assert (pred (first spec))
          (str message " : " (pr-str (first spec))))
  [(first spec) (rest spec)])

(defn- opt
  "Optional argument"
  [pred spec]
  (if (pred (first spec))
    [(list (first spec)) (rest spec)]
    [nil spec]))

(defmacro defpipeline [name & spec]
  (let [[docstring spec] (opt string? spec)
        [attr-map spec] (opt map? spec)]
    `(defn ~name 
       ~@docstring
       ~@attr-map
       [arg#]
       (-> arg# ~@spec))))

(defmacro defpipe
  "Defines a function which takes one argument, a map. The params are
  symbols, which will be bound to values from the map as by :keys
  destructuring. In any tail position of the body, use the 'return'
  macro to update and return the input map."
  {:arglists '([name doc-string? attr-map? [params*] & body])}
  [name & spec]
  (let [[docstring spec] (opt string? spec)
        [attr-map spec] (opt map? spec)
        [argv spec] (req vector? spec "Should be a vector")]
    (assert (every? symbol? argv)
            (str "Should be a vector of symbols : "
                 (pr-str argv)))
    `(defn ~name
       ~@docstring
       ~@attr-map
       [~pipe-arg]
       (let [{:keys ~argv} ~pipe-arg]
         ~@spec))))

(defn- return-clause [spec]
  (let [[command sym & body] spec]
    (case command
      :update `(update-in [~(keyword (name sym))] ~@body)
      :set    `(assoc ~(keyword (name sym)) ~@body)
      :remove `(dissoc ~(keyword (name sym)) ~@body)
      spec)))

(defmacro return
  "Within the body of the defpipe macro, returns the input argument of
  the defpipe function. Must be in tail position. The input argument,
  a map, is threaded through exprs as by the -> macro.

  Expressions within the 'return' macro may take one of the following
  forms:

      (:set key value)      ; like (assoc :key value)
      (:remove key)         ; like (dissoc :key)
      (:update key f args*) ; like (update-in [:key] f args*)

  Optionally, any other expression may be used: the input map will be
  inserted as its first argument."
  [& exprs]
  `(-> ~pipe-arg
       ~@(map return-clause exprs)))

And a Made-Up Example

(defpipe setup []
  (return  ; imagine these come from a database
   (:set alpha 4)
   (:set beta 3)))

(defpipe compute-step1 [alpha beta]
  (return (:set delta (+ alpha beta))))

(defpipe compute-step2 [delta]
  (return
   (assoc-in [:x :y] 42)  ; ordinary function expression
   (:update delta * 2)
   (:set gamma (+ delta 100))))  ; uses old value of delta

(defpipe respond [alpha beta gamma delta]
  (println " Alpha is" alpha "\n"
           "Beta is" beta "\n"
           "Delta is" delta "\n"
           "Gamma is" gamma)
  (return)) ; not strictly necessary, but a good idea

(defpipeline compute
  compute-step1
  compute-step2)

(defpipeline process-request
  setup
  compute
  respond)
(process-request {})

;; Alpha is 4 
;; Beta is 3 
;; Delta is 14 
;; Gamma is 107

;;=> {:gamma 107, :delta 14, :beta 3, :alpha 4}