Clojure Don’ts: Non-Polymorphism

Polymorphism is a powerful feature. The purpose of polymorphism is to provide a single, consistent interface to a caller. There may be multiple ways to carry out that behavior, but the caller doesn’t need to know that. When you call a polymorphic function, you remain blissfully ignorant of (and therefore decoupled from) which method will actually run.

Don’t use polymorphism where it doesn’t exist.

All too often, I see protocols or multimethods used in cases where the caller does know which method is going to be called; where it is completely, 100% unambiguous, at every call site, which method will run.

As a contrived example, say we have this protocol with two record implementations:

(defprotocol Blerg
  (blerg [this]))

(defrecord Foo []
  Blerg
  (blerg [this]
    ;; ... do Foo stuff ...
    ))

(defrecord Bar []
  Blerg
  (blerg [this]
    ;; ... do Bar stuff ...
    ))

Then, elsewhere in the code, we have some uses of that protocol:

(defn process-foo [x]
  ;; ...
  (blerg x)  ; I know x is always a Foo
  ;; ...
  )
(defn process-bar [x]
  ;; ...
  (blerg x)  ; I know x is always a Bar
  ;; ...
  )

If you know which method will be called, it’s easy to fall into the trap of depending on that specific behavior. Now you’ve broken the abstraction barrier the protocol was meant to provide.

(defn process-bar [x]
  ;; ...
  (blerg x)  ; I know x is always a Bar

  ;; ... do something that relies on
  ;;     Bar's blerg having been called ...
  )

Code like this is already tightly coupled, which isn’t necessarily a problem. The problem is that the coupling is hidden behind the implied decoupling of a protocol. That’s going to lead to bugs sooner or later.

Instead, write ordinary functions with distinct names and let the caller use the appropriate one.

(defn blerg-foo [foo]
  ;; ... do foo stuff ...
  )

(defn blerg-bar [bar]
  ;; ... do bar stuff ...
  )

(defn process-foo [x]
  ;; ...
  (blerg-foo x)
  ;; ...
  )

(defn process-bar [x]
  ;; ...
  (blerg-bar x)
  ;; ...
  )

Remember the Liskov Substitution Principle: If you cannot substitute one implementation of a protocol for another, it’s not a good abstraction.

This post is part of my Clojure Do’s and Don’ts series.

How to ns

Quick link: Stuart’s ns Style Guide

Everyone has their own personal quirks when it comes to syntax. No matter how hard you try to lock it down with code review, IDEs, scripts, or check-in hooks, individual differences will emerge.

In Clojure the situation is generally pretty stable: most people follow the same general patterns, which are implemented fairly consistently across editors and IDEs.

With one exception: the ns macro at the top of every file.

The original implementation of the ns macro in Clojure was short, simple, and effective. It was also spectacularly over-generalized. ns will take almost any combination of symbols, keywords, vectors, and lists and find something to evaluate.

There’s a spec of sorts in the docstring, but of course nobody reads that.

The laxness of the ns implementation was a constant thorn in my side as I worked on tools.namespace. Now it’s causing more headaches as macro specs introduced in Clojure 1.9.0-alpha11 uncover a bevy of bad syntax in libraries.

I’ll admit to having my own syntactic quirks when it comes to ns, but I make an effort to be consistent. After years of collecting preferences, I finally decided to write it all down.

So now you can read Stuart’s Opinionated Style Guide for Clojure Namespace Declarations and link to it during your next syntactic flamewar.

How to Name Clojure Functions

This is a guide on naming Clojure functions. There are exceptions to every rule. When you’re defining something based on natural language, there are more exceptions than rules. I break these rules more often than I follow them. This guide is just a starting point for thinking about how to name things.

Pure functions

Pure functions which return values are named with nouns describing the value they return.

If I have a function to compute a user’s age based on their birthdate, it is called age, not calculate-age or get-age.

Think of the definition: a pure function is one which can be replaced with its value without affecting the result. So why not make that evident in the name?

This is particularly good for constructors and accessors. No need to clutter up your function names with meaningless prefixes like get- and make-.

Don’t repeat the name of the namespace

Function names should not repeat the name of the namespace.

(ns products)

;; Bad, redundant:
(defn product-price [product]
  ;; ...
  )

;; Good:
(defn price [product]
  ;; ...
  )

Assume that consumers of a function will use it with a namespace alias.

Conversions and coercions

I don’t much like -> arrows in function names, and I try to avoid them.

If the function is a coercion, that is, it is meant to convert any of several input types into the desired output type, then name it for the output type. For example, in clojure.java.io the functions file, reader, and writer are all coercions.

If there are different functions for different input types, then each one is a conversion. In that case, use input-type->output-type names.

Functions with side effects

Functions which have side-effects are named with verbs describing what they do.

Constructor functions with side-effects, such as adding a record to a database, have names starting with create-. (I borrowed this idea from Stuart Halloway’s Datomic tutorials.)

Functions which perform side-effects to retrieve some information (e.g. query a web service) have names starting with get-.

For words which could be either nouns or verbs, assume noun by default then add words to make verb phrases. E.g. message constructs a new object representing a message, send-message transmits it.

I don’t use the exclamation-mark convention (e.g. swap!) much. Different people use it to mean different things (side effect, state change, transaction-unsafe) so the meaning is vague at best. If I do use an exclamation mark, it’s to signal a change to a mutable reference, not other side-effects such as I/O.

Local name clashes

One problem I find is in let blocks, when the obvious name for a local is the same as the function which computes it. If you’re not careful, this can lead to clashes:

(defn shipping-label
  "Returns a new label to ship product to customer."
  [customer product]
  (let [address (address customer)
        weight (weight product)
        supplier (supplier product)]
    {:from (address supplier)  ; oops, 'address' clashes!
     :to address
     :weight weight}))

This is less of a problem when the functions are defined in a different namespace and referenced via an alias:

(defn shipping-label
  "Returns a new label to ship product to customer."
  [customer product]
  (let [address (mailing/address customer)
        weight (product/weight product)
        supplier (product/supplier product)]
    {:from (mailing/address supplier)  ; OK
     :to address
     :weight weight}))

If name-clashes become a problem, add prefixes to the function names, new- for constructors and get- for accessors. If you are bothered that this contradicts the previous section, re-read the first paragraph of this article.

Function returning functions

In general, I try to avoid defining top-level functions which return functions if I can write make the intent clearer using anonymous functions instead.

For example, writing something like this makes me feel clever:

(defn foo
  "Returns a function to compute foo of value."
  [option]
  (fn [value]
    ;;... do stuff with value ...
    ))

(defn computation
  "Does stuff with values."
  [option values]
  (->> values
       (map (foo option))  ; look at me!
       ;; ...
       ))

But it’s easier for someone else to read when the closure is created close to where it’s used:

(defn foo
  "Returns the foo of value"
  [value option]
  ;; ... 
  )

(defn computation [option values]
  (->> values
       (map #(foo % option))  ; I see what this does
       ;; ...
       ))

I allow an exception to this rule when returning functions is part of a repeated pattern. For example, the transducer versions of map, filter, and other sequence functions all return functions, but that’s a standard part of the language since Clojure 1.7 so users can be expected to know about it. Occasionally I discover a similar pattern in my own code.

When functions returning functions are not part of a repeated pattern but for some reason I want them anyway, I call them out with a suffix -fn, like:

(defn foo-fn
  "Returns a function to compute the foo of a value."
  [param]
  (fn [value]
    ;; ...
    ))

Clojure Don’ts: Lazy Effects

This is probably my number one Clojure Don’t.

Laziness is often useful. It allows you to express “infinite” computations, and only pay for as much of the computation as you need.

Laziness also allows you to express computations without specifying when they should happen. And that’s a problem when you add side-effects.

By definition, a side-effect is something that changes the world outside your program. You almost certainly want it to happen at a specific time. Laziness takes away your control of when things happen.

So the rule is simple: Never mix side effects with lazy operations.

For example, if you need to do something to every element in a collection, you might reach for map. If thing you’re doing is a pure function, that’s fine. But if the thing you’re doing has side effects, map can lead to very unexpected results.

For example, this is a common new-to-Clojure mistake:

(take 5 (map prn (range 10)))

which prints

0
1
2
3
4
5
6
7
8
9

This is the old “chunked sequence” conundrum. Like many other lazy sequence functions, map has an optimization which allows it to evaluate batches of 32 elements at a time.

Then there’s the issue of lazy sequences not being evaluated at all. For example:

(do (map prn [0 1 2 3 4 5 6 7 8 9 10])
    (println "Hello, world!"))

which prints only:

Hello, world!

You might get the advice that you can “force” a lazy sequence to be evaluated with doall or dorun. There are also snippets floating around that purport to “unchunk” a sequence.

In my opinion, the presence of doall, dorun, or even “unchunk” is almost always a sign that something never should have been a lazy sequence in the first place.

Only use pure functions with the lazy sequence operations like map, filter, take-while, etc. When you need side effects, use one of these alternatives:

  • doseq: good default choice, clearly indicates side effects
  • run!: new in Clojure 1.7, can take the place of (dorun (map ...))
  • reduce, transduce, or something built on them

The last requires some more explanation. reduce and transduce are both non-lazy ways of consuming sequences or collections. As such, they are technically safe to use with side-effecting operations.

For example, this composition of take and map:

(transduce (comp (take 5)
                 (map prn))
           conj
           []
           (range 10))

only prints 5 elements of the sequence, as requested:

0
1
2
3
4

The single-argument version of map returns a transducer which calls its function once for each element. The map transducer can’t control when the function gets evaluated — that’s in the hands of transduce, which is eager (non-lazy). The single-argument take limits the reduction to the first five elements.

As a general rule, I would not recommend using side-effecting operations in transducers. But if you know that the transducer will be used only in non-lazy operations — such as transduce, run!, or into — then it may be convenient.

(defn operation [input]
  ;; do something with input, return result
  (str "Result for " input))

(prn (into #{}
           (comp (take 3)
                 (map operation))
           (range 100)))

reduce, transduce, and into are useful when you need to collect the return value of the side-effecting operation.

Clojure Don’ts: Redundant map

Today’s Clojure Don’t is the opposite side of the coin to the heisenparameter.

If you have an operation on a single object, you don’t need to define another version just to operate on a collection of those objects.

That is, if you have a function like this:

(defn process-thing [thing]
  ;; process one thing
  )

There is no reason to also write this:

(defn process-many-things [things]
  (map process-thing things))

The idiom “map a function over a collection” is so universal that any Clojure programmer should be able to write it without thinking twice.

Having a separate definition for processing a group of things implies that there is something special about processing a group instead of a single item. (For example, a more efficient batch implementation.) If that’s the case, then by all means write the batch version as well. But if not, then a function like process-many-things just clutters up your code while providing no benefit.

Clojure Don’ts: Single-branch if

A short Clojure don’t for today. This one is my style preference.

You have a single expression which should run if a condition is true, otherwise return nil.

Most Clojure programmers would probably write this:

(when (condition? ...)
  (then-expression ...))

But you could also write this:

(if (condition? ...)
  (then-expression ...)
  nil)

Or even this, because the “else” branch of if defaults to nil:

(if (condition? ...)
  (then-expression ...))

There’s an argument to be made for any one of these.

The second variant, if ... nil, makes it very explicit that you want to return nil. The nil might be semantically meaningful in this context instead of just a “default” value.

Some people like the third variant, if with no “else” branch, because they think when is only for side-effects, leaving the single-branch if for “pure” code.

But for me it comes down, as usual, to readability.

The vast majority of the time, if contains both “then” and “else” expressions.

Sometimes a long “then” branch leaves the “else” branch dangling below it. I’m expecting this, so when I read an if my eyes automatically scan down to find the “else” branch.

If I see an if but don’t find an “else” branch, I get momentarily confused. Maybe a line is missing or the code is mis-indented.

Likewise, if I see an if explicitly returing nil, it looks like a mistake because I know it could be written as when. This is a universal pattern in Clojure: lots of expressions (cond, get, some) return nil as their default case, so it’s jarring to see a literal nil as a return value.

So my preferred style is the first version. In general terms:

An if should always have both “then” and “else” branches.
Use when for a condition which should return nil in the negative case.

Clojure Don’ts: The Heisenparameter

A pattern I particularly dislike: Function parameters which may or may not be collections.

Say you have a function that does some operation on a batch of inputs:

(defn process-batch [items]
  ;; ... do some work with items ...
  )

Say further that, for this process, the fundamental unit of work is always a batch. Processing one thing is just a batch size of one.

Lots of processes are like this: I/O (arrays of bytes), database APIs (transactions of rows), and so on.

But maybe you have lots of code that mostly deals with one thing at a time, and only occasionally makes a larger batch. In the name of “convenience,” people write things like this:

(defn wrap-coll
  "Wraps argument in a vector if it is not already a collection."
  [arg]
  (if (coll? arg)
    arg
    [arg]))

(defn process
  "Processes a single input or a collection of inputs."
  [input]
  (process-batch (wrap-coll input)))

This is prevalent in dynamically-typed languages of all stripes. I think it’s a case of mistakenly choosing convenience over clarity.

This leads easily to mistakes like iterating over a collection, calling process on each element, when the same work could be done more efficiently in a batch.

Now imagine reading some code when you encounter a call to this function:

(process stuff)

Is stuff a collection or a single object? Who knows?

When you read code, there’s a kind of ad-hoc, mental type-inference going on. This is true regardless of what typing scheme your language uses. Narrowing the range of possible types something can be makes it easier to reason about what type it actually is.

The more general principle:
Be explicit about your types even when they’re dynamic.

If the operation requires a collection, then pass it a collection every time.

A “helper” like wrap-coll saves you a whopping two characters over just wrapping the argument in a literal vector, at the cost of lost clarity and specificity.


If you often forget to wrap the argument correctly, consider adding a type check:

(defn process-batch [items]
  {:pre [(coll? items)]}
  ;; ... 
  )

If there actually are two distinct operations, one for a single object and one for a batch, then they should be separate functions:

(defn process-one [item]
  ;; ...
  )

(defn process-batch [items]
  ;; ...
  )

Clojure Don’ts: Optional Arguments with Varargs

Another Clojure don’t today. This one is a personal style preference, but I’ll try to back it up.

Say you want to define a function with a mix of required and optional arguments. I’ve often seen this:

(defn foo [a & [b]]
  (println "Required argument a is" a)
  (println "Optional argument b is" b))

This is a clever trick. It works because & [b] destructures the sequence of arguments passed to the function after a. Sequential destructuring doesn’t require that the number of symbols match the number of elements in the sequence being bound. If there are more symbols than values, they are bound to nil.

(foo 3 4)
;; Required argument a is 3
;; Optional argument b is 4
;;=> nil

(foo 9)
;; Required argument a is 9
;; Optional argument b is nil
;;=> nil

I don’t like this pattern for two reasons.

One. Because it’s variable arity, the function foo accepts any number of arguments. You won’t get an error if you call it with extra arguments, they will just be silently ignored.

(foo 5 6 7 8)
;; Required argument a is 5
;; Optional argument b is 6
;;=> nil

Two. It muddles the intent. The presence of & in the parameter vector suggests that this function is meant to be variable-arity. Reading this code, I might start to wonder why. Or I might miss the & and think this function is meant to be called with a sequence as its second argument.

A couple more lines make it clearer:

(defn foo
  ([a]
   (foo a nil))
  ([a b]
   (println "Required argument a is" a)
   (println "Optional argument b is" b)))

The intent here is unambiguous: The function takes either one or two arguments, with b defaulting to nil. Trying to call it with more than two arguments will throw an exception, telling you that you did something wrong.

And one more thing: it’s faster. Variable-arity function calls have to allocate a sequence to hold the arguments, then go through apply. Timothy Baldridge did a quick performance comparison showing that calls to a function with multiple, fixed arities can be much faster than variable-arity (varargs) function calls.

Clojure Do’s: Uncaught Exceptions

Some more do’s and don’ts for you. This time it’s a ‘do.’

In the JVM, when an exception is thrown on a thread other than the main thread, and nothing is there to catch it, nothing happens. The thread dies silently.

This is bad news if you needed that thread to do some work. If all the worker threads die, the application could appear to be “up” but cease to do any useful work. And you’ll never know why.

In Clojure, this could happen on any thread you created with core.async/thread, a worker thread used by core.async/go, or a thread that was created for you by a Java framework such as a Servlet container.

One solution is to just wrap the body of every thread or go in a try/catch block. There are good reasons for doing this: you can get fine-grained control over how exceptions are handled. But it’s easy to forget, and it’s tedious to repeat if you can’t do anything useful with the exception besides log it.

So at a minimum, I recommend always including this snippet of code somewhere in the start-up procedure of your application:

;; Assuming require [clojure.tools.logging :as log]
(Thread/setDefaultUncaughtExceptionHandler
 (reify Thread$UncaughtExceptionHandler
   (uncaughtException [_ thread ex]
     (log/error ex "Uncaught exception on" (.getName thread)))))

This bit of code has saved my bacon more times than I can count.

This is a global, JVM-wide setting. There can be only one default uncaught exception handler. Individual Threads and ThreadGroups can have their own handlers, which get called in preference to the default handler. See Thread.setDefaultUncaughtExceptionHandler.

I’ve tried more aggressive measures, such as terminating the whole JVM process on any uncaught exception. While I think this is technically the correct thing to do, it turns out to be annoying in development.

Also annoying is the fact that some Java frameworks are designed to let threads fail silently. They just allocate a new thread in a pool and keep going. If your application is logging lots of uncaught exceptions but appears to be working normally, look to your container framework to see if that’s expected behavior.

The Hidden Future

Another wrinkle: exceptions inside a future are always caught by the Future. The exception will not be thrown until something calls Future.get (deref in Clojure).

Be aware that ExecutorService.submit returns a Future, so if you’re using an ExecutorService you need to make sure something is eventually going to consume that Future to surface any exceptions it might have caught.

The parent interface Executor.execute does not return a Future, so exeptions will reach the default exception handler.

Using ExecutorService.submit instead of Executor.execute was a bug in very early versions of core.async.

Record Constructors

Some more Clojure Do’s and Don’ts for you. This week: record constructors.

Don’t use interop syntax to construct records

defrecord and deftype compile into Java classes, so it is possible to construct them using Java interop syntax like this:

(defrecord Foo [a b])

(Foo. 1 2)
;;=> #user.Foo{:a 1, :b 2}

But don’t do that. Interop syntax is for interop with Java libraries.

Since Clojure version 1.3, defrecord and deftype automatically create constructor functions. Use those instead of interop syntax.

For records, you get two constructor functions: one taking the values of fields in the same order they appear in the defrecord:

(defrecord Foo [a b])

(->Foo 1 2)
;;=> #user.Foo{:a 1 :b 2}

And another taking a map whose keys are keywords with the same names as the fields:

(map->Foo {:b 4 :a 3})
;;=> #user.Foo{:a 3, :b 4}

deftype only creates the first kind of constructor, taking the field values in order.

(deftype Bar [c d])

(->Bar 5 6)
;;=> #<Bar user.Bar@2168aeae>

Constructor functions are ordinary Clojure Vars. You can pass them to higher-order functions and :require :as or :refer them into other namespaces just like any other function.

Do add your own constructor functions

You cannot modify or customize the constructor functions that defrecord and deftype create.

It’s common to want additional functionality around constructing an object, such as validation and default values. To get this, just define your own constructor function that wraps the default constructor.

(defrecord Customer [id name phone email])

(defn customer
  "Creates a new customer record."
  [{:keys [name phone email]}]
  {:pre [(string? name)
         (valid-phone? phone)
         (valid-email? email)]}
  (->Customer (next-id) name phone email))

You don’t necessarily have to use :pre conditions for validation; that’s just how I wrote this example.

It’s up to you to maintain a convention to always use your custom constructor function instead of the automatically-generated one.1

I frequently define a custom constructor function for every new record type, even if I don’t need it right away. That gives me a place to add validation later, without searching for and replacing every instance of the default constructor.

Even custom constructor functions should follow the rules for safe constructors. In general, that means no side effects and no “publishing” the object to another place before the constructor is finished. Keep the “creation” of an object (the constructor) separate from “starting” or “using” it, whatever that means for your code.

Footnotes:

1

Theoretically you could make the default constructors private with alter-meta!, but I’ve never found it necessary.