Clojure Don’ts: Concat – Digital Digressions by Stuart Sierra

Welcome to what I hope will be an ongoing series of Clojure do’s and don’ts. I want to demonstrate not just good patterns to use, but also anti-patterns to avoid.

Some of these will be personal preferences, others will be warnings from hard-won experience. I’ll try to indicate which is which.

First up: concat.

Concat, the lazily-ticking time bomb

concat is a tricky little function. The name suggests a way to combine two collections. And it is, if you have only two collections. But it’s not as general as you might think. It’s not really a collection function at all. It’s a lazy sequence function. The difference can be important.

Here’s an example that I see a lot in the wild. Say you have a loop that builds up some result collection as the concatenation of several intermediate results:¹

(defn next-results
  "Placeholder for function which computes some intermediate
  collection of results."
  [n]
  (range 1 n))

(defn build-result [n]
  (loop [counter 1
         results []]
    (if (< counter n)
      (recur (inc counter)
             (concat results (next-results counter)))
      results)))

The devilish thing about this function is that it works just fine when n is small.

(take 21 (build-result 100))
;;=> (1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6)

But when n gets sufficiently large,² suddenly this happens:

(first (build-result 4000))
;; StackOverflowError   clojure.core/seq (core.clj:133)

In the stack trace, we see concat and seq repeated over and over:

(.printStackTrace *e *out*)
;; java.lang.StackOverflowError
;;      at clojure.core$seq.invoke(core.clj:133)
;;      at clojure.core$concat$fn__3955.invoke(core.clj:685)
;;      at clojure.lang.LazySeq.sval(LazySeq.java:40)
;;      at clojure.lang.LazySeq.seq(LazySeq.java:49)
;;      at clojure.lang.RT.seq(RT.java:484)
;;      at clojure.core$seq.invoke(core.clj:133)
;;      at clojure.core$concat$fn__3955.invoke(core.clj:685)
;;      at clojure.lang.LazySeq.sval(LazySeq.java:40)
;;      at clojure.lang.LazySeq.seq(LazySeq.java:49)
;;      at clojure.lang.RT.seq(RT.java:484)
;;      at clojure.core$seq.invoke(core.clj:133)
;;      at clojure.core$concat$fn__3955.invoke(core.clj:685)
;;      at clojure.lang.LazySeq.sval(LazySeq.java:40)
;;      at clojure.lang.LazySeq.seq(LazySeq.java:49)
;;      ... hundreds more ...

So we have a stack overflow. But why? We used recur. Our code has no stack-consuming recursion. Or does it? (cue ominous music)

Call the bomb squad

Let’s look at the definition of concat more closely. Leaving out the extra arities and chunked sequence optimizations, it looks like this:

(defn concat [x y]
  (lazy-seq
    (if-let [s (seq x)]
      (cons (first s) (concat (rest s) y))
      y)))

lazy-seq is a macro that wraps its body in function and then wraps the function in a LazySeq object.

The loop in build-result calls concat on the LazySeq returned by the previous concat, creating a chain of LazySeqs like this:

Calling seq forces the LazySeq to invoke its function to realize its value. Most Clojure sequence functions, such as first, call seq for you automatically. Printing a LazySeq also forces it to be realized.

In the case of our concat chain, each LazySeq’s fn returns another LazySeq. seq has to recurse through them until it finds an actual value. If this recursion goes too deep, it overflows the stack.

Just constructing the sequence doesn’t trigger the error:

(let [r (build-result 4000)]
  nil)
;;=> nil

It only overflows when we try to realize it:

(let [r (build-result 4000)]
  (seq r)
  nil)
;; StackOverflowError   clojure.lang.RT.seq (RT.java:484)

This is a nasty bug in production code, because it could occur far away from its source, and the accumulated stack frames of seq prevent us from seeing where the error originated.

Don’t concat

The fix is to avoid concat in the first place. Our loop is building up a result collection immediately, not lazily, so we can use a vector and call into to accumulate the results:

(defn build-result-2 [n]
  (loop [counter 1
         results []]
    (if (< counter n)
      (recur (inc counter)
             (into results (next-results counter)))
      results)))

This works, at the cost of realizing the entire collection up front:

(time (doall (take 21 (build-result-2 4000))))
;; "Elapsed time: 830.66655 msecs"
;;=> (1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6)

This specific example could also be written as a proper lazy sequence like this:

(defn build-result-3 [n]
  (mapcat #(range 1 %) (range 1 n)))

Which avoids building the whole sequence in advance:

(time (doall (take 21 (build-result-3 4000))))
;; "Elapsed time: 0.075421 msecs"
;;=> (1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6)

Don’t mix lazy and strict

There’s a more general principle here:
Don’t use lazy sequence operations in a non-lazy loop.

If you’re using lazy sequences, make sure everything is truly lazy (or small). If you’re in a non-lazy loop, don’t build up a lazy result.

There are many variations of this bug, such as:

(first (reduce concat (map next-results (range 1 4000))))
;; StackOverflowError   clojure.core/seq (core.clj:133)

(nth (iterate #(concat % [1 2 3]) [1 2 3]) 4000)
;; StackOverflowError   clojure.core/seq (core.clj:133)

(first (:a (apply merge-with concat
                  (map (fn [n] {:a (range 1 n)})
                       (range 1 4000)))))
;; StackOverflowError   clojure.core/seq (core.clj:133)

It’s not just concat either — any lazy sequence function could potentially cause this. concat is just the most common culprit.

Update October 3, 2015: My friend Jon Distad has come up with a way to avoid this bug with a different implementation of concat. See Concat implementation without stack overflow on the Clojure mailing list.

Footnotes:

All these examples use Clojure version 1.6.0

Depending on your JVM settings, it may take more or fewer iterations to trigger a StackOverflowError.

4 Replies to “Clojure Don’ts: Concat”

Jakub HolÃ½ says:

April 27, 2015 at 10:55 am

Thanks!
d-t-w says:

May 6, 2015 at 9:34 am

Thanks Stuart, fine timing – I came across an issue in clojure.core/cache just recently which is related to this particular issue: http://dev.clojure.org/jira/browse/CCACHE-40
Alan Moore says:

May 9, 2015 at 3:30 pm

Thanks for this, you saved me time and frustration. I had read this and days later encountered a lazy/eager scenario so I knew what the root cause was.
Alan Moore says:

May 9, 2015 at 3:32 pm

Maybe every lazy fn in core should have a “don’t use this in an eager context” type warning, if they don’t already…

Comments are closed.