Welcome to what I hope will be an ongoing series of Clojure do’s and don’ts. I want to demonstrate not just good patterns to use, but also anti-patterns to avoid.
Some of these will be personal preferences, others will be warnings from hard-won experience. I’ll try to indicate which is which.
First up: concat
.
Concat, the lazily-ticking time bomb
concat
is a tricky little function. The name suggests a way to combine two collections. And it is, if you have only two collections. But it’s not as general as you might think. It’s not really a collection function at all. It’s a lazy sequence function. The difference can be important.
Here’s an example that I see a lot in the wild. Say you have a loop that builds up some result collection as the concatenation of several intermediate results:1
(defn next-results "Placeholder for function which computes some intermediate collection of results." [n] (range 1 n)) (defn build-result [n] (loop [counter 1 results []] (if (< counter n) (recur (inc counter) (concat results (next-results counter))) results)))
The devilish thing about this function is that it works just fine when n
is small.
(take 21 (build-result 100)) ;;=> (1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6)
But when n
gets sufficiently large,2 suddenly this happens:
(first (build-result 4000)) ;; StackOverflowError clojure.core/seq (core.clj:133)
In the stack trace, we see concat
and seq
repeated over and over:
(.printStackTrace *e *out*) ;; java.lang.StackOverflowError ;; at clojure.core$seq.invoke(core.clj:133) ;; at clojure.core$concat$fn__3955.invoke(core.clj:685) ;; at clojure.lang.LazySeq.sval(LazySeq.java:40) ;; at clojure.lang.LazySeq.seq(LazySeq.java:49) ;; at clojure.lang.RT.seq(RT.java:484) ;; at clojure.core$seq.invoke(core.clj:133) ;; at clojure.core$concat$fn__3955.invoke(core.clj:685) ;; at clojure.lang.LazySeq.sval(LazySeq.java:40) ;; at clojure.lang.LazySeq.seq(LazySeq.java:49) ;; at clojure.lang.RT.seq(RT.java:484) ;; at clojure.core$seq.invoke(core.clj:133) ;; at clojure.core$concat$fn__3955.invoke(core.clj:685) ;; at clojure.lang.LazySeq.sval(LazySeq.java:40) ;; at clojure.lang.LazySeq.seq(LazySeq.java:49) ;; ... hundreds more ...
So we have a stack overflow. But why? We used recur
. Our code has no stack-consuming recursion. Or does it? (cue ominous music)
Call the bomb squad
Let’s look at the definition of concat
more closely. Leaving out the extra arities and chunked sequence optimizations, it looks like this:
(defn concat [x y] (lazy-seq (if-let [s (seq x)] (cons (first s) (concat (rest s) y)) y)))
lazy-seq
is a macro that wraps its body in function and then wraps the function in a LazySeq object.
The loop in build-result
calls concat
on the LazySeq returned by the previous concat
, creating a chain of LazySeqs like this:
Calling seq
forces the LazySeq to invoke its function to realize its value. Most Clojure sequence functions, such as first
, call seq
for you automatically. Printing a LazySeq also forces it to be realized.
In the case of our concat
chain, each LazySeq’s fn
returns another LazySeq. seq
has to recurse through them until it finds an actual value. If this recursion goes too deep, it overflows the stack.
Just constructing the sequence doesn’t trigger the error:
(let [r (build-result 4000)] nil) ;;=> nil
It only overflows when we try to realize it:
(let [r (build-result 4000)] (seq r) nil) ;; StackOverflowError clojure.lang.RT.seq (RT.java:484)
This is a nasty bug in production code, because it could occur far away from its source, and the accumulated stack frames of seq
prevent us from seeing where the error originated.
Don’t concat
The fix is to avoid concat
in the first place. Our loop is building up a result collection immediately, not lazily, so we can use a vector and call into
to accumulate the results:
(defn build-result-2 [n] (loop [counter 1 results []] (if (< counter n) (recur (inc counter) (into results (next-results counter))) results)))
This works, at the cost of realizing the entire collection up front:
(time (doall (take 21 (build-result-2 4000)))) ;; "Elapsed time: 830.66655 msecs" ;;=> (1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6)
This specific example could also be written as a proper lazy sequence like this:
(defn build-result-3 [n] (mapcat #(range 1 %) (range 1 n)))
Which avoids building the whole sequence in advance:
(time (doall (take 21 (build-result-3 4000)))) ;; "Elapsed time: 0.075421 msecs" ;;=> (1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 1 2 3 4 5 6)
Don’t mix lazy and strict
There’s a more general principle here:
Don’t use lazy sequence operations in a non-lazy loop.
If you’re using lazy sequences, make sure everything is truly lazy (or small). If you’re in a non-lazy loop, don’t build up a lazy result.
There are many variations of this bug, such as:
(first (reduce concat (map next-results (range 1 4000)))) ;; StackOverflowError clojure.core/seq (core.clj:133)
(nth (iterate #(concat % [1 2 3]) [1 2 3]) 4000) ;; StackOverflowError clojure.core/seq (core.clj:133)
(first (:a (apply merge-with concat (map (fn [n] {:a (range 1 n)}) (range 1 4000))))) ;; StackOverflowError clojure.core/seq (core.clj:133)
It’s not just concat
either — any lazy sequence function could potentially cause this. concat
is just the most common culprit.
Update October 3, 2015: My friend Jon Distad has come up with a way to avoid this bug with a different implementation of concat
. See Concat implementation without stack overflow on the Clojure mailing list.
Thanks!
Thanks Stuart, fine timing – I came across an issue in clojure.core/cache just recently which is related to this particular issue: http://dev.clojure.org/jira/browse/CCACHE-40
Thanks for this, you saved me time and frustration. I had read this and days later encountered a lazy/eager scenario so I knew what the root cause was.
Maybe every lazy fn in core should have a “don’t use this in an eager context” type warning, if they don’t already…