This is probably my number one Clojure Don’t.
Laziness is often useful. It allows you to express “infinite” computations, and only pay for as much of the computation as you need.
Laziness also allows you to express computations without specifying when they should happen. And that’s a problem when you add side-effects.
By definition, a side-effect is something that changes the world outside your program. You almost certainly want it to happen at a specific time. Laziness takes away your control of when things happen.
So the rule is simple: Never mix side effects with lazy operations.
For example, if you need to do something to every element in a collection, you might reach for map. If the thing you’re doing is a pure function, that’s fine. But if the thing you’re doing has side effects, map can lead to very unexpected results.
For example, this is a common new-to-Clojure mistake:
(take 5 (map prn (range 10)))
which prints
0 1 2 3 4 5 6 7 8 9
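This is the old “chunked sequence” conundrum. Like many other lazy sequence functions, map has an optimization which allows it to evaluate batches of 32 elements at a time. Because range produces a chunked sequence, map calls prn on the entire first chunk (here, all 10 elements) even though take asked for only 5.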
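To see the chunking directly, here is a minimal sketch that counts calls instead of printing. The names chunk-calls and counting-identity are made up for illustration, and the count assumes the usual 32-element chunks on a reasonably recent Clojure release.

```clojure
;; Illustrative only: `chunk-calls` and `counting-identity` are hypothetical names.
(def chunk-calls (atom 0))

(defn counting-identity
  "Returns x unchanged, but records that it was called (a side effect)."
  [x]
  (swap! chunk-calls inc)
  x)

;; Ask for 5 elements of a lazy, chunked sequence and realize just those 5.
(def first-five
  (doall (take 5 (map counting-identity (range 100)))))

;; With 32-element chunks, the function runs 32 times, not 5.
(println "requested:" (count first-five) "function calls:" @chunk-calls)
```

Only five elements come back, but the side effect has already happened for the whole first chunk.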
Then there’s the issue of lazy sequences not being evaluated at all. For example:
(do (map prn [0 1 2 3 4 5 6 7 8 9 10]) (println "Hello, world!"))
which prints only:
Hello, world!
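The same effect can be observed without relying on printed output. In this sketch (pending-calls and record-call! are hypothetical names), the function is not called at all until something consumes the sequence:

```clojure
;; Illustrative only: `pending-calls` and `record-call!` are hypothetical names.
(def pending-calls (atom 0))

(defn record-call!
  "Records that it was called, then returns x."
  [x]
  (swap! pending-calls inc)
  x)

;; Building the lazy sequence does not run the function.
(def pending (map record-call! [1 2 3]))
(def calls-before @pending-calls)

;; Counting the sequence consumes it, which finally triggers the side effects.
(def n (count pending))
(def calls-after @pending-calls)

(println "before:" calls-before "after:" calls-after)
```

The side effects happen whenever (and only if) some later code walks the sequence, which is rarely the moment you intended.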
You might get the advice that you can “force” a lazy sequence to be evaluated with doall or dorun. There are also snippets floating around that purport to “unchunk” a sequence.
In my opinion, the presence of doall, dorun, or even “unchunk” is almost always a sign that something never should have been a lazy sequence in the first place.
Only use pure functions with the lazy sequence operations like map, filter, take-while, etc. When you need side effects, use one of these alternatives:
doseq: good default choice, clearly indicates side effects
run!: new in Clojure 1.7, can take the place of (dorun (map ...))
reduce, transduce, or something built on them
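As a rough sketch of the first two alternatives, the following records each effect in an atom so the order is visible. The names effect-log and record! are hypothetical stand-ins for a real side effect such as printing or writing to a database.

```clojure
;; Illustrative only: `effect-log` and `record!` are hypothetical names.
(def effect-log (atom []))

(defn record!
  "Stand-in for a real side effect: appends x to the log and returns nil."
  [x]
  (swap! effect-log conj x)
  nil)

;; doseq runs its body immediately, once per element, and returns nil.
(doseq [x (range 3)]
  (record! [:doseq x]))

;; run! calls the function on each element eagerly and also returns nil.
(run! #(record! [:run! %]) (range 3))

(prn @effect-log)
```

Both forms return nil, which is a useful signal to the reader that they exist only for their effects.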
The last requires some more explanation. reduce and transduce are both non-lazy ways of consuming sequences or collections. As such, they are technically safe to use with side-effecting operations.
For example, this composition of take and map:
(transduce (comp (take 5) (map prn)) conj [] (range 10))
only prints 5 elements of the sequence, as requested:
0 1 2 3 4
The single-argument version of map returns a transducer which calls its function once for each element. The map transducer can’t control when the function gets evaluated; that’s in the hands of transduce, which is eager (non-lazy). The single-argument take limits the reduction to the first five elements.
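To check that claim against a larger input, this sketch counts how many times the mapped function runs. The names xf-calls and noisy-inc are made up for the example.

```clojure
;; Illustrative only: `xf-calls` and `noisy-inc` are hypothetical names.
(def xf-calls (atom 0))

(defn noisy-inc
  "Increments x and records the call as a side effect."
  [x]
  (swap! xf-calls inc)
  (inc x))

;; The take transducer ends the reduction after five elements,
;; so the rest of (range 100) is never handed to noisy-inc.
(def result
  (transduce (comp (take 5) (map noisy-inc)) conj [] (range 100)))

(println "result:" result "calls:" @xf-calls)
```

Unlike the lazy version, chunking in the source sequence has no influence here: the function runs exactly as many times as there are elements in the result.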
As a general rule, I would not recommend using side-effecting operations in transducers. But if you know that the transducer will be used only in non-lazy operations, such as transduce, run!, or into, then it may be convenient.
(defn operation [input]
  ;; do something with input, return result
  (str "Result for " input))

(prn (into #{} (comp (take 3) (map operation)) (range 100)))
reduce, transduce, and into are useful when you need to collect the return value of the side-effecting operation.
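A plain reduce works the same way. In this sketch, send-message! is a hypothetical side-effecting function (the atom named sent stands in for the outside world), and reduce collects each returned status into a map:

```clojure
;; Illustrative only: `sent` and `send-message!` are hypothetical names.
(def sent (atom []))

(defn send-message!
  "Pretends to send msg (a side effect) and returns a status map."
  [msg]
  (swap! sent conj msg)
  {:message msg :status :ok})

;; reduce is eager: every message is sent, in order, before reduce returns.
(def statuses
  (reduce (fn [acc msg]
            (assoc acc msg (:status (send-message! msg))))
          {}
          ["a" "b" "c"]))

(prn statuses)
```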
What about stuff like DiffArrays? That is, structures that have mutable state, but abstract that state away? Or debugging?
As with anything else, use your own judgment. The key question to ask is, “Does it matter *when* this effect happens?”
mapv is also eager, I believe.
I wrote this on the ClojureScript cheatsheet under the “Seq in, Seq out” tooltip:
“You can force a sequence to evaluate all its elements with the doall function. This is useful when you want to see the results of a side-effecting function over an entire sequence.”
Maybe I should change it to reference doseq instead? Thoughts?
Thanks,
Chris Oakman
My recommendation is to avoid side-effecting functions in lazy sequences, period.