There comes a point in a programming career — at least one as peripatetic as mine — at which learning a new programming language barely registers as an obstacle. I’m not talking about mind-meltingly different languages like APL, just your run-of-the-mainstream object/imperative mishmash. Grab a syntax cheat-sheet, skim the standard library docs, and off you go.
Recently, my personal consulting business led me into a nest of Python, a language I had somehow managed to avoid for the past 15 years. I found, somewhat to my surprise, that I rather liked it. I wouldn’t call it great, but it’s easy to pick up and gets the job done. Packaging and namespacing are a mess, but no worse than in any other language of its era.
I found that Python has just enough functional paradigms to keep me from tossing it out the window. Nothing compared to Clojure, of course, but enough.
Then I hit a stumbling block. I was writing an API which needed to receive a collection of things. I needed to use the first item of the collection for some initial setup, then iterate over the whole collection. It seemed straightforward enough:
    def do_work(things):
        first_thing = things[0]
        print("Initial setup with %r" % first_thing)
        for thing in things:
            print("Doing something useful with %r" % thing)
I tested it with a list of things and it worked fine.
    >>> do_work([1, 2, 3])
    Initial setup with 1
    Doing something useful with 1
    Doing something useful with 2
    Doing something useful with 3
Then, in another piece of code, I decided to push my growing confidence in Python and try one of these new-fangled generators I’d heard so much about:
    def generate(n):
        for i in range(n):
            yield "generated value %d" % i
I eagerly tested it and …
    >>> do_work(generate(3))
    TypeError: 'generator' object is not subscriptable
OK, you caught me, I made the cardinal mistake when learning a new language: treating it like another language I already knew. In this case, I was trying to use a Python generator the same way I would use a Clojure sequence.
Clojure sequences are immutable and automatically cache generated values as long as they are reachable. In Clojure, I would have written something like this:
    (defn do-work [things]
      (let [first-thing (first things)]
        (println "Initial setup with" first-thing)
        (doseq [thing things]
          (println "Doing something useful with" thing))))
Immutability makes things simpler: calling (first things) doesn’t change the value of things.
Python generators are more like suspended computations; once they yield a value there’s no way to get it back. So Python will not let me “subscript” a generator, as in things[0], because a generator cannot fulfill the expected behavior of a list, i.e., that the elements stay put after you look at them.
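The one-shot nature of a generator is easy to see in a small experiment. This is only a minimal sketch that reuses generate() from above; the variable names are made up for illustration.

```python
def generate(n):
    for i in range(n):
        yield "generated value %d" % i

gen = generate(3)

# Pulling a value advances the generator; that value is now gone from it.
first = next(gen)

# Iterating afterwards only sees what is left.
remaining = list(gen)

# A fully consumed generator has nothing more to give.
leftover = list(gen)

print(first)      # generated value 0
print(remaining)  # ['generated value 1', 'generated value 2']
print(leftover)   # []
```

Nothing raises an error here. The generator simply moves forward and never looks back.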
I understood the Python error message pretty quickly. Then I spent several minutes banging around the internet looking for a Python equivalent to Clojure’s immutable sequences. There probably is such a beast, but I didn’t find it.
There are plenty of workarounds, of course, the easiest being to coerce the generator into a list:
    >>> do_work(list(generate(3)))
    Initial setup with 'generated value 0'
    Doing something useful with 'generated value 0'
    Doing something useful with 'generated value 1'
    Doing something useful with 'generated value 2'
I could have stopped there and been just fine. But that list() stuck in my Clojurist’s craw. Forcing the entire generator to be realized in memory just so I could hang on to the first element felt wasteful. Who knows, maybe it would even break one day when handed a generator that produces thousands of items.
After a bit of tinkering, I came up with:
    from itertools import chain

    def do_work(things):
        # Works for any iterable: lists, generators, etc.
        iterator = iter(things)
        # Pull the first item off, then stitch it back onto the front.
        first_thing = next(iterator)
        iterator = chain([first_thing], iterator)
        print("Initial setup with %r" % first_thing)
        for thing in iterator:
            print("Doing something useful with %r" % thing)
It’s not what I would call elegant. But it works, and it’s flexible: things can be any iterable type, including a generator or a regular list.
(For the Clojurists in the room, chain() is analogous to Clojure’s concat: it creates a new iterator out of two iterable things.)
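The same next-then-chain trick can be pulled out into a small helper. What follows is a sketch, not a standard-library function: peek() and its default parameter are names I made up for illustration. It also handles a case the version above glosses over, namely that next() raises StopIteration when the iterable is empty.

```python
from itertools import chain

_MISSING = object()  # sentinel, so that None can be a legitimate default


def peek(iterable, default=_MISSING):
    """Return (first_item, iterator_over_all_items), keeping the first item."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        if default is _MISSING:
            raise ValueError("peek() called on an empty iterable")
        # Nothing to iterate over: hand back the default and an empty iterator.
        return default, iter(())
    # Put the first item back on the front, lazily.
    return first, chain([first], iterator)


first_thing, things = peek(x for x in ["a", "b", "c"])
all_things = list(things)
empty_first, empty_rest = peek([], default=None)

print(first_thing)  # a
print(all_things)   # ['a', 'b', 'c']
print(empty_first)  # None
```

Whether an empty input should raise or fall back to a default depends on the API. The sketch supports both only to show where the decision has to be made.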
After I finished the requisite rant about silly languages that don’t embrace immutable values as a pervasive default, I got to thinking: Clojure’s immutable sequences are elegant and powerful, but their design has some subtle consequences that regularly trip people up.
Almost everyone learning Clojure has stumbled over the “holding onto the head” bug, which is a direct consequence of sequence caching. Keep at it long enough and you will eventually hit the “stacked sequences” problem. These bugs may not show up during development; they only come back to bite you when they hit a sufficiently large sequence in production.
Furthermore, it takes knowledge of what’s going on behind the scenes — how Clojure sequences actually work, as opposed to the surface API — to understand the problem. Recognizing memory leaks before they cause problems is usually a matter of experience and pattern-recognition. For someone who just grabbed a syntax cheat-sheet and skimmed the standard library docs, there’s a good chance of writing that bug but not much chance of knowing how to fix it.
By contrast, Python’s generators and iterators are a little more … mechanical. You can see how things work, because most of the parts are exposed for you to see. While I’m sure there are ways to write memory leaks in Python, I’m guessing iterators are not a common source.
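Python is not entirely free of caching, for what it’s worth. itertools.tee splits one iterator into several and buffers whatever one branch has consumed and another has not. The sketch below, which assumes a hypothetical counting_source() generator that records what it produces, shows both the replay and the catch: if one branch runs far ahead, the buffer holds everything in between, which is roughly the Python cousin of holding onto the head.

```python
from itertools import tee

produced = []  # records every value the source actually computes


def counting_source(n):
    # Hypothetical generator that logs each value it produces.
    for i in range(n):
        produced.append(i)
        yield i


ahead, behind = tee(counting_source(5))

# Drain one branch completely; tee has to buffer all 5 values for the other.
ahead_values = list(ahead)
produced_after_first_pass = len(produced)

# The second branch replays from the buffer; the source does not run again.
behind_values = list(behind)

print(ahead_values)               # [0, 1, 2, 3, 4]
print(behind_values)              # [0, 1, 2, 3, 4]
print(produced_after_first_pass)  # 5
print(len(produced))              # 5
```

The difference from Clojure is that the buffering is opt-in: you have to ask for tee() by name, so the cost is at least visible at the call site.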
This is not to make a point about one approach being better than the other. I’m not going to rehash “worse is better” for the nth time, nor “the law of leaky abstractions.” I merely note that there are trade-offs either way. A more “elegant” abstraction can produce shorter, more expressive programs, at the cost of hidden complexity. Exposed mechanisms are easier to observe, but force you to spend attention on mechanical details.