Apathy of the Commons

Eight years ago, I filed a bug on an open-source project.

HADOOP-3733 appeared to be a minor problem with special characters in URLs. I hadn’t bothered to examine the source code, but I assumed it would be an easy fix. Who knows, maybe it would even give some eager young programmer the opportunity to make their first contribution to open-source.

I moved on; I wasn’t using Hadoop day-to-day anymore. About once a year, though, I got a reminder email from JIRA when someone else stumbled across the bug and chimed in. Three patches were submitted, with a brief discussion around each, but the bug remained unresolved. A clumsy workaround was suggested.

Linus’s Law decrees that “given enough eyeballs, all bugs are shallow.” But there’s a corollary: Given enough hands, all bugs are trivial. Which is not the same as easy.

The bug I reported clearly affected other people: It accumulated nine votes, making it the fourth-most-voted-on Hadoop ticket. And it seems like something easy to fix: just a simple character-escaping problem, a missed edge case. A beginning Java programmer should be able to fix it, right?

Perhaps that’s why no one wanted to fix it. HADOOP-3733 is not going to give anyone the opportunity to flex their algorithmic muscles or show off to their peers. It’s exactly the kind of tedious, persistent bug that programmers hate. It’s boring. And hey, there’s an easy workaround. Somebody else will fix it, right?

Eventually it was fixed. The final patch touched 12 files and added 724 lines: clearly non-trivial work requiring knowledge of Hadoop internals, a “deep” bug rather than a shallow one.

One day later, someone reported a second bug for the same issue with a different special character.

If there’s a lesson to draw from this, it’s that programming is not just hard, it’s often slow, tedious, and boring. It’s work. When programmers express a desire to contribute to open-source software, we think of grand designs, flashy new tools, and cheering crowds at conferences.

A reward system based on ego satisfaction and reputation optimizes for interesting, novel work. Everyone wants to be the master architect of the groundbreaking new framework in the hip new language. No one wants to dig through dozens of Java files for a years-old parsing bug.

But sometimes that’s the work that needs to be done.

* * *

Edit 2016-07-19: The author of the final patch, Steve Loughran, wrote up his analysis of the problem and its solution: Gardening the Commons. He deserves a lot of credit for being willing to take the (considerable) time needed to dig into the details of such an old bug and then work out a solution that addresses the root cause.

The Reluctant Dictator

I have a confession to make. I’m bad at open-source. Not writing the code. I’m pretty good at that. I can even write pretty good documentation. I’m bad at all the rest: patches, mailing lists, chat rooms, bug reports, and anything else that might fall under the heading of “community.” I’m more than bad at it: I don’t like doing it and generally try to avoid it.

I write software to scratch an itch. I release it as open-source in the vague hope that someone else might find it useful. But once I’ve scratched the itch, I’m no longer interested. I don’t want to found a “community” or try to herd a bunch of belligerent, independent-minded cats. I’m not in it for the money. I’m not even in it for the fame and recognition. (OK, maybe a little bit for the fame.)

But this age of “social” insists that everything be a community. Deodorant brands beg us to “like” their Facebook pages and advertising campaigns come accessorized with Twitter hashtags. In software, you can’t just release a bit of code as open-source. You have to create a Google Group and a blog and an IRC channel and a novelty Twitter account too.

The infrastructure of “social coding” has codified this trend into an expectation that every piece of open-source software participate in a world-wide collaboration / popularity contest. The only feature of GitHub that can’t be turned off is the pull request.

Don’t get me wrong, I love GitHub and use it every day. On work projects, I find pull requests to be an efficient tool for doing code reviews. GitHub’s collaboration tools are great when you’re only trying to collaborate with a handful of people, all of whom are working towards a common, mutually-understood goal.

But when it comes to open-source work, I use GitHub primarily as a hosting platform.[1] I put code on GitHub because I want people to be able to find it, and use it if it helps them. I want them to fork it, fix it, and improve it. But I don’t want to be bothered with it. If you added something new to my code, great! It’s open-source – have at it!

I’m puzzled by people who write to me saying, “If I were to write a patch for your library X to make it do Y, would you accept it?” First of all, you don’t need my or anybody else’s permission to modify my code. That’s the whole point of open-source! Secondly, how can I decide whether or not I’ll accept a patch I haven’t seen yet? Finally, if you do decide to send me a pull request, please don’t be offended if I don’t accept it, or if I ignore it for six months and then take the idea and rewrite it myself.

Why didn’t I accept your pull request? Not because I want to hog all the glory for myself. Not because I want to keep you out of my exclusive open-source masters’ club. Not even because I can find any technical fault with your implementation. I’ve just got other things to do, other itches to scratch.

If everyone thought that way, would open-source still work? Probably. Maybe not as well.

To be sure, there’s a big difference between one-off utilities written in a weekend and major projects sustained for years by well-funded organizations. Managing a world-wide collaborative open-source project is a full-time job. The benevolent-dictator-for-life needs an equally-benevolent corporate-sponsor-for-life.[2] You can’t expect the same kind of support from individuals working in their spare time, for free.

I sometimes dream of an open-source collaboration model that is truly pull-based instead of GitHub’s they-should-have-called-it-push request. I don’t want to be forced to look at anything on any particular schedule. Don’t give me “notifications” or send me email. Instead, and only when I ask for it, allow me to browse the network of forks spawned by my code. Let me see who copied it, how they used it, and how they modified it. Be explicit about who owns the modifications and under what terms I can copy them back into my own project. And not just direct forks — show me all the places where my code was copied-and-pasted too.

Imagine if you could free open-source developers from all the time spent on mailing lists, IRC, bug trackers, wikis, pull requests, comment threads, and patches and channel all that energy into solving problems. Who knows? We might even solve the hard problems, like dependency management.

Update Jan 17, 8:52am EST: I should mention that I have nothing but admiration and respect for people who are good at the organizational/community aspects of open-source software. I’m just not one of them.

Footnotes:

[1] I’m not the only one. Linus Torvalds famously pointed out flaws in the GitHub pull-request model, in particular its poor support for more rigorous submission/signoff processes.

[2] Even with a cushy corporate sponsor, accepting patches is far more work than the authors of those patches typically realize. See The story with #guava and your patches.

The Problem With Common Lisp

… as explained by Sir Kenny,

From: Ken Tilton
Newsgroups: comp.lang.lisp
Date: Tue, 01 Apr 2008 14:53:07 -0400
Subject: Re: Newbie FAQ #2: Where’s the GUI?

Jonathan Gardner wrote:
> I know this is a FAQ, but I still don’t have any answers, at least answers that I like.

That’s because you missed FAQ #1 (“Where are the damn libraries?”) and the answer (“The Open Source Fairy has left the building. Do them your own damn self.”)

… message truncated …