Open-source Bundling

Cast your mind back to the halcyon days of the late ’90s. Windows 95/98. Internet Explorer 4. Before you laugh, consider that IE4 included some pretty cutting-edge technology for the time: Dynamic HTML, TLS 1.0, single sign-on, streaming media, and “Channels” before RSS. IE4 even pioneered — unsuccessfully — the idea of “web browser as operating system” a decade before Google Apps.

But if you remember anything about IE in the ’90s, it’s probably the word “bundling.” United States v. Microsoft centered on the tight integration of IE with Windows. If you had Windows, you had to have IE. By the time the lawsuit reached a settlement, IE was entrenched as the dominant browser.

Fast forward to the present. What an enlightened age we live in. Open source has won and the browser market has fragmented. Firefox broke the IE hegemony, and Chrome killed it. The web browser really is an operating system.

But if you look around at software today, “bundling” is still with us, even in open-source software, that champion of choice and touchstone of tinkering.

To take an example (and to get the taste of IE out of your brain) let’s look at Hystrix, a Java fault-tolerance framework written at Netflix. First let me say that Hystrix is a fantastic piece of engineering. Netflix has given a great gift to the open-source community by releasing, for free, an essential part of their software infrastructure. I’ve learned a lot by studying the Hystrix documentation and source code.

But if you want to use Hystrix in your application, you have to use RxJava and Netflix’s Archaius configuration management framework. Via transitive dependencies, you also have to use Google’s Guava, the Jackson JSON processor, SLF4J, and Apache’s Commons Configuration, Commons Lang, and Commons Logging. For those of you keeping score at home, that’s two different logging APIs, two configuration APIs, and two grab-bag “utility” libraries.

There’s nothing wrong with these library choices. They may be suitable for your application or they may not. But either way, you don’t get a choice. If you want Hystrix, you have to have RxJava and all the rest. Even if you choose to ignore, say, Archaius, it’s still there, linked into your application code, with whatever bugs and security holes it might carry.

I don’t mean to pick on Netflix here either. As I said, Hystrix is a fantastic piece of engineering, and I’m very happy that Netflix released it. But it points to a mismatch between the goals of “internal-use” software and “open-source” software.

If you’re developing a tool or library for internal use within an organization, it makes sense to integrate closely with other software internal to that organization. It saves time, reduces development effort, and makes the software organization more efficient. When software is tightly integrated, each new tool or library multiplies the value of all the other software which came before it. That’s how technology companies like Netflix or Google can deliver consistently high-quality products and rapid innovation at scale.

The downside to this approach, from the open-source point of view, is that each new tool or library released by a software organization tends to be tightly coupled to the software which preceded it. More dependencies mean more opportunities for bugs, security holes, and misconfiguration. For the application developer using open-source libraries, each new dependency multiplies the cost of development and maintenance.

It’s not just corporate-sponsored open source that suffers from this problem — just look at the dependency tree of any Apache project.

The root problem is that great, hairy Minotaur which stalks the labyrinthine passages of any large code base: cross-cutting concerns. Almost any piece of code in an application will need, at some point, to deal with at least some of:

  • Logging
  • Configuration
  • Error handling & recovery
  • Process/thread management
  • Resource management
  • Startup/shutdown
  • Network communication
  • Filesystems
  • Data persistence
  • (De)serialization
  • Caching
  • Internationalization/translation
  • Build/provisioning/deployment

It’s much easier to write code if you know how each of these cross-cutting concerns will be handled. So when you’re developing something in-house, obviously you use the tools and libraries your organization has standardized on. Even if you’re writing something which you plan to make open-source, it’s easier to rely on the tools and patterns you already know.

It’s difficult to avoid coupling library code to one or more of these concerns. Take logging, for example. Java has had a built-in logging framework since 1.4. But many developers preferred Log4j or one of a handful of others. To avoid coupling libraries to a single logging framework, there is Apache Commons Logging, which tries to abstract over different logging frameworks with clever class-loading tricks. That turned out to be a brittle solution, so we got SLF4J, which puts responsibility for linking the correct logging APIs back in the hands of the application developer. But no one wants to take an entire day to slog through the SLF4J manual in the middle of building an application. Throw in the mysterious interactions of transitive dependencies in Maven-style build tools, and it’s no wonder so many Java apps start up with an error message about logging. And logging is the easy case: most programmers could probably agree on what, broadly speaking, a logging framework needs to do. But still we have half a dozen widely used, slightly different logging APIs.

Developing a library which avoids making decisions about cross-cutting concerns is possible, but it takes painstaking attention to detail, with lots of extra extension points. (See Chris Houser’s talk on Exception Handling for an example.) Unfortunately, the resulting library is often less than satisfying to potential users because it has so many “holes” that need to be filled in. Who wants to spend half a day writing “glue” code and callbacks before even trying out a new library? Busy application developers have an incentive to choose libraries that work “out of the box,” so library creators have an incentive to make arbitrary decisions about cross-cutting concerns. We justify this with the oxymoron “sensible defaults.”

The conclusion I draw from all this is that modern programming languages have succeeded at making software out of reusable parts, but have largely failed at making software out of interchangeable parts. You cannot just “swap in,” say, a different thread-management library. Hystrix itself exists to solve a problem with libraries and cross-cutting concerns in a services architecture. Quoting from the Hystrix docs:

Applications in complex distributed architectures have dozens of dependencies, each of which will inevitably fail at some point. If the host application is not isolated from these external failures, it risks being taken down with them.

These issues are exacerbated when network access is performed through a third-party client — a “black box” where implementation details are hidden and can change at any time, and network or resource configurations are different for each client library and often difficult to monitor and change.

Even worse are transitive dependencies that perform potentially expensive or fault-prone network calls without being explicitly invoked by the application.

Netflix has so many “API client” libraries, each making their own network calls with unpredictable behavior, that to make their systems robust they have to isolate each library in its own thread pool. Again, this is amazing engineering, but it was necessary precisely because too many libraries came bundled with their own networking, error handling, and resource management decisions.

A robust solution would seem to require everyone to agree on standards for every possible cross-cutting concern. That will obviously never happen. Even a so-called batteries-included language cannot keep the same batteries forever. This is a hard problem, and like all truly hard problems in software, it’s more about people than code.

I wish I had a perfect solution, but the best I can offer is some guidance. If you’re writing an open-source library, do everything in your power to avoid dependencies. Use only the features of the core language, and use those conservatively. Don’t pull in a library that deals with some cross-cutting concern just because it might be more convenient for your users. Build your API around plain functions and standard data structures.
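To make that concrete, here is a minimal sketch in Java (the language Hystrix is written in) of what a dependency-free library function might look like. Everything in it is hypothetical: the Retry class, the "maxAttempts" key, and the event strings are names I made up for illustration, not part of Hystrix or any real library. Configuration arrives as a plain Map, and instead of picking a logging API the function reports events to a callback supplied by the caller.

```java
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical example: a tiny retry helper that uses only JDK types.
final class Retry {
    private Retry() {}

    /**
     * Calls the operation until it succeeds or the attempts run out.
     * config:  plain map; the "maxAttempts" key is an assumed name (default 3).
     * onEvent: caller-supplied callback, used here in place of a logging API.
     */
    static <T> T call(Supplier<T> operation,
                      Map<String, Integer> config,
                      Consumer<String> onEvent) {
        int maxAttempts = config.getOrDefault("maxAttempts", 3);
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
                // Report the failure; the caller decides whether to log it.
                onEvent.accept("attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }
}
```

An application that uses SLF4J can pass a callback that forwards to its logger; one that doesn’t care can pass a no-op. Either way, the library itself links against nothing but the JDK.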

Some examples, specific to Clojure:

  • Don’t depend on a logging framework unless it’s SLF4J.

  • Don’t use an error-handling framework: throw an ex-info exception with enough data for a handler to decide what to do.

  • If you need to do something asynchronous, use callbacks instead of core.async. Callbacks are easily integrated with core.async if that’s what the user wants to do. Likewise, if you need some kind of inversion of control, use function callbacks or protocols.

  • Don’t depend on any state-management framework or “ambient” state. Pass everything needed by an API function in its arguments. Provide operations for resource initialization and termination as part of your API. Same for configuration: pass a Clojure map as an argument.

  • Network communication and serialization: these are, admittedly, almost impossible to avoid if you’re writing a library for some network API. But you can at least give users the option of controlling their own networking by providing APIs to prepare requests and parse responses independently of making the actual network calls.
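The last point is easier to see in code. The same shape carries over to other JVM languages, so here is a rough sketch in Java using only the JDK (java.net.http, Java 11 or later). The WeatherApi class, the /temperature endpoint, the plain-text response format, and ApiException are all assumptions invented for this example; they don’t describe any real service. The library prepares a request and parses a response, but never touches the network itself. The exception carries a plain Map of data, loosely in the spirit of ex-info.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical client library for an imaginary plain-text weather API.
final class WeatherApi {
    private WeatherApi() {}

    /** Exception that carries plain data so a handler can decide what to do. */
    static final class ApiException extends RuntimeException {
        private final Map<String, Object> data;

        ApiException(String message, Map<String, Object> data) {
            super(message);
            this.data = Map.copyOf(data);
        }

        Map<String, Object> data() {
            return data;
        }
    }

    /** Builds the request but does not send it; the caller owns the HTTP client. */
    static HttpRequest temperatureRequest(String baseUrl, String city) {
        String query = "city=" + URLEncoder.encode(city, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/temperature?" + query))
                .header("Accept", "text/plain")
                .GET()
                .build();
    }

    /** Parses a response the caller obtained however they liked. */
    static double parseTemperature(int status, String body) {
        if (status != 200) {
            throw new ApiException("unexpected status " + status,
                    Map.of("status", status, "body", body));
        }
        try {
            return Double.parseDouble(body.trim());
        } catch (NumberFormatException e) {
            throw new ApiException("unparseable body",
                    Map.of("status", status, "body", body));
        }
    }
}
```

The application sends the request with whatever client, thread pool, timeouts, and retry policy it already uses (for example, its own java.net.http.HttpClient), then hands the status code and body back to parseTemperature. A convenience function that does all three steps can still exist, but users aren’t forced through it.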

On the other hand, some “libraries” really are more like “embeddable services,” with their own internal state. Large frameworks like Hystrix fall into this category, as do a few sophisticated “client” libraries. These libraries might be expected to manage their own resources and state “under the hood.” That’s a reasonable design choice, but at least be clear about which goal you’re pursuing and what trade-offs you’re making. In most language runtimes, the behavior and dependencies of these libraries cannot be fully isolated from the rest of the code. As an application developer, I might be willing to invest time and effort arranging my code to accommodate one or two embedded services that offer significant power in exchange for the added complexity. For everything else, when I need a library, just give me some ordinary functions.

Crack for Engineers

I can’t help it. I just love big, complicated systems that let you get really precise about what you’re talking about. Types, classes, ontologies, schemas, normalization, denormalization, XML, RDF, XSLT, Java, … It’s all so cool. I can happily spend hours scribbling pages of hierarchies, interfaces, specifications, file formats, and the like.

But at the end of the day, I have a pile of hand-written notes and no code. All those fancy systems I love to study are not much help when it comes to actually building something. When I want to get something done, I hack up a Ruby script. It’s not elegant, but it works.

AltLaw makes a particularly tempting target for my engineering fantasies: all that unstructured data with so much potential structure. But to get the site to actually work, I use a flat index that stores everything as (gasp) strings.

Perhaps I’m learning the virtue of the “worse is better” approach to engineering. The only problem is, “worse is better” is worse fun.