When a developer named Ayrton Sparling disclosed the presence of malicious code in a popular npm module, event-stream, the response was disbelief—not because the code existed, but because of the way it got there.
The code, intended to steal users’ bitcoin wallets, had been injected by an unknown developer with the username right9ctrl. That person had gained commit access from event-stream’s author, Dominic Tarr, simply by asking for it. To many angry users, this was the equivalent of opening one’s front door to the first stranger who knocked, then grabbing one’s coat and leaving for the day. Right9ctrl, in response, did exactly what one might have expected: He headed straight for the valuables.
People were baffled by Tarr’s behavior. “You put at risk millions of people, and making something for free, but public, means you are responsible for the package,” wrote one user, XhmikosR.
But Tarr was unapologetic, offering an explanation that seemed maddeningly sincere: “He emailed me and said he wanted to maintain the module, so I gave it to him.”
That two experienced developers could hold such conflicting views on commit access reveals a quiet but growing tension between past and present norms in open source. Big community projects such as Linux or Kubernetes may capture most of the mindshare, but they make up only a small fraction of widely used projects. A 2015 study of popular GitHub projects across language ecosystems found that nearly two-thirds have just one or two maintainers. Only three projects, less than 3 percent of the study’s sample, had more than 50 contributors.
What’s more, software is trending toward this style of maintenance. More developers are writing and publishing their own bits of code on GitHub, leading to a proliferation of public projects that can be discovered and used by anyone. And nowhere is this trend more visible than in the rapid rise of JavaScript and, by extension, npm. npm, Inc. recently estimated that 97 percent of the code in a modern web application now comes from npm. The npm ecosystem is modular by design, a tangled web of dependencies made of smaller subcomponents. A 2014 study from Bocoup suggests that 93 percent of npm modules have just one maintainer. Instead of a community of developers looking after the entire Rube Goldberg machine, we’ve now got individual developers in charge of a pulley here, a marble there.
If this comes as a surprise, it might be because of the way contributor counts are reported on GitHub, displayed on repositories as an aggregate number rather than by frequency. A tool like Fastlane lists over 900 contributors, but a deeper dive into their contributor graph reveals that just four developers contributed over 70 percent of the commits in 2018. Pandas, a Python library for data analysis, lists over 1,400 contributors, but four developers contributed nearly half of all commits in 2018.
Which number matters more: total number of contributors or percentage of commits per contributor? Well, both. Taken together, they tell a story: While there are, indeed, many contributors to open-source projects, the way that developers participate has changed. Instead of work being distributed across community members, we’re seeing a shift toward casual contributors: developers who make occasional, one-off contributions to a project but otherwise consider themselves to be passive users.
By standardizing version control, developer identity, and user experience, GitHub made it easier for developers to hop into any project and submit a contribution. As a result, projects now experience a greater volume of contributors, but many of those contributors interact with the project more superficially. A 2016 study of popular projects on GitHub found that nearly half of all contributors in the sample contributed only once, and that these one-time contributors accounted for less than 2 percent of total commits.
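The two measurements discussed above, total contributor count and share of commits per contributor, can be computed side by side from a plain list of commit authors. The following is a minimal sketch in Python. The function name `contribution_profile`, the `top_n` parameter, and the sample log are all hypothetical and chosen for illustration. They are not drawn from the studies cited or from any GitHub tooling.

```python
from collections import Counter


def contribution_profile(commit_authors, top_n=4):
    """Summarize how commits are distributed across contributors.

    commit_authors: one entry per commit, naming that commit's author.
    top_n: how many of the most active contributors to group together.
    """
    counts = Counter(commit_authors)
    total_commits = sum(counts.values())
    if total_commits == 0:
        raise ValueError("need at least one commit")

    # Commits made by the top_n most active contributors.
    top_commits = sum(n for _, n in counts.most_common(top_n))
    # Contributors who appear exactly once in the log.
    one_timers = [author for author, n in counts.items() if n == 1]

    return {
        "contributors": len(counts),
        "top_n_commit_share": top_commits / total_commits,
        "one_time_contributor_share": len(one_timers) / len(counts),
        # Each one-time contributor made exactly one commit.
        "one_time_commit_share": len(one_timers) / total_commits,
    }


# Hypothetical commit log: three regulars and five one-off contributors.
log = (
    ["alice"] * 60
    + ["bob"] * 25
    + ["carol"] * 10
    + ["dave", "erin", "frank", "grace", "heidi"]
)
profile = contribution_profile(log, top_n=2)
for name, value in profile.items():
    print(f"{name}: {value}")
```

In this made-up log, the headline contributor count is eight, yet two people account for most of the commits and more than half of the contributors show up only once. Neither number alone tells that story.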
Maybe this is simply because few-maintainer projects require less work, especially if our idea of what constitutes a “project” is shrinking in scope. But a quick glance at GitHub projects with the heaviest support volumes suggests that even these projects don’t necessarily grow into Linux-sized communities. These projects, too, often have just a few maintainers managing the biggest issue queues on GitHub. For example, Godot, a popular game engine, saw 9,471 issues opened in 2018 (roughly 182 per week), yet claims only two core developers. All this before anyone gets to write code at the end of the day!
Interestingly, the growing prevalence of few-maintainer projects doesn’t seem to be limited to individual efforts. Fastlane, for example, is currently developed under Google’s umbrella. React Native is supported by Facebook, yet it is maintained by just a handful of people. (The distinction here is a fuzzy one: There is no universal definition of “maintainer.”) Whether or not a project is sponsored by a company, the lesson appears to be similar: While open source is still participatory, it doesn’t look like Linux anymore. Perhaps it never did.
We could reflexively shift the blame onto maintainers, like a game of hot potato, putting the onus on them to recruit more contributors and distribute the burden of work. But instead of treating these conditions as a problem, what if we view them as a starting point? If few-maintainer projects are our new reality and active-contributor communities are the rare, exotic outliers, how does that change what we expect from open source?
Instead of a community of active contributors, open source often looks more like a few developers playing air traffic controller to thousands of users who are lightly involved. The new challenge facing these projects is not managing collaboration with developers across the globe but coordination. If only a few developers are involved with a project for the long haul, how do they handle the influx of reactive work (requests, issues, and bug reports) while still making progress on proactive work, like new features, refactoring, and defining a vision and roadmap?
Returning to the question of commit access, Tarr not only defended his decision to share permissions with a stranger, but pointed out that he had followed best practices:
Of course, If [sic] I had realized they had a malicious intent I wouldn’t have, but at the time it looked like someone who was actually trying to help me. Since the early days of Node/npm, sharing commit access/publish rights, [sic] with other contributors was a widespread community practice.
Tarr cited a popular 2013 blog post by developer Felix Geisendörfer, which recommends “the pull request hack” as a way to reduce the burden of single maintainership: “Whenever somebody sends you a pull request, give them commit access to your project.”
At the other extreme lies Debian, one of the oldest open-source projects still active today. In order to gain commit access, Debian developers must follow an extensive onboarding process in which they are asked to read through a manual, find a mentor, and meet in person with a maintainer who can vouch for their identity. Debian’s process is built on the need for trust.
This style of maintainership makes sense for projects with bigger codebases and contributor communities (and, therefore, potentially bigger risks), but it doesn’t work so well in the context of few-maintainer projects. As open source moves toward high-frequency, low-touch interactions, this level of downside protection starts to feel unrealistic. Instead, maintainers must find ways to manage support requests and casual contributions without substantial investments of their time.
One of the reasons Git emerged as the dominant version control system is that changes are easier to sandbox and, if need be, revert. Changes are workshopped in branches and patches before merging, and this approach seems to work better for managing code at today’s scale. Similarly, norms that encourage users and contributors to do more work on their own help maintainers scale their time by reducing the amount of attention required by each contribution.
Opportunities for improvement can be found at every stage in the contributor cycle. For example, issues should utilize gentle friction, such as required fields, to improve quality and reduce overall volume. Bots can reduce repetitive human work by adding labels, standardizing code review, responding to new contributors, and ensuring checklists are followed. Documentation helps users and potential contributors get up to speed in a self-directed way. Contributions are welcome, but the responsibility to make them work with the rest of the project lies with the contributor, not the maintainer. And tests and status checks help both maintainer and contributor feel confident merging a contribution from someone they don’t know.
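To make the first two ideas concrete, here is a rough sketch of what "gentle friction" plus a labeling bot might look like, written in Python. Everything in it is an assumption made for illustration: the `Issue` class, the `triage` function, the required field names, and the label names are hypothetical and do not correspond to GitHub's API or to any existing bot.

```python
from dataclasses import dataclass, field

# Hypothetical required fields for a bug report template.
REQUIRED_FIELDS = ("version", "steps_to_reproduce", "expected_behavior")


@dataclass
class Issue:
    title: str
    fields: dict = field(default_factory=dict)
    author_is_first_timer: bool = False


def triage(issue):
    """Return (labels, replies) a bot could apply to a newly opened issue."""
    # A field counts as missing if it is absent, None, or only whitespace.
    missing = [
        name
        for name in REQUIRED_FIELDS
        if not (issue.fields.get(name) or "").strip()
    ]

    labels = []
    replies = []
    if missing:
        # Push the work back to the reporter instead of the maintainer.
        labels.append("needs-info")
        replies.append("Please fill in: " + ", ".join(missing))
    else:
        labels.append("ready-for-review")

    if issue.author_is_first_timer:
        labels.append("first-time-contributor")
        replies.append("Thanks for your first report! Please see the contributing guide.")

    return labels, replies


# Example: a first-time reporter who skipped two of the required fields.
incomplete = Issue(
    title="Crash on startup",
    fields={"version": "1.2.0", "steps_to_reproduce": "  "},
    author_is_first_timer=True,
)
labels, replies = triage(incomplete)
print(labels)
print(replies)
```

The design choice here is that the bot never closes or rejects anything. It only labels the issue and asks the reporter for more, so the maintainer's attention is spent on reports that are already complete.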
As Mike McQuaid, a maintainer of Homebrew, puts it: First-time contributors need documentation, second-time contributors need dedicated code review, and only when they come back for more should they warrant a serious investment of a maintainer’s time.
Given the shift from active participatory communities to few-maintainer projects, we need to reset our expectations about what it means to contribute to open source. The salient issue for maintainers today is less about growing contributor numbers and more about navigating the flow of developers who are clamoring for their time. In a world where single maintainers like Dominic Tarr maintain hundreds of tiny modules, we need to reframe the question from “Why would he do that?” to “How do we design for trustless interactions?”
While it’s tempting to look longingly at the past in search of a historic best-fit curve, we need to approach the problem with fresh eyes, encouraging a new set of best practices that not only align with current reality, but also help developers flourish and grow.