Optimizing for Operational Resiliency in Open Source

When we talk about software resiliency, we’re typically referring to a service’s ability to be consistently available to users despite problems in the system. Resiliency in open source, however, encompasses the people and processes behind a project: the maintainers and contributors who make it run. Complementary to software resiliency, I’d like to explore the concept of operational resiliency through an open-source lens. Think of it as an open-source project’s ability to withstand disruptions to access, changes in maintainers, and shifts in the ecosystem.

In open source, consistent availability means a project is supported enough to triage bug reports and feature requests from users, manage incoming pull requests from contributors, ship releases, and update associated documentation and the homepage. While changes in ownership and individual contributors are common in most engineering orgs, they’re especially difficult to deal with in open-source projects, which tend to have a smaller group of individual contributors, fewer developers with in-depth knowledge of the product, and fewer resources (specifically, money and services) with which to react to disruptions.

Let’s talk through some of the steps open-source maintainers can take to improve their projects’ operational resiliency.

Pass the baton

Open-source projects typically start with a single creator and grow as time passes and interest increases. Usually, the primary maintainer will retain permissions to repos, deployments, CI services, and so on. This works well in a project’s early days, but work in open source long enough and you’ll likely have the misfortune of trying to release a package you aren’t authorized to publish or deploy to an environment you don’t have permissions for. Not only are these issues vexing to deal with, they also cause bottlenecks that slow down the project’s functions.

Don’t despair—modern tooling in the open-source ecosystem can help you work around these obstacles.

For starters, host code under an organizational account instead of a personal one once the project has gained a substantial number of users and contributors. This incentivizes joint ownership of the project and reduces the likelihood code will be lost if a personal account is compromised or deleted.

Second, give permissions to teams or shared accounts instead of individual contributors. Not only will you avoid the headache of having to assign the correct permissions to each individual, you’ll be able to extend access to more people.
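To make the difference concrete, here is a minimal sketch of team-based access, assuming a hypothetical in-memory model. The names Team, Project, grant, and access are illustrative only and don't correspond to any hosting platform's API. The point it tries to show: when permissions hang off a team, onboarding a new maintainer is a single membership change rather than a round of per-resource grants.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical permission levels, ordered from least to most access.
LEVELS = ["read", "triage", "write", "admin"]


@dataclass
class Team:
    name: str
    members: Set[str] = field(default_factory=set)


@dataclass
class Project:
    teams: Dict[str, Team] = field(default_factory=dict)
    # Maps a resource (repo, CI service, registry) to {team name: level}.
    grants: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add_team(self, team: Team) -> None:
        self.teams[team.name] = team

    def grant(self, resource: str, team_name: str, level: str) -> None:
        """Give a whole team a permission level on one resource."""
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        if team_name not in self.teams:
            raise KeyError(f"unknown team: {team_name}")
        self.grants.setdefault(resource, {})[team_name] = level

    def access(self, user: str, resource: str) -> Optional[str]:
        """Return the highest level the user holds through any team, or None."""
        held = [
            level
            for team_name, level in self.grants.get(resource, {}).items()
            if user in self.teams[team_name].members
        ]
        return max(held, key=LEVELS.index) if held else None


project = Project()
project.add_team(Team("releasers", {"ada"}))
project.grant("package-registry", "releasers", "write")

# Onboarding a second releaser is one membership change, not a new grant.
project.teams["releasers"].members.add("grace")
```

In this sketch, removing someone from the team revokes their access everywhere the team was granted it, which is the same property that makes hand-offs less error-prone.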

Finally, have a backup plan for times when a maintainer or BDFL (that is, a benevolent dictator for life) is unavailable. Whether they’ve set their status to “away” due to a personal emergency or have decided to step down from the project, ensuring you have a process in place for propagating permissions and appointing a new maintainer or set of maintainers can reduce project downtime. Your hand-off process should establish a cohort of people who’ll be notified when a maintainer steps down or pauses participation, include protocols for notifying the community, and outline how maintainers will transfer their knowledge to others in the ecosystem.

Automate the boring stuff

A single maintainer is a single point of failure for a project—but having more maintainers doesn’t necessarily solve all your problems. Even with more people involved, maintainer availability can be sporadic or inconsistent. Few people can commit 100 percent of their time to open source, so availability ebbs and flows. Issue triage and pull request review usually take up the bulk of a maintainer’s time and are the first to go when they have reduced availability.

Automation can help reduce this burden. Maintainers can harness automation tools to respond to new issues and pull requests, close old issues, label issues as bugs or feature requests, and encourage issue openers to provide more information. GitHub has a marketplace of applications for automating some of this functionality, and you can use tools like Probot to set up more customized automated workflows. These help bring order to the chaos of sundry issues, pull requests, and operational tidbits maintainers must supervise.
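As a rough illustration of the kind of rules such automation encodes, here is a small sketch in plain Python. It assumes a hypothetical Issue record and made-up keyword lists and thresholds; it does not call GitHub or Probot, and a real bot would receive issue data from the platform's webhooks instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Assumed keyword lists and thresholds; tune these for a real project.
BUG_WORDS = ("error", "crash", "broken", "traceback", "regression")
FEATURE_WORDS = ("feature", "support for", "would be nice", "proposal")
MIN_BODY_CHARS = 30
MAX_IDLE = timedelta(days=90)


@dataclass
class Issue:
    title: str
    body: str
    updated_at: datetime


def suggest_labels(issue: Issue) -> List[str]:
    """Suggest labels from simple keyword matches on the title and body."""
    text = f"{issue.title} {issue.body}".lower()
    labels = []
    if any(word in text for word in BUG_WORDS):
        labels.append("bug")
    if any(word in text for word in FEATURE_WORDS):
        labels.append("feature-request")
    # A very short description usually means the opener should add detail.
    if len(issue.body.strip()) < MIN_BODY_CHARS:
        labels.append("needs-info")
    return labels


def is_stale(issue: Issue, now: datetime, max_idle: timedelta = MAX_IDLE) -> bool:
    """An issue is stale when it has had no activity for longer than max_idle."""
    return now - issue.updated_at > max_idle


opened = datetime(2024, 1, 1, tzinfo=timezone.utc)
crash_report = Issue(
    "Crash on startup",
    "App throws an error when launched with no config file present.",
    opened,
)
dark_mode = Issue("Add support for dark mode", "", opened)
```

Keyword matching like this is crude and will mislabel some issues, so it's best treated as a first pass that a maintainer can override rather than a final decision.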

Document for the sake of strangers

In theory, open source is built on a foundation of shared ideas and knowledge. In practice, maintainers are still more likely than contributors to have intimate insight into the project. Many contributors won’t be familiar with things like the project’s protocol for deploying its websites or releasing a new version of a package, older parts of the codebase and decisions made early in its life span, or even the project’s design and product philosophy.

To make this information more available, be proactive about documentation. Document code, write detailed pull request and commit descriptions that provide a helpful audit trail, and keep a record of processes and procedures, like your testing plan and how you ship new versions of a project. (A good example: The ZeroMQ project maintains a list of RFCs that captures not only technical features but community features as well, including a description of the project’s community model.)

No downtime for the wicked

While some of the concepts underpinning software resiliency—like documenting release processes and test plans—can also help ensure operational resiliency, navigating the unique challenges of open source requires a certain scrappiness.

By documenting projects well, automating repetitive tasks, and dispatching access and permissions to an appropriate group of maintainers, we can make our open-source projects more resilient to compromised maintainer accounts, reductions in maintainer availability, and gaps in shared knowledge. And we can ensure our projects will be “up” for community members to learn from, build on, and extend.
