Open-source excursions: Optimizing for operational resiliency

Documentation, automation, and a little sharing-is-caring can help OSS projects maintain their uptime.
Part of Issue 16, February 2021


When we talk about software resiliency, we’re typically referring to a service’s ability to be consistently available to users despite problems in the system. Resiliency in open source, however, encompasses the people and processes behind a project: the maintainers and contributors who make it run. Complementary to software resiliency, I’d like to explore the concept of operational resiliency through an open-source lens. Think of it as an open-source project’s ability to withstand disruptions to access, changes in maintainers, and shifts in the ecosystem.

In open source, consistent availability means a project is supported enough to triage bug reports and feature requests from users, manage incoming pull requests from contributors, ship releases, and update associated documentation and the homepage. While changes in ownership and individual contributors are common in most engineering orgs, they’re especially difficult to deal with in open-source projects, which tend to have a smaller group of individual contributors, fewer developers with in-depth knowledge of the product, and fewer resources (specifically, money and services) with which to react to disruptions.

Let’s talk through some of the steps open-source maintainers can take to improve their projects’ operational resiliency.

Pass the baton

Open-source projects typically start with a single creator and grow as time passes and interest increases. Usually, the primary maintainer will retain permissions to repos, deployments, CI services, and so on. This works well in a project’s early days, but work in open source long enough and you’ll likely have the misfortune of trying to release a package you aren’t authorized to publish or deploy to an environment you don’t have permissions for. Not only are these issues vexing to deal with, they also cause bottlenecks that slow down the project’s functions.

Don’t despair—modern tooling in the open-source ecosystem can help you work around these obstacles.

For starters, host code under an organizational account instead of a personal one once the project has gained a substantial number of users and contributors. This incentivizes joint ownership of the project and reduces the likelihood that code will be lost if a personal account is compromised or deleted.

Second, give permissions to teams or shared accounts instead of individual contributors. Not only will you avoid the headache of having to assign the correct permissions to each individual, you’ll be able to extend access to more people.
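To make the idea concrete, here is a minimal sketch of team-based access, modeled in plain Python rather than against any real hosting service’s API. The `Project` class and the team, resource, and user names are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model: access is granted to teams, never to individuals."""

    # team name -> usernames on that team
    teams: dict[str, set[str]] = field(default_factory=dict)
    # resource name (e.g. "package-registry") -> team names allowed to use it
    grants: dict[str, set[str]] = field(default_factory=dict)

    def add_member(self, team: str, user: str) -> None:
        self.teams.setdefault(team, set()).add(user)

    def remove_member(self, team: str, user: str) -> None:
        self.teams.get(team, set()).discard(user)

    def grant(self, resource: str, team: str) -> None:
        self.grants.setdefault(resource, set()).add(team)

    def can_access(self, user: str, resource: str) -> bool:
        # A user has access if any team they belong to holds a grant.
        return any(
            user in self.teams.get(team, set())
            for team in self.grants.get(resource, set())
        )


# Grants are attached to the team once...
project = Project()
project.grant("package-registry", "release-managers")
project.grant("production-deploy", "release-managers")

# ...so onboarding someone is a single membership change.
project.add_member("release-managers", "founder")
project.add_member("release-managers", "new-maintainer")
```

In this sketch, adding or removing a maintainer touches one team membership instead of every repo, registry, and deployment target, and the project keeps working if the founder steps away.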

Finally, have a backup plan for times when a maintainer or BDFL (that is, a benevolent dictator for life) is unavailable. Whether they’ve set their status to “away” due to a personal emergency or have decided to step down from the project, ensuring you have a process in place for propagating permissions and appointing a new maintainer or set of maintainers can reduce project downtime. Your hand-off process should establish a cohort of people who’ll be notified when a maintainer steps down or pauses participation, include protocols for notifying the community, and outline how maintainers will transfer their knowledge to others in the ecosystem.

Automate the boring stuff

A single maintainer is a single point of failure for a project—but having more maintainers doesn’t necessarily solve all your problems. Even with more people involved, maintainer availability can be sporadic or inconsistent. Few people can commit 100 percent of their time to open source, so availability ebbs and flows. Issue triage and pull request review usually take up the bulk of a maintainer’s time and are the first to go when they have reduced availability.

Automation can help reduce this burden. Maintainers can harness automation tools to respond to new issues and pull requests, close old issues, label issues as bugs or feature requests, and encourage issue openers to provide more information. GitHub has a marketplace of applications for automating some of this functionality, and you can use tools like Probot to set up more customized automated workflows. These help bring order to the chaos of sundry issues, pull requests, and operational tidbits maintainers must supervise.
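As a rough illustration of what such triage rules might look like, here is a small, self-contained Python sketch. It does not use Probot or any GitHub API; the `Issue` shape, the keyword lists, the thresholds, and the action names are all assumptions made up for this example, and a real bot would translate the returned actions into calls to its hosting platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Issue:
    """Hypothetical, simplified stand-in for an issue on a code host."""

    title: str
    body: str
    updated_at: datetime


# Illustrative thresholds and keywords; tune these for your own project.
STALE_AFTER = timedelta(days=90)
MIN_BODY_LENGTH = 30
BUG_WORDS = ("error", "crash", "traceback", "broken")
FEATURE_WORDS = ("feature", "support for", "would be nice", "proposal")


def triage(issue: Issue, now: datetime) -> list[str]:
    """Return the automated actions a bot could take on an issue."""
    # Old, inactive issues are closed without further processing.
    if now - issue.updated_at > STALE_AFTER:
        return ["close-as-stale"]

    actions = []
    text = f"{issue.title} {issue.body}".lower()

    # Label as a bug or a feature request based on simple keyword matches.
    if any(word in text for word in BUG_WORDS):
        actions.append("label:bug")
    elif any(word in text for word in FEATURE_WORDS):
        actions.append("label:feature-request")

    # Nudge the issue opener when the description is too thin to act on.
    if len(issue.body.strip()) < MIN_BODY_LENGTH:
        actions.append("comment:needs-more-info")

    return actions


now = datetime(2021, 2, 1, tzinfo=timezone.utc)
new_bug = Issue("Crash on startup", "", updated_at=now)
old_issue = Issue("Docs typo", "Small typo.", updated_at=now - timedelta(days=120))
```

Keyword matching is deliberately crude here. The point is that even simple rules can take a first pass at the queue when maintainers are short on time.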

Document for the sake of strangers

In theory, open source is built on a foundation of shared ideas and knowledge. In practice, maintainers are still more likely than contributors to have intimate insight into the project. Many contributors won’t be familiar with things like the project’s protocol for deploying its websites or releasing a new version of a package, older parts of the codebase and decisions made early in its life span, or even the project’s design and product philosophy.

To make this information more available, be proactive about documentation. Document code, write detailed pull request and commit descriptions that provide a helpful audit trail, and keep a record of processes and procedures, like your testing plan and how you ship new versions of a project. (A good example: The ZeroMQ project maintains a list of RFCs that captures not only technical features but community features as well, including a description of the project’s community model.)

No downtime for the wicked

While some of the concepts underpinning software resiliency—like documenting release processes and test plans—can also help ensure operational resiliency, navigating the unique challenges of open source requires a certain scrappiness.

By documenting projects well, automating repetitive tasks, and dispatching access and permissions to an appropriate group of maintainers, we can make our open-source projects more resilient to compromised maintainer accounts, reductions in maintainer availability, and gaps in shared knowledge. And we can ensure our projects will be “up” for community members to learn from, build on, and extend.

About the author

Safia Abdalla is a software engineer working on open-source technologies at Microsoft and an open-source maintainer on the nteract project. When she’s not working on open source, she enjoys writing and running.


Artwork by

Ana Galvañ
