What a deploy bot taught us about documentation

How Glossier used a reader-centric approach to simplify their development team’s code deployment process and improve velocity.
Part of
Issue 6 August 2018

Documentation

Technical documentation can be the difference between software that’s approachable or inscrutable; tools that are reliable or finicky, obvious or arcane. It can make a complicated deployment pipeline feel enjoyable and productive. We write documentation to empower readers (whether they are colleagues, customers, or strangers on the internet) by giving them exposition, context, and examples. It helps them understand components and use tools that might otherwise have remained opaque to them.

I encourage you to think about technical documentation broadly as communication that assists in operating and maintaining software. It comprises conventional docs like the README for open-source projects, the API docs for a service, and an engineering team’s internal wiki. It also includes things that might not immediately come to mind when you think “documentation”: the chat transcript discussing how to architect a feature or respond to an incident, code review comments, error messages, and the output of command line scripts. It can take other shapes too, such as screencasts and CLI --help output. But in all its forms, documentation sets the user’s expectations for what the tool can do and how it works, and it can help them to use it more productively.

Recently, I had the opportunity to improve my team’s code deployment process. While the tooling changes were fairly simple, getting a team to adopt a new process involves a lot of communication. Much of the work involved discussing and documenting the new process and training the team on it. Here, I’ll share some of the documentation techniques that led to the project’s success by getting the right information to the right audience at the right time.

The original deployment process

Our original deployment process was relatively labor-intensive. Despite having a one-button deployment tool, there were a handful of manual steps we had to take in order to coordinate who was deploying, and a slow process of ensuring that your code was up to date with the latest changes before anyone could deploy. If the deployment didn’t go as expected, some engineers weren’t comfortable investigating or fixing issues themselves.

We had a long wiki page detailing the process developers should follow and how to investigate issues. In theory, this could have been sufficient—but in practice, engineers’ understanding of and comfort with the document varied widely.

The team deploys code several times per day. Deploying could take as little as 20 minutes of developer attention if no one else was deploying, or it could take several hours on busy afternoons when several people were trying to ship new features simultaneously. Clearly, there was an opportunity to empower our engineers to ship code more quickly and independently by creating slightly different tooling and writing better documentation to kickstart our developer training.

The new deployment process

In our new deployment process, after a developer merges code to the master git branch, our continuous integration service deploys that code to production as soon as tests have passed.

The new process began as a Request for Comments (RFC) about an automatic deployment process. In this case, I framed the RFC around improved developer productivity from a deployment process that required less developer attention and coordination to ship code.

We use GitHub Issues for RFCs as a lightweight format for documenting both discussions and decisions. In the issue for this RFC, we discussed the scope of the project (e.g., we wanted to make it easier to ship, but we’d consider improving test speed and coverage as a separate concern). It was also a place to highlight edge cases and alternatives: for example, we explored how we could disable automatic deployments and deploy manually if our CI service was down, or if we were handling some other incident. These discussions eventually became a “Deploy issues runbook,” a quick list of investigation tools and mitigation tactics that developers refer to when deployments don’t go as expected.
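The deployment rule described above, together with the escape hatch for disabling automatic deployments, can be summarized in a few lines. This is only a sketch of the decision, not our CI configuration: the BuildResult type, its field names, and the auto_deploy_enabled flag are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    """Hypothetical summary of one CI run; the field names are assumptions."""
    branch: str
    tests_passed: bool


def should_auto_deploy(build: BuildResult, auto_deploy_enabled: bool = True) -> bool:
    """Return True only for a passing build of master while the switch is on."""
    if not auto_deploy_enabled:
        # Escape hatch: during an incident or a CI outage, deploy manually instead.
        return False
    return build.branch == "master" and build.tests_passed
```

Keeping the switch as an explicit input makes the manual fallback a visible, testable part of the process rather than tribal knowledge.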

A deploy bot is born

With our CI service deploying automatically behind the scenes, we needed a way to inform developers of what was happening on their behalf, and to remind them of their responsibilities when deploying new code. The solution was a simple Heroku app that receives deployment webhooks from our hosting service and turns them into nicely formatted messages in our #tech-deploys Slack channel. When our site is operating normally, #tech-deploys is a log of recent changes. And during an incident, alerts are posted there, making it a natural venue to coordinate investigation and remediation.

Here’s what that Slack message looks like:

💞 Deploying Service X: commit abc123 by @aaron
Hi @aaron, please monitor your deployment for at least 15 minutes.

⚠️️ If there are problems: use the Deploy issues runbook.

🕵️‍ Things to check:

  1. If you deployed a new feature, test that it works as expected.

  2. Watch for alerts in this channel and #sentry-errors.

  3. Check our Glossier Overview dashboard for issues like elevated errors, fewer checkouts, or slow performance.

And, moments later, when the deployment is successful:

💚 Success! Service X is now running abc123. @aaron: remember to test your feature and monitor for issues.

And if the deployment fails for some reason, there’s a “💔 Deployment failed” message with another link to the Deploy issues runbook.
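To make the mapping from webhook to message concrete, here is a minimal sketch of the formatting step. It is not the bot's actual code: the payload keys (status, service, commit, deployer) and the status values are assumptions, the wording of the failure message beyond “💔 Deployment failed” is invented, and a real Slack @-mention would need the user's Slack ID rather than a plain name. Receiving the webhook and posting to Slack are left out.

```python
RUNBOOK = "Deploy issues runbook"  # linked in the real messages; URL not shown here


def format_deploy_message(event: dict) -> str:
    """Turn a deployment webhook payload into Slack message text.

    The payload keys used here are hypothetical, chosen for illustration.
    """
    status = event["status"]
    service = event["service"]
    commit = event["commit"]
    deployer = "@" + event["deployer"]

    if status == "started":
        return "\n".join([
            f"💞 Deploying {service}: commit {commit} by {deployer}",
            f"Hi {deployer}, please monitor your deployment for at least 15 minutes.",
            f"⚠️ If there are problems: use the {RUNBOOK}.",
        ])
    if status == "succeeded":
        return (
            f"💚 Success! {service} is now running {commit}. "
            f"{deployer}: remember to test your feature and monitor for issues."
        )
    if status == "failed":
        # Only the "💔 Deployment failed" prefix comes from the article.
        return f"💔 Deployment failed. {deployer}: see the {RUNBOOK}."
    # Fail loudly on unexpected input instead of posting a confusing message.
    raise ValueError(f"unknown deployment status: {status!r}")
```

Every branch names the deployer and points at the next action, which is the whole point: the reminder arrives with the event instead of living in a wiki page.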

Our team has also adopted the helpful habit of the deployer reacting with a 👍 emoji to acknowledge that their new code is working as expected.

These deploy bot messages serve several functions:

  1. They briefly remind the deployer of their responsibilities. They don’t assume that the deployer remembers, nor do they expect them to read a comprehensive document. The info developers need is pushed to them when they need it. This makes the deployment process more consistent and accessible to engineers with various levels of experience.

  2. The @-mention helps to ensure that the message won’t be missed. Our team agreed that deployments were important enough to merit this notification. The deployer should be monitoring this Slack channel when deploying.

  3. The messages link to more detailed resources: the changeset, our dashboard for administering the service, and the runbook, which covers the deployment process in depth along with tactics for investigating and mitigating incidents.

  4. The messages are in a public channel, so they serve as an event log for others who may be responding to an incident, or who just want to know who changed what when.

The new deployment messaging has been popular and successful. It has led to developers shipping more quickly and confidently, and to product managers and team leaders seeing improved velocity. This new approach to documentation and communication allowed us to make a complicated deployment process more comfortable and approachable.

Two important takeaways about successful communication emerged from this process. Hopefully these can help you to create better documentation for your teams.

Repeat important messages

Don’t assume that everyone understood the message the first time they received it. Repeat important information, preferably in different media, to ensure broad reception. As we moved toward adopting the new deployment process, our team had an RFC discussion, an email announcement, several Q&A discussions that took place in a public Slack channel or during office hours, some informal 1:1 chats, and an updated wiki page and runbook. Most importantly, our deploy bot reiterates the key deployer responsibilities every time someone deploys.

Vary the level of detail based on the context and audience

Different audiences have different needs. Communication can’t be one-size-fits-all. On our team, we use RFCs, pull requests, and wiki docs for in-depth explanations and discussions about the architecture and operations of our application. This is particularly helpful for the engineers maintaining and evolving our architecture and processes. But for folks deploying a pull request or responding to a production alert, we deliver fast, actionable information in our Slack notifications and runbooks.

By communicating our new process in several media and with different levels of detail, we ensured that the team understood the process well. We also found that pushing the documentation to the developer when they needed it was the key to building their confidence in the process and their ease with the new tools.

Intuitively, we knew that documentation empowers others to be more productive with the software tools we create. Our deploy bot is (virtual) living proof of that.

About the author

Aaron Suggs is a software developer at Glossier with a focus on developer happiness and infrastructure. He’s most excited when he’s helping others to be more productive. In his spare time, he makes educational apps for his kids.

@ktheory

Artwork by

Ori Toor

oritoor.com
