Technical documentation can be the difference between software that’s approachable and software that’s inscrutable, between tools that are reliable or finicky, obvious or arcane. It can make a complicated deployment pipeline feel enjoyable and productive. We write documentation to empower readers, whether they are colleagues, customers, or strangers on the internet, by giving them exposition, context, and examples. It helps them understand components and use tools that they otherwise might not have understood.
I encourage you to think about technical documentation broadly as communication that assists in operating and maintaining software. It comprises conventional docs like the README for open-source projects, the API docs for a service, and an engineering team’s internal wiki. It also includes things that might not immediately come to mind when you think “documentation”: the chat transcript discussing how to architect a feature or respond to an incident, code review comments, error messages, and the output of command line scripts. It can take other shapes too, such as screencasts and CLI --help output. But in all its forms, documentation sets the user’s expectations for what the tool can do and how it works, and it can help them to use it more productively.
Recently, I had the opportunity to improve my team’s code deployment process. While the tooling changes were fairly simple, getting a team to adopt a new process involves a lot of communication. Much of the work involved discussing and documenting the new process and training the team on it. Here, I’ll share some of the documentation techniques that led to the project’s success: sharing the right information with the right audience at the right time.
Our original deployment process was relatively labor-intensive. Despite having a one-button deployment tool, there were a handful of manual steps we had to take in order to coordinate who was deploying, and a slow process of ensuring that your code was up to date with the latest changes before anyone could deploy. If the deployment didn’t go as expected, some engineers weren’t comfortable investigating or fixing issues themselves.
We had a long wiki page detailing the process developers should follow and how to investigate issues. In theory, this could have been sufficient—but in practice, engineers’ understanding of and comfort with the document varied widely.
The team deploys code several times per day. Deploying could take as little as 20 minutes of developer attention if no one else was deploying, or it could take several hours on busy afternoons when several people were trying to ship new features simultaneously. Clearly, there was an opportunity to empower our engineers to ship code more quickly and independently by creating slightly different tooling and writing better documentation to kickstart our developer training.
In our new deployment process, after a developer merges code to the master git branch, our continuous integration service deploys that code to production as soon as tests have passed.
The new process began as a Request for Comments (RFC) about an automatic deployment process. An RFC is a broad, inclusive discussion around a suggested change to the team’s processes and tooling or the app’s architecture. In this case, I framed the RFC around improved developer productivity: a deployment process that required less developer attention and coordination to ship code.
We use GitHub Issues for RFCs as a lightweight format for documenting both decisions and discussions. In this case, we discussed the scope of the project (e.g., we wanted to make it easier to ship, but we’d consider improving test speed and coverage as a separate concern). The issue was also a place to highlight edge cases and alternatives. For example, we explored how we could disable automatic deployments and deploy manually if our CI service was down, or if we were handling some other incident. These discussions eventually became a “Deploy issues runbook,” a quick list of investigation tools and mitigation tactics that developers refer to when deployments don’t go as expected.
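The deployment rule (ship a merge to master once tests pass) and the manual fallback we discussed in the RFC can be sketched as one small decision function. This is a hypothetical illustration in Python, not our actual CI configuration; the function name and the `auto_deploy_enabled` flag are assumptions made for the example.

```python
def should_auto_deploy(branch: str, tests_passed: bool,
                       auto_deploy_enabled: bool = True) -> bool:
    """Decide whether a merge should be shipped to production automatically.

    Hypothetical sketch: the names and the flag are assumptions, not the
    team's real tooling.
    """
    if not auto_deploy_enabled:
        # Manual override, e.g. while the team is handling an incident:
        # nothing ships automatically and engineers deploy by hand.
        return False
    # Only code merged to master, with a passing test run, is deployed.
    return branch == "master" and tests_passed
```

Keeping the override as an explicit input makes the fallback easy to describe in a runbook: flip one flag, then deploy manually.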
Prior to automatic deployments, the team of about 30 people used a single #technology channel for team-wide discussions, announcements, and system notifications about GitHub PRs and deployments. With all of that in one place, the channel was difficult to follow. I created the #tech-deploys channel to focus specifically on changes to our production system. This freed #technology to be a place for more thoughtful discussions about engineering tools, patterns, and architecture.
With our CI service deploying automatically behind the scenes, we needed a way to inform developers of what was happening on their behalf, and to remind them of their responsibilities when deploying new code. The solution was a simple Heroku app that receives deployment webhooks from our hosting service and turns them into nicely formatted messages in our #tech-deploys Slack channel. When our site is operating normally, #tech-deploys is a log of recent changes. And during an incident, alerts are posted there, making it a natural venue to coordinate investigation and remediation.
Here’s what that Slack message looks like:
💞 Deploying Service X: commit abc123 by @aaron
Hi @aaron, please monitor your deployment for at least 15 minutes.
⚠️️ If there are problems: use the Deploy issues runbook.
🕵️ Things to check:
- If you deployed a new feature, test that it works as expected.
- Watch for alerts in this channel and #sentry-errors.
- Check our Glossier Overview dashboard for issues like elevated errors, fewer checkouts, or slow performance.
And, moments later, when the deployment is successful:
💚 Success! Service X is now running abc123.
@aaron: remember to test your feature and monitor for issues.
And if the deployment fails for some reason, there’s a “💔 Deployment failed” message with another link to the Deploy issues runbook.
Our team has also adopted the helpful habit of the deployer reacting with a 👍 emoji to acknowledge that their new code is working as expected.
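To make the idea concrete, here is a minimal sketch of how a bot might turn a deployment event into the messages above. It is an illustration under stated assumptions rather than our bot’s source: the payload keys (`status`, `service`, `commit`, `deployer`) are hypothetical, since each hosting service defines its own webhook schema, and the wording of the failure message beyond “Deployment failed” is also assumed.

```python
RUNBOOK = "Deploy issues runbook"  # a link in the real message; plain text here


def format_deploy_message(event: dict) -> str:
    """Turn a deployment event into a chat message.

    The event keys used here are assumptions for this sketch.
    """
    service = event["service"]
    commit = event["commit"]
    deployer = "@" + event["deployer"]
    status = event["status"]

    if status == "started":
        # Remind the deployer of their responsibilities at the moment
        # they need the reminder.
        return (
            f"💞 Deploying {service}: commit {commit} by {deployer}\n"
            f"Hi {deployer}, please monitor your deployment for at least 15 minutes.\n"
            f"⚠️ If there are problems: use the {RUNBOOK}."
        )
    if status == "succeeded":
        return (
            f"💚 Success! {service} is now running {commit}.\n"
            f"{deployer}: remember to test your feature and monitor for issues."
        )
    if status == "failed":
        # Failure wording is assumed; the key point is the runbook pointer.
        return (
            f"💔 Deployment failed for {service} ({commit}). "
            f"{deployer}: see the {RUNBOOK}."
        )
    raise ValueError(f"unknown deployment status: {status!r}")
```

Calling `format_deploy_message({"status": "started", "service": "Service X", "commit": "abc123", "deployer": "aaron"})` produces text whose first line matches the first example message.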
These deploy bot messages serve several functions:
- They briefly remind the deployer of their responsibilities. They don’t assume that the deployer remembers, nor do they expect them to read a comprehensive document. The info developers need is pushed to them when they need it. It makes the deployment process more consistent and accessible to engineers with various levels of experience.
- The @-mention helps to ensure that the message won’t be missed. Our team agreed that deployments were important enough to merit this notification. The deployer should be monitoring this Slack channel when deploying.
- The messages link to more detailed information: the changeset, our dashboard for administering the service, and the runbook, which covers the deployment process and tactics for investigating and mitigating incidents. This keeps additional resources readily available.
- The messages are in a public channel, so they serve as an event log for others who may be responding to an incident, or who just want to know who changed what when.
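Getting a message into a public channel takes very little code. As a rough sketch, assuming the bot posts through a Slack incoming webhook (which accepts a JSON body with a `text` field), the request could be built with Python’s standard library as follows. The function names and the placeholder URL are hypothetical.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(webhook_url: str, text: str) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example with a placeholder URL; nothing is sent until post_to_slack is called.
example = build_slack_request("https://hooks.example.com/placeholder", "Deploy started")
```

Separating the request builder from the network call keeps the formatting logic easy to test without touching Slack.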
The new deployment messaging has been popular and successful. It has led to developers shipping more quickly and confidently, and to product managers and team leaders seeing improved velocity. This new approach to documentation and communication allowed us to make a complicated deployment process more comfortable and approachable.
Two important takeaways about successful communication emerged from this process. Hopefully these can help you to create better documentation for your teams.
Repeat important messages
Don’t assume that everyone understood the message the first time they received it. Repeat important information, preferably in different media, to ensure broad reception. As we moved toward adopting the new deployment process, our team had an RFC discussion, an email announcement, several Q&A discussions that took place in a public Slack channel or during office hours, some informal 1:1 chats, and an updated wiki page and runbook. Most importantly, our deploy bot reiterates the key deployer responsibilities every time someone deploys.
Vary the level of detail based on the context and audience
Different audiences have different needs. Communication can’t be one-size-fits-all. On our team, we use RFCs, pull requests, and wiki docs for in-depth explanations and discussions about the architecture and operations of our application. This is particularly helpful for the engineers maintaining and evolving our architecture and processes. But for folks deploying a pull request or responding to a production alert, we deliver fast, actionable information in our Slack notifications and runbooks.
By communicating our new process in several media and with different levels of detail, we ensured that the team understood the process well. We also found that pushing the documentation to the developer when they needed it was the key to building their confidence in the process and their ease with the new tools.
Intuitively, we knew that documentation empowers others to be more productive with the software tools we create. Our deploy bot is (virtual) living proof of that.