Building software at scale with hundreds or even thousands of developers is a notoriously complicated enterprise. Large-scale software projects often run over time and over budget, and the bigger an engineering organization gets, the more time teams need to spend meeting, coordinating, planning, and actively managing dependencies and deadlines. The root causes of these compounding complexities are manifold: teams may find their work in conflict due to competing objectives, the system’s architecture may be insufficiently decoupled, and so on.
And yet, we often believe one more tool can fix it: the shared Slack channel, the Jira ticketing system, the team wiki. No need to sort out the underlying issues, whether technological or organizational—just add another piece of technology to your stack and poof, your problems are gone. If only it were so simple!
While the right tooling can make our work less manual and more scalable, it’s no magic elixir. Writing software is fundamentally a collaborative effort. By understanding tooling as a complement to our planning processes rather than an alternative to them, we can better tackle the technical and human challenges of software development and build with increased clarity and velocity.
Tooling to reduce toil
The way we build and deploy software has evolved enormously over the years in terms of the scalability and complexity of the systems a fixed number of developers can create. Many of these changes have been driven by new technologies. Think of container tools like Docker, which allow us to set up the same environment locally and remotely, with build tools managing dependencies and automating the process. Or CI/CD, which allows us to automatically rebuild, test, and deploy systems. Or microservices architecture, which reduces inter-team dependencies during deployment. These modernizations improved upon highly manual processes, enabling developers to collaborate and build at unprecedented scale.
But while tooling can improve developer productivity by circumventing tedious manual processes, nothing can automate away the need for thorough planning and collaborative decision-making. Consider the microservices architecture: Those responsible for different parts of the system still need to design it in such a way that components are structurally decoupled through proper abstractions. Meanwhile, changes across several components—replacing a central database with an event-driven architecture, for instance—will still require coordination among any teams connected to those components. This may require reconfiguring systems to use new endpoints, changing how data is passed around, or even rewriting a piece of software to conform to the new system’s requirements.
By the same token, creating a Slack channel to speed up a sluggish product launch can reduce the friction of sharing and archiving information, but if the team hasn’t articulated a clear vision and plan, communication may remain unclear and next steps hazy. Tooling in this context is the messenger, not the message: Developers first need to come together to clarify goals, establish ownership and responsibilities, and align on next steps. The Slack channel then becomes the venue for the team to discuss the progress they’re making toward a precisely articulated vision.
If tooling is most effective at automating manual processes that create dependencies, then when you’re thinking about adding that extra tool to your stack, consider whether your problem is one that another tool can solve. Is there some manual step that requires a lot of coordination, perhaps because only a few engineers can do it? Automating that work could reduce a point of friction. Are you seeking a software design that enables teams to build features in parallel without stepping on each other’s toes (or code)? That will require getting together, figuring out each team’s requirements, and agreeing on a solution or compromise. Tooling will be a part of that solution—but it begins with a plan.
Tooling for humans
I like to think of team structures and processes as “tools for people.” They’re repeatable procedures we can follow to achieve a desired outcome—in this case, fewer manual steps and dependencies across teams—and they enable engineers to move faster, with fewer coordination bottlenecks.
One example of such a process is the “domain-driven design” approach put forward by Eric Evans in his 2003 book of the same name. Domain-driven design involves designing software systems in close collaboration with business users, with concepts representing business entities like customers and articles. These are organized into bounded contexts, such as sales and support, which provide clear guidance on how to partition responsibilities between teams. The resulting architectures are better aligned with real-world use cases, affording development teams clarity on how to map business requirements to distinct parts of the software. This allows individual teams to make changes with greater autonomy.
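The idea of bounded contexts can be sketched in a few lines of code. In this toy Python example (the contexts, fields, and `escalate` rule are all invented for illustration), the same real-world customer is modeled separately in a hypothetical sales context and a hypothetical support context, so each team can evolve its own model independently:

```python
from dataclasses import dataclass

# Hypothetical "sales" bounded context: its notion of a customer
# carries only the data the sales team cares about.
@dataclass
class SalesCustomer:
    customer_id: str
    open_orders: int

# Hypothetical "support" bounded context: the same real-world
# customer, modeled independently for ticket handling.
@dataclass
class SupportCustomer:
    customer_id: str
    open_tickets: int

def escalate(customer: SupportCustomer) -> bool:
    # A rule that lives entirely inside the support context, so the
    # support team can change it without coordinating with sales.
    return customer.open_tickets > 3
```

Because neither context imports the other’s model, a change to how sales counts orders never forces a change (or a meeting) on the support side.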
Another approach to streamlining team structures and software architectures, posited in the 2019 book Team Topologies by Manuel Pais and Matthew Skelton, recommends designing teams according to the flow of changes required on a system. The authors advocate for creating “stream-aligned” teams that own all of the work required to deliver a feature. These are complemented by teams that provide a platform service, teams whose purpose is to enable other teams, and teams that manage a complex subsystem, like a search engine cluster. Aligning team structures and responsibilities in this way helps ensure each team can operate more predictably, independently, and with a reduced need for cross-team coordination of day-to-day work.
Both of these approaches, and others that emphasize clear boundaries between functions and responsibilities, can be powerful complements to technological tools that reduce toil. They enable individual developers to work more autonomously and project managers and teams to focus on the larger-scale planning and coordination work that can’t be automated away, or addressed by even the most well-reasoned team structures and processes.
Tooling for the future
Much as microservices, containers, and CI/CD have enabled teams to automate away some day-to-day drudgery, emerging technologies could further minimize the coordination challenges engineers routinely face—at least, the ones that don’t require dedicated conversations and hashing out big ideas and issues.
Take programming languages, for example. Most of the languages in use today, like Java, Ruby, and Python, were developed at a time when networking wasn’t as central to how systems were built. As a consequence, developers still need to write a lot of boilerplate code to create a microservice. Application frameworks like Spring have cropped up to reduce some of the difficulties of creating REST interfaces, but the overhead required to create a REST API remains significant, and developers still need to address a lot of the error handling explicitly in the code. Languages that are better suited to building distributed systems could reduce the amount of code required to maintain a system, which would, in turn, grant individual developers more independence and lessen the amount of planning needed to create software systems.
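The kind of explicit error handling in question is easy to sketch. Here is a toy Python example of hand-written retry logic around a remote call—the sort of boilerplate application code must supply because the language itself has no notion of a flaky network. (`call_with_retries` and its retry policy are invented for illustration, not drawn from any particular framework.)

```python
import time

def call_with_retries(fn, attempts=3, delay=0.0):
    """Hand-rolled retry boilerplate around a remote call.
    `fn` stands in for any network operation; transient failures
    are modeled as ConnectionError."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as err:  # transient, retryable failure
            last_error = err
            time.sleep(delay)  # back off before the next attempt
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

Multiply this pattern by every remote call in a microservice and the overhead adds up—which is exactly the burden a more network-aware language could absorb.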
More advanced, fundamentally distributed approaches to data could also provide scalable infrastructure that enables teams to build more extensible, decoupled systems. For example, in his 2017 book, Designing Data-Intensive Applications, Martin Kleppmann offers an approach to combining event-driven architectures, event streams, databases, ad hoc views, and other techniques to build distributed data systems. The design patterns he describes offer blueprints for developers to create such systems without starting from scratch, reducing the risk of a costly rewrite down the road. (Tools like Kafka or Pulsar provide pieces of such architectures, but for now, building distributed systems that can handle data at scale continues to require specialized knowledge. We may have a ways to go before these ideas yield a more complete set of products.)
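The core pattern—an append-only event log from which views are derived—can be shown in miniature. This toy Python sketch is in the spirit of the event-stream designs Kleppmann describes, with event names and fields invented for illustration:

```python
from collections import defaultdict

# A toy append-only event log. In a real system this would live in
# something like Kafka or Pulsar; here it's just a list.
events = [
    {"type": "deposit", "account": "a", "amount": 100},
    {"type": "deposit", "account": "b", "amount": 50},
    {"type": "withdraw", "account": "a", "amount": 30},
]

def balances(log):
    """Derive a materialized view (balances per account) by replaying
    the log. Because the log is the source of truth, a new view—say,
    totals per day—can be added later without touching the writers."""
    view = defaultdict(int)
    for event in log:
        sign = 1 if event["type"] == "deposit" else -1
        view[event["account"]] += sign * event["amount"]
    return dict(view)
```

The decoupling is the point: teams that produce events and teams that build views never need to coordinate a schema migration, only agree on the event format.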
As these tools evolve, we’ll need to remain vigilant not to stray into the “tooling will save us” trap. No matter how robust our tools become, engineers will still have to come together to consider factors like how to apportion responsibilities, coordinate inherently interdependent work, or re-architect a system to keep pace with changing requirements and business realities. But tooling is what makes planning manageable.
Tooling plus planning
Communication, coordination, and planning are fundamental to writing software at scale. No technology, or set of technologies, will eliminate the need for planning sessions, detailed documentation, and other low-tech tools for coordinating work among people.
But technologies have always played an important role in enabling us to become more productive and creative software developers. The key is to understand what these tools and technologies are best at, where they falter, and where people-centric approaches and processes come into play. So the next time you’re considering a tool you hope will make a planning problem disappear, ask yourself whether it’s automating away manual work that will truly reduce friction, or whether there’s an underlying coordination challenge you’ll need to resolve. If the latter, reach out to your team (via your technology of choice) and start the conversation.