A primer on containers

Remember when people used cloud computing because it was cheaper? It (often) still is—but what began as a way to cut costs has led to a sea change in IT. Similarly, containerization, which started as an incremental shift in how code is packaged and deployed, has fundamentally altered how code is written, as well as the architecture of the services it supports. For example, most large software companies used to release new code once a month at most. Now, the most successful teams release code to production at least once a day, an acceleration made possible by containerization.

A container, in simple terms, is a bundle of everything an application needs in order to run, including libraries and dependencies. Unlike a virtual machine (VM), a container doesn’t include its own operating system kernel. Instead, it shares the host’s kernel and relies on a containerization platform such as Docker, LXC, or rkt to get what it needs from the operating system layer. Containers can offer a range of benefits over VMs. For one, they generally use less memory and storage space when running applications. More broadly, they enable architecture that’s flexible and resilient, in which software runs consistently and scales smoothly.

As with most tools, however, containers aren’t a universal solution. They work best when used to fulfill specific engineering needs—something to keep in mind as we explore their advantages and disadvantages. Let’s dive in, shall we?

Tools of the trade

When it comes to container technologies, Docker and Kubernetes are often mentioned in the same breath. Docker is the most popular platform for creating and organizing container images: lightweight, executable software packages that include everything required to run an application. Tools like the Docker Hub image library make it easy to share such images within a team and with the public. Meanwhile, Kubernetes, the leading container orchestration tool, allows users to manage many running containers across any number of architectures. Its use cases are flexible and varied, but a common pattern is to have Kubernetes run a cluster of Docker containers, watch for events or trends that require a change, like high demand, and stand up more containers as needed.

If Docker containers are the bricks with which you build your service, Kubernetes is the architect. This isn’t a perfect analogy, since Kubernetes would be an extremely hands-on architect who keeps an eye on every single brick (yikes). But it’s a useful metaphor for understanding how Docker and Kubernetes relate: You can’t do container orchestration without containers to orchestrate.

The distinction between the two is important as their use cases vary widely. Many projects need bricks, but they don’t all need architects; Kubernetes allows you to run a large, sophisticated stack, whereas Docker can be used to package up very small bits of code, from hobbyist projects to servers for the smallest of small businesses. While the former is perhaps more glamorous, pretty much every developer and ops engineer will find the latter more useful on a day-to-day basis. You can use bricks to build a vast cathedral, but you can also use them to build a garden wall in your backyard.

The benefits

Improved consistency

Over the course of a decade in tech, I’ve seen countless incidents caused by code on the production servers running in a different environment than the one it was built for. A telling example: I can vividly recall an incident involving a release that had introduced a new dependency on a caching layer. In your average incident, the caching layer wouldn’t be present, so the code would fail in production. The reality, however, was far worse: The caching layer was present, but it was already in use by another component, causing requests to sometimes go through and sometimes get misdirected. This caused tough-to-diagnose failures that kept a major service inoperable for over 12 hours.

While investigating the issue, the incident response team asked whether the release had added a new dependency. Unfortunately, the team lead wasn’t aware that a drive-by improvement had added a need for caching. In the aftermath, it would have been easy to blame the team lead (or the team as a whole) for insufficient information sharing. But this wasn’t the real problem. Rather, it was that the deployment system allowed developers to mistakenly believe they had working code, causing them to release code with new dependencies that would only be revealed by failures in production.

Containers could have prevented the aforementioned issue in a few ways:

When the developer added the feature that required caching, they wouldn’t have been able to rely on a caching layer outside their container. Instead, they would have had to add the dependency to the package information.
When the team merged the new feature, the dependency would have been clearly visible as a change to the package dependency.
In all likelihood, the code deployed to production would have been packaged with its caching dependency, meaning the incident wouldn’t have happened in the first place.
Even if it had, the team lead on call would likely have seen the added dependency during code review. If not, the commit history would have included a clear indication that the release added a need for caching.
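
To make the "visible as a change to the package dependency" point concrete, here is a minimal sketch in Python. It assumes each release ships a manifest that maps dependency names to versions; the manifests, the package names, and the `added_dependencies` helper are all hypothetical and not taken from the incident described above.

```python
def added_dependencies(before: dict[str, str], after: dict[str, str]) -> dict[str, str]:
    """Return dependencies present in `after` but missing from `before`."""
    return {name: version for name, version in after.items() if name not in before}


# Hypothetical manifests for two consecutive releases.
previous_release = {"web-framework": "2.1", "database-client": "5.0"}
candidate_release = {
    "web-framework": "2.1",
    "database-client": "5.0",
    "cache-client": "1.4",  # the drive-by improvement's new requirement
}

new_deps = added_dependencies(previous_release, candidate_release)
for name, version in sorted(new_deps.items()):
    print(f"new dependency: {name} {version}")
```

A check like this could run in code review or in a release pipeline, so a new requirement surfaces before deployment rather than during an incident.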

Communication, documentation, and clear team processes are all essential to a software team’s success and the reliability of their services. But none of these fallible human practices should be your only line of defense against major failures. This is where containerization comes in, adding a layer of stability and consistency: As long as the container framework is in place, the package contains everything your code needs to work properly.

Smooth scaling and a resilience bump

Horizontal scaling has gotten much easier on modern VMs. Cloud platforms like AWS simplify the process of duplicating VMs, even doing so automatically in response to high traffic. But while an advanced VM platform should be capable of replicating your virtual machines (albeit rather slowly), vertical scaling—increasing resources to a whole operating system—can be a real challenge. Kubernetes’ container orchestration layer offers much tighter control over containers’ size and resources, as well as how they start and run. This makes horizontal and vertical scaling quicker and easier, and enables both to be done automatically in response to high traffic or other metrics that indicate a need for more capacity.
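
For the horizontal case, the Kubernetes documentation describes the Horizontal Pod Autoscaler as scaling the replica count by the ratio of the observed metric to its target, rounded up, and skipping changes when that ratio is within a small tolerance of 1. The Python sketch below is a simplified illustration of that rule, not the real controller: the function name is made up, the floor of one replica is an assumption standing in for a configured minimum, and details such as stabilization windows are left out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Replica count suggested by the ratio of observed to target load."""
    ratio = current_metric / target_metric
    # Leave the replica count alone when load is close to the target,
    # so small fluctuations do not cause constant rescaling.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Assumption: never scale below one replica.
    return max(1, math.ceil(current_replicas * ratio))


# Four replicas averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90.0, 60.0))
```

In this sketch, four replicas running at 90% against a 60% target become six, while a reading of 62% falls inside the tolerance and changes nothing.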

Since Kubernetes manages complete, self-sufficient packages that contain everything they need to stand up new instances (aka containers), it’s extremely good at starting up more instances in response to high demand. Its performance monitoring tools also offer reliability and resilience advantages: Rather than requiring a developer to step in, Kubernetes can automatically restart crashed container instances. (There are limits to automation, of course: You don’t want to burn resources restarting something that crashes over and over again.)
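
One way to limit the cost of restarting something that keeps crashing is to wait longer before each attempt. The Kubernetes documentation describes an exponential back-off for container restarts (10 seconds, 20 seconds, 40 seconds, and so on) capped at five minutes. The Python sketch below only reproduces that schedule; the function name and parameters are illustrative, and it does not model when the back-off resets.

```python
def restart_delay_seconds(restart_count: int, base: float = 10.0,
                          cap: float = 300.0) -> float:
    """Delay before the next restart: doubles each time, up to `cap`."""
    return min(cap, base * (2 ** restart_count))


# Delays before the first seven restart attempts.
delays = [restart_delay_seconds(n) for n in range(7)]
print(delays)
```

The cap matters as much as the doubling: a container stuck in a crash loop still gets retried every five minutes, but it no longer consumes resources in a tight loop.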

Greater developer velocity

Containers also allow you to run code on your laptop in (nearly) the exact environment it will use in production. Development teams can change an application’s code, configuration, and packaging, then run it on local machines much as it will run on a production server. Previously, certain kinds of debugging, like performance optimization, were difficult to do locally, and required experimenting on a staging environment. (There’s a whole article to be written about whether staging environments should still be used, but for the moment, consider Charity Majors’ 2018 QCon talk “Yes, I Test in Production (and So Do You)” a starting point.)

They also make possible the distribution of more complex packages. Say a service requires an application, a database, and an authentication store. Before containers, the shared version of the service would include a README with several pages of laborious steps to install and configure the other components. With a container image, you can package up everything the application needs, including dummy data for the database if necessary.

The main benefit in both cases is developer velocity: giving developers more runway to try new things and share code packages with each other. When you don’t need to spend all morning installing dependencies manually, experimenting with another engineer’s project becomes a whole lot more pleasant. A secondary benefit is that developers can better understand how their code is actually running—the “Ops” side of the DevOps equation.

The obstacles

Expertise and specialization

I started this article talking about the move to cloud computing. While containerization is, in many ways, connected to cloud computing, there’s a key difference: Cloud computing generally means you’re buying both the service and a cloud provider’s expertise in running it, whereas containers and container orchestration can feel like a decidedly more DIY affair.

Both container platforms like Docker and container orchestration tools like Kubernetes require expertise to use effectively. With a team of 20 engineers using AWS EC2, it’s unlikely anyone will be working with the service full time. Enough of it is automated that any ops person can handle it with a little training. But containers and their orchestration require more specialization, including at least one container expert per team, to run successfully. That expertise can be tough to find: While coding boot camps and computer science programs mint new engineers every day, there aren’t similar programs for Kubernetes engineers.

Every CTO has nightmares of organizational failures wherein new features are delayed while otherwise experienced engineers trawl Stack Overflow for solutions to complex problems specific to their architecture. Here’s an illustrative example: In a 2019 article on the Tinder Engineering blog, a group of engineering managers and developers described the company’s complex, time-consuming Kubernetes migration, which took two years. At the end of it, all they could point to in benefit was a few minutes shaved off the startup time of new application instances. Ouch. While not mentioned in the article, I can all but guarantee features were delayed and other tech debt went unaddressed while the team struggled to get Kubernetes working.

Adopting a tool like Kubernetes comes with trade-offs. You’ll gain expertise and velocity, but will likely lose precious hours that could have been spent focusing on other areas of the business.

Knowing when not to use containers

Despite the benefits of containerization, the truth of any new infrastructure or model is that it doesn’t solve every problem. There are many workloads for which containers’ flexibility and performance aren’t necessary. If you’re selling novelty sneakers online, pursuing eleven 9s and microsecond response times is probably not going to affect your bottom line.
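
To put a number on what chasing extra 9s means, the short Python sketch below converts a count of 9s into a yearly downtime budget. It assumes the 9s describe availability over a 365-day year; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines."""
    # Three nines means 99.9% available, so 0.1% (10**-3) unavailable.
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


for nines in (2, 3, 5, 11):
    print(f"{nines} nines: {allowed_downtime_seconds(nines):.6f} seconds per year")
```

Three 9s leaves roughly 8.76 hours of downtime a year, while eleven 9s leaves about a third of a millisecond, a target that few online shops would notice missing.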

My rule of thumb: If your service isn’t trying to compete on reliability, cost, or flexibility, you may be better off using a managed cloud platform like Amazon EC2 or Heroku to run your servers, with or without containers. While I wouldn’t go as far as cloud economist Corey Quinn, who’s argued that Kubernetes is a fad that will soon disappear, his observation that the system is large, complicated, and expensive to deploy holds water. Kubernetes is, after all, a tool: It’s flexible and applicable in many situations, but no single tool meets every need.

The containerized future

To me, the answer to the question, “Is it better to distribute software in a package that includes its dependencies and config?” is a resounding yes. The alternative—relying on an ops team to make sure the right dependencies are available to your code—is less than ideal. So I have no doubt containerization has a bright future.

In the years to come, I suspect we’ll see clever AI implementations bundled up with their modeling tools, databases, and web frontends and shared with developers all over the world. Instead of copy-pasting “here’s my configuration” on discussion boards, more people will post their whole container image when asking for help. The “next big thing” in web frameworks will be distributed not as bare code with documentation about needed packages, but as a container image.

As containers increasingly become a standard tool in developers’ tool kits, the future of software development looks more fun, flexible, and distributed. With these considerations in hand, I hope you’ll be able to use them to their fullest.


