Joe Beda is one of the creators of Kubernetes, a container orchestration system developed at Google and released in 2014. Beda left Google the following year to recharge his batteries, and, after a period spent as an entrepreneur in residence at a venture capital firm, he started Heptio in 2016 to provide support to enterprises adopting or using Kubernetes. VMware purchased Heptio in 2018, and Beda continues to lead the team inside the larger company.
Increment spoke with Beda in early 2021 about the impetus behind Kubernetes, the community that drives it, and what he sees as its future.
This interview has been edited and condensed for length and clarity.
Increment: The group you were working with at Google wasn’t focused on containers. What itch was Kubernetes created to scratch?
Joe Beda: The project I was working on was Google Compute Engine, which is Google’s answer to EC2, the raw [VM] engine. That was built on top of Google’s internal infrastructure called Borg, which used containers—or at least [similar] technology in the Linux kernel. From a 10,000-foot view, it was containers, but not the way people know Docker containers outside of Google.
A bunch of things came together to make Kubernetes happen. We had Docker, which popularized containers and made them very usable from a developer point of view. Docker at that time was focused on creating an experience on one machine, whether that be a production machine or a laptop. That was genius in terms of making this technology very accessible.
At the same time, we recognized that we needed to do something different to create a competitive advantage for Google. The analogy I used a lot was, how do we shake the snow globe and get people to rethink the way they deploy applications in a way that is fundamentally aligned with how Google thinks about these things? Kubernetes was a way to start aligning some of that thinking.
Kubernetes was sort of a spiritual successor to Borg. I think it worked well for Google in terms of [shifting] the conversation away from competing apples to apples with Amazon. That’s also why we did it open source. If Google had come out with Kubernetes as something you could only get on GCP [Google Cloud Platform], it wouldn’t have had nearly the same impact.
How did Kubernetes let you manage scale while offering more reliability?
One of the things you realize at scale is that something is always failing. In any distributed system, whether we’re talking about hardware or software, some part of it is in some sort of failure mode. Part of building these systems is building for resiliency and self-healing. A big part of what we did as we moved from Docker on a single machine to Kubernetes across a lot of machines was to recognize that machines will come and go, things will fail, unexpected things will happen.
SRE [site reliability engineering] is this whole movement. The term started at Google. It’s this idea of upleveling the operations function so that instead of just patching over problems, [engineers] get involved in root cause analysis and fundamentally fixing [those problems]. How this manifests in Borg is that you want to get to the point where all the things that will wake you up in the middle of the night can automatically be resolved, and you make systems more self-managing.
As you go from one computer to multiple computers, you have to deal with things failing. And then you have to deal with the dynamism of, well, if this machine fails, how do we bring up a replacement machine, re-level the workloads on top of it, and make sure we can take care of storage and networking along the way? Dealing with the dynamism of an ever-changing system drove all the features that eventually ended up in Kubernetes.
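The self-healing Beda describes is expressed in Kubernetes as declarative desired state. As a sketch of the idea (the names and image below are placeholders, not anything from the interview), a minimal Deployment manifest asks for three replicas of a pod; when a machine fails, the control plane notices the shortfall and reschedules the missing pods onto healthy nodes to restore the declared count:

```yaml
# Illustrative Deployment: declares desired state ("3 replicas of this pod").
# The control plane continuously reconciles actual state toward this spec,
# rescheduling pods onto surviving machines when a node fails.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # placeholder name
spec:
  replicas: 3               # desired count; the controller restores it after failures
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25 # example image
          ports:
            - containerPort: 80
```

The operator never issues an imperative "restart that process on that machine" command; the dynamism Beda mentions is absorbed by the reconciliation loop.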
How do developers need to think about Kubernetes compared to standard compute instances?
When we look at storage, we have everything on physical disks. Amazon has block devices, which are sort of virtual physical disks. But for the operating system, these things still look like physical disks. Then you move into a logical realm where you start talking about file systems, databases, and objects and buckets in S3 [Amazon Simple Storage Service]. You can walk into a data center and point at a disk; you can’t walk into a data center and point at a file.
That level of abstraction—moving from a hardware abstraction to a logical abstraction—happens in storage. That hadn’t happened in compute, at least not in a distributed way. You have a computer, and inside that computer you have a bunch of processes running in your OS, but the processes were fundamentally confined to a single machine. Kubernetes brought this class of systems for creating logical abstractions that developers can interact with across [a cluster of] machines. On top of that, you can build systems that allow you to automatically roll stuff out or roll it back. It’s much more fluid than worrying about spinning up whole new machines, because you don’t care about the machines, you care about the processes.
So you were trying to eliminate the need for app developers to deal with the guts of managing physical devices?
Exactly. When you think about Kubernetes, there’s the platform operator, or the people who run Kubernetes, and then there’s the user, or the application team that’s running stuff on top of Kubernetes. What we end up being able to do is create a separation of concerns, such that you can have a specialist team managing a set of Kubernetes instances presenting a facade, an interface to the application developers. And those application developers now have a whole host of things they no longer have to sweat the details around. The goal here is to make that team highly leveraged. You can have a small number of people managing a platform that’s widely used by a large number of applications and application teams.
There’s nothing in the core Kubernetes world called “application.” Kubernetes doesn’t know what an application is. Kubernetes knows what, essentially, containers are, and pods, and network services, and disks, and all that.
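The objects Beda lists are the actual vocabulary of the Kubernetes API. A minimal, hypothetical Pod-and-Service pair (names, labels, and image are invented for illustration) shows the level of abstraction Kubernetes works at: containers grouped into a pod, exposed behind a network service, with no "application" object anywhere:

```yaml
# A Pod: the smallest schedulable unit, one or more containers.
apiVersion: v1
kind: Pod
metadata:
  name: api-pod             # placeholder name
  labels:
    app: api
spec:
  containers:
    - name: api
      image: example/api:1.0   # placeholder image
      ports:
        - containerPort: 8080
---
# A Service: a stable network endpoint that routes to matching pods,
# wherever in the cluster they happen to be running.
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api                # selects any pod carrying this label
  ports:
    - port: 80
      targetPort: 8080
```

The "application" exists only in the heads of the application team; to Kubernetes it is just a set of labeled objects.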
What role did the community play in creating and developing Kubernetes?
Community is 100 percent one of the keys to success for Kubernetes. It started as a Google project, but very early on we brought Red Hat on. It was a bunch of senior-ish engineers with a common vision who could work together fluidly across company lines, finding common cause.
From the very start, we wanted to be open and create something that a lot of other people felt like they owned—that’s what made the project really explode. We gave them responsibilities, and they ran with it. It’s humbling to see how impactful and big that community has gotten. I think the stars have to align to have that happen, and they definitely did for Kubernetes.
The future of the project was handed off to the Cloud Native Computing Foundation (CNCF). How did that go?
My partner at Heptio, Craig [McLuckie], helped get the CNCF started while he was at Google. One of the goals with the CNCF was not to make it a Kubernetes foundation, but to be something that brought in a larger ecosystem. And it succeeded amazingly well in terms of rooting that larger ecosystem.
There’s this very delicate matter of how you slip in a governance structure where there’s a set of people who can make decisions without alienating all the other people who put in time, energy, and passion. It was fascinating to live through this and see the parallels between the governance of open-source communities and governance in real life.
You cofounded Heptio, which is now part of VMware. What did you want to do with Kubernetes at Heptio that you couldn’t achieve before?
I saw the power of Kubernetes to help create an open-ended ecosystem in the cloud world as a counterbalance to the concentration of power we see within certain cloud providers. To some degree, there’s a democratization function that Kubernetes plays in terms of making a lot of this technology widely accessible. We thought that Heptio could play a part in terms of bringing this to mainstream enterprises, whether they’re running on a cloud, across different clouds, on their own on-premises data centers, or what have you.
That also informed our decision to join VMware. VMware is really set up to be a partner with enterprises as they focus on how to deal with the “dancing with giants” that is working with cloud providers. How can we give them a tool set that isn’t tied to the underlying cloud provider?
How has Kubernetes evolved over time?
Kubernetes started as a way to manage containers and workloads running on a cluster. That’s still the core of what Kubernetes is as a system and what brings most people to it.
As we added extensibility mechanisms, we ended up creating something that, if you stepped back, you could call a universal control plane. That universal control plane aspect is incredibly powerful. And I think that’s giving Kubernetes a second life. It started as a container orchestration system, and it’s really becoming a common control plane for building cloud-like systems that aren’t tied to any cloud provider.
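The extensibility mechanism behind that "second life" is largely the CustomResourceDefinition (CRD), which teaches the API server new object types that sit alongside the built-ins. A hypothetical example (the `example.com` group and `Database` kind are invented here for illustration):

```yaml
# Illustrative CRD: registers a new "Database" type with the API server.
# Once applied, these objects can be created and listed like pods, and a
# custom controller can reconcile them -- the "universal control plane"
# pattern Beda describes.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com    # must be <plural>.<group>
spec:
  group: example.com             # invented group for illustration
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string   # e.g. "postgres"
                replicas:
                  type: integer
```

This is how projects outside the core extend Kubernetes without going through it as a gatekeeper: the custom type and its controller are coequal with built-in features.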
What’s something Kubernetes can’t do that you think it should? Where is the project headed?
Early on in the Kubernetes project, we were inundated with feature requests and people wanting to extend and add more and more to Kubernetes. Our approach was to lean back on the ecosystem and community side of things. We didn’t want to set the Kubernetes project up as a gatekeeper for all the extended features that people were going to be building. The project instead tilted toward creating extensible systems: People can build extensions to Kubernetes that can coexist and be coequal with the features built into Kubernetes.
Our goal now, moving forward, is we want Kubernetes to be boring. Good infrastructure is boring. You turn on the light switch, the light goes on. Nobody gets excited about the next version of the Linux kernel. But we also want to have a thriving ecosystem around it, with no gatekeepers, so that we can ignite an ecosystem of innovation where everybody is on equal footing in terms of being able to participate.