Committing to collaboration

In 1993, Jim Kingdon started working as a programmer for Cygnus Solutions, a Bay Area company that specialized in building GNU free software operating system components.

But Kingdon didn’t want to move to Silicon Valley—he was living on a commune in Virginia and wanted to stay there. It’s hard to imagine in 2020, when most of those in tech can work from anywhere, but in 1993, Kingdon’s cross-country ideal posed a problem. At that time, CVS, the software program Cygnus used for coordinating the work of multiple programmers into a single source code tree, was set up so that only people with access to a local computer at HQ could work together. There was literally no way for Kingdon to remotely log on to CVS from Virginia and do his work from home.

So Kingdon did what any self-respecting hacker would do: He cobbled together a client-server mode for CVS that made it possible to use the software over the internet. He could now check out code from the CVS repository in the Bay Area onto his own computer in Virginia, work on it to his heart’s content, and commit it back in when he was finished.

Cygnus was a company rooted in hacker culture and had no problem with Kingdon’s self-serving modifications. As cofounder Michael Tiemann told Increment, “it was a totally normal thing for somebody to say, ‘I’m going to get the life I want by changing the code I need.’”

According to Jim Blandy, Kingdon’s former college housemate and a fellow free software programmer who has contributed to several version control software projects, Kingdon didn’t make a big deal of his changes—he just used them to do his work. But Blandy had been looking for a way to collaborate long distance on a software project with a colleague of his own and was blown away when he learned about Kingdon’s hack. “It was this amazing thing,” he said on Software That Doesn’t Suck, a podcast he recorded for CoRecursive in July 2020. “By sheer force of will he ripped [CVS] into a client and server.”

After Blandy got Kingdon’s modifications to work on his own machine, he eventually convinced the official maintainers of CVS to add them to their main distribution. The impact, he believes, was transformative: Kingdon’s changes to CVS ushered in a new era of collaboratively made software.

“You work locally and nobody gets in the way, and then you commit, and poof, it’s there for anyone to see,” Blandy said on the podcast. “It took the open-source world by storm. Within a few years, CVS was the standard.”

The term “open source” was officially coined in 1998 by the executive director of the Foresight Institute, Christine Peterson. But the concept—volunteer programmers collaborating on software intended to be freely shared and modified—has deep roots in the earliest programming cultures of the emerging internet. Before Kingdon’s self-motivated upgrade, however, coordinating the work of programmers physically separated from each other by long distances was clunky, usually involving programmers emailing each other huge tarballs of compressed code along with associated patches for adding new features or fixing bugs. Kingdon removed a major obstacle to online collaboration, and in so doing demonstrated a provocative link between the rise of open-source software and the phenomenon of remote work.

From their earliest days, open-source software projects have relied on the contributions of global communities of remote workers isolated from each other by vast distances and time zones. As a consequence, the evolution of open-source software development tools has been relentlessly driven by a need to make remote work easier. The history of open-source software tells a clear story: As programmers obtained a better understanding of the fundamental constraints imposed by remote work, they improved their tools to overcome them.

Brian Behlendorf, a cofounder of the Apache web server project, believes that, generally speaking, successful software engineering tools “anticipated enabling remote work from day one.” Even programmers working for the same company in a discrete physical location weren’t necessarily on the same floor or in the same building. “[But] open source was an even bigger forcing function for that, because [to be successful], you had to presume that people not only were not in the same room, they weren’t even on the same continent.”

Allison Randal, former president of the Open Source Initiative and current Perl Foundation board member, says that “as a general pattern it all progressed in tandem: Open-source developers collaborated remotely from the very beginning, so they felt the pain of the tools they had and created or adopted new ones. And as tools like email, chat, interactive websites, source control, and sharing and downloads got better, the tools made the collaborative culture of open source more feasible, which in turn helped it to spread more widely.”

Examples of such tools include straightforward necessities such as mailing lists and IRC (Internet Relay Chat) for dealing with asynchronous and synchronous communication among physically and temporally separated programmers, the SSH protocol for security and authentication, and various bug and issue trackers. But no single innovation was more critical to the facilitation of successful remote collaborative software development than the all-important job of version control. Making it possible for multiple people to efficiently work on the same codebase is literally job number one for open-source software development.

A close look at three version control systems that debuted successively in the ’80s, ’90s, and 2000s—CVS, Subversion, and Git—provides ample support for the argument that the evolution of software development tools embraced by the open-source world was shaped by the needs of remote workers. Each new iteration of version control software made it easier for remote workers to write great software by overcoming obstacles that impeded collaboration. Broadly speaking, success was achieved by balancing two seemingly opposed design imperatives: ensuring equitable access to precise information about the state of a project and decentralizing control over the process of creation as much as possible.

CVS was originally created in 1986 by Dutch programmer and university lecturer Dick Grune. Grune was working with two of his students on a C compiler, but their schedules rarely aligned for face-to-face work. According to Grune’s account on his own web page, CVS’s entire purpose was “that it allowed us to commit versions independently.”

I emailed Grune to ask if his creation demonstrated that one of the most important attributes of a version control system in a distributed environment is that it must be able to negotiate the widely varying circumstances—location, time zone, etc.—of programmers in open-source settings.

“Very much so,” wrote Grune, adding that his solution to the collaboration conundrum also applies to commonplace circumstances like when volunteer programmers find themselves forced to focus on their day jobs rather than their open-source contributions.

“Real life may interfere,” he wrote. “Suddenly you’re out of the loop for two months. Then, when you’re back, you do ‘CVS update’ and you’re (almost, CVS is not a miracle worker) in again.”

To the battle-scarred veterans who remember working with it, the juxtaposition of the words “miracle worker” and “CVS” is likely to elicit rueful groans, as, for all it offered, it was also notorious for its bugs and idiosyncrasies. In the late 1990s, when Behlendorf’s startup CollabNet funded the creation of the version control system Subversion, the primary goal was hardly revolutionary. Open-source software programmers were simply desperate for a hassle-free working environment.

As Karl Fogel, one of Subversion’s lead developers and the author of Producing Open Source Software, joked when I spoke with him, “It wasn’t this grand vision of how open-source collaboration should work; it was more like, ‘the following things annoy me every day in CVS and I would be less annoyed if we had a system that didn’t have those problems.’”

First released in 2000, Subversion subsequently enjoyed wide popularity in the open-source world, precisely because it did avoid many of the headaches associated with CVS. And Fogel notes that some of its improvements turned out to be of great relevance to remote workers.

Fogel shares that one of CVS’s great drawbacks was a frequent lack of clarity about the codebase’s precise state at any given moment. For example, if a developer lost their internet connection halfway through uploading a commit that touched several files, CVS might get confused as to whether the commit had actually been completed. Similar imprecision could arise if two developers simultaneously uploaded commits to the same part of the code tree. To solve the problem, Fogel and his collaborator Blandy (still committed, half a decade after first tinkering with CVS, to improving version control software for open-source collaboration) made sure that Subversion incorporated the principle of “atomic commits,” a computer science term Fogel defines as “either the entire thing happened or nothing happened.”

“Being able to have a way to talk about the specific state of the code,” says Blandy, “is certainly important to collaboration, and that is something that Subversion introduced on top of CVS.”
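The all-or-nothing behavior Fogel describes can be illustrated with a toy example. What follows is a minimal sketch in Python, not a description of how Subversion is actually implemented: the `ToyRepo` class, its `commit` method, and the `flaky_upload` helper are all hypothetical names invented for illustration. The idea is simply that changes are staged on a private copy and become visible, along with a single new revision number, only once every file has arrived.

```python
class ToyRepo:
    """Hypothetical in-memory repository used only to illustrate atomic commits."""

    def __init__(self):
        self.files = {}     # path -> content, the current published state
        self.revision = 0   # one number names the state of the whole tree

    def commit(self, changes):
        """Apply an iterable of (path, content) pairs as one unit.

        If reading the changes fails partway through (for example, the
        connection drops), the published state is left untouched.
        """
        staged = dict(self.files)        # work on a private copy
        for path, content in changes:    # any exception here aborts the commit
            staged[path] = content
        # Only reached if every change arrived: publish in one step.
        self.files = staged
        self.revision += 1
        return self.revision


def flaky_upload(pairs, fail_after):
    """Hypothetical helper: yields changes, then simulates a dropped connection."""
    for index, pair in enumerate(pairs):
        if index == fail_after:
            raise ConnectionError("connection lost mid-upload")
        yield pair


repo = ToyRepo()
repo.commit([("main.c", "v1"), ("util.c", "v1")])

try:
    # The connection "drops" after the first of two files has been sent.
    repo.commit(flaky_upload([("main.c", "v2"), ("util.c", "v2")], fail_after=1))
except ConnectionError:
    pass  # the repository still holds revision 1, with both files at v1
```

Because the failed upload never reaches the publishing step, every collaborator who asks for revision 1 gets the same two files, which is the kind of precise, shared reference point Blandy is talking about.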

The underlying principle emphasized something that, in retrospect, was obviously critical to building software used by an assemblage of remote workers. If your team members are distributed between Finland, Taiwan, and, say, a commune in Virginia, separated by time zones and vast distances, everybody needs to know exactly what’s going on with the codebase in order to contribute productively.

“All these tools produce what I call permanent sub-referenceable written trails,” says Fogel. “‘Sub-referenceable’ means if there is an exchange of information going on among multiple parties, any message in that exchange or close to it is referenceable by a unique link.”

He also notes that “the thing that people distributed across different time zones need is a way to establish exactly what is the entity we are talking about at any given moment. [All] the different open-source tools are really fantastic at automatically [creating] these kinds of written, sub-referenceable trails, and that enables remote collaboration.”

Behlendorf notes that nine-to-five, in-office software engineering teams “were used to physical whiteboards, lots of printouts, and communications processes that were very much about body language. Those teams now are using issue tracking and project planning tools that presume most people are at home, [which means] that everybody is equally [enfranchised] to be part of that process.”

After its initial release in 2000 and ensuing adoption by the Apache web server project, Subversion’s popularity in the open-source world grew steadily for the following decade. But in 2005, a dispute between the Linux kernel development community and Larry McVoy, the creator of BitKeeper, a proprietary version control system that McVoy allowed Linux hackers to use free of charge, impelled Linus Torvalds to take a break from maintaining the kernel and start work on a new version control system called Git.

By his account, Torvalds started coding Git on April 3, 2005, and released a self-hosting version on April 7. Other Linux developers swiftly joined the effort, and by July Torvalds had passed the responsibility of overseeing Git to one of the most active contributors, Junio Hamano.

Today, by many measures—number of repositories, Google search trends, developer surveys—Git is the most popular version control software in the world. (In 2018, a survey of developers by Stack Overflow found that almost 88 percent of 74,000 respondents used Git.) As even Blandy acknowledged to Increment, “Git is the future.”

A presentation given by Torvalds at Google in 2007—two years before Google itself chose Git as the version control system for Android software development—makes a clear argument that a key reason for Git’s success was its profound commitment to decentralization. And as recently as this April, Hamano continued to emphasize the importance of Git’s distributed architecture. In an interview for GitHub’s blog, Hamano said, “Another beauty of a ‘distributed’ development style is that it allows us to completely separate the act of committing and making the result public, and I think that aspect of ‘distributed’-ness had the biggest impact.”

Prior to Git, most version control systems were built around a central server which held the main code repository. But in the Git universe, a centralized server that controls who has access is considered a design flaw. Git encourages users to make their own copies of code repositories and work independently.

“It is important to be able to do anything you want to do from any location without having to be able to access a server,” said Torvalds in his Google presentation. “Centralized systems inevitably have problems when you have groups in different locations.”
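The separation Hamano describes, between committing and making the result public, can be sketched in a few lines. This is a deliberately simplified illustration in Python and not Git’s actual data model: the `LocalRepo` class and its `clone`, `commit`, and `push` methods are hypothetical, and the only detail borrowed from Git is the general idea of naming each commit by a hash that depends on its parent.

```python
import hashlib


class LocalRepo:
    """Hypothetical toy repository: every copy carries the full history."""

    def __init__(self, commits=None):
        # History as an ordered list of (commit_id, message) pairs.
        self.commits = list(commits or [])

    def clone(self):
        """Make an independent copy; no server is needed after this point."""
        return LocalRepo(self.commits)

    def commit(self, message):
        """Record a change locally. Nothing is sent anywhere."""
        parent = self.commits[-1][0] if self.commits else ""
        commit_id = hashlib.sha1((parent + message).encode()).hexdigest()
        self.commits.append((commit_id, message))
        return commit_id

    def push(self, remote):
        """Publish local commits, but only if the remote has not diverged."""
        shared = len(remote.commits)
        if self.commits[:shared] != remote.commits:
            raise RuntimeError("remote has diverged; fetch and merge first")
        remote.commits.extend(self.commits[shared:])


upstream = LocalRepo()
upstream.commit("initial import")

mine = upstream.clone()
mine.commit("fix parser bug")       # committed, but still private
mine.commit("add regression test")  # upstream has not changed yet

mine.push(upstream)                 # publishing is a separate, later step
```

Until `push` is called, the two new commits exist only in the clone, so work can proceed from any location without access to a server, and the decision to publish is made separately from the decision to commit.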

Incidentally, Torvalds’ Google presentation also suggested that while, technically speaking, open-source software tools gradually changed over time to meet the logistical needs of remote workers, culturally speaking, at least insofar as Linux and Git are concerned, the rhetoric was less than welcoming to a broader community of developers. Torvalds’ speech is laced with deprecating comments about others’ intelligence, a precursor to the public reckoning that finally greeted his harsh management approach over a decade later.

The above notwithstanding, Fogel applauds Git’s design philosophy of “getting rid of as many commit barriers as possible” and its commitment to the idea that anyone “can effectively make and publish a branch of anyone’s codebase without getting permission from them, assuming that the license is okay.”

And Git’s emphasis on decentralization, says Behlendorf, introduced new flexibility to the collaborative programming process.

“Git made it easier for [people] to peel away, do a bunch of work, and then push their changes as a group back into the upstream core,” explains Behlendorf.

A universe in which everyone is encouraged to infinitely fork off their own code trees might seem to be at cross-purposes with a collaborative environment that requires everyone to be able to access precise representations of a codebase at any given time. Eric Allman, the creator of Sendmail, a core part of the internet’s early email transport infrastructure, and a longtime contributor to the FreeBSD open-source operating system project, says he initially assumed widespread use of Git would result in a mess. “I thought this would create chaos,” he wrote in an email, “but that’s much more rare than I expected.”

The trajectory of open-source software evolution suggests that this seeming paradox actually illuminates what’s truly necessary for success in remote collaboration: A thriving open-source project must balance freedom with precision. Everyone needs access to the code and to accurate information about the state of the project, while at the same time having the freedom to do what they want and experiment.

Building and nurturing systems that encourage greater communication while giving up or reducing top-down control isn’t always easy. But in a world where remote work seems likely to be a major part of many developers’ lives going forward, meeting that challenge is of paramount importance. The lesson from the open-source world is that this delicate dance is not only feasible, but is in fact the foundation for collective productivity. If we give people tools that help them communicate, while at the same time surrender our illusions of total control, who knows what wonders we could build?

Andrew Leonard
