When we developers work on a technology project, we often wonder how it might shape the future. We might assume the most important decisions we’ll make concern the product design, the technical architecture, or the stack we use. In my experience, those are certainly the topics that spark the most heated arguments in a project’s early stages.
But my perspective has broadened as I’ve grown more interested in the history of software engineering. This interest is born of necessity: When we’re trying to build the future, it’s extremely helpful to look to the past. At some point in my work on distributed social network software and protocols, I found it necessary to dig back to the very invention of the network, as well as its earliest protocols. In the course of this research, I’ve come to believe API design is the work most likely to have a lasting, wide-ranging influence on the future of technology.
The API, loosely defined, is the interface between one software component and another. It’s also an agreement: If you’re making software that talks to my software, we both agree that my server will receive certain requests and send back certain responses. While in one sense an API is a compact between two or more pieces of technology, it’s ultimately a compact between two or more people. This understanding is necessarily future-looking: We can’t change the past, but we can agree on how to behave in the future.
I’d like to illustrate this point with an example of an agreement made in 1971. It originated with a piece of ARPANET technology you’ve probably never heard of, yet it ended up influencing a bit of data web developers still deal with daily: the HTTP status code.
Meet the HTTP status code
If you’ve ever worked with a REST API, you’re probably familiar with HTTP status codes: 200 OK, 301 Moved Permanently, 401 Unauthorized, and 500 Internal Server Error. Even casual internet users know the dreaded 404 Not Found.
There are over 60 official HTTP status codes assigned by the Internet Assigned Numbers Authority. If you’re particularly geeky about this sort of thing, you might be aware that there’s an overall logic to the way the codes are numbered: A code in the 100s is “informational,” a code in the 200s is “successful,” a code in the 300s means “redirection,” a code in the 400s means “client error,” and a code in the 500s means “server error.”
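That hundreds-digit scheme is what lets a generic client react sensibly even to a code it has never encountered. Here’s a minimal sketch of the idea (classifyStatus is a hypothetical helper written for this article, not part of any standard library):

```typescript
// Hypothetical helper: classify an HTTP status code by its hundreds
// digit, mirroring the class names assigned by IANA.
type StatusClass =
  | "informational" // 1xx
  | "successful"    // 2xx
  | "redirection"   // 3xx
  | "client error"  // 4xx
  | "server error"  // 5xx
  | "unknown";

function classifyStatus(code: number): StatusClass {
  if (code >= 100 && code <= 199) return "informational";
  if (code >= 200 && code <= 299) return "successful";
  if (code >= 300 && code <= 399) return "redirection";
  if (code >= 400 && code <= 499) return "client error";
  if (code >= 500 && code <= 599) return "server error";
  return "unknown";
}

console.log(classifyStatus(404)); // "client error"
console.log(classifyStatus(503)); // "server error"
```

As we’ll see, this “interpretable by the first digit alone” property is much older than HTTP itself.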
But why these numbers? The story of HTTP is long and complicated (and documented elsewhere), but the first HTTP 0.9 prototype in 1991 didn’t specify status codes at all. They first showed up in a 1992 proposal to upgrade the original protocol, an upgrade that, when it was eventually implemented, wasn’t even assigned a version number. As with many things related to technical specifications, the truth gets a little weird.
ARPANET and the conference deadline
The Advanced Research Projects Agency Network project, or ARPANET, is popularly understood to be a precursor to today’s internet that was funded by the U.S. military. But that’s not the whole story. Let’s rewind to the late 1960s and early 1970s: Dozens of networks were operating around the world, and all of them contributed to the development of the internet in their own way. Even the idea of long-distance computer connectivity goes back to at least 1965, when engineers connected a computer at System Development Corporation in California to an MIT Lincoln Lab computer 3,000 miles away in Massachusetts. ARPANET holds a special place in this history because it was the first operational long-distance packet-switched computer network that allowed computers with different hardware and operating systems to communicate with one another. It resembled today’s internet in many ways, and it would eventually host all kinds of services that are functionally similar to web APIs we use today.
In the spring of 1969, computer programmers at Stanford, UCLA, the University of California, Santa Barbara, and the University of Utah worked with BBN, a Boston-based research and development company, to form the ad hoc Network Working Group (NWG) and create the basic infrastructure of the ARPANET. The group was created to facilitate informal information sharing between the universities, companies, and government agencies working on the ARPANET project. Coordinating an open-ended, asynchronous conversation between institutions distributed around the United States—pre-email—was a feat. Over the course of several months, group members developed the Host-Host Protocol, which let one computer open a link to another, routed over BBN’s network of interface message processors, which in turn tied everything together over the national telephone system. (Think of it as a precursor to TCP/IP, which would be invented a decade later by Vint Cerf and Robert Kahn at the Defense Advanced Research Projects Agency.) In September of that same year, NWG member Steve Carr also proposed the first version of Telnet—yes, the very same one you occasionally have to dust off and use today.
In 1970, while the members of the NWG were working hard to lay the groundwork for ARPANET, they weren’t releasing many applications for the network. But in early 1971, the number of new applications on the network exploded. That’s because ARPANET participants were encouraged by Steve Crocker, the semiformal head of the NWG, to demo what they were working on at the upcoming Spring Joint Computer Conference, then the world’s biggest gathering in computing. Crocker told everyone they needed to submit their papers by May 1 in order to be added to the agenda.
The May 1, 1971, deadline resulted in about a dozen software and protocol releases between January and April of that year, many of which would significantly influence the future of networked computing. In the first few months of 1971, for instance, two “Remote Job Entry” (RJE) systems went online at UCLA and UC Santa Barbara, which allowed users to connect to a remote computer and submit their FORTRAN or other code. The computer would run this code, then post the results for the user to download at their convenience.
Three major releases were announced in the third week of April 1971 alone, right before the submission deadline. One was the initial handshake required for one computer to log in to another, formalized by the NWG as the “Initial Connection Protocol” (ICP) after 18 months of informal implementation. Another was the first-ever specification for “A File Transfer Protocol,” which was published by Abhay Bhushan of MIT and sent to the NWG for comment. (And yes—as with Telnet, this is essentially the same FTP you may occasionally use to transfer files in 2020.) The third was a document, produced by a subcommittee of the NWG, outlining a “Data Reconfiguration Service.” This service, accessible on the network, could be hosted by any number of servers as long as they followed the spec and allowed the user to give it instructions in a domain-specific language, telling it how to transform a stream of data from one format to another. The user would then give it the data stream, and it would return the results after some time.
Small decision, big consequences
This sudden proliferation of services, inspired by a single conference deadline, meant there was a new need for interoperability between ARPANET programs. By 1972, the task of developing standards for ARPANET had become complicated enough to require formal working groups for different sub-areas of network functionality. In March of that year, one of these working groups announced a “Data and File Transfer Workshop” to be held at MIT the following month with the aim of arriving “at a unified convention for ARPANET data and file transfer.” Official notes from the workshop mention the creation of “standard error codes and responses.”
FTP and RJE were the most widely used data transfer protocols in early 1972, and the main outcome of this workshop was the publication of updated specifications for both. The first was for RJE, published on June 24, 1972. This new RJE specification defined what replies from the RJE service should look like:
A leading 3-digit numeric code followed by a blank followed by a text explanation of the message. The numeric codes are assigned by groups for future expansion to hopefully cover other protocols besides RJE (like FTP). The numeric code is designed for ease of interpretation by processes.
The specification went on to state that the first of the three digits could be set to values 0 through 5: 0 for an informational push message, 1 for an informational reply, 2 for a positive acknowledgement (with 200 signaling a general “okay”), 3 for incomplete information (for example, the user stopped sending a message partway through and the server needs the rest of it), 4 for a request that was correct but could not be completed, and 5 for a request that was incorrectly formatted.
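To make the wire format concrete, here’s a minimal sketch of how a client process might interpret such a reply. The parser and the sample reply text are hypothetical; only the format itself (three digits, a blank, a text explanation) comes from the spec:

```typescript
// Hypothetical parser for the reply format described in the 1972 RJE
// spec: a three-digit code, a blank, then a human-readable explanation.
// The leading digit is the part "designed for ease of interpretation
// by processes."
interface RjeReply {
  code: number;      // the full numeric code, e.g. 200
  replyType: number; // the leading digit, 0-5
  text: string;      // the explanation intended for humans
}

function parseRjeReply(line: string): RjeReply {
  const match = /^(\d)(\d{2}) (.*)$/.exec(line);
  if (match === null) {
    throw new Error(`Malformed RJE reply: ${line}`);
  }
  return {
    code: Number(match[1] + match[2]),
    replyType: Number(match[1]),
    text: match[3],
  };
}

console.log(parseRjeReply("200 Okay"));
// => { code: 200, replyType: 2, text: "Okay" }
```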
While RJE died out well before HTTP ever saw the light of day, this status code schema was incorporated into the newly revised FTP specification two weeks later, and has been a part of FTP ever since. The people building out HTTP status codes in the early 1990s were very familiar with FTP, and based their HTTP status codes on the earlier FTP codes.
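The family resemblance is still visible on the wire today. Compare an FTP control-channel reply (the sample text is taken from RFC 959) with an HTTP status line: both lead with a three-digit code whose first digit carries the class.

```typescript
// The RJE-derived numbering, two decades apart: an FTP reply and an
// HTTP/1.1 status line. In both, the leading "2" signals success.
const ftpReply = "200 Command okay.";
const httpStatusLine = "HTTP/1.1 200 OK";
```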
So: 20 people got together at MIT in 1972 for a weekend workshop. On the second day, a handful of people in a breakout session decided it would be a good idea to standardize error messages between two services for transferring data, even though those two services had not necessarily planned to speak to one another. One thing led to another, and now 404 is synonymous with “I can’t find the thing.”
RJE was impressive for its time, kind of like an early cloud computing system that let you pay to rent computing time on a (potentially very distant) machine your organization didn’t even own. Today it’s largely forgotten, but the design of its status codes is its legacy. After all, agreements that work tend to propagate and even outlast the tenures of the individuals originally involved.
Historical contingency and everyday API design
HTTP status codes are largely an accident of history. The people who came up with them didn’t plan on defining a numerical namespace that would last half a century or work its way into popular culture. You see this pattern over and over in the history of technology. A simple example is line breaks, which are encoded as <CR><LF> rather than <LF><CR> because of the mechanical constraints of teletype machines: A carriage return takes time to complete physically, so sending it first lets the carriage travel back across the page while the line feed is processed, making the former ordering slightly faster. A minor tweak to squeeze an extra couple of characters per second of printout out of a popular model of teletype dictated a mnemonic that many programmers unfortunately have to remember (or at least look up!) to do their jobs in 2020.
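For the curious, the two bytes in question are easy to see for yourself (this snippet just demonstrates the encoding; nothing in it is specific to any library):

```typescript
// CR (carriage return) is byte 0x0D and LF (line feed) is byte 0x0A.
// Protocols like HTTP and FTP still terminate lines with CR then LF.
const CRLF = "\r\n";
console.log([...CRLF].map((ch) => ch.charCodeAt(0).toString(16)));
// => [ "d", "a" ]
```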
Because technology isn’t immune to historical contingency, it’s important for us as engineers to remember that long-lasting technical inflection points can occur at any time. Sometimes we know these decisions are important when we’re making them. Other times, they seem perfectly trivial. I’m not trying to suggest that you need to treat every tiny technical decision as if it might change the world. You’d never get anything done, and you shouldn’t over-engineer your software in service of a perfection you’ll never attain. Rather, if you find yourself working on API design when you’d rather be making product decisions, remember the humble RJE status code. The work of building the future belongs as much to these small decisions about interoperability as it does to the biggest, flashiest software releases.