Programming as translation

Converting the real world into digital abstractions requires distillation. And, like literary translators, developers must understand their biases.
Part of Issue 8 (February 2019): Internationalization

What does it mean ‘to translate’? A quick answer could be:
to say the same thing in a different language.
— Umberto Eco

Metaphor is a powerful tool for approaching new problems and finding creative solutions to them. Let’s use a framing metaphor: What could we learn from looking at programming as translation?

More specifically, programming translates domain problems—a hardware store inventory, a public library catalogue, a ticket reservation system—into computer programs.

Many factors come into play when we adapt a system from the real world into the digital world. Converting the analog into the digital requires discretization, leaving things out. What we filter out—or what we focus on—depends on our biases. How do conventional translators handle issues of bias? What can programmers learn from them?

Book, volume, manual, or gender?

Let’s explore this question through an example borrowed from William Kent’s Data and Reality: a library of books. First we determine what main concepts comprise our domain and translate those into natural language. Then, we’ll form a vocabulary for our domain. In this case, we will have books, authors, publishers, genres, and so on. With a vocabulary, we can translate concepts into a programming language, creating abstractions to represent them in a way that a computer can understand. But how do we know which concepts from reality should become abstractions in our program?

In building software for a library, we can easily assume that a book will be an abstraction in our program. But what constitutes a book? If an author—say, sociologist Judy Wajcman—has published two books, then a purely bibliographic database will have two representatives of a book in its contents—say, TechnoFeminism and Feminism Confronts Technology.

What about a database built to lend books? If a library buys five copies of two books, it now has records for 10 books. What “book” means changes according to context: in the bibliographic database, it may refer to discrete titles; in the lending database, it may refer to total copies owned.
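The two meanings of “book” can be kept apart in code by giving each its own abstraction. The following is only a minimal sketch under that assumption; the names Title, Copy, and build_catalogue are hypothetical and do not come from Kent’s book or any real library system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    """A bibliographic work: one record per distinct title."""
    name: str
    author: str


@dataclass(frozen=True)
class Copy:
    """A physical item on the shelf: one record per lendable object."""
    title: Title
    copy_number: int


def build_catalogue(titles, copies_per_title):
    """Return one Copy record for every physical item the library owns."""
    return [
        Copy(title, number)
        for title in titles
        for number in range(1, copies_per_title + 1)
    ]


titles = [
    Title("TechnoFeminism", "Judy Wajcman"),
    Title("Feminism Confronts Technology", "Judy Wajcman"),
]
copies = build_catalogue(titles, copies_per_title=5)

bibliographic_count = len(titles)  # "books" as distinct titles
lending_count = len(copies)        # "books" as copies owned
print(bibliographic_count, lending_count)
```

With two titles and five copies of each, the bibliographic count is 2 and the lending count is 10. Neither number is wrong; each answers a different question about what a book is.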

If a library acquires Donald Knuth’s four-volume The Art of Computer Programming, how might it be described? Is it a single “book” or four? What about a Lord of the Rings omnibus edition that has bound the trilogy’s three books as one (as Tolkien originally wanted)? What initially appears to be an easy problem (a book is a book) shows itself to be more complex once we get to the nitty-gritty details.
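One way to see how much the answer depends on the unit of counting is to record each unit explicitly. This is a rough sketch, not a cataloguing standard: Holding, count_books, and the three unit names are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    """One acquisition, described at three levels of granularity."""
    work: str
    parts: int        # volumes or books the work is divided into
    bound_items: int  # physical objects that end up on the shelf


def count_books(holdings, unit):
    """Count "books" according to the chosen (hypothetical) unit."""
    if unit == "work":
        return len(holdings)
    if unit == "part":
        return sum(h.parts for h in holdings)
    if unit == "bound_item":
        return sum(h.bound_items for h in holdings)
    raise ValueError(f"unknown unit: {unit}")


holdings = [
    Holding("The Art of Computer Programming", parts=4, bound_items=4),
    Holding("The Lord of the Rings (omnibus)", parts=3, bound_items=1),
]

for unit in ("work", "part", "bound_item"):
    print(unit, count_books(holdings, unit))
```

The same two acquisitions count as 2, 7, or 5 books depending on which unit the program treats as “the book.”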

Classification is not a matter to be taken lightly. In the case of books, an incorrect classification can make a title harder to find, limiting access to its information. Biased or inflexible classification systems can also lessen an author’s impact. Consider the case of women in countries where they adopt married names. Computer pioneer Kathleen Booth, née Britten, published the 1947 paper in which she introduced the first assembly language under the name Britten, but went on to publish under her married name, Booth. If a database doesn’t link the works published under both her birth and married names, you could search for her papers and mistakenly underestimate her influence. The same thing happens when we search for papers by Barbara Liskov (née Huberman). And the principle of discovery extends beyond books.
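A database can avoid this by treating an author as one entity with several published names. The sketch below assumes a very simple model; Author, Paper, papers_by, and the placeholder paper titles are all hypothetical, and real bibliographic systems handle name matching with far more care.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    """One person, however many names appear on their bylines."""
    canonical_name: str
    also_published_as: frozenset = frozenset()

    def all_names(self):
        return {self.canonical_name} | set(self.also_published_as)


@dataclass(frozen=True)
class Paper:
    title: str
    byline: str


def papers_by(author, papers):
    """Match papers against every name the author has published under."""
    names = {name.casefold() for name in author.all_names()}
    return [p for p in papers if p.byline.casefold() in names]


booth = Author("Kathleen Booth", frozenset({"Kathleen Britten"}))

# Placeholder records, not real paper titles.
papers = [
    Paper("Early paper (placeholder)", "Kathleen Britten"),
    Paper("Later paper (placeholder)", "Kathleen Booth"),
]

naive = [p for p in papers if p.byline == "Kathleen Booth"]
linked = papers_by(booth, papers)
print(len(naive), len(linked))
```

The naive search by a single name finds one paper; the linked search finds both.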

We can get into even more dangerous terrain with abstractions that affect someone’s personhood. What if we bring something like gender into the discussion? As a recent paper about automatic gender recognition (AGR) shows, considering gender as binary can have disastrous consequences for nonbinary people: it can enable gender violence, misgendering, or plain erasure of their identities. The paper’s author found that many of the AGR techniques reviewed don’t even discuss what they mean by “gender.” Does their abstraction of the concept of person consider gender to be immutable, something that won’t change throughout a person’s life? Is it physiological, i.e., based on physical traits as factors that distinguish gender? How can we interpret their results if they don’t communicate their understanding of gender as a category? Translating real-world concepts into software is not as easy as it seems.

What is “the thing”?

At its root, translation is an act of interpretation. When translators face a text in order to render it in a different language or medium, they know they will produce an interpretation, rather than a replica, of the original. This prompts the question: What, in the original text, actually constitutes its sense?

Umberto Eco calls this the problem of “saying the same thing.” It’s almost impossible to determine a text’s core, he argues, the elemental thing a text is about.

Eco’s famous 1980 novel, The Name of the Rose, opens with Adso of Melk, a medieval monk, quoting the Bible by heart and making some mistakes as he does so. An English translator decided that “the thing” in this passage was the Bible, rather than Adso’s mistakes. Instead of translating the verses as Adso had spoken them, which is to say as Eco had written them, the translator inserted them verbatim as they appear in the Bible. Gone were Adso’s errors.

But the novel is playing a game with the verse—a 14th-century monk would have been able to quote the Bible largely by heart. The errors in the Bible passages tell us something, whether we believe it is about Eco or Adso, that was evidently lost in translation. By focusing on the Bible passage as the thing, the translator erased an element of the work’s spirit.

This is not an anomaly: Translators prioritize certain aspects of a text while considering others less important. In its oldest recorded form, Homer’s Odyssey was written in dactylic hexameters, the traditional poetic meter of ancient Greek epics. Today, most translators render the poem in free verse, without consistent meter. Classics scholar Emily Wilson, who published her own translation of The Odyssey in 2018, disagrees with this tactic. She argues that Homer’s epic, as a poem, “needs to have predictable and distinctive rhythm,” and she chose to translate it into iambic pentameter, a form of meter common to English literature. This decision reflects Wilson’s priorities and beliefs. Some translators focus solely on content, capturing the exact meaning of individual words and phrases, rather than on form, thinking that meaning can be distorted in an effort to fit lines into a particular poetic structure. For others, like Wilson, the form is just as important a component of the work. It’s the thing.

Which brings us back to our binary gender example, where we’ve translated all the richness and fluidness of gender, in all its expressions, into a digital zeroes-and-ones form. How many websites present web forms that abstract a person to what the form’s creators consider to be an individual’s most essential characteristics: name, date of birth, gender? How often do these same forms only offer binary gender options: male or female? Does this translation of personhood focus on the right things?
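The difference between the two translations can be made concrete in a data model. This is only an illustrative sketch under stated assumptions: BinaryGender, NarrowProfile, and Profile are hypothetical names, and whether a form should ask about gender at all is a design decision the code cannot settle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BinaryGender(Enum):
    """The zeroes-and-ones translation: exactly two permitted values."""
    MALE = "male"
    FEMALE = "female"


@dataclass
class NarrowProfile:
    name: str
    gender: BinaryGender  # required, fixed vocabulary


@dataclass
class Profile:
    """A looser translation: optional, self-described, and changeable."""
    name: str
    gender: Optional[str] = None  # None means not asked or not answered


person = Profile(name="Sam")  # valid without stating a gender
person.gender = "nonbinary"   # self-described, can change later
print(person)
```

NarrowProfile cannot represent this person without forcing a choice between two values; Profile can, at the cost of giving up a fixed vocabulary. That trade-off is the kind of decision a translator makes when choosing what “the thing” is.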

A translation not only alters and augments the language in which it arrives, writes Judith Butler in her introduction to Jacques Derrida’s Of Grammatology, it also affects the language in which the original was written. In his essay “Simulacra and Simulations,” Jean Baudrillard reminds us that “abstraction today is no longer that of the map, the double, the mirror, or the concept.” He expands: “The territory no longer precedes the map.” Instead, “it is the map that engenders the territory.” In the case of binary gender options, a portion of humanity is elided because a translator—a programmer—put their focus elsewhere.

That word doesn’t mean what you think it means

The metaphor of programming as translation is further complicated by the multiplication of meanings that occurs when words appear both in English and in programming languages. The words we use in our programs, while they look and sound exactly the same as the ones we use for communicating in English, don’t have the same meaning.

In a typical English conversation, the word “user” will bring to mind the image of a person interacting with a program (or object or structure), as well as all the various actions a user might perform, their qualities, and so on. In a conversation with a customer, we may talk about the users of the application we’re building for them. In the app, we’ll probably end up with a User class that represents the real-world person who uses the app.

As our conversation progresses, the word “user” may evolve: for instance, from describing every user of a website (i.e., customers and merchants) to only a specific subset (i.e., customers but not merchants). No matter what shared definition of “user” we negotiate, the definition reflected in the code will not match the new idea until the code is updated to reflect it. The word “user” employed in conversation represents the actual person, but the word “user” in the outdated code will not. In fact, a term in code represents only those qualities that a developer has coded for it.
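The drift between the spoken and the coded definition can be shown with two versions of the same function. Everything here is a hypothetical sketch: Role, Account, users_v1, and users_v2 are invented names, not part of any real application.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CUSTOMER = "customer"
    MERCHANT = "merchant"


@dataclass(frozen=True)
class Account:
    email: str
    role: Role


def users_v1(accounts):
    """'User' as first agreed: everyone with an account."""
    return list(accounts)


def users_v2(accounts):
    """'User' as later renegotiated: customers only."""
    return [a for a in accounts if a.role is Role.CUSTOMER]


accounts = [
    Account("ana@example.com", Role.CUSTOMER),
    Account("shop@example.com", Role.MERCHANT),
]

print(len(users_v1(accounts)), len(users_v2(accounts)))
```

Until someone replaces users_v1 with users_v2, the program keeps counting merchants as users, whatever the team now means by the word in conversation.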

This has implications for other developers who read our code, for our future selves trying to understand code we wrote months ago, and for any situation in which we confuse what a user in the real world does with what “user” means in the program’s text. While reading programs, we are always trying to reverse the translation to recover its original meaning, to imagine the possible worlds in which that meaning makes sense. Sometimes this is not possible.

Derrida’s metaphor for this translation issue is that of a ruin. Whenever we try to reverse a translation to approximate its original, we become archeologists trying to understand what a ruin looked like before its decay. From a programming perspective, this is particularly challenging when it comes to working with legacy code. As programmers we should ask ourselves: How well did we represent entities in our code? When it is read in the future, will it be possible to understand the problem the code was actually trying to solve? Which is the possible world that matches this code?

Saying the same thing, as Eco has it, is not so easy after all. What can we do about it?

Negotiate for flexibility

Eco, thankfully, offers help. Rather than trying to say the same thing, he encourages translators to instead aim for “almost.” Introducing a qualifier is essential. The Earth is almost like Mars, Eco explains in Dire Quasi la Stessa Cosa, since both go around the Sun and have a spherical shape—but since they are almost like spheres, they could also be compared to oranges or soccer balls. Defining the parameters of what “almost” means becomes an act of negotiation, of choosing trade-offs.

In programming, we need to introduce a similar element of flexibility. We are inherently flexible during conversations, adapting and negotiating with our interlocutors the meaning of the words that we use. Let’s do the same in our work.

Steven S. Skiena once had to design an algorithm to find the cheapest flight from City X to City Y. His first approach at translating this real-world problem into a program was to use Dijkstra’s shortest path algorithm. Here we see his biases: This is a graph problem, I’ll solve it by using the shortest path. His customers quickly made it clear that Skiena was missing crucial context—namely aviation industry rules. They continued negotiating until Skiena solved the problem with a priority queue. He had to discard his biases and learn a new focus. Eventually his solution “proved to be fast enough to provide interactive response to the user,” Skiena relates. He had learned what “almost the same thing” really meant for this problem.
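For reference, the first framing (cheapest flight as a shortest-path problem) might look like the sketch below, which runs Dijkstra’s algorithm over fares using Python’s heapq module as the priority queue. It is not Skiena’s final solution, which the account above does not detail, and it ignores the aviation industry rules his customers raised. The function name cheapest_fare, the city labels, and the fares are assumptions made up for illustration.

```python
import heapq


def cheapest_fare(flights, origin, destination):
    """Dijkstra's algorithm over fares; returns None if no route exists.

    `flights` maps a city to a list of (next_city, fare) pairs.
    Assumes fares are non-negative and simply add up along a route.
    """
    best = {origin: 0}
    queue = [(0, origin)]  # priority queue ordered by total fare so far
    while queue:
        cost, city = heapq.heappop(queue)
        if city == destination:
            return cost
        if cost > best.get(city, float("inf")):
            continue  # stale entry: a cheaper route to this city was found
        for next_city, fare in flights.get(city, []):
            new_cost = cost + fare
            if new_cost < best.get(next_city, float("inf")):
                best[next_city] = new_cost
                heapq.heappush(queue, (new_cost, next_city))
    return None


# Made-up fares: the direct flight costs more than the connection via Z.
flights = {
    "X": [("Y", 500), ("Z", 150)],
    "Z": [("Y", 200)],
}
print(cheapest_fare(flights, "X", "Y"))
```

On this toy network the function returns 350, the connection through Z. The assumption that fares simply add up along a route is exactly the kind of bias the anecdote describes: it holds for the graph, but not necessarily for the real-world domain.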

The same idea could be applied to our gender concept from before, or to the idea of a user, and so on. We need to approach coding with an attitude of understanding that change and adaptation are inevitable.

Conclusion

So, what did we learn from looking at programming from the frame provided by translation?

We saw that from a business perspective, there are many ways to translate the real world into our programs. Asking what the thing is that our client wants us to solve will help us address their needs more quickly and more directly.

From a programming perspective, we saw that our translation of the world will project new versions, possible visions, of it onto our colleagues’ minds as they try to understand the problem our code attempts to solve. Did we choose the right abstractions? Did we focus on the right part of the problem?

Finally, from an ethical perspective, we saw that translating the real world into code affects the real-world entities represented in our program as well as the users who engage with them. Even in the example of a library, the wrong classification of books or authors can render information inaccessible and erase contributions. When we mis-map the lives or traits of people, as in the example of gender, the stakes are even higher.

The best translations are critical contributions to the original work being translated. The best programs, too, can function as critical texts about reality. If that’s so, then programmers are responsible for helping to improve the reality we represent.

About the author

Alvaro Videla is a developer advocate at Microsoft, and he organizes DuraznoConf. He is the coauthor of RabbitMQ in Action and has written for the Association for Computing Machinery.

@old_sound

Artwork by

Charlotte Ager

charlotteager.co.uk
