The process: Rearchitecting after acquisition – Increment: Software Architecture

When one software company is acquired by another, it’s a safe (if not guaranteed) bet that the acquired business will be incorporated into its new parent company in some way. But what many don’t consider is that the acquired company’s source code may also have to be rebuilt to some degree. When GitLab acquired my former company, Gemnasium, in January 2018, we faced precisely this task.

We designed Gemnasium to monitor software dependencies, both on-premises and as a SaaS product. It gave customers a quick overview of the health of their software projects, notified them when security vulnerabilities were detected in open-source libraries, and helped them resolve those issues. However, as larger software development platforms began offering their own security monitoring and alerting systems in late 2017, it became clear that we wouldn’t be able to compete in this space alone: their free (albeit less robust) services would eventually put us out of business. Fortunately, GitLab was also looking to build out and strengthen its own security features, so we came to a mutual agreement that GitLab would take in Gemnasium and incorporate Gemnasium’s source code into its own.

Before the acquisition was finalized, we worked with GitLab to evaluate how our codebase would be restructured without eliminating any of Gemnasium’s core characteristics or workflows—such as security alerts on vulnerable dependencies—which had made it work so well in the past. Gemnasium and GitLab also shared much of the same organizational DNA: Both companies were 100 percent remote, and both codebases were built on similar technology stacks—Ruby on Rails and Go. These elements, along with a shared focus on iteration and collaboration, were key to our speedy transition because they gave us a clear mutual understanding of what Gemnasium could bring to GitLab—and how.

After the acquisition was finalized in early 2018, it was important for the Gemnasium team to show results quickly. I’ve been through acquisitions in the past, and I know that if engineers can’t see the immediate results of their work, they can lose motivation. Because the Gemnasium team had the opportunity to stay together (which isn’t always the case in transitions like this one), we wanted everyone at GitLab to see that we were more than the sum of our individual engineers: We were a highly effective unit.

Our first mission was to migrate Gemnasium’s infrastructure into a new GitLab cluster. Fortunately, the entire migration took less than a week because Gemnasium’s infrastructure was managed with infrastructure as code. After that, we had some major work to do.

Though GitLab and Gemnasium’s stacks were very similar, there were also key areas of difference. In 2015, Gemnasium had initiated three large-scale changes, all of which were nearing completion by early 2018: moving from Rails to Go, from a monolithic app to microservices, and from a manual deployment process to fully automated continuous deployment. These earlier migrations taught us lessons we could then apply in our move to GitLab.

When we first started the transition to Go, for instance, we chose the smallest extractable part from our then-gigantic Rails app that would also deliver a performance boost when switched. After investigating our logs and our New Relic data—which we used to identify where we were spending the most time responding to HTTP requests—we decided to target project badges, small dynamic pictures that developers could add to their README pages reflecting the status of the project (“up to date,” “outdated,” and “security issue,” with green, yellow, and red backgrounds, respectively). While badge images were small, they were called a lot compared to the other controllers because they were present on multiple popular project READMEs, such as Ruby on Rails. (For every browser request on github.com/rails/rails, we were getting a hit on gemnasium.com.) The cumulative time spent responding to these requests was considerable, making project badges a prime candidate for a first microservice: small perimeter (easy to extract) and big impact (immediately visible results).

Rewriting this service in Go helped us get up to speed with the language and the new architecture we wanted to put in place. From there, it was easy to extract one feature at a time, taking baby steps. This paid off during the GitLab transition: we had kept only the services necessary to synchronize with registries and keep our data up to date, which also made it easier to eventually relocate our infrastructure. Whatever we didn’t need was effectively disabled by default, simply because we didn’t migrate those microservices to the new GitLab cluster.

Having migrated our infrastructure, the obvious path forward would have been to bundle Gemnasium completely into GitLab, shipping it as an internal service that functioned as a part of the whole GitLab package: an omnibus. Gemnasium’s source code would be incorporated into GitLab’s, making it possible to directly call Gemnasium’s functions and methods as well as to create a similar user experience inside GitLab’s app. But this path had significant obstacles.

To begin with, GitLab runs wholly on Rails. While Gemnasium still had a few services on Rails, they were legacies that we had planned to phase out. To go this route, we’d have to rewrite portions of Gemnasium’s source code from Go back to Ruby, and quickly. Merging codebases would also have required adding several new services to GitLab (our NSQ, our microservices) in the face of a lot of unknowns. For example, any component used by Gemnasium but not on GitLab’s list of acceptable licenses would be a legal liability. And while GitLab and Gemnasium were using the same database engine, PostgreSQL, that didn’t mean we could merge the two schemas just like that. Our database was composed of many different schemas, each with specific user access and permissions. For example, our project badge service was only allowed to access project names and statuses. Nothing else. In an ideal world, we’d create two databases in the PostgreSQL cluster, one for GitLab and one for Gemnasium, then merge the two public schemas later on, so GitLab would be able to leverage Gemnasium data without going through an API and vice versa. (Though it’d be a lot of work to ensure nothing was in conflict.)

In this scenario, the more services the hypothetical Gemnasium-GitLab omnibus ran, the more resources it’d consume, the more logs it’d generate, and the greater the security exposure. And if anything didn’t run as expected, both the GitLab and Gemnasium teams would need to investigate. Gemnasium engineers would also have to thoroughly grasp GitLab’s entire codebase to be able to incorporate Gemnasium into it. Faced with a rapidly changing piece of monolithic software like GitLab, this task would have been impractical. By the time Gemnasium engineers had gotten up to speed, their understanding would likely already have been obsolete.

Last but not least, if we had to replicate Gemnasium in GitLab, we would have had to store the project dependencies in a relational database and process their statuses—work that would have taken months. We wanted to maintain momentum. So, rather than seek a seamless, complete, and impossibly perfect integration into GitLab, we opted for an intermediate workaround.

You’ve probably heard the term “orthogonality” to describe when one thing can be changed without impacting anything else. I’d compare it to building a rocket: No single person can work on all parts, and one team’s heat shields shouldn’t affect another team’s boosters. The same goes for software. It’s why we often define APIs first, to use them as contracts between teams. Orthogonality, in other words, describes a state of minimal friction. For instance, if you need to synchronize with a repository, or you want to fetch code from GitLab, GitHub, or Atlassian, an orthogonal solution could be a single service. Even if you change the way that service fetches the source code, it’s not going to change the status of the project or any of the other functionalities.
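One way to picture this in Go is an interface that acts as the contract, with downstream logic depending only on that contract. The sketch below is purely illustrative: `SourceFetcher`, `staticFetcher`, `cachingFetcher`, and `countDependencies` are hypothetical names, and the naive Gemfile line counting stands in for whatever a real status computation would do.

```go
package main

import (
	"fmt"
	"strings"
)

// SourceFetcher is the (hypothetical) contract between teams: given a
// repository and a path, return the file's content.
type SourceFetcher interface {
	FetchFile(repo, path string) (string, error)
}

// staticFetcher serves files from memory; it stands in for a real
// GitLab or GitHub client. Keys have the form "repo:path".
type staticFetcher map[string]string

func (s staticFetcher) FetchFile(repo, path string) (string, error) {
	content, ok := s[repo+":"+path]
	if !ok {
		return "", fmt.Errorf("%s not found in %s", path, repo)
	}
	return content, nil
}

// cachingFetcher changes HOW files are fetched (it memoizes results)
// without changing the contract that callers see.
type cachingFetcher struct {
	next  SourceFetcher
	cache map[string]string
	calls int // number of calls forwarded to next
}

func (c *cachingFetcher) FetchFile(repo, path string) (string, error) {
	key := repo + ":" + path
	if content, ok := c.cache[key]; ok {
		return content, nil
	}
	c.calls++
	content, err := c.next.FetchFile(repo, path)
	if err != nil {
		return "", err
	}
	c.cache[key] = content
	return content, nil
}

// countDependencies is a stand-in for downstream logic. It depends only on
// the interface, so swapping fetchers cannot change its result.
func countDependencies(f SourceFetcher, repo string) (int, error) {
	manifest, err := f.FetchFile(repo, "Gemfile")
	if err != nil {
		return 0, err
	}
	n := 0
	for _, line := range strings.Split(manifest, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "gem ") {
			n++
		}
	}
	return n, nil
}

func main() {
	base := staticFetcher{
		"group/app:Gemfile": "source 'https://rubygems.org'\ngem 'rails'\ngem 'pg'\n",
	}
	cached := &cachingFetcher{next: base, cache: map[string]string{}}

	a, _ := countDependencies(base, "group/app")
	b, _ := countDependencies(cached, "group/app")
	fmt.Println(a == b) // both fetchers yield the same answer
}
```

Swapping `staticFetcher` for `cachingFetcher` changes how the source is retrieved, but nothing that consumes the interface has to know or care.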

The Gemnasium team looked for an orthogonal solution. We decided to expose an external API for all GitLab instances, meaning that Gemnasium’s codebase could still communicate and work with GitLab, and both would remain independently deployed. Every job requiring dependency_scanning sent the names of the packages used in the current project to Gemnasium’s servers, and Gemnasium replied with the same list, decorated with security alerts. This workflow freed us from having to understand all of GitLab’s core concepts and data schemas before we got started, and allowed us to produce data almost instantly. We created a small CLI in Go to send the data, and a new API endpoint to respond with the corresponding advisories. We think simplicity is a good sign of a right decision, and this solution was extremely simple. The first results were available a few days after we started at GitLab.

When Gemnasium first joined GitLab, GitLab understandably offered a lot of guidance and direction. Often, when small startups like ours get acquired by larger companies, they lose control over their products and the principles that informed their development. This can lead to integral players walking away shortly afterward. This kind of outcome was the last thing Gemnasium wanted. How we integrated our codebase gave us latitude to assimilate at our own pace—without changing our source code or creating dependencies on other teams. This in turn led to faster review cycles, which GitLab executives noticed. When Sid Sijbrandij, GitLab’s CEO, saw how quickly we were producing results, he knew he could trust us to deliver without being micromanaged. It’s now been two years since I joined GitLab, and I still feel as empowered as when I was leading Gemnasium on my own.
