Software architecture at scale

Leaders at Foursquare, Hulu, and Twitter discuss early architecture decisions, downstream effects, and architectural philosophies.
Part of
Issue 12 February 2020

Software Architecture

Jeremy Tryba

Senior VP of engineering
Foursquare

350 employees

Gert Drapers

VP of architecture
Hulu

2,400 employees

Smita Wadhwa

Senior software engineer
Twitter

4,300 employees


What are the most significant downstream effects of the early architecture decisions your organization made?

Being a Scala shop has had a major impact on the evolution of our architecture. That choice has led to scenarios, like building web frameworks, where we’ve had to build solutions in-house.

— Jeremy Tryba, Foursquare

We started by building our website first, which was a straightforward and simple architecture. But as Hulu began to grow, we added complexity to our architecture—living-room devices, mobile—thus creating a variety of clients that we had to manage on the backend. This began driving a lot of implicit architectural decisions solely in support of clients’ needs. As the organization grew, developers were solving the respective problems impacting their clients. At the scale we’re at now, we’re trying to make the overall architecture more connected and efficient. We’re in the midst of what we’re calling the platformization initiative, where we’re taking a domain (e.g., user profile, sign-ups) and making it available through a formal set of APIs. Now, if another team needs user profile info, they can communicate through that API. This initiative helps evolve implementation more autonomously.

— Gert Drapers, Hulu
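
As a rough illustration of the platformization idea described above, the following Python sketch shows a domain exposed through a formal API, with a consumer that depends only on that contract. Every name here (UserProfileApi, InMemoryUserProfileService, greeting) is hypothetical and chosen for illustration; nothing is drawn from Hulu's actual services.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class UserProfile:
    user_id: str
    display_name: str


class UserProfileApi(Protocol):
    """Hypothetical contract that other teams code against."""

    def get_profile(self, user_id: str) -> Optional[UserProfile]: ...


class InMemoryUserProfileService:
    """One possible implementation; the owning team can replace it freely."""

    def __init__(self) -> None:
        self._profiles: Dict[str, UserProfile] = {}

    def sign_up(self, user_id: str, display_name: str) -> UserProfile:
        profile = UserProfile(user_id, display_name)
        self._profiles[user_id] = profile
        return profile

    def get_profile(self, user_id: str) -> Optional[UserProfile]:
        return self._profiles.get(user_id)


def greeting(api: UserProfileApi, user_id: str) -> str:
    """A consumer owned by another team: it sees the API, not the storage."""
    profile = api.get_profile(user_id)
    return f"Welcome back, {profile.display_name}" if profile else "Welcome"
```

Because greeting accepts anything that satisfies UserProfileApi, the team that owns the domain could move from an in-memory map to a database or a remote service without changes on the consumer side, which is one way to read "evolve implementation more autonomously."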

The decision to use a monolithic architecture was made long ago. Since then, we’ve had challenges scaling up systems to meet the steadily growing demands of the ads business. We got stuck in organizational interdependencies, like code reviews. Given the size of our current org and the evolving technology, we came up with a plan at the end of 2018 to move to microservices. We were also early adopters of Scala for a few of Twitter’s ad-serving components, which was good from a technical perspective. But hiring became tough and slow since there were few people with skills in such a nascent language. Lastly, we initiated the in-house production of solutions around caching, streaming, storage, and more. Given the resource constraints to support these in the long term, as well as the availability of market solutions that better fit our needs, we ended up adopting market solutions for a few of them.

— Smita Wadhwa, Twitter

What kind of architecture (or architectures) does your company use?

We do a lot of big data processing through EMR using Hadoop, MapReduce, and Spark. We’re a Scala shop, so we also use a lot of Scalding when building MapReduce processing graphs. On the online side, we use Fargate for container orchestration where we host Scala microservices. We’re big MongoDB users for data storage for online services.

— Jeremy Tryba, Foursquare

Since Hulu’s founding in 2007, the system grew from a single private data-center solution to multiple private data centers, now combined with public cloud resources into a hybrid hosting platform to power our services. Those services went through a similar evolution from monoliths to coarse-grain services to a set of well-defined API-based platform services.

— Gert Drapers, Hulu

Different products and services use different kinds of architectures, including client-server, component-based, data-centric, event-driven, microservices, pipes and filters, representational state transfer (REST), service-oriented, shared-nothing architecture, and more.

— Smita Wadhwa, Twitter

What is your organization’s overall philosophy around software architecture?

We prefer to rely on established open-source solutions wherever we can. When we decide to build it ourselves or adopt something at the bleeding edge, we try to do so only when there’s no viable incumbent or where our use cases are legitimately unique.

— Jeremy Tryba, Foursquare

Our technology needs to coexist with our key business needs. In that vein, architecture is the enabling function, helping deliver the engineering platform and solutions needed by the business.

— Gert Drapers, Hulu

Our organization’s overall philosophy encompasses availability, extensibility, performance, scalability, portability, compliance, and agility. The weight given to each can vary within limits from project to project, depending on trade-offs such as those described by the CAP and PACELC theorems, like having to choose between latency and consistency.

— Smita Wadhwa, Twitter

What about your organization’s business has an unlikely or unusual bearing on your architecture?

We started building a big data-analytics processing engine eight years ago, before there was a thriving ecosystem of open-source and managed solutions to this problem. Our version is well tuned to aspects of our problem space that are unique, but we’re constantly prototyping alternatives to gut check whether it’s still the best tool for the job.

— Jeremy Tryba, Foursquare

Hulu’s architecture is different because video in itself is a unique application. For on-demand video to be successful, you need a massive number of users, and therefore scale. Our service now offers both on-demand and live content. For the former, the scale is absorbed by content delivery networks (CDNs). We’re good at taking content, processing it, and pushing it out. Live TV is where this all turns upside down: We are the server and the origin of the signal. With live TV there are also spikes caused by major unplanned events, such as breaking news, and we’ve done a ton of work to ensure we’re delivering a consistent and stable experience when we see a high number of concurrent viewers.

— Gert Drapers, Hulu

To operate, scale, and optimize for real-time scenarios, we moved from offline to real-time training for user-ad engagements (a.k.a. online training for predicting user-ad engagements), where the model is continuously trained in real time. With these models, we experience minimal to no delays. This has proved useful because people’s interests and intents, and the context around them, change in real time, and new advertisers and new campaigns need to respond just as quickly. Our ML model adapted to changes quickly, in anywhere from a few hours down to a few minutes. And the new models were substantially better than offline-trained models. Twitter’s internal metrics accelerated significantly after the launch.

— Smita Wadhwa, Twitter
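
To make the difference between offline and online training concrete, here is a minimal Python sketch of a model that updates on every engagement event as it arrives. It assumes a plain logistic regression trained with stochastic gradient descent; the class name, feature names, and learning rate are illustrative assumptions, and the source does not describe the model Twitter actually uses.

```python
import math
from typing import Dict


class OnlineEngagementModel:
    """Logistic regression updated one event at a time (hypothetical sketch)."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate
        self.weights: Dict[str, float] = {}

    def predict(self, features: Dict[str, float]) -> float:
        """Return the estimated probability that the user engages with the ad."""
        score = sum(self.weights.get(name, 0.0) * value
                    for name, value in features.items())
        score = max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features: Dict[str, float], engaged: bool) -> None:
        """Apply one gradient step as soon as the outcome is observed."""
        error = self.predict(features) - (1.0 if engaged else 0.0)
        for name, value in features.items():
            current = self.weights.get(name, 0.0)
            self.weights[name] = current - self.learning_rate * error * value


model = OnlineEngagementModel()
for _ in range(200):
    # Simulated event stream: sports ads get clicks, finance ads do not.
    model.update({"sports": 1.0}, engaged=True)
    model.update({"finance": 1.0}, engaged=False)
```

An offline pipeline would instead collect these events, retrain on a schedule, and redeploy, so a new campaign or a shift in interest would only be reflected after the next batch run.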

How do you know that architecture is—or isn’t—working?

Operational pain, developer aversion to making changes, and costs that don’t scale with the way the company makes money are all signals we look for to decide whether architecture needs to change.

— Jeremy Tryba, Foursquare

The two questions I ask myself when evaluating architecture are, “Is it adoptable?” and, “Does it support business needs?” Architecture is an enabler, so its adoption needs to support a business’s needs and vision.

— Gert Drapers, Hulu

In post-production, we look to see if the architecture is not meeting the desired goals, if there’s no scope for scaling, if adding a feature request is neither simple nor possible to accomplish in the required time, and—if it’s a web service—if turnaround time is more than expected. In preproduction, prototyping can help us make a judgement call. This means defining both success and proxy metrics, as well as experimenting with a certain percentage of traffic and analyzing the results.

— Smita Wadhwa, Twitter
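
One common way to experiment "with a certain percentage of traffic" is deterministic hash-based bucketing, sketched below in Python. The function name, the experiment label, and the choice of SHA-256 are assumptions made for illustration; the source does not say how Twitter assigns traffic to experiments.

```python
import hashlib


def in_experiment(user_id: str, experiment: str, percent: float) -> bool:
    """Return True if this user falls inside the given percentage of traffic.

    Hashing the experiment name together with the user ID keeps assignment
    stable across requests and independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    return bucket < percent * 100  # e.g. 10% maps to buckets 0..999


# Route roughly 10% of a simulated user population to the prototype.
exposed = sum(in_experiment(f"user-{i}", "new-architecture", 10.0)
              for i in range(10_000))
```

Success and proxy metrics would then be compared between the exposed group and everyone else before deciding whether the prototype goes further.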

What key elements go into your architectural decision-making?

Don’t reinvent the wheel. Make it easy to make changes. Don’t engineer to problems you don’t have yet. Make sure your choices are intentional instead of accidental.

— Jeremy Tryba, Foursquare

Architectural decisions rely on the constant alignment of requirements, feasibility, and implementation. In other words, they are product management, engineering, and architecture. As an architect, you are the glue between these three things.

— Gert Drapers, Hulu

The metrics evolve based on the ask or state of the business. In addition to Twitter’s first principles, mentioned earlier, I would prefer to have minimal to no cross-organization interdependency. I would also evaluate the infrastructure cost of the various design choices, such as keeping data with short retention off the Hadoop Distributed File System, better partitioning of the data to be stored in expensive caches or databases, alerting on under-provisioned systems for dynamic reallocation, and more.

— Smita Wadhwa, Twitter

Do you have formal architect roles within your organization?

We have staff engineers who serve as architects for their teams. Our staff engineers also spend significant time implementing features. We value having experienced engineers who can shape the architecture, but we’re still small enough that everyone can dogfood what they build.

— Jeremy Tryba, Foursquare

Hulu has two types of architect roles: domain-level software architects, who are embedded within teams responsible for a specific end-to-end domain (and its cross-domain interactions), and organization-level architects, who are focused on big-picture concerns as well as long-range technical strategy. These two groups form a virtual architecture community whose collective wisdom and expertise we tap into as an organization, like when solving integration challenges such as live linear video processing.

— Gert Drapers, Hulu

No. Senior ICs largely fill the roles of architects. What matters most to Twitter is that those ICs are in charge of (and doing a good job at) long-term strategic technical thinking.

— Smita Wadhwa, Twitter

What is the most significant architectural change your organization has undertaken?

We’re completing our migration from a data center to AWS this year. For offline processing, this has meant shifting from a few long-lived Hadoop clusters to transient EMR clusters. This has required us to take a new look at how we think about the efficiency of our YARN applications.

— Jeremy Tryba, Foursquare

The most recent was the shift toward a platform architecture in which autonomous and isolated services are abstracted through API interfaces—a refinement and formalization of an earlier containerization effort. The goal of this platform initiative is to decouple services, enable independent evolution, and allow better functional composition.

— Gert Drapers, Hulu

An ads project prompted the decision to split Twitter’s ad-serving monolith into horizontal and vertical microservices. We planned horizontal splits for selecting and ranking ads, and vertical splits to support multiple demand types, like bidders. We hoped to make systems more scalable, add new features with minimal effort, reduce code dependency between organizations, and make faster releases possible. It’s still a work in progress, but we’ve made good headway.

— Smita Wadhwa, Twitter
