Architecting privacy

How to protect user data at every layer of a web application.
Part of Issue 12, February 2020: Software Architecture

It’s not PII
There’s no way it’s PII
It was PII
— a GDPR data privacy haiku

With sweeping confidence, we book doctor’s appointments, make travel arrangements, undertake financial planning, and consume entertainment online. We do so knowing that unauthorized access to personal data has the potential to negatively impact our lives, whether it’s our credit card data targeted for financial fraud or Social Security numbers used for identity theft. We do so knowing that our address and location information, in the wrong hands, can be used for physical or psychological harm.

So while we’re quick to give personal data to online applications, we expect privacy in return. This puts the responsibility on software designers, developers, and operators to treat personally identifiable data with integrity as we architect applications for the web. Through our work, we can protect people’s privacy at each layer of the application stack. Architecting software that protects user data requires that developers actively think about that data, both during the design phase of a new product and while making major changes to an existing one.

My background is in engineering: I’ve worked in TechOps, data reliability engineering, and SRE, as well as in leadership roles in these areas. It’s given me a unique perspective—close to code and engineers, while keyed into legal, security, and compliance obligations. But because I didn’t come to my current role as Fastly’s vice president of data governance through a traditional infosec, compliance, or legal path, I don’t think of privacy protection as solely belonging to any one of those domains. Architecting comprehensive privacy controls for an application shouldn’t be the responsibility of any one person or department. No single department—much less individual—has all the skill sets needed to do so properly. Privacy by design must be the work of an organization, and it must start early.

Too often privacy work is siloed or reactive. Security controls for personal, financial, or other sensitive data are first scoped to specific environments to meet compliance obligations, like the Payment Card Industry Data Security Standard or the Sarbanes-Oxley Act. Then they’re left to administrators to enforce during the implementation and operational stages. But traditional means of data governance and privacy through security controls (like having one owner for security architecture, or waiting for the implementation phase to secure data) may be too slow or too onerous for an engineering team to adopt in their agile processes.

Not taking action at the software design phase might conserve resources in the short term, but it can lead to data-privacy vulnerabilities and, in turn, distrust in your products or services in the long term. If your organization doesn’t delegate security architecture to a single owner, engineers can and should protect customer-facing endpoints, secure privacy-related data in transit, encrypt it at rest, provide minimum internal access, and review these practices regularly. Privacy patterns can help translate privacy by design into a software’s architecture. Design solutions to common privacy problems—such as location granularity and tracking protection—can be found at privacypatterns.org, where they’re offered under a Creative Commons Attribution license. And discussing data obligations with a privacy professional—a data protection officer, governance team, or legal or security expert—up front may avoid a situation later where deleting data is no longer possible or is more operationally troublesome.
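As one illustration of what encrypting privacy-related data at rest can look like, here is a minimal sketch of field-level encryption using the Python cryptography library’s Fernet interface. The record shape and the store_record() helper are hypothetical stand-ins for a real datastore, and key management is deliberately out of scope.

```python
# Minimal sketch: encrypt privacy-related fields before storage.
# store_record() and the field names are hypothetical stand-ins.
from cryptography.fernet import Fernet

# In production the key would come from a secrets manager or KMS,
# never generated ad hoc or checked into source control.
key = Fernet.generate_key()
fernet = Fernet(key)

def store_record(record: dict) -> None:
    """Hypothetical datastore write."""
    print("stored:", record)

def save_user(email: str, mailing_address: str) -> None:
    # Only ciphertext reaches the datastore, so a leaked backup or an
    # over-broad internal query exposes nothing readable without the key.
    store_record({
        "email": fernet.encrypt(email.encode()).decode(),
        "mailing_address": fernet.encrypt(mailing_address.encode()).decode(),
    })

save_user("user@example.com", "123 Main St, Springfield")
```

Pairing encryption at rest with minimum internal access means even routine operational tooling sees only ciphertext unless it has a documented reason to hold the key.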

To better promote agile development and to better share responsibility, these obligations can be distilled into principles that can be applied at every layer of the stack, in any environment, and in everyday decision-making by any engineer. For me, these principles are: Don’t collect data in the first place; collect the minimum necessary amount of data; drop data you don’t need; anonymize and aggregate all personally identifiable information (PII); determine if your data can be used (in whole or in part) to reverse engineer PII; delete raw data as soon as possible; programmatically manage retention and data-storage practices; understand the business or technical reasons why you collect data; and be transparent about your data-collection practices. Above all, data use must always be balanced against the protection of the individuals whose data is being collected.
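To make one of these principles concrete—programmatically managing retention—here is a minimal sketch of a scheduled job that deletes raw records once they outlive a declared window. The table name, schema, and 90-day window are assumptions for illustration, not a recommendation.

```python
# Minimal sketch: programmatic retention enforcement.
# The table name, schema, and 90-day window are illustrative only.
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # hypothetical policy window

def purge_expired_raw_data(conn: sqlite3.Connection) -> int:
    """Delete raw records older than the retention window; return the count."""
    cutoff = (datetime.now(timezone.utc) - RETENTION).isoformat()
    cur = conn.execute("DELETE FROM raw_events WHERE collected_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

# Demo: one fresh record survives, one stale record ages out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (payload TEXT, collected_at TEXT)")
now = datetime.now(timezone.utc)
conn.execute("INSERT INTO raw_events VALUES (?, ?)", ("recent", now.isoformat()))
conn.execute("INSERT INTO raw_events VALUES (?, ?)",
             ("stale", (now - timedelta(days=120)).isoformat()))
print(purge_expired_raw_data(conn), "expired record(s) deleted")
```

Running a job like this on a schedule makes deletion a routine system behavior rather than a manual cleanup someone has to remember.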

At Fastly, we used a privacy-by-design model in developing the architecture, implementation, and product documentation for Fastly Insights, an optional service that monitors and analyzes network performance characteristics as well as the overall state of the internet. By design, we agreed that the sole focus of this service is to observe network performance in pursuit of making the internet work better for our customers and their customers. We also agreed that, in doing so, Fastly Insights does not need to collect personal data. Because we employed the techniques mentioned above in the software design, this optional service lets us learn more about the performance of end-user clients on the internet without unnecessarily compromising individual user privacy online.

When someone working on Fastly Insights asks to add a new data field, we first try to understand why the additional data is necessary to capture, whether the collection adheres to the promises made in our documentation, and whether we feel confident that an individual’s personal or behavioral data won’t be collected in the process. If we decide to increase the amount of metadata we collect from client connections, we then explicitly document our findings for end users on our website. For example, an engineer may want to process all the geographic data from an HTTP request, but that could yield location information specific enough to identify an individual. In our discussions, we’d first weigh the need for whole-region geographic data against more granular location data. Then, we’d determine which choice least compromises personal data while still contributing to the service’s overall purpose: making the internet faster through client-specific routing decisions.
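A minimal sketch of that granularity trade-off: coarsen location data before anything is stored, so what remains is useful for routing decisions but too imprecise to identify a person. The truncation boundaries below (/24 for IPv4, /48 for IPv6, one decimal place of latitude and longitude) are illustrative assumptions, not a standard.

```python
# Minimal sketch: coarsening location data before storage.
# The truncation boundaries are illustrative assumptions.
import ipaddress

def coarsen_ip(ip: str) -> str:
    """Zero the host bits so the address identifies a network, not a person."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(net.network_address)

def coarsen_location(lat: float, lon: float) -> tuple[float, float]:
    # One decimal place is roughly 11 km of precision: enough to choose a
    # nearby point of presence, too coarse to single out a household.
    return round(lat, 1), round(lon, 1)

print(coarsen_ip("203.0.113.42"))              # -> 203.0.113.0
print(coarsen_location(37.77493, -122.41942))  # -> (37.8, -122.4)
```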

Around the globe, governments and industries are moving quickly to regulate the handling of personal information online. (You may be ready to meet EU General Data Protection Regulation requirements, but are you up to speed with the California Consumer Privacy Act?) If a company has a customer-facing presence that collects or processes personally identifiable data, it’s imperative that people in all roles think deeply about these issues, and dedicate resources to them, regardless of the company’s size. Consider setting up a data-governance program that regularly brings relevant stakeholders together: engineers, managers, compliance and legal analysts, HR, business representatives, and any others who work directly with data. This forum can ensure multidirectional, real-time communication about data use and privacy obligations.

Take time to model your system—especially the collection, processing, and storage of data, as well as the flows between these elements—and include a privacy professional in a dialogue before implementation. Designing software that protects user privacy is a group effort, and it begins with software architecture. Privacy by design can make the web what it should be: positive and safe.
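As a closing illustration, the modeling exercise above can start very small. This hedged sketch uses a hypothetical flow map—the components, fields, and classifications are all invented for illustration—that tags each field so the question “does PII cross this boundary?” can be asked, with a privacy professional in the room, before implementation begins.

```python
# Minimal sketch: a lightweight data-flow model for pre-implementation review.
# Components, fields, and classifications are hypothetical.
from dataclasses import dataclass

@dataclass
class DataField:
    name: str
    classification: str  # e.g., "pii", "derived", "anonymous"

@dataclass
class Flow:
    source: str
    sink: str
    fields: list[DataField]

flows = [
    Flow("browser", "collector", [DataField("client_ip", "pii"),
                                  DataField("rtt_ms", "anonymous")]),
    Flow("collector", "warehouse", [DataField("region", "derived"),
                                    DataField("rtt_ms", "anonymous")]),
]

# The review question the model makes easy to ask: where does PII flow?
for flow in flows:
    pii = [f.name for f in flow.fields if f.classification == "pii"]
    if pii:
        print(f"{flow.source} -> {flow.sink} carries PII: {pii}")
```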

About the author

Lisa Phillips has worked for nearly 25 years in tech and database operational roles at, among other places, LiveJournal, Six Apart, and Twitter—where she helped kill the Fail Whale. She is currently vice president of data governance at Fastly, an edge cloud provider.

@lisaphillips
