The Metropolitan Museum of Art is a 150-year-old museum in New York City with an encyclopedic collection of art that spans 5,000 years of human culture and history. As the world has become increasingly digital and mobile, The Met has evolved too, with rich digital content offerings like MetKids and the Timeline of Art History and an online collection featuring nearly half a million artworks. Our challenge has been how to go beyond the walls of the museum, expanding the reach and impact of these works by bringing our collection to a global digital audience. To meet that challenge, The Met uses both partnerships and platforms to connect with audiences wherever they are.
The Met’s Open Access program launched in 2017 with over 375,000 public domain images from our collection made freely available under a Creative Commons CC0 license. A collaborative effort between the museum’s digital and legal departments, the program—years in the making—required approval from the director’s office and trustees. Its goal was to make the museum’s public domain images and data as accessible as possible, allowing artists, designers, developers, students, and scientists to use, analyze, and remix the content without restriction.
When the program launched, the images were available as part of our website, and the collection metadata was available as a public CSV. But the collection development team—three developers working within The Met’s 60-plus-person digital department—knew that, while CSVs are great for data analysis, we would need to make the museum’s data available in a more machine-readable format in order to maximize our reach. Without the ability to transfer content and data at scale, we wouldn’t have the tools we needed to grow partnerships or provide content to third-party platforms. An API was the logical solution.
We knew we needed a way to share consistent collection metadata, like dates, artist names, and object dimensions, across our digital products. The source of truth for collection data at The Met is our collections cataloguing system, which allows museum staff to maintain scholarly information about the works in our collection. The new data source needed to be built as close to this system as possible and serve as a one-stop data shop for everything digital at the museum. We have an extract, transform, load (ETL) system called CRD (Collection Research Database) that pulls relevant data from the cataloguing system on a nightly basis and houses the display logic for much of what you see on the website’s collection pages. Ideally we would have begun by replacing or refactoring CRD in order to access the data directly from our cataloguing system, but in the interest of speed we decided to use CRD as the foundation. We felt we could address CRD later if needed.
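A nightly ETL pass like CRD's can be sketched in a few functions: extract pulls the relevant fields from the cataloguing system, transform applies shared display logic (such as formatting date ranges), and load replaces each object's entry in the common store. Every name and field below is a hypothetical illustration, not The Met's actual schema.

```python
# Minimal sketch of a nightly extract-transform-load pass in the spirit of CRD.
# All field and function names here are illustrative, not The Met's real schema.

def extract(cataloguing_rows):
    """Extract: pull only the fields downstream products need."""
    for row in cataloguing_rows:
        yield {
            "object_id": row["id"],
            "title": row["title"],
            "artist": row.get("artist_raw", ""),
            "begin_date": row.get("begin_date"),
            "end_date": row.get("end_date"),
        }

def transform(record):
    """Transform: apply the display logic shared by every consumer."""
    begin, end = record["begin_date"], record["end_date"]
    if begin is not None and end is not None and begin != end:
        record["object_date"] = f"{begin}–{end}"
    elif begin is not None:
        record["object_date"] = str(begin)
    else:
        record["object_date"] = "n.d."
    return record

def load(records, store):
    """Load: replace each object's entry in the shared data store."""
    for record in records:
        store[record["object_id"]] = record
    return store

store = {}
rows = [{"id": 1, "title": "Study of a Chair", "artist_raw": "Unknown",
         "begin_date": 1750, "end_date": 1775}]
load((transform(r) for r in extract(rows)), store)
```

Because display logic lives in the transform step, every consumer of the store renders dates the same way without reimplementing the rules.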
With the long-term goal of building a public API, we built an internal, private API for our collection object page, carving it out of the existing website as a separate codebase. Separating it from the rest of the site not only let us deploy independently, but gave us a clearly defined use case to drive development of the API.
We opted for a RESTful service with JSON responses, using current web standards to ensure a wide range of systems could consume it. This decision paid dividends: Our frontend developers were able to get started easily, and we saved weeks of work when we replatformed the Timeline of Art History on our website. Because everything uses a common API, the artwork metadata and images are updated and in sync with all other API-driven digital products every 24 hours. It also enables internal projects like Chrome extensions and Slack bots that display random artworks, as well as tools for saving artwork to Pinterest.
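Because the API serves plain JSON over HTTPS, a consumer only needs an HTTP client and a JSON parser. The object endpoint and field names below match the API's published schema, but the payload here is a trimmed, illustrative sample rather than a verbatim response, so this is a sketch of the parsing step only.

```python
import json

# The Met Collection API returns plain JSON over HTTPS, e.g.
#   GET https://collectionapi.metmuseum.org/public/collection/v1/objects/{objectID}
# The payload below is a trimmed, illustrative sample, not a verbatim response.
SAMPLE_RESPONSE = json.dumps({
    "objectID": 436535,
    "title": "Wheat Field with Cypresses",
    "artistDisplayName": "Vincent van Gogh",
    "objectDate": "1889",
    "primaryImage": "https://images.metmuseum.org/placeholder.jpg",
})

def display_fields(payload: str) -> dict:
    """Pick out the fields a collection page or Slack bot would render."""
    obj = json.loads(payload)
    return {
        "title": obj.get("title", ""),
        "artist": obj.get("artistDisplayName", ""),
        "date": obj.get("objectDate", ""),
        "image": obj.get("primaryImage", ""),
    }

fields = display_fields(SAMPLE_RESPONSE)
```

A Chrome extension, Slack bot, and website page can all share this one parsing step, which is what keeps every API-driven product in sync.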
It wasn’t until we had the private API that we could really consider how to work with partners and platforms. At the time, one of our partners, Google Arts & Culture, was creating online collections from many institutions. While The Met had been contributing for a number of years, our manual process meant we’d only uploaded 700 images. When we began to build the public API, we worked with Google Arts & Culture to identify the key endpoints, data fields, and functionality the platform needed to ingest The Met’s Open Access images and data. Together, we ensured the endpoints and schema had everything future consumers or partners might need, and we were able to stress test the API. Once our API was in place, we became the largest collection on the platform.
In October 2018, we launched The Met Collection API and announced the API-based integration with Google Arts & Culture. This was a foundational step toward making the museum’s collection one of the most accessible and useful on the web and creating a pipeline to new partnerships and audiences.
What makes The Met’s API stand out among other museum APIs is not only the breadth and depth of the collection it supports, but also the fact that it is under active development, is refreshed nightly, and links to other public data sets. Our commitment to treating this as a living API means we continue to focus on making our data available to the widest possible audience and using it to scale our reach. We get nightly updates for free because we used CRD as our data source, which refreshes every 24 hours as changes are made to our cataloguing system. Meanwhile, by providing links to other data sets, the collection can be represented in a wider context. To lower the barrier to entry and encourage experimentation, the API has always been open, with no authentication required. (We didn’t want to build the infrastructure for authentication without a clear need, and we’re able to control traffic to the site without it.)
Since launching the API, the collection development team has actively sought to collaborate with more corporations and academic institutions to further expand and improve upon it. In collaboration with Microsoft, we’ve explored how the addition of a keyword data set and artificial intelligence can reveal new connections between works of art. For instance, we were able to trace the user journey from searching for the painting Washington Crossing the Delaware to exploring the work of Edmonia Lewis, one of the first African-American and Native American sculptors of international renown. We’re also working with data visualization students from Parsons School of Design to give them the opportunity to explore our works, deepen their understanding of the collection, and interpret the data in new and creative ways.
Each collaboration has provided further insight into what data might make the API even more useful. For example, there are many ways to describe an artwork, and museums differ in how they spell artists’ names and the terminology they use to describe subject matter. At The Met, for instance, Rembrandt is known as Rembrandt (Rembrandt van Rijn). Elsewhere he’s known as Rembrandt Harmensz. van Rijn, Rembrandt Harmenszoon van Rijn, and Rembrandt van Rhyn, among other variations. Chairs, similarly, might be called side chairs, armchairs, fauteuils, bergères, sgabellos, or caquetoires depending on the time period or country of origin. Using OpenRefine, we recently added links to the Getty Research Institute’s Union List of Artist Names and Art and Architecture Thesaurus. These databases provide standardized, controlled vocabulary for art researchers, making it easier to connect with other data sets using the same standards and rendering our data more accessible. We’ve added links to Wikidata items where they exist for artists and artworks in our collection, as well as identifiers for works created by women.
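The reconciliation work described above (mapping variant spellings to a single controlled-vocabulary record, as an OpenRefine pass against the Getty ULAN does) can be sketched as a normalize-then-lookup step. The variant table and matching logic below are simplified illustrations; the ULAN identifier shown is a placeholder for whatever the authoritative lookup returns.

```python
import unicodedata
from typing import Optional

# Sketch of reconciling artist-name variants to one controlled-vocabulary
# record, in the spirit of an OpenRefine pass against the Getty ULAN.
# The ULAN identifier below is an illustrative placeholder, not a verified ID.
VARIANTS = {
    "rembrandt (rembrandt van rijn)": "ulan:500011051",
    "rembrandt harmensz. van rijn": "ulan:500011051",
    "rembrandt harmenszoon van rijn": "ulan:500011051",
    "rembrandt van rhyn": "ulan:500011051",
}

def normalize(name: str) -> str:
    """Fold case and strip accents so near-identical spellings compare equal."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return " ".join(folded.lower().split())

def reconcile(name: str) -> Optional[str]:
    """Return the controlled-vocabulary ID for a variant spelling, if known."""
    return VARIANTS.get(normalize(name))
```

With every variant resolving to one identifier, data sets that use the same Getty standards can be joined on that identifier instead of on fragile string matches.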
The API has also afforded us the opportunity to work with the data science community in the fields of computer vision and AI. Last year, researchers used our API to create a data set for a machine learning competition on Kaggle, a platform for data science challenges. The competition focused on fine-grained attributes, which go beyond basic object recognition to identify specific subject matter and individual characteristics. We’d hoped for 10 to 20 participants, and were surprised and delighted when over 500 teams participated in the three-month competition. This reflected an interest we hadn’t anticipated and showed how the API could have an impact on a new field far beyond traditional art history. These learnings will likely benefit museums in turn: A museum might, for instance, harness an artwork-specific AI model to generate subject keywords for their objects or aid in searching online collections.
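Turning API responses into a training set for this kind of competition amounts to flattening object records into (object, image, label) rows and dropping objects without images. The records and tag values below are invented for illustration; only the general shape follows the API's object responses.

```python
# Sketch of turning API-style object records into rows for a fine-grained
# attribute data set. Records and tag values here are illustrative.
records = [
    {"objectID": 1, "primaryImage": "https://example.org/1.jpg",
     "tags": ["chair", "fauteuil"]},
    {"objectID": 2, "primaryImage": "", "tags": ["portrait"]},  # no image: skip
]

def to_dataset(records):
    """Flatten object records into (objectID, image URL, label) rows."""
    rows = []
    for rec in records:
        if not rec["primaryImage"]:
            continue  # a training example needs an image
        for tag in rec["tags"]:
            rows.append((rec["objectID"], rec["primaryImage"], tag))
    return rows

dataset = to_dataset(records)
```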
We’ve also worked with Wikimedia to refine AI capabilities. We first partnered with them in 2017 when we launched our Open Access program, working together to disseminate our Open Access images and data through their platforms and ultimately uploading about 300,000 images to Wikimedia Commons. The Met’s API was also the focus of a 2018 hackathon that brought together people from the museum, Microsoft, MIT, and Wikimedia, and produced a number of innovative prototypes ranging from voice recognition to generative adversarial networks (GANs). Members of the Wikimedia community created a Wikidata game based on the API images that let users verify AI-generated subject tags for the artworks in our collection. Through a very simple interface, users were asked to accept or reject suggested AI tags, which helped train the AI model. The tool had the added benefit of generating a new Wikidata entry for each accepted suggestion, creating another opportunity for people to access the artwork. Initially the tool included only Met artworks, but Wikimedia has since expanded it to include works from other museum collections as well. We continue to partner with Wikimedia to encourage people to create new pages and edit existing ones using Met artworks, hosting four Wikipedia edit-a-thons in 2019 alone.
When we began this project, we set out to make our collection the most accessible, discoverable, and useful one on the web. Since the launch of our Open Access program and API, one of The Met’s key metrics has become the number of users accessing our API. In the API’s first quarter, Q3 2018, we averaged 688 distinct IP addresses and 2,466,106 requests per month. By May 2020, that had grown to 4,522 distinct IP addresses and 6,466,020 requests. We also measure traffic to Met content on third-party platforms.
The API is an effective and scalable way for developers to access our data and work with the images in our collection. It has been rewarding to be able to place art in new contexts, like machine learning and computer vision, and provide new perspectives on works in the collection, such as color palettes and scale. It demonstrates the countless ways art can inspire and connect us to knowledge, creativity, and new ideas. We are continually surprised and delighted by what happens when people bring their own questions, perspectives, and context to a collection that spans millennia.