Designing automated tests for React

Practical insights from Mailchimp’s approach to frontend testing.
Part of Issue 10, August 2019


Why write automated tests? Because we want to make sure our expectations hold true over time. Tests are maintenance: Long after your code is written, people will edit it. Tests are also documentation: They record your code’s expected usage and behavior. An ideal test only fails when behavior unintentionally changes.

When I joined Mailchimp’s frontend platform team in August 2018, they had no JavaScript tests in use at all. The frontend team is relatively small, and tests are new to many frontend engineers, so we wrote a test design philosophy and created common ground through constant pairing and pull request review. Our first test was merged in November 2018, and hundreds have followed. We’ve come up with testing approaches that work for us. Maybe some will work for you too.

What to test

At its core, a test checks whether something behaves as expected.

Functions

Slot in input (arguments, scope) and a function will rattle out output (return value, side effects). Tests verify that the output matches your expectations.

expect(add(-1, 4)).toBe(3); 

Sometimes functions manipulate the environment they’re in. You see that most often when functions access global scope or browser APIs like fetch, localStorage, or the DOM.

disableInternalTools();
expect(window.localStorage.getItem('internal-tools:enabled')).toBe('false');

Want to write tests for all possible combinations of inputs? I’ll see you in five years. Instead, you’ll have to find the right balance of edge cases and failure modes and the inputs that cause them.
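To make that concrete, here’s a minimal sketch in plain JavaScript. The `clamp` helper is hypothetical; the point is choosing one representative input per interesting region rather than enumerating every combination:

```javascript
// Hypothetical helper: clamp a value into the range [min, max].
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Instead of all possible numbers, pick one input per interesting region:
// below the range, inside it, above it, and at each boundary.
const cases = [
  [-5, 0, 10], // below min
  [5, 0, 10],  // within range
  [15, 0, 10], // above max
  [0, 0, 10],  // at min boundary
  [10, 0, 10], // at max boundary
];

const results = cases.map(([value, min, max]) => clamp(value, min, max));
console.log(results); // [0, 5, 10, 0, 10]
```

Five cases cover the behavior that matters; a sixth random in-range number would add maintenance cost without catching new failure modes.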

Classes

At Mailchimp, classes are rare. Developers usually use classes as implementation details for something else. So we often don’t test the class directly; we test the code that uses it.

React components

A developer renders a component, then end users interact with it. Components take input (props, user interaction) and output React elements and side effects. The basic structure of your test case is: If the end user does something (input), something happens (output).

Note how I didn’t mention state at all. We test in terms of how the end user sees it—through the interactive user interface. State is something internal to the component. Under the hood, state might come from a React class component, React hooks, or somewhere else. That implementation detail doesn’t matter; only the component’s output does.

User interaction

Given a <Dropdown> component, we can test user interaction. To test its toggle behavior, we can render the component, find the toggle element, then click it. In order to find the toggle element, we must think like an end user. As end users interact with content on our site, they scan for particular strings of text: add to cart, edit shipping address, submit, your action was successful. It’s that text we can query for. Sometimes code speaks better than words:

const { getByText, queryByText } = render(
  <Dropdown toggle='Restaurant menu'>
    <DropdownItem>Scrumptious biscuits</DropdownItem>
  </Dropdown>
);

// Verify that Dropdown items are hidden from the DOM
expect(queryByText('Scrumptious biscuits')).toBeNull();

// Click the Dropdown toggle
fireEvent.click(getByText('Restaurant menu'));

// Verify that the Dropdown items now appear in the DOM
await waitForElement(() => getByText('Scrumptious biscuits'));

Side effects

Sometimes the output of interacting with a React component is a side effect. For example, clicking a drop-down menu on the /about-cats page might result in a tracking function being called:

const { getByText } = render(<AboutCatsPage />);
const dropdownToggle = getByText('lord meowmuff');
fireEvent.click(dropdownToggle);
expect(track).toHaveBeenCalledWith('lord_meowmuff', 'click');

Accessibility

Text is the most accessible form of content. Not videos. Not podcasts. Text is involved in all the ways end users consume content: reading the site visually, hearing the website through a screen reader, using a tactile Braille display, and so on. Interacting with text provides a baseline level of accessibility. For instance, @testing-library/react exposes a getByLabelText function that errors if it can’t find the DOM element associated with a <label>, aria-labelledby, or aria-label.

Accessibility is often an afterthought, but if your components aren’t accessible, your application is broken. Accessibility-aware developers can write automated tests for interactions, which other developers often don’t consider but which are a way of life for disabled people, like managing focus as a modal pops in and out of view, or using arrow keys instead of a mouse to adjust a slider.

Accessibility streamlines automated testing. Assistive technology, like screen readers or voice navigation software, depends on semantic HTML and ARIA to make the web usable by disabled people. We can leverage those semantics in automated tests: We can restrict queries to particular sections of the page, categorized by ARIA roles and labels. We can find a DOM element through a relative’s ARIA properties. As the end user interacts with a component, we can check whether relevant ARIA states change:

const { getByText } = render(
  <Dropdown toggle='open menu'>
    <DropdownItem>First item</DropdownItem>
  </Dropdown>
);

const dropdownToggle = getByText('open menu');
expect(dropdownToggle).toHaveAttribute('aria-expanded', 'false');

// Clicking the toggle marks the Dropdown as expanded
fireEvent.click(dropdownToggle);

expect(dropdownToggle).toHaveAttribute('aria-expanded', 'true');

Visual regression testing

Finally, because components are rendered, they involve display. We can write a screenshot test to verify that opening a drop-down menu visually displays the drop-down items:

const { getByText } = render(
  <Dropdown toggle='open menu'>
    <DropdownItem>First item</DropdownItem>
  </Dropdown>
);

// Screenshot the Dropdown's closed state

// Open Dropdown; verify menu opened visually
const dropdownToggle = getByText('open menu');
fireEvent.click(dropdownToggle);

React context

Components that consume context receive that context from a provider, so the provider and consumer should be tested together. At one point, we had a React consumer test that mocked its React provider. A few months later, we deleted the entire test file. The problem with mocking a React consumer or its provider is that you put so much effort into creating the mock, and you still have to maintain the mock when things change. No one has the energy for that.

Testing the real consumers and providers together treats context as an implementation detail. As a bonus, you can write tests using real code a developer would use and real interactions an end user would perform.

React hooks

When testing a hook, we construct a test-only component that uses the hook, then interact with the component.

Usage of a hook is typically tucked away in the implementation details of a component. In application tests, we test the behavior of the components that use the hook.

Application code

Developers stitch together an application for end users to interact with. Application tests at Mailchimp are typically integration tests, like a button click opening a modal with certain properties, and verification of network effects.

Async code

Async code implies the passage of time. You must either let time pass naturally or mock the flow of time. Our approach depends on the public interface and the environment we’re operating in.

Writing end-to-end tests? The experience should be as close to real user interaction as you can get, so avoid mocking as much as possible.

Testing timers? Use a timer mock. If we’re testing the logic of a debounce function, then we can test that a specific amount of time has passed. That logic is relevant to developers. If a React component opens a menu after X seconds, an end user doesn’t care whether it happens after exactly X seconds. The end user only cares that it opens.
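To illustrate mocking the flow of time, here’s a sketch in plain JavaScript. A hand-rolled clock stands in for the timer mock (in real tests, Jest’s fake timers play this role), and the debounce accepts the clock so the test controls time explicitly. All names are illustrative:

```javascript
// A minimal manual clock standing in for a timer mock.
function createClock() {
  let now = 0;
  let timers = [];
  return {
    setTimeout(fn, delay) {
      timers.push({ fn, at: now + delay });
    },
    tick(ms) {
      now += ms;
      const due = timers.filter((t) => t.at <= now);
      timers = timers.filter((t) => t.at > now);
      due.forEach((t) => t.fn());
    },
  };
}

// Debounce whose timer source is injectable, so tests control time.
function debounce(fn, wait, clock) {
  let latest = 0;
  return (...args) => {
    const id = ++latest;
    clock.setTimeout(() => {
      if (id === latest) fn(...args); // only the most recent call fires
    }, wait);
  };
}

const clock = createClock();
const calls = [];
const save = debounce((value) => calls.push(value), 100, clock);

save('a');
clock.tick(50);
save('b');      // restarts the 100ms window
clock.tick(99); // 'a' is superseded; 'b' is not yet due
console.log(calls); // []
clock.tick(1);
console.log(calls); // ['b']
```

The assertions here are about developer-facing logic (exactly how much time must pass), which is precisely when a timer mock earns its keep.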

Testing animations? If the animation is used in the context of something else, like application code, we mock the animation with its end state. If we’re testing the animation itself, we test that animation in isolation.

Testing fetch requests? Use a fetch mock or generate fake data. Code here is kept asynchronous, but it resolves in the next tick. With fetch requests, you may want to test loading, failure, and success modes, or abstract those concerns away.
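Here’s a sketch of that approach in plain JavaScript, with hypothetical names throughout: the code under test accepts a `fetchFn`, and the test passes a fake that stays asynchronous but resolves on the next tick, letting us exercise success and failure modes without a network:

```javascript
// Hypothetical data loader; the fetch function is injected so tests can
// swap in a fake (a fetch mock serves the same purpose).
async function loadCats(fetchFn) {
  try {
    const response = await fetchFn('/cats');
    if (!response.ok) return { status: 'failure', cats: [] };
    return { status: 'success', cats: await response.json() };
  } catch (error) {
    return { status: 'failure', cats: [] };
  }
}

// Fakes stay asynchronous but resolve on the next tick.
const fakeSuccess = async () => ({ ok: true, json: async () => ['mittens'] });
const fakeFailure = async () => ({ ok: false });

loadCats(fakeSuccess).then((result) => console.log(result.status)); // success
loadCats(fakeFailure).then((result) => console.log(result.status)); // failure
```

The loading state would be tested the same way, by asserting on what the caller observes before the fake resolves.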

And so on...

At Mailchimp, we test our Webpack plugins, compilers, and ESLint plugins as black boxes, with source code fixtures and a set of expected outputs. They aren’t React, but they’re all part of our React ecosystem. And everything is tested from the perspective of the developers.

How to test

End users don’t care how your code is implemented; they only care whether your code works. Tests should reflect that. As such, tests should exercise the code’s public interface without reaching into the internals or implementation details.

Test the sum of your parts

The more complex your code gets, the more parts are composed together, like musical notes. But you don’t need to test every single thing. It’s often easier and more convenient to test the music as a whole rather than its constituent parts. When we test a Babel plugin, rather than test individual steps, we test the plugin as a whole by running it against code fixtures and testing that the output matches our expectations.
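The fixture approach can be sketched in a few lines of plain JavaScript. Here a toy `compile` step stands in for the real plugin pipeline, and real fixtures would live in files rather than inline:

```javascript
// Toy stand-in for a compiler pipeline: rewrites `var` to `let`.
// A real test would run the actual Babel plugin instead.
const compile = (source) => source.replace(/\bvar\b/g, 'let');

// Each fixture pairs input source code with the expected output.
const fixtures = [
  { input: 'var a = 1;', expected: 'let a = 1;' },
  { input: 'const b = 2;', expected: 'const b = 2;' }, // untouched code stays untouched
];

const failures = fixtures.filter(
  ({ input, expected }) => compile(input) !== expected
);
console.log(failures.length); // 0
```

Notice that no individual step of the pipeline is tested; only the whole, observable transformation is.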

Test some parts of the whole

Sometimes a part of a whole sees significant complexity, like an algorithm or a helper function with complex behavior. Take an isUrl(url) function—a function that takes a string and returns true if it’s a valid URL. That could certainly warrant tests; it’s difficult to keep track of all the edge cases.
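A sketch of such a helper and the edge cases worth pinning down, using the WHATWG URL parser available in Node and browsers (the real validation rules are up to you):

```javascript
// Hypothetical isUrl: accepts only absolute http(s) URLs.
function isUrl(url) {
  if (typeof url !== 'string') return false;
  try {
    const { protocol } = new URL(url);
    return protocol === 'http:' || protocol === 'https:';
  } catch (error) {
    return false; // not parseable as an absolute URL
  }
}

// Edge cases are easy to forget; tests keep track of them for you.
console.log(isUrl('https://mailchimp.com'));      // true
console.log(isUrl('http://localhost:8080/path')); // true
console.log(isUrl('ftp://example.com'));          // false (wrong scheme)
console.log(isUrl('/about-cats'));                // false (relative)
console.log(isUrl('not a url'));                  // false
console.log(isUrl(''));                           // false
```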

Use different approaches for library versus application code

Library code is meant to be reusable and flexible across multiple environments; its value is clear. To test library code, test the public interfaces that developers use. Anything not public is an implementation detail.

Application code, on the other hand, tends to be single-use, so value is less apparent on the surface. But in a changing codebase, business logic that almost never changes is valuable to test. If a completely unrelated team changes a minor API or upgrades a third-party library and business logic in your slice of the application breaks, your tests could catch that. To test application code, find the concrete event handlers that result in observable state changes.

Follow the glue

Glue code can lead to opportunities to test application code—your product. Let’s say we create a shared Button component in @mc/components/Button:

// @mc/components/Button.js
import React from 'react';

const Button = ({ children, onClick }) => {
  return (
    <button type="button" onClick={onClick}>
      {children}
    </button>
  );
};

export default Button;

The Button is part of the @mc library. An engineer wants to test that the onClick prop is passed through properly. Hooking up onClick is glue code: It’s abstract behavior, not specific behavior. We’re passing references. A test might look like this:

// @mc/components/Button.test.js
describe('Button', () => {
  it('calls "onClick" prop on button click', () => {
    const onClick = jest.fn();
    const { getByText } = render(
      <Button onClick={onClick}>
        Click me
      </Button>
    );

    expect(onClick).not.toHaveBeenCalled();
    fireEvent.click(getByText(/click me/));
    expect(onClick).toHaveBeenCalled();
  });
});

Does this test have value? Well, since the Button component is part of the @mc library, it probably does. We absolutely don’t want to break that contract—when an end user clicks a button, the onClick prop must be called.

Now, imagine this was a single-use button deep in application code. It’s a trivial part of the whole, so we probably wouldn’t test this button at all. A test for onClick in that context is merely a check for typos, and moreover verifies that React works. It works. Instead, we follow the glue up to a concrete definition of the onClick handler and test that business logic. For example, if a user clicks the “Adopt cat” button on the <AdoptionAgency> page, we can test for a network request to /cats/adopt-human.

How not to test

Testing has costs, too. As the codebase evolves, expectations change. The cost of writing tests and the cost of maintaining those tests compound over time. Tests that depend on implementation details may result in false negatives and false positives. Fragile test design can lead to tests that are hard to maintain and don’t give much value.

Avoid code coverage

If we test nothing, someone tasked to wade into a code swamp and change critical behaviors will experience anxiety about whether they missed something. But testing everything isn’t free either, and it yields diminishing returns. Even 100 percent coverage doesn’t mean you’ve explored all relevant edge cases. Stick to testing behavior.

Avoid implementation details

Tests shouldn’t break if we change code without changing its behavior. To avoid inappropriate breakage, write tests that treat the code as a black box. Don’t access private states or methods you have no business accessing. Test the public interface, not its implementation.

We’ve all written this kind of test before:

import React from 'react';
import { mount } from 'enzyme';
import Dropdown from '@mc/components/Dropdown';

const SimpleDropdown = () => (
  <Dropdown toggle="Open navigation">
    <a href="/">Home</a>
    <a href="/contact">Contact</a>
  </Dropdown>
);

describe('Dropdown', () => {
  it('opens when the user clicks the toggle button', () => {
    const wrapper = mount(<SimpleDropdown />);

    wrapper.find('Button').simulate('click');

    expect(wrapper.find(Dropdown).instance().state.isOpen).toBe(true);
  });
});
This test is brittle. Why? The test body verifies internal state rather than external behavior. If we refactor Dropdown to use React hooks internally, this code will fail. If Dropdown renames the state variable isOpen, this code will fail. If Dropdown swaps Button with a different component, this code will fail. Refactoring Dropdown makes the tests fail, even though the component behavior never changes.

This can partly be solved by tooling. enzyme lets you peek into React internals, making this style of test too easy. Switching over to @testing-library/react, we can treat the component as a unit, and our test now reads:

import React from 'react';
import { render } from '@testing-library/react';
import Dropdown from '@mc/components/Dropdown';

const onOpen = jest.fn();

const SimpleDropdown = () => (
  <Dropdown toggle="Open navigation" onOpen={onOpen}>
    <a href="/">Home</a>
    <a href="/contact">Contact</a>
  </Dropdown>
);

describe('Dropdown', () => {
  it('opens when the user clicks the toggle button', () => {
    const wrapper = render(<SimpleDropdown />);

    wrapper.getByText('Open navigation').click();

    expect(onOpen).toHaveBeenCalled();
  });
});

Dropdown is now a black box. We find the drop-down toggle the same way an end user would, by scanning for text: Open navigation. Then we trigger a click on the DOM element associated with that text. Lastly, we test whether the drop-down menu opened by checking the onOpen callback on its public interface. The test uses the Dropdown code as it’s meant to be used, and becomes more resilient once we’ve avoided implementation details.

Avoid mocking

We want to be sure that our expectations hold true over time. Mocks cause the test environment to further diverge from the production environment. We may not know when production is broken because we’ve mocked out that behavior in tests, so ideally we should mock the bare minimum needed to make a test work effectively.

Creating a mock is necessarily an implementation detail. But, though we aim to avoid implementation details, sometimes it’s the best trade-off. Here’s a rough heuristic.

Mock your code:

  • If your code calls functions in the global scope (for example, fetch, localStorage, setTimeout, et cetera).

  • To avoid nondeterminism: Often, mocking prevents unexpected nondeterministic behavior from occurring, like in dates, random numbers, third-party libraries, et cetera.

  • For fake data—especially data we get from some API or server.

Otherwise, exercise real code paths whenever possible.
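One lightweight way to apply the nondeterminism rule is to inject the nondeterministic source rather than mock a global. A sketch with a hypothetical greeting helper, where the clock is a parameter that tests can pin down:

```javascript
// Hypothetical helper; `now` defaults to the real clock, but a test can
// inject a fixed date, making the output deterministic.
function formatGreeting(name, now = () => new Date()) {
  const hour = now().getHours();
  return hour < 12 ? `Good morning, ${name}` : `Good afternoon, ${name}`;
}

// Production call: formatGreeting('Ada') — depends on the wall clock.
// Test call: pin the clock to 9am local time.
const atNineAm = () => new Date(2019, 7, 1, 9, 0, 0);
console.log(formatGreeting('Ada', atNineAm)); // Good morning, Ada
```

The same pattern works for random numbers and third-party clients: pass the source in, and let production code use the real default.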

Avoid Jest snapshots

Snapshots are rife with misuse. The way I see it, they have a number of drawbacks:

Lacking in intent. Reviewers need to review snapshots, too. But snapshot tests rarely have a clear intent, so they are often painful to review.

Fragile. Snapshots are commonly used to capture structure, not behavior. If you snapshot React components, they typically get serialized as HTML. Changing a component’s HTML doesn’t mean behavior changes, but the snapshot test will fail regardless.

Too easy to update. Because you can pass an option to update all snapshots at once, you run the risk of developers ceasing to analyze failures over time. If snapshot changes get too noisy, they’re useless.

Too much indirection. In order to see the full intent behind a snapshot test, you have to open another file and scroll to the correct case. Instead, be explicit about what you’re testing and co-locate the expected output. (This is somewhat mitigated by inline snapshots.)

Good snapshot tests have to make all that worth it. We use snapshots to test the output of compilers and renderers; the structure is what we want to test.

Test manually, too

Automated tests fish for regressions in an environment separate from production. You will miss something. That’s okay. If code review, feature flagging, and staging environments don’t catch it, companies typically have processes in place to fix bugs reported by end users in production. Blameless postmortems can identify how to prevent similar issues in the future. But you should still test your changes manually—especially if your company doesn’t have some of the aforementioned ceremony.

Automated tests for accessibility are limited. Many report failures when there aren’t any, and none can uncover all accessibility issues. At Mailchimp, we developed internal tools that automatically check the page for accessibility issues. But even with automatic accessibility checking in place, there’s no replacement for actually auditing your application with assistive technology and disabled people. If your application is inaccessible, your application is broken.

There are tools to make manual tests easier: A component playground like Cosmos or Storybook allows you to tinker on components in isolation. Hosted playgrounds make code review easier and benefit not just developers but also designers, product managers, and QA testers.

Finally, testing with end users is the ultimate verification. That’s when you go beyond “correctness” and validate whether this is the behavior your audience actually wants. Your application must be usable by a diverse population—disabled people, Black people, women, LGBTQ people, and so on.

Testing means trade-offs

Automated tests are harder to write than code. You’ll have to tinker and discover what fits. What mistakes do developers make that tests could catch? Asking that question determines which tests are most valuable to write. Discuss with your peers and question which expectations should hold true over time.

And remember, automated tests are a small part of a larger story. Writing tests is writing code, and maintaining code is work. The fewer automated tests you have, the less code there is to maintain or change. The tests we do write should be quick and easy to create. But finding the right balance is hard. At Mailchimp, our approach to testing is always evolving, and we might already be diverging from the methods described here. It’s trade-offs all the way down.

Further resources for the ambitious reader

Testing opinions are as varied as the stars in the sky, and each opinion is worth considering.


About the author

David Peter is a staff engineer at Mailchimp. He is a supporter of underdogs and a lover of cats.

