Table of Contents
I began working on visual regression testing on June 4th, 2024.
On August 5th, 2025—the day before my 31st birthday—I accepted all of a build’s screenshots for the first time. Thus ended 428 days of sporadic toil.
I’ve had the tests practically finalized for a while. Problem was, they were flaky. I tried reading Playwright documentation, tutorials, and best-practice
guides. I long conversed with AIs. I even offered to pay $400 so that a professional would help me tidy up. The response was—and I quote—“this is 100% a trap lol… I’ve debugged playwright before and it’s not worth $400.”
I was on my own, but hopefully I can transfer some of my painful learning. Here are the tricks I learned to keep my code clean, my tests reliable, and my site not visually regressed.

lost-pixel to examine and reject an unintended change.To get started, here are two best-practices guides which I recommend:
- Official Playwright best practices,
and - Say Goodbye to Flaky Tests: Playwright Best Practices Every Test Automation Engineer Must Know.

- Don’t wait for a set amount of time
- Both
page.waitForTimeoutandexpect.pollrely on explicit timings. You should almost always use a better alternative.
- Test approximate equality for scalars
- If you’re testing the
yposition of an element, useexpect(...).toBeCloseToinstead ofexpect(...).toBe. - Don’t run tests in parallel mode
- Parallelism
is supposed to work but it never did for me. Instead, I use dozens of shards on CI, each of which runs a few tests in sequence.
- Lint, lint, and then lint some more
- Linting is not a luxury. My Playwright struggles went from “hopeless” to “winning” when I installed
eslint-plugin-playwrightto catch Playwright code smells.
- Create a dedicated “test page”
- I can scroll my test page
and see nearly all of the site’s styling conditions. The page is a living document, expanding as I add new formatting features or remember additional edge cases.
- Debug failures using Playwright traces

- Traces let you inspect every moment of the test. You can see the state of the dom before and after every Playwright command. On CI, save the traces as artifacts and use the
retain-on-failureoption.
I ended up using the free lost-pixel app
to examine screenshot deltas and judge visual diffs. No matter what tool you use, though, you’ll want your screenshots to be targeted and stable.
- Targeted screenshots only track a specific part of the site, like the different fonts.
They don’t include e.g. the sidebars next to the fonts.
- Stable screenshots only change when the styling in question changes. For example, I often dealt with issues where a video’s loading bar would display differently in different screenshots due to slight timing differences—that is not stable. If the video didn’t appear at all, however, I would want the screenshot to reflect that.
It took me a long time to achieve these goals. Practically, I recommend directly using my visual_utils.ts. Here are screenshot lessons I learned:
- Stabilize screenshots using
toHaveScreenshot - Use
await expect(page).toHaveScreenshotinstead of
await page.screenshot. The first is much more robust. For example,toHaveScreenshotrepeatedly takes screenshots and waits for consecutive screenshots to be identical—automatically waiting for painting to finish. A lot of my externally loaded assets did not stably render until I usedtoHaveScreenshot—waiting fornetworkidleis not enough. -
When using
npx playwright test, make sure to pass in--update-snapshotsor else your CI will go “errr, there r no snapshot” and then error out. - Target screenshots to specific elements
- Instead of taking a screenshot of the entire page, I take a screenshot of e.g. a particular table. The idea is that modifying table styling only affects the table-containing screenshots.
- For elements with the
controlsattribute, scrub to the end - Embedded audio and video elements fetch a varying number of bytes before the test takes a screenshot. That varying number of bytes means a varying “loaded” portion of the loading bar, creating a flaky visual difference. Before each test, I now scrub each audio element to the end, ensuring the element is displayed as fully loaded.
-
In the loading bar, the medium shade displays how much data has been fetched. - Isolate the relevant dom
- While
toHaveScreenshotguarantees stability within a session, my screenshots were still wobbling in response to unrelated changes earlier in the page. For some reason, there were a few pixels of difference due to e.g. an additional line being present earlier in the page. -
I made a helper function which deletes unrelated parts of the dom. For example, suppose I have five
<span>s in a row. I want to screenshot the third<span>. The position of the first two<span>s affects the position of the third. Therefore, I edit the dom to exclude siblings of ancestors of the element I want to screenshot. I would then exclude the other four<span>s. - Mock the content
- When I take screenshots of site styling, they’re almost all of the test page content. The test page decouples site styling from updates to content around my site, ruling out alerts from “changed” screenshots which only show updated content.
- Know when to give up
- In my visual regression testing, there are five or so discrepancies between the CI screenshots and the local screenshots. I tried for at least an hour to fix each discrepancy, but ultimately gave up. After all, visual regression testing just needs to tell me when the appearance changes. I’ve just approved those screenshots and kept an explicit list of what’s different.
Find out when I post more content: newsletter
& rss
alex@turntrout.com
(pgp)