Visual Snapshots

Nick Hamze edited this page Apr 22, 2026 · 2 revisions

bin/snap.py lets agents (and humans) see what every theme actually looks like without uploading screenshots over chat. It boots each theme's WordPress Playground locally via @wp-playground/cli and drives Playwright Chromium across bin/snap_config.py::ROUTES × VIEWPORTS. For every cell it captures a screenshot plus diagnostic artifacts: rendered HTML, console messages, page errors, network failures, DOM-heuristic findings, axe-core a11y violations, computed dimensions for INSPECT_SELECTORS, and optional interactive states from INTERACTIONS. A tiered gate then classifies the result as pass | warn | fail. bin/check.py --visual is the recommended pre-commit gate.
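To make the ROUTES × VIEWPORTS matrix concrete, here is a minimal sketch of how the cells could be enumerated. The route slugs, URL paths, viewport sizes, and the `cells` helper are placeholder assumptions for illustration; the real values and shapes live in bin/snap_config.py and may differ.

```python
from itertools import product

# Hypothetical stand-ins for snap_config.ROUTES and snap_config.VIEWPORTS.
# The real entries (and their exact shapes) may differ.
ROUTES = [("home", "/"), ("checkout-filled", "/checkout/")]
VIEWPORTS = {"mobile": (390, 844), "desktop": (1280, 800)}


def cells(routes, viewports):
    """Yield one (route_slug, url_path, viewport_name, size) tuple per cell."""
    for (slug, path), (vp_name, size) in product(routes, viewports.items()):
        yield slug, path, vp_name, size


matrix = list(cells(ROUTES, VIEWPORTS))
for slug, path, vp_name, (width, height) in matrix:
    print(f"{vp_name:8} {width}x{height}  {slug} -> {path}")
```

Two routes and two viewports give four cells, which is why adding a single route or viewport to the config widens every theme's review.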

Quick reference

# First time? Verify all deps are ready before booting Playground.
python3 bin/snap.py doctor

# Capture one theme at every route/viewport
python3 bin/snap.py shoot chonk

# Just the desktop checkout (fastest inner loop)
python3 bin/snap.py shoot chonk --routes checkout-filled --viewports desktop

# Quick subset (snap_config.QUICK_*) -- fastest "did anything explode" sweep
python3 bin/snap.py shoot chonk --quick

# Smart sweep -- only re-shoot themes whose files moved in git
python3 bin/snap.py shoot --changed
# (framework changes under bin/* fall back to all themes)

# Capture every theme in parallel (~400MB RAM/worker, ~2x faster)
python3 bin/snap.py shoot --all --concurrency 2

# Boot a single theme and leave it running for interactive poking via the
# cursor-ide-browser MCP (or any browser) at http://localhost:9400/
python3 bin/snap.py serve chonk          # admin auto-login is enabled
                                          # for /wp-admin/ access

# Aggregate findings into reviewable markdown + the tiered gate verdict
python3 bin/snap.py report --open
# -> tmp/snaps/<theme>/review.md   per-theme triage with **GATE: …** badge
# -> tmp/snaps/<theme>/review.json per-theme JSON (gate + counts + routes)
# -> tmp/snaps/review.md           cross-theme rollup + parity drift
# -> tmp/snaps/review.json         overall gate + per-theme gates
# Final line: STATUS: PASS | WARN | FAIL

# Visual regression: compare current snaps to committed baselines
python3 bin/snap.py diff --all
python3 bin/snap.py diff chonk --threshold 0.5

# Promote latest snaps -> committed baselines (after reviewing diffs)
python3 bin/snap.py baseline --all
python3 bin/snap.py baseline chonk --route home --viewport desktop

# The single pre-commit gate: shoot + diff + report --strict, scoped to
# the themes that actually changed by default (--visual-scope=changed).
python3 bin/check.py --visual
# Full sweep before a release:
python3 bin/check.py --visual --visual-scope=all
# Smoke test for one theme + the QUICK_* subset:
python3 bin/check.py chonk --visual --visual-scope=quick
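The --changed behaviour noted above (re-shoot only themes whose files moved, but fall back to every theme when anything under bin/ changed) can be sketched as a small pure function. This is an illustration, not the code in bin/snap.py: it assumes each theme lives in a top-level directory named after the theme and that the changed paths come from something like `git diff --name-only`. The helper name and the `obel` directory name are hypothetical.

```python
def themes_to_shoot(changed_paths, all_themes):
    """Return the sorted themes to re-shoot for a list of changed repo paths.

    Assumption: a theme's files live under a top-level directory that
    carries the theme's name, and framework code lives under bin/.
    """
    all_themes = set(all_themes)
    touched = set()
    for path in changed_paths:
        top = path.split("/", 1)[0]
        if top == "bin":
            # Framework change: any theme could be affected, so shoot all.
            return sorted(all_themes)
        if top in all_themes:
            touched.add(top)
    return sorted(touched)


print(themes_to_shoot(["chonk/theme.json", "README.md"], ["chonk", "obel"]))
print(themes_to_shoot(["bin/snap.py"], ["chonk", "obel"]))
```

The first call selects only `chonk`; the second selects both themes because a framework file changed.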

Per-cell artifacts

tmp/snaps/<theme>/<viewport>/<route>.png                 # screenshot (Read directly)
tmp/snaps/<theme>/<viewport>/<route>.html                # final rendered DOM
tmp/snaps/<theme>/<viewport>/<route>.findings.json       # heuristics + axe + console + 4xx/5xx + INSPECT
tmp/snaps/<theme>/<viewport>/<route>.a11y.json           # raw axe-core violations report
tmp/snaps/<theme>/<viewport>/<route>.<flow>.png          # interactive cells (e.g. cart-filled.line-remove.png)
tmp/snaps/<theme>/review.md                              # per-theme review with GATE badge
tmp/snaps/<theme>/review.json                            # per-theme machine-readable summary
tmp/snaps/review.md                                      # cross-theme rollup + parity drift
tmp/snaps/review.json                                    # overall gate + per-theme gates
tmp/diffs/<theme>/<viewport>/<route>.png                 # per-pixel diff overlay
tests/visual-baseline/<theme>/<viewport>/<route>.png     # committed reference

The tmp/ tree is .gitignored; the tests/visual-baseline/ PNGs are committed. bin/vendor/axe.min.js is also gitignored — the framework downloads it from a version-pinned CDN URL on first run.
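When scripting against these artifacts, it can help to build the paths from the layout above rather than hard-coding them. The helpers below are a hypothetical convenience, not part of the framework; only the path layout itself comes from this page.

```python
from pathlib import Path


def cell_artifacts(theme, viewport, route, root=Path("tmp/snaps")):
    """Map artifact kinds to the documented paths for one static cell."""
    base = root / theme / viewport
    return {
        "screenshot": base / f"{route}.png",
        "html": base / f"{route}.html",
        "findings": base / f"{route}.findings.json",
        "a11y": base / f"{route}.a11y.json",
    }


def interaction_screenshot(theme, viewport, route, flow, root=Path("tmp/snaps")):
    """Path of the extra screenshot rendered for one interactive flow."""
    return root / theme / viewport / f"{route}.{flow}.png"


for kind, path in cell_artifacts("chonk", "desktop", "home").items():
    print(f"{kind:10} {path.as_posix()}")
print(interaction_screenshot("chonk", "desktop", "cart-filled", "line-remove").as_posix())
```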

DOM heuristics

The snap framework runs a custom set of DOM-heuristic checks on every captured page, in addition to axe-core. Each finding has a severity (error / warn / info), a stable kind, a human-readable message, and any extra context (selectors, measurements, source URLs):

  • horizontal-overflow — page is wider than the viewport.
  • wc-error / wc-info / wc-message / wc-validation-error — visible WC notices, captured verbatim.
  • php-debug-output — PHP notice/warning/fatal text leaked into the body.
  • raw-i18n-token — a literal __() token rendered (means a string was never translated).
  • broken-image — <img> failed to load.
  • img-missing-alt — visible image without alt.
  • img-oversized — natively > 4000px wide.
  • responsive-image-overserved — served > 3× the rendered slot.
  • responsive-image-blurry — served < 0.75× the rendered slot.
  • text-overflow-truncated — ellipsis is actively hiding content.
  • empty-landmark — <main>/<nav>/<aside> rendered with no visible text or media.
  • narrow-sidebar — a sidebar selector matched but rendered < 200px on a desktop viewport.
  • view-transition-name-collision — two or more elements share the same view-transition-name. Chrome aborts every transition with InvalidStateError on the next navigation when this happens; the heuristic catches it from the static DOM by walking computed style.
  • inspect-selector-missing — a selector listed in INSPECT_SELECTORS matched zero elements (likely time to update the config).
  • Network: any HTTP response ≥ 400 is captured into network_failures[], split into 4xx (warn) and 5xx (fail).
  • Console: console.error is captured into console[] (warn-tier) and pageerror into page_errors[] (fail-tier), both filtered against KNOWN_NOISE_SUBSTRINGS.
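As an illustration of what one heuristic and its finding might look like, the sketch below detects view-transition-name collisions from data that has already been pulled out of the page. It is not the framework's implementation: the real check walks computed style in the browser, while this version assumes the (selector, view-transition-name) pairs were collected beforehand. The finding keys mirror the fields listed above (severity, kind, message, extra context), but the exact key names and the `error` severity are assumptions.

```python
from collections import defaultdict


def view_transition_collisions(elements):
    """Return one finding per view-transition-name shared by 2+ elements.

    elements: iterable of (selector, view_transition_name) pairs, assumed
    to have been extracted from computed style beforehand.
    """
    by_name = defaultdict(list)
    for selector, name in elements:
        if name and name != "none":  # "none" means no transition name is set
            by_name[name].append(selector)
    return [
        {
            "severity": "error",  # assumed severity for this kind
            "kind": "view-transition-name-collision",
            "message": f"{len(selectors)} elements share view-transition-name '{name}'",
            "selectors": selectors,
        }
        for name, selectors in sorted(by_name.items())
        if len(selectors) > 1
    ]


found = view_transition_collisions([
    (".product-card:nth-child(1) img", "product-image"),
    (".product-card:nth-child(2) img", "product-image"),
    ("header .logo", "site-logo"),
    ("footer", "none"),
])
for finding in found:
    print(finding["severity"], finding["kind"], finding["selectors"])
```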

The tiered gate

Every cell's findings are classified into one of three buckets:

  • fail (build-blocking, exit 1): heuristic error, uncaught JS (after noise filter), HTTP 5xx, axe critical/serious.
  • warn (loud banner, exit 0): heuristic warn/info, HTTP 4xx, console errors, axe moderate/minor, parity drift, perf-budget exceedances, interaction-failed.
  • pass: nothing flagged.

The verdict appears as a STATUS: PASS | WARN | FAIL line at the end of every report and check run. It also lives at the top of each per-theme review.md as a **GATE: …** badge so triage starts with the verdict, not the table.
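The classification rules above can be sketched as a small function. This is a simplified model under stated assumptions, not the gate in bin/snap.py: `network_failures`, `console`, and `page_errors` are named on this page, but the `heuristics` and `axe` keys, the `status`, `severity`, and `impact` fields, and the "worst verdict wins" rollup are hypothetical. Parity drift, perf budgets, and interaction failures are left out for brevity.

```python
def classify_cell(cell):
    """Return 'fail', 'warn', or 'pass' for one cell's findings dict."""
    heuristics = cell.get("heuristics", [])
    statuses = [f["status"] for f in cell.get("network_failures", [])]
    impacts = [v["impact"] for v in cell.get("axe", [])]

    if (
        any(f["severity"] == "error" for f in heuristics)
        or cell.get("page_errors")  # uncaught JS, assumed already noise-filtered
        or any(status >= 500 for status in statuses)
        or any(impact in ("critical", "serious") for impact in impacts)
    ):
        return "fail"
    # Anything still flagged is warn-tier: heuristic warn/info, HTTP 4xx,
    # console errors, axe moderate/minor.
    if heuristics or statuses or cell.get("console") or impacts:
        return "warn"
    return "pass"


def overall(verdicts):
    """Roll per-cell verdicts up to the worst one (assumed rollup rule)."""
    order = {"pass": 0, "warn": 1, "fail": 2}
    return max(verdicts, key=order.__getitem__, default="pass")


verdict = overall([
    classify_cell({}),
    classify_cell({"network_failures": [{"status": 404}]}),
])
exit_code = 1 if verdict == "fail" else 0  # only fail blocks the build
print(f"STATUS: {verdict.upper()} (exit {exit_code})")
```

With one clean cell and one cell carrying a 404, the rollup is a warn and the exit code stays 0, matching the "loud banner, exit 0" rule.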

Recommended loops

When you make ANY change that could affect rendered output (template, theme.json, CSS, pattern, blueprint), the loop is:

  1. Make the change.
  2. python3 bin/snap.py shoot <theme> --routes <route> --viewports <viewport> for the affected cell(s).
  3. Read the PNG to verify.
  4. python3 bin/snap.py report and read the STATUS: line; drill into per-theme review.md if anything is non-pass.
  5. If wider impact possible: python3 bin/check.py --visual (smart, fast; scoped to changed themes by default) or python3 bin/check.py --visual --visual-scope=all (full sweep before a release).
  6. If diffs are intentional: python3 bin/snap.py baseline --all and commit the updated baselines alongside the change.

Build-pipeline integration

Other build scripts grew matching --snap flags so the gate runs inline after a mutation:

  • python3 bin/clone.py <name> --snap — auto-baseline a freshly-cloned theme.
  • python3 bin/sync-playground.py --snap — re-shoot affected themes after blueprint sync.
  • python3 bin/append-wc-overrides.py --snap — re-shoot after appending WC override CSS.

WP-admin Themes-card screenshot

bin/build-theme-screenshots.py consumes the snap framework's home-route output to generate each theme's screenshot.png — the 1200×900 image WordPress shows on the Themes admin card. It looks for the home shot in this order: committed baseline (tests/visual-baseline/<theme>/desktop/home.png), then the freshest unbaselined shot (tmp/snaps/<theme>/desktop/home.png). If neither exists it tells you which snap.py shoot command to run. So the canonical inner loop after editing tokens that change the home page is:

python3 bin/snap.py shoot mybrand --routes home --viewports desktop
python3 bin/build-theme-screenshots.py mybrand

bin/check.py's check_theme_screenshots_distinct fails when two themes ship identical screenshot.png bytes (the failure mode bin/clone.py produces by copying Obel's placeholder verbatim). Re-running build-theme-screenshots.py is always the fix.
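Both behaviours in this section are easy to model. The sketch below shows the documented lookup order for the home shot and one possible way to detect byte-identical screenshot.png files. It is an approximation under assumptions: the helper names are hypothetical, the theme directories are passed in explicitly because this page does not state where themes live, and hashing with SHA-256 is a swapped-in technique that may differ from how check_theme_screenshots_distinct actually compares files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_home_shot(theme, repo=Path(".")):
    """Return the first existing home shot in the documented order, else None."""
    candidates = [
        repo / "tests/visual-baseline" / theme / "desktop/home.png",  # committed baseline
        repo / "tmp/snaps" / theme / "desktop/home.png",  # freshest unbaselined shot
    ]
    return next((path for path in candidates if path.is_file()), None)


def duplicate_screenshots(theme_dirs):
    """Group theme directories whose screenshot.png bytes are identical."""
    by_digest = defaultdict(list)
    for theme_dir in map(Path, theme_dirs):
        shot = theme_dir / "screenshot.png"
        if shot.is_file():
            digest = hashlib.sha256(shot.read_bytes()).hexdigest()
            by_digest[digest].append(theme_dir.name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


if find_home_shot("mybrand") is None:
    print("no home shot yet: run snap.py shoot mybrand --routes home --viewports desktop")
```

A non-empty result from `duplicate_screenshots` corresponds to the failure case described above, where a cloned theme still carries the placeholder image.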

Configuration

bin/snap_config.py is the single config file:

  • ROUTES — every (slug, URL path) the framework visits. Add a route here and it appears in every theme's review.
  • VIEWPORTS — Playwright viewport sizes (mobile / tablet / desktop / wide). As with ROUTES, a viewport added here appears in every theme's review.
  • INSPECT_SELECTORS — per-route map of CSS selectors whose computed width, height, display, and grid-template-columns get captured into *.findings.json and rendered into the per-theme review.md "Inspector measurements" tables. This is how the cart/checkout sidebar regression got diagnosed without re-shooting — add an entry here when you find yourself running ad-hoc Playwright probes to measure layout issues, so the next regression is visible immediately.
  • INTERACTIONS — per-route list of scripted flows (menu-open, qty-increment, swatch-pick, line-remove, field-focus). Each flow renders an extra <route>.<flow>.png cell so the post-interaction state is reviewable side-by-side with the static one.
  • KNOWN_NOISE_SUBSTRINGS — substring filter for pre-confirmed-harmless console / page errors. Add to it only after investigation confirms upstream noise — never to silence a real theme bug.
  • BUDGETS — soft thresholds for console_warning_count, page_weight_kb, image_count, request_count. Exceedances become findings at the configured severity. Set max: None to disable a budget.
  • QUICK_* — subsets used when shoot is invoked with --quick.
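To show how two of these settings might be consumed, here is a hedged sketch of noise filtering and budget evaluation. Only the setting names, the budget metric names, and the `max: None` convention come from this page. The dict shapes, the `severity` key, the `budget-exceeded` kind, the sample substring, and both helper functions are assumptions.

```python
# Hypothetical shapes; the real bin/snap_config.py may structure these differently.
KNOWN_NOISE_SUBSTRINGS = ["favicon.ico"]  # placeholder entry
BUDGETS = {
    "page_weight_kb": {"max": 2000, "severity": "warn"},
    "request_count": {"max": None, "severity": "info"},  # max: None disables it
}


def filter_noise(messages, noise=KNOWN_NOISE_SUBSTRINGS):
    """Drop messages that contain any pre-confirmed-harmless substring."""
    return [m for m in messages if not any(s in m for s in noise)]


def budget_findings(metrics, budgets=BUDGETS):
    """Return one finding per enabled budget whose metric exceeds its max."""
    findings = []
    for name, budget in budgets.items():
        limit, value = budget["max"], metrics.get(name)
        if limit is None or value is None or value <= limit:
            continue
        findings.append({
            "severity": budget["severity"],
            "kind": "budget-exceeded",  # hypothetical kind name
            "message": f"{name} is {value}, budget is {limit}",
        })
    return findings


print(filter_noise(["GET /favicon.ico 404", "TypeError: cart is undefined"]))
print(budget_findings({"page_weight_kb": 2500, "request_count": 9999}))
```

Because the filter matches on substrings, a short or generic entry can hide unrelated errors, which is one more reason to add entries only after investigation.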

First-time setup

python3 -m pip install --user playwright Pillow
playwright install chromium      # ~90 MB Chromium download
python3 bin/snap.py doctor       # verifies everything is wired up

@wp-playground/cli is fetched on demand by npx --yes; no global install required. First boot takes ~2 minutes (WordPress download, plugin install, content seeding); subsequent boots are ~30 seconds when the playground cache is warm.
