Visual Snapshots
bin/snap.py lets agents (and humans) see what every theme actually looks like without uploading screenshots over chat. It boots each theme's WordPress Playground locally via @wp-playground/cli, drives Playwright Chromium across bin/snap_config.py::ROUTES × VIEWPORTS, captures a screenshot plus diagnostic artifacts for every cell (rendered HTML, console messages, page errors, network failures, DOM-heuristic findings, axe-core a11y violations, computed dimensions for INSPECT_SELECTORS, optional interactive states from INTERACTIONS), and runs a tiered gate that classifies the result as pass | warn | fail. bin/check.py --visual is the recommended pre-commit gate.
# First time? Verify all deps are ready before booting Playground.
python3 bin/snap.py doctor
# Capture one theme at every route/viewport
python3 bin/snap.py shoot chonk
# Just the desktop checkout (fastest inner loop)
python3 bin/snap.py shoot chonk --routes checkout-filled --viewports desktop
# Quick subset (snap_config.QUICK_*) -- fastest "did anything explode" sweep
python3 bin/snap.py shoot chonk --quick
# Smart sweep -- only re-shoot themes whose files moved in git
python3 bin/snap.py shoot --changed
# (framework changes under bin/* fall back to all themes)
# Capture every theme in parallel (~400MB RAM/worker, ~2x faster)
python3 bin/snap.py shoot --all --concurrency 2
# Boot a single theme and leave it running for interactive poking via the
# cursor-ide-browser MCP (or any browser) at http://localhost:9400/
python3 bin/snap.py serve chonk # admin auto-login is enabled
# for /wp-admin/ access
# Aggregate findings into reviewable markdown + the tiered gate verdict
python3 bin/snap.py report --open
# -> tmp/snaps/<theme>/review.md per-theme triage with **GATE: …** badge
# -> tmp/snaps/<theme>/review.json per-theme JSON (gate + counts + routes)
# -> tmp/snaps/review.md cross-theme rollup + parity drift
# -> tmp/snaps/review.json overall gate + per-theme gates
# Final line: STATUS: PASS | WARN | FAIL
# Visual regression: compare current snaps to committed baselines
python3 bin/snap.py diff --all
python3 bin/snap.py diff chonk --threshold 0.5
# Promote latest snaps -> committed baselines (after reviewing diffs)
python3 bin/snap.py baseline --all
python3 bin/snap.py baseline chonk --route home --viewport desktop
# The single pre-commit gate: shoot + diff + report --strict, scoped to
# the themes that actually changed by default (--visual-scope=changed).
python3 bin/check.py --visual
# Full sweep before a release:
python3 bin/check.py --visual --visual-scope=all
# Smoke test for one theme + the QUICK_* subset:
python3 bin/check.py chonk --visual --visual-scope=quick

tmp/snaps/<theme>/<viewport>/<route>.png # screenshot (Read directly)
tmp/snaps/<theme>/<viewport>/<route>.html # final rendered DOM
tmp/snaps/<theme>/<viewport>/<route>.findings.json # heuristics + axe + console + 4xx/5xx + INSPECT
tmp/snaps/<theme>/<viewport>/<route>.a11y.json # raw axe-core violations report
tmp/snaps/<theme>/<viewport>/<route>.<flow>.png # interactive cells (e.g. cart-filled.line-remove.png)
tmp/snaps/<theme>/review.md # per-theme review with GATE badge
tmp/snaps/<theme>/review.json # per-theme machine-readable summary
tmp/snaps/review.md # cross-theme rollup + parity drift
tmp/snaps/review.json # overall gate + per-theme gates
tmp/diffs/<theme>/<viewport>/<route>.png # per-pixel diff overlay
tests/visual-baseline/<theme>/<viewport>/<route>.png # committed reference
The tmp/ tree is .gitignored; the tests/visual-baseline/ PNGs are committed. bin/vendor/axe.min.js is also gitignored — the framework downloads it from a version-pinned CDN URL on first run.
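The artifact layout listed above can be sketched as a small path helper. This is only an illustration of the naming scheme: the ROUTES and VIEWPORTS values below are hypothetical stand-ins (the real ones, and their actual shapes, live in bin/snap_config.py), and cell_artifacts / all_cells are made-up names, not functions from bin/snap.py.

```python
from __future__ import annotations

from itertools import product
from pathlib import Path

# Hypothetical stand-ins for snap_config.ROUTES / VIEWPORTS.
ROUTES = ["home", "checkout-filled"]
VIEWPORTS = ["mobile", "desktop"]

SNAP_ROOT = Path("tmp/snaps")


def cell_artifacts(theme: str, viewport: str, route: str) -> dict[str, Path]:
    """Return the per-cell artifact paths, following the documented layout."""
    base = SNAP_ROOT / theme / viewport
    return {
        "screenshot": base / f"{route}.png",
        "html": base / f"{route}.html",
        "findings": base / f"{route}.findings.json",
        "a11y": base / f"{route}.a11y.json",
    }


def all_cells(theme: str) -> list[dict[str, Path]]:
    """One entry per ROUTES x VIEWPORTS cell for a single theme."""
    return [cell_artifacts(theme, vp, route) for route, vp in product(ROUTES, VIEWPORTS)]
```

A helper like this makes it easy for an agent to jump from a (theme, viewport, route) triple straight to the PNG or findings file it needs to read.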
The snap framework runs a custom set of DOM-heuristic checks on every captured page, in addition to axe-core. Each finding has a severity (error / warn / info), a stable kind, a human-readable message, and any extra context (selectors, measurements, source URLs):
- horizontal-overflow: page is wider than the viewport.
- wc-error / wc-info / wc-message / wc-validation-error: visible WC notices, captured verbatim.
- php-debug-output: PHP notice/warning/fatal text leaked into the body.
- raw-i18n-token: a literal __() token rendered (means a string was never translated).
- broken-image: <img> failed to load.
- img-missing-alt: visible image without alt.
- img-oversized: natively > 4000px wide.
- responsive-image-overserved: served > 3× the rendered slot.
- responsive-image-blurry: served < 0.75× the rendered slot.
- text-overflow-truncated: ellipsis is actively hiding content.
- empty-landmark: <main> / <nav> / <aside> rendered with no visible text or media.
- narrow-sidebar: a sidebar selector matched but rendered < 200px on a desktop viewport.
- view-transition-name-collision: two or more elements share the same view-transition-name. Chrome aborts every transition with InvalidStateError on the next navigation when this happens; the heuristic catches it from the static DOM by walking computed style.
- inspect-selector-missing: a selector listed in INSPECT_SELECTORS matched zero elements (likely time to update the config).
- Network: any HTTP response ≥ 400 is captured into network_failures[], split into 4xx (warn) and 5xx (fail).
- Console: console.error is captured into console[] (warn-tier) and pageerror into page_errors[] (fail-tier), both filtered against KNOWN_NOISE_SUBSTRINGS.
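Two of the heuristics above reduce to simple arithmetic on values read from the DOM, so a rough sketch may help. Everything here is an assumption made for illustration: the Finding record only mirrors the four fields the text names, the severities assigned are guesses, and the function names are hypothetical rather than taken from bin/snap.py. The real checks presumably run in the browser and may account for device pixel ratio, which this sketch ignores.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Finding:
    severity: str  # "error" | "warn" | "info"
    kind: str      # stable identifier, e.g. "responsive-image-blurry"
    message: str   # human-readable summary
    context: dict = field(default_factory=dict)  # selectors, measurements, URLs


def check_responsive_image(selector: str, natural_width: int, rendered_width: int) -> Finding | None:
    """Flag images served > 3x or < 0.75x the rendered slot (thresholds from the list above)."""
    if rendered_width <= 0:
        return None  # hidden or unlaid-out image; nothing to compare against
    ratio = natural_width / rendered_width
    ctx = {"selector": selector, "ratio": round(ratio, 2)}
    if ratio > 3.0:
        return Finding("warn", "responsive-image-overserved", f"{selector} served at {ratio:.2f}x its slot", ctx)
    if ratio < 0.75:
        return Finding("warn", "responsive-image-blurry", f"{selector} served at {ratio:.2f}x its slot", ctx)
    return None


def find_view_transition_collisions(names_by_selector: dict[str, str]) -> list[Finding]:
    """Group elements by computed view-transition-name and report names used more than once."""
    selectors_by_name: dict[str, list[str]] = defaultdict(list)
    for selector, name in names_by_selector.items():
        if name and name != "none":  # "none" is the CSS default and never collides
            selectors_by_name[name].append(selector)
    return [
        Finding("error", "view-transition-name-collision",
                f"{len(sels)} elements share view-transition-name '{name}'",
                {"selectors": sels})
        for name, sels in selectors_by_name.items() if len(sels) > 1
    ]
```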
Every cell's findings are classified into one of three buckets:
- fail (build-blocking, exit 1): heuristic error, uncaught JS (after noise filter), HTTP 5xx, axe critical/serious.
- warn (loud banner, exit 0): heuristic warn/info, HTTP 4xx, console errors, axe moderate/minor, parity drift, perf-budget exceedances, interaction-failed.
- pass: nothing flagged.
The verdict appears as a STATUS: PASS | WARN | FAIL line at the end of every report and check run. It also lives at the top of each per-theme review.md as a **GATE: …** badge so triage starts with the verdict, not the table.
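The tiering rules above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the framework's actual gate: the function names and argument shapes are hypothetical, and parity drift, perf-budget exceedances, and interaction-failed are left out (they could be modelled as extra warn-level heuristic findings).

```python
from __future__ import annotations

FAIL_AXE_IMPACTS = {"critical", "serious"}
WARN_AXE_IMPACTS = {"moderate", "minor"}
VERDICT_RANK = {"pass": 0, "warn": 1, "fail": 2}


def classify_cell(
    heuristic_severities: list[str],  # "error" | "warn" | "info" per finding
    http_statuses: list[int],         # statuses of failed responses (>= 400)
    console_errors: int,              # count after the noise filter
    page_errors: int,                 # uncaught JS, count after the noise filter
    axe_impacts: list[str],           # axe-core impact per violation
) -> str:
    """Return "fail", "warn", or "pass" for one route/viewport cell."""
    if (
        "error" in heuristic_severities
        or page_errors > 0
        or any(s >= 500 for s in http_statuses)
        or any(i in FAIL_AXE_IMPACTS for i in axe_impacts)
    ):
        return "fail"
    if (
        any(s in ("warn", "info") for s in heuristic_severities)
        or any(400 <= s < 500 for s in http_statuses)
        or console_errors > 0
        or any(i in WARN_AXE_IMPACTS for i in axe_impacts)
    ):
        return "warn"
    return "pass"


def overall_gate(cell_verdicts: list[str]) -> str:
    """The worst cell verdict wins; an empty run counts as pass."""
    return max(cell_verdicts, key=VERDICT_RANK.__getitem__, default="pass")


def exit_code(verdict: str) -> int:
    """Only "fail" blocks the build; "warn" still exits 0."""
    return 1 if verdict == "fail" else 0
```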
When you make ANY change that could affect rendered output (template, theme.json, CSS, pattern, blueprint), the loop is:
- Make the change.
- python3 bin/snap.py shoot <theme> --routes <route> --viewports <viewport> for the affected cell(s).
- Read the PNG to verify.
- python3 bin/snap.py report and read the STATUS: line; drill into per-theme review.md if anything is non-pass.
- If wider impact is possible: python3 bin/snap.py check --changed (smart, fast) or python3 bin/snap.py check (full sweep before a release).
- If diffs are intentional: python3 bin/snap.py baseline --all and commit the updated baselines alongside the change.
Other build scripts grew matching --snap flags so the gate runs inline after a mutation:
- python3 bin/clone.py <name> --snap: auto-baseline a freshly-cloned theme.
- python3 bin/sync-playground.py --snap: re-shoot affected themes after blueprint sync.
- python3 bin/append-wc-overrides.py --snap: re-shoot after appending WC override CSS.
bin/build-theme-screenshots.py consumes the snap framework's home-route output to generate each theme's screenshot.png — the 1200×900 image WordPress shows on the Themes admin card. It looks for the home shot in this order: committed baseline (tests/visual-baseline/<theme>/desktop/home.png), then the freshest unbaselined shot (tmp/snaps/<theme>/desktop/home.png). If neither exists it tells you which snap.py shoot command to run. So the canonical inner loop after editing tokens that change the home page is:
python3 bin/snap.py shoot mybrand --routes home --viewports desktop
python3 bin/build-theme-screenshots.py mybrand

bin/check.py's check_theme_screenshots_distinct fails when two themes ship identical screenshot.png bytes (the failure mode bin/clone.py produces by copying Obel's placeholder verbatim). Re-running build-theme-screenshots.py is always the fix.
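Both behaviours described here, the home-shot lookup order and the identical-bytes check, are simple enough to sketch. The code below is an illustration only: find_home_shot and duplicate_screenshots are hypothetical names, and hashing with SHA-256 is one assumed way to compare file bytes, not necessarily what check_theme_screenshots_distinct does.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def find_home_shot(theme: str, root: Path = Path(".")) -> Path | None:
    """Prefer the committed baseline, then fall back to the freshest tmp/ shot."""
    candidates = [
        root / "tests" / "visual-baseline" / theme / "desktop" / "home.png",
        root / "tmp" / "snaps" / theme / "desktop" / "home.png",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # caller should suggest running `snap.py shoot`


def duplicate_screenshots(screenshots: dict[str, Path]) -> list[list[str]]:
    """Return groups of theme names whose screenshot files have identical bytes."""
    themes_by_digest: dict[str, list[str]] = defaultdict(list)
    for theme, path in sorted(screenshots.items()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        themes_by_digest[digest].append(theme)
    return [themes for themes in themes_by_digest.values() if len(themes) > 1]
```

Any non-empty result from duplicate_screenshots corresponds to the failure described above, and regenerating the affected theme's screenshot.png would clear it.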
bin/snap_config.py is the single config file:
- ROUTES: every (slug, URL path) the framework visits. Add a route here and it appears in every theme's review.
- VIEWPORTS: Playwright viewport sizes (mobile / tablet / desktop / wide). Adding a viewport works the same way.
- INSPECT_SELECTORS: per-route map of CSS selectors whose computed width, height, display, and grid-template-columns get captured into *.findings.json and rendered into the per-theme review.md "Inspector measurements" tables. This is how the cart/checkout sidebar regression got diagnosed without re-shooting. Add an entry here when you find yourself running ad-hoc Playwright probes to measure layout issues, so the next regression is visible immediately.
- INTERACTIONS: per-route list of scripted flows (menu-open, qty-increment, swatch-pick, line-remove, field-focus). Each flow renders an extra <route>.<flow>.png cell so the post-interaction state is reviewable side-by-side with the static one.
- KNOWN_NOISE_SUBSTRINGS: substring filter for pre-confirmed-harmless console / page errors. Add to it only after investigation confirms upstream noise, never to silence a real theme bug.
- BUDGETS: soft thresholds for console_warning_count, page_weight_kb, image_count, request_count. Exceedances become findings at the configured severity. Set max: None to disable a budget.
- QUICK_*: subsets used when shoot is invoked with --quick.
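To make the KNOWN_NOISE_SUBSTRINGS and BUDGETS entries above more concrete, here is a minimal sketch of how such settings might be applied. The config values, the dict shape with "max" and "severity" keys, the "budget-exceeded" kind, and both function names are assumptions chosen for illustration; the real structures are defined in bin/snap_config.py and may differ.

```python
from __future__ import annotations

# Hypothetical config values; see bin/snap_config.py for the real ones.
KNOWN_NOISE_SUBSTRINGS = ["example upstream deprecation notice"]
BUDGETS = {
    "page_weight_kb": {"max": 2000, "severity": "warn"},
    "request_count": {"max": None, "severity": "info"},  # max: None disables it
}


def filter_noise(messages: list[str], noise: list[str] = KNOWN_NOISE_SUBSTRINGS) -> list[str]:
    """Drop messages containing any pre-confirmed-harmless substring."""
    return [m for m in messages if not any(sub in m for sub in noise)]


def budget_findings(metrics: dict[str, float], budgets: dict = BUDGETS) -> list[dict]:
    """Turn budget exceedances into findings at the configured severity."""
    findings = []
    for name, budget in budgets.items():
        limit = budget["max"]
        value = metrics.get(name)
        if limit is None or value is None:
            continue  # disabled budget or metric not measured
        if value > limit:
            findings.append({
                "severity": budget["severity"],
                "kind": "budget-exceeded",  # hypothetical kind name
                "message": f"{name} is {value}, over the budget of {limit}",
            })
    return findings
```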
python3 -m pip install --user playwright Pillow
playwright install chromium # ~90 MB Chromium download
python3 bin/snap.py doctor # verifies everything is wired up

@wp-playground/cli is fetched on demand by npx --yes; no global install required. First boot takes ~2 minutes (WordPress download, plugin install, content seeding); subsequent boots are ~30 seconds when the playground cache is warm.
Fifty on GitHub · Live demos · GPL-2.0-or-later · Block-only WooCommerce themes, zero CSS files, zero JS, zero build step