feat(regen): hard canvas-size gate with delta-aware repair messages by MarkusNeusinger · Pull Request #7517 · MarkusNeusinger/anyplot

MarkusNeusinger · 2026-05-20T13:43:45Z

Summary

Hard canvas-size enforcement for the regen pipeline: every saved PNG must end at exactly 3200×1800 (landscape) or 2400×2400 (square), within ±16 px (≤0.5 %).
Three-layer defense: STEP-0 contract in impl-generate-claude.md, "Canvas — hard rule" sections in all 10 library prompts, and a pre-AI-review pixel gate in impl-review.yml that routes drift into the existing repair cascade.
Delta-aware repair feedback: when the gate fails, the synthetic weakness names the actual dims, the closest target, the signed delta, and a library-specific likely cause — so the repair model knows which knob to turn and which direction, not just "use exact size."
Monitoring script + 12 unit tests lock the ::notice::canvas_gate log contract so we can spot libraries whose prompt is still leaking before steady-state compute waste.

Background

After yesterday's per-library sizing change (1b1f279a6), last night's daily-regen produced systematic canvas drift across 7 of 10 libraries — see /tmp/regen-compare/overview.html audit from earlier today:

Lib	Last night's outcome
ggplot2, pygal	8/8 on target (ggplot2 wins because it's generated fresh — no prior impl to anchor on)
highcharts, bokeh	8/8 stuck at 3200×1661 / 2400×2261 (Chrome chrome + bokeh toolbar)
matplotlib, seaborn	8/8 trimmed by `bbox_inches='tight'` to ~3160×1740 / ~2200×2336
altair	every render a different size (vl-convert pads outside `width`/`height`)
plotly, plotnine	dims OK but wrong aspect 2/4 times

Two prior soft tightenings (13cf81342, 7f4a78a04) added "Base style ALWAYS wins" to the regen prompt — advisory, kept losing to the in-context anchor of the previous impl's figsize/width/height.

Changes

Layer 1 — `prompts/workflow-prompts/impl-generate-claude.md`

New "Step 0: Canvas dimensions (HARD CONTRACT)" placed above the previous-impl reference. Names the two canonical pixel pairs, declares prior values historical, warns about the post-render gate.
New Step 3b PIL self-check that fails fast inside Claude's existing 3-attempt loop.

Layer 2 — `prompts/library/*.md`

matplotlib, seaborn: removed bbox_inches='tight' from all savefig examples (this was the documented cause of ~40 px trim).
altair: explicit configure_view(continuousWidth, continuousHeight) + zero padding + mandatory in-impl PIL crop/pad to exact target (escape valve so the gate never deadlocks altair structurally).
plotly: forbid autosize=True; require explicit margin=dict(...).
highcharts: concrete Chrome workaround — Emulation.setDeviceMetricsOverride via CDP + --hide-scrollbars (the --window-size value isn't authoritative in headless mode).
bokeh, pygal, plotnine, letsplot, ggplot2: re-labeled the canvas section "Canvas — hard rule, no deviation"; documented both canonical pairs.

Layer 3 — `.github/workflows/impl-review.yml` + `prompts/workflow-prompts/ai-quality-review.md`

New "Canvas dimension gate" step after the "Verify both theme renders exist" step. Reads plot-light.png dims, emits a structured ::notice::canvas_gate library=… status=… actual=… target=… delta=… attempt=… line, and on drift writes a synthetic weakness to /tmp/anyplot-canvas-gate.txt naming actual, closest target, signed delta, direction, optional aspect hint, and a library-specific likely cause.
ai-quality-review.md now reads that file in a new "5c2" step and (a) copies the weakness verbatim as the first item in its output, (b) forces VQ-05 to 0/4 — which drags quality_score below the repair threshold so the existing label-driven cascade routes the PR into impl-repair.yml. No raw exit 1 (would bypass repair).

Monitoring

automation/scripts/canvas_gate_report.py parses canvas_gate notice lines from recent impl-review runs and prints per-library first-attempt fail rates. Decision rule: anything > 20 % over 14 days means the library prompt is still leaking and warrants another tightening pass.

Test plan

Local replay against last night's 79 PNGs: gate classifies 38 pass / 41 fail; messages match the manual audit (correct actual dims, signed delta, library-specific cause).
Inline gate snippet smoke-test: emits the exact ::notice:: format the parser expects.
Unit tests: 12 new tests lock the log-line contract between the gate step and the monitor script; full unit suite (1485 tests) passes.
Ruff lint + format clean on new files.
End-to-end run on one spec: after merge, dispatch bulk-generate.yml -f specification_id=sn-curve-basic (highcharts/bokeh cropped the X-axis title last night → real reproducer). Expectations: ggplot2/pygal/letsplot/plotnine/plotly pass first attempt; matplotlib/seaborn/altair/highcharts/bokeh either pass or repair to pass on attempt 2 with the named delta as feedback. No PR auto-closed.
Two-week watch: run uv run automation/scripts/canvas_gate_report.py daily for 3 days, then weekly. Flag any library > 20 % first-attempt fail rate.

🤖 Generated with Claude Code

Tag chips on the spec/impl detail page and on the /stats tag-distribution block built URLs as /?<param>=<value>. Since the page split, / is the LandingPage and no longer accepts filter query params — only /plots does. The links resolved to the landing page with stray query params and lost the filter intent. Fixes: - SpecTabs.handleTagClick: navigate("/plots?…") instead of "/?…". One call site covers both spec-overview and impl-detail since SpecPage mounts <SpecTabs> for both router modes. - StatsPage tag links: target /plots, encodeURIComponent the value, and fire `tag_click` with `source: 'stats'` (StatsPage was silent on this event — only spec_detail was tracked before). - docs/reference/plausible.md: list StatsPage.tsx as a tag_click source and document the `source ∈ spec_detail, stats` enum. Drive-by: add `crypto` to the eslint browser-globals list so FeedbackWidget.tsx (uses crypto.randomUUID / getRandomValues) stops failing `no-undef`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses Copilot review feedback on #7516: the StatsPage tag-link behavior changed in the parent commit had no test coverage (also flagged by codecov/patch/frontend). Adds one test that locks in: - the link's href is /plots?plot=scatter (regression guard for the pre-fix /?plot=scatter that silently dropped the filter intent), and - clicking fires trackEvent('tag_click', { param, value, source: 'stats' }). The useAnalytics mock is hoisted into a module-level mockTrackEvent so the assertion can see the call. Cleared per test via mockClear() because vi.restoreAllMocks() does not reset hoisted vi.fn() instances on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The prior test asserted href === '/plots?plot=scatter', which proves the /plots routing but does not exercise encodeURIComponent — 'scatter' has no special chars, so the assertion would still pass if the encoding were removed. Copilot's review explicitly called out '/plots?{param}={encodedTag}', so add a 'time series' tag (space in the value) under data_type. The new assertion locks in href === '/plots?data=time%20series', which fails the moment encodeURIComponent is dropped from the source. Also clicks both links and asserts trackEvent receives the raw (unencoded) value, matching the documented plausible.md contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Yesterday's per-library sizing change (1b1f279) shipped, but last night's daily-regen exposed systematic canvas drift across 7 of 10 libraries: ggplot2 (generated fresh, no prior impl to anchor on) and pygal hit the 3200×1800 / 2400×2400 target exactly, while highcharts/bokeh ended ~140 px short, matplotlib/seaborn were trimmed by bbox_inches='tight', and altair had no canvas control at all. The two prior soft tightenings of "Base style ALWAYS wins" couldn't beat the in-context anchor of the previous implementation's figsize/width/height values. This change enforces canvas dimensions through three coordinated layers: - impl-generate-claude.md: STEP 0 canvas contract above the previous-impl reference, with a PIL self-check sub-step in Step 3 so Claude catches drift inside its own retry loop before pushing. - per-library prompts: explicit "Canvas — hard rule" sections naming both canonical pixel pairs. Surgical fixes for the actual offenders — drop bbox_inches='tight' from matplotlib/seaborn save calls; add a PIL crop/pad normalizer to altair (vl-convert pads outside width/height); forbid autosize=True + pin margin in plotly; use Chrome CDP setDeviceMetricsOverride in highcharts (--window-size alone gets eaten by Chrome chrome). - impl-review.yml: pre-AI-review canvas gate that emits a structured ::notice:: log line per evaluation and, on drift, writes /tmp/anyplot-canvas-gate.txt with actual×actual, closest target, signed delta, and a library-specific likely cause. ai-quality-review.md picks the file up, copies the synthetic weakness verbatim, and forces VQ-05 to 0/4 — which drags quality_score below the repair threshold and routes the PR through the existing 5-review/ 4-repair cascade. Routing through review (not a raw step failure) preserves attempt counting and PR cleanup; impl-repair is dispatched only by the ai-rejected label, never by a hard step exit. The repair feedback names the *actual* drift, not just the target — so the repair model knows which knob to turn and which direction, rather than guessing on the next attempt. automation/scripts/canvas_gate_report.py aggregates the structured notice lines across recent impl-review runs and prints per-library first-attempt fail rates; anything above 20 % over 14 days means the library prompt is still leaking and gets another tightening pass rather than burning compute in steady-state repair. Verified by replaying the gate logic against the 79 PNGs from last night's regen (/tmp/regen-compare/): 38 pass (ggplot2 + pygal + most plotly/plotnine/ letsplot), 41 fail with delta + cause messages matching the manual audit. 1485 unit tests pass; 12 new tests lock the ::notice:: log contract between the gate step and the report parser. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR hardens the regeneration pipeline against canvas-size drift by introducing a pre-AI-review pixel gate, updating generation/review prompts to treat canvas size as a hard contract (with delta-aware repair feedback), and adding a monitoring script + tests to track first-attempt failure rates. It also fixes frontend tag-chip navigation to route to /plots (and tracks tag_click from StatsPage).

Changes:

Add a “Canvas dimension gate” to impl-review.yml that emits structured ::notice::canvas_gate … lines and writes delta-aware repair guidance for the AI review step.
Tighten regen + library prompts to treat canvas size as a non-negotiable contract and provide library-specific guidance (including a local PIL self-check).
Add canvas_gate_report.py + unit tests to monitor gate outcomes from workflow logs; update frontend tag routing + analytics docs/tests.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`tests/unit/automation/scripts/test_canvas_gate_report.py`	Unit tests locking the structured `canvas_gate` log-line regex/aggregation contract.
`automation/scripts/canvas_gate_report.py`	New CLI script to scan `impl-review` logs and report per-library first-attempt fail rates.
`.github/workflows/impl-review.yml`	Adds the canvas dimension gate step and synthetic weakness output for repair routing.
`prompts/workflow-prompts/impl-generate-claude.md`	Adds Step 0 hard canvas contract + Step 3b PIL self-check instructions.
`prompts/workflow-prompts/ai-quality-review.md`	Requires ingesting `/tmp/anyplot-canvas-gate.txt` (if present) and forcing VQ-05 to 0/4 on drift.
`prompts/library/matplotlib.md`	Adds “Canvas — hard rule” guidance; removes `bbox_inches='tight'` in save examples.
`prompts/library/seaborn.md`	Adds “Canvas — hard rule” guidance; removes `bbox_inches='tight'` in save examples.
`prompts/library/altair.md`	Adds view/padding constraints plus post-save PIL normalization guidance for exact PNG dimensions.
`prompts/library/plotly.md`	Adds hard canvas sizing pairs; forbids `autosize=True` and pins margins.
`prompts/library/highcharts.md`	Adds CDP viewport override and PIL normalization guidance; documents multi-place sizing sync.
`prompts/library/bokeh.md`	Adds hard canvas sizing section and warns about toolbar-induced drift.
`prompts/library/pygal.md`	Adds hard canvas sizing section.
`prompts/library/plotnine.md`	Adds hard canvas sizing section and guidance consistent with matplotlib backend.
`prompts/library/letsplot.md`	Adds hard canvas sizing section and canonical size pairs.
`prompts/library/ggplot2.md`	Adds hard canvas sizing section for `ggsave` + `ragg`.
`app/src/pages/StatsPage.tsx`	Routes tag links to `/plots?...` and emits `tag_click` analytics from StatsPage.
`app/src/pages/StatsPage.test.tsx`	Adds regression test for `/plots` routing and `tag_click` tracking (including URL encoding).
`app/src/components/SpecTabs.tsx`	Updates tag navigation to `/plots?...` from spec detail.
`docs/reference/plausible.md`	Updates `tag_click` event documentation to include `StatsPage.tsx` and `source=stats`.
`app/eslint.config.js`	Declares `crypto` as a readonly global for ESLint.

+              "letsplot":   "Check `ggsize(W, H)` and `ggsave(..., scale=4)` pair — only the two canonical pairs land on target.",
+              "ggplot2":    "Check `ggsave(width=…, height=…, units='in', dpi=400)` with `ragg::agg_png`.",
+          }
+          cause = causes.get(LIBRARY, "Review `prompts/library/${LIBRARY}.md` 'Canvas — hard rule' section.")


+    return datetime.fromisoformat(value).astimezone(timezone.utc)
+
+


+TARGET = (3200, 1800)  # or (2400, 2400) for square
+
+def normalize_to_target(path: str, target: tuple[int, int], page_bg: str) -> None:
+    """Crop or pad the PNG so its dimensions exactly equal `target`."""
+    img = Image.open(path).convert("RGB")
+    tw, th = target
+    w, h = img.size


codecov · 2026-05-20T13:50:28Z

Codecov Report

❌ Patch coverage is 61.20690% with 45 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
automation/scripts/canvas_gate_report.py	60.17%	45 Missing ⚠️

📢 Thoughts on this report? Let us know!

Three fixes for sensible review comments on #7517: 1. impl-review.yml: the fallback `cause` string used shell-style `${LIBRARY}` inside a Python string, so it would be written literally into the synthetic weakness. Switch to an f-string so the message names the actual library. 2. canvas_gate_report.py parse_since(): `datetime.fromisoformat` accepts `Z` on 3.13 but the test suite documents 3.13+ as the floor, and a naive `--since 2026-05-20` would otherwise be interpreted in the host timezone. Normalize trailing `Z` to `+00:00` and treat naive datetimes as UTC explicitly so `--since 2026-05-20` is unambiguous. 3. altair.md: the `def normalize_to_target(...)` helper would have violated CQ-01 (KISS — no functions/classes in plot impls) and cost VQ points in the review. Replace with an inline crop/pad block that runs immediately after `chart.save(...)`. The PNG save sites now reference the inline block instead of calling a helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-reject (#7528) ## Summary Follow-up to #7517. The canvas gate worked perfectly on yesterday's `sn-curve-basic` regen (17/17 PASS at exact 3200×1800), but altair's PIL center-crop fallback silently chopped the title at the top edge and the first digit of every y-axis tick label off the left edge. AI review scored that 84/100, which cleared the attempt-2 ≥80 threshold, and a visibly broken altair chart shipped to the gallery. Two coordinated fixes: 1. **Altair PAD-only:** shrink inner-view defaults so vl-convert's external title/legend/axis padding still fits within target, and replace the crop-or-pad with PAD-only (raise `SystemExit` instead of cropping if vl-convert still overshoots — repair will pick that up). 2. **AR-09 EDGE_CLIPPING:** new hard auto-reject (Score=0, REJECTED) in the AI review. Fires when any text touches or extends past the canvas border. Distinct from the soft VQ-05 "no overflow" check — that one only deducts; AR-09 rejects outright. ## Changes ### `prompts/library/altair.md` - Landscape inner view: `(800, 450)` → `(620, 320)` (leaves ~500 px / 580 px of room for title + legend + axes inside 3200×1800) - Square inner view: `(600, 600)` → `(500, 460)` (similarly for 2400×2400) - Normalization: PAD-only with `Image.new(...) + paste(...)`. If `vl-convert` still produces an oversized PNG, raise `SystemExit` with the actual dims — impl-repair triggers and the next attempt shrinks further. - Removed the destructive center-crop branch entirely. ### `prompts/workflow-prompts/ai-quality-review.md` - Expanded section 6 from "Check for Auto-Reject (AR-08)" to "(AR-08, AR-09)". - AR-09 lists the five concrete clipping patterns ("title cropped at top", "y-tick missing leftmost digit", "x-axis label cut at bottom", "legend behind border", "annotation bounding box partially outside PNG"). - False-positive guard so the rule does not over-trigger on tooltips, decorative borders, or text that overflows its axis but stays within the canvas. ### `prompts/quality-evaluator.md` + `prompts/quality-criteria.md` - AR-09 added to both AR catalogues with description, trigger conditions, exceptions, and check order updated to AR-01 → … → AR-09. ## Why now The canvas-size hard gate (#7517) enforces *dimensions* but cannot see *what is at those edges*. The combination of "exact 3200×1800 PNG" + "title pixels chopped off" is the worst possible outcome because it gives a green check while shipping a broken chart. AR-09 closes that gap. ## Test plan - [x] Local diff inspection — altair example block uses (620, 320), normalization is PAD-only with `SystemExit` on overshoot. - [ ] After merge, dispatch `bulk-generate.yml -f specification_id=sn-curve-basic` (where altair's title-clip was reproducible) and confirm: - altair first-attempt PNG has the full title visible at the top - all y-axis tick labels show their leading digit - if AR-09 fires on any other library, score is 0 and repair triggers - [ ] Watch the `canvas_gate_report.py` output over the next few daily-regen runs — altair should now consistently hit target without padding ugliness. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MarkusNeusinger and others added 4 commits May 20, 2026 14:29

Copilot AI review requested due to automatic review settings May 20, 2026 13:43

Copilot started reviewing on behalf of MarkusNeusinger May 20, 2026 13:44 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

MarkusNeusinger merged commit ace2673 into main May 20, 2026
8 of 9 checks passed

MarkusNeusinger deleted the feat/canvas-size-hard-gate branch May 20, 2026 14:04

MarkusNeusinger mentioned this pull request May 20, 2026

fix(altair,review): PAD-only normalization + AR-09 edge-clipping auto-reject #7528

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(regen): hard canvas-size gate with delta-aware repair messages#7517

feat(regen): hard canvas-size gate with delta-aware repair messages#7517
MarkusNeusinger merged 5 commits into
mainfrom
feat/canvas-size-hard-gate

MarkusNeusinger commented May 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

codecov Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		return datetime.fromisoformat(value).astimezone(timezone.utc)

Conversation

MarkusNeusinger commented May 20, 2026

Summary

Background

Changes

Layer 1 — prompts/workflow-prompts/impl-generate-claude.md

Layer 2 — prompts/library/*.md

Layer 3 — .github/workflows/impl-review.yml + prompts/workflow-prompts/ai-quality-review.md

Monitoring

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

codecov Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Layer 1 — `prompts/workflow-prompts/impl-generate-claude.md`

Layer 2 — `prompts/library/*.md`

Layer 3 — `.github/workflows/impl-review.yml` + `prompts/workflow-prompts/ai-quality-review.md`

codecov Bot commented May 20, 2026 •

edited

Loading