feat(regen): hard canvas-size gate with delta-aware repair messages#7517
Merged
Conversation
Tag chips on the spec/impl detail page and on the /stats tag-distribution
block built URLs as /?<param>=<value>. Since the page split, / is the
LandingPage and no longer accepts filter query params — only /plots does.
The links resolved to the landing page with stray query params and lost
the filter intent.
Fixes:
- SpecTabs.handleTagClick: navigate("/plots?…") instead of "/?…". One
call site covers both spec-overview and impl-detail since SpecPage
mounts <SpecTabs> for both router modes.
- StatsPage tag links: target /plots, encodeURIComponent the value, and
fire `tag_click` with `source: 'stats'` (StatsPage was silent on this
event — only spec_detail was tracked before).
- docs/reference/plausible.md: list StatsPage.tsx as a tag_click source
and document the `source ∈ spec_detail, stats` enum.
Drive-by: add `crypto` to the eslint browser-globals list so
FeedbackWidget.tsx (uses crypto.randomUUID / getRandomValues) stops
failing `no-undef`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses Copilot review feedback on #7516: the StatsPage tag-link behavior changed in the parent commit had no test coverage (also flagged by codecov/patch/frontend). Adds one test that locks in: - the link's href is /plots?plot=scatter (regression guard for the pre-fix /?plot=scatter that silently dropped the filter intent), and - clicking fires trackEvent('tag_click', { param, value, source: 'stats' }). The useAnalytics mock is hoisted into a module-level mockTrackEvent so the assertion can see the call. Cleared per test via mockClear() because vi.restoreAllMocks() does not reset hoisted vi.fn() instances on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior test asserted href === '/plots?plot=scatter', which proves
the /plots routing but does not exercise encodeURIComponent — 'scatter'
has no special chars, so the assertion would still pass if the encoding
were removed.
Copilot's review explicitly called out '/plots?{param}={encodedTag}',
so add a 'time series' tag (space in the value) under data_type. The
new assertion locks in href === '/plots?data=time%20series', which
fails the moment encodeURIComponent is dropped from the source. Also
clicks both links and asserts trackEvent receives the raw (unencoded)
value, matching the documented plausible.md contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yesterday's per-library sizing change (1b1f279) shipped, but last night's daily-regen exposed systematic canvas drift across 7 of 10 libraries: ggplot2 (generated fresh, no prior impl to anchor on) and pygal hit the 3200×1800 / 2400×2400 target exactly, while highcharts/bokeh ended ~140 px short, matplotlib/seaborn were trimmed by bbox_inches='tight', and altair had no canvas control at all. The two prior soft tightenings of "Base style ALWAYS wins" couldn't beat the in-context anchor of the previous implementation's figsize/width/height values. This change enforces canvas dimensions through three coordinated layers: - impl-generate-claude.md: STEP 0 canvas contract above the previous-impl reference, with a PIL self-check sub-step in Step 3 so Claude catches drift inside its own retry loop before pushing. - per-library prompts: explicit "Canvas — hard rule" sections naming both canonical pixel pairs. Surgical fixes for the actual offenders — drop bbox_inches='tight' from matplotlib/seaborn save calls; add a PIL crop/pad normalizer to altair (vl-convert pads outside width/height); forbid autosize=True + pin margin in plotly; use Chrome CDP setDeviceMetricsOverride in highcharts (--window-size alone gets eaten by Chrome chrome). - impl-review.yml: pre-AI-review canvas gate that emits a structured ::notice:: log line per evaluation and, on drift, writes /tmp/anyplot-canvas-gate.txt with actual×actual, closest target, signed delta, and a library-specific likely cause. ai-quality-review.md picks the file up, copies the synthetic weakness verbatim, and forces VQ-05 to 0/4 — which drags quality_score below the repair threshold and routes the PR through the existing 5-review/ 4-repair cascade. Routing through review (not a raw step failure) preserves attempt counting and PR cleanup; impl-repair is dispatched only by the ai-rejected label, never by a hard step exit. The repair feedback names the *actual* drift, not just the target — so the repair model knows which knob to turn and which direction, rather than guessing on the next attempt. automation/scripts/canvas_gate_report.py aggregates the structured notice lines across recent impl-review runs and prints per-library first-attempt fail rates; anything above 20 % over 14 days means the library prompt is still leaking and gets another tightening pass rather than burning compute in steady-state repair. Verified by replaying the gate logic against the 79 PNGs from last night's regen (/tmp/regen-compare/): 38 pass (ggplot2 + pygal + most plotly/plotnine/ letsplot), 41 fail with delta + cause messages matching the manual audit. 1485 unit tests pass; 12 new tests lock the ::notice:: log contract between the gate step and the report parser. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the regeneration pipeline against canvas-size drift by introducing a pre-AI-review pixel gate, updating generation/review prompts to treat canvas size as a hard contract (with delta-aware repair feedback), and adding a monitoring script + tests to track first-attempt failure rates. It also fixes frontend tag-chip navigation to route to /plots (and tracks tag_click from StatsPage).
Changes:
- Add a “Canvas dimension gate” to
impl-review.ymlthat emits structured::notice::canvas_gate …lines and writes delta-aware repair guidance for the AI review step. - Tighten regen + library prompts to treat canvas size as a non-negotiable contract and provide library-specific guidance (including a local PIL self-check).
- Add
canvas_gate_report.py+ unit tests to monitor gate outcomes from workflow logs; update frontend tag routing + analytics docs/tests.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
tests/unit/automation/scripts/test_canvas_gate_report.py |
Unit tests locking the structured canvas_gate log-line regex/aggregation contract. |
automation/scripts/canvas_gate_report.py |
New CLI script to scan impl-review logs and report per-library first-attempt fail rates. |
.github/workflows/impl-review.yml |
Adds the canvas dimension gate step and synthetic weakness output for repair routing. |
prompts/workflow-prompts/impl-generate-claude.md |
Adds Step 0 hard canvas contract + Step 3b PIL self-check instructions. |
prompts/workflow-prompts/ai-quality-review.md |
Requires ingesting /tmp/anyplot-canvas-gate.txt (if present) and forcing VQ-05 to 0/4 on drift. |
prompts/library/matplotlib.md |
Adds “Canvas — hard rule” guidance; removes bbox_inches='tight' in save examples. |
prompts/library/seaborn.md |
Adds “Canvas — hard rule” guidance; removes bbox_inches='tight' in save examples. |
prompts/library/altair.md |
Adds view/padding constraints plus post-save PIL normalization guidance for exact PNG dimensions. |
prompts/library/plotly.md |
Adds hard canvas sizing pairs; forbids autosize=True and pins margins. |
prompts/library/highcharts.md |
Adds CDP viewport override and PIL normalization guidance; documents multi-place sizing sync. |
prompts/library/bokeh.md |
Adds hard canvas sizing section and warns about toolbar-induced drift. |
prompts/library/pygal.md |
Adds hard canvas sizing section. |
prompts/library/plotnine.md |
Adds hard canvas sizing section and guidance consistent with matplotlib backend. |
prompts/library/letsplot.md |
Adds hard canvas sizing section and canonical size pairs. |
prompts/library/ggplot2.md |
Adds hard canvas sizing section for ggsave + ragg. |
app/src/pages/StatsPage.tsx |
Routes tag links to /plots?... and emits tag_click analytics from StatsPage. |
app/src/pages/StatsPage.test.tsx |
Adds regression test for /plots routing and tag_click tracking (including URL encoding). |
app/src/components/SpecTabs.tsx |
Updates tag navigation to /plots?... from spec detail. |
docs/reference/plausible.md |
Updates tag_click event documentation to include StatsPage.tsx and source=stats. |
app/eslint.config.js |
Declares crypto as a readonly global for ESLint. |
| "letsplot": "Check `ggsize(W, H)` and `ggsave(..., scale=4)` pair — only the two canonical pairs land on target.", | ||
| "ggplot2": "Check `ggsave(width=…, height=…, units='in', dpi=400)` with `ragg::agg_png`.", | ||
| } | ||
| cause = causes.get(LIBRARY, "Review `prompts/library/${LIBRARY}.md` 'Canvas — hard rule' section.") |
Comment on lines
+74
to
+76
| return datetime.fromisoformat(value).astimezone(timezone.utc) | ||
|
|
||
|
|
Comment on lines
+29
to
+35
| TARGET = (3200, 1800) # or (2400, 2400) for square | ||
|
|
||
| def normalize_to_target(path: str, target: tuple[int, int], page_bg: str) -> None: | ||
| """Crop or pad the PNG so its dimensions exactly equal `target`.""" | ||
| img = Image.open(path).convert("RGB") | ||
| tw, th = target | ||
| w, h = img.size |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Three fixes for sensible review comments on #7517: 1. impl-review.yml: the fallback `cause` string used shell-style `${LIBRARY}` inside a Python string, so it would be written literally into the synthetic weakness. Switch to an f-string so the message names the actual library. 2. canvas_gate_report.py parse_since(): `datetime.fromisoformat` accepts `Z` on 3.13 but the test suite documents 3.13+ as the floor, and a naive `--since 2026-05-20` would otherwise be interpreted in the host timezone. Normalize trailing `Z` to `+00:00` and treat naive datetimes as UTC explicitly so `--since 2026-05-20` is unambiguous. 3. altair.md: the `def normalize_to_target(...)` helper would have violated CQ-01 (KISS — no functions/classes in plot impls) and cost VQ points in the review. Replace with an inline crop/pad block that runs immediately after `chart.save(...)`. The PNG save sites now reference the inline block instead of calling a helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
MarkusNeusinger
added a commit
that referenced
this pull request
May 20, 2026
…-reject (#7528) ## Summary Follow-up to #7517. The canvas gate worked perfectly on yesterday's `sn-curve-basic` regen (17/17 PASS at exact 3200×1800), but altair's PIL center-crop fallback silently chopped the title at the top edge and the first digit of every y-axis tick label off the left edge. AI review scored that 84/100, which cleared the attempt-2 ≥80 threshold, and a visibly broken altair chart shipped to the gallery. Two coordinated fixes: 1. **Altair PAD-only:** shrink inner-view defaults so vl-convert's external title/legend/axis padding still fits within target, and replace the crop-or-pad with PAD-only (raise `SystemExit` instead of cropping if vl-convert still overshoots — repair will pick that up). 2. **AR-09 EDGE_CLIPPING:** new hard auto-reject (Score=0, REJECTED) in the AI review. Fires when any text touches or extends past the canvas border. Distinct from the soft VQ-05 "no overflow" check — that one only deducts; AR-09 rejects outright. ## Changes ### `prompts/library/altair.md` - Landscape inner view: `(800, 450)` → `(620, 320)` (leaves ~500 px / 580 px of room for title + legend + axes inside 3200×1800) - Square inner view: `(600, 600)` → `(500, 460)` (similarly for 2400×2400) - Normalization: PAD-only with `Image.new(...) + paste(...)`. If `vl-convert` still produces an oversized PNG, raise `SystemExit` with the actual dims — impl-repair triggers and the next attempt shrinks further. - Removed the destructive center-crop branch entirely. ### `prompts/workflow-prompts/ai-quality-review.md` - Expanded section 6 from "Check for Auto-Reject (AR-08)" to "(AR-08, AR-09)". - AR-09 lists the five concrete clipping patterns ("title cropped at top", "y-tick missing leftmost digit", "x-axis label cut at bottom", "legend behind border", "annotation bounding box partially outside PNG"). - False-positive guard so the rule does not over-trigger on tooltips, decorative borders, or text that overflows its axis but stays within the canvas. ### `prompts/quality-evaluator.md` + `prompts/quality-criteria.md` - AR-09 added to both AR catalogues with description, trigger conditions, exceptions, and check order updated to AR-01 → … → AR-09. ## Why now The canvas-size hard gate (#7517) enforces *dimensions* but cannot see *what is at those edges*. The combination of "exact 3200×1800 PNG" + "title pixels chopped off" is the worst possible outcome because it gives a green check while shipping a broken chart. AR-09 closes that gap. ## Test plan - [x] Local diff inspection — altair example block uses (620, 320), normalization is PAD-only with `SystemExit` on overshoot. - [ ] After merge, dispatch `bulk-generate.yml -f specification_id=sn-curve-basic` (where altair's title-clip was reproducible) and confirm: - altair first-attempt PNG has the full title visible at the top - all y-axis tick labels show their leading digit - if AR-09 fires on any other library, score is 0 and repair triggers - [ ] Watch the `canvas_gate_report.py` output over the next few daily-regen runs — altair should now consistently hit target without padding ugliness. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
impl-generate-claude.md, "Canvas — hard rule" sections in all 10 library prompts, and a pre-AI-review pixel gate inimpl-review.ymlthat routes drift into the existing repair cascade.::notice::canvas_gatelog contract so we can spot libraries whose prompt is still leaking before steady-state compute waste.Background
After yesterday's per-library sizing change (
1b1f279a6), last night's daily-regen produced systematic canvas drift across 7 of 10 libraries — see/tmp/regen-compare/overview.htmlaudit from earlier today:bbox_inches='tight'to ~3160×1740 / ~2200×2336width/height)Two prior soft tightenings (
13cf81342,7f4a78a04) added "Base style ALWAYS wins" to the regen prompt — advisory, kept losing to the in-context anchor of the previous impl's figsize/width/height.Changes
Layer 1 —
prompts/workflow-prompts/impl-generate-claude.mdLayer 2 —
prompts/library/*.mdbbox_inches='tight'from allsavefigexamples (this was the documented cause of ~40 px trim).configure_view(continuousWidth, continuousHeight)+ zeropadding+ mandatory in-impl PIL crop/pad to exact target (escape valve so the gate never deadlocks altair structurally).autosize=True; require explicitmargin=dict(...).Emulation.setDeviceMetricsOverridevia CDP +--hide-scrollbars(the--window-sizevalue isn't authoritative in headless mode).Layer 3 —
.github/workflows/impl-review.yml+prompts/workflow-prompts/ai-quality-review.mdplot-light.pngdims, emits a structured::notice::canvas_gate library=… status=… actual=… target=… delta=… attempt=…line, and on drift writes a synthetic weakness to/tmp/anyplot-canvas-gate.txtnaming actual, closest target, signed delta, direction, optional aspect hint, and a library-specific likely cause.impl-repair.yml. No rawexit 1(would bypass repair).Monitoring
automation/scripts/canvas_gate_report.pyparses canvas_gate notice lines from recent impl-review runs and prints per-library first-attempt fail rates. Decision rule: anything > 20 % over 14 days means the library prompt is still leaking and warrants another tightening pass.Test plan
::notice::format the parser expects.bulk-generate.yml -f specification_id=sn-curve-basic(highcharts/bokeh cropped the X-axis title last night → real reproducer). Expectations: ggplot2/pygal/letsplot/plotnine/plotly pass first attempt; matplotlib/seaborn/altair/highcharts/bokeh either pass or repair to pass on attempt 2 with the named delta as feedback. No PR auto-closed.uv run automation/scripts/canvas_gate_report.pydaily for 3 days, then weekly. Flag any library > 20 % first-attempt fail rate.🤖 Generated with Claude Code