Visual regression CI with semantic VLM triage layer by MengpingZhang · Pull Request #19578 · kubestellar/console

MengpingZhang · 2026-06-25T08:00:27Z

Runs without a VLM by default. This PR needs no API key. With no VISUAL_TRIAGE_API_KEY configured, the workflow runs in detect-only mode: a real visual change is routed to human review, fails the check, and files a tracking issue (update the baseline if the change is intended, otherwise fix it). The cheap pixel rules (identical / sub-noise / full-page) still resolve the obvious cases without a model. Setting VISUAL_TRIAGE_API_KEY to a key for a vision-capable model later upgrades this to automated semantic classification, with no other change needed. The auto-fix half of the loop already reuses the repo's org-credentialed Hive automation, so fixes need no per-contributor key either.
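The mode switch described above could look roughly like the following. This is a minimal sketch, not the code in scripts/visual-diff-triage.py: only the VISUAL_TRIAGE_API_KEY variable comes from the PR text, while the function name and the "detect_only" / "semantic" mode strings are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def select_triage_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "semantic" when a VLM key is configured, else "detect_only".

    The mode names are hypothetical; the PR only states that a missing
    VISUAL_TRIAGE_API_KEY means detect-only behavior.
    """
    env = os.environ if env is None else env
    # Treat an empty or whitespace-only value as "not configured".
    key = env.get("VISUAL_TRIAGE_API_KEY", "").strip()
    return "semantic" if key else "detect_only"
```

Passing the environment as a parameter keeps the check testable without mutating the real process environment.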

Draft for review (Mengping/VSIP). Baselines are app-version-specific, so screenshots may not match upstream main yet — see CI note at the bottom.

Adds a durable, self-perpetuating visual-regression system for the console: a gated
Playwright screenshot suite with committed baselines, plus a semantic triage layer
that classifies each pixel diff (regression / intended change / noise) and drives an
auto-issue → AI-fix → close-on-green loop.

What it does

- A gated GitHub Actions workflow (runs only on UI-path changes) builds the app, screenshots core routes/components across viewports with a determinism `settle` helper, and compares against committed Linux/Chromium baseline PNGs. The compare and generate-baseline modes are separate.
- On a pixel diff, a semantic triage layer (scripts/visual-diff-triage.py) classifies the diff; cheap cases (0-pixel / sub-threshold / full-page) resolve with no model call, and the rest go to a VLM. It fails closed to human review on model error, low confidence, or high-risk paths.
- On failure the system auto-creates a structured GitHub issue (the failure "schema" lives in visual-regression-failure-issue.yml) with confidence-gated labels so the AI fixer (Hive) can pick up confident regressions; close-on-green auto-closes the issue and writes a resolution verdict back to an append-only ledger for accuracy metrics.
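The triage order in the second bullet (cheap pixel rules first, VLM second, fail closed on any doubt) can be sketched as below. This is an illustration under stated assumptions, not the contents of scripts/visual-diff-triage.py: the `Verdict` type, the ratio thresholds, the outcome assigned to each cheap rule, the order of the fail-closed checks, and the `ask_vlm` callback are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Verdict:
    label: str         # "noise", "regression", "intended_change" or "human_review"
    confidence: float
    source: str        # "rule" (no model call) or "vlm"


def cheap_rules(changed: int, total: int,
                noise_ratio: float = 0.0005,
                full_page_ratio: float = 0.9) -> Optional[Verdict]:
    """Resolve obvious cases without a model; return None if undecided."""
    if total <= 0:
        raise ValueError("total pixel count must be positive")
    if changed == 0:
        return Verdict("noise", 1.0, "rule")          # identical images
    ratio = changed / total
    if ratio < noise_ratio:
        return Verdict("noise", 1.0, "rule")          # sub-threshold diff
    if ratio >= full_page_ratio:
        # Assumption: a full-page diff is sent straight to a human.
        return Verdict("human_review", 1.0, "rule")
    return None


def triage(path: str, changed: int, total: int,
           ask_vlm: Callable[[str], Tuple[str, float]],
           confidence_cutoff: float = 0.8,
           high_risk_globs: Sequence[str] = ()) -> Verdict:
    """Classify one diff, failing closed to human review."""
    ruled = cheap_rules(changed, total)
    if ruled is not None:
        return ruled
    try:
        label, confidence = ask_vlm(path)
    except Exception:
        return Verdict("human_review", 0.0, "vlm")    # model error
    if confidence < confidence_cutoff:
        return Verdict("human_review", confidence, "vlm")
    if any(fnmatch(path, glob) for glob in high_risk_globs):
        return Verdict("human_review", confidence, "vlm")
    return Verdict(label, confidence, "vlm")
```

Every branch that cannot produce a confident answer returns "human_review", so a failing or uncertain model can never silently pass a diff.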

Triggers & flow

- pull_request on web/src/** or web/e2e/visual/** (and the workflow files) → compare mode.
- Diff → triage → CI red on regression/human_review; auto-issue with `triage/accepted`+`ai-fix-requested` (confident regression) or `kind/bug`+`needs-triage` (otherwise). Green again → issue auto-closed.
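The routing in the second bullet maps a triage verdict to a CI status and a label set. A minimal sketch follows; the label names are taken from the PR text, but the `Routing` type, the `route` function, and the default threshold value are assumptions, and the real logic lives in the workflows.

```python
from typing import List, NamedTuple


class Routing(NamedTuple):
    ci_fails: bool
    open_issue: bool
    labels: List[str]


def route(label: str, confidence: float,
          auto_accept_min_confidence: float = 0.9) -> Routing:
    """Map a triage verdict to CI status and issue labels."""
    if label not in ("regression", "human_review"):
        # Noise or intended change: CI stays green, no issue is filed.
        return Routing(False, False, [])
    if label == "regression" and confidence >= auto_accept_min_confidence:
        # Confident regression: hand the issue to the AI fixer.
        return Routing(True, True, ["triage/accepted", "ai-fix-requested"])
    # Everything else that fails CI goes to a human.
    return Routing(True, True, ["kind/bug", "needs-triage"])
```

For example, `route("regression", 0.95)` yields the `triage/accepted` and `ai-fix-requested` labels, while `route("human_review", 0.95)` yields `kind/bug` and `needs-triage`.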

Refreshing baselines

- See web/e2e/visual/BASELINES.md. Use the workflow's generate-baseline mode (workflow_dispatch `generate_baselines`) to regenerate the Linux baselines as an artifact, then commit them.

Config knobs (.github/visual-triage-config.json)

- confidence_cutoff (CI-fail bar), auto_accept_min_confidence (auto-fix bar), high_risk_globs, per-run token/call budget, eval_min_accuracy, target_regression_precision, min_samples.
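To make the knobs concrete, here is a sketch of how a script might load and validate such a config. The key names confidence_cutoff, auto_accept_min_confidence, high_risk_globs, eval_min_accuracy, target_regression_precision and min_samples come from the list above; the two budget keys, all default values, and the ordering check between the two confidence bars are assumptions, since the PR text does not give them.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List


@dataclass(frozen=True)
class TriageConfig:
    confidence_cutoff: float = 0.8            # CI-fail bar
    auto_accept_min_confidence: float = 0.9   # auto-fix bar
    high_risk_globs: List[str] = field(default_factory=list)
    max_calls_per_run: int = 20               # hypothetical key name
    max_tokens_per_run: int = 50_000          # hypothetical key name
    eval_min_accuracy: float = 0.85
    target_regression_precision: float = 0.9
    min_samples: int = 20


def parse_config(text: str) -> TriageConfig:
    """Parse JSON config text, rejecting unknown keys and bad thresholds."""
    raw = json.loads(text)
    known = {f.name for f in fields(TriageConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    cfg = TriageConfig(**raw)
    # Assumption: the auto-fix bar is at least as strict as the CI-fail bar.
    if not 0.0 <= cfg.confidence_cutoff <= cfg.auto_accept_min_confidence <= 1.0:
        raise ValueError("expected 0 <= confidence_cutoff <= auto_accept_min_confidence <= 1")
    return cfg
```

Rejecting unknown keys means a typo in the JSON file fails loudly instead of silently falling back to a default.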

Accuracy gate & metrics

- web/e2e/visual/triage-eval/ holds a curated eval set; visual-triage-eval.yml runs the same pipeline (real VLM when VISUAL_TRIAGE_API_KEY is set, else a deterministic mock smoke) and fails below eval_min_accuracy. A metrics workflow publishes a regression-precision badge.
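The two numbers mentioned here, eval accuracy and regression precision, can be computed as in the sketch below. It assumes each eval or ledger record reduces to a (predicted label, actual label) pair and that min_samples suppresses the precision figure until enough regression predictions exist; both are assumptions, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (predicted label, actual label)


def accuracy(pairs: Sequence[Pair]) -> float:
    """Fraction of eval cases where the prediction matches the truth."""
    if not pairs:
        raise ValueError("eval set is empty")
    return sum(pred == actual for pred, actual in pairs) / len(pairs)


def regression_precision(pairs: Sequence[Pair],
                         min_samples: int) -> Optional[float]:
    """Of the diffs called "regression", the fraction that really were.

    Returns None until at least min_samples regression predictions exist.
    """
    called = [actual for pred, actual in pairs if pred == "regression"]
    if len(called) < min_samples:
        return None
    return sum(actual == "regression" for actual in called) / len(called)


def eval_gate(pairs: Sequence[Pair], eval_min_accuracy: float) -> bool:
    """True when the eval run passes the accuracy bar."""
    return accuracy(pairs) >= eval_min_accuracy
```

Precision is the metric that matters for the auto-fix loop, because a false "regression" verdict sends the AI fixer after a change that was not a bug.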

This is meant to keep running as standing infrastructure, not a one-off check.

Files: 99 changed (+3171/−87) — the 5 workflows, the visual specs + 75 Linux baselines + eval set under web/e2e/visual/, the visual-settle.ts determinism helper, scripts/visual-diff-triage.py + merge_ledger.py, .github/visual-triage-config.json, and small touches to web/e2e/helpers/setup.ts, app-visual.config.ts, and docs/security/SECURITY-AI.md. No unrelated files.

kubestellar-prow · 2026-06-25T08:00:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mikespreitzer for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify · 2026-06-25T08:00:31Z

✅ Deploy Preview for kubestellarconsole canceled.

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`8eb9068`
🔍 Latest deploy log	https://app.netlify.com/projects/kubestellarconsole/deploys/6a3d41d201433400088bb782

github-actions · 2026-06-25T08:00:36Z

👋 Welcome to the KubeStellar community! 💖

Thanks and congrats 🎉 for opening your first PR here! We're excited to have you contributing.

Before merge, please ensure:

✅ DCO Sign-off — All commits signed with git commit -s (DCO)
✅ PR Title — Starts with an emoji: ✨ feature | 🐛 bug fix | 📖 docs | 🌱 infra/tests | ⚠️ breaking

📬 If you're using KubeStellar in your organization, please add your name to our Adopters list. 🙏 It really helps the project gain momentum and credibility — a small contribution back with a big impact.

Resources:

🖥️ KubeStellar Console — Try the live multi-cluster dashboard
📖 Console Docs — Installation, features, and architecture
🧩 Marketplace — Community extensions and cards
📚 Knowledge Base — Troubleshooting and how-tos
📖 Full Documentation | 🤝 Contributor Handbook | 💬 Slack

A maintainer will review your PR soon. Hope you have a great time here!

🌟 ~~~~~~~~~~ 🌟

📬 If you like KubeStellar, please ⭐ star ⭐ our repo to support it!

🙏 It really helps the project gain momentum and credibility — a small contribution back with a big impact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Default operation without a VLM key: the workflow runs in detect-only mode, where a real visual change is routed to human review, fails the check, and files a tracking issue (update the baseline if intended, else fix). Configuring a vision-capable model via VISUAL_TRIAGE_API_KEY upgrades this to automated semantic classification.

Signed-off-by: Mengping Zhang <mengping.zhang@bytedance.com>

kubestellar-prow Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2026

kubestellar-prow Bot added the dco-signoff: no Indicates the PR's author has not signed the DCO. label Jun 25, 2026

kubestellar-prow Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 25, 2026

github-advanced-security AI found potential problems Jun 25, 2026

Comment thread .github/workflows/visual-regression-failure-issue.yml

          SOURCE_RUN_ID: ${{ env.SOURCE_RUN_ID }}
          ARTIFACT_ROOT: visual-regression-artifacts
        with:
          script: |

MengpingZhang force-pushed the visual-regression-system branch from 7b11236 to 606982d Compare June 25, 2026 14:28

kubestellar-prow Bot added dco-signoff: yes Indicates the PR's author has signed the DCO. and removed dco-signoff: no Indicates the PR's author has not signed the DCO. labels Jun 25, 2026

MengpingZhang force-pushed the visual-regression-system branch from 606982d to 8eb9068 Compare June 25, 2026 14:57

MengpingZhang wants to merge 1 commit into kubestellar:main from DavidDiaz0317:visual-regression-system
