
feat(pipeline): candidate-accuracy gates wiring + feedback loop (skills/workflow)#216

Merged: yasirhamza merged 8 commits into main from feat/pipeline-candidate-accuracy on Jun 11, 2026

@yasirhamza (Owner)

This PR is the AndroDR half of the candidate-accuracy spec (docs/superpowers/specs/2026-06-10-pipeline-candidate-accuracy-design.md). The sigma-repo half was merged as rules#32 and rules#33.

Phase 1 — deterministic gates

  • Submodule bumped twice: once for the candidate-accuracy lints (a severity cap keyed on the top-level category, matching the runtime applyCap semantics; a lone-exploited-CVE check; a retired-ID registry) and once for reviewer-driven hardening (a category enum check that closes a case-sensitivity bypass, rejection of empty CVE lists, and a clean error when display is not a dict). All 40 sigma tests pass, including a regression sweep over every shipped rule.
  • e2e workflow: Gate 5 now runs as an independent reviewer agent per candidate, with no gate results or author reasoning in its context; the validator runs only Gates 1, 1.2, 2, and 3; the verdicts are merged in workflow code.
  • e2e workflow: one author-repair round. Failed candidates go back to a fresh author along with the validator stderr, then are re-validated and re-reviewed; a skip_note escape hatch is available, and a second failure is final.
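A minimal Python sketch of how the merged verdict and the single repair round could fit together. The PR does not show the workflow code, so the function names, the `(passed, detail)` return shape, and the choice to also send the reviewer's note to the repair author when Gate 5 fails are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical agent interfaces; the real workflow's signatures are not shown in the PR.
Gate = Callable[[str], Tuple[bool, str]]  # candidate -> (passed, detail)
Author = Callable[[str, str], Tuple[Optional[str], Optional[str]]]  # (candidate, feedback) -> (repaired, skip_note)


@dataclass
class Outcome:
    candidate: str
    accepted: bool
    attempts: int
    detail: str
    skip_note: Optional[str] = None


def evaluate(candidate: str, validate: Gate, review: Gate) -> Tuple[bool, str]:
    """Run the validator (Gates 1/1.2/2/3) and the Gate 5 reviewer separately, then merge."""
    ok_gates, stderr = validate(candidate)
    # The reviewer receives only the candidate: no gate results, no author reasoning.
    ok_review, review_note = review(candidate)
    # Merge in workflow code: pass only if both sides pass; collect the failing details.
    feedback = "\n".join(
        text for ok, text in ((ok_gates, stderr), (ok_review, review_note)) if not ok
    )
    return ok_gates and ok_review, feedback


def run_candidate(candidate: str, validate: Gate, review: Gate, fresh_author: Author) -> Outcome:
    passed, feedback = evaluate(candidate, validate, review)
    if passed:
        return Outcome(candidate, True, 1, feedback)
    # Exactly one repair round, handled by a fresh author that sees the failure output.
    repaired, skip_note = fresh_author(candidate, feedback)
    if repaired is None:
        # Escape hatch (assumed shape): the author declines to repair and says why.
        return Outcome(candidate, False, 1, feedback, skip_note)
    passed, feedback = evaluate(repaired, validate, review)
    # A second failure is final; there is no further round.
    return Outcome(repaired, passed, 2, feedback)
```

Keeping the merge in `evaluate` rather than in either agent mirrors the PR's point that the reviewer never sees gate results: each agent judges alone, and only the workflow combines them.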

Phase 2 — feedback loop

  • Dispatcher Step 8.3: a public run ledger (pipeline-runs/YYYY-MM-DD-<mode>.yml) records every HitL verdict with its reason and failure class; ledger failures never block rule commits.
  • Dispatcher Step 8.4: lessons curation (capped at 20 entries, HitL-gated) plus an approval-without-modification trend over the last 5 runs.
  • validation/authoring-lessons.yml, seeded with the posture-cap, no-TROJAN, and lone-CVE lessons, is injected into the author, validator, reviewer, and repair prompts on both the dispatcher path and the e2e workflow.
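The ledger and trend steps can be sketched in stdlib Python. Only the file path pattern comes from the PR; the entry fields (`verdict`, `modified`, `reason`, `failure_class`), the `"approved"` verdict value, and writing JSON into the `.yml` file are assumptions made to keep the sketch dependency-free (YAML 1.2 is designed as a superset of JSON, so a YAML reader can load it).

```python
import json
import logging
from datetime import date
from pathlib import Path
from typing import Dict, List

log = logging.getLogger("pipeline.ledger")


def write_ledger(root: Path, mode: str, run_date: date, entries: List[Dict]) -> bool:
    """Best-effort ledger write: failures are logged and reported, never raised."""
    path = root / "pipeline-runs" / f"{run_date.isoformat()}-{mode}.yml"
    # Assumed document layout; JSON text is used as a YAML 1.2-compatible stand-in.
    doc = {"date": run_date.isoformat(), "mode": mode, "entries": entries}
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(doc, indent=2))
    except (OSError, TypeError, ValueError) as exc:
        # The caller commits rules regardless of this result.
        log.warning("ledger write skipped: %s", exc)
        return False
    return True


def approval_trend(runs: List[Dict], window: int = 5) -> List[float]:
    """Per-run share of verdicts approved without modification, oldest first."""
    rates = []
    for run in runs[-window:]:
        entries = run["entries"]
        if not entries:
            continue  # a run with no verdicts has no rate to report
        clean = sum(
            1 for e in entries if e["verdict"] == "approved" and not e.get("modified", False)
        )
        rates.append(clean / len(entries))
    return rates
```

Returning a boolean instead of raising is what lets the dispatcher treat the ledger as optional: a full disk or a bad entry costs a warning, not a rule commit.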

Review

A two-reviewer cycle ran mid-branch, and every finding but one was fixed: a lessons gap on the dispatcher path, the decisions manifest being wiped on repair ([] truthiness), temp-file slot uniqueness for duplicate rule_ids, an empty-parallel guard, a contradiction in the validate skill's Gate 5 wording, deferred-list completeness, and two spec amendments. The one finding intentionally deferred is a pytest job in the sigma repo's CI (GHA budget policy; per project convention, suites run locally).
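Two of those fixes can be illustrated in a few lines. The PR names the fixes without showing code, so this sketch assumes the wipe happened when an empty `decisions` list from the repair round was accepted as a replacement, and that temp files were named after the rule_id alone; both helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional


def preserve_decisions(existing: List[Dict], repaired: Optional[List[Dict]]) -> List[Dict]:
    """Hypothetical decisions-preserve guard for the repair round."""
    # A check like `repaired is not None` accepts [] and wipes the manifest.
    # Testing truthiness treats None and [] alike as "no new decisions".
    if not repaired:
        return existing
    return repaired


def temp_slots(tmpdir: Path, rule_ids: List[str]) -> List[Path]:
    """One temp file per candidate; the index prefix keeps duplicate rule_ids apart."""
    return [tmpdir / f"{i:03d}-{rule_id}.yml" for i, rule_id in enumerate(rule_ids)]
```

Keying slots on position rather than on rule_id means two candidates that happen to share an ID can never overwrite each other's file during validation.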

Verification: the workflow body parses under the harness wrapper; the discover suite passes 19/19; the sigma validation suite passes 40/40. No Android app code was touched.

🤖 Generated with Claude Code

Yasir and others added 8 commits June 11, 2026 06:59
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s injection (prompts/schemas)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e repair round

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…uration; author/validator lessons + lint notes

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ves applyCap, display.category is UI-only

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…lidator path, decisions-preserve guard, unique temp slots, deferred-list completeness, spec amendments

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@yasirhamza merged commit 3ae7c11 into main on Jun 11, 2026
3 of 7 checks passed
@yasirhamza deleted the feat/pipeline-candidate-accuracy branch on June 11, 2026 04:11