
Checkpoints - Adversarial Review #1803

Description

@larya-dot-eu

Tip

TL;DR: The Superpowers skill is great, but adding an adversarial review phase that forces Claude to "break the plan" before execution consistently catches real bugs that the built-in self-review misses.

  • I searched existing issues and this has not been proposed before

What problem does this solve?

The built-in writing-plans self-review is confirmatory — it checks coverage, types, and placeholders. It does not attempt to break the plan. After 90+ sessions using Superpowers (primarily Opus for specs/plans, Opus/Sonnet for execution), I consistently found that Claude in "completion state" produces plans that look correct but contain silent failures — wrong test commands, missing venv activations, fragile regexes, silently drifting field mappings. These only surface on first execution, costing tokens and rework.

Proposed solution

A dedicated adversarial review phase invoked between writing-plans and execution. The phase explicitly switches Claude's mode: instead of confirming the plan is complete, it is instructed to break it — trace intermediate states, run commands mentally, verify assumptions against the actual codebase, and surface findings categorized by fix type (surface fix → patch plan, architectural issue → rewrite spec, wrong problem → back to exploration). The exit gate is the user's approval, not Claude's own confidence.
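The routing described above could be sketched roughly as follows. This is only an illustration: the names `FindingType`, `LOOP_BACK`, and `next_action` are hypothetical, and the rule that the most severe finding decides the loop-back target is my assumption, not something Superpowers defines.

```python
from enum import Enum


class FindingType(Enum):
    SURFACE = "surface fix"
    ARCHITECTURAL = "architectural issue"
    WRONG_PROBLEM = "wrong problem"


# Loop-back targets as described in the proposal.
LOOP_BACK = {
    FindingType.SURFACE: "patch plan",
    FindingType.ARCHITECTURAL: "rewrite spec",
    FindingType.WRONG_PROBLEM: "back to exploration",
}

# Assumption: findings are ranked from least to most severe in this order.
SEVERITY = [
    FindingType.SURFACE,
    FindingType.ARCHITECTURAL,
    FindingType.WRONG_PROBLEM,
]


def next_action(findings, user_approved):
    """Pick the next step for a list of FindingType values."""
    if findings:
        # Assumption: the most severe finding decides how far to loop back.
        worst = max(findings, key=SEVERITY.index)
        return LOOP_BACK[worst]
    # The exit gate is the user's approval, not the model's own confidence.
    return "execute" if user_approved else "wait for user approval"
```

For example, `next_action([FindingType.SURFACE, FindingType.ARCHITECTURAL], True)` returns `"rewrite spec"`, and a clean review without user approval still returns `"wait for user approval"`.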

What alternatives did you consider?

  • Relying on the existing writing-plans self-review — insufficient, it's confirmatory not adversarial
  • Adding more detail to specs — doesn't help when the plan itself contains unverified shell commands or env assumptions
  • Manual review — valid, but happens too late; the adversarial pass catches the class of bug before execution, not during

The adversarial phase is not a competing workflow — it's a missing discipline layer that sits between planning and execution and complements all existing skills.

Is this appropriate for core Superpowers?

Yes. The problem — Claude over-confidence in completion state producing plans with unverified assumptions — is universal across project types, languages, and domains. It is not specific to my stack. Any team using writing-plans for complex multi-step work would benefit from a forced break-it pass before execution.

Environment (required)

Claude Code CLI on Ubuntu server, Opus 4.8 for spec/plan writing, Sonnet and sometimes Opus for execution, 90+ real sessions over several weeks using the full Superpowers workflow. The problem came up in live sessions — not inferred from documentation. Specific bugs were caught by this adversarial pass in multiple sessions that the built-in self-review passed without flagging.

Field                                Value
Superpowers version                  6.0.2
Harness (Claude Code, Cursor, etc.)  Claude Code
Harness version                      v2.1.181
Your model + version                 Opus 4.8 with high effort
All plugins installed                superpowers, Supabase, context7, vercel, gstack, code-review, frontend-design

Context

First of all, I absolutely love the Superpowers skill. It's an amazing skill and I rely on it every day. Thank you very much for sharing it with the community.

Please only read this if you have some spare time, as my findings may only apply to me, my terminal, my working methodology, and my past experiences (with my own work, teams, and different projects unrelated to coding).
However, if there's even a small chance that the Superpowers skill can be improved by 5%, then it might be worth our time.

As I understand the main idea, the Superpowers planning-with-files approach solves the coding problems. It also removes some of the concerns about model and context loss.
I try to keep my work simple and focus on just a couple of questions (the Ws/H framework).

I've searched through all the current issues posted and found some that relate to my "issue".
I think it comes down to the discipline of certain parts of the Superpowers skill when it is used as a whole framework.

For example, I have read these related reports:

In my opinion, none of these are complete solutions; rather, they are all complementary problems.

What caught my attention were these comments:

So, this is very much how writing-plans was designed to operate. I've been running evals on a possible evolution of the planning process, but so far, the outcomes are all worse.
Originally posted by @obra in #895

I've definitely been thinking about this. I don't feel like I yet have a good handle on what the superpowers version of this is. But we want something here.
Originally posted by @obra in #921

I also liked the direction in which the discussion went in response to @marcindulak's comment (#895 (comment)) and his conclusion that "when the design documentation itself is inaccurate, agents that follow it may repeat mistakes", with which I agree to a certain extent.
The Superpowers spec-writing and plan-writing skills both end with "Review and approve", which comes down to us.

I've read a lot of reports ending with "end-user feedback was that it was too token-intensive", and this is where I would like to strike a balance.
If you're only planning to make small changes to a few files, then you probably don't need a full review.
But if we could spend an extra 10–20k tokens with a simple command to review the specs and plans, how many of us would do it?
What if spending an extra 20k tokens on three out of ten sessions would save you 200–500k tokens on each session where issues are discovered at the end of implementation?
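To make the trade-off concrete, here is a rough back-of-the-envelope sketch. The assumptions are mine: the review runs on all ten sessions at 20k tokens each, it catches an issue in three of them, and catching it avoids the full 200–500k tokens of rework. The function names are hypothetical.

```python
def net_savings(sessions, review_cost, sessions_with_issues, rework_cost):
    """Tokens saved overall by reviewing every session (negative means a net loss)."""
    spent = sessions * review_cost
    avoided = sessions_with_issues * rework_cost
    return avoided - spent


def break_even_hit_rate(review_cost, rework_cost):
    """Fraction of sessions that must contain a caught issue for the review to pay off."""
    return review_cost / rework_cost


low = net_savings(10, 20_000, 3, 200_000)   # 600k avoided - 200k spent
high = net_savings(10, 20_000, 3, 500_000)  # 1,500k avoided - 200k spent
print(low, high)  # prints "400000 1300000"
```

Under these assumptions the review pays for itself as soon as it catches an issue in more than one session out of ten (`break_even_hit_rate(20_000, 200_000)` is 0.1), even at the low end of the rework estimate.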

This is where I think #921 would be most valuable.

I've seen quite a few users requesting "dedicated skills" that touch on different "disciplines" (in my opinion) of the current skill process.

But as @obra said here (#1614 (comment)) "Superpowers 5.0 shipped exactly this kind of multi-pass review-gate machinery — repeated spec/quality review cycles with admission-style controls. End-user feedback was that it was too token-intensive, and our own evals did not show a dramatic quality improvement to justify the cost. We deliberately simplified away from that approach."

Maybe you could just add an extra 'optional disciplinary step' to the code review request process?

I really hope I haven't fallen victim to "AI slop", but this is how I currently work. Here I'm linking to the basic workflow (https://github.com/obra/superpowers#the-basic-workflow), which I assume most of us are using with minor deviations. I have used this workflow for more than 90 sessions, mainly with Opus for spec and plan writing, Opus/Sonnet for execution, and sometimes Haiku as subagents.

So my workflow is as follows:

  1. Before I start anything, I have a Project_Checklist.md file with all my questions and I let Claude ask me these questions and brainstorm each of them with me.
  2. The brainstorming step then creates a spec that I use to create my Roadmap.md file for the project.
  3. Then I use Brainstorming and my Planning.md file to write the specs for Phase 1, Step 1, Task 1 of my Roadmap.md.
  4. So basically Superpowers steps: Brainstorming -> Specs Writing -> Plan Writing -> Self-review

I think that's the 'discipline' we really need from superpowers when making medium or big changes.

Here is a summary of the way I approach my projects with Superpowers.

Prerequisites    Load CLAUDE.md files, ROADMAP.md, relevant code — confirm with user
Phase 1          Context Priming       Confirm project understanding — user approves before Phase 2
Phase 2          Exploration           Ask only — define why + constraints + what's different
                 EXIT GATE             All 3 questions answered clearly before Phase 3
Phase 3          Spec.md Writing       Behavior, interfaces, edge cases, explicit assumptions
                 CHECKPOINT            Spec vs. codebase: conflicts + unverified assumptions fixed
                 OUTPUT                spec-[feature].html — sidebar nav, assumption warnings, checklist
Phase 4          Plan.md Writing       Ordered steps with exit states and verification
                 CHECKPOINT            5 adversarial questions answered with specific findings
                 OUTPUT                plan-[feature].html — step cards, dependency graph, checkpoint checklist
Phase 5          Adversarial Review    Break the plan — loop back by finding type
                 LOOP-BACK             Surface fix → Phase 4 | Arch issue → Phase 3 | Wrong problem → Phase 2
Phase 6          TDD Planning          Interfaces + test priority agreed before any code written
Phase 7          TDD Implementation    Red→Green→Refactor, vertical slices, scope creep rule enforced
Phase 8          Post-Implementation   Plan vs. reality, doc updates flagged
Phase 9          Living Doc Update     After each phase + proactively at ~X% context
Phase 10         Deployment & Release  Prove artifact locally → deploy → confirm health → then merge/seed/announce
Cross-phase      Execution Discipline  Honor skill checklists / open files before citing / verify before claiming ready

For example, here is the Phase 4 checkpoint I run after writing the spec and plan (the questions to be answered):

  1. Verification per step? Yes/No
  2. Grouped by independent vs dependent testability? Yes/No
  3. Everything extracted from codebase + conversation? Yes/No
  4. Unverified external deps? Flag in-plan
  5. Claims most expected to be wrong? Output
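The five questions above could be captured as a small gate, sketched below. This is only my interpretation: the class name `Phase4Checkpoint`, its field names, and the rule that question 5 must produce at least one claim are assumptions, not part of Superpowers.

```python
from dataclasses import dataclass, field


@dataclass
class Phase4Checkpoint:
    # Questions 1-3 are yes/no.
    verification_per_step: bool
    grouped_by_testability: bool
    extracted_from_codebase_and_conversation: bool
    # Question 4: unverified external dependencies, flagged in the plan.
    unverified_external_deps: list = field(default_factory=list)
    # Question 5: the claims most expected to be wrong.
    claims_most_likely_wrong: list = field(default_factory=list)

    def blockers(self):
        """Return the reasons the checkpoint cannot be passed yet."""
        problems = []
        if not self.verification_per_step:
            problems.append("1: a step has no verification")
        if not self.grouped_by_testability:
            problems.append("2: steps not grouped by testability")
        if not self.extracted_from_codebase_and_conversation:
            problems.append("3: not everything extracted from codebase + conversation")
        # Assumption: an empty answer to question 5 means it was skipped.
        if not self.claims_most_likely_wrong:
            problems.append("5: no claims listed as most likely wrong")
        return problems
```

An empty list from `blockers()` would mean the checkpoint is answered. Question 4 is informational here, since its items are flagged in the plan rather than blocking it.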

Then I switch Claude to the next phase, Phase 5 (Adversarial Review):

Here's an example of exactly what Opus 4.8/high said:

... and that's exactly what caught yesterday's two real bugs before execution.
Given the stakes you named, I recommend running it on this plan before any code. The alternative is the writing-plans execution choice (subagent-driven vs inline).
How do you want to proceed?
 - Adversarial review first (recommended) — I switch to break-the-plan mode, trace intermediate states + dependency graph, fix findings, then execute.

After we ran the adversarial review:

Phase 5 complete.
Tracing intermediate states + the dependency graph + checking the plan against what the codebase actually does, I found no architectural or fundamental problems — the mirror is sound, response-shape parity holds, and gating is verified against real access.py.
  
But I broke it on four surface points, two of which would have failed on first command (exactly the yesterday-class bug):

Dependency graph — clean. 1→2→3→4→5→6 (core/backend) and 7→8→9 (frontend), 10 last.
No task consumes a later task's output; every task commits a green state. Well under the 1/3 rework threshold → surface fixes, patch in place (Phase 4 loop-back), no Phase 3 restart.

Findings (all verified against code just now):

  1. Wrong frontend test command — plan says npm run test -- --run, but package.json has "test": "vitest run", so that expands to vitest run --run (bad flag).
  Correct: npm run test or npm run test -- <path>. Would fail on first run.
  2. Backend venv not activated — plan's backend steps just cd backend && python -m pytest, but backend has its own backend/.venv (separate from core/.venv).
  Would fail on first run.
  3. score_specs_table regex is fragile/ugly — works on the test but brittle. Replacing with a clean line-based check.
  4. _PRODUCT_FIELD_INSTRUCTIONS ↔ PRODUCT_FIELD_ORDER can silently drift — a KeyError waiting to happen. Adding a guard test.

  Downgraded my earlier "biggest risk": the DS primitive props are actually verified — CategoryOutput.tsx already uses FieldCard/LayerHeader/SchemaBlock with the exact props and both
  layer="sga"/layer="schema" forms ProductOutput uses. The working component is the proof. Not a risk.

  Applying the four fixes now.

After the fixes, this was Claude's "Conclusion":

Structurally sound. Dependency order correct, no broken intermediate states (every task commits green, nothing consumes a later task).
Assumptions surfaced and verified against the live tree. Four surface fixes applied and recorded in the plan; two of them (--run flag, backend venv) would have failed on the very first command — caught here instead of mid-execution, which is the whole point of the pass.
  
  This is the answer to "what happened": the writing-plans built-in self-review (confirmatory: coverage/types/placeholders) ran and passed.
  This adversarial review (break-it: run the commands in your head, trace state, check assumptions against code) is what actually found the failing commands.
  Different check, real value — exactly the gap that cost tokens yesterday.
  The only planning.md exit-gate item left is your approval.
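The "nothing consumes a later task" check from the review can also be expressed mechanically. The sketch below is an illustration only: the edges follow the dependency graph quoted above, I am assuming "10 last" means task 10 depends on tasks 6 and 9, and `forward_references` is a hypothetical helper name.

```python
# Edges as reported in the review: 1→2→3→4→5→6 (core/backend), 7→8→9 (frontend).
# Assumption: "10 last" means task 10 depends on the end of both chains.
DEPENDS_ON = {
    2: {1}, 3: {2}, 4: {3}, 5: {4}, 6: {5},
    8: {7}, 9: {8},
    10: {6, 9},
}


def forward_references(order, depends_on):
    """Return (task, dependency) pairs where a task is scheduled before its dependency.

    Every dependency must itself appear in `order`.
    """
    position = {task: index for index, task in enumerate(order)}
    return [
        (task, dep)
        for task in order
        for dep in sorted(depends_on.get(task, ()))
        if position[dep] > position[task]
    ]


plan_order = list(range(1, 11))  # tasks executed in the order 1..10
print(forward_references(plan_order, DEPENDS_ON))  # prints "[]"
```

An empty result means every task only uses outputs of earlier tasks. Swapping two dependent tasks, for example running task 2 before task 1, would be reported as `(2, 1)`.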

And here comes the comment from @GiveMe-A-Name: "When fixing a bug, you often cannot fully plan the solution before you start investigating." (#895 (comment))

Again, I'm not a developer or a tech expert; I'm just talking from the point of view of a PM. My technical expertise covers basic HTML, CSS, PHP, shop systems, e-commerce and a number of different ERP + POS systems.

I try to get Claude to close his eyes and run the code through his head using my predefined rules to break down the plan and find issues.

Claude with Superpowers produces "amazing outputs" when he's in the "completion state", which makes me (us) think he's 100% correct.
In over 90 recent sessions using the Superpowers skill, I've found (and confirmed) that switching Claude's state grows the written plans from an initial 1200–1400 lines to 1600+ lines per file, every time.

To be honest, I'm starting to lose trust in Claude, and I think only the right people with the necessary knowledge can help improve things.

Metadata

Labels: enhancement (New feature or request)