A Humanbased project, built with crosscheck.
Stop merging AI slop. Crosscheck turns agent-written PRs into merge-ready patches with a configurable Review -> Fix -> Recheck pipeline.
AI coding agents are fast, but they can still ship regressions, half-finished fixes, brittle edge cases, and "early victory" PRs that look done before they are solid. Crosscheck adds an independent safety loop: let one agent write the patch, let another review it, send the findings back to the author, then recheck the result before merge.
Define the workflow in workflow.yml: review-only, review + fix, or the full review + fix + recheck cycle. Each step runs through the claude or codex CLI against your existing subscriptions — no API keys, no per-review cost.
Built by Humanbased as a showcase of practical engineering craft for the agentic coding era.
Read the field report: What 295 Agentic PRs Taught Us About Code Review. It explains why Humanbased built Crosscheck after analyzing 295 agentic PRs and retained Crosscheck logs from real internal workflows.
- Combats AI slop before merge — catches code regressions, incomplete fixes, hallucinated assumptions, and brittle "looks green" patches.
- Uses independent reviewers — route Claude-authored PRs to Codex, Codex-authored PRs to Claude, or run a single-vendor loop when that is what you have.
- Closes the loop — review findings can become a fix step and then a recheck step, so the PR moves toward real merge readiness instead of another comment thread.
- Runs on your machine or server — no new hosted code review service, no per-review API bill, no extra infrastructure to trust.
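The reviewer routing described in the list above can be pictured with a small sketch. This is not Crosscheck's implementation: the function name `pick_reviewer` and the fallback to the author's own vendor when the other vendor is unavailable are assumptions made for illustration.

```python
def pick_reviewer(author: str, mode: str, enabled: set[str]) -> str:
    """Pick which vendor reviews a PR written by `author` ("claude" or "codex").

    Hypothetical helper: the fallback to the author's own vendor is an assumption.
    """
    other = "codex" if author == "claude" else "claude"
    # Cross-vendor mode prefers the vendor that did not write the patch.
    if mode == "cross-vendor" and other in enabled:
        return other
    # Single-vendor loop, or the only vendor you have authenticated.
    if author in enabled:
        return author
    raise ValueError(f"no enabled reviewer available for author {author!r}")


print(pick_reviewer("claude", "cross-vendor", {"claude", "codex"}))
```

In this sketch a Claude-authored PR goes to Codex whenever both vendors are enabled, and the same call degrades to a single-vendor loop when only one is.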
Start with one low-risk PR before turning on continuous watch mode. You only need GitHub CLI plus one authenticated reviewer CLI.
# 1. Install crosscheck
npm install -g @humanbased/crosscheck
# 2. Authenticate GitHub
brew install gh && gh auth login
# 3. Authenticate one reviewer
npm install -g @openai/codex && codex login --device-auth
# or:
npm install -g @anthropic-ai/claude-code && claude
# 4. Check your setup
crosscheck status
# 5. Review the public fixture PR
crosscheck review https://github.com/humanbased-ai/crosscheck-proof-fixture/pull/1 --reviewer codex
This fixture PR intentionally contains a realistic agentic-code regression, so you can see whether Crosscheck produces a useful review before pointing it at your own repo. Use --reviewer claude if Claude Code is the authenticated reviewer. After the fixture review works, swap in one low-risk PR from your repo, then run crosscheck onboard to configure repos, workflow mode, and continuous monitoring.
# 1. Install crosscheck and the agent CLIs
npm install -g @humanbased/crosscheck
npm install -g @anthropic-ai/claude-code && claude # Claude Pro/Max subscription
npm install -g @openai/codex && codex login --device-auth # ChatGPT Plus/Pro subscription
brew install gh && gh auth login # GitHub CLI
# 2. Guided setup — repos, review mode, workflow pipeline
crosscheck onboard
# 3. Start watching
crosscheck watch # personal laptop
crosscheck serve # always-on team server
crosscheck onboard # guided setup: pick repos, mode, and pipeline
crosscheck watch # personal use — tunnel + webhook + listening on your laptop
crosscheck serve # team use — fixed port, register webhook once
crosscheck review <pr-url> # one-shot review of a specific PR
crosscheck run <pr-url> # run the full workflow: review → (fix → recheck) × max_rounds
crosscheck scan # show open PR workflow state across monitored repos
crosscheck detect-step <pr-url> # explain the next workflow step for one PR
crosscheck kickass # advance stale PRs from an interactive operator queue
crosscheck init # check prerequisites, write starter config
crosscheck status # auth state, config summary, CLI versions
Operator queue (scan + kickass)
crosscheck scan tracks two independent dimensions per PR:

| Workflow stage (reviewState) | Meaning | Next action |
|---|---|---|
| NEEDS_REVIEW | No crosscheck review for current HEAD | review |
| NEEDS_FIX | Reviewed, fix requested | fix |
| NEEDS_RECHECK | Fix committed, recheck pending | recheck |
| APPROVED | Reviewed and approved | merge |

| Verdict (verdict) | Meaning |
|---|---|
| UNREVIEWED | No review found |
| APPROVE | AI approved |
| NEEDS_WORK | AI requested changes |
| BLOCK | AI hard-blocked merge |

BLOCK and NEEDS_WORK both map to NEEDS_FIX stage — same next action, but the verdict field preserves severity so operators can prioritise.
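To make the two dimensions concrete, here is a minimal sketch of how an operator script might order a queue by severity and look up the next action. The `reviewState` and `verdict` keys come from the tables above; the `url` key, the sample rows, and the numeric severity ranking are assumptions for illustration, not the shape of Crosscheck's real output.

```python
# Next action per workflow stage, taken from the stage table above.
NEXT_ACTION = {
    "NEEDS_REVIEW": "review",
    "NEEDS_FIX": "fix",
    "NEEDS_RECHECK": "recheck",
    "APPROVED": "merge",
}

# Assumed ranking: lower number means more urgent.
SEVERITY = {"BLOCK": 0, "NEEDS_WORK": 1, "UNREVIEWED": 2, "APPROVE": 3}


def prioritise(prs: list[dict]) -> list[dict]:
    """Sort PR rows so hard blocks come first; ties keep their input order."""
    return sorted(prs, key=lambda pr: SEVERITY[pr["verdict"]])


queue = [
    {"url": "pr-a", "reviewState": "NEEDS_FIX", "verdict": "NEEDS_WORK"},
    {"url": "pr-b", "reviewState": "NEEDS_FIX", "verdict": "BLOCK"},
    {"url": "pr-c", "reviewState": "APPROVED", "verdict": "APPROVE"},
]
for pr in prioritise(queue):
    print(pr["url"], NEXT_ACTION[pr["reviewState"]])
```

Both NEEDS_FIX rows get the same next action, but the BLOCK row sorts ahead of the NEEDS_WORK row.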
How workflow steps are counted
Crosscheck reconstructs PR workflow state from visible artifacts:

| Evidence | Counts as |
|---|---|
| Review or recheck comment with `<!-- crosscheck: ... verdict=... -->` | completed review / recheck step |
| Fix or conflict-resolve comment, such as `<!-- crosscheck: fix_applied ... -->` | completed fix / conflict-resolve step |
| PR commit trailer, such as `Crosscheck-Step: fix` | completed step declared by that trailer |

Commit trailers are accepted as operator-declared workflow state. In practice, a PR author may command Claude, Codex, or another agent to apply a fix outside a standalone Crosscheck post; if the resulting PR commit carries Crosscheck-Step: fix, Crosscheck counts it as fix evidence.
That evidence only advances the next step to recheck when the fix commit is the current PR HEAD. If another commit lands after the fix evidence, Crosscheck starts a fresh review round so the newer code is reviewed normally. This prevents an old fix trailer from marking later changes as ready for recheck.
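The HEAD rule above can be sketched as a small decision function. This is an illustration under stated assumptions, not Crosscheck's code: the `Evidence` record, its `sha` field, and the function name `next_step` are hypothetical, and conflict-resolve evidence is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    kind: str                      # "review", "recheck", or "fix"
    sha: str                       # commit the evidence refers to (assumed field)
    verdict: Optional[str] = None  # set for review / recheck evidence


def next_step(evidence: list[Evidence], head_sha: str) -> str:
    """Return the next workflow step given evidence in chronological order."""
    if not evidence:
        return "review"
    last = evidence[-1]
    # Evidence about an older commit never advances the workflow:
    # newer code starts a fresh review round.
    if last.sha != head_sha:
        return "review"
    if last.kind == "fix":
        return "recheck"
    return "merge" if last.verdict == "APPROVE" else "fix"


history = [Evidence("review", "aaa111", "NEEDS_WORK"), Evidence("fix", "bbb222")]
print(next_step(history, head_sha="bbb222"))  # fix commit is HEAD
print(next_step(history, head_sha="ccc333"))  # another commit landed afterwards
```

The second call shows why an old fix trailer cannot mark later changes as ready for recheck: once HEAD moves past the fix commit, the sketch falls back to a full review.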
crosscheck scan [--tidy] [--stale-after <duration>] [--force] [--json]
crosscheck kickass [--dry-run] [--stale-after <duration>] [--force]
crosscheck review --reviewer, crosscheck run --reviewer, crosscheck run --fixer, and crosscheck run --vendor accept vendor aliases:
- Claude: claude, claude-code, cc, anthropic
- Codex: codex, openai
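If you wrap these commands in your own tooling, the alias list above amounts to a simple lookup table. The sketch below is hypothetical: the name `normalize_vendor` is invented here, and trimming whitespace and ignoring case are assumptions rather than documented CLI behaviour.

```python
# Alias table copied from the list above.
VENDOR_ALIASES = {
    "claude": "claude",
    "claude-code": "claude",
    "cc": "claude",
    "anthropic": "claude",
    "codex": "codex",
    "openai": "codex",
}


def normalize_vendor(name: str) -> str:
    """Map a vendor alias to its canonical name ("claude" or "codex")."""
    key = name.strip().lower()  # assumption: aliases are matched case-insensitively
    try:
        return VENDOR_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown vendor alias: {name!r}") from None


print(normalize_vendor("cc"), normalize_vendor("openai"))
```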
Continuous improvement (experimental)
crosscheck diagnose # surface failure patterns from review logs
crosscheck optimize [--apply] # rewrite reviewer instructions based on diagnose output
crosscheck impact [--money] # time saved, issues caught, code quality trends
crosscheck issue # draft and file a bug report from recent error logs
Interactive setup wizard. Picks repos/orgs to monitor, selects single-vendor or cross-vendor mode, configures the review pipeline, and writes ~/.crosscheck/config.yml and workflow.yml.
crosscheck onboard # guided setup
crosscheck onboard --personal # skip persona prompt, go straight to personal mode
crosscheck onboard --team # skip persona prompt, go straight to team mode
crosscheck onboard -y # accept all defaults non-interactively
Personal mode. Starts an SSH tunnel (localhost.run), registers GitHub webhooks, and listens for PR events. Everything self-cleans on Ctrl+C.
crosscheck watch
crosscheck watch --no-backtrace # skip startup scan for unreviewed open PRs
crosscheck watch --reconfigure # re-run deployment setup before starting
Team mode. Binds to a fixed port: register the webhook once and it covers the whole team. Designed for a Mac mini or home server.
crosscheck serve
crosscheck serve --no-backtrace # skip startup scan
crosscheck serve --personal # personal scope this session only
crosscheck serve --reconfigure # re-run deployment setup
One-shot review of a single PR. Clones, checks out, reviews, and posts the comment.
crosscheck review https://github.com/org/repo/pull/42
crosscheck review <pr-url> --reviewer claude # force Claude regardless of detection
crosscheck review <pr-url> --reviewer codex # force Codex regardless of detection
crosscheck review <pr-url> --reviewer cc # alias for Claude
crosscheck review <pr-url> --reviewer openai # alias for Codex
Runs the full configured workflow against one PR: review → (fix → recheck) × max_rounds. Same logic as watch/serve, but triggered manually.
crosscheck run <pr-url>
crosscheck run <pr-url> --steps review # only the review step
crosscheck run <pr-url> --steps fix,recheck # skip initial review
crosscheck run <pr-url> --reviewer claude # force review/recheck agent
crosscheck run <pr-url> --fixer claude # force fix agent
crosscheck run <pr-url> --vendor claude # force review/recheck/fix agent
crosscheck run <pr-url> --dry-run # review without posting or fixing
crosscheck run <pr-url> --crazy # 🔥🔥 loop until APPROVE
crosscheck run <pr-url> --half-crazy # 🔥 loop until not BLOCK
crosscheck run <pr-url> --timeout 10m # custom reviewer timeout
Scans every open PR in the configured monitor scope and reports where each one is in the crosscheck workflow. Results are cached for 60 seconds.
States: NEEDS_REVIEW · NEEDS_FIX · BLOCK · NEEDS_RECHECK · APPROVE
crosscheck scan # all open PRs, grouped stale/not-stale
crosscheck scan --tidy # stale actionable rows only
crosscheck scan --stale-after 4h # custom staleness threshold (default 24h)
crosscheck scan --force # bypass cache
crosscheck scan --json # machine-readable output
Explains the workflow history for one PR and prints the next step Crosscheck would run. Use this when a PR has mixed evidence from comments, Crosscheck commits, or ad hoc agent commits with Crosscheck-Step trailers.
crosscheck detect-step <pr-url>
crosscheck detect-step <pr-url> --json
Selects stale PRs from the operator queue and advances them: it runs scan first, presents a multi-select picker, shows a preflight summary, then executes after confirmation.
crosscheck kickass # interactive operator queue
crosscheck kickass --dry-run # preflight only — no mutations
crosscheck kickass --stale-after 2h # tighter staleness threshold
crosscheck kickass --force # bypass scan cache before picking
crosscheck kickass --crazy # 🔥🔥 auto loop until APPROVE
crosscheck kickass --half-crazy # 🔥 auto loop until not BLOCK
Actions: NEEDS_REVIEW → CR · NEEDS_FIX/BLOCK → Fix · NEEDS_RECHECK → Recheck · APPROVE → Merge
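The `--stale-after` values used above (`2h`, `4h`, default `24h`) suggest a simple threshold check. The sketch below is an assumption-laden illustration: the accepted unit letters, the helper names `parse_duration` and `is_stale`, and the idea that staleness is measured from a PR's last activity are all hypothetical, not Crosscheck's documented behaviour.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit letters; the README only shows "m" and "h".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as "10m", "4h", or "24h" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def is_stale(last_activity: datetime, now: datetime, stale_after: str = "24h") -> bool:
    """A PR counts as stale once it has been idle for at least `stale_after`."""
    return now - last_activity >= parse_duration(stale_after)


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
idle_since = now - timedelta(hours=5)
print(is_stale(idle_since, now), is_stale(idle_since, now, stale_after="4h"))
```

A PR idle for five hours is not stale under the default threshold but is picked up once the threshold is tightened to `4h`.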
kickass + watch combo
For the best recovery experience when a batch of PRs is stuck (timed out, stopped before watch was running), run both commands together. Each plays a distinct role:
- kickass kicks each stuck PR one step at a time: it uses detect-step to read live PR history and dispatches only the next needed step (review, fix, or recheck).
- watch owns all continuation: it listens for the webhooks each completed step produces and runs the full remaining pipeline from there.
crosscheck kickass
└─ ck run <url> --trigger kickass (one step; detect-step finds where to start)
└─ detect-step → "review" run review only → posts comment
└─ detect-step → "fix" run fix only → pushes commit
└─ detect-step → "recheck" run recheck only → posts verdict
crosscheck watch
├─ issue_comment (type=review) → pick up fix step automatically
└─ synchronize (fix commit) → pick up recheck step automatically
Note: crosscheck run <pr-url> invoked directly runs the full remaining pipeline from the detected starting step. The one-step behaviour above applies only when kickass dispatches it with --trigger kickass.
Start watch first, then run kickass in a second terminal:
# terminal 1
crosscheck watch
# terminal 2
crosscheck scan --force # refresh PR state
crosscheck kickass
How the review→fix bridge works: after kickass posts a review comment, GitHub fires an issue_comment webhook (not a pull_request event). watch subscribes to issue_comment and, when it sees a crosscheck type=review annotation on an open PR, fetches the current PR head and runs the fix step automatically, with no new commit required to wake it up. (Introduced in #193.)
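A rough sketch of that dispatch decision follows. The payload fields (`issue.state`, `issue.pull_request`, `comment.body`, `action`) are standard GitHub webhook fields, but everything Crosscheck-specific is assumed: the exact marker layout, the function name `step_for_event`, and the `head_is_fix_commit` flag that stands in for however watch recognises a fix commit.

```python
import re
from typing import Optional

# Assumed marker shape, modelled on the `<!-- crosscheck: ... -->` comments
# described earlier; the real annotation layout may differ.
MARKER = re.compile(r"<!--\s*crosscheck:(.*?)-->", re.DOTALL)


def step_for_event(event: str, payload: dict, head_is_fix_commit: bool = False) -> Optional[str]:
    """Decide which step a webhook should wake up, or None to ignore it."""
    if event == "issue_comment":
        issue = payload.get("issue", {})
        # GitHub marks PR comments by including a "pull_request" key on the issue.
        is_open_pr = "pull_request" in issue and issue.get("state") == "open"
        marker = MARKER.search(payload.get("comment", {}).get("body", ""))
        if is_open_pr and marker and "type=review" in marker.group(1):
            return "fix"
        return None
    if event == "pull_request" and payload.get("action") == "synchronize":
        # A pushed fix commit continues to recheck; any other push is new code.
        return "recheck" if head_is_fix_commit else "review"
    return None


comment_event = {
    "issue": {"state": "open", "pull_request": {}},
    "comment": {"body": "Findings...\n<!-- crosscheck: type=review verdict=NEEDS_WORK -->"},
}
print(step_for_event("issue_comment", comment_event))
```

Whether the fix step then actually runs still depends on the `when` conditions in your workflow; the sketch covers only the wake-up decision.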
Autonomous loop modes
--crazy and --half-crazy turn run and kickass into autonomous fix→recheck loops that keep going until the verdict improves — no manual re-runs needed.

| Flag | Stops when | Max rounds | Timeout |
|---|---|---|---|
| --crazy 🔥🔥 | verdict = APPROVE | ∞ | none |
| --half-crazy 🔥 | verdict ≠ BLOCK | ∞ | none |

Both flags disable all reviewer subprocess timeout constraints — long fixes on large PRs won't be cut short. Use --timeout <duration> (e.g. --timeout 10m) without these flags to set a custom cap.
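The stop conditions in the table can be read as the following loop. It is a sketch only: `run_round` is a hypothetical callable standing in for one fix → recheck round that returns the new verdict, and the real loop also handles posting, commits, and errors.

```python
from typing import Callable


def should_stop(verdict: str, mode: str) -> bool:
    """Stop rule per mode, as listed in the table above."""
    if mode == "crazy":
        return verdict == "APPROVE"
    if mode == "half-crazy":
        return verdict != "BLOCK"
    raise ValueError(f"unknown loop mode: {mode!r}")


def autonomous_loop(run_round: Callable[[], str], mode: str) -> tuple[str, int]:
    """Repeat rounds with no round cap until the stop rule is met."""
    rounds = 0
    while True:
        verdict = run_round()  # hypothetical: one fix -> recheck round
        rounds += 1
        if should_stop(verdict, mode):
            return verdict, rounds


verdicts = iter(["BLOCK", "NEEDS_WORK", "APPROVE"])
print(autonomous_loop(lambda: next(verdicts), "crazy"))
```

With the same sequence of verdicts, `half-crazy` would stop one round earlier, at NEEDS_WORK, because it only needs the hard block cleared.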
# Run full workflow and keep looping until approved
crosscheck run <pr-url> --crazy
# Advance every stale PR until it's no longer blocked
crosscheck kickass --half-crazy
# Custom timeout without looping
crosscheck run <pr-url> --timeout 10m
Crosscheck uses ~/.crosscheck/config.yml by default. If that file exists, it wins over ./crosscheck.config.yml unless you pass --config ./crosscheck.config.yml.
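One way to picture that precedence is the lookup below. It is a sketch, not Crosscheck's loader: the function name `resolve_config` is invented, and falling back to ./crosscheck.config.yml when the home file is missing is an assumption inferred from the sentence above.

```python
from pathlib import Path
from typing import Optional


def resolve_config(explicit: Optional[str], home: Path, cwd: Path) -> Optional[Path]:
    """Return the config file that would win, or None if none is found."""
    # 1. An explicit --config path always wins.
    if explicit:
        return Path(explicit)
    # 2. The home config wins whenever it exists.
    home_cfg = home / ".crosscheck" / "config.yml"
    if home_cfg.exists():
        return home_cfg
    # 3. Assumed fallback: the project-local config.
    local_cfg = cwd / "crosscheck.config.yml"
    return local_cfg if local_cfg.exists() else None


print(resolve_config(None, Path.home(), Path.cwd()))
```

The practical consequence is that a project-local file is silently ignored on a machine that already has a home config, unless you pass `--config` explicitly.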
# crosscheck.config.yml
quality:
  tier: balanced # fast | balanced | thorough

| Tier | Claude model | Codex model | Latency |
|---|---|---|---|
| fast | Haiku | default | ~10s |
| balanced | Sonnet (default) | default | ~30s |
| thorough | Opus | default | ~60s |

steps:
  - name: review
    type: review
    reviewer: auto # auto | claude | codex | origin
  - name: fix
    type: fix
    reviewer: origin
    when: review.verdict != 'APPROVE'
  - name: recheck
    type: recheck
    reviewer: auto
    when: fix.applied_count > 0
# ~/.crosscheck/config.yml
orgs:
  - your-org
routing:
  allowed_authors:
    - your-github-login
  mode: cross-vendor # cross-vendor | single-vendor
vendors:
  claude:
    enabled: true
  codex:
    enabled: true
quality:
  tier: balanced
clone_protocol: ssh # ssh (default) | https
Full reference: get-started.md

| Requirement | Minimum |
|---|---|
| Node.js | 18+ |
| Claude Code CLI | latest — npm install -g @anthropic-ai/claude-code |
| Codex CLI | latest — npm install -g @openai/codex |
| GitHub CLI | 2.65+ — brew install gh |
GITHUB_TOKEN is derived automatically from gh auth login. No manual export needed.

| Document | Description |
|---|---|
| get-started.md | Full setup guide: prerequisites, all flags, complete config reference, FAQ |
| What 295 Agentic PRs Taught Us About Code Review | Humanbased field report on agentic PR quality, review routing, and why Crosscheck exists |
| docs/fixture-pr.md | Safe public fixture PR for the first Crosscheck review |
| crosscheck.config.example.yml | Annotated config with every option |
| CHANGELOG.md | Release notes |
Issues and PRs welcome at github.com/humanbased-ai/crosscheck.
MIT — Copyright (c) 2025–2026 Humanbased PTE LTD.

