Skip to content

Latest commit

 

History

History
451 lines (318 loc) · 17.6 KB

File metadata and controls

451 lines (318 loc) · 17.6 KB

crosscheck

A Humanbased project, built with crosscheck.

crosscheck

crosscheck watch — live pipeline view

Your agents ship fast. Crosscheck makes sure they ship right.

AI coding agents create PRs faster than review habits can absorb. The failure mode isn't broken builds — it's early victory: patches that pass CI, look complete, and still hide regressions, brittle edge cases, or half-finished fixes.

Crosscheck adds an independent Review → Fix → Recheck loop. One agent writes the patch. Another reviews it. Findings go back to the author to repair. The result gets rechecked before merge. The PR moves toward genuinely merge-ready — not just "looks green."

No new hosted service. No per-review API bill. Crosscheck runs through the claude and codex CLIs you already have — your existing subscriptions, your machine or server.

Built by Humanbased. Read the field report: What 295 Agentic PRs Taught Us About Code Review — 295 agentic PRs analyzed, real Crosscheck logs included.

Why crosscheck?

Agent velocity without lowering the merge bar.

  • Independent eyes, not self-review — route Claude-authored PRs to Codex and vice versa. Self-review is exactly where early-victory failures hide.
  • Review → Fix → Recheck, not just comments — findings return to the author agent for repair; a clean recheck follows before merge. PRs move forward, not sideways.
  • No new vendor — runs through the claude and codex CLIs you already pay for. No per-review bill, no extra trust surface.
  • Configurable for any team size — review-only mode, review + fix, or the full loop. Personal laptop via watch, always-on server via serve, one-shot via review.

Who uses crosscheck

Persona Problem How crosscheck helps
Solo agentic builder Same agent that wrote the code may self-approve incomplete work Independent reviewer from a different vendor, on your machine
Technical founder AI PRs look done before delivering stable value Closes the loop: review finding → agent fix → clean recheck
Engineering lead Agent use is hard to supervise or standardize Configurable workflow, review-only mode, and a visible PR audit trail
OSS maintainer Review bandwidth is scarce; comments must be actionable One-shot crosscheck review posts concrete findings directly on the PR

Common workflows

# Catch regressions before merging a solo PR
crosscheck run <pr-url>

# Continuous review on every incoming agent PR
crosscheck serve            # always-on team server

# Review stale PRs from a queue
crosscheck scan && crosscheck kickass

# One-shot review of a specific PR
crosscheck review <pr-url>

# Loop until the agent produces an approved patch
crosscheck run <pr-url> --crazy

Quick start

First useful review in 10 minutes

Start with one low-risk PR before turning on continuous watch mode. You only need GitHub CLI plus one authenticated reviewer CLI.

# 1. Install crosscheck
npm install -g @humanbased/crosscheck

# 2. Authenticate GitHub
brew install gh && gh auth login

# 3. Authenticate one reviewer
npm install -g @openai/codex && codex login --device-auth
# or:
npm install -g @anthropic-ai/claude-code && claude

# 4. Check your setup
crosscheck status

# 5. Review the public fixture PR
crosscheck review https://github.com/humanbased-ai/crosscheck-proof-fixture/pull/1 --reviewer codex

This fixture PR intentionally contains a realistic agentic-code regression, so you can see whether Crosscheck produces a useful review before pointing it at your own repo. Use --reviewer claude if Claude Code is the authenticated reviewer. After the fixture review works, swap in one low-risk PR from your repo, then run crosscheck onboard to configure repos, workflow mode, and continuous monitoring.

Continuous mode

# 1. Install crosscheck and the agent CLIs
npm install -g @humanbased/crosscheck
npm install -g @anthropic-ai/claude-code && claude        # Claude Pro/Max subscription
npm install -g @openai/codex && codex login --device-auth # ChatGPT Plus/Pro subscription
brew install gh && gh auth login                          # GitHub CLI

# 2. Guided setup — repos, review mode, workflow pipeline
crosscheck onboard

# 3. Start watching
crosscheck watch        # personal laptop
crosscheck serve        # always-on team server

Commands

crosscheck onboard                  # guided setup — pick repos, mode, and pipeline
crosscheck watch                    # personal use — tunnel + webhook + listening on your laptop
crosscheck serve                    # team use — fixed port, register webhook once
crosscheck review <pr-url>          # one-shot review of a specific PR
crosscheck run <pr-url>             # run the full workflow: review → (fix → recheck) × max_rounds
crosscheck scan                     # show open PR workflow state across monitored repos
crosscheck detect-step <pr-url>     # explain the next workflow step for one PR
crosscheck kickass                  # advance stale PRs from an interactive operator queue
crosscheck init                     # check prerequisites, write starter config
crosscheck status                   # auth state, config summary, CLI versions

Operator queue (scan + kickass)

crosscheck scan tracks two independent dimensions per PR:

Workflow stage (reviewState) Meaning Next action
NEEDS_REVIEW No crosscheck review for current HEAD review
NEEDS_FIX Reviewed — fix requested fix
NEEDS_RECHECK Fix committed, recheck pending recheck
APPROVED Reviewed and approved merge
Verdict (verdict) Meaning
UNREVIEWED No review found
APPROVE AI approved
NEEDS_WORK AI requested changes
BLOCK AI hard-blocked merge

BLOCK and NEEDS_WORK both map to NEEDS_FIX stage — same next action, but the verdict field preserves severity so operators can prioritise.

How workflow steps are counted

Crosscheck reconstructs PR workflow state from visible artifacts:

Evidence Counts as
Review or recheck comment with <!-- crosscheck: ... verdict=... --> completed review / recheck step
Fix or conflict-resolve comment, such as <!-- crosscheck: fix_applied ... --> completed fix / conflict-resolve step
PR commit trailer, such as Crosscheck-Step: fix completed step declared by that trailer

Commit trailers are accepted as operator-declared workflow state. In practice, a PR author may command Claude, Codex, or another agent to apply a fix outside a standalone Crosscheck post; if the resulting PR commit carries Crosscheck-Step: fix, Crosscheck counts it as fix evidence.

That evidence only advances the next step to recheck when the fix commit is the current PR HEAD. If another commit lands after the fix evidence, Crosscheck starts a fresh review round so the newer code is reviewed normally. This prevents an old fix trailer from marking later changes as ready for recheck.

crosscheck scan [--tidy] [--stale-after <duration>] [--force] [--json]
crosscheck kickass [--dry-run] [--stale-after <duration>] [--force]

crosscheck review --reviewer, crosscheck run --reviewer, crosscheck run --fixer, and crosscheck run --vendor accept vendor aliases:

  • Claude: claude, claude-code, cc, anthropic
  • Codex: codex, openai

Continuous improvement (experimental)

crosscheck diagnose                 # surface failure patterns from review logs
crosscheck optimize [--apply]       # rewrite reviewer instructions based on diagnose output
crosscheck impact [--money]         # time saved, issues caught, code quality trends
crosscheck issue                    # draft and file a bug report from recent error logs

crosscheck onboard

Interactive setup wizard. Picks repos/orgs to monitor, selects single-vendor or cross-vendor mode, configures the review pipeline, and writes ~/.crosscheck/config.yml and workflow.yml.

crosscheck onboard              # guided setup
crosscheck onboard --personal   # skip persona prompt, go straight to personal mode
crosscheck onboard --team       # skip persona prompt, go straight to team mode
crosscheck onboard -y           # accept all defaults non-interactively

crosscheck watch

Personal mode. Starts an SSH tunnel (localhost.run), registers GitHub webhooks, and listens for PR events. Everything self-cleans on Ctrl+C.

crosscheck watch
crosscheck watch --no-backtrace       # skip startup scan for unreviewed open PRs
crosscheck watch --reconfigure        # re-run deployment setup before starting

crosscheck serve

Team mode. Binds to a fixed port — register the webhook once, cover the whole team. Designed for a mac-mini or home server.

crosscheck serve
crosscheck serve --no-backtrace       # skip startup scan
crosscheck serve --personal           # personal scope this session only
crosscheck serve --reconfigure        # re-run deployment setup

crosscheck review <pr-url>

One-shot review of a single PR. Clones, checks out, reviews, and posts the comment.

crosscheck review https://github.com/org/repo/pull/42
crosscheck review <pr-url> --reviewer claude    # force Claude regardless of detection
crosscheck review <pr-url> --reviewer codex     # force Codex regardless of detection
crosscheck review <pr-url> --reviewer cc        # alias for Claude
crosscheck review <pr-url> --reviewer openai    # alias for Codex

crosscheck run <pr-url>

Runs the full configured workflow against one PR: review → (fix → recheck) × max_rounds. Loops autonomously through fix→recheck cycles up to the max_rounds value configured in workflow.yml (default: 1). Use --crazy or --half-crazy to loop until approved or unblocked, ignoring max_rounds.

crosscheck run <pr-url>
crosscheck run <pr-url> --steps review           # only the review step
crosscheck run <pr-url> --steps fix,recheck      # skip initial review
crosscheck run <pr-url> --reviewer claude        # force review/recheck agent
crosscheck run <pr-url> --fixer claude           # force fix agent
crosscheck run <pr-url> --vendor claude          # force review/recheck/fix agent
crosscheck run <pr-url> --dry-run                # review without posting or fixing
crosscheck run <pr-url> --crazy                  # 🔥🔥 loop until APPROVE
crosscheck run <pr-url> --half-crazy             # 🔥  loop until not BLOCK
crosscheck run <pr-url> --timeout 10m            # custom reviewer timeout

crosscheck scan

Scans every open PR in the configured monitor scope and reports where each one is in the crosscheck workflow. Results are cached for 60 seconds.

States: NEEDS_REVIEW · NEEDS_FIX · BLOCK · NEEDS_RECHECK · APPROVE

crosscheck scan                          # all open PRs, grouped stale/not-stale
crosscheck scan --tidy                   # stale actionable rows only
crosscheck scan --stale-after 4h         # custom staleness threshold (default 24h)
crosscheck scan --force                  # bypass cache
crosscheck scan --json                   # machine-readable output

crosscheck detect-step

Explains the workflow history for one PR and prints the next step Crosscheck would run. Use this when a PR has mixed evidence from comments, Crosscheck commits, or ad hoc agent commits with Crosscheck-Step trailers.

crosscheck detect-step <pr-url>
crosscheck detect-step <pr-url> --json

crosscheck kickass

Selects stale PRs from the operator queue and advances them — runs scan first, presents a multi-select picker, shows a preflight summary, then executes after confirmation.

crosscheck kickass                       # interactive operator queue
crosscheck kickass --dry-run             # preflight only — no mutations
crosscheck kickass --stale-after 2h      # tighter staleness threshold
crosscheck kickass --force               # bypass scan cache before picking
crosscheck kickass --crazy               # 🔥🔥 auto loop until APPROVE
crosscheck kickass --half-crazy          # 🔥  auto loop until not BLOCK

Actions: NEEDS_REVIEW → CR · NEEDS_FIX/BLOCK → Fix · NEEDS_RECHECK → Recheck · APPROVE → Merge

kickass + watch combo

For the best recovery experience when a batch of PRs is stuck (timed out, stopped before watch was running), run both commands together. Each plays a distinct role:

  • kickass kicks each stuck PR one step at a time — it uses detect-step to read live PR history and dispatches only the next needed step (review, fix, or recheck).
  • watch owns all continuation — it listens for the webhooks each completed step produces and runs the full remaining pipeline from there.
crosscheck kickass
  └─ ck run <url>  --trigger kickass  (one step; detect-step finds where to start)
       └─ detect-step → "review"      run review only → posts comment
       └─ detect-step → "fix"         run fix only → pushes commit
       └─ detect-step → "recheck"     run recheck only → posts verdict

crosscheck watch
  ├─ issue_comment (type=review) → pick up fix step automatically
  └─ synchronize   (fix commit)  → pick up recheck step automatically

Note: crosscheck run <pr-url> invoked directly runs the full remaining pipeline from the detected starting step. The one-step behaviour above applies only when kickass dispatches it with --trigger kickass.

Start watch first, then run kickass in a second terminal:

# terminal 1
crosscheck watch

# terminal 2
crosscheck scan --force    # refresh PR state
crosscheck kickass

How the review→fix bridge works: after kickass posts a review comment, GitHub fires an issue_comment webhook (not a pull_request event). watch subscribes to issue_comment and, when it sees a crosscheck type=review annotation on an open PR, fetches the current PR head and runs the fix step automatically — no new commit required to wake it up. (Introduced in #193.)

Autonomous loop modes

--crazy and --half-crazy turn run and kickass into autonomous fix→recheck loops that keep going until the verdict improves — no manual re-runs needed.

Flag Stops when Max rounds Timeout
--crazy 🔥🔥 verdict = APPROVE none
--half-crazy 🔥 verdict ≠ BLOCK none

Both flags disable all reviewer subprocess timeout constraints — long fixes on large PRs won't be cut short. Use --timeout <duration> (e.g. --timeout 10m) without these flags to set a custom cap.

# Run full workflow and keep looping until approved
crosscheck run <pr-url> --crazy

# Advance every stale PR until it's no longer blocked
crosscheck kickass --half-crazy

# Custom timeout without looping
crosscheck run <pr-url> --timeout 10m

Configuration

Crosscheck uses ~/.crosscheck/config.yml by default. If that file exists, it wins over ./crosscheck.config.yml unless you pass --config ./crosscheck.config.yml.

Review depth (quality.tier)

# crosscheck.config.yml
quality:
  tier: balanced    # fast | balanced | thorough
Tier Claude model Codex model Latency
fast Haiku default ~10s
balanced Sonnet (default) default ~30s
thorough Opus default ~60s

Pipeline (workflow.yml)

steps:
  - name: review
    type: review
    reviewer: auto          # auto | claude | codex | origin

  - name: fix
    type: fix
    reviewer: origin
    when: review.verdict != 'APPROVE'

  - name: recheck
    type: recheck
    reviewer: auto
    when: fix.applied_count > 0

Config snapshot

# ~/.crosscheck/config.yml
orgs:
  - your-org

routing:
  allowed_authors:
    - your-github-login

mode: cross-vendor          # cross-vendor | single-vendor

vendors:
  claude:
    enabled: true
  codex:
    enabled: true

quality:
  tier: balanced

clone_protocol: ssh         # ssh (default) | https

Full reference: get-started.md


Requirements

Minimum
Node.js 18+
Claude Code CLI latest — npm install -g @anthropic-ai/claude-code
Codex CLI latest — npm install -g @openai/codex
GitHub CLI 2.65+ — brew install gh

GITHUB_TOKEN is derived automatically from gh auth login. No manual export needed.


Documentation

get-started.md Full setup guide — prerequisites, all flags, complete config reference, FAQ
What 295 Agentic PRs Taught Us About Code Review Humanbased field report on agentic PR quality, review routing, and why Crosscheck exists
docs/fixture-pr.md Safe public fixture PR for the first Crosscheck review
crosscheck.config.example.yml Annotated config with every option
CHANGELOG.md Release notes

Contributing

Issues and PRs welcome at github.com/humanbased-ai/crosscheck.


License

MIT — Copyright (c) 2025–2026 Humanbased PTE LTD.