crosscheck is an AI code review orchestrator. It monitors your GitHub repos and ensures every open PR that hasn't been reviewed yet gets a review from the right AI agent — automatically.
In cross-vendor mode (default): each PR is reviewed by the vendor that didn't write it (Claude reviews Codex PRs; Codex reviews Claude PRs). In single-vendor mode: every PR in scope gets reviewed by the one configured vendor. Either way, crosscheck handles detection, attribution, assignment, and posting — no manual coordination required.
It runs locally using your existing AI subscriptions — no separate API billing required.
Published as @motivation-labs/crosscheck on npm.
- Universal coverage — every open PR in the monitored scope that lacks a crosscheck review is a candidate, regardless of whether the authoring agent can be identified
- Use existing subscriptions — run
claudeandcodexCLIs locally, no per-token billing - Zero infrastructure — one command on any machine with both CLIs installed
- Config-as-code — one flat YAML file, readable and writable by coding agents
- Two deployment modes —
watchfor laptops,servefor always-on machines - Org-level coverage — one webhook covers all repos in an org
- Self-improving —
diagnose+optimizecreate a feedback loop from observed failures to better review instructions; crosscheck gets more useful the longer it runs - Self-annotating — each crosscheck action embeds machine-readable attribution metadata that makes future attribution more accurate, without relying on external conventions
- Not a replacement for human code review
- Not a merge gate — posts comments, does not block PRs
- Not a hosted service — runs on your machine
- Not a one-size-fits-all reviewer — instructions should adapt to your stack and team conventions
These principles are authoritative. They override default assumptions when building new features or resolving ambiguous design decisions.
Whether to queue a PR for review is determined entirely by scope and review status — not by whether the authoring agent can be identified.
A PR is a review candidate if and only if:
- Its author is in scope (
allowed_authorsif set, or any author in the monitored org/user/repo otherwise), AND - It has no existing crosscheck review comment.
allowed_authors is a scope gate (which authors crosscheck watches), not an attribution filter (who wrote the code). A PR from beingzy is in scope if beingzy is in allowed_authors, regardless of whether beingzy used Claude, Codex, or wrote the code by hand.
Consequence: attribution failure never silently drops a review. A PR that reaches the assignment step with unknown attribution is handled by the fallback policy, not silently skipped.
Once a PR is queued (passes P1), attribution determines which reviewer is assigned. It does not re-gate whether a review happens.
Attribution detection chain — evaluated in order, stops at first match:
- PR body — scan for configured
codex_reviews_patterns/claude_reviews_patterns - Commit messages — same patterns applied to each commit message in the PR
- Branch name prefix —
claude_branch_prefixes/codex_branch_prefixes - PR comments — scan for crosscheck annotation tags (see P3); skipped if steps 1–3 resolved
author_routes— explicitlogin → vendormap in config- Fallback — apply
routing.fallback_reviewer(default:skip)
Assignment logic given detected origin and config:
| Mode | Origin | Reviewer |
|---|---|---|
| cross-vendor | claude | codex (if enabled) |
| cross-vendor | codex | claude (if enabled) |
| cross-vendor | human / unknown | routing.fallback_reviewer (skip | claude | codex) |
| single-vendor | any | the one enabled vendor |
In single-vendor mode, attribution is still detected and logged (for analytics) — it just has no effect on assignment.
Every action crosscheck takes embeds a machine-readable annotation so future runs can read attribution state without guessing.
Annotation format (these are stable — changing them is a breaking change):
| Action | Annotation location | Format |
|---|---|---|
| Review comment posted | End of comment body | <!-- crosscheck: origin=claude reviewer=codex verdict=APPROVE --> |
| Address commit pushed | Commit message trailer | Crosscheck-Reviewer: codex |
Detection step 4 (PR comments) scans for <!-- crosscheck: origin=... --> tags to recover attribution from a prior crosscheck review, even when the PR body and commit messages contain no patterns.
Why this matters: the first time crosscheck reviews a PR, attribution may be inferred from patterns. On follow-up events (new commits, recheck), the annotation provides a durable, precise attribution record — no re-inference needed.
Every time crosscheck invokes an AI agent — to review a PR, apply a fix, recheck after changes, improve review instructions, or analyze logs — the agent's behavior is defined by a harness file: a plain Markdown document that specifies what the agent receives, what it must do, and what format it must output. No agent behavior is hardcoded in TypeScript prompt strings. Logic lives in harness files.
workflow.yml — declares steps and which harness section governs each
↓ cites
AGENT.md — behavioral specification for all PR workflow steps
↓ produces
instructions.md — persistent reviewer constraints, injected into every review call
At the bottom, instructions.md is what actually gets injected into claude and codex
during review. It is maintained automatically by crosscheck optimize, which reads
AGENT.md to know how to improve it.
AGENT.md — the master behavioral guide, shipped with the package.
Has one section per workflow step. Each section is self-contained: it defines what data the agent receives, what it must produce, and what constraints apply.
| Section | Used by | What it governs |
|---|---|---|
## optimize |
crosscheck optimize |
How to read diagnose output and improve instructions.md |
## review |
crosscheck review, crosscheck watch/serve |
How the reviewer agent should analyze a PR diff and produce a VERDICT |
## fix |
address step in crosscheck run |
How the fixer agent should apply changes based on review findings |
## recheck |
recheck step in crosscheck run |
How the re-reviewer should verify fixes and produce a final VERDICT |
workflow.yml steps cite a section by name (harness: AGENT.md#review). At runtime,
crosscheck reads the named section and prepends it to the agent invocation. The agent
receives: harness section + instructions.md constraints + PR diff/context.
ISSUE.md — specialized harness for crosscheck issue --opportunities.
Guides an agent to analyze ~/.crosscheck/logs/*.ndjson for session stability, tunnel
reliability, and process health patterns — and draft a GitHub issue for the
highest-priority finding. Uses the same override chain as AGENT.md.
{cwd}/<FILE>.md
{cwd}/.crosscheck/<FILE>.md
{packageRoot}/<FILE>.md ← bundled fallback, always present after npm install
The first file found wins. Teams can ship a custom AGENT.md in their repo to override
review behavior for their stack without forking crosscheck.
# .crosscheck/workflow.yml
steps:
- type: review
harness: AGENT.md#review # injects ## review section into the reviewer agent
- type: fix
harness: AGENT.md#fix # injects ## fix section into the fixer agent
when: verdict != APPROVE
- type: recheck
harness: AGENT.md#review # same review harness for consistency
when: fix.applied == trueWhen harness is omitted from a step, crosscheck injects only instructions.md — the
current behavior. When harness is present, it prepends the named section before
instructions.md.
A project may supply a custom section by placing a local AGENT.md with a ## review
override — crosscheck picks it up via the override chain automatically.
- Plain Markdown only — no code, no secrets, no runtime-generated content
- Keep each file under 400 lines — longer files reduce instruction-following fidelity
- Each section must define: (1) what data the agent receives, (2) what it must produce, (3) exact output format the TypeScript caller parses
- Output format changes are breaking — bump minor version and update the parser together
- Every harness file ships in
package.jsonfilesso it is always present afternpm install
- Add a
## <step-name>section toAGENT.md(or create a new<NAME>.mdfor a distinct concern likeISSUE.md) - Add
load<Name>Harness(cwd)using the three-path override chain - Add the file to
package.jsonfiles - Update the harness table above
crosscheck init— environment check, auto-generates webhook secret, writes starter configcrosscheck review <pr-url>— manual one-shot review with--reviewer codex|claudecrosscheck watch— local dev mode with auto-smee tunnel and auto-webhook registrationcrosscheck serve[BETA] — always-on mode on a fixed portcrosscheck status— shows auth state, config summary, CLI versions- Cross-vendor mode (Claude ↔ Codex) and single-vendor mode
- Org-level and repo-level webhook support
- PR deduplication (owner/repo#pr@sha in-memory set)
- Auto-generated webhook secret persisted to
~/.crosscheck/webhook-secret - Published to npm as
@motivation-labs/crosscheck - CI: typecheck + build on Node 18/20/22 on every push
- CD:
@betaon merge to main,@latestonv*tag (with production approval gate)
servemode is functional but not battle-tested in production- Codex subscription auth does not support model selection (API key auth required for that)
--baseand prompt are mutually exclusive incodex review; focus instructions use.codex/instructionsfile
npm no longer supports TOTP authenticator apps for 2FA. Interactive publish requires a passkey/security key. For terminal publishing, use a granular access token with publish permissions:
NPM_TOKEN=npm_xxx npm publish --access publicCI/CD uses NPM_TOKEN stored as a GitHub Actions secret — no interactive auth needed.
repo— required for all commandswrite:org— required for org-level webhook registration inwatch/serve- Repo-level webhooks only need
repo
These four items fix the two-phase model described in Design Principles. Any new feature that touches detection or routing depends on them being correct first.
-
Consolidate skip decision: delete
shouldReview(), rely solely onassignReviewer—shouldReview(detector.ts:82) andassignReviewerboth independently gate onorigin === 'human'in cross-vendor mode. Addingfallback_reviewerto only one of them creates a silent divergence. Full spec below.- Acceptance Criteria:
shouldReviewis deleted fromdetector.tsand from all exports.- Call sites in
serve.ts,watch.ts, andreview.tsthat checkshouldReview(...)are replaced withassignReviewer(...) !== null. - Behavior for all existing origin values is unchanged.
- No type error or lint warning from the removed export.
- Technical Notes: call sites pass
originandconfigtoshouldReview; the equivalent isassignReviewer(origin, config) !== null. Both code paths already havereviewerfromassignReviewerin scope — no new variable needed. - Tests Required:
shouldReviewnot exported; serve/watch/review behavior unchanged fororigin: 'claude','codex','human'.
- Acceptance Criteria:
-
routing.fallback_reviewerconfig field (P2 assignment) — full spec in Next Up. -
Comment-based attribution detection (P2 step 4) — full spec in Next Up.
-
Crosscheck annotation system (P3) — full spec in Next Up.
-
crosscheck onboarddestroys existing config customizations on re-run — fixed in PR #74. All write logic extracted intoapplyOnboardConfig; routing.* fields now initialised on first run only and preserved exactly on re-runs; workflow.yml only written when it doesn't already exist and only deleted when the pipeline explicitly changes away from recheck. -
review → fix → re-check loop is not implemented —
crosscheck onboardlets users select thereview-fix-recheckpipeline and writes~/.crosscheck/workflow.yml, but no execution engine reads or runs that file duringwatch/serve. Thepost_review.auto_fixconfig drives a single fix pass; there is no loop that re-reviews after the fix to confirm issues are resolved.- User: Anyone who selected
review → fix → re-checkduring onboard and expects a full loop. - Acceptance Criteria:
loadWorkflow()insrc/lib/workflow.tsresolvesworkflow.ymlvia the override chain:{cwd}/.crosscheck/workflow.yml→~/.crosscheck/workflow.yml→ built-in default.- For a three-step workflow (review → fix → recheck): after the fix step completes and
fix.applied_count > 0, the recheck step runs the reviewer again on the updated branch. - Verdict from the recheck step is posted as a follow-up comment on the original PR.
- If no fix was applied (reviewer approved or fixer found nothing to change), the recheck step is skipped.
- Loop terminates after
max_roundsper step (default: 1) — no infinite looping.
- Technical Notes:
src/lib/workflow.tshas the schema andloadWorkflow()but the step execution logic inwatch.ts/serve.tsonly runs a single review pass. The fix step (post_review.auto_fix) is partially implemented but the recheck step after fix is missing. Theworkflow.ymlwhen:condition (fix.applied_count > 0) requires a step-result context object to be threaded through the execution loop. - Tests Required: mock workflow with review → fix → recheck; assert recheck fires only when
fix.applied_count > 0; assert loop stops atmax_rounds: 1; assert recheck is skipped when review verdict is APPROVE.
- User: Anyone who selected
-
Onboard — workflow, mode, and pipeline steps — extend
crosscheck onboardwith interactive steps for review mode (cross-vendor vs single-vendor) and workflow pipeline (review-only, review→fix, review→fix→re-check). Mode step is shown only when both AI CLIs are authenticated; pipeline step always shown.- User: Anyone running
crosscheck onboardfor the first time, or re-running after installing a second AI CLI. - Acceptance Criteria:
checkEnv()now returns{ ok, claudeOk, codexOk }so downstream steps know which CLIs are available.- Step 4 — review mode: shown when both claude and codex are authenticated.
[1] cross-vendor(default) →mode: cross-vendor, both vendors enabled.[2] single-vendor→ asks which vendor; disables the other invendors.*.enabled.- When only one CLI is available, step auto-selects single-vendor and skips the prompt.
--yeskeeps the existing mode from config without prompting.
- Step 5 — workflow pipeline:
[1] review only→post_review.auto_fix.enabled: false.[2] review → fix(recommended default) →auto_fix.enabled: true, trigger: on_issues, delivery.mode: commit.[3] review → fix → re-check→ same as fix, plus writes~/.crosscheck/workflow.ymlwith a three-step pipeline (review, fix, recheck).--yespreserves existing auto_fix.enabled setting.
- Step 6 (was 4) summary shows
modeandpipelinerows in addition to existing fields. loadWorkflow()checks~/.crosscheck/workflow.ymlas a global fallback (after project-local.crosscheck/workflow.yml, beforeDEFAULT_WORKFLOW).
- Technical Notes:
src/commands/onboard.ts:checkEnv()returnsEnvCheckResult; new helperspromptVendorMode()andpromptWorkflowPipeline(); config write also patchesmode,vendors.*.enabled,post_review.auto_fix.*.src/lib/workflow.ts:loadWorkflowcandidates array extended withjoin(homedir(), '.crosscheck', 'workflow.yml').- Global workflow.yml only written for the
review-fix-recheckpreset;review-onlyandreview-fixare handled entirely through config fields.
- Related items: Custom Workflow Engine (workflow.yml schema), Post-Review Auto-Fix (auto_fix config), Deployment Mode (patchDeploymentConfig pattern). The global workflow.yml fallback unblocks the full review→fix→recheck loop for users who don't have a per-project workflow file.
- User: Anyone running
-
~/.crosscheck/workflow.ymlwritten for every pipeline preset; per-step instructions inline —workflow.ymlis the single source of truth for both the pipeline topology and the per-step reviewer instructions. Currently it is only written for thereview-fix-recheckpreset;review-onlyandreview-fixencode their configuration inpost_review.auto_fix.*config fields instead. This split means users cannot find or edit their pipeline in one place, andcrosscheck optimizehas no structured location to write per-step instructions (it falls back to a monolithicinstructions.md).- User: Anyone running
crosscheck onboardwho wants to understand, inspect, and customize how their reviews run. - Design decision: Per-step instructions live inline in
workflow.yml(not in separate files). The existinginstructions: stringfield per step already supports this. A single file to back up, a single file to hand tocrosscheck optimize. - Acceptance Criteria:
-
crosscheck onboardalways writes~/.crosscheck/workflow.ymlregardless of which pipeline preset is selected. -
Three preset templates, each with per-step inline instructions:
review-only:
# crosscheck workflow — generated by crosscheck onboard on: [opened, synchronize] steps: - name: review type: review reviewer: auto max_rounds: 1 instructions: | ## Constraints - Do not run tsc, ts-node, or build commands — inspect source files directly. - Do not install packages or modify lock files. ## Output format Structure your output as: ## Summary, ## Critical Issues, ## Warnings, ## Suggestions. Be concise. Skip praise. ## Verdict End with one of: VERDICT: APPROVE | NEEDS WORK | BLOCK
review-fix: same as review-only, plus:
- name: fix type: fix reviewer: origin when: "review.verdict != 'APPROVE'" max_rounds: 1 instructions: | Only fix issues explicitly called out in the review. Do not refactor unrelated code, rename variables, or add tests unless specifically requested. If a comment requires deeper understanding of business logic, skip it.
review-fix-recheck: same as review-fix, plus:
- name: recheck type: recheck reviewer: auto when: "fix.applied_count > 0" max_rounds: 1 instructions: | This is a follow-up review after automated fixes were applied. Focus on whether the flagged issues from the original review have been resolved. Use the same verdict scale: APPROVE if all critical issues are addressed, NEEDS WORK if some remain, BLOCK if regressions were introduced.
-
applyOnboardConfigwrites the correct template to~/.crosscheck/workflow.ymlfor the selected preset. If the file already exists, it is not overwritten (same rule as before — the user may have customized it). -
loadWorkflowprecedence is unchanged:.crosscheck/workflow.ymlin project root →~/.crosscheck/workflow.yml→DEFAULT_WORKFLOW(built-in constant, last resort only). -
DEFAULT_WORKFLOWremains in code as a fallback for environments where onboard was never run. Its inline instructions remain the same defaults that get written into the onboard-generated templates. -
post_review.auto_fix.*fields inconfig.ymlcontinue to be written by onboard for backward compatibility with olderwatch/servecode that reads them directly. This duplication is intentional during the transition period while the workflow engine implementation is pending. -
crosscheck optimize --applyupdates theinstructionsfields of individual steps within~/.crosscheck/workflow.ymlusing targeted YAML mutations (load → mutate step'sinstructionsfield → dump). It no longer writes to~/.crosscheck/instructions.md. Existinginstructions.mdfiles are left in place for users who created them manually.
-
- Technical Notes:
src/commands/onboard.ts(applyOnboardConfig): replace the three-branch workflow.yml write with per-preset YAML templates that include inline instructions. UseWORKFLOW_TEMPLATESconstant keyed byWorkflowPreset.src/lib/workflow.ts(DEFAULT_WORKFLOW): keep as-is. The templates written by onboard are generated from this same content so they stay in sync.src/commands/optimize.ts: replacewriteFileSync(instructionsPath, ...)with a targeted mutation ofworkflow.yml— load raw YAML, find the step by name, setinstructions, dump. Falls back to writinginstructions.mdifworkflow.ymldoesn't exist.- Schema (
WorkflowStepSchema): no change needed —instructions: z.string().optional()already supports inline text.
- Tests Required:
applyOnboardConfigwithreview-only→workflow.ymlwritten with one step and correct inline instructions.applyOnboardConfigwithreview-fix→ two steps.applyOnboardConfigwithreview-fix-recheck→ three steps.- Re-run with same preset when file exists → file unchanged.
loadWorkflowreads inlineinstructionsfrom file and returns them on the step objects.
- User: Anyone running
-
~/.crosscheck/as the persistent customization home — document all files and their roles — crosscheck accumulates customization across multiple commands (onboard,optimize, manual edits). Today these relationships are implicit. Formalise the full file map in docs and incrosscheck statusoutput so users know exactly what they have configured and how it is consumed.- User: Anyone who wants to understand what crosscheck has remembered, back it up, or restore it after a reinstall.
- Acceptance Criteria:
-
get-started.mdhas a dedicated "Customization home" section that lists every file in~/.crosscheck/, what writes it, what reads it, and which fields are owned vs user-managed. -
README.mdhas a brief "Customization files" table linking to the full section. -
crosscheck statusoutput includes a "files" row listing which customization files exist (✓ present / — absent):config.yml,workflow.yml,instructions.md,webhook-secret,logs/. -
The complete file map (canonical reference):
File Written by Read by Purpose ~/.crosscheck/config.ymlonboard,init,watch(first run)all commands Main config — deployment, repos, mode, vendors, quality, tunnel, routing, budget, branding ~/.crosscheck/workflow.ymlonboard(recheck preset only)watch,serve,runGlobal pipeline steps override — written once, never overwritten unless explicitly changed ~/.crosscheck/instructions.mdoptimize --applywatch,serve,run(injected into every review prompt)Learned reviewer constraints — grows over time via optimize ~/.crosscheck/webhook-secretinit,watch(auto-generated)watch,serveHMAC secret for GitHub webhook signature verification ~/.crosscheck/logs/YYYY-MM-DD.ndjsonwatch,servediagnose,optimize,impact,issueStructured review event log — one file per day, 7-day retention .crosscheck/workflow.yml(in repo)manual watch,serve,runPer-project pipeline override — takes priority over global .crosscheck/AGENT.md(in repo)manual optimizePer-project harness override — takes priority over bundled AGENT.md(bundled)npm package optimizeDefault harness — defines how optimizebuilds reviewer instructions -
Ownership rules (enforced by
applyOnboardConfig):onboardowns:deployment,orgs,repos,mode,vendors.*.enabled/effort,quality.tier,tunnel.*,post_review.auto_fix.*onboardinitialises on first run only:routing.*(never overwritten on re-runs)onboardnever touches:quality.focus,quality.custom_prompt,budget.*,branding.*,server.*,logs.*,backtrace.*,instructions.md, harness files
-
- Technical Notes:
crosscheck statusalready reads config; extend it to check for the existence of each~/.crosscheck/file and print a summary row. No schema changes needed. - Tests Required: status output includes files row; each file shows correct ✓/— based on filesystem state.
-
Smart onboard re-run — "last choice" indicators and instant re-confirm — re-running
crosscheck onboardafter a reinstall or on a new machine should feel instant. Since~/.crosscheck/config.ymlpersists, every picker can pre-select the previous answer and mark it visually, letting the user press Enter through all steps to confirm without re-reading every option.- User: Anyone re-running onboard after
npm install -g @motivation-labs/crosscheck(upgrade or fresh machine with backed-up~/.crosscheck/). - Acceptance Criteria:
- Each
promptSinglePickerstep pre-selects the value from the existing config (already done viadefaultIndex). - The pre-selected item shows a
(current)suffix appended to its description line so users can visually confirm they are accepting the previous value, not a random default. - A one-line dim header appears at the top of
crosscheck onboardwhen a config is detected:Resuming from ~/.crosscheck/config.yml — press Enter to keep each setting. crosscheck onboard --yesis the fully silent re-confirm path: reads config, applies all previous values, writes config, exits. Zero prompts. Suitable for scripted reinstalls.- Repo picker pre-selects repos from
config.repos+ orgs fromconfig.orgs(already done viainitialSelected).
- Each
- Technical Notes: The description suffix
(current)is appended in-place inpromptVendorMode,promptQualityTier,promptWorkflowPipeline,promptConnectionTypewhen the item matches the existing config value. No change topromptSinglePickersignature needed — use thedescriptionfield. - Tests Required: When existing config has
quality.tier: thorough, thethoroughitem's description includes(current). When no config exists, no(current)suffix appears on any item.
- User: Anyone re-running onboard after
-
Board output redesign — section reorder + unified PR line format — restructure the
watch/servelive board (board.ts) so sections read top-to-bottom in information priority order, and tighten the completed-PR two-line format to a consistentPR | CR | Fixpipeline layout.- User: Anyone running
crosscheck watchorcrosscheck servewho wants a cleaner, scannable terminal output. - Acceptance Criteria:
- Section order (live block, top to bottom):
- Summary — compact single line: aggregate stats (PRs received, CRs completed, fixes applied, avg CR time) + uptime. Replaces
row3(currently the third line). This anchors the stable numbers at the top where the eye returns. - Connectivity / status — tunnel/endpoint URL + alive indicator +
connLogentries (webhook events, connection events). Currently split betweenrow1/row2and the conditionalconnLogsection; merged into one coherent block. - Active PR catalog — active in-flight PR slots (two-line per PR, separated by blank lines). Grows downward as more PRs arrive concurrently; collapses to
waiting for PRs...when idle.
- Summary — compact single line: aggregate stats (PRs received, CRs completed, fixes applied, avg CR time) + uptime. Replaces
- Completed PR entry format (printed to scrollback via
completePR()):- Line 1 (unchanged):
HH:MM:SS AM [verdict badge] #N owner/repo branch (Xs) → url - Line 2 (new):
PR [bar 10] Nloc | CR [bar 8] N issues (VERDICT) | Fix [bar 6] N fixes- Separator is
|(space-pipe-space) between the three pipeline stages. N issuesreplaces the current·Nsuffix — more readable in isolation.(VERDICT)appended after issue count in the CR section:(APPROVE),(NEEDS WORK),(BLOCK). Always use the full unabbreviated form — no short aliases.N fixesreplacesN applied— consistent noun form withN issues.- If fix step did not run (review-only workflow or APPROVE verdict), Fix section reads
Fix ░░░░░░ —(empty bar, dash) rather than being omitted — keeps the three columns visually consistent.
- Separator is
- Line 1 (unchanged):
- Active PR slot format (live block,
renderPRSlot()):- Line 1: spinner +
#N repo branch+ right-aligned elapsed +⠋ phase-label(phase label moves to end of line 1, freeing line 2 for data only). - Line 2: same
PR | CR | Fixlayout as completed entries; CR and Fix sections show empty bars withpendinglabel until data is available.
- Line 1: spinner +
board.tsrender()function reorders its sections to match the new top/middle/bottom layout; no new public API changes toPRBoard.- All existing
PRBoardpublic methods (addPR,updatePR,completePR,failPR,log,logConnectivity,setTunnel,setConfig,start,stop) keep their signatures.
- Section order (live block, top to bottom):
- Technical Notes:
src/lib/board.ts: reorderrender()sections; updaterenderPRSlot()line 1/2 split; updatecompletePR()line 2 format string.- The
statsRow()anduptime()helpers are unchanged — just reposition their output inrender(). connLogand tunnel display move below the stats row rather than above it.- The separator lines (
─.repeat(w-1)) remain — one above the connectivity block, one above/below the PR catalog section. - Pending CR/Fix state: use the existing
undefinedcheck —slot.verdict === undefinedmeans CR hasn't arrived yet;slot.fixCount === undefinedmeans fix hasn't run. Render░bars +pendingin those cases.
- Tests Required: No new behavioral logic — pure rendering change. Smoke-test: start
watchagainst a real PR; verify (a) stats appear at top, (b) tunnel line appears below stats, (c) completed PR entries print inPR | CR | Fixformat with(VERDICT)suffix. - Decided: Verdict in parentheses uses the full unabbreviated form:
(APPROVE),(NEEDS WORK),(BLOCK). Consistent across active slots and completed entries.
- User: Anyone running
-
ckshort alias — support bothcrosscheck [method]andck [method]as equivalent invocations.- User: Any developer who wants faster CLI invocations.
- Acceptance Criteria:
ck <command>works identically tocrosscheck <command>for all subcommands.ck --helpshowsUsage: ck [options] [command](notcrosscheck).crosscheck --helpcontinues to showUsage: crosscheck [options] [command].- Both aliases are published to npm and installed as symlinks on
npm i -g.
- Technical Notes: Add
"ck": "dist/ck.js"topackage.jsonbinfield.src/ck.tssetsargv[1]='ck'via dynamic import so the name is correct on all platforms including Windows shims. - Tests Required: invocation-name detection unit test; no CLI contract change (patch bump).
-
Fix
watchevent log timestamp misalignment — zero-pad single-digit hours so all timestamps are the same width (01:08:08 PMnot1:08:08 PM).fmtTime()helper added toboard.ts; alltoLocaleTimeString()calls replaced. -
Fix
watchstatus bar embedded in scrolling log — confirmed already anchored viawriteLive(); no structural change needed. -
Fix
watchevent log — show failure state in counters —errorsOccurredstat counter added; shown in red in the status bar when > 0, omitted when 0. -
Fix
watchevent log — improve two-line event readability —board.log()prepends a blank line for 2-line events so consecutive PR entries are visually separated in the scrollback. -
Fix
crosscheck issuecodex invocation — replace-qwithexecsubcommand —runWithCodex()insrc/commands/issue.tswas callingcodex -q, which was removed from the Codex CLI; replaced withcodex exec(issue #57). -
crosscheck issue— harness guide for directed pattern analysis — introduce a bundledISSUE.mdharness (parallel toAGENT.mdforoptimize) that gives the coding agent structured direction when analyzing logs viacrosscheck issue. Instead of always hunting for error patterns to file a bug report, the agent should be able to run named analyses — session stability, tunnel reliability, throughput trends — and surface patterns with directional context.- User: Developer running
crosscheck watchwho wants to understand how the tool has been behaving over time, not just what broke. E.g., "why do sessions keep dying?", "are tunnels getting more stable?", "how long does a typical watch session last?" - Background — interaction that shaped this spec (2026-05-10): Manual analysis of
~/.crosscheck/logs/*.ndjsonacross May 7–9 (64 sessions, ~9,400 events) revealed: average session lifespan of 32.1 min; 50% of sessions had nosession_end(abrupt kill); a 3.5-hour rapid-restart storm on May 9 09:08–12:38 UTC (19 sessions, none reachedtunnel_opened); 236 tunnel errors (208 SSH timeouts, 28 SSH code-255 exits); longest stable session 275 min. None of these patterns were surfaced bydiagnosebecause they require session-level reasoning, not error-row counting. This is the analysis the harness should be able to perform. - Acceptance Criteria:
ISSUE.mdexists at the package root and is included in the npm package (filesinpackage.json).- The harness defines at minimum these named analyses, each with: what data to read, what to compute, and what the output format should be:
session-stability— per-session table (start, end, duration, tunnel URL, end reason clean vs inferred), aggregate stats (total sessions, average lifespan, min/max, clean-exit %, crash %). Crash = nosession_end. Flag sessions wheresession_endis absent but atunnel_openedwas logged (suggests abrupt kill, not startup failure).tunnel-reliability— counts oftunnel_opened,tunnel_closed,tunnel_errorby error subtype (SSH timeout, SSH code-255, other); reconnect rate (reconnecting: true); terminal close rate; % of sessions that never reachedtunnel_opened.throughput— PRs received, reviews completed, fixes applied, per-day and per-session averages.
crosscheck issue --analysis <name>runs the named analysis using the selected coding agent and prints the structured report to stdout. No GitHub issue is filed for analysis runs (unlike the default bug-report flow).crosscheck issue --analysis session-stabilityis the canonical way to get the session report we produced manually on 2026-05-10.crosscheck issue --list-analysesprints the available analysis names from the active harness.- Harness override: looks for
ISSUE.mdat{cwd}/ISSUE.md→{cwd}/.crosscheck/ISSUE.md→ bundled. Same override pattern asAGENT.mdinoptimize. - Agent selection: reuses
selectOptimizeAgent()— same vendor-selection logic (prefer higher-success-rate vendor; fallback to claude). - Analysis output is printed to stdout; exit 0 always (reporting tool, not a gate).
--since <date>scopes the log window (default: last 7 days).--jsonemits the structured report as JSON for scripting.
- Technical Notes:
- New file: bundled
ISSUE.mdat package root. Plain Markdown; no build step. Keep under 400 lines. src/commands/issue.ts: add--analysis <name>and--list-analysesflags to the existing Commander definition. When--analysisis present, skip the interactive bug-report flow entirely; build the analysis prompt from the harness section matching<name>, invoke the agent viarunWithClaude/runWithCodex, stream output to stdout.- New helper
src/lib/harness.ts:loadIssueHarness(cwd: string): string— same override lookup asloadAgentHarnessinoptimize.ts.parseAnalysisNames(harness: string): string[]— extracts## <name>sections. ISSUE.mdstructure: one## <analysis-name>section per named analysis; each section contains:### Input(which log events and fields to read),### Computation(what to calculate),### Output format(markdown table or JSON schema). Prose-only; no code in the harness.- Log parsing for analysis runs: reuse
loadErrorEntriesForPattern/sanitizeEntrywhere applicable; for session-level analysis, build agroupBySessions(entries)helper that splits onsession_startevents. - New function
src/lib/log-analysis.ts:groupBySessions(entries: RawLogEntry[]): Session[]whereSession = { start: Date; end: Date | null; events: RawLogEntry[]; tunnelUrl: string | null; cleanExit: boolean }.
- New file: bundled
- Tests Required:
groupBySessionssplits entries correctly on multiplesession_startevents; last session with nosession_endgetscleanExit: false.parseAnalysisNamesextracts correct section names from a fixture harness string.loadIssueHarnessrespects override order (cwd → .crosscheck → bundled).--list-analysesprints names without invoking an agent.--analysis unknown-nameexits 1 with a clear message listing valid names.--jsonoutput is valid JSON;--sincefilters log files by date correctly.
- User: Developer running
-
crosscheck diagnose— analyze~/.crosscheck/logs/*.ndjson, surface failure patterns and review quality signals as a human-readable report (with--jsonfor machine output). This is the observability foundation thatoptimizeand future tooling build on.- User: Anyone whose reviews are failing silently or who wants to understand what's working.
- Acceptance Criteria:
crosscheck diagnosereads all log files in~/.crosscheck/logs/; accepts--since YYYY-MM-DDto limit range.- Groups
errorentries by pattern:command_not_found(which command, which reviewer),base_branch_missing(which branch),timeout,auth_failure,other. - Reports review outcome distribution: APPROVE / NEEDS WORK / BLOCK counts and percentages.
- Reports per-reviewer success rate (attempts vs successes).
- Reports repos and file types seen in reviewed PRs (for language detection in
optimize). - Produces a
suggestions[]array: each suggestion hastype(add_constraint,investigate,config_change), a human-readablereason, and an optionalinstructionstring ready to paste. --jsonflag outputs the full structured report as JSON to stdout; default outputs a formatted terminal report.- Exit 0 always (it is a reporting tool, not a gate).
- Technical Notes:
- New file:
src/commands/diagnose.ts. - Parser reads NDJSON line-by-line; tolerates malformed lines (skip + count).
- Language detection: scan
repofield in log entries; for each unique repo, check if apackage.json/tsconfig.json/requirements.txt/Cargo.toml/go.mod/pom.xmlexists in the clone tmpDir path logged with the entry (or fall back to heuristics from the PR diff path names). - Suggestion rules (seeded set — grows over time via AGENT.md improvements):
command_not_found: tsc|npx|jest|vitest→ suggest adding "Do not run tsc / npm / jest." to instructionscommand_not_found: pytest|pip→ suggest Python constraintcommand_not_found: cargo→ suggest Rust constraintbase_branch_missing→ flag as known infrastructure bug, link to fixtimeout→ suggest increasingtimeout_msin config or reducing quality tier
- Wire into
cli.tsascrosscheck diagnose [--json] [--since <date>].
- New file:
- Tests Required: parse a fixture NDJSON file with known errors → correct pattern counts;
--jsonoutput is valid JSON matching schema;--sincefilters correctly; tolerates empty log dir.
-
crosscheck optimize— rundiagnoseinternally, select the best available local AI agent, feed the report into it usingAGENT.mdas the harness, diff the result against~/.crosscheck/instructions.md, and apply on--apply. Dry-run by default.-
User: Anyone who wants crosscheck to adapt to their repos and fix recurring review failures without manual config editing.
-
Agent selection — how optimize picks which AI to use: The agent used to run
optimizeis chosen dynamically from the vendors already configured incrosscheck.config.yml, not hardcoded. This means optimize works regardless of whether the user has Claude, Codex, or both.Selection logic (
selectOptimizeAgent(config, diagnoseReport)):- Collect
enabledvendors: those withconfig.vendors[v].enabled === true. - If only one vendor is enabled → use it.
- If both are enabled → look at
diagnoseReport.reviewer_performance: pick the vendor with the highersuccessRate(successes ÷ attempts) over the log period. - If rates are equal or there is no log data → prefer
claude(handles the long-form AGENT.md harness with higher fidelity). --agent claude|codexflag overrides all of the above.- If no vendor is enabled or the selected vendor's CLI is not installed → exit 1 with a clear message naming the missing CLI.
Examples:
- Config has only
codex: enabled: true→ uses codex, no claude needed. - Config has both enabled; codex has 80% success rate vs claude's 50% → uses codex.
- Config has both enabled; no log data → uses claude.
- User passes
--agent codex→ uses codex regardless.
- Collect
-
Acceptance Criteria:
crosscheck optimize(no flags) runs diagnose, selects the agent per the logic above, generates improved instructions, prints a unified diff of old vs newinstructions.md, and exits without writing.crosscheck optimize --applywrites the improved~/.crosscheck/instructions.md.crosscheck optimize --dry-runis a synonym for the default no-flag behavior.crosscheck optimize --agent <claude|codex>forces a specific agent.- Terminal output shows which agent was selected and why:
agent codex (success rate 80% > claude 50%). - On first run (no existing
instructions.md), the diff shows the full new file as additions. - If
diagnosefinds no errors and no suggestions, optimize still runs and may refine wording; it never produces an empty instructions file (preserves at minimum the VERDICT format constraint). - Respects a project-level
AGENT.mdoverride at{cwd}/AGENT.mdor{cwd}/.crosscheck/AGENT.md; falls back to the bundledAGENT.md.
-
Technical Notes:
- New file:
src/commands/optimize.ts. selectOptimizeAgent(config, report)→'claude' | 'codex'— pure function, easy to test.- Agent invocation:
claude:claude --print "<agentMd>\n\n<diagnoseJson>\n\nCurrent instructions.md:\n<current>"codex:codex reviewcannot be reused here; instead runcodex --print(or equivalent non-interactive mode) with the same prompt. If codex does not support--print, fall back to the next available agent and log a warning.
- AGENT.md lookup order:
{cwd}/AGENT.md→{cwd}/.crosscheck/AGENT.md→{packageRoot}/AGENT.md. - Diff: small inline unified-diff helper (no new dependency).
- Wire into
cli.tsascrosscheck optimize [--apply] [--dry-run] [--agent <claude|codex>] [--since <date>].
- New file:
-
Tests Required:
selectOptimizeAgentwith only codex enabled → returns'codex'; with both enabled and codex higher success rate → returns'codex'; with both enabled and no log data → returns'claude';--agentflag overrides; diff rendering shows +/- lines; AGENT.md lookup respects override order.
-
-
AGENT.md— bundled optimize harness — ship a well-craftedAGENT.mdat the repo root that guides claude duringoptimize. This file defines how to read diagnose output, detect languages, write good constraints, and stay within quality guardrails.- User: crosscheck itself (read by
optimize); power users who want to fork and customize the optimization logic. - Acceptance Criteria:
AGENT.mdexists at the project root and is included in the npm package (filesinpackage.json).- Contains: purpose, input format spec, output format spec, language-detection mapping table, rules for good/bad instructions, VERDICT format preservation rule, reversibility rule (remove stale constraints), and worked examples.
- Produces instructions that pass
npm run typecheckafter being applied (i.e., no instructions that break the.codex/instructionsformat). - Can be overridden by placing
AGENT.mdor.crosscheck/AGENT.mdin the project root.
- Technical Notes:
- File is plain Markdown; no build step.
optimize.tsreads it at runtime viafs.readFileSyncresolved fromimport.meta.url(package root).- Keep it under 400 lines — longer files reduce claude's instruction-following accuracy.
- User: crosscheck itself (read by
-
Adaptive instructions file — both
codex.tsandclaude.tsread~/.crosscheck/instructions.mdand append its content to the review prompt /.codex/instructions. Seeded with safe defaults on first run. Replaces the hardcodednoBuildToolsNoteincodex.ts.- User: Anyone running
watch/serve— they get out-of-box sane constraints and can improve them viaoptimize. - Acceptance Criteria:
~/.crosscheck/instructions.mdis created on first review if it doesn't exist, seeded with the default no-build-tools constraint.- Project-level
.crosscheck/instructions.mdoverrides the user-level file if present. - Both
codex.tsandclaude.tsappend the instructions content; neither has hardcoded constraint strings. - If the file is empty or missing, reviews still work (graceful degradation).
crosscheck statusshows the instructions file path and whether it exists.
- Technical Notes:
- New helper
src/lib/instructions.ts:readInstructions(repoDir?: string): string— checks project-level then user-level; seeds default if neither exists; returns empty string on any read error. - Default seed content: the current
noBuildToolsNoteplus a header comment explaining the file is managed bycrosscheck optimizebut can be edited manually. - Remove
noBuildToolsNoteconstant fromcodex.ts.
- New helper
- User: Anyone running
-
Local debug log file — persist structured runtime logs to
~/.crosscheck/logs/for debugging. Enabled by default; configurable retention (default 7 days, max 30).- User: Anyone running
watch/servein production or debugging a failed review. - Acceptance Criteria:
- Logs written to
~/.crosscheck/logs/YYYY-MM-DD.ndjson(one file per UTC day, NDJSON format — one JSON object per line). - Events captured:
session_start,pr_received,review_started,review_complete,comment_posted,webhook_registered,webhook_deleted,tunnel_opened,error. - Each entry has at minimum:
{ ts, level, event, ...contextFields }. - Config keys
logs.enabled(bool, defaulttrue) andlogs.retention_days(int 1–30, default7) control behaviour. - When
logs.enabled: false, no files are created or written. - On startup, files older than
retention_daysare deleted automatically. crosscheck statusshows log location and size of today's log file.
- Logs written to
- Technical Notes:
- New file:
src/lib/logger.ts— module-level singleton; exportsinitLogger(config)andlog(entry).initLoggerruns retention cleanup and opens today's append stream. Ifenabled: false, all calls are no-ops. - Schema: add
LogsConfigSchema = z.object({ enabled: z.boolean().default(true), retention_days: z.number().int().min(1).max(30).default(7) })toschema.ts; addlogs: LogsConfigSchema.default({})toConfigSchema. watch.ts/serve.ts: callinitLogger(config)near the top; augment the locallog()closure to also calllogger.log(...)forinfoevents; wrap the PR handler catch block to calllogger.log({ level: 'error', event: 'error', ... }).review.ts: same — logpr_received,review_started,review_complete,comment_posted,error.status.ts: add aLogssection showing path, enabled state, and today's file size if it exists.- Do NOT log review text content — only metadata (pr key, reviewer, verdict, duration, error messages). No secrets, no diffs.
- New file:
- Tests Required:
initLoggerwithenabled: falsewrites nothing; retention cleanup deletes files older than N days and keeps newer ones; log entries are valid JSON;review.tsemits expected events.
- User: Anyone running
-
GITHUB_TOKENfalse failure whenghis authenticated —crosscheck initshows✗ GITHUB_TOKEN missingeven whengh auth loginwas run andgh CLIpasses. TheGITHUB_TOKENcheck is logically redundant whenghis already authenticated via stored credentials; the two checks test the same thing ("can we talk to GitHub?") via different paths.- User: Anyone running
crosscheck initwho authenticated viagh auth loginrather than exportingGITHUB_TOKEN. - Acceptance Criteria:
- If
gh auth statusreports "Logged in", theGITHUB_TOKENrow incrosscheck initshould show ✓ (not ✗). - If neither
GITHUB_TOKEN/GH_TOKENenv var norgh auth statusis authenticated, the row shows ✗ with the current fix hint. - At runtime (
watch,serve,review), ifGITHUB_TOKENis unset butghis authenticated, crosscheck derives the token viagh auth tokenand injects it before constructing the Octokit client — no manual export required.
- If
- Technical Notes:
src/commands/init.tsline 51:GITHUB_TOKENcheck fires unconditionally. Gate it on!ghAuthed(reuse theauthedbool already computed on line 43).src/config/loader.ts: add aresolveGithubToken()helper that returnsprocess.env.GITHUB_TOKEN ?? process.env.GH_TOKEN ?? execSync('gh auth token').trim()(catch onexecSyncfailure).src/github/client.ts: callresolveGithubToken()instead of reading the env var directly.
- Tests Required:
gh authenticated + no GITHUB_TOKEN env→ init shows ✓;gh not authenticated + no env→ shows ✗;gh not authenticated + GITHUB_TOKEN set→ shows ✓.
- User: Anyone running
-
Fix
watchmode tunnel — replacedgh webhook forward(not available in gh 2.65.0) withlocalhost.runSSH tunnel. SSH is pre-installed on macOS/Linux, no account needed. Tunnel URL shown in watch banner; webhooks auto-registered and deleted on exit. -
Clean up
watchoutput — subprocess output no longer dumped raw; structured log lines only. -
Auto-detect
allowed_authorson first run —crosscheck initandcrosscheck watchdetect the signed-in GitHub login viagh api userand write it torouting.allowed_authorsin the config automatically. One-time: once written, subsequent runs skip detection. Prevents the footgun of reviewing all PRs in an org because the author filter was never set. -
Fix
watchbanner display real-estate — compress three banner rows (deployment,mode,quality) into a singleprofilerow; show users repo-count inline on theusersrow instead of as an indented sub-row. Net: banner shrinks from 8 rows to 5 rows.- Why: the
modeandqualityvalues already appear in the live status line, making their banner rows redundant. The users sub-row had inconsistent indentation and used a line for metadata (repo count) that belongs inline. - How:
watch.tsbanner emitsprofile <deployment> · <mode> · <quality>andusers <login> (<n> repos). Theconfig ← edit to change abovehint stays on the config row.
- Why: the
-
Fix
watchlive board separator wrap glitch and idle height — separator width changed fromwtow - 1(prevents exact-terminal-width cursor ambiguity that causes the next line's●to appear at the end of the separator);writeLivenow counts visual rows (accounts for wrapping) instead of logical newlines; connectivity section only rendered when it contains entries (removes 3 always-present empty rows in the idle state).- Why: full-width separators trigger an ambiguous terminal cursor position that made consecutive render frames appear on the same line. Empty connectivity rows wasted vertical space and inflated the
liveLinescount, causing eraseLive to under-erase.
- Why: full-width separators trigger an ambiguous terminal cursor position that made consecutive render frames appear on the same line. Empty connectivity rows wasted vertical space and inflated the
-
Fix
origin/<base>ref missing in PR clone — reviews receiving 0loc —git fetch origin <base>:<base>creates a local branch but not theorigin/<base>remote-tracking ref thatcodex review --base <branch>andcomputePRLocrequire. Changed togit fetch origin <base>(no refspec) which properly populatesorigin/<base>.- Why: codex internally runs
git diff origin/<base>...HEAD; without the remote-tracking ref the diff fails silently, the review runs against an empty diff, and the completion line shows0loceven though the PR has code changes. - How:
watch.tsclone setup replacesexecSync('git fetch origin ${ref}:${ref}', ...)withexecSync('git fetch origin ${ref}', ...).
- Why: codex internally runs
-
Deployment Mode & Smart Scope Detection — formalize the three monitoring scope levels (repo, org, user/personal) and introduce a
deployment: personal | teamconfig field.crosscheck watchandcrosscheck serveeach prompt the user to choose a mode on first run (whendeploymentis absent from config), then auto-detect scopes from GitHub credentials and write the result to config. Subsequent runs skip the prompt entirely. Closes the gap where AI agents opening PRs to personal repos go unwatched, and removes the footgun of serving an entire org with no author filter in team mode.-
User: Personal developer running
crosscheck watch(wants all of their own PRs reviewed across personal repos and orgs). Team operator runningcrosscheck serve(wants all org PRs reviewed, personal repos excluded). -
Acceptance Criteria:
Scope levels (all three work independently and combine additively):
repos:— monitor specific repos. At startup, validate each configured repo is accessible via GitHub API; log✗ repo not found or inaccessible: owner/name — skippedand continue (do not crash).orgs:— monitor all repos in the listed orgs via one org-level webhook per org.users:— monitor all non-archived repos owned by the listed GitHub personal accounts; enumerated at startup viaGET /users/{username}/repos?type=owner.
deploymentconfig field:- New top-level field:
deployment: personal | team. No default in schema — absence triggers the first-run prompt in watch/serve. personal— monitorsusers=[self]+orgs=[all-memberships];allowed_authors=[self](only the owner's PRs reviewed).team— monitorsorgs=[all-memberships]only (no personal repos);allowed_authors=[](all PRs in org scope reviewed).
crosscheck init— no change to scope logic:- Remains a pure environment check: CLIs, GitHub token, webhook secret, config file creation.
- Does not prompt for deployment mode; does not detect org memberships.
- Existing
allowed_authorsauto-detection behaviour is preserved for backward compatibility.
First-run prompt in
crosscheck watchandcrosscheck serve:- Triggered once when
deploymentkey is absent from config (i.e., first run or pre-existing config from an older version). - Printed before the startup banner:
How are you using crosscheck? [1] personal — monitor all your repos and orgs; review only PRs you author [2] team — monitor org repos only; review all PRs from any author Choice [1]: - Default is
[1](personal). Pressing Enter accepts the default. - After the user chooses, crosscheck detects GitHub login + org memberships, writes
deployment:,users:(personal only),orgs:, andallowed_authors:to config, then continues startup without restart. - Subsequent runs:
deploymentis present → prompt is skipped entirely.
One-time override —
--personal/--teamflags:- Use the specified mode for this session only. Config is not read or written.
- Scopes are auto-detected at runtime (same detection logic as normal mode); nothing persisted after exit.
- Intended for CI pipelines, one-off runs, or trying a mode before committing to it.
- Example:
crosscheck watch --teamreviews all org PRs this session; next run reverts to whateverdeploymentsays in config.
Permanent reconfigure —
--reconfigureflag:- Re-triggers the setup prompt unconditionally, even if
deploymentis already set. - Shows current saved mode:
Current: personal. - After the user chooses, overwrites
deployment:,users:,orgs:,allowed_authors:in config. - Useful when joining a new org, switching from personal to team use, or correcting a first-run mistake.
- Example:
crosscheck watch --reconfigure.
crosscheck watchruntime behavior (after mode is known):- When
users,orgs,reposare all empty: auto-detect scopes from GitHub credentials based ondeployment. Prints✦ scopes auto-detected from GitHub credentials. - Explicit
users/orgs/reposin config always take precedence over auto-detection. - Banner shows
deployment personalordeployment team.
crosscheck serveruntime behavior (after mode is known):- Same auto-detection logic as watch, keyed on
deployment. - In
teammode with emptyallowed_authors, replace the existing warning with:author filter all PRs (team mode — set allowed_authors to restrict). - Banner shows deployment mode.
-
Technical Notes:
- Schema: add
deployment: z.enum(['personal', 'team']).optional()toConfigSchema— intentionally no default so absence can trigger the prompt. - New function
src/github/client.ts:listUserOrgs(token: string): Promise<string[]>—GET /user/memberships/orgs?state=active&per_page=100, paginates, returns org login strings; returns[]on error (never throws). - New function
src/github/client.ts:checkRepoAccessible(owner: string, repo: string, token: string): Promise<boolean>— returns false on 404/403, true on 200. - New function
src/config/loader.ts:detectScopesForDeployment(deployment: 'personal' | 'team', token: string): Promise<{ users: string[]; orgs: string[] }>— callsdetectGitHubLogin()+listUserOrgs(); returns{ users: [login], orgs }for personal,{ users: [], orgs }for team. - New function
src/config/loader.ts:patchDeploymentConfig(configPath, deployment, login, orgs): boolean— writesdeployment:,users:(personal only),orgs:,allowed_authors:to config YAML; no-op ifdeploymentkey already present (useforce: trueto overwrite for--reconfigure). - Repo accessibility check:
watch.tsandserve.ts, after loadingconfig.repos, callcheckRepoAccessiblefor each in parallel; log warning and filter out inaccessible ones before building scopes. watch.ts/serve.ts: prompt logic runs before the startup banner.--personal/--teamskip all config reads/writes and use runtime-only scopes.--reconfigureruns the prompt withforce: trueand rewrites config.crosscheck.config.example.yml: add commenteddeployment: personalwith explanation of both values.get-started.md: add a Deployment mode section documenting the three flags.
- Schema: add
-
Tests Required:
listUserOrgspaginates correctly; returns[]on API error.checkRepoAccessiblereturns false on 404; true on 200.detectScopesForDeployment('personal', token)→{ users: [login], orgs: [...] }.detectScopesForDeployment('team', token)→{ users: [], orgs: [...] }.patchDeploymentConfigwrites all fields; is a no-op ifdeploymentpresent andforceis false; overwrites whenforce: true.- First-run prompt shown when
deploymentabsent; not shown when present. --personalflag: uses personal scopes this session; config is not written.--teamflag: uses team scopes this session; config is not written.--reconfigureflag: shows prompt even whendeploymentalready set; shows current mode; writes new choice to config.- Watch with empty scopes +
deployment: personal→ auto-detects users + orgs. - Watch with empty scopes +
deployment: team→ auto-detects orgs only. - Serve
teammode + emptyallowed_authors→ shows positive confirmation, not warning. - Inaccessible repo in
repos:→ warning logged, repo skipped, remaining repos monitored.
-
-
Live connectivity log section in
watchdashboard — add a dedicated 2-line section between the top status dashboard and the per-PR work area. Shows the 2 most recent connectivity events (tunnel open/close, webhook registrations) in-place without cluttering the scrollback.- User: Anyone running
crosscheck watchwho wants to see tunnel/webhook status at a glance alongside active PR work. - Acceptance Criteria:
- A fixed 2-line connectivity section appears between the 3-row dashboard and the PR slots in the live display.
- Shows the 2 most recent events: tunnel ready, tunnel disconnected, webhook registered, webhook failed.
- Each line is timestamped:
9:18:14 AM ✓ tunnel ready: https://... - Lines are padded with empty strings until 2 events have occurred, so the section height is stable.
- Connectivity events do NOT appear in the scrollback (they are in-place only).
- Tunnel errors and webhook errors still also appear in scrollback via
bLog(so they're not lost on reconnect).
- Technical Notes:
board.ts: addprivate connLog: string[](max 2 entries);logConnectivity(line): voidappends with timestamp, shifts oldest when full.render(): addSection 1.5betweensepand PR slots, alwaysCONN_LOG_MAXlines.watch.ts: addcLog(line)helper →board.logConnectivity(line)+fileLog; route tunnel open/close/fail and webhook registered/failed tocLog.
- User: Anyone running
-
Comment-based attribution detection (P2 step 4) — extend
detectOriginFullto scan PR review comments for crosscheck annotation tags (<!-- crosscheck: origin=... -->), positioned after body/commit/branch checks and beforeauthor_routes. This is the mechanism that enables P3 annotations to feed back into future detections.- User: Anyone running crosscheck on repos where PRs are authored by agents that don't leave body footers (e.g., agents that only commit code, or agents whose PR template doesn't include attribution text).
- Acceptance Criteria:
detectOriginFullgains a new step 4: if steps 1–3 return null, fetch PR comments viaGET /repos/{owner}/{repo}/issues/{number}/comments; scan each body for<!-- crosscheck: origin=(claude|codex) -->. Return the first match found.- Step 4 is only called when steps 1–3 are inconclusive — no extra API call on the fast path.
- Comment fetch failure is non-fatal; falls through to
author_routes(same pattern as commit fetch). detectOriginFulllogsmethod: 'comment'when attribution is resolved at this step.- Works correctly with both
<!--(HTML comment) and<!-- crosscheck: ... -->— the crosscheck annotation subset.
- Technical Notes:
- New function
src/github/client.ts:listPRComments(owner, repo, prNumber, token): Promise<string[]>— returns comment bodies, paginated (100/page). src/github/detector.ts: new step between branch check and author_routes; wrapslistPRComments+matchPatternson the annotation tag<!-- crosscheck: origin=(claude|codex).- Annotation regex:
/<!--\s*crosscheck:\s*origin=(claude|codex)/i— case-insensitive, whitespace-tolerant. src/__tests__/detector.test.ts: add test cases for comment-resolved attribution (mocklistPRComments), step-4-skipped-when-body-matches, API failure falls through.
- New function
- Tests Required: Step 4 resolves 'claude' from annotation tag; step 4 resolves 'codex'; step 4 not called when body matches; step 4 API failure → falls through to author_routes; no annotation tag in comments → falls through; annotation tag with extra fields → still matches.
-
routing.fallback_reviewerconfig field (P2 assignment) — when attribution detection returns 'human' or unknown in cross-vendor mode, apply a configured policy instead of silently skipping. Defaultskippreserves backward-compatible behavior; setting toclaudeorcodexensures all unattributed in-scope PRs get reviewed.- User: Anyone running crosscheck on repos where PR attribution is inconsistent — mixed human/AI authors, agents that don't leave footers, or early adopters who want to ensure 100% coverage regardless of attribution confidence.
- Acceptance Criteria:
- New field
routing.fallback_reviewer: 'claude' | 'codex' | 'skip'inRoutingConfigSchema, default'skip'. assignReviewerusesfallback_reviewerwhenorigin === 'human'in cross-vendor mode. Returnsnullwhenskip; returns the named vendor whenclaudeorcodex.- Existing behavior (silently skip human-origin PRs) is preserved when
fallback_revieweris absent or'skip'. serve.ts/watch.tslogorigin_method: 'none', fallback: 'claude'(or skip) when the fallback fires.crosscheck.config.example.ymldocuments the field with a comment explaining the coverage/noise tradeoff.crosscheck statusshows the active fallback policy.
- New field
- Technical Notes:
src/config/schema.ts: addfallback_reviewer: z.enum(['claude', 'codex', 'skip']).default('skip')toRoutingConfigSchema.src/github/detector.ts:assignRevieweralready returnsnullfororigin === 'human'in cross-vendor mode; update toreturn config.routing.fallback_reviewer === 'skip' ? null : config.routing.fallback_reviewer.crosscheck.config.example.yml: add commentedfallback_reviewer: skipunderrouting:.
- Tests Required:
fallback_reviewer: 'skip'→ null on human origin;fallback_reviewer: 'claude'→ 'claude' on human origin;fallback_reviewer: 'codex'→ 'codex' on human origin; single-vendor mode ignores fallback_reviewer; cross-vendor with both vendors enabled + fallback → fallback vendor returned; named fallback vendor is disabled → still returned (reviewer availability check is separate).
-
Crosscheck annotation system (P3) — embed machine-readable attribution metadata in every review comment crosscheck posts, and in every commit crosscheck pushes. This creates a durable, self-consistent attribution record that Step 4 detection (comment-based) can read on future events.
- User: Long-running crosscheck installations where PRs accumulate history — reruns, rechecks, follow-up commits. Without annotations, each event re-infers attribution from scratch, which can diverge if PR body is edited or commits are amended.
- Acceptance Criteria:
- Every review comment posted by
postReviewCommentinclient.tsappends\n<!-- crosscheck: origin=<origin> reviewer=<reviewer> verdict=<verdict> -->to the comment body (after the existing footer). - Every
[crosscheck]commit pushed inaddressstep includesCrosscheck-Reviewer: <reviewer>as a Git trailer in the commit message. - The annotation tag is invisible in GitHub's rendered Markdown (HTML comment).
parseAnnotation(commentBody: string): { origin: PROrigin; reviewer: string; verdict: string } | null— pure function insrc/lib/annotation.ts. Returns null if no tag found.postReviewCommentacceptsoriginparameter (already known at call site in serve.ts/watch.ts) and embeds it.- Round-trip:
parseAnnotation(postReviewComment output)returns the correct origin/reviewer/verdict. - Annotation format is tested as a stable schema — any format change is flagged in tests.
- Every review comment posted by
- Technical Notes:
- New file
src/lib/annotation.ts:buildAnnotation(origin, reviewer, verdict): string→<!-- crosscheck: origin=<o> reviewer=<r> verdict=<v> -->;parseAnnotation(body): Annotation | null— regex parse. src/github/client.ts:postReviewCommentgainsorigin: PROriginparameter; appendsbuildAnnotation(origin, reviewer, verdict)after the existing footer.- Call sites in
serve.ts/watch.ts/review.ts: passorigintopostReviewComment. src/lib/runner.tsaddressstep: append\nCrosscheck-Reviewer: <reviewer>to[crosscheck]commit message.src/__tests__/annotation.test.ts: round-trip tests; format stability tests (snapshot the exact tag string).
- New file
- Tests Required:
buildAnnotation('claude', 'codex', 'APPROVE')produces expected tag;parseAnnotationreturns correct fields;parseAnnotationon comment without tag returns null;parseAnnotationis whitespace-tolerant; annotation appended correctly to full comment body; round-trip: build then parse returns original values; format snapshot — tag string does not drift between test runs.
-
Custom Workflow Engine —
workflow.ymlper-repo pipeline definition: ordered steps (review,address,recheck),whenconditions on verdict/context, per-stepinstructionsfor behavior steering, andmax_roundsguard. Enables the review → auto-fix → re-review loop without code changes.- User: Teams with high PR volume who want crosscheck to close the feedback loop, not just comment. Also teams that want different reviewer behavior at each pipeline stage.
- Acceptance Criteria:
loadWorkflow(repoDir, configDir)always returns a valid step list. When noworkflow.ymlis found, it returns theDEFAULT_WORKFLOWconstant (singlereviewstep) — no separate fallback code path.watch.ts/serve.tsalways callloadWorkflow+runWorkflow; there is no conditional that bypasses the runner for the no-file case.crosscheck initgenerates a.crosscheck/workflow.ymltemplate with the default step active andaddress/rechecksteps present but commented out.- Supported step types:
review(run AI reviewer, post comment),address(read review comment, commit fixes to PR branch),recheck(re-review after fixes). whenfield: evaluated as a boolean expression; step skipped if false. Supported context:verdict,<step-name>.applied_count,<step-name>.verdict.- Per-step
instructionsfield appended to AI prompt for that step only, extending global~/.crosscheck/instructions.md. max_roundsonaddresssteps (default 1); hard cap of 5[crosscheck]commits per PR.- All
addresscommits prefixed[crosscheck]in the message. crosscheck review <pr-url> --workflowexercises the full workflow against a single PR for testing.- No
addressstep ever merges;auto_mergeis always false.
- Technical Notes:
src/lib/workflow.ts:DEFAULT_WORKFLOWconstant; Zod-validated schema;loadWorkflow(repoDir, configDir)returnsDEFAULT_WORKFLOWwhen no file found — never null.src/lib/runner.ts:runWorkflow(steps, context)— iterates steps, dispatches handlers.addresshandler: parse AI response as file-level patches →git apply→ push[crosscheck]commit.whenevaluation: minimal expression evaluator (equality + comparison, no scripting engine).watch.ts/serve.ts: unconditionally callloadWorkflow+runWorkflow; delete the direct reviewer call.init.ts: write.crosscheck/workflow.ymltemplate during init (see Feature Design section).
- Tests Required:
loadWorkflowreturnsDEFAULT_WORKFLOWon absent file;loadWorkflowparses a valid file correctly;when: "verdict == 'APPROVE'"skipsaddressstep;max_roundscap respected;addresscommits prefixed[crosscheck]; runner withDEFAULT_WORKFLOWproduces identical output to current direct-call behavior.
-
Auto-init on
watch/serve—crosscheck watchandcrosscheck servedetect whether first-time setup has been done and run init steps automatically before starting the monitor.crosscheck initbecomes optional, not required.- User: Anyone running crosscheck for the first time. The current expectation ("run init first") is undiscoverable — most users just try
crosscheck watchand hit missing-config errors. - Acceptance Criteria:
- On
crosscheck watch/crosscheck servestartup, before opening the tunnel or binding the port, callensureInit(cwd). - If
~/.crosscheck/.initializedexists and contains the current crosscheck version,ensureInitskips global setup (webhook secret generation) but still runs cheapexistsSyncchecks for the two repo-local files (crosscheck.config.yml,.crosscheck/workflow.yml). If either is missing, it is created before returning. No subprocess spawns on the fast path. - If sentinel is absent or version differs, print
✦ first run — setting up crosscheck..., run missing setup steps, write sentinel, then continue. - Auth checks (gh, claude, codex CLIs) remain in
crosscheck initonly — not run byensureInit(they require subprocess spawns and would defeat the fast-path goal). - After auto-init completes, watch/serve continues normally without requiring a restart.
crosscheck initremains a standalone command; bypasses sentinel (--forceinternally) and always runs the full check + prints status table. Re-running does not overwrite existing files.--no-initflag onwatch/serveskips theensureInitcall entirely for CI environments.
- On
- Technical Notes:
- New file:
src/lib/setup.ts—ensureInit(cwd, opts?): sentinel check first; on miss, runs setup steps and writes~/.crosscheck/.initialized. init.tscallsensureInitwith{ force: true, verbose: true }then prints status table.watch.ts/serve.ts:await ensureInit(process.cwd())beforeloadConfig.
- New file:
- Tests Required: sentinel present + version match + repo-local files exist → no files written; sentinel present + version match + repo-local files absent → creates missing repo-local files only (no webhook secret re-generated); sentinel absent → runs all three setup steps; sentinel version mismatch → re-runs changed steps;
--no-initbypasses call;crosscheck initoverwrites sentinel even if present; second repo with same version → repo-local files created even though sentinel already exists.
- User: Anyone running crosscheck for the first time. The current expectation ("run init first") is undiscoverable — most users just try
-
crosscheck issue— scan recent logs for errors, draft a GitHub issue using the local AI agent, ask targeted multiple-choice follow-up questions, and submit tomotivation-labs/crosscheckafter user confirmation. Zero manual log-digging required.- User: Anyone who hits a recurring or unexpected review failure and wants to report it without writing the issue from scratch or navigating log files manually.
- Acceptance Criteria:
crosscheck issuereads~/.crosscheck/logs/for the most recent 3 days (default);--since YYYY-MM-DDoverrides the window.- Reuses the same error-grouping logic as
diagnose(extracted intosrc/lib/log-analysis.ts). If noerror-level entries are found, printsNo errors found in recent logs — nothing to reportand exits 0. - If multiple error patterns are found, shows a numbered menu and prompts
Which issue do you want to report? [1–N]before proceeding. - Passes the selected log entries + current version, platform, and config summary (mode, enabled vendors — no repo names or secrets) to the local AI agent to draft an issue with: a concise title, description (what failed and likely cause), steps to reproduce (inferred from the log event sequence), sanitized log excerpt, and environment block (version, platform, reviewer, config mode).
- After generating the draft, asks exactly 3 targeted multiple-choice questions to improve the report:
Can you reproduce this consistently?→[1] Every time [2] Sometimes [3] Happened onceWhich command triggered this?→[1] watch [2] serve [3] review [4] Unknown(skip if unambiguous from logs)Is this blocking you from using crosscheck?→[1] Blocked [2] Degraded [3] Cosmetic(sets label priority) Answers are appended to the issue body under## User Context. No free-text input required.
- Shows the final draft in the terminal and prompts
Submit to motivation-labs/crosscheck? [y/N]. --yes/-yskips the confirmation step and submits immediately after displaying the draft.--dry-runprints the draft and exits 0 without callinggh, regardless of--yes.- Submission uses
gh issue create --repo motivation-labs/crosscheck. Falls back to printing the exactgh issue createcommand the user can copy-run ifghis not authenticated or the call fails. - Adds label
bugalways; adds labelpriority:highwhen impact answer isBlocked. - On success, prints the issue URL.
- Sanitization rules (non-negotiable — applied before passing log entries to AI and before posting):
- Strip:
owner/repopatterns, PR titles, file paths, GitHub usernames, branch names, any string matching a GitHub URL. - Replace with:
[repo],[pr-title],[file-path],[username]. - Webhook secrets and tokens are never present in log entries (enforced by
logger.ts) — no special handling needed.
- Strip:
- Technical Notes:
- New file:
src/commands/issue.ts. - Extract error-grouping logic from
diagnose.tsintosrc/lib/log-analysis.ts; bothdiagnose.tsandissue.tsimport from it. - Agent selection: same
selectOptimizeAgent(config, report)fromoptimize.ts. - Agent prompt structure:
You are drafting a GitHub issue for the crosscheck project. Error pattern: {pattern} Frequency: {count} occurrences in the last {days} days Sanitized log entries: {entries} Environment: crosscheck {version} · {platform} · reviewer: {reviewer} · mode: {mode} User context: - Reproducibility: {reproducibility} - Trigger: {command} - Impact: {impact} Output exactly: TITLE: <title> --- <markdown body> - Parse
TITLE:line as issue title; everything after---as the body. - Wire into
cli.tsascrosscheck issue [--since <date>] [--dry-run] [--yes].
- New file:
- Tests Required: sanitizer removes repo names, PR titles, file paths, usernames; no errors found → exits 0 with message; multiple patterns → prompts menu;
--dry-runprints draft and skipsgh;--yesskips confirmation; draft parsing extracts title and body correctly;ghnot authenticated → prints manual command;priority:highlabel added when impact isBlocked.
-
crosscheck impact— report cumulative value crosscheck has created: time saved through automation, issues caught before merge, and second-order code quality signals. Pulls from local logs; no telemetry, no network calls.- User: Anyone who wants to understand whether crosscheck is pulling its weight — developers justifying continued use, team leads making tooling decisions, engineering managers tracking process improvement.
- Acceptance Criteria:
crosscheck impactprints a human-readable report to stdout;--jsonoutputs structured JSON.--since YYYY-MM-DDlimits the analysis window (default: all time).- Time-saving section:
- Shows total PRs reviewed, total estimated human-hours saved, and average minutes saved per PR.
- Calculation:
time_saved_per_pr = assumed_human_review_min − actual_ai_review_min. Defaultassumed_human_review_min = 60(configurable viaimpact.assumed_human_review_minutesincrosscheck.config.yml).actual_ai_review_minis derived fromreview_complete.duration_msin the logs; falls back to 2 min when data is absent. - Displays the assumption so users can calibrate:
ⓘ assumes 60 min avg human review — set impact.assumed_human_review_minutes to adjust.
- Issues caught section:
- APPROVE / NEEDS_WORK / BLOCK verdict counts and percentages.
issues_caught = NEEDS_WORK + BLOCKverdicts — PRs that would have shipped with unreviewed feedback had crosscheck not run.- BLOCK count surfaced separately with a plain-language note: "potential bugs or breaking changes caught before merge".
- Code quality signal section:
- Trend line: BLOCK rate over the analysis period (weekly buckets). A declining BLOCK rate may indicate improved code quality upstream.
- Top file types with NEEDS_WORK/BLOCK verdicts — surfaces where the most issues appear.
- Monetary estimate (opt-in):
- Hidden by default; shown with
--money. - Formula:
estimated_value = (hours_saved × hourly_rate) + (issues_caught × defect_cost). Defaults:hourly_rate = 150(USD),defect_cost = 150(one hour of engineer time per issue). Both configurable viaimpact.hourly_rate_usdandimpact.defect_cost_usd. - Shown with a clear disclaimer: "rough estimate based on configurable assumptions; not accounting data."
- Hidden by default; shown with
- Exit 0 always (reporting tool, not a gate).
- Gracefully handles empty log dir: prints
No review data yet — run crosscheck watch to start collecting.
- Technical Notes:
- New file:
src/commands/impact.ts. - Reuse the log parser from
diagnose.ts— extract intosrc/lib/log-reader.tsif not already a standalone module. - New config fields in
schema.ts:Added toImpactConfigSchema = z.object({ assumed_human_review_minutes: z.number().int().min(1).default(60), hourly_rate_usd: z.number().min(0).default(150), defect_cost_usd: z.number().min(0).default(150), })ConfigSchemaasimpact: ImpactConfigSchema.default({}). - Duration data comes from
review_completelog entries that includeduration_ms. For entries without duration, omit them from the per-review average (don't assume a value). - Verdict data comes from
review_completelog entries with averdictfield. Entries with no verdict are counted asUNKNOWNand excluded from BLOCK/NEEDS_WORK totals. - Wire into
cli.tsascrosscheck impact [--json] [--since <date>] [--money]. crosscheck statusgets a one-line impact summary appended:impact 47 PRs reviewed · ~23h saved · 8 issues caught— linking to fullcrosscheck impactfor details.
- New file:
- Calculation methodology (basis):
- Time saved per PR: Industry research (Google Engineering Productivity, Microsoft Research SPACE framework) puts median human code review time at 60–90 min per PR for non-trivial changes. The 60 min default is conservative. AI turnaround measured from log
duration_msis typically 1–3 min. Net saving per PR: ~57 min at default settings. - Defect cost: NIST studies put post-merge defect fix cost at 4–10× the cost of catching it during review. At $150/hr and 1 hr median fix time, each issue caught pre-merge is conservatively worth $150. BLOCK-severity issues are not weighted more (keeps the math transparent and conservative).
- Second-order quality signal: Declining BLOCK rate over time is a leading indicator that PRs are getting cleaner upstream — teams internalize review feedback. This is a proxy metric, not a hard measurement.
- Time saved per PR: Industry research (Google Engineering Productivity, Microsoft Research SPACE framework) puts median human code review time at 60–90 min per PR for non-trivial changes. The 60 min default is conservative. AI turnaround measured from log
- Tests Required: empty log dir → graceful no-data message; log with mixed verdicts → correct APPROVE/NEEDS_WORK/BLOCK counts; duration data present →
actual_ai_review_mincalculated correctly; duration data absent → falls back to default;--sincefilters log entries by date;--jsonoutput is valid JSON matching schema;--moneyflag gates monetary estimate display;crosscheck statusshows one-line summary.
-
crosscheck coverage— Gap Analysis and Self-Improvement Engine — compare what crosscheck should have reviewed (monitored scope × live uptime) against what it actually reviewed (logs), identify the root cause of each missed PR, and route the finding to the appropriate remediation: config fixes are applied or filed as best-practice issues; feature gaps become prd.md proposals that can optionally be auto-contributed as PRs tomotivation-labs/crosscheck.-
User: Anyone who has been running crosscheck for a week or more and wants to know whether it's actually catching everything it should be, and what to do about the gaps.
-
Acceptance Criteria:
Coverage measurement:
- Computes an uptime window from
session_startevents in~/.crosscheck/logs/— the union of all periods crosscheck was running. - For each repo/org/user in the current config, calls the GitHub API to enumerate all PRs opened or updated during the full analysis period (from the earliest
session_startin logs, or--sinceif provided — not limited to uptime windows). This ensures PRs that were active only while crosscheck was offline are still enumerated and can be classified asoffline_window. - Cross-references that list against
pr_received+review_completelog entries to find PRs in scope that were never reviewed. - Reports a coverage percentage per scope and overall:
63 / 71 PRs reviewed (89%).
Gap classification — each missed PR is classified into exactly one root cause:
Cause Meaning Action type author_filteredPR author not in allowed_authorsconfig_fixno_attributionPR body has no Claude/Codex footer and no author_routesentryconfig_fixno_reviewerAttribution detected but no vendor enabled for that origin config_fixoffline_windowPR opened while crosscheck was not running config_infowebhook_missPR in scope but no webhook event arrived (webhook not registered?) config_fixunknown_patternPR reviewed but reviewer assignment logged as skipped with unknown origin feature_requestunsupported_agentPR authored by an AI agent crosscheck doesn't recognize feature_requestConfig-fix recommendations:
- Each
config_fixgap produces a specific, copy-pasteable suggestion:author_filtered: "3 PRs fromdependabot[bot]skipped — add toallowed_authorsor switch to team mode."no_attribution: "5 PRs from a human author had no attribution footer — add anauthor_routesentry to route them."no_reviewer: "2 PRs detected ascodex-origin but Codex is disabled — setvendors.codex.enabled: true."
--applywrites the suggested config changes directly tocrosscheck.config.ymlafter confirmation.--issuefiles the config gap as a best-practice issue tomotivation-labs/crosscheckusing the samegh issue createpipeline ascrosscheck issue. Issue title:config: [gap type] — best practice recommendation. The issue describes the condition, the ideal config, and asks the maintainers to surface it incrosscheck initas a warning.
Feature-request recommendations:
- Each
feature_requestgap produces a structured feature proposal: problem statement, example missed PRs (sanitized), proposed detection logic, and estimated impact (N PRs/week that would be caught). --prdclonesmotivation-labs/crosscheckto a temp dir, creates a feature branch, appends the proposal toprd.mdunder Build Queue → 🔜 Next Up, and opens a draft PR. PR body includes the sanitized gap data as supporting evidence.--buildgoes further: after writing the prd.md entry, instructs the local AI agent (viaclaude --print) to implement the feature — creating the necessary source files, updating schema/config/tests — and pushes the implementation commits onto the same branch before opening the PR as ready-for-review.- Both
--prdand--buildrequiregh auth statusto confirm the user has push access to the repo (or their fork). Falls back to--dry-runbehavior if access check fails.
CLI interface:
crosscheck coverage # show gap report, no changes crosscheck coverage --apply # apply config fixes after confirmation crosscheck coverage --issue # file config gaps as best-practice issues crosscheck coverage --prd # open a draft PR with prd.md feature proposal crosscheck coverage --build # implement the feature and open a ready PR crosscheck coverage --since YYYY-MM-DD # limit analysis window crosscheck coverage --json # structured JSON output
Sample output:
crosscheck coverage (last 14 days · 71 PRs in scope) Coverage: 63 / 71 PRs reviewed (89%) Missing: 8 PRs author_filtered 3 PRs → add to allowed_authors or switch to team mode from: bot account (*[bot] pattern) ×3 no_attribution 3 PRs → add author_routes to config from: human author (no Claude/Codex footer detected) ×3 unsupported_agent 2 PRs → feature gap (GitHub Copilot attribution not recognized) from: copilot-swe-agent[bot] ×2 Config fixes available: Run crosscheck coverage --apply to apply the config fixes above. Run crosscheck coverage --issue to file best-practice issues for each config gap. Feature gaps available: Run crosscheck coverage --prd to propose attribution support for Copilot agents. Run crosscheck coverage --build to implement + open a PR to motivation-labs/crosscheck. - Computes an uptime window from
-
Technical Notes:
- New file:
src/commands/coverage.ts. - Uptime computation: scan log entries for
session_start/session_endpairs; merge overlapping windows; the result is a list of[start, end]intervals. Forsession_startwith no matchingsession_end(process killed), assume the window ended at the timestamp of the next non-session log entry. - Scope enumeration: GitHub has no
GET /orgs/{org}/pullsendpoint and the pulls list API has nosincefilter. Use the Search API for org scopes:GET /search/issues?q=type:pr+org:{org}+updated:>{since}&per_page=100wheresinceis the earliest analysis period boundary. Usingupdated:>(notcreated:>) ensures long-lived PRs opened before the window but updated during it are included, matching the "opened or updated during analysis period" requirement. Forconfig.usersentries, enumerate repos first vialistUserRepos(already inclient.ts) then queryGET /repos/{owner}/{repo}/pulls?state=all&per_page=100per repo. Forconfig.repos, query each repo directly the same way. Search API rate limit is 30 req/min authenticated — add a small delay between org queries if the user has many orgs. - Log join: a PR is "reviewed" if there is a
pr_receivedlog entry matchingowner/repo#numberAND areview_completeentry for the same key. - Scope enumeration uses the full analysis period, not uptime windows. The
sinceparameter for Search/pulls queries ismax(--since flag, earliest session_start in logs)— it covers everything crosscheck has ever been configured to watch, regardless of whether it was online. This ensuresoffline_windowis reachable. - Gap classification: applied in priority order; first matching rule wins.
offline_windowfires when a PR is not in the reviewed set AND all of itscreated_at/updated_attimestamps fall outside every uptime window — meaning crosscheck simply wasn't running when the PR was active. PRs that overlap at least one uptime window but were still not reviewed proceed to the author/attribution/routing checks. - Config apply: uses
yaml.load+yaml.dumppattern (same aspatchDeploymentConfig). Shows a before/after diff and promptsApply? [y/N]unless--yesis passed. - Issue filing (
--issue): callsgh issue create --repo motivation-labs/crosscheckwith a templated body. Body includes: gap type, frequency, ideal config, and a request to add a startup warning. No PR data included — only the gap pattern and config suggestion. - PR contribution (
--prd,--build):gh repo clone motivation-labs/crosscheck <tmpDir>(or fork + clone if no push access).git checkout -b feat/coverage-<gap-type>-<date>.- Append PRD entry to
prd.md(or--build: generate and write source files). git commit -m "feat: <gap type> — <one-line description>".gh pr create --title "..." --body "..."— draft PR for--prd, ready PR for--build.
- Wire into
cli.tsascrosscheck coverage [--apply] [--issue] [--prd] [--build] [--since <date>] [--json] [-y/--yes]. - Agent selection for
--build: sameselectOptimizeAgentlogic. Prompt instructs the agent to implement only the specific detection pattern or config handling identified in the gap, not a full feature rewrite.
- New file:
-
Tests Required:
- Uptime window computation: two overlapping sessions → merged; unclosed session → window ends at next log entry timestamp.
- Scope enumeration stub: given a mock GitHub API, returns correct PR list within uptime window.
- Gap classification:
author_filteredfires beforeno_attribution;offline_windowfires when all of a PR's timestamps are outside uptime windows AND the PR is enumerated from the full analysis period (not pre-filtered to uptime windows). --applyshows diff and writes correct YAML; is a no-op when--yesnot passed and user declines.--issuecallsgh issue createwith no PR identifiers or repo names in the body.--prdclones repo, creates branch, appends to prd.md, opens draft PR.--jsonoutput is valid JSON matching schema.- Coverage % calculated correctly: 0 reviewed → 0%; all reviewed → 100%.
-
-
Test
servemode — run on a fixed port, register webhook manually, verify reviews post correctly -
crosscheck reviewresult feedback — after posting, log a link to the PR comment -
Tiered Feedback Loops — local usage analytics, instruction-effectiveness signals, and safe opt-in telemetry — three-tier system that measures crosscheck's real-world impact, feeds those signals back into
optimize, and—only with explicit consent—sends de-identified aggregate counts to inform future development. Privacy-first by design: sensitive data never leaves the machine; telemetry is opt-in (off by default).-
User: All crosscheck users benefit from better defaults driven by real usage. Contributors to the project benefit from aggregate signal. Power users benefit from local analytics surfaced in
diagnose. -
Acceptance Criteria:
Tier 1 — Local count statistics (always-on, local only):
- After every review, append a metrics record to
~/.crosscheck/metrics/YYYY-MM.ndjsoncontaining:{ ts, reviewer_pair, verdict, duration_ms, comments_count, pr_sha_prefix }.pr_sha_prefixis the first 8 characters of the PR's head SHA — enough to detect follow-up commits later, not enough to identify the repo or author. reviewer_pairvalues:claude_reviews_codex,codex_reviews_claude,claude_reviews_claude,codex_reviews_codex.- Follow-fix detection: when a new commit arrives on a PR that previously received a
NEEDS_WORKorBLOCKverdict, log afollow_fixevent linking the two reviews bypr_sha_prefix. This tracks whether AI review comments actually get addressed. crosscheck diagnoseincorporates Tier 1 data: adds a Usage section reporting reviewer-pair distribution, average comments per review, verdict distribution over the period, and follow-fix rate.- Metrics files are subject to the same
logs.retention_dayssetting as event logs. Stored in~/.crosscheck/metrics/, never in the project directory. - No code content, no PR title, no repo name, no file names, no author identities stored.
Tier 2 — Instruction effectiveness tracking (always-on, local only):
- When
optimizeapplies a newinstructions.md, snapshot the current instruction fingerprint (SHA-256 of the file) and log it with a timestamp to~/.crosscheck/metrics/optimize-history.ndjson. - In subsequent metric records, include the active
instruction_fingerprintso outcomes (verdict distribution, follow-fix rate) can be correlated with the instruction version in effect at review time. crosscheck diagnose --since <date>can compare verdict distribution before and after eachoptimizerun, surfacing the delta: "After optimize on 2025-05-01: APPROVE rate +12%, BLOCK rate −5%."crosscheck optimizereads these deltas when selecting which instruction changes to keep vs. revert. If a fingerprint correlates with worse outcomes,optimizeflags it as a candidate for rollback.- Instruction text is never stored — only the SHA-256 fingerprint. The actual text lives in
instructions.mdunder the user's control.
Tier 3 — Privacy & consent design (non-negotiable constraints):
- Telemetry is opt-in.
telemetry.enableddefaults tofalsein config. No data is transmitted unless the user explicitly enables it. - On first
watch/serverun after install, display a one-time consent prompt:Default answer is N. Response is persisted tocrosscheck can send anonymous usage counts to Motivation Labs to improve future versions. No code, no repo names, no PR content, no usernames — only aggregate numbers. Enable? [y/N]:~/.crosscheck/config.ymlastelemetry.enabled. The prompt is never shown again. Users can change the setting at any time viacrosscheck telemetry [enable|disable|status]. crosscheck initoutput includes a Telemetry row: current state (enabled/disabled) and a link to the privacy doc.- Data categories that may never be collected or transmitted: code diffs, PR titles, PR descriptions, commit messages, file paths, repo names, GitHub usernames or org names, IP addresses, machine hostnames.
- A
PRIVACY.mdat the repo root documents exactly which fields are in a telemetry payload. This document is referenced in the consent prompt andget-started.md.
Tier 4 — Safe telemetry payload (only when
telemetry.enabled: true):install_id: a UUID generated once at first install and stored in~/.crosscheck/config.yml. Never linked to a GitHub identity, email, or hostname. Rotatable viacrosscheck telemetry reset-id.- Transmission: weekly batch, sent at the start of the first
watch/servesession of each UTC week. HTTPS POST tohttps://telemetry.crosscheck.dev/v1/report(TBD endpoint). No per-event streaming. - Payload schema (all fields are counts or enums — no free text, no identifiers):
{ "install_id": "<uuid>", "version": "0.2.0", "platform": "darwin | linux | win32", "period": "2025-W20", "sessions": 12, "prs_reviewed": 47, "comments_posted": 134, "reviews_by_pair": { "claude_reviews_codex": 23, "codex_reviews_claude": 18, "claude_reviews_claude": 4, "codex_reviews_codex": 2 }, "verdict_distribution": { "APPROVE": 30, "NEEDS_WORK": 15, "BLOCK": 2 }, "follow_fix_rate": 0.73, "optimize_runs": 2 } - If the HTTP request fails (network error, server error), log locally and retry at the next weekly opportunity. Never block startup on telemetry.
crosscheck telemetry statusshows: enabled/disabled, install_id (first 8 chars), date of last transmission, and the full payload that would be sent.crosscheck telemetry dry-runprints the payload without sending it, regardless ofenabledstate. Useful for users who want to audit before enabling.
- After every review, append a metrics record to
-
Technical Notes:
- New file:
src/lib/metrics.ts—appendMetric(record),readMetrics(since?),computeSummary(records). Module-level singleton; respectslogs.enabled(if logs are disabled, metrics are too). NDJSON append, same pattern aslogger.ts. - New file:
src/lib/telemetry.ts—maybeSendTelemetry(config): checks enabled + weekly cadence (~/.crosscheck/.telemetry-last-sent), aggregatesmetrics/files, POSTs payload, updates sentinel. All errors are caught and logged locally; never throws to caller. - New file:
src/commands/telemetry.ts—crosscheck telemetry [enable|disable|status|dry-run|reset-id]. - Schema additions:
telemetry: { enabled: boolean (default false), install_id: string (auto-generated) }. watch.ts/serve.ts: callmaybeSendTelemetry(config)early in startup (after consent check on first run).review.ts: callappendMetric(...)after each review completes (success or failure verdict).optimize.ts: callappendMetric({ event: 'optimize_run', fingerprint_before, fingerprint_after })after applying changes.diagnose.ts: extend output with Tier 1 usage summary section and Tier 2 instruction-effectiveness delta table.init.ts: add Telemetry row to status table.- New file:
PRIVACY.mdat repo root, included in npm package. Documents full payload schema, data retention, opt-out instructions, and contact for data deletion requests.
- New file:
-
Tests Required:
appendMetricwithlogs.enabled: falsewrites nothing.- Metrics records contain no sensitive fields (schema-level check on allowed keys).
maybeSendTelemetrywithenabled: falsemakes no HTTP request.maybeSendTelemetrywithin the same UTC week as last send makes no HTTP request.maybeSendTelemetryin a new week POSTs the correct aggregated payload.- Consent prompt on first run persists response; not shown again on second run.
telemetry dry-runprints payload structure matching schema; makes no HTTP request.diagnosewith Tier 1 data shows reviewer-pair distribution and follow-fix rate.diagnosewith two instruction fingerprints shows before/after verdict delta.- Follow-fix event emitted when new commit arrives on a PR with prior
NEEDS_WORK/BLOCKverdict.
-
-
Live review progress + verdict — ora spinners per stage (clone → review → post), VERDICT line in AI prompt, parsed and stripped before posting; verdict badge prepended to GitHub comment; color-coded in terminal.
-
Fortune cookie welcome message — random quote from
src/lib/fortune.tsprinted before watch/serve banner. -
Fix
verdict: null— handle Codex reviews that complete without a parseable verdict line — when Codex finishes a review but its output contains noVERDICT: APPROVE|NEEDS WORK|BLOCKline, the current code logsverdict: nulland posts the comment without a verdict badge. The missing verdict silently degrades the review experience and breaks downstream features (diagnoseverdict counts,impactBLOCK metrics). This fix adds a fallback extraction pass, a warning comment annotation, and a structured log field so the failure is visible.- User: Anyone running
crosscheck watch/servewith Codex as the reviewer — especially on large diffs where Codex may truncate or reformat output. - Acceptance Criteria:
- Primary extraction: scan the full Codex output for the last line matching
/^VERDICT:\s*(APPROVE|NEEDS[_ ]WORK|BLOCK)/i. Case-insensitive; tolerateNEEDS_WORKandNEEDS WORKspellings. - Fallback extraction: if the primary scan fails, scan for any line containing
APPROVE,NEEDS WORK,NEEDS_WORK, orBLOCKas a standalone word (not mid-sentence). Use the last match. - If both scans fail, set verdict to
null, prepend a warning line to the posted comment:> ⚠️ crosscheck could not extract a verdict from this review. See the full output below., and log{ event: 'verdict_parse_failed', reviewer: 'codex', output_length: N }atwarnlevel. - Verdict extraction logic is extracted into a pure function
parseVerdict(text: string): 'APPROVE' | 'NEEDS_WORK' | 'BLOCK' | nullinsrc/lib/verdict.ts— shared bycodex.tsandclaude.ts. crosscheck diagnosecountsverdict_parse_failedevents as a distinct error pattern with a suggestion: "Codex did not emit a VERDICT line — check your Codex instructions file or lower the quality tier."
- Primary extraction: scan the full Codex output for the last line matching
- Technical Notes:
- New file:
src/lib/verdict.ts—parseVerdict(text). Primary regex:/^VERDICT:\s*(APPROVE|NEEDS[_ ]WORK|BLOCK)\s*$/im. Fallback regex:/\b(APPROVE|NEEDS[_ ]WORK|BLOCK)\b/gi— last match wins. src/reviewers/codex.ts: replace inline verdict parsing withparseVerdict(output).src/reviewers/claude.ts: same — also useparseVerdict.src/commands/watch.ts/review.ts: whenverdict === null, prepend the warning line to the comment body before posting; logverdict_parse_failed.src/lib/logger.ts: addverdict_parse_failedto the known event union type.
- New file:
- Tests Required:
parseVerdictwith correctVERDICT:line → correct verdict; withNEEDS_WORKspelling →NEEDS_WORK; with verdict buried mid-paragraph (fallback) → correct; with no verdict →null; with multiple verdicts → last one wins; withBLOCKin a sentence ("this will not block deployment") → does not match.
- User: Anyone running
-
Codex reviewer quality tier and model config — the 5+ minute Codex review on large diffs (observed: 318s for PR #42 on o4-mini) is at the high end and blocks the watch terminal for the full duration. Expose
qualityandmodelas per-vendor config fields so users can trade review depth for speed.- User: Anyone running
crosscheck watchwho finds Codex reviews taking 3–6 minutes on large diffs. - Acceptance Criteria:
- New config fields under
vendors.codex:quality: 'low' | 'medium' | 'high'— maps to Codex--qualityflag; default'medium'.model: string | null— passed as--model <value>to Codex; defaultnull(uses Codex's own default). Only usable with API key auth; subscription auth ignores this field with a logged warning.
crosscheck watchbannerprofilerow shows the active quality tier:profile personal · watch · medium.crosscheck statusshowsvendors.codex.qualityandvendors.codex.model(ordefaultif unset).crosscheck.config.example.ymldocuments both fields with comments explaining the speed/depth tradeoff.get-started.mdadds a Review speed section explaining quality tiers and when to use each.
- New config fields under
- Technical Notes:
src/config/schema.ts: addquality: z.enum(['low', 'medium', 'high']).default('medium')andmodel: z.string().nullable().default(null)to thecodexvendor sub-schema.src/reviewers/codex.ts: pass--quality ${config.vendors.codex.quality}to the Codex CLI call. Ifmodelis set, append--model ${model}; ifmodelis set but auth is subscription-mode, logwarn: model override ignored — requires API key auth.src/commands/watch.ts: readconfig.vendors.codex.qualityfor the banner profile row.src/commands/status.ts: add Codex quality and model to the vendor section.
- Tests Required: schema defaults to
mediumquality andnullmodel;codex.tspasses--quality lowwhen configured;--modelflag omitted whenmodelis null;--modelflag omitted with a warning when subscription auth is detected; banner shows configured quality tier.
- User: Anyone running
-
Webhook re-registration flood on tunnel reconnect — deduplicate and back off — when the smee/localhost.run tunnel drops and reconnects (new URL),
watch.tsre-registers webhooks for all monitored repos. On a large org this produces a burst of GitHub API calls that can hit rate limits and fills the connectivity log with redundant entries. Add deduplication (skip re-registration if the webhook for a given URL is already registered) and exponential back-off for failed registrations.- User: Anyone monitoring large orgs (10+ repos) or experiencing frequent tunnel reconnects.
- Acceptance Criteria:
- Before registering a webhook for a repo/org, check whether a webhook pointing to the new tunnel URL is already registered via
GET /repos/{owner}/{repo}/hooks(or org equivalent). If a matching hook exists, skip thePOSTand logwebhook already registered — skipped. - After a tunnel reconnect, delete the old webhook (by stored hook ID) before registering the new one, rather than leaving orphaned hooks.
- Failed webhook registrations are retried with exponential back-off: 2s → 4s → 8s → give up. Log each retry attempt at
warnlevel. Do not block the main event loop — registration runs in background. - Connectivity log shows one summary line per tunnel reconnect event, not one line per repo:
✓ webhooks re-registered: 14/14 repos. crosscheck statusshows the count of active (known) webhooks and their URLs.
- Before registering a webhook for a repo/org, check whether a webhook pointing to the new tunnel URL is already registered via
- Technical Notes:
src/github/webhook.ts: addgetExistingWebhook(owner, repo, url, token): Promise<number | null>— returns hook ID if a hook with matchingconfig.urlexists, else null.src/commands/watch.ts: store registered hook IDs in aMap<string, number>(key:owner/repoororg). On tunnel reconnect: 1) delete old hooks using stored IDs; 2) callgetExistingWebhookfor each scope; 3) skipPOSTif hook already exists (stale ID from a previous session). Retry loop: max 3 attempts with 2^n second delays.- Connectivity log: buffer all per-repo results and emit a single aggregated line.
src/commands/status.ts: show active webhook count.
- Tests Required:
getExistingWebhookreturns hook ID when matching URL exists; returns null when no match; registration skipped when hook already exists; old hook deleted before new registration on reconnect; retry fires on 422 response with correct delays; aggregated log line shows correct count; status shows hook count.
-
crosscheck onboard— Guided Persona Setup and Smart Scope Configuration — an interactive wizard that runs allinitenvironment checks and then guides the user through persona selection, repo/org scope picking, author filtering, workflow depth, and optional comment branding. Produces a complete, workingcrosscheck.config.ymlwithout manual YAML editing. Supersedes the minimal first-run prompts inwatch/serveas the recommended entry point for new users.The deployment scenarios this command covers:
ID Name Core need UC-01 Personal — curated repos Personal + org accounts; building personal projects; wants to limit monitoring to a hand-picked set of active repos (backtracing feature requires tight scope) UC-02 Personal — cross-scope, author-filtered Monitors both personal and org repos but only wants their own PRs reviewed UC-03 Team shared CR Each member uses local Claude Code to author PRs; team shares a Codex account for review; configurable depth: CR-only vs CR + auto-fix UC-04 CRaaS / branded service Any of the above, plus custom service name and comment annotations; available as an optional add-on at the end of any onboarding path Command:
crosscheck onboard [--reconfigure]
Step 0 — Environment checks (same as
crosscheck init, compact output):Re-use
runChecks()frominit.ts. Print a compact one-line-per-tool summary. Non-fatal failures (e.g., codex CLI not installed) show the fix hint and continue —onboardnever aborts because a reviewer CLI is missing. Print✓ environment ready — proceeding to setupor⚠ 1 issue to address — see above; continuing setupbefore advancing to Step 1.
Step 1 — Persona selection:
How will you use crosscheck? [1] personal — I author PRs; review only my own work across my repos and orgs [2] team — shared CR workflow; review PRs from multiple authors in org repos Choice [1]:
Personal path (UC-01 and UC-02):
Step 2 — Monitoring scope (auto-detect org memberships before showing):
What should crosscheck monitor? (detected: member of my-company, another-org) [1] My personal repos only — github.com/beingzy/* — side projects and tools you own directly [2] My org repos only — github.com/my-company/*, another-org/* — team repos you contribute to [3] Both personal repos + orgs — everything across your GitHub account ← recommended Choice [3]:Step 3 — Repo picker (shown only when personal repos are in scope):
Fetch
GET /users/{login}/repos?type=owner&sort=pushed&per_page=100. Partition into three tiers:- Tier 1 — New repos (created within last 7 days): shown first in the primary list, tagged
new, pre-selected. A newly-created repo signals active intent even before its first PR lands — missing it would frustrate the user immediately. - Tier 2 — Active repos (≥ 1 PR in last 90 days): sorted by PR count descending.
- Tier 3 — Inactive repos (no PRs in 90 days, not new): deferred behind
[ ] Show more.
Primary list shows all Tier-1 repos + top Tier-2 repos up to a combined cap of 8. A
[ ] Show morerow expands inline to all Tier-2 + Tier-3 repos. Repos with zero PRs in 180 days that are not new are omitted entirely and noted: "add manually to config.repos if needed."Which personal repos should crosscheck monitor? [x] beingzy/new-side-project (new · 0 PRs) [x] beingzy/api-service 18 PRs (last 90 days) [x] beingzy/cool-project 12 PRs [ ] beingzy/dashboard 3 PRs [ ] beingzy/old-tool 1 PR ──────────────────────────────────────────────── [ ] Show more (3 repos with no recent activity — or add to config.yml manually) Space to toggle · Enter to confirmStep 4 — Author filter:
Whose PRs should be reviewed? [1] Only mine (author = beingzy) ← recommended for UC-02 [2] Everyone in the monitored scope Choice [1]:Config written by personal path:
- UC-01 (curated personal repos):
deployment: personal,repos: [selected],users: [],orgs: [],allowed_authors: [login] - UC-02 (both scopes, author-filtered):
deployment: personal,users: [login],orgs: [detected],allowed_authors: [login] - Personal + everyone:
deployment: personal,users: [login],orgs: [detected],allowed_authors: []
Team path (UC-03):
Step 2 — Org selection (auto-detect memberships via
GET /user/memberships/orgs?state=active):Which orgs should crosscheck monitor? [x] my-company (member · 42 repos) [ ] open-source-project (member · 7 repos) Space to toggle · Enter to confirmStep 3 — Author filter:
Whose PRs should be reviewed? [1] All authors (no filter) — review every PR in the org [2] Specific GitHub logins — restrict to listed team members Choice [1]:If [2]:
Enter logins (comma-separated): john, jane, aliceStep 4 — Review depth:
How deep should the CR workflow go? [1] CR only — post review comments; humans apply fixes [2] CR + Auto-fix — crosscheck also proposes and commits fixes Choice [1]:If [2]:
How should auto-fixes be delivered? [1] Open a fix PR (human reviews and merges before merge) ← recommended [2] Push directly onto the PR branch Choice [1]:Config written by team path (CR + fix PR example):
deployment: team,orgs: [selected],users: [],allowed_authors: [logins or []],post_review.auto_fix.enabled: true,post_review.auto_fix.delivery.mode: pull_request
Optional — Custom comment branding (available for all personas, including UC-04 CRaaS):
After the persona-specific questions complete, all paths ask one optional step:
Add custom branding to review comments? (for teams, services, or personal flair) [y/N]:If yes:
Name or label shown in every review comment: > My Team Comment header (prepended to every review, Enter to skip): > [Enter] Comment footer (appended to every review, Enter to skip): > Code review powered by My Team · AI-assisted, human-approved. Reviewer attribution line (replaces "Reviewed by {vendor}", Enter to skip): > [Enter]Branding is lightweight for personal users (e.g., a footer with a link) and richer for CRaaS operators (service name + attribution + footer). All fields are optional and independently skippable. Saved to the
brandconfig section; comment format unchanged when all fields are empty.
Confirmation preview and write:
After all steps, show a compact summary and prompt before writing:
Your crosscheck config: persona personal scope repos (beingzy/api-service, beingzy/cool-project) + orgs (my-company) filter author = beingzy reviewer cross-vendor (claude ↔ codex) config ~/.crosscheck/config.yml Write config? [Y/n]:On confirm: write config, print
✓ config written — run crosscheck watch to start. On decline: printCancelled. No files written.and exit 0.--reconfigureflag: Re-runs all steps even if config already exists. Shows the current saved value as the default at each prompt. Useful when joining a new org, switching personas, or correcting a first-run mistake.
New config schema —
brandsection (available to all personas):export const BrandConfigSchema = z.object({ service_name: z.string().default('crosscheck'), comment_header: z.string().default(''), comment_footer: z.string().default(''), reviewer_attribution: z.string().default(''), })
Added to
ConfigSchemaasbrand: BrandConfigSchema.default({}).Both
claude.tsandcodex.tsapplyconfig.brandwhen composing the GitHub comment:- Non-empty
brand.comment_header→ prepend to comment (followed by a blank line). - Non-empty
brand.comment_footer→ append to comment (preceded by a blank line). - Non-empty
brand.reviewer_attribution→ replace the default> Reviewed by {vendor}attribution line. - When all
brandfields are empty (the default), comment format is unchanged.
Relationship to existing commands:
crosscheck initremains unchanged — environment check only.onboardcallsrunChecks()internally as Step 0.initis for diagnosing environment issues;onboardis for first-time configuration.- The minimal first-run prompts in
watch/serve(from the Deployment Mode spec) remain as a lightweight fallback for users who skiponboard. They show a binarypersonal/teamchoice without repo picking or brand customization. crosscheck initoutput adds a one-line hint:Run crosscheck onboard to configure scope and persona.crosscheck statusshows apersonarow with the deployment value (orservicewhenbrand.service_nameis non-default) andbrand.service_namewhen set.
- User: New crosscheck users of any persona — solo developers building side projects, teams running shared CR pipelines, and anyone (personal or team) who wants light or rich comment branding.
- Acceptance Criteria:
crosscheck onboardruns environment checks first and prints a compact one-line-per-tool summary.- Persona question is always shown with two choices (personal / team); no third "service" option — branding is an optional add-on step for all paths.
- Personal path: scope (with per-option descriptions) → repo picker (when personal repos in scope) → author filter → optional branding → correct config for UC-01, UC-02, and the unrestricted variant.
- Team path: org picker → author filter → review depth → (if auto-fix) delivery mode (fix PR or direct push only — no inline-suggestion option) → optional branding → correct config for UC-03.
- Repo picker: Tier-1 new repos (< 7 days) shown first and pre-selected; Tier-2 active repos (≥ 1 PR last 90 days) sorted by count; Tier-3 inactive repos behind "Show more"; repos absent from both tiers for 180+ days noted as "add manually."
- Branding step is optional (
[y/N], default N); if yes, shows 4 fields each independently skippable. - Confirmation preview accurately reflects all choices before writing.
--reconfigurere-runs all steps with current config values pre-filled as defaults; overwrites on confirm; exits without writing on decline.brandfields injected into GitHub comments by bothclaude.tsandcodex.tswhen non-empty; comment format unchanged when allbrandfields are empty.crosscheck statusshowspersonarow andbrand.service_namewhen non-default.crosscheck initoutput includes the onboard hint line.- Running
onboardtwice produces identical config (no duplicate entries). onboardwith environment check failures still completes all questions and writes config; it does not abort.
- Technical Notes:
- New file:
src/commands/onboard.ts. ImportsrunChecksfrominit.ts. Drives persona branching. Branding step runs for all paths after persona-specific questions. At the end, callspatchDeploymentConfig+patchBrandConfigto write all config values atomically. - New file:
src/lib/repo-picker.ts:fetchActiveRepos(login: string, token: string): Promise<{ tier: 1 | 2 | 3; repo: string; prCount: number; createdAt: Date }[]>— GitHub API calls, three-tier partition;promptRepoPicker(repos, defaults?)— readline-based checkbox UI showing Tier 1 + Tier 2 up to 8 combined, "Show more" expands Tier 3; no new prompt-library dependency. src/config/schema.ts: addBrandConfigSchemaandbrand: BrandConfigSchema.default({})toConfigSchema.src/config/loader.ts: addpatchBrandConfig(configPath: string, brand: Partial<BrandConfig>): boolean.src/reviewers/claude.ts: injectbrand.comment_header/comment_footer/reviewer_attributionwhen composing the final comment string; no-op when all empty.src/reviewers/codex.ts: same injection logic.src/commands/status.ts: addpersonarow showing deployment +brand.service_namewhen non-default.src/commands/init.ts: append hint line at the end of output.- Wire into
cli.tsascrosscheck onboard [--reconfigure]. crosscheck.config.example.yml: add commentedbrand:block with all four fields.get-started.md: add First-time setup section with each UC described in 2–3 sentences.
- New file:
- Tests Required:
fetchActiveRepos: repos < 7 days old in Tier 1 regardless of PR count; repos ≥ 1 PR in last 90 days in Tier 2 sorted by count; zero-PR repos older than 7 days in Tier 3; returns[]gracefully when GitHub returns nothing.promptRepoPicker: Tier-1 repos shown first and pre-selected; Tier-2 shown next up to cap; "Show more" expands Tier-3 inline; returns correct selection array.- Personal + both scopes + author-mine →
{ deployment: 'personal', users: [login], orgs: [...], allowed_authors: [login] }. - Personal + personal-only + 2 repos selected →
{ deployment: 'personal', repos: [r1, r2], users: [], orgs: [], allowed_authors: [login] }. - Team + all-authors + CR-only →
{ deployment: 'team', allowed_authors: [], post_review: { auto_fix: { enabled: false } } }. - Team + specific logins + CR + fix-PR →
{ deployment: 'team', allowed_authors: ['john', 'jane'], post_review: { auto_fix: { enabled: true, delivery: { mode: 'pull_request' } } } }. - Branding opt-in →
brand.service_name,brand.comment_footerwritten; skipped fields remain empty string. - Branding opt-out (default N) → no
brandfields written (all remain at schema defaults). claude.tswith non-emptybrand.comment_header→ header prepended to posted comment.claude.tswith emptybrandfields → comment format unchanged.patchBrandConfigwrites correctly; idempotent on second run.--reconfigurere-prompts with current values as defaults; overwrites on confirm; no-ops on decline.crosscheck statusshowspersonarow.crosscheck initoutput includes hint line.
- Tier 1 — New repos (created within last 7 days): shown first in the primary list, tagged
-
Arrow-key + space interactive picker — replace number-toggle UI in
onboard— the org and repo selection steps incrosscheck onboarduse a raw-mode TTY picker with arrow-key navigation and space-to-toggle, matching the UX of standard CLI tools (inquirer,fzf). Closes issue #58.- User: Anyone running
crosscheck onboardwho needs to select repos without memorizing item numbers. - Acceptance Criteria:
- ↑ / ↓ arrows — move the highlight cursor between items; wraps at top and bottom.
- Space — toggle the focused item on/off (
[x]/[ ]). - Enter — confirm the current selection and return.
- a — select all / deselect all (toggles entire visible set).
- Items are rendered as
[x] owner/repo-namewith the focused row highlighted (bold or reverse-video ANSI). - 3-tier overflow: show at most 15 items at a time; pressing
m(or the overflow hint key) expands to show all. Cursor wraps within the visible set. - A status line below the list shows
↑↓ move · space select · a all · enter confirm · m more. - Rendering uses ANSI escape codes to re-render only changed lines (no full-screen flicker, no
readlineloop). - Degrades gracefully when
process.stdin.isTTYis false: skip picker, return an empty selection (caller prints a manual-edit hint).
- Technical Notes:
- New file:
src/lib/repo-picker.ts— exportspromptRepoPicker(items: string[], opts?: { title?: string }): Promise<string[]>. Internally callsprocess.stdin.setRawMode(true), intercepts raw key bytes, re-renders on each keypress, restores stdin on return/exit. - Key byte detection:
\x1b[A= up arrow,\x1b[B= down arrow,\x20= space,\x0d= enter,\x61=a,\x6d=m. - On SIGINT while picker is active: restore raw mode and re-throw so the process exits cleanly.
- No new runtime dependencies — raw-mode TTY and ANSI escapes are stdlib-level on Node.js.
- New file:
- Tests Required:
promptRepoPickerwith non-TTY stdin returns empty array; up/down moves cursor correctly; space toggles item; enter returns selected items;aselects all when none selected, deselects all when all selected; overflow hint shown when items > 15; cursor wraps at list boundaries; SIGINT restores raw mode.
- User: Anyone running
-
--no-backtraceflag forwatchandserve— add a session-only CLI flag that skips the initial scan for unreviewed open PRs, without requiring a config edit. Useful when restartingwatchfrequently (e.g., during config iteration) where the startup scan adds latency.- User: Any developer who restarts
crosscheck watchrepeatedly and finds the backtrace scan slow or redundant. - Acceptance Criteria:
crosscheck watch --no-backtraceandcrosscheck serve --no-backtraceskip thescanUnreviewedPRscall entirely for that session.- Config is not read or written —
backtrace.enabledincrosscheck.config.ymlis unaffected. - The connectivity log shows
backtrace skipped (--no-backtrace flag)in place of the scan summary when the flag is passed. crosscheck watch --no-backtraceis equivalent tocrosscheck watchwhenbacktrace.enabled: falseis in config — same code path, different trigger.--no-backtraceis independent of--personal,--team, and--reconfigure(all flags can be combined).
- Technical Notes:
src/cli.ts: add.option('--no-backtrace', 'skip the startup scan for unreviewed open PRs this session')to bothwatchandservecommand definitions. Commander maps--no-backtracetoopts.backtrace === false.src/commands/watch.ts/serve.ts: in the backtrace block, change the condition fromif (config.backtrace.enabled)toif (config.backtrace.enabled && opts.backtrace !== false). Passoptsthrough the existing opts parameter.- Log the skip message via
cLog(watch) /board.log(serve) so it's visible in the connectivity section.
- Tests Required:
--no-backtraceflag parsed correctly; backtrace block skipped when flag is true;config.backtrace.enabled: falsestill skips regardless of flag;config.backtrace.enabled: truewithout flag runs scan normally; log message emitted when flag skips the scan.
- User: Any developer who restarts
Problem: once a PR event arrives, the terminal goes quiet for 30–90s while the AI runs. No feedback on what's happening or whether it passed.
Solution — progress log:
Use ora (already a dep) to show a spinner per stage, collapsing to a checkmark on success:
3:14:22 PM PR #42 opened: fix: remove unused import
⠸ cloning motivation-labs/my-repo...
✓ cloned
⠸ codex reviewing...
✓ review complete
⠸ posting comment...
✓ posted → github.com/motivation-labs/my-repo/pull/42
verdict ✅ APPROVE
Solution — verdict:
Add a ## Verdict section to the review prompt:
At the end of your review, add exactly this line:
VERDICT: APPROVE | NEEDS WORK | BLOCK
APPROVE — no issues or trivial nits only
NEEDS WORK — addressable issues but not blocking
BLOCK — security risk, data loss, broken API contract, or correctness bug
Parse the last VERDICT: line from the review text before posting. Display in the terminal with color (green / yellow / red). Strip the VERDICT: line before posting to GitHub so the comment stays clean — or keep it as a bold header at the top of the comment for visibility.
Implementation files: src/reviewers/claude.ts, src/reviewers/codex.ts (prompt addition), src/commands/watch.ts (progress spinner + verdict display), src/commands/review.ts (same for manual reviews).
Problem: crosscheck runs locally and has no visibility into whether it's actually helping. Without usage signal, optimize can only react to failures — it can't learn that certain reviewer configurations produce better outcomes, or that specific instruction patterns reliably increase fix-follow rates. At the same time, collecting that signal must not compromise user privacy or trust.
Value:
- Self-improvement with evidence —
optimizegains a before/after comparison of instruction changes vs. verdict distribution, so it can recommend keeping or reverting changes based on real outcomes rather than heuristics. - Actionable
diagnoseoutput — users learn which reviewer pair works best for their repos, what their follow-fix rate is, and whether recentoptimizeruns improved quality. - Product signal for future development — with explicit consent, anonymous aggregate counts answer questions like "what fraction of installs use cross-vendor mode?" without revealing anything about individual users or repos.
- Trust foundation — a consent-first, audit-friendly design (dry-run, status, reset-id) makes telemetry a feature users can actually verify rather than a black box.
Tier summary:
| Tier | Data | Leaves machine? | Consent required? |
|---|---|---|---|
| 1 — Count statistics | Reviewer pair, verdict, duration, comment count, PR SHA prefix | Never | No |
| 2 — Instruction effectiveness | Instruction fingerprint (SHA-256 only), verdict delta | Never | No |
| 3 — Telemetry | Anonymous aggregate counts, install UUID, version, platform | Yes (if opted in) | Yes — opt-in |
Privacy constraints (non-negotiable):
These are design invariants, not config options:
- No code content ever stored or transmitted.
- No repo names, PR titles, file paths, GitHub usernames, org names, IP addresses, or hostnames.
- Telemetry payload contains only counts, enums, rates, and a locally generated UUID.
- The UUID is not derived from any user identity — it's a random v4 UUID generated at first install.
- Metrics files stay in
~/.crosscheck/metrics/— never in the project directory where they could be accidentally committed.
Consent flow (one-time, on first watch/serve):
crosscheck can send anonymous usage counts to Motivation Labs to improve
future versions. No code, no repo names, no PR content, no usernames —
only aggregate numbers. See PRIVACY.md for the exact payload.
Enable telemetry? [y/N]:
Default: N. Response written to ~/.crosscheck/config.yml immediately. Prompt is never shown again. Changeable any time:
crosscheck telemetry enable
crosscheck telemetry disable
crosscheck telemetry status # shows state, last send date, install_id prefix
crosscheck telemetry dry-run # prints payload without sending
crosscheck telemetry reset-id # generates a new UUID, breaking any linkageFollow-fix detection (Tier 1):
When a synchronize webhook fires on a PR that previously received a NEEDS_WORK or BLOCK verdict, log a follow_fix event linking the two review records by pr_sha_prefix. This event fires regardless of whether the new commit actually addresses the review — it's a count of "new activity after a non-APPROVE verdict." The ratio follow_fix_events / NEEDS_WORK_or_BLOCK_reviews is the follow-fix rate surfaced in diagnose.
Instruction-effectiveness delta (diagnose output):
Instruction history (last 30 days):
fingerprint a1b2c3d4 active 2025-04-01 → 2025-05-01 (30 reviews)
APPROVE 60% NEEDS_WORK 33% BLOCK 7%
fingerprint e5f6a7b8 active 2025-05-01 → now (17 reviews)
APPROVE 76% NEEDS_WORK 24% BLOCK 0% ↑ +16% APPROVE since last optimize
Telemetry payload (full schema):
{
"install_id": "<uuid-v4>",
"version": "0.2.0",
"platform": "darwin | linux | win32",
"period": "2025-W20",
"sessions": 12,
"prs_reviewed": 47,
"comments_posted": 134,
"reviews_by_pair": {
"claude_reviews_codex": 23,
"codex_reviews_claude": 18,
"claude_reviews_claude": 4,
"codex_reviews_codex": 2
},
"verdict_distribution": { "APPROVE": 30, "NEEDS_WORK": 15, "BLOCK": 2 },
"follow_fix_rate": 0.73,
"optimize_runs": 2
}No field may contain a string that could identify a user, repo, or machine. Any new telemetry field must be documented in PRIVACY.md before shipping.
File layout additions:
~/.crosscheck/
metrics/
YYYY-MM.ndjson ← Tier 1 review events
optimize-history.ndjson ← Tier 2 instruction fingerprints
.telemetry-last-sent ← ISO date of last successful transmission
src/
lib/
metrics.ts ← appendMetric, readMetrics, computeSummary
telemetry.ts ← maybeSendTelemetry, aggregatePayload
commands/
telemetry.ts ← crosscheck telemetry subcommands
PRIVACY.md ← exact payload schema, retention, opt-out, contact
Problem: startup feels cold and mechanical.
Solution: print one random quote before the watch/serve banner. Quotes are stored as a static array in src/lib/fortune.ts — no network call, no external dependency.
crosscheck "The best code review is the one that ships."
crosscheck watch
orgs motivation-labs
...
Style: dim text, italic if the terminal supports it. One quote per startup, randomly selected. ~20 quotes in the initial set — mix of original lines about code review, AI, and shipping. No attribution needed (original quotes only, avoids copyright edge cases).
Implementation files: src/lib/fortune.ts (quote array + randomFortune() helper), src/commands/watch.ts, src/commands/serve.ts (call randomFortune() before the banner).
Problem: crosscheck is a passive reviewer — it posts a comment and stops. The review → fix → re-review cycle is repetitive for formulaic issues (lint violations, missing tests, doc gaps). There is also no way to customize the pipeline shape per repo: some teams want review-only, others want auto-fix on NEEDS_WORK, others want a full review → address → recheck loop.
Value:
- Closes the feedback loop — from "AI posts comment" to "AI posts comment + attempts fixes + re-reviews." The PR author gets a clean diff rather than a list of action items.
- Pipeline composition without code changes — teams define multi-step workflows in a checked-in YAML file. crosscheck executes the steps.
- Behavior steering per step — the
reviewstep and theaddressstep need different instructions. A reviewer should be skeptical; an agent fixing its own comments should be conservative and scoped. - Progressive adoption — users can start with the default
[review]pipeline and addaddresswhen they're ready. No new concepts forced on existing users.
Design — workflow.yml:
Placed at .crosscheck/workflow.yml or crosscheck.workflow.yml in the repo. Falls back to a default single-step review pipeline if absent (fully backwards compatible).
# .crosscheck/workflow.yml
on:
- opened
- synchronize # new commits pushed to an existing PR
steps:
- name: review
type: review
reviewer: auto # auto = cross-vendor logic from config.mode
- name: address
type: address
when: "verdict == 'NEEDS_WORK'"
reviewer: auto
max_rounds: 2
instructions: |
Only address comments that are explicitly called out in the review.
Do not refactor logic, rename identifiers, or add tests.
Do not touch files the review did not mention.
If a comment requires understanding of business logic, skip it and leave a note.
- name: recheck
type: review
when: "address.applied_count > 0"
reviewer: autoStep types:
| Type | What it does |
|---|---|
review |
Runs the AI reviewer, posts a comment with verdict |
address |
Reads the review comment, opens a commit on the PR branch with fixes |
recheck |
Re-runs review on the updated branch (same as review but semantically distinct) |
notify |
Sends a notification — Slack, email (future) |
Step fields:
| Field | Required | Description |
|---|---|---|
name |
yes | Identifier used in when conditions |
type |
yes | review, address, recheck, notify |
reviewer |
no | auto, claude, codex — overrides config for this step |
when |
no | Boolean expression; step skipped if false. Context vars: verdict, <step-name>.applied_count, <step-name>.verdict |
max_rounds |
no | Caps iterations for address steps (default 1) |
instructions |
no | Prose appended to the AI prompt for this step only — overrides global instructions.md for this step |
Behavior steering — instructions block:
The per-step instructions field is the primary knob for steering AI behavior within the pipeline. It is appended to the prompt for that step only. Global ~/.crosscheck/instructions.md still applies as a base layer; step-level instructions extend or override it.
This lets teams express policies like:
- "During
address, never touch tests or migrations." - "During
recheck, be stricter about security than the initial review." - "During
address, prefer one-line fixes — no multi-function refactors."
Safeguards (non-negotiable defaults):
max_rounds: 1default on alladdresssteps — prevents loopsauto_merge: falsealways — address creates commits, never mergesaddressonly touches files mentioned in the review comment it is addressing- Every
addresscommit message begins[crosscheck]for traceability and easy revert - Hard limit: no
addressstep runs if the PR already has > N[crosscheck]commits (configurable, default 5)
Relationship to existing config files:
| File | Owns |
|---|---|
crosscheck.config.yml |
Infrastructure: mode, repos, orgs, vendors, budget, server |
.crosscheck/workflow.yml |
Pipeline shape: step order, types, conditions, max_rounds |
~/.crosscheck/instructions.md |
Global prose behavior for all review steps |
Step-level instructions: |
Per-step behavior overrides within workflow.yml |
Default workflow — constant, not a file:
// src/lib/workflow.ts
export const DEFAULT_WORKFLOW: WorkflowStep[] = [
{ name: 'review', type: 'review', reviewer: 'auto' }
]
export function loadWorkflow(repoDir: string, configDir: string): WorkflowStep[] {
const file = findWorkflowFile(repoDir, configDir)
if (!file) return DEFAULT_WORKFLOW
return parseWorkflowFile(file) // Zod-validated, throws on schema error
}watch.ts/serve.ts always call loadWorkflow + runWorkflow. No conditional for "no file". The constant is the backwards compatibility — existing installs without a workflow.yml get the default single-step behavior through the same code path as custom workflows.
crosscheck init generates a workflow template:
# .crosscheck/workflow.yml — generated by crosscheck init
on:
- opened
- synchronize
steps:
- name: review
type: review
reviewer: auto
# Uncomment to enable auto-fix after review:
# - name: address
# type: address
# when: "verdict == 'NEEDS_WORK'"
# max_rounds: 2
# instructions: |
# Only fix what the review explicitly calls out.
# Do not refactor logic or add tests.
# - name: recheck
# type: review
# when: "address.applied_count > 0"New users see the full capability surface immediately. The template is the documentation.
Implementation notes:
- New file:
src/lib/workflow.ts—DEFAULT_WORKFLOWconstant; Zod schema;loadWorkflow(repoDir, configDir). - New file:
src/lib/runner.ts—runWorkflow(steps, context)— iterates steps, evaluateswhen, dispatches handlers. watch.ts/serve.ts: replace direct reviewer call withloadWorkflow(tmpDir, configDir)+runWorkflow(steps, context)— unconditional, no legacy branch.addresshandler: read the crosscheck review comment, pass it + diff + stepinstructionsto AI, parse file patches from response, apply viagit apply, push[crosscheck] address: ...commit.whenevaluation: flat context object, equality + numeric comparison operators only — no scripting engine.init.ts: write.crosscheck/workflow.ymltemplate; skip silently if file already exists.
Open questions before implementation:
- Should
addresspush commits directly to the PR branch (requires write access) or open a follow-up PR? Direct commits are simpler; follow-up PRs are safer for external contributors. Default to direct commits on branches the token owns; follow-up PR on forks. - Should
whensupportAND/ORor keep it to single conditions? Start with single conditions — composable via multiple steps.
Problem: when crosscheck fails silently or behaves unexpectedly, users have to manually dig through ~/.crosscheck/logs/, identify the error, understand its context, write a coherent issue, and decide what details matter. That friction means failures go unreported.
Solution: crosscheck issue does the digging automatically. It reads recent logs, surfaces the most relevant failure, drives a short multiple-choice interview to fill context gaps, and hands the whole package to the local AI agent to write a well-structured issue draft. The user just reads, answers three quick questions, and hits y.
Flow:
$ crosscheck issue
scanning logs...
found 3 error patterns in the last 3 days:
[1] command_not_found: tsc (4 occurrences)
[2] base_branch_missing (1 occurrence)
[3] timeout (1 occurrence)
Which issue do you want to report? [1-3]: 1
drafting issue with claude...
┌─────────────────────────────────────────────────────────────────────┐
│ TITLE: codex reviewer fails when repo has a tsc build step │
├─────────────────────────────────────────────────────────────────────┤
│ ## Description │
│ The codex reviewer exits with `command_not_found: tsc` on repos │
│ that include a TypeScript build step... │
│ │
│ ## Steps to Reproduce │
│ 1. Run `crosscheck watch` on a TypeScript repo │
│ 2. Open a PR — codex reviewer is triggered │
│ 3. Review fails with `Error: command not found: tsc` │
│ │
│ ## Log Excerpt │
│ ``` │
│ {"ts":"...","event":"error","reviewer":"codex","error": │
│ "command_not_found","command":"tsc","repo":"[repo]"} │
│ ``` │
│ │
│ ## Environment │
│ - crosscheck: 0.2.1 │
│ - platform: darwin │
│ - reviewer: codex │
│ - mode: cross-vendor │
└─────────────────────────────────────────────────────────────────────┘
Can you reproduce this consistently?
[1] Every time [2] Sometimes [3] Happened once
> 1
Which command triggered this?
[1] watch [2] serve [3] review [4] Unknown
> 1
Is this blocking you from using crosscheck?
[1] Blocked [2] Degraded [3] Cosmetic
> 2
Submit to motivation-labs/crosscheck? [y/N]: y
✓ issue created → https://github.com/motivation-labs/crosscheck/issues/47
Agent selection: same selectOptimizeAgent logic as optimize — picks the vendor with the higher success rate from recent logs; falls back to claude on a tie or no data.
Sanitization: applied before sending log entries to the AI agent and before posting. Patterns stripped: owner/repo (→ [repo]), PR titles, file paths, GitHub usernames, branch names, GitHub URLs. Secrets never appear in logs (enforced at write time by logger.ts).
--dry-run use case: teams who want to review the draft before reporting, or who want to template-match issues for a triage queue without posting immediately.
--yes use case: automated pipelines (e.g., a cron that calls crosscheck issue --yes nightly and files anything new). Still shows the draft in stdout so CI logs are auditable.
Relationship to diagnose:
diagnose is a reporting tool — it reads logs and surfaces patterns for the operator. issue is an action tool — it takes the same patterns and turns them into a GitHub ticket. Both share the same error-grouping logic via src/lib/log-analysis.ts.
File layout additions:
src/
commands/
issue.ts ← crosscheck issue command
lib/
log-analysis.ts ← shared error-grouping logic (extracted from diagnose.ts)
#### `crosscheck impact` — value dashboard
**Problem:** crosscheck runs in the background and reviews PRs silently. After a few weeks, users have no concrete sense of what it has saved them — so they can't justify the setup cost, can't calibrate the tool, and can't communicate its value to their team.
**Value proposition of this feature:** Turn passive automation logs into a human-readable ROI summary. The answer to "is crosscheck worth it?" should be one command away.
---
**Time-saving calculation:**
The core unit is *time saved per PR* = `assumed_human_review_min − actual_ai_review_min`.
assumed_human_review_min → configurable, default 60 actual_ai_review_min → avg(review_complete.duration_ms) / 60000 from logs; fallback 2 min time_saved_per_pr → assumed − actual (≈ 58 min at defaults) total_hours_saved → (time_saved_per_pr × prs_reviewed) / 60
**Basis for the 60-minute default:**
- Google's Engineering Productivity research: median PR review latency 60–90 min for non-trivial changes when factoring in reviewer availability.
- GitHub Octokit 2023: developers spend ~15–20% of time on code review; for a 40h week that's 6–8 hours, typically covering 4–6 PRs → 60–90 min per PR average.
- Microsoft Research SPACE framework: "review overhead" tracked as 30–120 min depending on PR size; 60 min is the lower-bound safe default.
The displayed assumption line keeps the model transparent. Users with smaller/larger PRs can calibrate.
---
**Issues-caught calculation:**
issues_caught = NEEDS_WORK_count + BLOCK_count block_count = BLOCK verdicts (surfaced separately — higher severity) issue_rate = issues_caught / prs_reviewed
These are PRs that received actionable feedback. Without crosscheck, that feedback would not exist (cross-vendor review only happens because crosscheck ran).
---
**Defect cost model (opt-in via `--money`):**
estimated_value = (hours_saved × hourly_rate) + (issues_caught × defect_cost_per_issue)
defaults: hourly_rate = $150 USD (US mid-senior engineer) defect_cost = $150 USD (1 hr to fix, same rate)
**Basis for defect cost:**
- NIST 2002 report: cost to fix a defect grows 4–10× from review to production. At $150/hr and a 1-hour median fix, a defect caught in review saves $150 (fix during PR) vs $600–$1,500 (fix post-merge). Using $150 is maximally conservative — it only counts the direct fix cost, not downstream cost.
- IBM Systems Sciences Institute: software bugs found in production cost 6–15× more than during development. Same conservative logic applies.
The `--money` flag is opt-in so the output doesn't over-claim in contexts where monetary framing is inappropriate (open-source, student projects, etc.).
---
**Second-order code quality signal:**
The BLOCK rate trend (BLOCK verdicts / total PRs, by week) is a leading indicator of upstream quality improvement:
- Declining BLOCK rate: teams are internalizing review feedback; fewer high-severity issues reach PR stage.
- Stable BLOCK rate: issues persist — potential input for `crosscheck optimize` to tighten review instructions.
- Rising BLOCK rate: either more complex PRs or a genuine quality regression.
This is presented as a trend, not a judgment, with a note that it is a proxy metric.
---
**Sample output:**
crosscheck impact (all time · 63 PRs)
Time saved ───────────────────────────────────────── PRs reviewed 63 Avg AI review time 1.8 min Assumed human time 60 min ⓘ Time saved per PR ~58 min Total hours saved ~61 h
Issues caught ───────────────────────────────────────── APPROVE 41 (65%) NEEDS WORK 17 (27%) ← actionable feedback BLOCK 5 (8%) ← potential bugs/breaking changes caught before merge Total issues caught 22
Code quality trend (BLOCK rate, weekly) ───────────────────────────────────────── Apr W1 ██████ 12% Apr W2 ████ 8% Apr W3 ███ 6% Apr W4 ██ 4% ↓ improving
ⓘ assumes 60 min avg human review — set impact.assumed_human_review_minutes to adjust
Run crosscheck impact --money for a rough monetary estimate.
With `--money`:
Estimated value ───────────────────────────────────────── Time savings ~$9,150 (61h × $150/hr) Issues prevented ~$3,300 (22 × $150/issue) Total estimate ~$12,450
⚠ rough estimate · adjust rates in crosscheck.config.yml · not accounting data
---
#### Auto-init on `watch`/`serve`
**Problem:** the current flow requires `crosscheck init` before `crosscheck watch`. This is undiscoverable — most users will try `watch` first, hit a missing-config or missing-secret error, and not know why. `init` as a prerequisite is friction that blocks the happy path.
**Solution:** `watch` and `serve` call `ensureInit` at startup. If setup has already been done, it's a no-op. If not, it runs the missing steps inline and continues. `crosscheck init` stays as an explicit command for verification and re-runs, but it is no longer required.
**Detection — sentinel file, one check per startup:**
After a successful init, `ensureInit` writes `~/.crosscheck/.initialized` containing the current crosscheck version (e.g., `0.2.0`). On every subsequent `watch`/`serve` start, the sentinel is checked first. If it exists and the version matches, the global setup step (webhook secret) is skipped. However, the two repo-local files (`crosscheck.config.yml`, `.crosscheck/workflow.yml`) are always checked via cheap `existsSync` calls — if either is absent, it is created before proceeding. This means the cost is O(1) `existsSync` calls per startup after the first run, not a full re-init, but each repo gets its local files regardless of whether another repo was initialized first.
The subprocess-heavy checks (gh, claude, codex auth) are never run by `ensureInit` — they remain in `crosscheck init` only.
First run: check sentinel → absent → run all setup steps → write sentinel → continue Subsequent: check sentinel → present + version matches → skip webhook secret → check repo-local files → create any missing → continue Upgrade: check sentinel → version mismatch → re-run changed steps → update sentinel → continue New repo: check sentinel → present + version matches → repo-local files absent → create them → continue
`crosscheck init` always runs the full check and rewrites the sentinel regardless — explicit verification is its job. Already-present files are never overwritten by auto-init.
**Terminal output on first run:**
✦ first run — setting up crosscheck... ✓ webhook secret generated → ~/.crosscheck/webhook-secret ✓ config written → crosscheck.config.yml ✓ workflow written → .crosscheck/workflow.yml
crosscheck watch repos acme/api ...
Silent on subsequent runs. Auth checks (missing gh, claude, codex CLIs) are not run here — run `crosscheck init` explicitly to see full auth status.
**Implementation:**
- New file: `src/lib/setup.ts` — `ensureInit(cwd, opts?)`: checks sentinel first; if present and version matches, skips the webhook-secret step but still runs `existsSync` on the two repo-local files and creates any that are missing; if sentinel is absent or version differs, runs all setup steps and writes `~/.crosscheck/.initialized`. Returns `{ created: string[] }`. Never spawns a subprocess.
- Sentinel file: `~/.crosscheck/.initialized` — plain text, contains semver string (e.g., `0.2.0`). Version compared against `pkg.version` at runtime. On mismatch, only the steps that changed between versions are re-run.
- `init.ts` refactored: extracts setup steps into `setup.ts`; becomes a thin wrapper that calls `ensureInit` with `{ force: true, verbose: true }` (bypasses sentinel) then prints the full status table.
- `watch.ts` / `serve.ts`: `await ensureInit(process.cwd())` before `loadConfig`. `--no-init` flag skips the call entirely for CI/provisioned environments where setup is pre-baked.
---
#### smee.io tunnel backend for `crosscheck watch`
**Problem:** `localhost.run` SSH tunnels silently go dead (HTTP 503) without the SSH process exiting. `watch` stays stuck waiting for an SSH exit event, so all webhook events are dropped until the user manually restarts. Root-cause observation: PR #27 received no review because the tunnel died between 03:40 and 03:49 UTC while `watch` was running.
**Solution:** add `tunnel.backend: smee` as an opt-in alternative. The smee.io relay queues events while the local client is offline and replays them on reconnect — eliminating the missed-event class of failure entirely.
**Design decisions:**
| | localhost.run (default) | smee.io |
|---|---|---|
| Install | none (ssh built-in) | `npm install -g smee-client` |
| URL stability | changes every restart | permanent channel URL |
| Webhook registration | auto (org/repo hook API) | manual (one-time, point to smee URL) |
| Missed events | lost permanently | queued + replayed |
| Dead-tunnel detection | periodic health check (PR #29) | N/A — relay handles reconnect |
**Why localhost.run stays the default now:**
- Zero-install is the core UX promise; requiring `npm install -g smee-client` adds friction on first run.
- Manual webhook registration is a steeper setup step.
- The health-check fix (PR #29) closes the most common failure mode for localhost.run.
**Path to making smee the default:**
1. Ship smee backend (this PR) and gather feedback in production.
2. If missed-event reports drop to zero and install friction proves manageable, flip default in a minor version bump.
3. `crosscheck init` can auto-generate a smee channel (via the smee.io API) and write `tunnel.smee_channel` to config, making setup zero-manual-steps.
**Config contract (shipped):**
```yaml
tunnel:
backend: smee # localhost.run | smee
smee_channel: https://smee.io/your-channel-id
Implementation:
schema.ts:TunnelConfigSchemawithbackendandsmee_channel; added toConfigSchemawatch.ts: after banner print, branch onconfig.tunnel.backend; smee mode spawnssmee --url <channel> --path <path> --port <port>and auto-restarts on exit;currentTunnelProcshared with cleanup handlerinit.ts: checks ifsmeeCLI is installed; shows one-line tip if missingcrosscheck.config.example.yml: commented tunnel section with full instructions
Problem: crosscheck has no concept of why it's running — is this a developer's laptop monitoring their own work, or a shared server watching an entire team's org? Today:
- Personal users must manually discover that
users:exists; they can't say "watch everything I own." - Team operators who run
crosscheck servewith no author filter inadvertently review every PR in the org from any author. - There's no auto-detection of org memberships — users copy-paste org names by hand.
- An inaccessible repo in
repos:silently drops events with no diagnostic.
Solution: introduce deployment: personal | team as a first-class config concept. crosscheck watch and crosscheck serve prompt the user to choose a mode on first run (when deployment is absent from config), detect scopes from GitHub credentials based on the choice, write everything to config, and proceed — no restart required. crosscheck init is unchanged; it remains a pure environment check.
Scope model:
| Level | Config key | Coverage | Registration |
|---|---|---|---|
| Repo | repos: |
Named repos only | One webhook per repo; validated at startup |
| Org | orgs: |
All repos in org | One webhook per org (GitHub org webhook) |
| User | users: |
All non-archived personal repos | Enumerated at startup; one webhook per repo |
All three are additive — a config can mix orgs: + users: + repos:.
Deployment modes:
personal |
team |
|
|---|---|---|
| Primary use case | Developer laptop running crosscheck watch |
Shared server running crosscheck serve |
| Auto-detected scopes | users=[self] + orgs=[all-memberships] |
orgs=[all-memberships] only |
Default allowed_authors |
[self] — only the owner's PRs |
[] — all PRs in monitored scope |
| Personal repos monitored | Yes | No |
First-run prompt (watch and serve):
Shown before the startup banner when deployment key is absent from config. Printed once; after the user answers, the choice is persisted and never asked again.
crosscheck watch
How are you using crosscheck?
[1] personal — monitor all your repos and orgs; review only PRs you author
[2] team — monitor org repos only; review all PRs from any author
Choice [1]:
After selecting, crosscheck detects GitHub login + org memberships and writes to config:
Personal ([1]):
deployment: personal
users:
- beingzy # auto-detected from gh auth
orgs:
- motivation-labs # auto-detected from org memberships
- codatta
routing:
allowed_authors:
- beingzyTeam ([2]):
deployment: team
orgs:
- motivation-labs
- codatta
# users: omitted — personal repos excluded in team mode
# allowed_authors: omitted — all PRs reviewedThree ways to control the mode:
| Prompt shown? | Config written? | Use case | |
|---|---|---|---|
First run (no deployment in config) |
Yes | Yes | Initial setup |
--personal / --team flag |
No | No | One-time override, CI pipelines |
--reconfigure flag |
Yes (shows current mode) | Yes (overwrites) | Switching modes permanently, re-detecting after joining a new org |
crosscheck watch --personal # personal mode this session only, config unchanged
crosscheck serve --team # team mode this session only, config unchanged
crosscheck watch --reconfigure # re-prompts, saves new choice to config--reconfigure prompt (shows current saved mode):
Reconfiguring deployment mode...
How are you using crosscheck?
[1] personal — monitor all your repos and orgs; review only PRs you author
[2] team — monitor org repos only; review all PRs from any author
Current: personal
Choice [1]:
Re-detecting after the choice always refreshes org memberships and repo lists — useful after joining a new org without switching modes.
Runtime auto-detection (when explicit scopes are missing):
If deployment is set but users, orgs, repos are all empty (e.g., user manually cleared them), watch/serve auto-detect scopes at startup without prompting:
deployment: personal→ detectusers=[self]+orgs=[memberships]deployment: team→ detectorgs=[memberships]only
Banner line: deployment personal or deployment team.
Repo accessibility validation:
At startup, for each entry in repos:, call GET /repos/{owner}/{repo} in parallel. Any that return 404 or 403 produce:
✗ repo not accessible: acme/old-repo — skipped (404 Not Found)
Remaining accessible repos continue normally. Non-crashing — a stale entry shouldn't halt the whole session.
New API functions (src/github/client.ts):
// Returns org login strings for all active memberships of the authenticated user
listUserOrgs(token: string): Promise<string[]>
// Returns false on 404/403; true on 200; throws on network error
checkRepoAccessible(owner: string, repo: string, token: string): Promise<boolean>New loader functions (src/config/loader.ts):
// Returns scopes to use for auto-detection based on deployment mode
detectScopesForDeployment(
deployment: 'personal' | 'team',
token: string
): Promise<{ users: string[]; orgs: string[] }>
// Writes deployment, users, orgs, allowed_authors to config file; no-op if deployment already set
patchScopesAndDeployment(
configPath: string,
deployment: 'personal' | 'team',
login: string,
orgs: string[]
): booleanInteraction with existing patchAllowedAuthors:
patchScopesAndDeployment supersedes the single-field patchAllowedAuthors for new installs. patchAllowedAuthors is kept for backward compatibility (existing configs that already have deployment omitted but allowed_authors empty).
Problem: crosscheck runs silently in the background and users have no way to know what percentage of eligible PRs it actually reviewed. Missed PRs fall into several categories — author filter excluded them, no attribution footer existed, the webhook wasn't registered during that window, or an unknown AI agent wrote the PR. Without a way to enumerate these gaps, users can't tell whether their config is optimal or whether a crosscheck feature is missing. And when a feature is missing, there's currently no automated path from "I spotted the gap" to "I filed a proposal" to "I implemented the fix."
Solution: crosscheck coverage enumerates all PRs in the monitored scope during the crosscheck uptime window, joins that list against review logs, classifies each missed PR by root cause, and routes each class to the right remediation:
- Config gaps (author filter, missing
author_routes, disabled vendor) → suggest and optionally apply config changes; optionally file a best-practice issue to the crosscheck repo - Feature gaps (unrecognized AI agent attribution, unsupported routing pattern) → draft a prd.md feature proposal; optionally clone
motivation-labs/crosscheck, write the implementation, and open a ready-for-review PR
Why this is different from diagnose:
diagnose reads error events — things that broke during a review that was attempted. coverage reads the inverse: PRs that were never attempted at all. The two are complementary: diagnose finds quality problems in the review pipeline; coverage finds scope problems upstream of it.
Self-improvement loop:
crosscheck running
↓
coverage gap detected
↓
config gap? ──────────────→ --apply (config write) or --issue (best-practice PR)
↓
feature gap? ──────────────→ --prd (prd.md proposal PR, draft)
--build (implement + ready PR)
The --build path makes this the first crosscheck command that contributes back to its own development autonomously: it clones the repo, detects exactly which detection pattern or config handling is missing, implements it, adds tests, updates prd.md, and opens a PR. The human reviews and merges.
Gap classification decision tree:
PR in scope (full analysis period), not in reviewed logs
└─ PR overlaps any uptime window?
NO → offline_window (config_info: crosscheck was offline)
YES
└─ PR author in allowed_authors?
NO → author_filtered (config_fix: add to allowed_authors)
YES
└─ webhook event arrived for this PR?
NO → webhook_miss (config_fix: webhook not registered)
YES
└─ PR body matches any attribution pattern?
YES → reviewer assigned?
NO → no_reviewer (config_fix: enable vendor)
YES → flag as anomaly (reviewed but not logged)
NO → PR authored by a known-but-unsupported AI agent?
YES → unsupported_agent (feature_request: add detection pattern)
NO
└─ author in author_routes?
YES → (shouldn't be here — flag as anomaly)
NO → no_attribution (config_fix: add author_routes)
Uptime window computation:
Session boundaries from log entries:
session_start @ 09:00 → session_end @ 17:00 window: [09:00, 17:00]
session_start @ 18:00 → (no session_end) window: [18:00, next-log-ts]
Overlapping windows are merged. A PR's uptime membership check: window.some(w => pr.updated_at >= w.start && pr.created_at <= w.end).
--issue payload (config gaps):
Filed to motivation-labs/crosscheck. Body uses only aggregate counts and pattern types — no GitHub usernames, repo names, org names, PR numbers, or branch names. Body structure:
## Config best-practice gap: author_filtered
**Condition:** `allowed_authors` is set and PRs from a bot account matching the
pattern `*[bot]` are being skipped.
**Ideal behavior:** `crosscheck init` or `crosscheck watch` startup should warn when
known bot accounts (dependabot, renovate, copilot-workspace) are active in any
monitored repo but absent from `allowed_authors`.
**Suggested detection:** at startup, if `allowed_authors` is non-empty, check whether
recent PR authors in monitored scope include any `*[bot]` logins not in the list.
Warn with: "3 PRs from bot accounts were skipped — add them to allowed_authors or
switch to team mode."
**Supporting data:** N PRs skipped over 14 days across M repos (counts only — no identifiers).
Sanitization applied before generating the body: all GitHub logins replaced with their category (e.g., bot account, human author); repo/org names replaced with counts; PR numbers omitted entirely.
--build agent prompt structure:
You are implementing a feature for crosscheck (an AI code review orchestrator).
Gap type: unsupported_agent
Description: PRs authored by `copilot-swe-agent[bot]` are not being recognized
as AI-authored and therefore not reviewed. crosscheck needs a detection pattern
for this agent's attribution footer.
GitHub Copilot attribution footer: "Co-Authored-By: GitHub Copilot <>"
Task:
1. Add `'Co-Authored-By: GitHub Copilot'` to `claude_reviews_patterns` default in
`src/config/schema.ts` (GitHub Copilot is reviewed by Claude, as Codex reviews Claude-authored code).
2. Update `crosscheck.config.example.yml` with the new pattern, commented.
3. Update `get-started.md` routing section with a note about GitHub Copilot support.
4. Add a test case to the routing test suite verifying the new pattern matches.
Do not touch any other files. Do not change existing patterns.
Implementation phasing:
Phase 1 (this feature): Config-gap detection + --apply + --issue. Delivers immediate value, no cloning required.
Phase 2: --prd — generates prd.md proposal, opens draft PR. No code generation.
Phase 3: --build — full autonomous contribution. Requires careful scoping of the agent prompt to prevent scope creep.
-
smee.io as default tunnel — once smee proves stable in production, flip
tunnel.backenddefault fromlocalhost.runtosmee. Migration:crosscheck initauto-generates a smee channel and writes it to config. Old configs keep working (localhost.run continues to work). Track: hassmee-clientinstall friction reduced? Are missed-event reports gone? -
Retry logic — if
codex revieworclaudesubprocess fails, retry once with exponential backoff -
crosscheck logs— tail recent review activity from a local log file -
Config validation on startup — warn on unknown keys, required-but-missing fields
-
Per-repo routing overrides — allow different quality tiers or reviewers per repo in config
-
Slack/email notification — optional ping when a review is posted
-
Graduate
serveout of beta — battle-test on an always-on machine, document pm2/launchd setup
-
init,review,watch,serve,statuscommands - Cross-vendor and single-vendor modes
- Org-level webhook support
- Auto-generated webhook secret (
~/.crosscheck/webhook-secret) - npm publish as
@motivation-labs/crosscheck - CI (typecheck + build) + CD (staging @beta, production @latest) workflows
- get-started.md — full documentation
-
crosscheck initgh CLI check acceptsGITHUB_TOKENenv var as valid auth (no false failure when token is set butgh auth loginwas never run)