Skip to content

humanbased-ai/verifyflow

Repository files navigation

VerifyFlow

Evidence-backed delivery verification for Linear-driven pull requests.

VerifyFlow answers one question:

Did this PR actually deliver what the linked ticket asked for?

It is not code review. Crosscheck reviews code and merge readiness; VerifyFlow runs after that, executes the ticket acceptance criteria, captures evidence, and reports the delivery verdict.

Linear issue + GitHub PR -> execution evidence -> delivery verdict

The CLI is verifyflow, with the short alias vf.

Quickstart

The fastest way to see a full run — no credentials, no checkout, fully offline:

npx github:humanbased-ai/verifyflow demo

vf demo runs bundled fixtures through the whole pipeline and writes a report you can read.

Then check your environment and (optionally) get a guided setup:

vf doctor      # are gh / claude / LINEAR_API_KEY / Playwright / a sandbox runtime ready?
vf onboard     # guided first-run setup; prints the exact fix command for anything missing

vf onboard can save your Linear key to ~/.verifyflow/credentials.json (mode 0600). At runtime LINEAR_API_KEY is resolved from the environment first, then that credentials file — so you never have to export it again.

Install

After the first npm release:

npm install -g @humanbased/verifyflow
vf doctor

Before npm publish, run straight from GitHub:

npx github:humanbased-ai/verifyflow doctor
npx github:humanbased-ai/verifyflow run \
  --fixtures fixtures/example-cli \
  --linear EX-1 \
  --pr example/greet#7 \
  --level functional

Verify a PR — vf run

vf run \
  --linear IN-123 \
  --pr humanbased-ai/monorepo#456 \
  --level auto \
  --checkout \
  --comment
  • --linear is optional: if omitted, VerifyFlow derives the issue from the PR body's Linear link or the branch name.
  • --checkout clones the repo and checks out the PR head for real execution. Or point at an existing checkout with --workdir <dir>, or run offline with --fixtures <dir>.
  • --comment posts the markdown report as a PR comment (idempotent — updates in place).
  • --linear-writeback also posts the delivery verdict back to the linked Linear issue.

Preview what VerifyFlow would do without checking out or executing anything (exits 0):

vf run --linear IN-123 --pr humanbased-ai/monorepo#456 --level auto --dry-run

Verify a PR that has no resolvable ticket, against its own description (verdict capped at manual_review_required):

vf run --pr humanbased-ai/monorepo#456 --allow-no-ticket

Levels — --level

Level What it does Needs
functional Command/test probes against a checkout checkout/workdir
ui AI-driven browser checks via Playwright Playwright + a running app
journey Multi-step end-to-end (backend + browser) checkout + Playwright
auto Picks the level from the ticket; downgrades to functional if no browser is available, and says why

For ui / journey, point at the app with --base-url <url> (otherwise VerifyFlow tries to find a deployment preview from the PR's checks), and supply an authenticated session with --ui-auth <storageState.json> (create it with playwright codegen --save-storage=auth.json).

Merge policy — --policy

Policy Behavior
advisory Default. Reports only, never blocks.
merge_gate Exits non-zero on needs_fix, so a failing verdict blocks merge.
strict Also blocks on manual_review_required / accept_with_risks.

Other commands

Command What it does
vf run Verify one PR against its Linear ticket (see above).
vf step Orchestrator-facing step (Symphony/Jazzband): advisory-only, auto-resolves the issue, checks out + executes + comments, and prints one machine-readable JSON line to stdout. Never blocks.
vf watch Independent daemon: watch a repo's Crosscheck-approved PRs, verify delivery, and (with --auto-merge) squash-merge on a clean accept.
vf report Aggregate accumulated runs into quality metrics; --trend, --since, --repo, --level, --json filters.
vf replay <runId> Re-run the verdict engine against a past run's stored evidence — no probes/tests re-execute.
vf show <runId> Re-render a past run's report.md (or report.json).
vf signal <runId> Pretty-print a past run's improvement-signal.
vf memory Inspect reusable test-point memory: vf memory ls, vf memory show <key>, vf memory clear [--repo <o/r>] [--yes].
vf init Scaffold a verifyflow.config.json in the target repo (auto-detects npm / uv / go / cargo / make).
vf doctor Check that the tools/env VerifyFlow relies on are ready.
vf onboard Guided first-run setup; --non-interactive to skip prompts.
vf demo Offline demo with bundled fixtures; --open to open the report.

Full flag-by-flag reference: docs/commands.html. Run vf --help for the same usage in the terminal.

Watch a repo's Crosscheck-approved PRs:

vf watch --repo humanbased-ai/monorepo --interval 120              # monitor + verify + comment
vf watch --repo humanbased-ai/monorepo --auto-merge --interval 120 # also squash-merge on accept

What works today

  • One Linear issue against one GitHub PR.
  • Acceptance-criteria extraction from the Linear issue (or the PR description with --allow-no-ticket).
  • functional, ui, journey, and auto levels.
  • Real command/test execution against a checkout or workdir.
  • Browser-backed UI and journey checks through Playwright.
  • Markdown and JSON reports, screenshots, traces, command logs, and reusable test memory.
  • Idempotent PR comments and optional Linear writeback.
  • advisory / merge_gate / strict merge policies.
  • Standalone CLI, orchestration step, and Crosscheck-approved watcher modes.
  • Colorized terminal output in an interactive shell (auto-disabled when piped or under NO_COLOR).

VerifyFlow is conservative: uncertainty becomes blocked or not_evaluable, not a fake product failure.

Requirements

VerifyFlow stores no secrets. It reuses local tools and environment variables.

Tool Needed for Required?
Node.js >= 20 CLI runtime required
gh authenticated GitHub PR context and comments required
LINEAR_API_KEY (env or ~/.verifyflow/credentials.json) Linear issue reads required (or use --fixtures)
claude authenticated LLM planning and judging; otherwise rules-only fallback optional
docker or podman Sandbox isolation for executing PR code (IN-555); without it, probes run on the host optional
Playwright ui and browser-backed journey runs optional
npm, uv, etc. Target repo setup and tests as the target needs

Roadmap

Next major layer: project-level verification.

Linear project -> ticket/PR matrix -> leveled runs -> evidence bundle -> project report

Planned work:

  • Read a Linear project and map tickets to PRs.
  • Generate a coverage matrix across tickets, criteria, PRs, SHAs, and evidence.
  • Run per-ticket functional/UI/journey checks.
  • Produce one project-level implementation gap report.
  • Keep stronger sandbox isolation for untrusted PR execution.
  • Add opt-in Linear status transitions and follow-up issue filing.

Boundaries

VerifyFlow does not:

  • review code or decide merge readiness; that is Crosscheck
  • decompose tickets or dispatch coding agents; that is Symphony today and Jazzband next
  • move money, send production email, or perform irreversible side effects
  • replace CI
  • provide native mobile evaluation yet

Toolchain fit

Symphony / Jazzband -> Crosscheck -> VerifyFlow
orchestration          code review    delivery verification

Symphony is the current Python orchestration layer. Jazzband is the planned open TypeScript/npm successor. VerifyFlow works with both, and also runs alone, by using public artifacts: GitHub PR metadata, Linear issue links, SHA-bound comments, CLI JSON, and evidence files.

More detail:

Development

npm install
npm run typecheck
npm test
npm run build
npm pack --dry-run

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors