harness-maker

English · 한국어

A different harness for every project — built from yours, never generic.

Per-project personalization · Grade-gated · Self-evolving · Multi-IDE

Why · How it fits · Quickstart · Features · How it works · Slash commands · Comparison · Configuration · Targets · FAQ · Stability · Roadmap · Deep dive

Try in 30 seconds

Paste this into Claude Code, Cursor, or Codex CLI. A profiler reads 12+ stack signals + a 10-dimension interview synthesizes a harness specific to YOUR project — not a generic template. It auto-detects your IDE, installs the matching plugin via Bash (no slash-command typing), runs the interview, and prints a personalization-tier report.

Install harness-maker for this project and bootstrap the harness end-to-end.

You are an AI agent running inside Claude Code, Cursor, or Codex CLI.
Detect which one (silently — don't ask me), install the matching plugin via
Bash (NOT slash commands typed by me), then invoke harness-maker:make and
hm:health via the Skill tool. Conduct the conversation in the language of
my first reply.

Install path per IDE:
  Claude Code:  Bash  claude plugin marketplace add Ecro/harness-maker
                Bash  claude plugin install harness-maker@harness-maker
                Then tell me: "Type /reload-plugins, then send any short message (e.g. `go`) to re-trigger me."
  Cursor:       Bash  git clone --depth 1 https://github.com/Ecro/harness-maker.git ~/.cursor/plugins/local/harness-maker
                Then tell me: "Reload the Cursor window (Ctrl+Shift+P → Reload Window)."
  Codex CLI:    Bash  codex plugin marketplace add Ecro/harness-maker
                Then tell me: "Open /plugins; if harness-maker isn't enabled, restart codex."

After the reload, run harness-maker:make (drive the interview — accept defaults
unless I object), then hm:health (read out the personalization tier and top
action items).

Bash approval: approve only the exact install commands shown above (claude plugin install …, git clone --depth 1 https://github.com/Ecro/harness-maker.git …, codex plugin marketplace add …). Do NOT grant blanket Bash(*). If the AI requests a different command, stop and inspect.

For the long form (per-IDE action counts, manual install fallbacks, integrity-pin guidance), see Quickstart below.

Why harness-maker?

Most AI coding setups start from a generic template and drift from day one — same reviewer set on every project, same prompts on every stack, same defaults that nobody ever revisits. harness-maker takes the opposite stance: the harness is shaped by your project, and it keeps that shape as the project moves.

	Principle	What this means in practice
🎯	Personalized	Profiler reads 12+ stack/framework/CI signals. Interview locks 10+ dimensions. A `Side` experiment and a `Production` service get structurally different harnesses — different reviewer sets, different workflow stages, different security gates. No generic defaults silently shipped.
🛡️	Trusted	Every `/hm:execute` runs in a fresh worktree with a TDD loop. `/hm:review` doesn't just report — it applies consensus-passed fixes and re-reviews until grade ≥ A. Mechanical checks (lint/tests) gate the LLM reviewer before any token is spent.
🌱	Self-evolving	Hand-edit any agent, skill, CLAUDE.md — block-merge markers preserve your edits across `--update`. Memory accumulates project-specific patterns; recurring failures auto-propose new guardrails.
🌀	Anti-rot	Weekly crawl across 4 sources (Anthropic, GitHub releases, arXiv, OSV CVEs). Adaptive relevance filter learns from your accept/reject history. Always manual-confirmed — no silent auto-apply path exists.
🎛️	Multi-IDE	Claude Code + Cursor + Codex. One `harness.yaml`, three target-native renders. Existing Cursor rules / Aider config / Copilot instructions get absorbed on first run — no manual port.

How it fits your project

SENSE  →  DECIDE  →  RENDER  →  EVOLVE

Four steps. Each is a question harness-maker answers for you — once, then continuously.

1. SENSE — What kind of project is this?

The profiler scans concrete signals before asking you anything. Every detection is tagged with a confidence (HIGH / MEDIUM / LOW) that decides whether the default is applied silently, prompted, or skipped.

Signal	Detection source
Stack (12+)	`pyproject.toml`, `package.json`, `Cargo.toml`, `go.mod`, `pubspec.yaml`, … + extension counts
Scale	Total source files, monorepo depth, presence of `apps/` or `packages/`
Lifecycle	Git commit velocity, branch protection rules, CI workflow presence
Frameworks	React / Vue / FastAPI / Django / Tauri / Zephyr / Next.js / Tailwind / Pydantic … (dependency-parsed, not keyword-guessed)
Package manager	uv · poetry · pnpm · yarn · cargo · go mod · gradle · maven
CI provider	GitHub Actions · GitLab CI · CircleCI · Bitbucket Pipelines
Foreign AI config	Pre-existing `.cursor/rules/`, `AGENTS.md`, `CLAUDE.md`, `.continue/`, `.aider.conf.yml`, `.github/copilot-instructions.md`

Result: A 24-hour cached ProjectProfile you can inspect with harness-maker profile . --json before any interview runs.

2. DECIDE — What harness does this project need?

A short interview locks the dimensions that shape every downstream render. Re-runs silently reuse prior answers; explicit --reinterview re-prompts.

Dimension	Choices	What it changes
Preset	`Side` · `Production`	Reviewer count (1 vs 5), workflow stage count, security gate depth, verify-required flag
Dev mode	`task-driven` · `spec-driven`	Whether SPEC stage is mandatory; whether plan stages chain into execute
Targets	`claude-code` · `cursor` · `codex` (multi-select)	Which IDE-native asset trees are rendered
Locale	`en` · `ko` · any tag	Interview text + user-facing error messages
Workflows	Fused sequences from atomic stages	Which `/hm:<name>` slash commands appear
Reviewers / skills	Preset defaults + overrides	Which agents and skills install
Ref folders	Path + glob pairs	Which external docs are searchable via `refdocs-search` skill
Sibling repos	Relative paths	Which adjacent repos share the same harness session
Second Brain	Obsidian vault path + project_id	Where cross-session memory writes
Recommended model	`claude-opus-4-7` default	The model frontmatter on every generated agent

Result: .claude/harness.yaml — a single source of truth that survives upgrades.

3. RENDER — How does this become a working harness?

One source tree, three IDE-native renders, all from a single harness.yaml:

.claude/                 ← single source: agents, skills, commands/hm/, hooks, memory, observability
├── .cursor/             ← + Cursor-native rules, hooks, mcp.json     (if cursor target)
├── .codex/              ← + Codex-native config.toml, agent TOMLs    (if codex target)
├── AGENTS.md            ← + Codex root instructions                  (if codex target)
└── .agents/skills/      ← + Codex skill paths                        (if codex target)

Layered on top of the base render:

Domain packs — --add-domain python inlines a stack-specific standards block into the 5 reviewer agents (code, security, performance, concurrency, ux). python is the only sample that ships pre-filled; --add-domain <other> scaffolds a blank user-side stub at .claude/agents/_standards/<name>.md for teams to fill in without forking harness-maker.
Foreign config absorption — pre-existing Cursor rules, Aider config, Copilot instructions get LLM-translated into harness.yaml axes. @hm:harness:* inverted markers keep them synced on re-renders.
Sibling repos — frontend + backend + library share one harness session via relative-path bindings in harness.yaml. Cross-machine portable.
Ref folders + Second Brain — registered project docs + Obsidian vault notes become searchable from any stage.

Result: Every IDE you use sees the same agents, the same skills, the same workflows — natively. Zero manual porting.

4. EVOLVE — How does the harness stay useful?

Three independent feedback loops keep the harness aligned with your project as it grows.

A. Your edits survive upgrades. Hand-tune agents/code-reviewer.md. Add a custom skill. Edit a CLAUDE.md section. All survive harness-maker make --update because content hashes per file plus @hm:user:* block-merge markers separate your edits from template-owned regions. @hm:harness:* inverted markers do the opposite (for foreign config: preserve outside, replace inside).

B. The harness learns your project.

.claude/memory/wiki.md — patterns, conventions, gotchas. Each /hm:wrapup appends new entries.
.claude/memory/failures.md — recurring mistakes, deduplicated by slug + count.
.claude/memory/session/<date>.md — non-obvious decisions per workday.
When any failure slug reaches count ≥ 3, wrapup writes a proposal to pending-proposals.md — a new skill, rule, or hook that would have prevented the recurrence. The user reviews and decides whether to ingest.

C. Defaults adapt to your overrides. Override telemetry (100% local) tracks every time you change a default. /hm:health scores three layers — detection→recommendation conversion, override stability, audit cadence — and surfaces a Bronze → Silver → Gold → Platinum tier with ranked action items. Drift the harness too far from your real usage and the audit will tell you exactly which default is wrong.

Result: The harness improves with the project, not against it. No silent re-litigation of decisions you've already made.

For the mechanics behind each step — full procedures, decision paths, internal invariants — see docs/HOW-IT-WORKS.md.

Quickstart

Universal Bootstrap Prompt

Paste into your AI agent — it detects whether you're in Claude Code, Cursor, or Codex, runs the matching plugin install via Bash (no slash-command typing by you), and bootstraps the harness end-to-end. One paste, every supported IDE.

Per-IDE user actions (the AI does the rest):

IDE	Paste	Typed slash	GUI / restart action	Total user actions
Claude Code	1	1 (`/reload-plugins`)	1 short message (e.g. `go`) to re-trigger Claude	3
Cursor	1	0	1 (`Ctrl+Shift+P → Reload Window`)	2-3
Codex CLI	1	0	0-1 (restart codex if `/plugins` doesn't show harness-maker)	2-3

First-use Bash approval. Claude Code / Cursor will request Bash permission for the install commands shown below. Approve only the exact commands in the prompt (claude plugin install harness-maker@harness-maker for Claude Code, git clone --depth 1 https://github.com/Ecro/harness-maker.git ~/.cursor/plugins/local/harness-maker for Cursor). Do NOT grant blanket Bash(*) — if the AI requests a different command, stop and inspect. The Cursor clone fetches the current main branch of a public repo; future git pull updates are not integrity-verified — pin a release tag manually if this matters for your threat model.

Install harness-maker for this project and bootstrap the harness end-to-end.

You are an AI agent running inside one of: Claude Code, Cursor, or Codex CLI.
Detect which one (silently — do NOT ask me), run the matching plugin install
via Bash (NOT slash commands typed by me), then invoke harness-maker:make
and hm:health via the Skill tool.

Step 1 — Detect the host IDE (silent):
  - Claude Code  → $CLAUDE_CODE is set, or ~/.claude/ exists, or you have
                   the Bash tool with access to the `claude` CLI
  - Cursor       → $CURSOR_SESSION is set, or ~/.cursor/ exists, or you have
                   Cursor's plugin manager available
  - Codex CLI    → $CODEX_SESSION is set, or ~/.codex/ exists, or the
                   `codex` command is on PATH

Step 2 — Install harness-maker as a plugin via Bash (skip when the
harness-maker:make skill is already available):

  IF Claude Code:
    Bash: claude plugin marketplace add Ecro/harness-maker
    Bash: claude plugin install harness-maker@harness-maker
    Then tell me verbatim: "Type /reload-plugins, then send any short message (e.g. `go`) to re-trigger me."

  IF Cursor:
    Bash: git clone --depth 1 https://github.com/Ecro/harness-maker.git ~/.cursor/plugins/local/harness-maker
    Then tell me verbatim: "Reload the Cursor window now (Ctrl+Shift+P → Reload Window)."

  IF Codex CLI:
    Bash: codex plugin marketplace add Ecro/harness-maker
    Then tell me verbatim: "Open Codex's /plugins list; if harness-maker isn't enabled, restart codex."

  IF you can't tell which IDE (or none of the above), STOP and ask me which
  IDE you're in.

Step 3 — After I confirm the reload/restart, invoke harness-maker:make via the
Skill tool (do NOT ask me to type any slash command) and drive the interview:
  • Confirm preset (Side / Production), dev mode, target IDEs, and locale.
  • Accept the recommended defaults unless I object.

Step 4 — After harness-maker:make finishes, invoke hm:health via the Skill tool
(again, do NOT ask me to type it):
  Produces a 3-section dashboard at .claude/observability/dashboard.md
  (structural / external-risks / personalization). Tell me the personalization
  tier and any high-priority action items.

Conduct the conversation in the language of my first reply; default to English
if you cannot tell.

Manual install — pick one for your IDE

Claude Code (plugin marketplace)

/plugin marketplace add Ecro/harness-maker
/plugin install harness-maker@harness-maker

The marketplace metadata and the plugin both ship in this repo, so two commands cover discovery + install. Reload Claude Code; /harness-maker:make becomes available.

Codex CLI (plugin marketplace)

codex plugin marketplace add Ecro/harness-maker
# pin a specific release for reproducibility:
codex plugin marketplace add Ecro/harness-maker --ref v0.14.3

marketplace add is the install — there is no separate install step in Codex. Then run /plugins inside Codex to enable.

Cursor (Team marketplace import OR local symlink)

Cursor's public plugin marketplace is curated (cursor.com/marketplace/publish, submit-and-review). Direct GitHub install is still on the roadmap as of Cursor 2.5+. Two working paths today:

A. Team marketplace (Team / Enterprise plans): Dashboard → Settings → Plugins → Team Marketplaces → Import → paste https://github.com/Ecro/harness-maker → push to team members.

B. Local dev install (community pattern):

git clone https://github.com/Ecro/harness-maker.git ~/.cursor/plugins/local/harness-maker

Reload Cursor (Ctrl+Shift+P → "Reload Window") to pick up the new plugin.

PyPI (no IDE plugin — CI, headless, scripts)

For environments where no IDE plugin path applies (CI scripts, headless servers, automation, or strong preference for a Python CLI tool):

uv tool install harness-maker          # POSIX / macOS / WSL
# Native Windows / PowerShell:
irm https://astral.sh/uv/install.ps1 | iex
uv tool install harness-maker

Then:

cd your-project
harness-maker profile . --json
harness-maker make . --preset Production --locale en --targets claude-code,cursor

This path is fine for one-off renders but skips the in-IDE slash-command surface — /hm:health, /hm:plan, /hm:execute, etc. only appear after the IDE plugin loads them.

After the harness is rendered, re-run with flags to evolve it:

harness-maker make . --audit           # score the existing .claude/ against the rubric
harness-maker make . --add NAME        # graft on a single skill/agent/command
harness-maker make . --remove NAME     # surgically remove one component
harness-maker make . --promote NAME    # move an ad-hoc artifact into the harness

Requirements

Dependency	Notes
Python 3.12+ and `uv`	Required wherever Claude Code runs against your project — even if your project's primary language is Rust, Node, or Go. Hooks invoke `uv run python -m harness_maker.gates.*`; without `uv` they are silent no-ops.
Claude Code CLI (plugin + hook support)	Optional for plugin mode (`claude --plugin-dir /path/to/harness-maker`). Not required — the `harness-maker` CLI can generate harnesses independently.
Cursor IDE 2.4+ (3.2+ recommended)	Optional. Reads `.claude/agents/`, `.claude/skills/`, and `.claude/commands/hm/*.md` natively (verified empirically in 0.6.2, re-confirmed 0.7.1 — see `tests/cursor-compat/results-2026-05-08.md`). Hooks render to a separate `.cursor/hooks.json` with Cursor-native schema; both files are emitted when `targets` includes `cursor`. Cursor 3.0+ adds native `/worktree` and `/best-of-n` which coexist safely with harness-maker's prefix-matched cleanup.
OpenAI Codex CLI	Optional. When `targets` includes `codex`, harness-maker renders Codex-native `.codex/` config, `AGENTS.md`, and `.agents/skills/` assets from the same workflow definitions.
Git	Required for worktree isolation (every `/hm:execute` and `/hm:loop` run).

Features

Grouped by what they do for your project, not by component.

🎯 Personalization — fits your project

Project-shaped harness. Profiler + 10-dimension interview produce structurally different harnesses for Side experiments vs Production services. Reviewer count, workflow stages, security gate depth, mechanical-check enforcement — all derived from your answers.
12+ stack detection. Python · Node · Rust · Java · Kotlin · Swift · Dart · Ruby · PHP · C# · Elixir · Scala · C/C++ · Zig · Haskell. Framework + package-manager + CI provider parsed from manifests, not keyword-guessed. 24h cache ceiling on manifest mtime.
Confidence-bucketed defaults. Every detection declares HIGH/MEDIUM/LOW confidence. HIGH → silent default with # detected: provenance. MEDIUM → explicit interview prompt. LOW → skipped (you decide). Regression test guards against surprise silent-default changes between minor versions.
Adaptive personalization tier. /hm:health scores three layers (detection→recommendation conversion, override stability, audit cadence) and reports Bronze → Silver → Gold → Platinum tier. Auto-proposes default changes when one axis has been overridden 5+ times. 100% local telemetry; nothing leaves the project.
Domain packs. --add-domain python inlines stack-specific standards into the 5 reviewer agents. python is the only pre-filled sample; other domain names scaffold blank user-side stubs that teams fill in without forking harness-maker.
Single command, no subcommand sprawl. /harness-maker:make is the only entry point. Everything else is a flag (--audit, --add, --remove, --promote, --add-domain, --reinterview, --update).

🛡️ Trust — grade-gated work

Grade-based auto-fix loop. /hm:review doesn't just report. It applies consensus-passed fixes → re-reviews (selectively, only reviewers whose scope was touched) → regrades, until grade ≥ threshold (default A) or max_review_rounds is exhausted. Weak-consensus and manual-only findings are never auto-applied.
Mechanical pre-checks before any LLM token. Lint clean + tests green are enforced before reviewers spawn. First non-zero exit emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts. --no-auto-fix does not skip this gate.
Conditional reviewer routing. .env change → security-reviewer. /perf/ → performance-reviewer. .tsx → ux-reviewer. Async/locking → concurrency-reviewer. 10× cheaper than fanning out to every reviewer on every diff.
2-pass redaction (+47pp precision). Pass 1 strips PR metadata so findings can't anchor on author/title. Pass 2 restores full context — reviewers must validate or drop each Pass 1 finding. Ablation-measured precision gain on anchoring-prone diffs.
7 security gates. secrets · permissions (settings.json over-grant) · hook-injection (AST scan for rm -rf, curl|sh, eval) · CVEs (OSV.dev) · hallucination (AST scan for non-existent imports) · prod-name guard · prompt-injection. Findings stay local in .claude/observability/security/.
Privilege separation. Reviewer agents deny Write / Edit / interpreter Bash calls. Executor agents allow only .worktrees/** writes + paired Edit/Write denies on system paths. Defense-in-depth survives prompt injection, agent compromise, and tool_input poisoning.
Worktree isolation per run. Every execute runs in a fresh git worktree. Failed runs preserve evidence; successful runs auto-cleanup with prefix-match (Cursor's own worktrees never touched).

🌱 Self-evolving — grows with your project

Block-merge preservation. Hand-tune any agent, skill, CLAUDE.md section. Survives --update because content hashes per file plus @hm:user:* markers separate your edits from template-owned regions. @hm:harness:* inverted markers do the opposite for foreign config absorption.
Three-tier memory accumulation. wiki.md (patterns) · failures.md (recurring mistakes deduplicated by slug) · session/<date>.md (non-obvious decisions). Wrapup writes these automatically; every stage reads them automatically.
Self-improving failure proposals. When a [fail:*] slug recurs 3× across sessions, wrapup writes a proposal to pending-proposals.md — a new skill, rule, or hook that would have prevented the recurrence. You review and decide whether to ingest.
ADR system as binding execute constraints. Architecture Decision Records promoted during /hm:plan are hard constraints on /hm:execute. Conflicts surface as blockers, never silently proceed. Future sessions don't re-litigate settled decisions.
Refdocs search. Register architecture docs, API specs, design docs in harness.yaml. refdocs-search skill gives lossless full-text search — no chunking, no RAG index.
Optional Second Brain. Obsidian vault integration. Allowlisted write folders by note type (decision · preference · failure · project · reference · journal). Cross-session memory survives plugin reinstalls and machine moves.
Brownfield-safe upgrades. Reconciler hashes existing .claude/ and offers per-conflict keep/replace/both. Apply is ADD-only with timestamped backups. User edits never silently overwritten.

🌀 Anti-rot — survives the ecosystem

4-source weekly crawl. Anthropic blog/changelog · anthropics/claude-code GitHub releases · arxiv cs.SE/CL/CR · OSV.dev CVEs. Manifested as pending items in /hm:health.
Adaptive relevance filter. Threshold starts at 0.7, adjusts ±0.05 based on your accept/reject history per source. Always manual-confirmed — no --auto-apply path exists.
Unified health audit. /hm:health composites three orthogonal layers — Structural (70% deterministic + 25% LLM rubric + 5% cache diagnostic), External Risks (anti-rot pending queue), Personalization (Bronze→Platinum tier). One 3-section dashboard at .claude/observability/dashboard.md. No auto-apply ever (ADR-001).
SessionStart drift reminder. Hook fires on every session open and warns if running plugin version differs from the version that rendered the harness — so you notice when /plugin update needs a re-render. Detector compares against latest cached plugin version (not just imported __version__).
Cache-miss classification. Prompt-cache diagnostic reports min_threshold · invalidation · ttl · first (cold start). 5% weight in AI-readiness distinguishes cold-start misses (benign) from structural misses (actionable).

🎛️ Multi-IDE — one source, every IDE

Three targets from one harness. Claude Code + Cursor (2.4+, 3.0+ recommended) + Codex CLI. Single .claude/ source tree; Cursor reads it natively plus its own .cursor/ for hooks and rules; Codex reads .codex/, AGENTS.md, and .agents/skills/. All rendered from one harness.yaml.
Foreign AI config migration. Detects 6 known foreign configs (.cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, .github/copilot-instructions.md). LLM-translates them into harness.yaml axes. @hm:harness:* inverted markers keep them synced across re-renders.
Sibling repos. Frontend + backend + library bind into one harness session via relative paths in harness.yaml. Cross-machine portable (relative paths survive git clone).

🔁 Workflow primitives — the rest of the toolchain

Deep interview before every implementation. /hm:spec runs a 6-category interview (Intent → Outcomes → In-Scope Scenarios → Non-Goals → Constraints → Verification) scored for completeness. /hm:plan runs a 9-category interview (scope → architecture → contract → risk → testing → phasing → dependencies → failure handling → observability) in impact order. Every settled decision promotes to a binding ADR.
Autoloop with adaptive interview + 4-gate convergence. /hm:loop runs time-and-iteration-bounded loops. autoloop-driver reads the goal, asks only what's missing, locks intensity + exit checklist, then requires mechanical checks + LLM judgment + regression comparison + 2-iter convergence streak before accepting completion.
3-tier context loading + compaction recovery. Hot tier (today's session) · Warm tier (failures + wiki first 60/40 lines) · Cold tier (git log / PLANs on demand). PreCompact hook flushes session before context compaction; next turn detects the marker and resumes from the last in-progress phase.
Cross-process memory safety. .claude/memory/ writers serialise via re-entrant POSIX flock. Telemetry hooks append atomically via raw os.write() on O_APPEND (single-syscall, ≤PIPE_BUF) so concurrent Claude Code + Cursor sessions cannot interleave JSONL lines.

For the complete mechanics — all procedures, decision paths, internal invariants — see docs/HOW-IT-WORKS.md.

How it works

flowchart TD
    A["/harness-maker:make"] --> B["Profile\n(stack, scale, lifecycle)"]
    B --> C["Interview\n(preset + 10 dims + targets)"]
    C --> D["Synthesize\n(deterministic Blueprint)"]
    D --> E["Render\n(Jinja2 + provenance frontmatter)"]
    E --> F{Brownfield?}
    F -- No --> G["Write .claude/ directly"]
    F -- Yes --> H["Reconcile\n(hash-based keep/replace/both)"]
    H --> G
    G --> I["Extra targets?\nRender .cursor/ and/or .codex/ assets"]
    I --> J["User runs /hm:* commands"]
    J --> K["Weekly /hm:health (Step 2)\n4-source anti-rot crawl\n→ manual confirm"]

14 mechanisms (M1-M14) back every feature. See docs/ARCHITECTURE.md for the full breakdown including the privilege-separation model, security gate triggers, and reconcile invariants.

Since 0.12.0 the synthesis pipeline also threads through a typed Recommendation registry: every recommend_<axis>(profile, project_dir) function declares per-detection confidence (ADR-007), and the interview.py dispatcher routes by bucket. Detection results land in ~/.cache/harness-maker/profile-<repo-hash>.json with manifest-mtime + 24h-ceiling invalidation. Foreign AI config files detected at /hm:configure time can be imported into harness.yaml and re-rendered single-source with @hm:harness:* inverted block markers (ADR-009).

Slash commands the harness exposes

After install, the rendered harness exposes commands under /hm:*:

Atomic stages (always available)

Command	Purpose
`/hm:research`	Gather facts, user-workflow signals, best practices, and options
`/hm:spec`	Write acceptance criteria from research
`/hm:plan`	Decompose spec into phases with exit criteria
`/hm:execute`	Implement with TDD + worktree isolation
`/hm:review`	Multi-reviewer consensus (mechanical pre-check → conditional routing)
`/hm:wrapup`	Clean, document, commit
`/hm:verify`	6-check gate before completion

Fused workflows (preset-generated, user-renameable)

Fused workflows combine atomic stages into a single command. The interview generates a starter set; you can add, remove, or rename them in harness.yaml.

Preset	Default fused workflows
Side	`/hm:plan-exec-rev` · `/hm:exec-rev` · `/hm:exec-rev-wrap` (default)
Production	`/hm:exec-rev-wrap-ver` (default) · `/hm:exec-rev-wrap` · `/hm:plan-exec-rev` · `/hm:exec-rev` · `/hm:res-spec-plan`

Utility commands

Command	Purpose
`/hm:loop "<goal>"`	Autoloop driver — `feature` or `improve` mode, time/iter-bounded
`/hm:ai-readiness`	3-layer readiness score + P0/P1/P2 ranked actions
`/hm:personalization-audit`	Composite-score rubric (Bronze/Silver/Gold/Platinum) from telemetry + harness.yaml + ProjectProfile. Reads `.claude/observability/adaptive/overrides.jsonl`; outputs ranked ActionItem list with evidence schema. ADR-011 (v0); calibration deferred to 30+ project sample.
`/hm:refresh`	Anti-rot crawl — manual confirm required

How it compares

Most AI-coding harnesses ship a fixed bundle — same agents, same prompts, same defaults for every project. harness-maker takes a different stance on five axes:

Axis	What harness-maker does
Project-tailored synthesis	Profiler reads 12+ stack/framework/CI signals before any prompt. A 10-dimension interview locks the rest. A `Side` experiment and a `Production` service get structurally different harnesses (reviewer counts, workflow stages, security gates) — not the same set of files with different flags.
Edit-preserving upgrades	Hand-edit any agent, skill, or CLAUDE.md. Block-merge markers (`@hm:user:*`) carry your edits across `--update`. No "your customisations will be overwritten" warning at the top of every file.
Anti-rot crawl	Weekly fetch across Anthropic blog · GitHub releases · arXiv · OSV CVEs. Adaptive relevance filter learns from your accept/reject history. Manual confirmation is mandatory — no silent auto-apply.
Multi-IDE single source	One `harness.yaml` renders Claude Code, Cursor, and Codex CLI native assets. Foreign configs (`.cursor/rules/`, `AGENTS.md`, `.aider.conf.yml`, `.continue/`, `.github/copilot-instructions.md`) are absorbed on first run — no manual port.
Privilege-separated reviewers	Reviewer agents (`code`, `security`, `performance`, `concurrency`, `UX`) have read-only permissions — no `Write`, no `Edit`, no shell interpreters. The stage orchestrator applies fixes, preserving the boundary. Mechanical checks (lint/tests) gate the LLM reviewer before any token is spent.

harness-maker generates and owns the lifecycle of your .claude/ directory. A static template gives you a starting point. harness-maker gives you a starting point that knows who it was created for, and can be updated without losing your changes.

Configuration

The interview writes answers to .claude/harness.yaml. Key dimensions:

preset: Production           # Side | Production
locale: en                   # en | ko | <any — unknown falls back to en>
dev_mode: spec-driven        # spec-driven | task-driven
targets:                     # which runtimes to drive
  - claude-code
  - cursor
  - codex
recommended_model: claude-opus-4-7

ref_folders:
  - path: ../architecture-docs
    glob: "**/*.{md,txt,pdf}"

second_brain:
  enabled: true
  backend: filesystem
  project_id: my-app
  vault_path: ../obsidian-vault
  trusted_allowlist: true       # configured write folders need no confirmation/backup
  required_frontmatter: [type, created, updated, tags, links]
  folders:
    - path: Projects/my-app
      read: true
      write: true
      note_types: [decision, preference, failure, project, reference, journal]

sibling_repos:
  - ../backend

reviewers:
  enabled: [code, security, performance, ux, concurrency]
  routing: conditional       # conditional | always-all
  mechanical_checks:         # pre-LLM stop-on-first gate (optional)
    - ruff check .
    - uv run pytest tests/unit -x -q

worktree:
  scope: [execute, plan]     # which stages run in a fresh worktree
  cleanup: on_success        # on_success | always | never

anti_rot:
  enabled: true
  threshold: 0.7             # adaptive — adjusts ±0.05 based on accept/reject ratio

context_lint:
  strict: true               # warn on overrun | block

memory:
  files: [failures.md, wiki.md]

adaptive:
  disable_telemetry: false   # opt-out per ADR-005; 100% local capture
  audit_session_threshold: 30  # SessionStart hint after N axis overrides
  audit_days_threshold: 14     # SessionStart hint after N days without audit

All adaptive features are 100% local. tests/unit/test_no_network.py asserts no socket call during telemetry emit, audit, or SessionStart hook execution (ADR-005 positive obligation).

Run /harness-maker:make again and choose Update (same settings, pick up template improvements) or Full reconfigure to change any dimension.

Obsidian Second Brain

second_brain connects a Markdown/Obsidian vault as a typed knowledge graph for stage-aware memory. hm-research, hm-plan, hm-review, and hm-wrapup use different note types (reference, project, decision, preference, failure, journal) instead of loading the whole vault. Writes are full Markdown writes inside configured write: true folders only. Configure narrow folders: the allowlist is trusted completely, with no per-write confirmation or backup. For multiple projects sharing one vault, give every project a distinct project_id and put writable folders under a path segment with the same value (for example Projects/my-app). harness-maker rejects writable Second Brain folders that do not include the configured project_id, which keeps one project from writing into another project's note namespace.

Setup walkthrough

At interview time — when prompted for vault_path, supply the absolute path to your Obsidian vault root (or a not-yet-created subfolder of it; the harness creates the subfolder on first write iff the parent has .obsidian/). The interview then asks for project_id and a writable folder, defaulting to 99_HM/{project_id}/ (matches the 99_*/01_* numeric-prefix organization style).
Post-install adjustment — run /hm:configure to revisit Second Brain settings. The slash command dispatches to harness-maker configure-second-brain --check, which inspects state and returns guidance JSON so the slash command can prompt only when something is missing. To add a folder non-interactively, call harness-maker configure-second-brain --add-folder 99_HM/my-project/.
Existing harnesses upgraded from a pre-fix release — the loader now tolerates folders: [] (logs a one-shot warning + remediation hint pointing at /hm:configure); writes raise SecondBrainError with the same hint instead of an obscure "not under a configured write folder" message.

Internals: the rendered harness.yaml carries a provenance frontmatter block; all readers route through harness_maker.io_utils.load_harness_yaml (the canonical multi-document-tolerant loader) to avoid the parser-strategy drift that produced the original bug.

Targets

targets is a multi-select. Choose claude-code, cursor, codex, or any combination. Claude Code is the default; Cursor and Codex add runtime-specific assets while preserving the same preset, workflows, skills, reviewers, and safety model.

Cursor target

Run /harness-maker:make and pick targets: [cursor] or [claude-code, cursor] at the interview. The renderer adds:

.cursor/rules/harness.mdc — always-on workflow rules with Cursor-legal frontmatter (description / globs: [] / alwaysApply: true)
.cursor/hooks.json — Cursor-native hooks schema (lowercase camelCase keys, version: 1, flat {matcher, command}). Deliberately different from .claude/hooks/hooks.json (PascalCase, nested {hooks:[…], matcher}); each IDE reads only its own file. Don't try to collapse them — Cursor will silently stop firing hooks. See tests/cursor-compat/results-2026-05-08.md for the kairos 0.5.7 forensic that proved this empirically.
.cursor/mcp.json — Cursor MCP server config. Populated from harness.yaml.mcp_servers (0.6.2+); defaults to {"mcpServers": {}} when no servers are configured. Inner shape (command, args, env) is type-validated on parse with a warning log when entries are dropped.

.claude/agents/, .claude/skills/, and .claude/commands/hm/ are single-source — Cursor 2.4+ reads them natively (forensic-verified). Hooks are the only asset that requires per-IDE rendering because the schemas diverge by design.

Recommended model

harness.yaml.recommended_model defaults to claude-opus-4-7 and propagates to agent frontmatter. Cursor users may override model selection in their IDE. The harness does not rewrite prompts to be model-agnostic — <thinking> blocks and Claude-specific patterns are preserved deliberately.

Verification

Per-release Cursor compatibility is tracked in tests/cursor-compat/:

MANUAL_CHECKLIST.md — A1–A4 (agent dispatch, hook fire, skill auto-discovery, slash command + Q&A loop) covering both IDEs
RESULTS.md — PASS/FAIL/PARTIAL grid you fill while running the checklist
results-2026-05-08.md — kairos 0.5.7 forensic that resolved Q-A (hooks discovery) and Q-B (commands discovery) without an IDE-driven manual run; future Cursor verifications append a new dated results-*.md
fixture/ — minimal .claude/ for opening directly in either IDE

Automated CI guards the dual-schema invariants regardless of manual fixture runs:

test_cursor_hooks_uses_lowercase_native_schema — fails if .cursor/hooks.json accidentally adopts Claude PascalCase
test_no_cursor_commands_rendered — fails if a future change starts emitting .cursor/commands/hm-*.md mirrors (Cursor reads .claude/commands/ natively)
test_render_agents_have_structured_permissions_frontmatter — fails if any agent template loses its permissions.allow/deny block (Cursor 2.5+ subagent permission inheritance gap)

Codex target

Run /harness-maker:make and pick targets: [codex] or include codex with the other targets. The renderer adds:

AGENTS.md — Codex's top-level project instruction file, rendered without YAML frontmatter so Codex displays clean instructions.
.codex/config.toml and .codex/agents/*.toml — Codex-native configuration and agent registrations.
.codex/hooks.json — Codex hook wiring, including Codex-specific permission events and file-edit tool matchers.
.agents/skills/*/SKILL.md — the existing harness skills plus stage, workflow, and loop trigger skills in the layout Codex discovers.

Codex TOML files are rendered as pure TOML, not markdown-with-frontmatter. AGENTS.md uses block-merge markers so user additions survive re-renders, while .codex/*.toml files are regenerated from the selected target configuration.

Structured questions in Codex: Stage and loop skills instruct the agent to use the request_user_input tool for interview questions. This tool is available in Plan mode by default. To enable it in Code mode, add default_mode_request_user_input = true under [features] in .codex/config.toml. If the tool is unavailable at runtime, the agent falls back to asking in its response text.

Reconcile rules (re-rendering an existing harness)

Re-running /harness-maker:make on an existing harness uses a hash-driven KEEP rule: if a file's content_hash frontmatter matches the new template's hash, it's "ours" — safe to overwrite. If it differs (user-edited), it's "theirs" — kept.

Trade-off: when harness-maker bumps a template on a minor release, the existing file's hash no longer matches, so reconcile picks KEEP even if you didn't edit the file.

To pick up template updates after a version bump:

rm .claude/harness.yaml
/harness-maker:make
# (previous .claude/ is auto-backed up to .backup-<ISO>/)

.cursor/rules/*.mdc follow the same KEEP behavior. A future phase introduces a sidecar .hm-meta.yaml so harness-maker can hash-track Cursor assets without polluting Cursor frontmatter.

@hm:harness:* inverted markers (0.12.0) — for foreign-AI-config files we generate post-import (.cursor/rules/*.mdc, AGENTS.md, CLAUDE.md, etc.), content INSIDE @hm:harness:<id> markers is harness-owned (replaced on every render); content OUTSIDE is user-owned (byte-for-byte preserved). The block-merge parser dispatches per file extension: HTML comments for .md/.mdc, # @hm:harness: for .yml/.yaml, top-level _hm_harness JSON key for .json. 0.11.x files (frontmatter generated_by: harness-maker + zero @hm:harness:* markers) are treated as wholly harness-owned on first encounter post-upgrade and re-rendered into the new marker family (ADR-009 amendment).

Observability

All observability is 100% local — nothing is transmitted externally.

File	Contents
`.claude/observability/dashboard.md`	AI-readiness score, dimension breakdown, ranked action items
`.claude/observability/metrics-YYYY-MM-DD.jsonl`	Per-turn telemetry (cache hit %, tool calls, durations) — date-rotated daily (ADR-103, 0.7.1). Pre-0.7.1 `metrics.jsonl` is read as the trailing legacy shard.
`.claude/observability/refresh/raw-*.jsonl`	Anti-rot crawl evidence (accepted / rejected items)
`.claude/observability/security/findings-*.jsonl`	7-gate security scan findings
`.claude/observability/adaptive/overrides.jsonl`	`harness_yaml_override` events with `schema_version: 1`, dual capture sites (`/hm:configure` exit primary + SessionStart secondary), dedup-keyed
`.claude/observability/adaptive/last-audit.txt`	Last `/hm:personalization-audit` run timestamp

Run /hm:ai-readiness to regenerate the dashboard on demand.

Marketplace

Marketplace manifests are present for each runtime:

.claude-plugin/plugin.json — Claude Code plugin spec
.cursor-plugin/plugin.json — Cursor Marketplace spec
.codex-plugin/plugin.json — Codex plugin spec

Listings are pending. Until then, install locally:

# Any IDE — CLI install (recommended)
uv tool install harness-maker          # from PyPI (when published)
uv tool install /path/to/harness-maker # pre-PyPI, from clone
harness-maker make . --targets claude-code,cursor,codex

# Claude Code — plugin mode (optional, enables /harness-maker:make command)
claude --plugin-dir /path/to/harness-maker

# Cursor — open the repo folder directly in Cursor as a workspace plugin

# Codex — render Codex-native assets in your project harness
harness-maker make . --targets codex

Stability

harness-maker is on 0.x and stays there until enough projects depend on it that a 1.0 commitment is honest (see ADR-001 in work-docs/PLAN-oss-readiness-audit.md).

Frozen surfaces — these will not break in any 0.x.minor without a deprecation cycle:

Slash command names: /hm:make, /hm:research, /hm:plan, /hm:execute, /hm:review, /hm:wrapup, /hm:verify, /hm:health, /hm:loop, /hm:configure, /hm:personalization-audit, /harness-maker:make.
harness.yaml top-level keys: targets, preset, dev_mode, locale, reviewers, skills, agents, worktree, anti_rot, observability, ref_folders, second_brain, recommended_model.
Plugin manifest schemas: .claude-plugin/plugin.json, .cursor-plugin/plugin.json, .codex-plugin/plugin.json — fields covered by each marketplace's published spec.
Local-only telemetry guarantee — see PRIVACY.md. A documented-vs-actual mismatch is treated as a P0 bug.

Everything else may break in any 0.x.minor. Internal Python APIs, file formats under .claude/observability/, template contents, ADR numbering, reviewer prompt phrasing, interview wording. Pin a specific release (uvx --from harness-maker==0.17.0 ...) if reproducibility matters before 1.0.

What "broken" looks like in practice: read CHANGELOG.md. Patches inside a single minor (e.g., 0.15.0 → 0.15.3) have included fixes that rolled forward without a deprecation; that's the kind of velocity 0.x is for.

FAQ

Q: Why Python? My project is Rust / Node / Go. The hooks (permission_gate, worktree_gate, telemetry) call uv run python -m harness_maker.* at PreToolUse / PostToolUse boundaries. This doesn't touch your project's toolchain — uv and harness_maker need to be on the path, but only to run hooks. Your project's build system is untouched.

Q: Why does it require uv? uv gives a hermetic, fast Python environment without polluting the system or your project's virtualenv. Hooks run in milliseconds without activating anything.

Q: Will harness-maker overwrite my hand-edits when I re-render? No. Every generated file carries a content_hash in its provenance frontmatter. Re-render compares the new template's hash against the file on disk. If they differ — meaning you edited the file — it keeps yours. See Reconcile rules.

Q: What's the difference between Side and Production? Side is lean: 1 reviewer (code), verify-before-completion optional, worktree scope [execute]. Production is thorough: 5 reviewers, verify required, worktree scope [execute, plan], security on high-finding = block. Both share the same anti-rot and caching defaults.

Q: Does anti-rot ever auto-apply? Never. Every anti-rot item surfaces via a structured question (AskQuestion in Cursor, AskUserQuestion in Claude Code) in /hm:refresh. There is no --auto-apply flag and no plan to add one. The rationale: a wrong patch is worse than a stale harness.

Q: Can I use only Claude Code? Only Cursor? Only Codex? Yes. targets is a multi-select at the interview. Claude Code uses .claude/; Cursor reuses most .claude/ assets and adds .cursor/; Codex adds AGENTS.md, .codex/, and .agents/skills/.

Q: Do my prompts or telemetry leave my machine? No. metrics.jsonl, dashboard, and security findings are written to .claude/observability/ locally. Anti-rot crawls read public sources (Anthropic blog, arxiv, GitHub, OSV.dev) but never uploads anything.

Q: How do I pick up template improvements after a /plugin update? Run /harness-maker:make → choose Update. For files where your hash matches the old template, the new version is applied. For files you edited (hash mismatch), yours is kept. To force a full refresh, rm .claude/harness.yaml and re-run.

Q: What are mechanical_checks and when should I use them? Shell commands listed under reviewers.mechanical_checks in harness.yaml run at the start of every /hm:review — before any LLM reviewer spawns. The first command that exits non-zero emits ## MECHANICAL_BLOCK: <cmd> exit=<N> and halts review immediately (CHANGES_REQUESTED). Use them for fast, deterministic gates (lint, type-check, unit test) that shouldn't waste LLM tokens when the basics are broken. The list is user-managed; harness-maker never populates it automatically. --no-auto-fix does not skip mechanical checks — they are a hard gate, not part of the fix loop.

Q: Why doesn't harness-maker rewrite prompts to be model-agnostic? The prompts are tuned for claude-opus-4-7 — <thinking> blocks, role framing, chain-of-thought structure. Rewriting for model-neutrality would degrade quality on the recommended model for hypothetical gains on others. Override recommended_model in harness.yaml if you want a different model; the prompts remain as-is.

Personalization Architecture

harness-maker 0.12.0 introduces three tracks of personalization depth:

Detection Depth (Track A) — harness_maker.profile.profile() detects stack (12+ languages), framework, package manager, and CI provider. Results land in ProjectProfile and feed the recommendation framework. Cache at ~/.cache/harness-maker/profile-<repo-hash>.json (manifest-mtime invalidation + 24h TTL).
Foreign AI Config Migration (Track D) — /hm:configure detects .cursor/rules/, AGENTS.md, CLAUDE.md, .continue/config.json, .aider.conf.yml, and .github/copilot-instructions.md. With user confirmation, harness-maker imports the foreign config's intent into harness.yaml and re-generates the foreign file as part of the harness on every render. The @hm:harness:* block markers protect user customizations outside the marked regions.

Cursor power-user constraint (ADR-003) — single-source means harness re-generates .cursor/rules/ on every render. If you prefer Cursor-only ownership of those rules, the only opt-out today is to drop the cursor target from harness.yaml.targets. A dedicated opt-out flag is TODO for a future PLAN.
Adaptive (Track B start) — harness.yaml.adaptive.disable_telemetry: false (opt-out per ADR-005) enables override telemetry. /hm:personalization-audit scores your harness composite (0–100; Bronze < 40 < Silver < 65 < Gold < 85 ≤ Platinum) and surfaces ranked action items with evidence. SessionStart drift hint fires after 30 overrides or 14 days without an audit.

All adaptive features run 100% locally — no network calls (asserted by tests/unit/test_no_network.py).

0.12.1 patch extended CACHED_MANIFESTS with literal filenames from STACK_GLOB_MANIFESTS (stack.yaml, package.yaml) so Haskell projects now properly invalidate the profile cache on those manifests' mtime bump.

Roadmap

Done (0.12.0–0.12.1):

Track A (Detection Depth): 12+ stacks/frameworks/pkg-mgr/CI
Track D (Foreign AI Config Migration): 6 config types, single-source re-render, @hm:harness:* markers, 0.11.x migration
Track B-start (Adaptive): override telemetry, /hm:personalization-audit, SessionStart drift surface

Next (0.13.0 — Track B completion + Cursor opt-out):

Track B-extra: B2 permission-frequency capture + B3 reviewer-signal aggregation
harness.yaml.cursor.opt_out_render flag for Cursor power-users (ADR-003 documented constraint)
github/spec-kit external e2e fixture vendoring (currently pytest.skip with TODO)

Future (0.14.0+ — Track C strategic expansion, ranked by user value):

C2 privacy/regulated tier (HIPAA/PCI/GDPR — enterprise entry barrier)
C1 team-profile axis (solo vs small-team vs large-team)
C3 code-style detection (docstring conventions, naming patterns)
C4 cross-project user-defaults
C5 per-developer overlay in shared harness

Deferred-by-data:

Rubric v0 calibration after 30+ projects accumulate /hm:personalization-audit runs (passive trigger)

Standing items:

PyPI publish — remove the editable-from-clone requirement.
Claude Code + Cursor + Codex Marketplace listings — submit all plugin manifests.
.hm-meta.yaml sidecar for Cursor assets — enable hash-tracking of .cursor/rules/*.mdc without polluting Cursor frontmatter.
User-configurable anti-rot repo list — harness.yaml.anti_rot.github_repos to track additional Claude Code ecosystem repos beyond the default.
Demo screencast — record a first-install + /hm:loop session.
Enterprise preset — stricter security gates, mandatory spec-driven dev mode.

Development

uv sync
uv run pytest                       # full suite
uv run ruff check src/ tests/       # lint
uv run ruff format src/ tests/      # format
uv run mypy --strict src/           # type check
bash .claude-verify.sh all          # phase-by-phase exit criteria + final acceptance

Contributing

See docs/CONTRIBUTING.md for adding skills/agents/presets, test patterns, and the PR checklist (including the 4-file version bump invariant).

See docs/ARCHITECTURE.md for the 14 mechanisms (M1-M14) behind the system.

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 224 Commits
.claude-plugin		.claude-plugin
.claude/memory		.claude/memory
.codex-plugin		.codex-plugin
.cursor-plugin		.cursor-plugin
.github		.github
commands		commands
docs		docs
hooks		hooks
scripts		scripts
src/harness_maker		src/harness_maker
tests		tests
work-docs		work-docs
.claude-verify.sh		.claude-verify.sh
.gitignore		.gitignore
.python-version		.python-version
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
PRIVACY.md		PRIVACY.md
README.ko.md		README.ko.md
README.md		README.md
SECURITY.md		SECURITY.md
TECH_SPEC.md		TECH_SPEC.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

harness-maker

Try in 30 seconds

Why harness-maker?

How it fits your project

1. SENSE — What kind of project is this?

2. DECIDE — What harness does this project need?

3. RENDER — How does this become a working harness?

4. EVOLVE — How does the harness stay useful?

Table of Contents

Quickstart

Universal Bootstrap Prompt

Claude Code (plugin marketplace)

Codex CLI (plugin marketplace)

Cursor (Team marketplace import OR local symlink)

PyPI (no IDE plugin — CI, headless, scripts)

Requirements

Features

🎯 Personalization — fits your project

🛡️ Trust — grade-gated work

🌱 Self-evolving — grows with your project

🌀 Anti-rot — survives the ecosystem

🎛️ Multi-IDE — one source, every IDE

🔁 Workflow primitives — the rest of the toolchain

How it works

Slash commands the harness exposes

Atomic stages (always available)

Fused workflows (preset-generated, user-renameable)

Utility commands

How it compares

Configuration

Obsidian Second Brain

Targets

Cursor target

Recommended model

Verification

Codex target

Reconcile rules (re-rendering an existing harness)

Observability

Marketplace

Stability

FAQ

Personalization Architecture

Roadmap

Development

Contributing

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 9

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages