An AI-assisted testing system and quality intelligence center. Test Commander helps teams move from requirements and exploration to BDD, automation, evidence, and reporting — with a continuous learning loop and a team-facing console.
It is built as a Claude Code plugin plus a small Python and TypeScript runtime. It is designed to be installed once and grown phase by phase.
Status: Phase 13 complete (2026-06-02) — project complete (Phases 0–13).
Shipped skills:

- `tc-core` ships `/tc:init`, `/tc:status`, `/tc:journal`, `/tc:next`.
- `tc-requirements` ships `/tc:review-requirements`, `/tc:review-user-stories`, `/tc:review-acceptance-criteria`, `/tc:requirements-coverage`, `/tc:requirements-to-tests`.
- `tc-knowledge` ships `/tc:learn-from-docs`, `/tc:learn-from-specs`, `/tc:learn-from-code`, `/tc:learn-from-api`, `/tc:learn-from-tests`.
- `tc-explore` ships `/tc:create-charter`, `/tc:explore` (with the internal exploration-review sub-mode), `/tc:session-summary`, `/tc:test-ideas`.
- `tc-bdd` ships `/tc:generate-bdd` (with the internal review sub-mode) and `/tc:review-bdd`.
- `tc-traceability` ships `/tc:traceability-map`.
- `tc-build-framework` ships `/tc:build-framework`.
- `tc-automation-plan` ships `/tc:automation-plan`.
- `tc-automate` ships `/tc:automate` (with the internal automation-review sub-mode) and `/tc:review-automation`.
- `tc-test-data` ships `/tc:generate-test-data`.
- `tc-run` ships `/tc:run` (with the internal evidence-index sub-mode) and `/tc:analyze-results`.
- `tc-evidence` is the internal evidence indexer.
- `tc-quality-report` ships `/tc:report` and `/tc:quality-gate`.
- `tc-learning` ships `/tc:learn`, `/tc:learn-from-failures`, `/tc:learn-from-exploration`, `/tc:learn-from-feedback`, `/tc:review-lessons`, `/tc:promote-lessons`.
- `tc-visualize` ships `/tc:visualize`, the eight `/tc:diagram-*` commands, `/tc:generate-infographic`, and `/tc:render-visuals`.
- `tc-web` ships `/tc:web-init`, `/tc:web-start`, `/tc:web-sync`, `/tc:web-index-artifacts`, and `/tc:web-export`: the read-only web console.
- `tc-governance` ships the controlled-execution pipeline (intent → plan → policy → approval → bounded execution → validation → audit) behind the console's `/api/execute`. It adds no `/tc:*` commands.
- `tc-mcp` exposes the workspace through the expanded Runtime API (`apps/api`: the `/api/runtime/` namespace) and a schema-first MCP server (`apps/mcp`: `tc_status`, `tc_plan`, `tc_run_command`). Both are alternative front-ends to the same governance pipeline, with the seven permission levels enforced server-side. It adds no `/tc:*` commands.
- `tc-sandbox` ships `/tc:sandbox-init`, `/tc:sandbox-launch`, `/tc:sandbox-status`, `/tc:sandbox-sync`, `/tc:sandbox-stop`, and `/tc:sandbox-export`: on-demand, team-accessible Test Commander environments launched from GitHub Actions, governed by the same Phase 10.5 pipeline and safe by default (allow-listed hosts, blocked private ranges).
- `tc-continuous-quality` ships `/tc:watch-changes`, `/tc:impact-analysis`, `/tc:coverage-gap-analysis`, `/tc:propose-tests`, `/tc:create-test-pr`, and `/tc:continuous-quality-check`: a continuous quality mode that watches changes, maps impact, finds coverage gaps, and opens clearly labeled test PRs when the configured autonomy level (0–4) allows, all through the same Phase 10.5 pipeline.

See `planning/plan.md` for the full roadmap. Per-phase guides:

- Phase 1 walkthrough: `docs/user-guide/workflow.md`
- Phase 2 walkthrough: `docs/user-guide/reviewing-requirements.md`
- Phase 3 walkthrough: `docs/user-guide/building-project-knowledge.md`
- Phase 4 walkthrough: `docs/user-guide/exploring-an-app.md`
- Phase 5 walkthrough: `docs/user-guide/generating-bdd.md`
- Phase 6 walkthrough: `docs/user-guide/automation.md`
- Phase 7 walkthrough: `docs/user-guide/running-tests.md`
- Phase 8 walkthrough: `docs/user-guide/learning-loop.md`
- Phase 9 walkthrough: `docs/user-guide/visuals.md`
- Phase 10 walkthrough: `docs/user-guide/web-console.md`
- Phase 10.5 governance guide: `docs/user-guide/governance.md`
- Phase 11 integration guide (Runtime API + MCP server): `docs/user-guide/integrating.md`
- Phase 12 sandbox guide: `docs/user-guide/sandbox.md`
- Phase 13 continuous-quality guide: `docs/user-guide/continuous-quality.md`
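To make the ordering of the `tc-governance` pipeline concrete, here is a minimal sketch assuming each stage must complete before the next one starts and that a denied approval halts the run before bounded execution. The `Stage` enum, `advance`, and `run_pipeline` names are hypothetical illustrations, not the Phase 10.5 implementation or its API.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical model of the governance stages, in the order this README lists them."""
    INTENT = 0
    PLAN = 1
    POLICY = 2
    APPROVAL = 3
    EXECUTION = 4  # "bounded execution"
    VALIDATION = 5
    AUDIT = 6


def advance(current: Stage, requested: Stage) -> Stage:
    """Allow only a move to the immediately following stage; any skip raises."""
    if requested != current + 1:
        raise ValueError(f"cannot go from {current.name} to {requested.name}")
    return requested


def run_pipeline(approved: bool) -> list[str]:
    """Walk the stages in order, stopping before execution when approval is denied."""
    stage = Stage.INTENT
    trail = [stage.name]
    for nxt in list(Stage)[1:]:
        if nxt is Stage.EXECUTION and not approved:
            trail.append("REJECTED")  # assumption: a denial ends the run here
            break
        stage = advance(stage, nxt)
        trail.append(stage.name)
    return trail
```

Encoding the stages as an ordered enum makes "no skipping" a single comparison; the real pipeline presumably also records rejected requests in its audit trail, which this sketch does not attempt.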
- A disciplined workflow that turns product context into testable artifacts: requirements reviews, exploration notes, test ideas, BDD specs, Playwright automation, evidence, and a live quality report.
- A Claude Code plugin (`test-commander`) with skills that orchestrate each step.
- A workspace convention (`.test-commander/`) that keeps every quality artifact in one place, versioned in git, with full traceability.
- A continuous learning loop that captures lessons from failures, exploration, and human feedback, and applies them only after human review.
Test Commander is product-domain-agnostic. It ships with universal English and software-engineering defaults only — no e-commerce, healthcare, finance, research, or other product-domain vocabulary in the shipped rubric, tags, methodology, fixtures, or examples. The tool does not assume what product your team is testing.
Consuming projects extend Test Commander for their own domain through four explicit hooks:
- `<workspace>/config.yaml` extensions to rubric keyword sets (PCI, HIPAA, your role taxonomy, etc.).
- Your project's own requirement and exploration documents under `.test-commander/documents/uploaded/`.
- Project knowledge ingested in Phase 3 (`/tc:learn-from-docs`, `/tc:learn-from-code`, ...).
- Project-defined values inside shipped tag namespaces (`@area:<feature>`, `@risk:<class>`, `@persona:<role>`).
See docs/user-guide/customizing-for-your-project.md for worked examples and the full extension model, and Decision D19 for the rationale.
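The tag-namespace hook splits ownership: Test Commander ships the namespace, your project supplies the value. A minimal sketch of how such tags could be parsed and grouped, assuming the three namespaces named above are the only shipped ones and that tags without a namespace (the hypothetical `@smoke` below) are simply ignored. `parse_tag` and `group_tags` are illustrative helpers, not part of the shipped runtime.

```python
# Assumption: only the three namespaces named in this README are shipped.
SHIPPED_NAMESPACES = frozenset({"area", "risk", "persona"})


def parse_tag(tag: str) -> tuple[str, str]:
    """Split a namespaced tag such as '@area:search' into ('area', 'search')."""
    if not tag.startswith("@") or ":" not in tag:
        raise ValueError(f"not a namespaced tag: {tag!r}")
    namespace, _, value = tag[1:].partition(":")
    if namespace not in SHIPPED_NAMESPACES:
        raise ValueError(f"unknown namespace: {namespace!r}")
    if not value:
        raise ValueError(f"empty value in tag: {tag!r}")
    return namespace, value


def group_tags(tags: list[str]) -> dict[str, list[str]]:
    """Collect project-defined values per shipped namespace, skipping other tags."""
    grouped: dict[str, list[str]] = {ns: [] for ns in sorted(SHIPPED_NAMESPACES)}
    for tag in tags:
        try:
            namespace, value = parse_tag(tag)
        except ValueError:
            continue  # hypothetical plain tags such as '@smoke'
        grouped[namespace].append(value)
    return grouped
```

For example, `group_tags(["@area:search", "@risk:high", "@persona:admin", "@smoke"])` yields `{"area": ["search"], "persona": ["admin"], "risk": ["high"]}`; the values are project vocabulary, which is what keeps the shipped tooling domain-agnostic.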
- It is not a replacement for skilled testers.
- It is not a fully autonomous QA system.
- It is not a promise that AI can understand every product perfectly.
- It is not a test automation silver bullet.
- It is not a wrapper over third-party skill plugins — every skill is owned in-repo.
| Role | Value |
|---|---|
| Testers | Charter-based exploration, captured observations and risks, generated test ideas, BDD that's actually readable. |
| Automation engineers | Playwright framework scaffolded on demand, page objects and fixtures generated from BDD, test data kept out of code. |
| Developers | Requirements reviews catch ambiguity before code; impact analysis and proposed tests on PRs (Phase 13). |
| Product owners | Live quality report with release-readiness; coverage gaps and open questions visible. |
| Engineering leaders | Traceability from requirement to test result; risk register; learning loop that improves the test strategy over time. |
Test Commander is built in 14 numbered phases (0–13), plus the Phase 10.5 governance layer. Each phase produces a working, demonstrable increment. The capstone target is phases 0–8 and 10; phases 9, 11, 12, and 13 follow.
See planning/plan.md for the full phased plan, including Decisions, Open Questions, and per-phase Definition of Done.
The roadmap summary:
| Phase | Name |
|---|---|
| 0 | Repository foundation |
| 1 | Workspace and artifact model |
| 2 | Requirements and user story intelligence |
| 3 | Project knowledge ingestion |
| 4 | Exploratory testing |
| 5 | BDD generation and traceability |
| 6 | Playwright framework (lazy) and automation |
| 7 | Execution, evidence, and quality report |
| 8 | Continuous learning |
| 9 | Visual documentation and infographics |
| 10 | Web console MVP |
| 11 | Runtime API and MCP server |
| 12 | Sandboxed testing environment |
| 13 | Continuous quality agent |
The full install guide lives in docs/install.md (filled out in Step 0.2). What follows is the short version.
Prerequisites the script will check for you:
- make
- Python 3.12
- PDM
- Docker (any compatible runtime)
- Git
Two-stage install:

```bash
./bootstrap.sh   # verifies prereqs; auto-installs the safe ones
make install     # provisions the project and registers the Claude Code plugin
```

Platforms supported: macOS, Linux, and Windows via WSL2 or Git Bash. PowerShell is explicitly not supported.
Once installed, open Claude Code and confirm test-commander:tc-core appears in available skills.
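If you want to check the prerequisites yourself before running `./bootstrap.sh`, a minimal sketch of that kind of check follows. It assumes the executable names `make`, `pdm`, `docker`, and `git`, and that "Python 3.12" means 3.12 or newer (as `pyproject.toml` states). It is an illustration only, not what `bootstrap.sh` actually runs.

```python
import shutil
import sys
from typing import Callable, Optional

# Assumed executable names for the prerequisites above. A Docker-compatible
# runtime may use a different executable name, so this check is deliberately naive.
REQUIRED_TOOLS = ("make", "pdm", "docker", "git")


def missing_tools(
    tools: tuple[str, ...] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version: tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """True when the given interpreter version is Python 3.12 or newer."""
    return version[:2] >= (3, 12)


if __name__ == "__main__":
    missing = missing_tools()
    if not python_is_supported():
        missing.append("python>=3.12")
    print("missing:", ", ".join(missing) if missing else "none")
```

Injecting `which` keeps the check testable without touching the real `PATH`.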
The end-to-end flow:

```
/tc:init
/tc:review-requirements
/tc:learn-from-code
/tc:create-charter --area <feature>
/tc:explore --target <url> --charter <feature>
/tc:test-ideas --area <feature>
/tc:generate-bdd --area <feature>
/tc:automation-plan --area <feature>
/tc:generate-test-data --area <feature>
/tc:automate --feature <feature>
/tc:run --suite smoke
/tc:report
/tc:learn
/tc:next
```

`/tc:next` always tells you what to do next based on the state of `.test-commander/`.
```
test-commander/
  .claude-plugin/marketplace.json   # local marketplace
  plugins/test-commander/           # the Claude Code plugin
    .claude-plugin/plugin.json
    skills/
      tc-core/SKILL.md              # phase 0
      tc-requirements/SKILL.md      # phase 2
      tc-bdd/SKILL.md               # phase 5
      ...
  docs/                             # vision, architecture, methodology, user guide
  planning/plan.md                  # the phased plan
  scripts/                          # verify_skills.py and friends
  bootstrap.sh                      # prereq checker
  Makefile                          # install / lint / test / build / run / verify
  pyproject.toml                    # PDM, Python 3.12+
```
The per-project quality workspace lives at .test-commander/ in consuming projects, not here.
- Vision
- Architecture
- Roadmap
- Methodology
- Command reference
- Workspace reference
- Glossary
- Install guide
- User guide — getting started
- User guide — first workflow walkthrough (Phase 1)
- User guide — reviewing requirements (Phase 2)
- User guide — building project knowledge (Phase 3)
- User guide — exploring an app (Phase 4)
- User guide — customizing for your project
- Public-skill evaluation pass
- Controlled agent execution
- Security and permissions
- Chat command governance
- Runtime approval flow
- Agent adapters
- Phased plan
Most `docs/` files started as stubs in Phase 0 and were filled in by their owning phase.
AGENTS.md is the entry point an agent reads at the start of every session. It names the source of truth (planning/plan.md), lists the 19 settled decisions (D1–D19), enumerates the seven Per-Phase Conventions, documents the TDD micro-cycle and verify chain, describes the commit and phase sign-off pattern, and lists what NOT to do. Read it before touching code.
See CONTRIBUTING.md. The short version: pick a phase step, build it small, test it, document it, raise a PR referencing the plan step.
License: MIT.