Multi-agent swarm framework for Claude Code.
Plan, build, review, test, ship -- with human-in-the-loop gates.
AY Framework gives Claude Code a structured brain for building software. Instead of ad-hoc prompting, your AI agent follows a disciplined cycle: plan the work, build it, review it, test it, ship it -- with you approving only two things: the plan and the code.
Everything else is autonomous.
    You say: "Build a user authentication system"
    Claude:  1. Analyzes your codebase
             2. Proposes agents (api-builder, db-architect, web-builder)
             3. Creates 8 dependency-ordered tasks
             4. Asks you to approve the plan <-- Gate 1
             5. Builds task by task, commits atomically
             6. Self-reviews for security + quality
             7. Shows you the code <-- Gate 2
             8. Tests, learns, moves to next task
Built by AY Automate. Inspired by gstack.
Install with one command:
npx ay-framework
An interactive prompt asks which runtime to target and where to install:
Which runtime?
1) Claude Code (~/.claude)
2) Cursor (~/.cursor)
3) Windsurf (~/.windsurf)
4) Codex (~/.codex)
5) Trae (~/.trae)
6) Kiro (~/.kiro)
Or skip prompts:
npx ay-framework --global --yes # Claude Code, global, no prompts
npx ay-framework --local --runtime=cursor # Cursor, this project only
Or clone and run setup:
git clone https://github.com/walidboulanouar/ay-framework.git ~/.claude/skills/ay-framework
cd ~/.claude/skills/ay-framework && ./setup
What happens:
- Interactive prompt: global (~/.claude) or local (./.claude)
- If CLAUDE.md exists: injects AY Framework section (never overwrites your config)
- Installs 6 modes, 18 commands, tracking layer, templates, hooks
- Creates .ay/ directory for framework operational state
After install, open Claude Code and say:
"I want to build [describe your project]"
Claude analyzes your codebase (or asks questions if empty), generates agents and tasks, and the /go cycle begins.
                       CLAUDE.md
                     (the router)
                           |
    reads project state, picks mode + agent + task
                           |
             +-------------+-------------+
             |             |             |
            MODE         AGENT          TASK
          (how to      (what to      (what to
           think)       touch)        build)
             |             |             |
             +-------------+-------------+
                           |
                        SKILLS
                  (domain knowledge)
                           |
                       TRACKING
               (BOARD, HANDOFFS, locks)
                           |
                         .ay/
                  (state + learnings)
/go is the main command. It runs the full cycle autonomously except for two human gates.
OBSERVE ---- check for paused session, read BOARD, check locks
|
LOCK ------- create lock file, commit (prevents conflicts)
|
PLAN ------- research task, create plan folder, verify 8 dimensions
|
[HUMAN APPROVES PLAN] << Gate 1
|
BUILD ------ follow sequence with typed checkpoints
| [auto] = agent verifies itself
| [human-verify] = shows result to you
| [decision] = you pick the approach
|
SELF-REVIEW - fix mechanical issues, ask about hard ones
|
[HUMAN REVIEWS CODE] << Gate 2
|
TEST ------- verify against plan + run suite
|
LEARN ------ capture deviations, gotchas, skill updates
|
UNLOCK ----- update tracking, delete lock, commit
Human does two things: approve the plan, review the code. Everything else is autonomous.
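The control flow above can be sketched as a small shell loop. This is only an illustration of where the two gates sit, not the framework's implementation: the function name `run_cycle`, the phase labels, and the convention of reading the word `approve` from standard input are all assumptions. A rejected gate simply stops the loop here, whereas the real cycle lets you ask for changes.

```shell
# Hypothetical sketch: walk the /go phases in order and pause at the two gates.
# Each gate reads one line from stdin; only the word "approve" lets the loop continue.
run_cycle() {
  for phase in observe lock plan gate:plan build self-review gate:code test learn unlock; do
    case "$phase" in
      gate:*)
        # Human gate: block until an answer arrives (empty answer on EOF).
        IFS= read -r answer || answer=""
        if [ "$answer" != "approve" ]; then
          echo "stopped at $phase"
          return 1
        fi
        echo "approved $phase"
        ;;
      *)
        # Autonomous phase: the agent would do the real work here.
        echo "run $phase"
        ;;
    esac
  done
}
```

Feeding it `printf 'approve\napprove\n' | run_cycle` walks all ten steps; any other answer at either gate halts before the next autonomous phase.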
There are six modes, each a distinct cognitive stance. A mode changes how the agent thinks, and modes are never mixed.
| Mode | File | Cognitive Stance |
|---|---|---|
| PLAN | CLAUDE-PLAN.md | Expansive thinking. Map dependencies. Find risks. No code. |
| BUILD | CLAUDE-BUILD.md | Focused execution. Follow task file. One sub-task at a time. |
| REVIEW | CLAUDE-REVIEW.md | Paranoid. Find bugs and security holes. Fix-first. |
| QA | CLAUDE-QA.md | Systematic. Diff-aware tests. Health score computation. |
| SHIP | CLAUDE-SHIP.md | Checklist execution. Deploy, verify, rollback plan. |
| RETRO | CLAUDE-RETRO.md | Data-driven. Metrics from git. Patterns. Improvements. |
Each agent owns specific directories and never writes outside its scope. This prevents conflicts when multiple agents work in parallel.
.claude/agents/
frontend-builder.md -- owns apps/web/
api-builder.md -- owns packages/api/
db-architect.md -- owns packages/db/
qa-tester.md -- owns test files only
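The ownership rule above amounts to a path-prefix check. Here is a minimal sketch, assuming scopes are plain directory prefixes like the ones listed; the function name `in_scope` is hypothetical and the framework itself may enforce scope differently (for example through instructions in the agent file rather than a script).

```shell
# Hypothetical helper: in_scope PATH SCOPE_DIR
# Returns 0 if PATH lies inside SCOPE_DIR, 1 otherwise.
in_scope() {
  path="${1#./}"        # drop a leading "./"
  scope="${2#./}"
  scope="${scope%/}"    # drop a trailing "/" so "packages/api/" and "packages/api" match
  case "$path" in
    *..*) return 1 ;;         # conservatively reject anything containing ".."
    "$scope"/*) return 0 ;;   # must be strictly below the scope directory
    *) return 1 ;;
  esac
}
```

Requiring the `/` after the scope keeps `packages/api-v2/` from being mistaken for `packages/api/`. The list of paths to check could come from `git diff --cached --name-only`.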
Agents are generated automatically during First Contact based on your project structure. Or add them manually:
ayf add agent api-builder packages/api/
ayf add agent web-builder apps/web/
| Command | What it does |
|---|---|
| /go | Full autonomous cycle -- the main one |
| /plan N | Plan task N with full research |
| /task-start N | Start building task N |
| /task-done N | Mark task N complete |
| /task-status | Show all tasks with progress |
| /pipeline-view | ASCII kanban board |
| /pause | Save state for session handoff |
| /resume | Continue from paused session |
| /review | Code review (fix-first) |
| /qa | Run tests (diff-aware) |
| /ship | Deploy with checklist |
| /learn | Capture a learning |
| /verify-plan N | 8-dimension plan quality check |
| /fix | Investigate + fix a bug (root cause > fix > verify > prevent) |
| /scaffold | Generate boilerplate matching your project's patterns |
| /audit | Security + performance + quality audit with severity report |
| /explain | Explain code, architecture, or generate documentation |
| /refactor | Safe refactoring with tests passing before and after |
18 commands that cover what others need 66+ for. Each command is a compound workflow, not a single action.
All coordination is file-based. No external tools. Git is the sync layer.
.ay/
  tracking/
    BOARD.md        Pipeline kanban (BACKLOG > READY > IN PROGRESS > DONE)
    locks/          File-based mutex (task-01.lock = task is taken)
    HANDOFFS.md     Agent-to-agent async messages
    AGENTS-LOG.md   Who did what, when
    CHANGELOG.md    What shipped
    DECISIONS.md    Architecture Decision Records
    BLOCKERS.md     What is stuck
    METRICS.md      Sprint velocity
  plans/            Per-task plan folders
  retros/           Sprint retrospectives
  tasks/            Dependency-ordered task files
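A file-based mutex like the one in locks/ can be sketched with the shell's noclobber option, which makes file creation fail if the file already exists. The helper names `acquire_lock` and `release_lock`, the `LOCK_DIR` variable, and the choice to store the agent name inside the lock file are assumptions for illustration; the framework's own lock step also commits the file so other clones see it, which this sketch leaves out.

```shell
# Hypothetical helpers, not part of the framework.
# LOCK_DIR defaults to the locks/ directory shown in the tree above.
LOCK_DIR="${LOCK_DIR:-.ay/tracking/locks}"

# acquire_lock TASK AGENT
# Succeeds (exit 0) only if nobody holds the lock for TASK yet.
acquire_lock() {
  mkdir -p "$LOCK_DIR" || return 1
  # With noclobber set, the redirect fails when the file already exists,
  # so creating the file doubles as the test-and-set. The subshell keeps
  # the option from leaking into the caller.
  (set -o noclobber; printf '%s\n' "$2" > "$LOCK_DIR/task-$1.lock") 2>/dev/null
}

# release_lock TASK
# Removes the lock file; harmless if it is already gone.
release_lock() {
  rm -f "$LOCK_DIR/task-$1.lock"
}
```

This only arbitrates between processes sharing one working tree. Across machines the lock becomes visible only once it is committed and pushed, which is why the cycle commits right after locking.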
When you first open Claude Code after installing, the framework detects whether your repo has code or is empty.
Existing repo: Claude silently analyzes your stack, structure, and git history, then proposes agents and tasks tailored to your project.
Empty repo: Claude asks three questions (what are you building, what tech stack, solo or team) and generates a full scaffold.
Either way, nothing is created until you approve.
For shared repos where everyone should use the framework:
# Recommended (gentle suggestion in CLAUDE.md)
./setup --team optional
# Required (blocks Claude Code if framework not installed)
./setup --team required
Required mode creates a PreToolUse hook that checks for the framework on every session. If a teammate doesn't have it installed, Claude stops and tells them how to install.
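A minimal sketch of the kind of check such a hook could run, assuming the framework is detected by the presence of an install directory. The function name `check_framework` and the idea of passing candidate directories as arguments are hypothetical, and the hook that ./setup actually generates may work differently. The sketch returns 2 because Claude Code treats exit code 2 from a PreToolUse hook as a blocking error and passes stderr back to Claude.

```shell
# Hypothetical check, not the hook that ./setup installs.
# check_framework DIR...
# Returns 0 if any candidate directory exists, otherwise prints install
# instructions to stderr and returns 2.
check_framework() {
  for dir in "$@"; do
    [ -d "$dir" ] && return 0
  done
  echo "AY Framework is not installed. Run: npx ay-framework" >&2
  return 2
}
```

A hook script could end with `check_framework "$HOME/.claude/skills/ay-framework" || exit 2`, using the clone path from the install section; where a local install lives is not specified here, so any additional path is an assumption.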
The ayf CLI manages the framework after install:
ayf status # Components, pipeline, agents
ayf doctor # Health check -- find missing files
ayf add agent <name> <scope> # Add a new agent definition
ayf add skill <name> # Add a new skill pack
ayf add task <N> <title> # Add a new task file
ayf upgrade # Update modes + commands to latest
After install, your project has:
your-project/
CLAUDE.md # Router (the brain)
.ay/ # Framework operational state
state.json # Project metadata
tracking/ # BOARD, HANDOFFS, locks
plans/ # Per-task plan folders
retros/ # Sprint retrospectives
tasks/ # Task files
sessions/ # Pause/resume snapshots
learnings.jsonl # Cross-session learnings
ETHOS.md # Builder philosophy
.claude/ # Claude Code integration
CLAUDE-PLAN.md # 6 cognitive modes
CLAUDE-BUILD.md
CLAUDE-REVIEW.md
CLAUDE-QA.md
CLAUDE-SHIP.md
CLAUDE-RETRO.md
agents/ # Agent definitions
commands/ # 18 slash commands
skills/ # Domain knowledge packs
hooks/ # Optimization hooks
gstack by Garry Tan is excellent for single-agent workflows -- code review, QA, shipping. AY Framework builds on that foundation and adds everything needed for multi-agent project lifecycles.
gstack runs one agent at a time with different skills. AY Framework runs multiple agents in parallel, each owning specific directories. A frontend-builder never touches packages/api/. A db-architect never writes React components. Scope boundaries prevent conflicts automatically.
gstack has no task system. AY Framework has dependency-ordered task files with a full pipeline:
Task 01 (Foundation) ──> Task 02 (Schema) ──> Task 06 (Services) ──> Task 07 (CLI)
└──> Task 03 (Clients) ──┘ └──> Task 08 (MCP)
└──> Task 09 (Web Auth) ──> Task 10 (UI)
Tasks have clear inputs, outputs, verification criteria, and unblock relationships. Agents self-organize via the dependency graph -- no coordinator needed.
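To illustrate how agents could pick work from the dependency graph without a coordinator, here is a sketch that lists the tasks whose dependencies are all finished. It assumes a hypothetical plain-text file with one `dependency task` pair per line; the framework's real task files carry this information in their own format, and `ready_tasks` is not an existing command.

```shell
# Hypothetical helper: ready_tasks DEPS_FILE [FINISHED_TASK...]
# DEPS_FILE has one "dependency task" pair per line (assumed format).
# Prints every unfinished task whose dependencies are all finished.
ready_tasks() {
  deps_file="$1"
  shift
  awk -v finished="$*" '
    BEGIN { n = split(finished, f, " "); for (i = 1; i <= n; i++) fin[f[i]] = 1 }
    NF == 2 {
      tasks[$1] = 1; tasks[$2] = 1
      if (!($1 in fin)) blocked[$2] = 1   # an unfinished dependency blocks the task
    }
    END { for (t in tasks) if (!(t in fin) && !(t in blocked)) print t }
  ' "$deps_file" | sort
}
```

With pairs `01 02`, `01 03`, `02 06`, `03 06`, only 01 is ready at first; once 01 is finished, 02 and 03 become ready together and can go to different agents. The standard `tsort` utility accepts the same pair format if a single full ordering is all that is needed.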
.ay/tracking/
BOARD.md -- who is doing what (kanban)
locks/ -- mutex: task-03.lock means "taken by api-builder"
HANDOFFS.md -- async messages between agents
DECISIONS.md -- architecture decision records
BLOCKERS.md -- what is stuck and who owns it
METRICS.md -- velocity tracking
gstack has learnings. AY Framework has a full coordination protocol: lock before build, write handoffs, update the board, log decisions.
gstack skills run autonomously. AY Framework's /go cycle has two mandatory human gates:
- Plan approval -- Claude presents the plan, you say "go" or "change X"
- Code review -- Claude presents the output, you say "ship" or "fix X"
Everything between the gates is autonomous. You control the what, Claude handles the how.
gstack lists available skills. AY Framework analyzes your codebase on first run:
- Detects language, framework, tech stack
- Maps directory structure
- Reads git history for contributors
- Proposes agents tailored to your project
- Generates initial task breakdown
For empty repos, it asks three questions and scaffolds everything.
Before any build starts, the plan is verified across 8 dimensions:
- Requirement coverage -- every objective maps to a step
- Task atomicity -- no step creates more than 5 files
- Dependency ordering -- no forward references
- File scope -- all files within agent's scope
- Test mapping -- every requirement has a test
- Context fit -- plan folder under 2000 lines
- Gap detection -- no missing exports or error handling
- Rules compliance -- all universal rules present
If any dimension fails, the plan is fixed before execution.
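One of the eight checks, context fit, is simple enough to sketch: count the lines in the plan folder and compare against the limit. The function names and the PASS/FAIL output format below are assumptions, and the other seven dimensions need real analysis of the plan that a line count cannot provide.

```shell
# Hypothetical helpers for the "context fit" dimension only.

# plan_line_count PLAN_DIR
# Prints the total number of lines across all regular files in the folder.
plan_line_count() {
  find "$1" -type f -exec cat {} + | wc -l | tr -d ' '
}

# check_context_fit PLAN_DIR [LIMIT]
# Exit 0 if the folder is under LIMIT lines (default 2000), else exit 1.
check_context_fit() {
  limit="${2:-2000}"
  total="$(plan_line_count "$1")"
  if [ "$total" -lt "$limit" ]; then
    echo "PASS context-fit: $total lines (limit $limit)"
  else
    echo "FAIL context-fit: $total lines (limit $limit)"
    return 1
  fi
}
```

The comparison is strict because the rule says "under 2000 lines", so a folder of exactly 2000 lines fails.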
After every task, Claude compares what was planned vs what was actually built. Deviations are logged with reasons, impacts, and learnings. These compound across sessions -- future agents inherit the knowledge.
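As a sketch of how such a deviation could be recorded, the helper below appends one JSON object per line to a learnings file. Everything about the record is an assumption: the field names, the `log_deviation` function, the `LEARNINGS_FILE` variable, and the default path `.ay/learnings.jsonl` are illustrative, not the framework's actual schema. The escaping covers only backslashes and double quotes, so values containing newlines or control characters would need a real JSON encoder.

```shell
# Hypothetical helpers; the record layout is an assumed example.

# json_escape STRING
# Escapes backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# log_deviation TASK PLANNED ACTUAL REASON
# Appends one JSON line to $LEARNINGS_FILE (default .ay/learnings.jsonl).
log_deviation() {
  file="${LEARNINGS_FILE:-.ay/learnings.jsonl}"
  printf '{"task":"%s","planned":"%s","actual":"%s","reason":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" \
    "$(json_escape "$3")" "$(json_escape "$4")" >> "$file"
}
```

Appending one line per deviation keeps the file easy to merge in git and easy for a later session to read line by line.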
| Feature | gstack | ay-framework |
|---|---|---|
| Agent model | Single agent, skill switching | Multi-agent swarm, scoped ownership |
| Coordination | None | BOARD + locks + HANDOFFS |
| Task system | None | Dependency graph with ordered tasks |
| Human gates | None | Plan approval + code review |
| First contact | Lists skills | Analyzes codebase, generates setup |
| Plan quality | None | 8-dimension verification |
| Learning | Learnings file | Diff-from-plan + retro + learnings |
| Session handoff | None | /pause + /resume with full context |
| Retro | Basic retro skill | Data-driven: git metrics, JSON trends |
| Deployment | /ship skill | Full checklist + rollback + monitoring |
| CLI | None | ayf status, doctor, add agent/skill/task |
| Team mode | Hook-based enforcement | Hook-based + CLAUDE.md injection |
Both tools are complementary. You can use gstack's /review and /qa alongside AY Framework's lifecycle management.
Get Shit Done (GSD) is a great collection of 66 commands across 15 runtimes. AY Framework takes a different approach -- fewer commands, deeper architecture.
GSD gives you 66 individual commands (/gsd-add-tests, /gsd-refactor, /gsd-debug, etc.). Each runs independently. AY Framework gives you one cycle (/go) that chains planning, building, reviewing, testing, and learning into a single autonomous loop. The 18 commands exist to control that cycle, not to be used in isolation.
GSD commands run in one agent context. AY Framework creates scoped agents -- each owns specific directories, loads specific skills, and coordinates with other agents via file-based locks and handoffs. Two agents can work on different parts of the codebase simultaneously without conflicts.
GSD has no task system. AY Framework manages dependency-ordered tasks with a pipeline board. Tasks have explicit dependencies, verification criteria, and unblock relationships. The framework knows task 06 can't start until tasks 02, 03, 04, and 05 are done.
GSD commands execute immediately. AY Framework plans first: researches the task, gathers API docs, creates a plan folder with sequence, rules, tests, and schema files. Then verifies the plan across 8 dimensions before a single line of code is written. You approve the plan before build starts.
GSD installs commands and that's it -- no persistent state between sessions. AY Framework maintains .ay/ with tracking (BOARD, HANDOFFS, locks), plans, retros, tasks, learnings, and session snapshots. Context carries across sessions. Pick up exactly where you left off with /resume.
GSD has no retro system. AY Framework's /retro computes metrics from git history -- commits, files changed, test ratio, health score, blockers hit, deviations from plan. Outputs both JSON (for trend tracking) and markdown (for humans). Compares against previous retros to show improvement or regression.
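A rough sketch of how two such metrics could be derived from git output: it reads changed file paths on standard input and prints a small JSON object. The function name `retro_file_metrics`, the JSON keys, and the filename patterns used to recognise test files are all assumptions; how /retro actually defines its test ratio or health score is not described here.

```shell
# Hypothetical helper: reads one file path per line on stdin and prints
# {"files_changed":N,"test_files":M}. The test-file patterns are guesses:
# a tests/, test/ or __tests__/ directory, or a name like foo.test.ts,
# foo_test.go, or foo.spec.js.
retro_file_metrics() {
  awk '
    NF { total++ }
    $0 ~ "(^|/)(tests?|__tests__)/" || $0 ~ "[._]test\\.[^/]+$" || $0 ~ "\\.spec\\.[^/]+$" { tests++ }
    END { printf "{\"files_changed\":%d,\"test_files\":%d}\n", total, tests }
  '
}
```

It could be fed from git, for example `git diff --name-only HEAD~10 HEAD | retro_file_metrics`, and the JSON lines from successive retros compared to see a trend.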
| Feature | GSD | ay-framework |
|---|---|---|
| Commands | 66 individual commands | 18 compound commands (each replaces 3-5) |
| Architecture | Flat command library | 7-layer swarm (router > mode > agent > task > skill > tracking > state) |
| Agent model | Single agent | Multi-agent with scoped ownership |
| Task management | None | Dependency graph + pipeline board |
| Planning | None (execute immediately) | Plan folder + 8-dimension verification |
| Human gates | None | Plan approval + code review |
| State persistence | None | .ay/ with tracking, plans, retros, learnings |
| Session handoff | None | /pause + /resume with full context |
| Coordination | None | Locks + HANDOFFS + BOARD |
| Retro | None | Git-based metrics, JSON trends |
| First contact | Lists commands | Analyzes codebase, generates agents + tasks |
| Runtime support | 15 runtimes | 6 runtimes (Claude, Cursor, Windsurf, Codex, Trae, Kiro) |
AY Framework covers the same breadth with fewer, smarter commands -- and adds depth that flat command libraries can't match.
- Claude Code (any version)
- Git (for tracking coordination)
- Node.js 18+ (only for the npx install path -- bash ./setup works without it)
Works on macOS, Linux, and Windows (Git Bash / WSL).
- Fork the repo
- Create a feature branch
- Make your changes
- Submit a PR
We welcome contributions to modes, commands, templates, and documentation.
MIT -- AY Automate
