Build software products faster with 27 specialized, battle-tested AI agents
New in v1.2:
- 7 Data Science Agents (Agents 21-27): Complete ML pipeline from EDA to MLOps deployment
- DS Orchestrator, Data Explorer, Feature Engineer, Model Architect, ML Engineer, Model Evaluator, MLOps Engineer
- Full coverage for classification, regression, clustering, NLP, CV, and time series
- Fairness auditing, interpretability (SHAP), and production monitoring
Previous (v1.1):
- Debug Agents (Agents 10-16): Visual, performance, network, state, errors, memory leaks
- Review Agents (Agents 17-20): Security, code review, database, design
- Automated Testing Framework: JSON-based scenarios with weighted scoring (91%+ pass rates)
- Edge Case Protocols: Vague input handling, security refusal, destructive operation safeguards
Key Improvements:
- All agents tested with automated scenarios and edge cases
- Added failure recovery and unrealistic scope detection protocols
- Strong guardrails for security, destructive operations, and over-engineering
See: testing/ for the automated testing framework
This repository contains a complete, production-ready system for building software products using AI agents that emulate:
Core Development (Agents 0-9):
- Agent 0: Project Orchestrator
- Agent 1: Problem Framer
- Agent 2: Competitive Mapper
- Agent 3: Product Manager (PRD writing)
- Agent 4: UX Designer
- Agent 5: System Architect
- Agent 6: Engineer
- Agent 7: QA Test Engineer
- Agent 8: DevOps Deployment
- Agent 9: Analytics & Growth
Debug Suite (Agents 10-16):
- Agent 10: Debug Detective (triage)
- Agent 11: Visual Debug Specialist
- Agent 12: Performance Profiler
- Agent 13: Network Inspector
- Agent 14: State Debugger
- Agent 15: Error Tracker
- Agent 16: Memory Leak Hunter
Review & Specialized (Agents 17-20):
- Agent 17: Security Auditor
- Agent 18: Code Reviewer
- Agent 19: Database Engineer
- Agent 20: Design Reviewer
Data Science Suite (Agents 21-27):
- Agent 21: DS Orchestrator (coordinates ML projects)
- Agent 22: Data Explorer (EDA, profiling)
- Agent 23: Feature Engineer (features, encoding)
- Agent 24: Model Architect (model selection, architecture)
- Agent 25: ML Engineer (training, optimization)
- Agent 26: Model Evaluator (evaluation, fairness, interpretability)
- Agent 27: MLOps Engineer (deployment, monitoring)
Plus: A full-stack web dashboard to orchestrate everything!
One-liner (recommended):
curl -fsSL https://raw.githubusercontent.com/yourusername/ai-agent-workflow/main/scripts/install.sh | bash
Then create a project:
agent-init my-project
cd my-project && claude
Other methods: See INSTALL.md for git submodule, npm, and more options.
Open the Interactive Agent Map - Explore all 27 agents in a visual, interactive constellation map with animated workflows.
| Document | Purpose | Time |
|---|---|---|
| INSTALL.md | Installation options | 5 min |
| docs/CLAUDE_CODE_GUIDE.md | Use with Claude Code (CLI) | 10 min |
| QUICK_START.md | Manual agent usage | 5 min |
| CHEAT_SHEET.md | One-page quick reference | 2 min |
| agents/README.md | How to use each agent | 15 min |
| Document | Purpose | Time |
|---|---|---|
| dashboard/GETTING_STARTED.md | Dashboard overview | 10 min |
| dashboard/QUICK_START_DASHBOARD.md | Set up in 10 minutes | 10 min |
| dashboard/ARCHITECTURE.md | System design | 30 min |
| dashboard/IMPLEMENTATION_ROADMAP.md | Week-by-week build guide | 1 hour |
| Document | Purpose | Time |
|---|---|---|
| testing/ | Automated testing framework | 15 min |
| AGENT_OPTIMIZATION_SUMMARY.md | What was optimized & why | 15 min |
| Document | Purpose | Time |
|---|---|---|
| docs/INTEGRATION_GUIDE_REIMBURSEMENT.md | Example integration with ClearConcur | 10 min |
| docs/CLEARCONCUR_QUICK_START.md | Copy-paste prompts for ClearConcur | 5 min |
ai-agent-workflow/
│
├── Documentation
│   ├── README.md                              # This file
│   ├── QUICK_START.md                         # 5-minute guide
│   ├── CHEAT_SHEET.md                         # Quick reference
│   └── AGENT_OPTIMIZATION_SUMMARY.md          # Optimization details
│
├── Agents (27 Ready-to-Use Prompts)
│   ├── agent-0 to agent-9                     # Core development agents
│   ├── agent-10 to agent-16                   # Debug suite agents
│   ├── agent-17 to agent-20                   # Review & specialized agents
│   ├── agent-21 to agent-27                   # Data science suite agents
│   ├── README.md                              # Agent usage guide
│   ├── DEBUG-AGENTS-README.md                 # Debug agent guide
│   └── DATA-SCIENCE-AGENTS-README.md          # Data science agent guide
│
├── Claude Code Integration
│   ├── templates/CLAUDE.md.template           # Project template for Claude Code
│   ├── scripts/init-project.sh                # Initialize new projects
│   ├── scripts/add-to-project.sh              # Add to existing projects
│   ├── scripts/install.sh                     # One-liner installer
│   ├── docs/CLAUDE_CODE_GUIDE.md              # Full Claude Code guide
│   └── INSTALL.md                             # All installation options
│
├── Testing
│   ├── scenarios/                             # JSON test scenarios
│   ├── runner.js                              # Automated test runner
│   └── README.md                              # Testing guide
│
├── docs/ (Integration Guides)
│   ├── CLAUDE_CODE_GUIDE.md                   # Claude Code setup guide
│   ├── INTEGRATION_GUIDE_REIMBURSEMENT.md     # ClearConcur example
│   ├── CLEARCONCUR_QUICK_START.md             # Copy-paste prompts
│   └── CLEARCONCUR_CLAUDE_ADDITION.md         # CLAUDE.md additions
│
└── Dashboard (Full-Stack App - Optional)
    ├── backend/                               # Express + Prisma + LangGraph
    ├── frontend/                              # Next.js + TypeScript
    └── docker-compose.yml                     # PostgreSQL + Redis
Time: 5 minutes to set up, then seamless workflow
How it works:
- Create a project folder with CLAUDE.md from our template
- Start Claude Code in your project directory
- Tell the Orchestrator what you want to build
- Agent 0 drives everything - selecting agents, executing tasks, and only asking you key questions
- Artifacts are saved automatically
Best for:
- Maximum efficiency - minimal context-switching
- Developers using Claude Code (CLI)
- Solo builders who want AI to drive the process
- Projects where you want to focus on decisions, not prompts
Features:
- Autonomous agent selection and execution
- Only interrupts for key decisions
- Automatic artifact management
- Flow control ("speed up", "slow down", "skip", "go back")
Cost: ~$3-7 in API calls for complete workflow
Start: docs/CLAUDE_CODE_GUIDE.md
Time: 5 minutes to start, 3-4 hours for full v0.1 workflow
How it works:
- Open agents/agent-0-orchestrator.md
- Copy the prompt to Claude/ChatGPT
- Follow the agent's recommendations
- Work through all 20+ agents as needed
- Save artifacts manually as you go
Best for:
- Using Claude web UI, ChatGPT, or other LLMs
- Testing the workflow before committing
- Projects without coding needs (just planning)
- Maximum control over each step
Cost: ~$3-4 in API calls for complete workflow
Time: 4-5 weeks to full implementation
What you get:
- Beautiful web UI for managing projects
- Real-time agent execution
- Automatic artifact management
- Cost tracking & analytics
- Multi-project support
- WebSocket-powered live updates
Best for:
- Developers who want to customize
- Building this as a SaaS product
- Teams who need collaboration features
Cost: ~$35-95/month (infrastructure + API)
Start: dashboard/GETTING_STARTED.md
| Agent | Score | Key Strength |
|---|---|---|
| Agent 0 (Orchestrator) | 91% | Failure recovery, scope detection |
| Agent 1 (Problem Framer) | 92% | Vague input handling, solution detection |
| Agent 3 (Product Manager) | 91% | Over-engineering prevention |
| Agent 5 (Architect) | 90% | Monolith-first, boring tech |
| Agent 6 (Engineer) | 91% | Security refusal, conflict detection |
| Agent 7 (QA) | 92% | Comprehensive test strategies |
| Agent 19 (Database) | 91% | Destructive operation safeguards |
| Agent | Purpose |
|---|---|
| Agent 10 | Debug triage and routing |
| Agent 11 | Visual/CSS debugging |
| Agent 12 | Performance profiling |
| Agent 13 | Network/API debugging |
| Agent 14 | State management debugging |
| Agent 15 | Error tracking |
| Agent 16 | Memory leak detection |
| Agent | Purpose |
|---|---|
| Agent 17 | Security auditing |
| Agent 18 | Code review |
| Agent 19 | Database migrations & optimization |
| Agent 20 | Design system review |
| Agent | Purpose |
|---|---|
| Agent 21 | DS Orchestrator - coordinates ML projects |
| Agent 22 | Data Explorer - EDA, profiling, quality |
| Agent 23 | Feature Engineer - features, encoding, selection |
| Agent 24 | Model Architect - model selection, architecture |
| Agent 25 | ML Engineer - training, hyperparameter tuning |
| Agent 26 | Model Evaluator - evaluation, fairness, interpretability |
| Agent 27 | MLOps Engineer - deployment, monitoring, retraining |
Overall: 91%+ pass rate on automated testing
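The scores above come from JSON-based scenarios with weighted scoring. The actual scenario schema lives in testing/scenarios/ and is not reproduced in this README, so the sketch below assumes a hypothetical shape (a list of checks, each with a `weight` and a `passed` flag) and a hypothetical agent name, purely to illustrate how a weighted score could be computed.

```python
import json

# Hypothetical scenario format: the real schema in testing/scenarios/ may differ.
# Each check carries a weight and a pass/fail outcome.
SCENARIO_JSON = """
{
  "agent": "agent-1-problem-framer",
  "checks": [
    {"name": "asks_clarifying_questions", "weight": 3, "passed": true},
    {"name": "identifies_target_user", "weight": 2, "passed": true},
    {"name": "avoids_proposing_solution", "weight": 1, "passed": false}
  ]
}
"""


def weighted_score(scenario: dict) -> float:
    """Return the share of total weight earned by passing checks (0.0 to 1.0)."""
    checks = scenario["checks"]
    total = sum(check["weight"] for check in checks)
    if total == 0:
        raise ValueError("scenario has no weighted checks")
    earned = sum(check["weight"] for check in checks if check["passed"])
    return earned / total


scenario = json.loads(SCENARIO_JSON)
score = weighted_score(scenario)
print(f"{scenario['agent']}: {score:.0%}")  # 5 of 6 weight points earned
```

Weighting lets a failed low-priority check cost less than a failed critical one, which a plain pass count would not capture.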
Example: Literature review app for PhD students
"I want to build a tool to help PhD students manage literature reviews"
Constraints:
- Timeline: 4 weeks
- Budget: $0
- Solo builder
- Tech: TypeScript
artifacts/
├── problem-brief-v0.1.md          ✅ Clear problem, personas, JTBD
├── competitive-analysis-v0.1.md   ✅ 5 competitors analyzed, wedge strategy
├── prd-v0.1.md                    ✅ 5 MUST features (not 15!)
├── ux-flows-v0.1.md               ✅ User journeys, wireframes
├── architecture-v0.1.md           ✅ Next.js + PostgreSQL (simple!)
├── code/                          ✅ Implementation guidance
├── test-plan-v0.1.md              ✅ Unit, integration, E2E tests
├── deployment-plan-v0.1.md        ✅ Vercel + Neon setup
└── analytics-plan-v0.1.md         ✅ 5 critical events to track
Result: Complete product specification, ready to build!
Time: 3-4 hours
Cost: $3-4
Before:
- Agent 3 suggested 12-15 MUST features
- Agent 5 recommended microservices + Redis + caching
- 6-8 hours of revisions
After:
- Agent 3 limited to 5-8 MUST features ✅
- Agent 5 recommends monoliths + boring tech ✅
- 3-4 hours total workflow ✅
Impact: 50% scope reduction while maintaining value!
Agent 5 (System Architect) now has strong guardrails:
- ❌ NO microservices for v0.1
- ❌ NO Redis/caching for simple CRUD
- ❌ NO background jobs unless > 30 seconds
- ❌ NO Elasticsearch (PostgreSQL FTS is fine)
- ❌ NO custom auth (use managed services)
- ✅ YES to boring, proven tech
- ✅ YES to monoliths
- ✅ YES to managed services
- ✅ YES to one-command deploys
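The guardrails above are written as prose in the Agent 5 prompt. As a rough illustration only, they could also be expressed as a small lookup that flags discouraged choices in a proposed v0.1 stack. Everything in this sketch (the `DISCOURAGED` table, `review_stack`, and the component labels) is hypothetical and not part of the repository.

```python
# Hypothetical encoding of the v0.1 guardrails; the real rules are prose
# inside the Agent 5 prompt, not code.
DISCOURAGED = {
    "microservices": "use a monolith for v0.1",
    "redis": "skip caching for simple CRUD",
    "elasticsearch": "PostgreSQL full-text search is fine",
    "custom auth": "use a managed auth service",
}

BACKGROUND_JOB_THRESHOLD_SECONDS = 30


def review_stack(components, longest_task_seconds=0):
    """Return a list of guardrail warnings for a proposed v0.1 stack."""
    warnings = []
    for component in components:
        advice = DISCOURAGED.get(component.lower())
        if advice:
            warnings.append(f"{component}: {advice}")
    # Background jobs are only justified when a task runs longer than 30 seconds.
    wants_jobs = any("background job" in c.lower() for c in components)
    if wants_jobs and longest_task_seconds <= BACKGROUND_JOB_THRESHOLD_SECONDS:
        warnings.append("background jobs: only needed for tasks over 30 seconds")
    return warnings


for warning in review_stack(
    ["Next.js", "PostgreSQL", "Redis", "Background jobs"],
    longest_task_seconds=5,
):
    print(warning)
```

A stack of just Next.js and PostgreSQL produces no warnings, which matches the architecture chosen in the example workflow.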
Improvements across all agents:
- ✅ Testable acceptance criteria (was vague)
- ✅ Specific, measurable success metrics
- ✅ Consistent terminology across agents
- ✅ Actionable recommendations
- ✅ Realistic scope for solo builders
cat agents/agent-0-orchestrator.md
Add your project idea:
I want to build [YOUR IDEA].
Target users: [WHO]
Main problem: [WHAT]
Constraints:
- Timeline: [X weeks]
- Budget: [$X/month]
- Tech: [preferences]
What should I do first?
Agent 0 will tell you to run Agent 1 (Problem Framer) next.
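If you start projects often, the kickoff template above can be filled in with a few lines of code instead of by hand. This is a minimal sketch: `build_kickoff_prompt` is a hypothetical helper, and the "main problem" wording in the usage example is illustrative rather than taken from the repository.

```python
def build_kickoff_prompt(idea, users, problem, weeks, budget, tech):
    """Fill in the kickoff template with concrete project details."""
    return (
        f"I want to build {idea}.\n"
        f"Target users: {users}\n"
        f"Main problem: {problem}\n"
        "Constraints:\n"
        f"- Timeline: {weeks} weeks\n"
        f"- Budget: ${budget}/month\n"
        f"- Tech: {tech}\n"
        "What should I do first?"
    )


prompt = build_kickoff_prompt(
    idea="a tool to help PhD students manage literature reviews",
    users="PhD students",
    problem="keeping track of papers and notes during a literature review",
    weeks=4,
    budget=0,
    tech="TypeScript",
)
print(prompt)
```

Paste the printed text after the orchestrator prompt from agents/agent-0-orchestrator.md.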
Core Development Flow:
- Agent 1 → Problem Brief
- Agent 2 → Competitive Analysis
- Agent 3 → PRD
- Agent 4 → UX Flows
- Agent 5 → Architecture
- Agent 6 → Code
- Agent 7 → Tests
- Agent 8 → Deployment
- Agent 9 → Analytics
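When running the core flow manually, it helps to know which agent comes next. The sketch below pairs each agent with the artifact file name used in the example artifacts/ listing and returns the first agent whose artifact is missing. The `PIPELINE` table and `next_agent` helper are hypothetical, and your own file names may differ.

```python
import tempfile
from pathlib import Path

# Agent number -> artifact name, following the example artifacts/ listing.
# Agent 6 produces a code/ directory rather than a single file.
PIPELINE = [
    (1, "problem-brief-v0.1.md"),
    (2, "competitive-analysis-v0.1.md"),
    (3, "prd-v0.1.md"),
    (4, "ux-flows-v0.1.md"),
    (5, "architecture-v0.1.md"),
    (6, "code"),
    (7, "test-plan-v0.1.md"),
    (8, "deployment-plan-v0.1.md"),
    (9, "analytics-plan-v0.1.md"),
]


def next_agent(artifacts_dir):
    """Return the first agent whose artifact is missing, or None when done."""
    root = Path(artifacts_dir)
    for agent, artifact in PIPELINE:
        if not (root / artifact).exists():
            return agent
    return None


# Usage with a throwaway directory standing in for my-project/artifacts.
with tempfile.TemporaryDirectory() as tmp:
    print(next_agent(tmp))  # 1: nothing saved yet
    (Path(tmp) / "problem-brief-v0.1.md").write_text("# Problem Brief\n")
    print(next_agent(tmp))  # 2: the problem brief exists
```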
When Debugging:
- Agent 10 → Triage → Route to Agents 11-16
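Agent 10 does this routing in conversation, but the mapping itself is simple enough to show as a lookup. In this sketch the agent numbers come from the Debug Suite list, while the category labels and the `route_bug` helper are illustrative assumptions rather than identifiers the agents use.

```python
# Symptom category -> specialist agent, based on the Debug Suite list.
# The category keys are illustrative labels, not names used by Agent 10.
DEBUG_ROUTES = {
    "visual": 11,       # Visual Debug Specialist
    "performance": 12,  # Performance Profiler
    "network": 13,      # Network Inspector
    "state": 14,        # State Debugger
    "error": 15,        # Error Tracker
    "memory": 16,       # Memory Leak Hunter
}
TRIAGE_AGENT = 10


def route_bug(category):
    """Return the agent to consult; unknown categories go back to triage."""
    return DEBUG_ROUTES.get(category.strip().lower(), TRIAGE_AGENT)


print(route_bug("Network"))   # 13
print(route_bug("not sure"))  # 10
```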
For Reviews:
- Agent 17 → Security Audit
- Agent 18 → Code Review
- Agent 19 → Database Changes
- Agent 20 → Design Review
mkdir -p my-project/artifacts
# Save each agent's output as you go
- Project: Literature review app for PhD students
- Input: Vague idea + constraints
- Process: All 10 agents sequentially
- Result: Complete v0.1 specification
| Metric | Score | Notes |
|---|---|---|
| Clarity | 4.5/5 | Clear, unambiguous outputs |
| Completeness | 4.6/5 | All required sections present |
| Actionability | 4.7/5 | Immediately usable |
| Scope | 4.3/5 | Realistic for solo builder |
| Consistency | 4.5/5 | Terminology aligned |
| Overall | 4.5/5 ✅ | Production Ready |
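The README does not say how the Overall row is derived. Assuming it is an unweighted mean of the five metric scores, the arithmetic can be checked in a few lines:

```python
# Metric scores copied from the table; the unweighted mean is an assumption.
scores = {
    "Clarity": 4.5,
    "Completeness": 4.6,
    "Actionability": 4.7,
    "Scope": 4.3,
    "Consistency": 4.5,
}

overall = sum(scores.values()) / len(scores)
print(round(overall, 2))  # 4.52, which rounds to the reported 4.5
```

Under that assumption the reported 4.5/5 is consistent with the individual rows.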
- Time: 3-4 hours (down from 6-8)
- Cost: $3-4 (down from $4-6)
- Revisions: 1-2 cycles (down from 3-4)
Improvement: ~40% faster, ~25% cheaper
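Because the before and after figures are ranges, the size of the improvement depends on which points are compared. The sketch below compares range midpoints, which is only one possible convention and not necessarily how the figures above were derived. Under that assumption the reductions come out somewhat higher than the rounded percentages quoted.

```python
def midpoint(low, high):
    """Midpoint of a reported range."""
    return (low + high) / 2


def reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


# Ranges taken from the list above: hours and dollars, before and after.
time_cut = reduction(midpoint(6, 8), midpoint(3, 4))
cost_cut = reduction(midpoint(4, 6), midpoint(3, 4))

print(f"time: {time_cut:.0%}, cost: {cost_cut:.0%}")  # time: 50%, cost: 30%
```

Comparing other points in the ranges (for example 6 hours before against 4 hours after) gives smaller numbers, so treat any single percentage as approximate.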
- SaaS Products: Build and ship web applications
- Internal Tools: Create tools for your team/lab
- Research Apps: Specialized domain tools
- Side Projects: Solo builder projects
- Hackathons: Rapid prototyping
- MVPs: Validate ideas quickly
- Learning: Understand product development
- ❌ Not a no-code builder (you still need to code)
- ❌ Not AI that writes all your code (Agent 6 guides, you implement)
- ❌ Not for large teams (optimized for solo builders)
- ❌ Not for enterprise-scale (optimized for 10-1000 users)
Q: Do I need to know how to code?
A: For Agents 1-5 (planning), no. For Agents 6-8 (implementation), yes.
Q: Can I customize the agents?
A: Yes! Edit the markdown files in agents/
Q: How much does it cost?
A: Manual: ~$3-4 per project. Dashboard: ~$35-95/month.
Q: Can I use other LLMs besides Claude?
A: Yes, the prompts work with GPT-4, Gemini, etc.
Q: Is my data private?
A: Yes. If using Claude API directly, your data isn't used for training.
Q: Can multiple people use this?
A: Manual: yes (share artifacts). Dashboard: planned for v1.0.
Q: What if an agent makes a mistake?
A: Edit the artifact manually or re-run with different inputs.
Q: How long does a full workflow take?
A: 3-4 hours for all 10 agents (manual prompting).
Improvements welcome! Areas of interest:
- More test scenarios (different domains)
- Translations (agents in other languages)
- Integration code (LangGraph, CrewAI examples)
- Mobile app agents
- New specialized agents
To contribute:
- Fork the repository
- Make your changes
- Test with real projects
- Submit a pull request
MIT License - see LICENSE file
Created by: Adrian C. Stier
Built with:
Inspiration:
- Jobs-to-be-Done framework
- Lean Startup methodology
- Agile development
- Shape Up (Basecamp)
- Try the agents: QUICK_START.md
- Read the optimization summary: AGENT_OPTIMIZATION_SUMMARY.md
- Build something!
v1.1 (Next):
- Agent performance monitoring
- More test scenarios
- Video tutorials
- Community examples
v2.0 (Future):
- Team collaboration features
- Custom agent marketplace
- Mobile app support
- Advanced analytics
- Documentation: See files above
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Ready to build products with AI agents?
Start here: QUICK_START.md