Lab / showroom for AI agent experiments with DSPy 3.2, MCP, and dapr-agents.
lab/
├── 01-basics/ DSPy foundations: signatures, Predict, ChainOfThought, custom Modules
├── 02-react-tools/ ReAct agent loop with function tools and PythonInterpreter
├── 03-rag-pipeline/ Multi-stage RAG with ColBERTv2 + BootstrapFewShot optimization
├── 04-optimizers/ Optimizer comparison: BootstrapFewShot, GEPA, MIPROv2, BetterTogether
├── 05-rlm/ Recursive Language Model (REPL-based code-gen agent)
├── 06-advanced/ MultiChainComparison, Module.batch, Ensemble, Streaming, Async, Adapters
├── 07-generative-feedback-loops/ GFL patterns: trace→feedback→update loop with 5 optimizers
├── 08-rlm-mcp/ End-to-end: MCP scrape → GFL optimize → RLM agent → Trace2Skill
├── 09_super_deep_research/ Multi-agent research: UCB frontier + LSE evolution + KG memory
├── 10_dapr_deep_research/ Durable multi-agent: Dapr workflows + DSPy deltas + distillation
├── 11_meta_agent/ Meta-agent: dynamic agent generation via LSE + Trace2Skill
├── 12_formal_evolution/ Formal evolution: Z3 + Lean4 + OpenRouter MCP consensus
├── 13_autonomous_factory/ Autonomous Software Factory: 23 MCP servers, IaC, verification
├── 14_durable_meta_agent/ Durable Meta-Agent: DSPy + Dapr production framework + swarm
├── 15_ray_sglang/ Distributed high-throughput: SGLang inference + Ray parallel agents
├── 99-sandbox/ Scratch space
└── shared/ Env config, LM helpers, research primitives (ResearchDirection, ResearchFrontier ABC)
| Module | File(s) | Key Classes/Functions | DSPy Modules Used |
|---|---|---|---|
| 01-basics | main.py |
Classify, Extract, Outline, DraftSection, ArticleWriter |
ChainOfThought, Predict, Module, Signature |
| 02-react-tools | main.py |
search(), calculator() |
ReAct, PythonInterpreter |
| 03-rag-pipeline | main.py |
RAG, gold_answer_metric() |
ColBERTv2, ChainOfThought, BootstrapFewShot, Evaluate |
| 04-optimizers | main.py |
NumClassify, exact_match() |
ChainOfThought, BootstrapFewShot, GEPA, MIPROv2, BetterTogether |
| 05-rlm | main.py |
(example script) | RLM |
| 06-advanced | main.py |
CompareAnswers, Summarize, Classify, Person |
MultiChainComparison, Module.batch, Ensemble.compile(), streamify, asyncify, JSONAdapter |
| 07-gfl | main.py |
ClassifyIntent, intent_metric(), gepa_metric(), eval_score() |
ChainOfThought, BootstrapFewShot, MIPROv2, GEPA, Evaluate |
| 08-rlm-mcp | main.py, mcp_server.json |
MCPClient, MemoryManager, ClassifyContent, ResearchReport, content_metric() |
ChainOfThought, RLM, BootstrapFewShot, MIPROv2, GEPA, BetterTogether, BAMLAdapter |
| 09-super-deep-research | 16 files in 5 packages | ResearchOrchestrator, ResearchFrontier, ResearchDirection, KnowledgeGraph, MemoryStore, MCPClient, LSEOptimizer, SkillConsolidator, SelfDistill, 5 agent factories |
RLM, ChainOfThought |
| 10-dapr-deep-research | 15 files in 6 packages + shared | ExplorerAgent, DeepReaderAgent, SynthesizerAgent, CriticAgent, ResearchWorkflow, InMemoryFrontier, DaprFrontier, ResearchFrontier ABC, ResearchDirection, NoopStore, LSEOptimizer, SkillConsolidator, MCPBridge, AssessBatchSaturation, 10 DSPy signatures, shared compile constants |
RLM, ChainOfThought, BestOfN, Refine, MultiChainComparison, BootstrapFewShot, Evaluate, BAMLAdapter |
| 11-meta-agent | 10 files in 5 packages | MetaAgent, AgentStack, AgentEntry, AgentGenerator, AnalyzeTask, GenerateSignature, SelectNextAgent, InMemoryFrontier, LSEOptimizer, SkillConsolidator, MCPBridge |
ChainOfThought, BootstrapFewShot |
| 12-formal-evolution | 10 files + config | MCP servers: z3-solver, lean-lsp, openrouter; agents auto-discover Z3/Lean4/OpenRouter tools | Same as lab 11 + tool integration |
| 13-autonomous-factory | 9 files + config | AgentGenerator, MetaAgent, InMemoryFrontier, AgentStack, GFLPipeline, LSEOptimizer, SkillConsolidator, MCPBridge, ResourceBudget, AgentEntry, AnalyzeTask, SelectAgentCompare, ImproveAgentPrompt, ExtractRules, QualityEvaluation, ExtractPatterns |
ChainOfThought, BestOfN, MultiChainComparison, Refine, RLM, ReAct, CodeAct, BootstrapFewShot, MIPROv2, GEPA, Evaluate |
| 14-durable-meta-agent | 35 files + config + dapr resources + swarm | DurableMetaAgent, DurableMetaConfig (Continue-as-New via max_iterations_per_segment), GeneratedDurableAgent, wrap_module(), DaprFrontier (dirty-flag persistence), DaprLSEOptimizer, DaprAgentStack (delta-update per-entry keys), AssessBatchSaturation, SwarmCoordinator, SwarmMetaAgent, SwarmTask, SwarmDiscovery, SwarmHeartbeat, SwarmInquiry, SwarmResponse + all lab 13 modules |
dspy.RLM (REPL code+tools), dspy.ReAct, dspy.CodeAct, dspy.ChainOfThought, BAMLAdapter (Pydantic structured output), same as lab 13 + workflow_entry, DaprChatClient, DurableAgent, @message_router, call_agent() |
| 15-ray-sglang | 8 new files + Lab 14 base | MetaConfig dataclass (validated), ModuleExecutor ABC, InProcessExecutor, RayModuleExecutor (bytes transport, weakref cache, order-preserving batch, circuit-breaker), RayNotAvailableError, parallel_lse_evaluate() (reuses Ray executor), _pickle_module() (bytes serializer), _get_module_bytes() (cached), sglang.launch_server scripts (4-bit AWQ + FP8 KV), with_bridge decorator |
Same as lab 14 + RayModuleExecutor.execute(), parallel_lse_evaluate(), _pickle_module() |
| 15b-livekit-a2ui | 10 new files + Lab 15 base | OctAg0nOAgent(Agent) (llm_node streaming + thinking steps), Qwen3TTS (LiveKit plugin, 3 modes: CustomVoice/VoiceDesign/VoiceClone), A2UIChannel (v0.10 protocol, surfaces, theming), A2UITransport ABC (LiveKitDataTransport, WebSocketTransport), AGUIHandler (shared state, frontend tools, interrupt routing), InterruptSystem (human-in-the-loop confirm/input), frontend_tools (geolocation, file picker, clipboard), ModelProfile registry (4-role hybrid: Llama 4 Omni, DeepSeek-V4-Lite, Phi-4-Pro, NeMo-v3), configure_all_models(), sglang_launch_commands(), CopilotKit setup guide |
Same as lab 15 + llm_node override, a2ui_*() tools, AG-UI events, dspy.context(lm=...) role routing |
| shared | config.py, research.py |
get_lm_model(), get_student_lm_model(), get_lm_temperature(), get_agent_port(), get_dapr_state_store(), get_dapr_pubsub(), project_root(), ResearchDirection dataclass, ResearchFrontier ABC, SATURATION_THRESHOLD, MAX_BOOTSTRAPPED_DEMOS, MAX_LABELED_DEMOS |
— |
cp .env.example .env # Configure DEEPSEEK_API_KEY, LLM_MODEL, etc.
uv sync # Install dependencies
docker compose -f lab/10_dapr_deep_research/docker-compose.yml up -d # Crawl4AI
dapr init # For lab 10 (optional)
ollama pull gemma4 # For distillation (optional)| Module | Constructor | Purpose |
|---|---|---|
dspy.Predict |
Predict(signature) |
Direct prediction, no reasoning trace |
dspy.ChainOfThought |
CoT(signature) |
Step-by-step reasoning before output |
dspy.ReAct |
ReAct(signature, tools=[]) |
Agentic loop: thought→action→observation |
dspy.RLM |
RLM(signature, tools=[], max_iterations=20) |
REPL-based code-gen agent |
dspy.ProgramOfThought |
PoT(signature) |
Code-generating module with execution |
dspy.MultiChainComparison |
MultiChainComparison(signature, n=3) |
Compare N reasoning chains |
dspy.BestOfN |
BestOfN(module, N=3, reward_fn=...) |
Sample N, pick best by metric |
dspy.Refine |
Refine(module, N=3, reward_fn=...) |
Iterative refinement loop |
dspy.Parallel |
Parallel(num_threads=...) |
Internal parallel execution (use Module.batch()) |
dspy.Ensemble |
Ensemble() + .compile([modules]) |
Combine predictions |
dspy.BootstrapFewShot |
BootstrapFewShot(metric=...) |
Trace→demonstration pipeline |
dspy.MIPROv2 |
MIPROv2(metric=..., auto="light") |
Bayesian instruction+ demo search |
dspy.GEPA |
GEPA(metric=...) |
Genetic prompt evolution (ICLR 2026) |
dspy.BetterTogether |
BetterTogether(metric=..., **optimizers) |
Chain multiple optimizers |
dspy.streamify |
streamify(module, async_streaming=False) |
Convert module to streaming |
dspy.asyncify |
asyncify(module) |
Convert module to async |
Used across labs 08-10 for structured RLM output:
| Model | Fields | Used In |
|---|---|---|
FoundDirection |
topic, relevance, seed_query | 09, 10 |
ExplorationResult |
directions: list[FoundDirection] | 09, 10 |
ExtractedFinding |
claim, evidence, source, confidence | 09, 10 |
DeepReadResult |
findings, summary | 09, 10 |
SynthesisReport |
synthesis, key_insights, gaps | 09, 10 |
Critique |
strengths, weaknesses, follow_ups | 09, 10 |
ScrapedContent |
url, category, summary, key_topics | 08 |
ResearchReport |
findings, synthesis | 08 |
| Signature | Inputs | Outputs | Defined In |
|---|---|---|---|
GenerateHypotheses |
topic | hypotheses: list[str] | 10/agents/research_agents.py |
CrossValidateFindings |
findings_summary | validated_claims, contradictions | 10/agents/research_agents.py |
SynthesizeAcrossSources |
task | synthesis, key_insights, gaps | 10/agents/research_agents.py |
SelectAgent |
exploration_depth, confidence, topic | selected_agent | 10/agents/research_agents.py |
ComputeConfidenceDelta |
topic, agent_type, num_findings, findings_summary, exploration_depth | confidence_delta, reasoning | 10/agents/research_agents.py |
AssessSaturation |
topic, confidence, exploration_depth, source_count | is_saturated, reasoning | 10/agents/research_agents.py |
CritiqueReasoning |
research_summary | critique | 10/agents/research_agents.py |
QualityEvaluation |
num_directions, num_findings, frontier_saturation | quality_score, explanation | 10/evolution/lse.py |
ExtractPatterns |
trajectory_context | error_patterns, success_patterns, improvement_suggestion | 10/evolution/trace2skill.py |
AssessBatchSaturation |
directions_json: str | saturated_indices: list[int] | 10/memory/dapr_frontier.py |
ClassifyContent |
chunk | category, key_topics | 08/main.py |
ClassifyIntent |
query | intent, confidence | 07/main.py |
QualityEvaluation |
— | — | 07/main.py |