Spawn a conference of autonomous researchers that compete, collaborate, and synthesize breakthroughs.
Note
A Claude Code skill that orchestrates N parallel autoresearch agents in structured conference rounds -- with adversarial peer review and cross-researcher synthesis. Write a conference.md defining your research goal, and the conference handles hypothesis generation, experimentation, evaluation, and multi-agent iteration. Built on autoresearch-skill. Works with Claude Code, Codex CLI, OpenCode, and Gemini CLI.
| Round 1 (Naive) | Final (Converged) |
|---|---|
| Composite: 34.5 (water in slab) | Composite: 99.9 (clean separation) |

[Figures: Convergence; Sub-metrics]

33 iterations across 3 researchers and 3 rounds. The conference learns to preserve crystal structure and exclude water from the hydrate slab. See full example →
autoconference v2.0 provides 7 commands through a subcommand architecture:

| Command | Description | Use Case |
|---|---|---|
| /autoconference | Core conference loop | Run N researchers through structured rounds with peer review |
| /autoconference:plan | 8-step setup wizard | Interactively create a conference.md with dry-run evaluator gate |
| /autoconference:resume | Checkpoint recovery | Resume an interrupted conference from last completed phase |
| /autoconference:analyze | Post-conference analysis | Extract insights, failure modes, and transferable learnings |
| /autoconference:debate | Adversarial debate | 2-researcher pro/con format with Opus judge |
| /autoconference:survey | Literature survey | Multi-database systematic review with citation chains |
| /autoconference:ship | Ship results | 8-phase pipeline to format results for publication |

Command chaining:
plan ──> autoconference ──> ship (standard pipeline)
plan ──> autoconference ──> analyze ──> ship (with post-analysis)
debate ──> autoconference (debate-informed experiment)
survey ──> autoconference (literature-guided experiment)
resume ──> autoconference (continuation)
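
As a rough illustration, the chains above can be read as allowed transitions between commands. The sketch below is one way to encode that reading, not part of the skill: `NEXT_STEPS` and `is_valid_chain` are hypothetical names, and treating each arrow as a reusable transition is an interpretation of the diagram rather than documented behavior.

```python
from typing import Dict, List, Set

# Transitions taken from the arrows in the chaining diagram (interpretation, not an API).
NEXT_STEPS: Dict[str, Set[str]] = {
    "plan": {"autoconference"},
    "debate": {"autoconference"},
    "survey": {"autoconference"},
    "resume": {"autoconference"},
    "autoconference": {"analyze", "ship"},
    "analyze": {"ship"},
    "ship": set(),
}


def is_valid_chain(chain: List[str]) -> bool:
    """Return True if every consecutive pair of commands follows a known arrow."""
    if not chain or any(step not in NEXT_STEPS for step in chain):
        return False
    return all(nxt in NEXT_STEPS[cur] for cur, nxt in zip(chain, chain[1:]))


standard = is_valid_chain(["plan", "autoconference", "ship"])
skipped_run = is_valid_chain(["plan", "ship"])  # skips the conference itself
```

Here `standard` is true and `skipped_run` is false, because no arrow leads directly from plan to ship.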
- Multi-Agent Orchestration -- N researchers explore different parts of the search space in parallel, then share findings after each round.
- Adversarial Peer Review -- Opus-powered Reviewer agent challenges claims each round, catching overfitting and measurement noise before results propagate.
- Synthesis Over Selection -- Final output combines complementary insights from multiple researchers rather than just picking the winner.
- Dual Mode -- Metric mode for numeric optimization, Qualitative mode for literature review and hypothesis generation.
- 7 Subcommands -- Plan, run, resume, analyze, debate, survey, and ship — chainable into full research pipelines.
- Guard Parameters -- Conference-level safety constraints that override metric improvements when violated.
- Automatic Convergence -- Detects plateau, budget exhaustion, or stall and triggers final synthesis automatically.
- Crash Recovery -- 5-type recovery matrix for interrupted conferences (mid-research, mid-poster, mid-review, mid-transfer, pre-synthesis).
- Full Audit Trail -- Per-researcher logs, poster sessions, peer reviews, conference-level TSV, and JSONL event stream.
- Built on autoresearch-skill -- Each researcher runs the proven 5-stage experiment-evaluate-iterate loop.
- Safety Built In -- Max iterations, time budgets, researcher timeouts, forbidden-change boundaries, and automatic rollback.
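
To make the Automatic Convergence item more concrete, here is a minimal sketch of a stop check. The README names three triggers (plateau, budget exhaustion, stall) but does not define them precisely, so the definitions below are assumptions for illustration only. `RoundSummary`, `check_convergence`, and `patience` are hypothetical names, not the skill's actual logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RoundSummary:
    """Hypothetical per-round record; field names are illustrative."""
    best_metric: float    # best validated metric after this round
    iterations_used: int  # iterations spent by all researchers this round
    kept_findings: int    # findings the Reviewer accepted this round


def check_convergence(
    history: List[RoundSummary],
    direction: str = "maximize",
    max_rounds: int = 4,
    max_total_iterations: Optional[int] = None,
    min_delta: float = 0.0,
    patience: int = 2,
) -> Optional[str]:
    """Return a reason to stop, or None to run another round."""
    if not history:
        return None
    # Budget exhaustion: round cap or iteration cap reached.
    if len(history) >= max_rounds:
        return "budget: max_rounds reached"
    total = sum(r.iterations_used for r in history)
    if max_total_iterations is not None and total >= max_total_iterations:
        return "budget: max_total_iterations reached"
    # Stall (assumed meaning): no validated findings for `patience` rounds in a row.
    recent = history[-patience:]
    if len(recent) == patience and all(r.kept_findings == 0 for r in recent):
        return "stall: no validated findings"
    # Plateau (assumed meaning): best metric gained <= min_delta over `patience` rounds.
    if len(history) > patience:
        sign = 1.0 if direction == "maximize" else -1.0
        gain = sign * (history[-1].best_metric - history[-1 - patience].best_metric)
        if gain <= min_delta:
            return "plateau: improvement below min_delta"
    return None
```

A Conference Chair following this sketch would call `check_convergence` after each knowledge-transfer phase and trigger final synthesis as soon as it returns a reason.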
Autoconference builds on autoresearch-skill's single-agent loop by adding parallel exploration, adversarial review, and cross-researcher synthesis. Use it when one agent exploring sequentially isn't enough -- either because the search space is too wide, or because self-evaluation alone can't be trusted.
| | autoresearch-skill | Agent harness modes (team, /batch, ouroboros, etc.) | autoconference |
|---|---|---|---|
| Agents | 1 | N, each on a different subtask | N, all on the same problem with different strategies |
| Exploration | Sequential strategy switching | Independent subtask decomposition | Search space partitioning + knowledge transfer between rounds |
| Validation | Mechanical evaluator or agent self-judgment | Build/test pass | Opus adversarial reviewer (catches overfitting, noise, invalid claims) |
| Result integration | Single best result | Per-subtask results merged | Synthesis -- combines complementary insights rather than just picking a winner |
| Round structure | None (continuous iteration) | None (one-shot dispatch) | Poster session → peer review → knowledge transfer per round |

Use autoconference when:
- The search space is wide enough to partition -- N researchers exploring different regions in parallel cover more ground than one agent switching strategies sequentially
- Self-evaluation has blind spots -- the adversarial Reviewer (Opus) catches overfitting, measurement noise, and Goodhart's Law effects that a single agent's evaluator misses
- You need synthesis, not selection -- the final output should combine complementary findings from multiple approaches, not just take the best score
- The research is qualitative (literature synthesis, hypothesis generation) and benefits from multiple perspectives converging on a shared taxonomy

Stick with a single autoresearch-skill loop when:
- The search space is small enough for one agent to cover within the iteration budget
- You have a mechanical evaluator and trust its keep/revert decisions without external review
- Token cost matters -- autoconference runs N researchers + reviewer + synthesizer per round, roughly (N+2)x the cost of a single autoresearch loop
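
The cost rule of thumb in the last bullet can be written out as arithmetic. This is only a back-of-the-envelope sketch: `cost_multiplier` and `estimate_tokens` are hypothetical helpers, `single_loop_tokens` is a number you would have to measure yourself, and it assumes every agent costs about as much as one autoresearch loop per round.

```python
def cost_multiplier(researchers: int) -> int:
    """Approximate per-round cost relative to one autoresearch loop: N + 2."""
    if researchers < 1:
        raise ValueError("need at least one researcher")
    # N researchers + 1 reviewer + 1 synthesizer, each counted as one loop's worth.
    return researchers + 2


def estimate_tokens(single_loop_tokens: int, researchers: int, rounds: int) -> int:
    """Scale a measured single-loop token count (assumed input) to a whole conference."""
    return single_loop_tokens * cost_multiplier(researchers) * rounds


three_researchers = cost_multiplier(3)  # 3 + 2 = 5
```

With 3 researchers the multiplier is 5, so a 3-round conference costs roughly 15 single-loop budgets under these assumptions.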
Tip
Paste the block below directly into Claude Code. It clones the repo, installs the skill, and verifies the setup in one shot.
I want to install the autoconference-skill. Do these steps:
1. git clone https://github.com/wjgoarxiv/autoconference-skill.git /tmp/autoconference-skill
2. mkdir -p ~/.claude/skills/autoconference-skill && cp -r /tmp/autoconference-skill/SKILL.md /tmp/autoconference-skill/scripts /tmp/autoconference-skill/assets /tmp/autoconference-skill/references ~/.claude/skills/autoconference-skill/
3. Test: python ~/.claude/skills/autoconference-skill/scripts/init_conference.py --goal "test" --metric "score" --direction minimize --researchers 2 --output /tmp/test-conference && echo "OK: autoconference-skill installed"
4. Say "autoconference-skill installed successfully"
git clone https://github.com/wjgoarxiv/autoconference-skill.git
cd autoconference-skill
# Symlink into your skills directory
mkdir -p ~/.claude/skills
ln -s "$(pwd)" ~/.claude/skills/autoconference-skill

| Tool | Install Command |
|---|---|
| Claude Code | Paste the copy-paste block above, or use the manual install |
| Codex CLI | Copy SKILL.md into your Codex instructions directory |
| Gemini CLI | Copy SKILL.md into your Gemini context directory |

| Platform | Skills Path | Install Command |
|---|---|---|
| Claude Code | ~/.claude/skills/autoconference-skill/ | See above |
| Codex CLI | ~/.codex/skills/autoconference-skill/ | mkdir -p ~/.codex/skills && ln -s "$(pwd)" ~/.codex/skills/autoconference-skill |
| OpenCode | ~/.config/opencode/skills/autoconference-skill/ | mkdir -p ~/.config/opencode/skills && ln -s "$(pwd)" ~/.config/opencode/skills/autoconference-skill |
| Gemini CLI | ~/.gemini/skills/autoconference-skill/ | mkdir -p ~/.gemini/skills && ln -s "$(pwd)" ~/.gemini/skills/autoconference-skill |

Three researchers compete on prompt accuracy, each specializing in one area: instruction phrasing, few-shot selection, or chain-of-thought formatting.
Run an autoconference using templates/prompt-optimization.md.
Goal: maximize accuracy on my classification benchmark.
3 researchers, 3 rounds.
Algorithmic, data-structure, and low-level researchers independently optimize the same codebase, then cross-pollinate validated wins each round.
Run an autoconference using templates/code-performance.md.
Metric: wall-clock time on my benchmark suite.
Direction: minimize. Target: < 200ms.
Qualitative mode. Three researchers survey the same topic from foundational, recent, and cross-domain angles, then synthesize a unified taxonomy.
Run an autoconference in qualitative mode.
Goal: survey LLM agent papers from 2022-2025.
3 researchers, 2 rounds. Synthesize findings into a taxonomy.
python scripts/init_conference.py \
--goal "Optimize inference latency" \
--metric "p95_latency_ms" \
--direction minimize \
--target "< 50" \
--researchers 3 \
--strategy assigned \
--output ./latency-conference/

Then edit the generated conference.md to fill in your Current Approach, Search Space, and researcher focus areas. When ready:
Run the autoconference on my conference.md
Claude loads SKILL.md, reads conference.md, and orchestrates the full conference -- all rounds, peer review, and final synthesis.
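
If you generate several conference directories from a script, the init_conference.py invocation shown earlier can be assembled programmatically. The sketch below only builds the argument list and does not run anything. `build_init_command` is a hypothetical helper, it uses only the flags that appear in this README, and accepting `maximize` as a `--direction` value is an assumption (only `minimize` is shown).

```python
import shlex
from typing import List, Optional


def build_init_command(
    goal: str,
    metric: str,
    direction: str,
    researchers: int,
    output: str,
    target: Optional[str] = None,
    strategy: Optional[str] = None,
    script: str = "scripts/init_conference.py",
) -> List[str]:
    """Build an argv list for init_conference.py from the flags shown in the README."""
    if direction not in ("minimize", "maximize"):  # "maximize" is assumed to be valid
        raise ValueError(f"unexpected direction: {direction!r}")
    if strategy is not None and strategy not in ("assigned", "free"):
        raise ValueError(f"unexpected strategy: {strategy!r}")
    argv = [
        "python", script,
        "--goal", goal,
        "--metric", metric,
        "--direction", direction,
        "--researchers", str(researchers),
    ]
    if target is not None:
        argv += ["--target", target]
    if strategy is not None:
        argv += ["--strategy", strategy]
    argv += ["--output", output]
    return argv


cmd = build_init_command(
    goal="Optimize inference latency",
    metric="p95_latency_ms",
    direction="minimize",
    researchers=3,
    output="./latency-conference/",
    target="< 50",
    strategy="assigned",
)
printable = shlex.join(cmd)  # shell-quoted string, safe to paste into a terminal
```

Passing `cmd` as a list to `subprocess.run` avoids shell-quoting problems with values such as `< 50`; `printable` is only for logging or copy-paste.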
+----------------------------------------------------------+
| CONFERENCE ROUND |
| |
| Phase 1: INDEPENDENT RESEARCH (parallel) |
| +----------+ +----------+ +----------+ |
| |Researcher| |Researcher| |Researcher| Each runs N |
| | A | | B | | C | autoresearch |
| | (iter x N)| | (iter x N)| | (iter x N)| iterations |
| +----+-----+ +----+-----+ +----+-----+ |
| | | | |
| Phase 2: POSTER SESSION |
| +----------------------------------------------+ |
| | Session Chair collects all logs, | |
| | surfaces key findings & deltas | |
| +----------------------+-----------------------+ |
| | |
| Phase 3: PEER REVIEW (adversarial) |
| +----------------------------------------------+ |
| | Reviewer agent challenges claims: | |
| | - "Did metric actually improve?" | |
| | - "Is this overfitting?" | |
| | - "Could this be measurement noise?" | |
| +----------------------+-----------------------+ |
| | |
| Phase 4: KNOWLEDGE TRANSFER |
| +----------------------------------------------+ |
| | Validated findings shared back to | |
| | all researchers for next round | |
| +----------------------------------------------+ |
| |
+----------------------------------------------------------+
|
v Convergence check -> next round or final synthesis
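
The four phases in the diagram can be sketched as a single function. This is a toy model under stated assumptions, not the skill's implementation: real researchers and reviewers are LLM agents, whereas here they are plain Python callables, and `run_round`, `Finding`, and the `delta` field are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Finding = Dict[str, Any]
Researcher = Callable[[List[Finding]], List[Finding]]


def run_round(
    researchers: Dict[str, Researcher],
    review: Callable[[Finding], bool],
    shared_knowledge: List[Finding],
) -> List[Finding]:
    """Run one conference round and return the updated shared knowledge."""
    # Phase 1: independent research in parallel; everyone sees the same snapshot.
    snapshot = list(shared_knowledge)
    with ThreadPoolExecutor(max_workers=max(1, len(researchers))) as pool:
        futures = {name: pool.submit(fn, snapshot) for name, fn in researchers.items()}
        logs = {name: fut.result() for name, fut in futures.items()}

    # Phase 2: poster session -- gather all findings, tagged by researcher.
    poster = [dict(f, researcher=name) for name, fs in sorted(logs.items()) for f in fs]

    # Phase 3: peer review -- only findings that survive the reviewer are kept.
    validated = [f for f in poster if review(f)]

    # Phase 4: knowledge transfer -- validated findings feed the next round.
    return shared_knowledge + validated


def researcher_a(known: List[Finding]) -> List[Finding]:
    return [{"claim": "cache results", "delta": 4.0}]


def researcher_b(known: List[Finding]) -> List[Finding]:
    return [{"claim": "tweak seed", "delta": 0.1}]


def skeptical_review(finding: Finding) -> bool:
    # Stand-in for the adversarial Reviewer: reject gains that look like noise.
    return finding["delta"] >= 1.0


knowledge = run_round({"A": researcher_a, "B": researcher_b}, skeptical_review, [])
```

In this run only researcher A's finding passes review, so the next round would start with that single validated result in shared knowledge.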

| Section | Purpose |
|---|---|
| Goal | What the conference should achieve |
| Mode | metric (numeric optimization) or qualitative (reasoning quality) |
| Success Metric | Metric name, target, direction (metric mode only) |
| Success Criteria | Natural language description of "good" (qualitative mode only) |
| Researchers | Count, iterations per round, max rounds |
| Search Space | What researchers can and cannot modify |
| Search Space Partitioning | assigned (each researcher has a focus) or free (overlap allowed) |
| Constraints | Max iterations, time budget, researcher timeout |
| Current Approach | Baseline description |
| Shared Knowledge | Auto-populated after each round with validated findings |
| Conference Log | Auto-maintained round-by-round history |

See assets/conference_template.md for the full template.
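
Before launching a long run, it can help to check that a conference.md has the sections listed above. The sketch below assumes each section appears as a Markdown heading whose text is exactly the section name; the real template may be laid out differently, so treat `missing_sections` as a hypothetical helper to adapt. The two auto-maintained sections are not required here.

```python
import re
from typing import List

# Sections the user fills in, per the table above.
COMMON_SECTIONS = [
    "Goal",
    "Mode",
    "Researchers",
    "Search Space",
    "Search Space Partitioning",
    "Constraints",
    "Current Approach",
]
# Exactly one of these is needed, depending on the mode.
MODE_SECTION = {"metric": "Success Metric", "qualitative": "Success Criteria"}


def missing_sections(text: str, mode: str) -> List[str]:
    """List required section names that have no matching heading in `text`."""
    if mode not in MODE_SECTION:
        raise ValueError(f"unknown mode: {mode!r}")
    headings = {
        m.group(1)
        for m in re.finditer(r"^#+[ \t]+(.+?)[ \t]*$", text, re.MULTILINE)
    }
    required = COMMON_SECTIONS + [MODE_SECTION[mode]]
    return [name for name in required if name not in headings]


sample = "# Conference\n## Goal\nFaster inference\n## Mode\nmetric\n## Success Metric\n"
gaps = missing_sections(sample, "metric")
```

For the short `sample` above, `gaps` lists the five sections that still need to be written.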
| Role | Model | Count | Responsibility |
|---|---|---|---|
| Conference Chair | Sonnet | 1 | Orchestrator -- manages rounds, spawns researchers, detects convergence, triggers synthesis |
| Researcher | Sonnet | N | Runs the autoresearch 5-stage loop within assigned search space |
| Session Chair | Haiku | 1 | Lightweight summarizer -- collects logs and produces poster session summary after each round |
| Reviewer | Opus | 1 | Adversarial critic -- challenges claims, checks for overfitting/noise, assigns verdicts |
| Synthesizer | Opus | 1 | Runs once at end -- combines complementary insights from all researchers |
Ready-to-use conference.md configs for common tasks:

| Template | Mode | Use Case |
|---|---|---|
| templates/quick-conference.md | metric | 2 researchers, 2 rounds -- test if your problem benefits from the conference format |
| templates/prompt-optimization.md | metric | Optimize LLM prompt accuracy with 3 specialized researchers |
| templates/code-performance.md | metric | Optimize code speed with algorithmic, data-structure, and low-level researchers |
| templates/research-synthesis.md | qualitative | Literature exploration across foundational, recent, and cross-domain angles |
| templates/debate-mode.md | qualitative | 2-researcher adversarial debate with structured rounds and Opus judge |
| templates/survey-mode.md | qualitative | Multi-database literature survey with citation chain tracking |


| Field | Default | Description |
|---|---|---|
| mode | metric | metric or qualitative |
| count | -- | Number of researcher agents |
| iterations_per_round | 5 | Autoresearch iterations each researcher runs per round |
| max_rounds | 4 | Maximum conference rounds before forced synthesis |
| max_total_iterations | -- | Hard cap across all researchers and rounds |
| time_budget | -- | Wall-clock limit for the entire conference |
| researcher_timeout | -- | Per-researcher timeout per round |
| strategy | free | assigned (focus areas) or free (open exploration) |
| guard | -- | Safety constraint enforced on ALL researchers (violations revert regardless of metric) |
| noise_runs | 1 | Repeated evaluations to average for noise reduction |
| min_consensus_delta | 0 | Minimum average improvement across kept researchers to advance |

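
The defaults in the table, and the way `guard` and `noise_runs` interact, can be sketched as follows. This is an illustrative model only: `ConferenceConfig` and `keep_change` are hypothetical names, the evaluator and guard are plain callables standing in for whatever your conference.md defines, and the strict comparison against the baseline is an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class ConferenceConfig:
    """Defaults copied from the table above; None marks fields with no default."""
    mode: str = "metric"
    count: Optional[int] = None
    iterations_per_round: int = 5
    max_rounds: int = 4
    max_total_iterations: Optional[int] = None
    time_budget: Optional[str] = None
    researcher_timeout: Optional[str] = None
    strategy: str = "free"
    guard: Optional[str] = None
    noise_runs: int = 1
    min_consensus_delta: float = 0.0


def keep_change(
    evaluate: Callable[[], float],
    guard_ok: Callable[[], bool],
    baseline: float,
    direction: str,
    noise_runs: int = 1,
) -> bool:
    """Decide whether to keep a change: the guard is checked first, then the averaged metric."""
    # A guard violation reverts the change regardless of any metric improvement.
    if not guard_ok():
        return False
    # Average repeated evaluations to damp measurement noise.
    score = mean(evaluate() for _ in range(noise_runs))
    return score < baseline if direction == "minimize" else score > baseline


config = ConferenceConfig(count=3, noise_runs=3)
readings = iter([48.0, 52.0, 47.0])  # illustrative latency samples in ms
kept = keep_change(lambda: next(readings), lambda: True, 50.0, "minimize", config.noise_runs)
```

The three samples average to 49.0, which beats the 50.0 baseline, so `kept` is true even though one individual run was worse than the baseline. That is the point of setting `noise_runs` above 1.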

| File | Description |
|---|---|
| conference.md | User config (updated with log entries each round) |
| conference_results.tsv | Master conference-level TSV with all iterations and peer review verdicts |
| researcher_A_log.md | Detailed per-researcher iteration log |
| researcher_A_results.tsv | Per-researcher TSV (same format as autoresearch) |
| poster_session_round_N.md | Session Chair summary for each round |
| peer_review_round_N.md | Reviewer verdicts for each round |
| synthesis.md | Final synthesized output from Synthesizer |
| final_report.md | Executive summary with full conference history |

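
To check that a finished conference left a complete audit trail, you can enumerate the filenames from the table. The sketch assumes researchers are lettered A, B, C, ... and rounds are numbered from 1, which matches the `researcher_A_*` and `*_round_N` patterns but is not stated explicitly; `expected_files` is a hypothetical helper.

```python
from string import ascii_uppercase
from typing import List


def expected_files(researchers: int, rounds: int) -> List[str]:
    """List the output files a completed conference should contain."""
    if not 1 <= researchers <= 26:
        raise ValueError("this sketch supports 1 to 26 researchers (A-Z)")
    if rounds < 1:
        raise ValueError("need at least one round")
    files = ["conference.md", "conference_results.tsv"]
    # One log and one TSV per researcher (lettering is an assumption).
    for letter in ascii_uppercase[:researchers]:
        files += [f"researcher_{letter}_log.md", f"researcher_{letter}_results.tsv"]
    # One poster session and one peer review per round (1-based numbering assumed).
    for n in range(1, rounds + 1):
        files += [f"poster_session_round_{n}.md", f"peer_review_round_{n}.md"]
    files += ["synthesis.md", "final_report.md"]
    return files


names = expected_files(researchers=3, rounds=2)
```

Comparing `names` against the contents of the conference directory (for example with `pathlib.Path.exists`) shows which phase an interrupted run stopped in.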
To run a conference overnight, use the universal loop script:
# Option A: Foreground (simplest)
bash scripts/autoconference-loop.sh ./my-conference/
# Option B: Background with nohup (no tmux needed)
nohup bash scripts/autoconference-loop.sh ./my-conference/ > conference.log 2>&1 &
# Option C: Background with tmux (best experience)
tmux new-session -d -s conference 'bash scripts/autoconference-loop.sh ./my-conference/'
# Check progress anytime
bash scripts/check_conference.sh ./my-conference/

The script auto-detects your CLI tool, handles round restarts, and checks for conference completion. Works with Claude Code, Codex CLI, OpenCode, and Gemini CLI.
Each researcher in a conference runs the autoresearch loop -- the same autonomous experiment-evaluate-iterate cycle from autoresearch-skill. Autoconference adds three layers on top:
- Multi-agent orchestration -- N researchers explore different parts of the search space in parallel
- Adversarial peer review -- A Reviewer agent challenges findings each round (catches what self-evaluation misses)
- Synthesis -- A Synthesizer combines complementary insights rather than just picking the best result
Use autoresearch-skill for a single focused research loop. Use autoconference when your search space is large enough to partition, when diversity of approach matters, or when you want external validation of results.
Comprehensive documentation is available in the guide/ directory:
| Guide | Topic |
|---|---|
| Getting Started | 60-second quickstart + domain cheat sheet |
| Core Conference | 4-phase round structure, convergence, overnight runs |
| Plan | 8-step setup wizard |
| Resume | Checkpoint recovery |
| Analyze | Post-conference insight extraction |
| Debate | Adversarial 2-researcher format |
| Survey | Multi-database literature survey |
| Ship | Conference results to publication |
| Chains | Command chaining patterns |
| Advanced | Guards, noise, worktrees, CI/CD |
| Troubleshooting | Common failure modes and fixes |
See also: COMPARISON.md for autoconference vs alternatives.
| Platform | Status | Install |
|---|---|---|
| Claude Code | Ready | Plugin install or manual symlink |
| Codex CLI | Ready | See .codex/INSTALL.md |
| OpenCode | Ready | See .opencode/INSTALL.md |
| Gemini CLI | Ready | Uses gemini-extension.json auto-discovery |
| Requirement | Details |
|---|---|
| Python | 3.8+ (stdlib only; needed only for scripts/init_conference.py) |
| LLM CLI | Any CLI with subagent support (Claude Code, Codex CLI, OpenCode, Gemini CLI) |
| autoresearch-skill | Referenced by each researcher agent's prompt |
See CONTRIBUTING.md for detailed guidelines.
- Fork the repository
- Create a feature branch (git checkout -b feature/your-feature)
- Commit your changes (git commit -m 'Add your feature')
- Push to the branch (git push origin feature/your-feature)
- Open a pull request
MIT -- see LICENSE for details.




