autoconference-skill

Spawn a conference of autonomous researchers that compete, collaborate, and synthesize breakthroughs.

Commands · When to Use · Quick Start · How It Works · Templates · Guide · Korean

Note

A Claude Code skill that orchestrates N parallel autoresearch agents in structured conference rounds -- with adversarial peer review and cross-researcher synthesis. Write a conference.md defining your research goal, and the conference handles hypothesis generation, experimentation, evaluation, and multi-agent iteration. Built on autoresearch-skill. Works with Claude Code, Codex CLI, and Gemini CLI.

Example: sII Hydrate + Water .gro Generation

[Figure: Round 1 (Naive), composite 34.5, water in slab | Final (Converged), composite 99.9, clean separation]

[Figure: Convergence | Sub-metrics]

33 iterations across 3 researchers and 3 rounds. The conference learns to preserve crystal structure and exclude water from the hydrate slab. See full example →

Commands

autoconference v2.0 provides 7 commands through a subcommand architecture:

Command	Description	Use Case
`/autoconference`	Core conference loop	Run N researchers through structured rounds with peer review
`/autoconference:plan`	8-step setup wizard	Interactively create a `conference.md` with dry-run evaluator gate
`/autoconference:resume`	Checkpoint recovery	Resume an interrupted conference from last completed phase
`/autoconference:analyze`	Post-conference analysis	Extract insights, failure modes, and transferable learnings
`/autoconference:debate`	Adversarial debate	2-researcher pro/con format with Opus judge
`/autoconference:survey`	Literature survey	Multi-database systematic review with citation chains
`/autoconference:ship`	Ship results	8-phase pipeline to format results for publication

Command chaining:

plan ──> autoconference ──> ship              (standard pipeline)
plan ──> autoconference ──> analyze ──> ship  (with post-analysis)
debate ──> autoconference                     (debate-informed experiment)
survey ──> autoconference                     (literature-guided experiment)
resume ──> autoconference                     (continuation)
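
The chains above can be written down as data, which makes it easy to check a planned pipeline before running it. This is only an illustrative sketch: `DOCUMENTED_NEXT` and `is_documented_chain` are hypothetical names, not part of the skill, and the table encodes nothing beyond the arrows listed above. It treats each arrow as a pairwise step, so compositions such as debate, autoconference, ship also pass; whether the skill supports those is not stated here.

```python
from typing import Dict, Sequence, Set

# Successors copied from the documented chains (hypothetical helper, not the skill's API).
DOCUMENTED_NEXT: Dict[str, Set[str]] = {
    "plan": {"autoconference"},
    "debate": {"autoconference"},
    "survey": {"autoconference"},
    "resume": {"autoconference"},
    "autoconference": {"analyze", "ship"},
    "analyze": {"ship"},
    "ship": set(),
}


def is_documented_chain(steps: Sequence[str]) -> bool:
    """True if every consecutive pair of steps appears as an arrow in the chains above."""
    if not steps or any(step not in DOCUMENTED_NEXT for step in steps):
        return False
    return all(b in DOCUMENTED_NEXT[a] for a, b in zip(steps, steps[1:]))


print(is_documented_chain(["plan", "autoconference", "analyze", "ship"]))
```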

Features

Multi-Agent Orchestration -- N researchers explore different parts of the search space in parallel, then share findings after each round.
Adversarial Peer Review -- Opus-powered Reviewer agent challenges claims each round, catching overfitting and measurement noise before results propagate.
Synthesis Over Selection -- Final output combines complementary insights from multiple researchers rather than just picking the winner.
Dual Mode -- Metric mode for numeric optimization, Qualitative mode for literature review and hypothesis generation.
7 Subcommands -- Plan, run, resume, analyze, debate, survey, and ship — chainable into full research pipelines.
Guard Parameters -- Conference-level safety constraints that override metric improvements when violated.
Automatic Convergence -- Detects plateau, budget exhaustion, or stall and triggers final synthesis automatically.
Crash Recovery -- 5-type recovery matrix for interrupted conferences (mid-research, mid-poster, mid-review, mid-transfer, pre-synthesis).
Full Audit Trail -- Per-researcher logs, poster sessions, peer reviews, conference-level TSV, and JSONL event stream.
Built on autoresearch-skill -- Each researcher runs the proven 5-stage experiment-evaluate-iterate loop.
Safety Built In -- Max iterations, time budgets, researcher timeouts, forbidden-change boundaries, and automatic rollback.
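
To make the Automatic Convergence feature concrete, here is a minimal sketch of how such a check might look. It is not the skill's actual logic. Only the three stop reasons (plateau, budget exhaustion, stall) come from the list above; the function name `check_convergence`, the `patience` and `min_delta` parameters, and the exact definitions of "plateau" and "stall" are assumptions made for illustration.

```python
from typing import List, Optional


def check_convergence(
    best_per_round: List[float],
    kept_per_round: List[int],
    iterations_used: int,
    max_total_iterations: Optional[int] = None,
    direction: str = "minimize",
    patience: int = 2,       # assumption: rounds without improvement before stopping
    min_delta: float = 0.0,  # assumption: smallest change that counts as improvement
) -> Optional[str]:
    """Return a stop reason, or None if the conference should run another round."""
    # Budget exhaustion: hard cap on iterations across all researchers and rounds.
    if max_total_iterations is not None and iterations_used >= max_total_iterations:
        return "budget_exhausted"
    # Stall (assumed definition): the latest round produced no kept changes at all.
    if kept_per_round and kept_per_round[-1] == 0:
        return "stall"
    # Plateau (assumed definition): the best metric did not improve by more than
    # min_delta over the last `patience` rounds.
    if len(best_per_round) > patience:
        old, new = best_per_round[-patience - 1], best_per_round[-1]
        gain = (old - new) if direction == "minimize" else (new - old)
        if gain <= min_delta:
            return "plateau"
    return None


print(check_convergence([10.0, 8.0, 8.0, 8.0], [3, 2, 1, 1], iterations_used=40))
```

Any non-`None` result would be the cue to trigger final synthesis instead of starting another round.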

When to Use

Autoconference builds on autoresearch-skill's single-agent loop by adding parallel exploration, adversarial review, and cross-researcher synthesis. Use it when one agent exploring sequentially isn't enough -- either because the search space is too wide, or because self-evaluation alone can't be trusted.

autoconference vs alternatives

Aspect	autoresearch-skill	Agent harness modes (team, /batch, ouroboros, etc.)	autoconference
Agents	1	N, each on a different subtask	N, all on the same problem with different strategies
Exploration	Sequential strategy switching	Independent subtask decomposition	Search space partitioning + knowledge transfer between rounds
Validation	Mechanical evaluator or agent self-judgment	Build/test pass	Opus adversarial reviewer (catches overfitting, noise, invalid claims)
Result integration	Single best result	Per-subtask results merged	Synthesis -- combines complementary insights rather than just picking a winner
Round structure	None (continuous iteration)	None (one-shot dispatch)	Poster session → peer review → knowledge transfer per round

Pick autoconference when

The search space is wide enough to partition -- N researchers exploring different regions in parallel cover more ground than one agent switching strategies sequentially
Self-evaluation has blind spots -- the adversarial Reviewer (Opus) catches overfitting, measurement noise, and Goodhart's Law effects that a single agent's evaluator misses
You need synthesis, not selection -- the final output should combine complementary findings from multiple approaches, not just take the best score
The research is qualitative (literature synthesis, hypothesis generation) and benefits from multiple perspectives converging on a shared taxonomy

Use autoresearch-skill instead when

The search space is small enough for one agent to cover within the iteration budget
You have a mechanical evaluator and trust its keep/revert decisions without external review
Token cost matters -- autoconference runs N researchers plus a reviewer and a synthesizer per round, roughly (N+2)x the cost of a single autoresearch loop (about 5x with 3 researchers)

Quick Start

1. Copy-Paste Install

Tip

Paste the block below directly into Claude Code. It clones the repo, installs the skill, and verifies the setup in one shot.

I want to install the autoconference-skill. Do these steps:
1. git clone https://github.com/wjgoarxiv/autoconference-skill.git /tmp/autoconference-skill
2. mkdir -p ~/.claude/skills/autoconference-skill && cp -r /tmp/autoconference-skill/SKILL.md /tmp/autoconference-skill/scripts /tmp/autoconference-skill/assets /tmp/autoconference-skill/references ~/.claude/skills/autoconference-skill/
3. Test: python ~/.claude/skills/autoconference-skill/scripts/init_conference.py --goal "test" --metric "score" --direction minimize --researchers 2 --output /tmp/test-conference && echo "OK: autoconference-skill installed"
4. Say "autoconference-skill installed successfully"

2. Manual Install

git clone https://github.com/wjgoarxiv/autoconference-skill.git
cd autoconference-skill

# Symlink into your skills directory
mkdir -p ~/.claude/skills
ln -s "$(pwd)" ~/.claude/skills/autoconference-skill

3. Other Tools

Tool	Install Command
Claude Code	Paste the copy-paste block above, or use the manual install
Codex CLI	Copy `SKILL.md` into your Codex instructions directory
Gemini CLI	Copy `SKILL.md` into your Gemini context directory

Other Platforms

Platform	Skills Path	Install Command
Claude Code	`~/.claude/skills/autoconference-skill/`	See above
Codex CLI	`~/.codex/skills/autoconference-skill/`	`mkdir -p ~/.codex/skills && ln -s "$(pwd)" ~/.codex/skills/autoconference-skill`
OpenCode	`~/.config/opencode/skills/autoconference-skill/`	`mkdir -p ~/.config/opencode/skills && ln -s "$(pwd)" ~/.config/opencode/skills/autoconference-skill`
Gemini CLI	`~/.gemini/skills/autoconference-skill/`	`mkdir -p ~/.gemini/skills && ln -s "$(pwd)" ~/.gemini/skills/autoconference-skill`

Usage

Prompt Optimization Tournament

Three researchers compete on prompt accuracy, specializing in instruction phrasing, few-shot selection, and chain-of-thought formatting respectively.

Run an autoconference using templates/prompt-optimization.md.
Goal: maximize accuracy on my classification benchmark.
3 researchers, 3 rounds.

Code Performance Competition

Algorithmic, data-structure, and low-level researchers independently optimize the same codebase, then cross-pollinate validated wins each round.

Run an autoconference using templates/code-performance.md.
Metric: wall-clock time on my benchmark suite.
Direction: minimize. Target: < 200ms.

Literature Synthesis Conference

Qualitative mode. Three researchers survey the same topic from foundational, recent, and cross-domain angles, then synthesize a unified taxonomy.

Run an autoconference in qualitative mode.
Goal: survey LLM agent papers from 2022-2025.
3 researchers, 2 rounds. Synthesize findings into a taxonomy.

Scaffold a New Conference

python scripts/init_conference.py \
  --goal "Optimize inference latency" \
  --metric "p95_latency_ms" \
  --direction minimize \
  --target "< 50" \
  --researchers 3 \
  --strategy assigned \
  --output ./latency-conference/

Then edit the generated conference.md to fill in your Current Approach, Search Space, and researcher focus areas. When ready:

Run the autoconference on my conference.md

Claude loads SKILL.md, reads conference.md, and orchestrates the full conference -- all rounds, peer review, and final synthesis.

How It Works

+----------------------------------------------------------+
|                     CONFERENCE ROUND                     |
|                                                          |
|  Phase 1: INDEPENDENT RESEARCH (parallel)               |
|  +----------+  +----------+  +----------+               |
|  |Researcher|  |Researcher|  |Researcher|  Each runs N  |
|  |    A     |  |    B     |  |    C     |  autoresearch |
|  |(iter x N)|  |(iter x N)|  |(iter x N)|  iterations   |
|  +----+-----+  +----+-----+  +----+-----+               |
|       |              |             |                     |
|  Phase 2: POSTER SESSION                                 |
|  +----------------------------------------------+       |
|  | Session Chair collects all logs,             |       |
|  | surfaces key findings & deltas               |       |
|  +----------------------+-----------------------+       |
|                         |                               |
|  Phase 3: PEER REVIEW (adversarial)                     |
|  +----------------------------------------------+       |
|  | Reviewer agent challenges claims:            |       |
|  | - "Did metric actually improve?"             |       |
|  | - "Is this overfitting?"                     |       |
|  | - "Could this be measurement noise?"         |       |
|  +----------------------+-----------------------+       |
|                         |                               |
|  Phase 4: KNOWLEDGE TRANSFER                            |
|  +----------------------------------------------+       |
|  | Validated findings shared back to            |       |
|  | all researchers for next round               |       |
|  +----------------------------------------------+       |
|                                                          |
+----------------------------------------------------------+
          |
          v  Convergence check -> next round or final synthesis
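
The four phases in the diagram can be sketched as a single function. This is a toy model under stated assumptions, not how the skill is implemented: in a real conference each phase is carried out by an LLM agent, whereas here researchers and the reviewer are plain Python callables, and `run_round`, `Finding`, and `Researcher` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical shapes: a finding is a dict such as {"claim": "...", "metric": 41.0};
# a researcher takes the shared knowledge and returns one finding.
Finding = Dict[str, object]
Researcher = Callable[[List[Finding]], Finding]


def run_round(
    researchers: Dict[str, Researcher],
    shared_knowledge: List[Finding],
    review: Callable[[Finding], bool],
) -> List[Finding]:
    """Run one conference round and return the updated shared knowledge."""
    # Phase 1: independent research, one worker per researcher.
    with ThreadPoolExecutor(max_workers=max(1, len(researchers))) as pool:
        futures = {
            name: pool.submit(fn, list(shared_knowledge))
            for name, fn in researchers.items()
        }
        findings = [dict(f.result(), researcher=name) for name, f in futures.items()]
    # Phase 2: poster session -- gather every finding into one ordered summary.
    poster = sorted(findings, key=lambda f: str(f["researcher"]))
    # Phase 3: peer review -- keep only the claims the reviewer accepts.
    validated = [f for f in poster if review(f)]
    # Phase 4: knowledge transfer -- validated findings feed the next round.
    return shared_knowledge + validated


demo = {"A": lambda known: {"metric": 40.0}, "B": lambda known: {"metric": 55.0}}
print(run_round(demo, [], review=lambda f: f["metric"] < 50))
```

Calling `run_round` repeatedly, with a convergence check between calls, mirrors the "next round or final synthesis" arrow at the bottom of the diagram.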

The `conference.md` Format

Section	Purpose
`Goal`	What the conference should achieve
`Mode`	`metric` (numeric optimization) or `qualitative` (reasoning quality)
`Success Metric`	Metric name, target, direction (metric mode only)
`Success Criteria`	Natural language description of "good" (qualitative mode only)
`Researchers`	Count, iterations per round, max rounds
`Search Space`	What researchers can and cannot modify
`Search Space Partitioning`	`assigned` (each researcher has a focus) or `free` (overlap allowed)
`Constraints`	Max iterations, time budget, researcher timeout
`Current Approach`	Baseline description
`Shared Knowledge`	Auto-populated after each round with validated findings
`Conference Log`	Auto-maintained round-by-round history

See assets/conference_template.md for the full template.

Agent Roles

Role	Model	Count	Responsibility
Conference Chair	Sonnet	1	Orchestrator -- manages rounds, spawns researchers, detects convergence, triggers synthesis
Researcher	Sonnet	N	Runs the autoresearch 5-stage loop within assigned search space
Session Chair	Haiku	1	Lightweight summarizer -- collects logs and produces poster session summary after each round
Reviewer	Opus	1	Adversarial critic -- challenges claims, checks for overfitting/noise, assigns verdicts
Synthesizer	Opus	1	Runs once at end -- combines complementary insights from all researchers

Templates

Ready-to-use conference.md configs for common tasks:

Template	Mode	Use Case
`templates/quick-conference.md`	metric	2 researchers, 2 rounds -- test if your problem benefits from the conference format
`templates/prompt-optimization.md`	metric	Optimize LLM prompt accuracy with 3 specialized researchers
`templates/code-performance.md`	metric	Optimize code speed with algorithmic, data-structure, and low-level researchers
`templates/research-synthesis.md`	qualitative	Literature exploration across foundational, recent, and cross-domain angles
`templates/debate-mode.md`	qualitative	2-researcher adversarial debate with structured rounds and Opus judge
`templates/survey-mode.md`	qualitative	Multi-database literature survey with citation chain tracking

Configuration Options

Field	Default	Description
`mode`	`metric`	`metric` or `qualitative`
`count`	--	Number of researcher agents
`iterations_per_round`	5	Autoresearch iterations each researcher runs per round
`max_rounds`	4	Maximum conference rounds before forced synthesis
`max_total_iterations`	--	Hard cap across all researchers and rounds
`time_budget`	--	Wall-clock limit for the entire conference
`researcher_timeout`	--	Per-researcher timeout per round
`strategy`	`free`	`assigned` (focus areas) or `free` (open exploration)
`guard`	--	Safety constraint enforced on ALL researchers (violations revert regardless of metric)
`noise_runs`	1	Repeated evaluations to average for noise reduction
`min_consensus_delta`	0	Minimum average improvement across kept researchers to advance
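
A few of these options interact, so a small sketch may help. It illustrates one plausible reading of `noise_runs`, `guard`, and `min_consensus_delta` as described in the table; the function names `evaluate`, `keep_change`, and `round_advances` are hypothetical, and the choice of `>=` for the consensus threshold is an assumption rather than documented behavior.

```python
from statistics import mean
from typing import Callable, List


def evaluate(run_once: Callable[[], float], noise_runs: int = 1) -> float:
    """Average `noise_runs` repeated evaluations to damp measurement noise."""
    return mean(run_once() for _ in range(noise_runs))


def keep_change(old: float, new: float, direction: str = "minimize",
                guard_ok: bool = True) -> bool:
    """Keep a change only if the metric improved and the guard was not violated."""
    if not guard_ok:  # guard violations revert regardless of metric
        return False
    return new < old if direction == "minimize" else new > old


def round_advances(improvements: List[float], min_consensus_delta: float = 0.0) -> bool:
    """True if the average improvement across kept researchers meets the threshold."""
    # Assumption: the threshold is inclusive (>=); the table does not say.
    return bool(improvements) and mean(improvements) >= min_consensus_delta


samples = iter([10.0, 12.0, 14.0])
print(evaluate(lambda: next(samples), noise_runs=3))
print(keep_change(50.0, 45.0, guard_ok=False))
```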

Output Files

File	Description
`conference.md`	User config (updated with log entries each round)
`conference_results.tsv`	Master conference-level TSV with all iterations and peer review verdicts
`researcher_A_log.md`	Detailed per-researcher iteration log
`researcher_A_results.tsv`	Per-researcher TSV (same format as autoresearch)
`poster_session_round_N.md`	Session Chair summary for each round
`peer_review_round_N.md`	Reviewer verdicts for each round
`synthesis.md`	Final synthesized output from Synthesizer
`final_report.md`	Executive summary with full conference history
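
Because the results files are plain TSV, they can be inspected with the Python standard library. The sketch below picks the best row from a TSV by a numeric column. The column layout of `conference_results.tsv` is not documented here, so the column names in the example (`researcher`, `metric`) are placeholders, and `best_row` is a hypothetical helper, not something shipped with the skill.

```python
import csv
from typing import Dict, Iterable, Optional


def best_row(lines: Iterable[str], metric_col: str,
             direction: str = "minimize") -> Optional[Dict[str, str]]:
    """Return the TSV row with the best value in `metric_col`, or None if empty."""
    # Skip rows where the metric cell is missing or blank.
    rows = [r for r in csv.DictReader(lines, delimiter="\t") if r.get(metric_col)]
    if not rows:
        return None
    pick = min if direction == "minimize" else max
    return pick(rows, key=lambda r: float(r[metric_col]))


# Placeholder data; real column names may differ.
sample = ["researcher\tmetric\n", "A\t41.5\n", "B\t38.2\n"]
print(best_row(sample, "metric"))
```

With a real file, pass an open handle instead of the sample list, for example `with open("conference_results.tsv", newline="") as fh: best_row(fh, "<your metric column>")`.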

Overnight Runs

To run a conference overnight, use the universal loop script:

# Option A: Foreground (simplest)
bash scripts/autoconference-loop.sh ./my-conference/

# Option B: Background with nohup (no tmux needed)
nohup bash scripts/autoconference-loop.sh ./my-conference/ > conference.log 2>&1 &

# Option C: Background with tmux (best experience)
tmux new-session -d -s conference 'bash scripts/autoconference-loop.sh ./my-conference/'

# Check progress anytime
bash scripts/check_conference.sh ./my-conference/

The script auto-detects your CLI tool, handles round restarts, and checks for conference completion. Works with Claude Code, Codex CLI, OpenCode, and Gemini CLI.

Relationship to autoresearch-skill

Each researcher in a conference runs the autoresearch loop -- the same autonomous experiment-evaluate-iterate cycle from autoresearch-skill. Autoconference adds three layers on top:

Multi-agent orchestration -- N researchers explore different parts of the search space in parallel
Adversarial peer review -- A Reviewer agent challenges findings each round (catches what self-evaluation misses)
Synthesis -- A Synthesizer combines complementary insights rather than just picking the best result

Use autoresearch-skill for a single focused research loop. Use autoconference when your search space is large enough to partition, when diversity of approach matters, or when you want external validation of results.

Guide

Comprehensive documentation is available in the guide/ directory:

Guide	Topic
Getting Started	60-second quickstart + domain cheat sheet
Core Conference	4-phase round structure, convergence, overnight runs
Plan	8-step setup wizard
Resume	Checkpoint recovery
Analyze	Post-conference insight extraction
Debate	Adversarial 2-researcher format
Survey	Multi-database literature survey
Ship	Conference results to publication
Chains	Command chaining patterns
Advanced	Guards, noise, worktrees, CI/CD
Troubleshooting	Common failure modes and fixes

See also: COMPARISON.md for autoconference vs alternatives.

Cross-Platform Compatibility

Platform	Status	Install
Claude Code	Ready	Plugin install or manual symlink
Codex CLI	Ready	See `.codex/INSTALL.md`
OpenCode	Ready	See `.opencode/INSTALL.md`
Gemini CLI	Ready	Uses `gemini-extension.json` auto-discovery

Requirements

Requirement	Details
Python	3.8+ (standard library only; needed just for `scripts/init_conference.py`)
LLM CLI	Any CLI with subagent support (Claude Code, Codex CLI, OpenCode, Gemini CLI)
autoresearch-skill	Referenced by each researcher agent's prompt

Contributing

See CONTRIBUTING.md for detailed guidelines.

Fork the repository
Create a feature branch (git checkout -b feature/your-feature)
Commit your changes (git commit -m 'Add your feature')
Push to the branch (git push origin feature/your-feature)
Open a pull request

License

MIT -- see LICENSE for details.
