Arena

                         █████╗ ██████╗ ███████╗███╗   ██╗ █████╗
                        ██╔══██╗██╔══██╗██╔════╝████╗  ██║██╔══██╗
                        ███████║██████╔╝█████╗  ██╔██╗ ██║███████║
                        ██╔══██║██╔══██╗██╔══╝  ██║╚██╗██║██╔══██║
                        ██║  ██║██║  ██║███████╗██║ ╚████║██║  ██║
                        ╚═╝  ╚═╝╚═╝  ╚═╝╚══════╝╚═╝  ╚═══╝╚═╝  ╚═╝

A position-driven adversarial arena for AI agents. The host supplies context and two or more opposing positions; Arena dispatches local CLI models (Claude, Codex, Gemini, OpenAI, Kimi) to argue each position over multiple rounds and returns the transcript.

A standalone CLI — invoke it from your shell, scripts, or any agent that can run shell commands.

Mental model

Host doesn't fight. The caller (Claude Code, Codex CLI, or a script) only supplies the context to argue over and the positions to argue.
Position is the unit, not the model. Adversarial value comes from clashing stances, not from "which model wins". Same model with two different system prompts is a valid pair if no other CLI is available.
Arena owns model dispatch. It picks distinct models when multiple CLIs are healthy and falls back to reusing one model otherwise.

Subcommands

Subcommand	Purpose
`arena challenge`	Core. Run N positions over R rounds against the supplied context.
`arena review`	Code-review preset over `arena challenge`. Spawns attacker positions (default: bug-hunter + security-auditor) on the supplied code/diff.
`arena health`	List agent CLIs and their availability.
`arena mcp`	Start arena as a stdio MCP server — exposes each scenario as a tool callable from any MCP client.

Install

# Required: at least one of these CLIs in $PATH
npm install -g @anthropic-ai/claude-cli   # for "claude"
npm install -g @codex-ai/cli              # for "codex" / "openai" / "gemini"
uv tool install kimi-cli                  # for "kimi" (or: pipx install kimi-cli)

Shell (no npm/node required)

Downloads a self-contained native binary from the latest GitHub release. Supports macOS (arm64/x64) and Linux (arm64/x64).

curl -fsSL https://raw.githubusercontent.com/tim101010101/arena/main/install.sh | bash

Installs to ~/.local/bin/arena. Override the directory with ARENA_INSTALL_DIR, or pin a version with ARENA_VERSION:

ARENA_INSTALL_DIR=/usr/local/bin ARENA_VERSION=v0.1.3 \
  curl -fsSL https://raw.githubusercontent.com/tim101010101/arena/main/install.sh | bash

npm

npm install -g arena-mcp        # or: npx arena-mcp

CLI usage

# Adversarial debate — supply your own positions
arena challenge \
  --context "Should we use microservices or a monolith for a 10k-user product with 5 devs?" \
  --position "Pro-microservices: team boundaries justify the split" \
  --position "Pro-monolith: a 5-person team should not carry the ops burden" \
  --rounds 3

# Adversarial code review (positions auto-derived from --focus)
arena review --git-ref feature/auth --focus bugs,security

arena review --files src/login.ts,src/session.ts --focus security

# Override which models to use (must already be healthy)
arena challenge --context "..." --position a --position b --models claude,codex

# Diagnostics
arena health
arena --version
arena --help
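
For callers that drive arena from a script rather than a shell, the `arena challenge` invocation above can be assembled programmatically. The following is a minimal sketch, not part of arena: `ChallengeOptions` and `buildChallengeArgs` are hypothetical names, and the only flags used are the ones shown in the examples above (`--context`, `--position`, `--rounds`, `--models`).

```typescript
// Hypothetical helper, not shipped with arena. It only assembles the flags
// that appear in the CLI examples: --context, --position, --rounds, --models.
interface ChallengeOptions {
  context: string;
  positions: string[];
  rounds?: number;
  models?: string[];
}

function buildChallengeArgs(opts: ChallengeOptions): string[] {
  // The README describes a challenge as taking two or more positions.
  if (opts.positions.length < 2) {
    throw new Error("arena challenge needs at least two positions");
  }
  const args: string[] = ["challenge", "--context", opts.context];
  for (const position of opts.positions) {
    // One --position flag per stance, in the order given.
    args.push("--position", position);
  }
  if (opts.rounds !== undefined) {
    args.push("--rounds", String(opts.rounds));
  }
  if (opts.models !== undefined && opts.models.length > 0) {
    // --models takes a comma-separated list, as in "--models claude,codex".
    args.push("--models", opts.models.join(","));
  }
  return args;
}

const exampleArgs = buildChallengeArgs({
  context: "Monolith or microservices for a 5-person team?",
  positions: ["Pro-microservices", "Pro-monolith"],
  rounds: 3,
});
```

The resulting array can be passed as the argument list to whatever process-spawning API the host uses, with `arena` as the command. Passing arguments as an array rather than a concatenated string avoids shell-quoting problems with long position text.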

MCP server

arena mcp starts a stdio MCP server. Each loaded scenario (challenge, review, and any user-defined ones) is exposed as an MCP tool; a health tool is also included.

Add it to your MCP client config (e.g. Claude Desktop or Claude Code .mcp.json):

{
  "mcpServers": {
    "arena": {
      "command": "arena",
      "args": ["mcp"]
    }
  }
}

Once connected, your AI client can call:

challenge — supply context (string) and positions (array of ≥2 strings); optional rounds and models.
review — supply sources (array of source objects: raw, git_ref, git_range, file_list, or patch_file); optional focus, rounds, and models.
health — returns availability of all local agent CLIs.
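
The `challenge` tool input described above can be checked on the client side before the call is made. This is a hedged sketch: the field names (`context`, `positions`, `rounds`, `models`) come from the list above, but `ChallengeInput` and `parseChallengeInput` are hypothetical names, and every rule other than "positions has at least 2 strings" (for example, that `rounds` is a positive integer) is an assumption rather than arena's documented schema.

```typescript
// Shape taken from the tool description; not copied from arena's source.
interface ChallengeInput {
  context: string;
  positions: string[];
  rounds?: number;
  models?: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

function parseChallengeInput(raw: unknown): ChallengeInput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("input must be an object");
  }
  const { context, positions, rounds, models } = raw as Record<string, unknown>;
  if (typeof context !== "string") {
    throw new Error("context must be a string");
  }
  if (!isStringArray(positions) || positions.length < 2) {
    throw new Error("positions must be an array of at least 2 strings");
  }
  const result: ChallengeInput = { context, positions };
  if (rounds !== undefined) {
    // Assumption: rounds is a positive integer.
    if (typeof rounds !== "number" || !Number.isInteger(rounds) || rounds < 1) {
      throw new Error("rounds must be a positive integer");
    }
    result.rounds = rounds;
  }
  if (models !== undefined) {
    if (!isStringArray(models)) {
      throw new Error("models must be an array of strings");
    }
    result.models = models;
  }
  return result;
}
```

The `review` tool is left out of the sketch because the fields inside each source object are not spelled out here.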

Configuration (env vars)

Variable	Default	Notes
`ARENA_TIMEOUT_MS`	`120000`	Per-fighter execution timeout
`ARENA_DEFAULT_ROUNDS`	`3`	Default rounds when not specified
`ARENA_DEFAULT_MODE`	`parallel`	Reserved (challenge runs sequentially)
`ARENA_MAX_CONTEXT_SIZE`	`1000000`	Max bytes from `sources`
`ARENA_CLAUDE_MODEL` / `ARENA_CODEX_MODEL` / `ARENA_GEMINI_MODEL` / `ARENA_OPENAI_MODEL` / `ARENA_KIMI_MODEL`	CLI default	Per-adapter model override
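
To illustrate how the numeric variables and defaults in the table fit together, here is a minimal sketch of a config loader. The variable names and default values are taken from the table; `ArenaConfig`, `readPositiveInt`, and `loadConfig` are hypothetical names, and silently falling back to the default on a malformed value is an assumption (arena itself may reject such values instead).

```typescript
interface ArenaConfig {
  timeoutMs: number;
  defaultRounds: number;
  maxContextSize: number;
}

type Env = Record<string, string | undefined>;

function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const parsed = Number(raw);
  // Assumption: anything that is not a positive integer falls back to the default.
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfig(env: Env): ArenaConfig {
  return {
    timeoutMs: readPositiveInt(env, "ARENA_TIMEOUT_MS", 120000),
    defaultRounds: readPositiveInt(env, "ARENA_DEFAULT_ROUNDS", 3),
    maxContextSize: readPositiveInt(env, "ARENA_MAX_CONTEXT_SIZE", 1000000),
  };
}
```

In a Node or Bun script, `loadConfig(process.env)` would read the real environment; taking the environment as a parameter keeps the function easy to test.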

Dispatch behavior

positions = ["A", "B"]
available = healthCheckAll().filter(ok)
override  = caller-supplied --models / models[]

pool = override ?? available
fighter[i].model = pool[i % pool.length]

Prefers distinct models when len(positions) ≤ len(pool).
Cycles when positions outnumber the pool — same model, different prompts.
Each fighter gets a unique id (<model>#<i>) so transcripts stay disambiguated.
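
The dispatch rule above can be written out as a short runnable sketch. This is an illustration of the `pool[i % pool.length]` rule, not arena's actual source: `Fighter` and `assignFighters` are hypothetical names, the zero-based index in the id is an assumption, and so is throwing when the pool is empty.

```typescript
interface Fighter {
  id: string;
  model: string;
  position: string;
}

function assignFighters(
  positions: string[],
  available: string[],
  override?: string[],
): Fighter[] {
  // Caller-supplied models win over the health-checked list.
  const pool = override ?? available;
  if (pool.length === 0) {
    // Assumption: with no usable model there is nothing to dispatch.
    throw new Error("no healthy agent CLI available");
  }
  return positions.map((position, i) => {
    // Round-robin over the pool: distinct models while positions fit,
    // then the same models again with different prompts.
    const model = pool[i % pool.length]!;
    return { id: `${model}#${i}`, model, position };
  });
}

const fighters = assignFighters(["A", "B", "C"], ["claude", "codex"]);
// ids: "claude#0", "codex#1", "claude#2"
```

With three positions and two healthy models, the third position cycles back to the first model, and the index suffix keeps the two `claude` fighters distinguishable in the transcript.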

Development

bun install
bun test          # full suite
bun run build     # produces dist/index.js

License

MIT

