model-raising-chat

Internal chat playground for testing unreleased "model-raising" (novel safety-pretrained) checkpoints on a single A100. Includes a one-click Claude-Code (Opus) auditor that probes each model with ~100 questions across 10 categories — plus two specialized subagents: canaries (checks for emergent surfacing of training-time canary values) and persona_trigger (EPE-only; checks whether <assistant> / charter-section openers push the model into its reflection persona) — and writes a short summary of weird/safer/broken behaviors.

Stack

vLLM — one OpenAI-compatible subprocess per model on a fixed port. Only one model loaded at a time on the GPU; switches are serialized on an in-flight counter.
NiceGUI — dashboard with per-model chat, audit launcher, and add-model form.
claude-agent-sdk — drives an Opus auditor with category subagents and an in-process MCP ask_model tool that maintains true multi-turn conversations with the model under test.
OmegaConf + per-file YAML — model registry; new models are added from the dashboard and persisted to conf/models/*.yaml.
pyngrok — public URL on startup, gated by ngrok's built-in OAuth.

Quickstart

# 1. Install
uv sync   # or: pip install -e .

# 2. Authenticate Claude Code with your Max subscription (one-time)
claude login

# 3. Create a .env file in the repo root (auto-loaded by scripts/start.sh)
echo 'NGROK_AUTHTOKEN=...' > .env       # optional — without it, no public tunnel
# echo 'DASHBOARD_STORAGE_SECRET=...' >> .env  # optional — stable NiceGUI storage key

# 4. Start the dashboard (also opens an ngrok tunnel if the token is set)
bash scripts/start.sh

The startup script prints the public URL. Open it, click Load on a model, then Chat or Audit.

Adding models

Either drop a YAML file into conf/models/ (see baseline_safelm_1p7b.yaml for the schema) or use the Add Model form in the dashboard footer.

Layout

conf/             # config.yaml + per-model YAMLs
data/             # audit_questions.yaml + canaries.yaml + audits/{id}.json summaries
logs/             # vllm.log + per-audit Claude logs
src/model_raising_chat/
  config.py       # Pydantic schemas, registry I/O
  supervisor.py   # vLLM subprocess lifecycle
  audit.py        # Claude-Code audit runner + MCP ask_model tool
  tokenize.py     # local tokenizer that preserves registered special tokens
  state.py        # module-level singletons
  dashboard/
    app.py        # NiceGUI entry-point + ngrok
    layout.py     # shared page frame
    theme.py      # dashboard theming
    pages/        # home, chat, audit
scripts/start.sh

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
conf		conf
data		data
scripts		scripts
src/model_raising_chat		src/model_raising_chat
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

model-raising-chat

Stack

Quickstart

Adding models

Layout

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

model-raising-chat

Stack

Quickstart

Adding models

Layout

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages