ianreppel · ianreppel · Apr 21, 2026 · Apr 21, 2026 · Apr 21, 2026 · Apr 21, 2026
@@ -5,22 +5,30 @@ alwaysApply: true
 
 # Architecture
 
-- `domain.py` holds all data structures (not `models.py`, which is ambiguous in an LLM project)
-- `openrouter.py` is the HTTP client layer (separate from orchestration)
-- `simulation.py` holds all dry-run fakes (LLM responses and search results)
-- `prompts.py` contains prompt builders (pure functions) and output parsers (review verdict, synthesis decision)
-- `compression.py` handles both low-level text compression and prompt fitting within token budgets
-- `progress.py` defines the `ProgressCallback` protocol and `NoOpProgress` fallback
-- `reviewers.py` holds the reviewer-to-candidate assignment algorithm
-- `exclamations.py` holds the Simpsons-quote prefix helper used to dress up error messages
-- Cross-group model overlap is allowed: a model may generate and review in the same round
-- Compression priority: candidates first, then reviews, then context; the task instruction is never compressed
-
-## Dependency direction
+## One rule
 
 - `cli.py` depends on `core/` and `ui/`; nothing in `core/` or `ui/` imports from `cli.py`
-- `ui/tui.py` depends on `core/progress.py` (protocol) and `core/domain.py` (types); it never imports from `core/orchestrator.py`
-- Within `core/`, `orchestrator.py` is the only module that imports from `openrouter.py`, `compression.py`, and `reviewers.py`; `search.py` is imported by `orchestrator.py` only but may itself import `simulation.py` for its dry-run path
-- `prompts.py` depends only on `domain.py`; it has no runtime dependencies on other core modules
-- `exclamations.py` is a leaf module (stdlib-only); it may be imported from anywhere in `core/`
-- `config.py` loads `crossfire.toml` with CLI override precedence: CLI flags > TOML values > defaults; per-mode overrides go in `[modes.<mode>.*]`, missing roles fall back to global `[models.*]` defaults
+
+Within `core/`, modules import freely from each other as needed.
+
+## Module responsibilities
+
+- `domain.py` — all data structures (not `models.py`, which is ambiguous in an LLM project)
+- `orchestrator.py` — the generate → review → synthesize loop, concurrency, failure handling
+- `openrouter.py` — OpenRouter HTTP client, retry logic, model ID utilities
+- `prompts.py` — prompt builders (pure functions) and output parsers (review verdict, synthesis decision)
+- `compression.py` — extractive text compression and prompt fitting within token budgets
+- `config.py` — TOML loading with CLI override precedence: CLI flags > TOML values > defaults; per-mode overrides in `[modes.<mode>.*]`
+- `pricing.py` — OpenRouter pricing cache (`pricing.json`) and cost estimation for dry runs
+- `simulation.py` — deterministic fakes for dry-run mode (LLM responses and search results)
+- `search.py` — Tavily web search integration
+- `reviewers.py` — reviewer-to-candidate assignment algorithm
+- `tokens.py` — tiktoken-based token estimation
+- `progress.py` — `ProgressCallback` protocol and `NoOpProgress` fallback
+- `exclamations.py` — Simpsons-quote prefix helper for error messages
+- `archive.py` — disk archival of run artifacts
+
+## Design decisions
+
+- Cross-group model overlap is allowed: a model may generate and review in the same round
+- Compression priority: candidates first, then reviews, then context; the task instruction is never compressed
@@ -0,0 +1,17 @@
+---
+description: Plan validation — end-to-end trace, contradiction check, edge cases
+alwaysApply: true
+---
+
+# Planning
+
+Before finalizing any plan that touches 3+ files:
+
+- Trace every data flow end-to-end through the actual code (not the abstraction): verify function signatures, call sites, and who calls whom
+- Check that the plan is internally consistent: no section may contradict another
+- Identify every existing function, configuration object, or data structure the new code depends on and verify it provides what the plan assumes (e.g. which configuration variant: base vs mode-resolved?)
+- For every external input (files, API responses, environment variables), list what happens when it is missing, empty, malformed, or the wrong type
+- Flag every assumption about the runtime context: sync vs async, working directory, which process or event loop, which thread
+- Separate input estimates from output estimates: never use the same ceiling for both unless explicitly justified
+- After drafting the plan, re-read it as a hostile reviewer looking for gaps, contradictions, and unstated assumptions; fix before presenting
+- After implementation, verify the code matches the plan: trace actual variable values through each formula and code path, checking that upstream mutations (e.g. enrichment rewriting an instruction) are reflected in every downstream use of that variable
@@ -36,7 +36,7 @@
 ### Install

 ```bash
 uv python install 3.12         # transitive incompatibility in docformatter->untokenize
 uv venv --python 3.12
 uv sync
 ```
@@ -96,6 +96,16 @@
   --context-file paper.pdf
 ```
 
+### Cost estimation
+Cost estimation in ``--dry-run`` requires current model prices from OpenRouter.
+These can be grabbed and stored in `pricing.json` with the following command:
+
+```bash
+uv run crossfire prices
+```
+
+Since it fetches pricing on _all_ OpenRouter models, we can add moves to `crossfire.toml` without re-fetching.
+
 ### Clean up
 Remove all generated and cached files (runs, `.venv`, caches, bytecode):
 
@@ -152,7 +162,7 @@
 If all reviews report no weaknesses, the remaining rounds are skipped, as there is no value in further refinement.
 Disable with `--no-early-stop`.
 
-### Generator refusal
+### Generator refusal detection
 Some models occasionally refuse to produce output, responding with meta-commentary like "the sources are insufficient" instead of answering the prompt.
 Crossfire detects these refusals and attempts a replacement model from the generator pool.
 If no replacement is available, the generator is dropped and the round fails gracefully.
@@ -214,7 +224,7 @@
 ## Development

 ```bash
 uv python install 3.12         # transitive incompatibility in docformatter->untokenize
 uv venv --python 3.12
 uv sync
 uv run pre-commit install      # set up git hooks (recommended)
@@ -228,6 +238,7 @@
 
 Use `--dry-run` to verify your changes without making API calls.
 It produces deterministic synthetic outputs via SHA-256 hashing.
+If `pricing.json` is present (from `crossfire prices`), the summary table includes an upper-bound cost estimate.
 
 ```bash
 uv run crossfire run \
@@ -267,6 +278,7 @@
 │   │   ├── progress.py       # progress reporting
 │   │   ├── reviewers.py      # reviewer-to-candidate assignment
 │   │   ├── search.py         # search integration with Tavily
+│   │   ├── pricing.py        # OpenRouter pricing cache and cost estimation
 │   │   ├── exclamations.py   # The Simpsons prefixes for error messages
 │   │   └── archive.py        # disk archival
 │   ├── ui/
@@ -296,6 +308,11 @@
 When prompts exceed the token budget, Crossfire drops sections and sentences rather than summarizing.
 The task instruction is _never_ compressed.
 
+**Cost estimates are approximate.**
+The dry-run estimate uses fixed output-token defaults (~5,000 tokens per generator/synthesizer call, ~2,000 per reviewer) and average pricing across each model group.
+It does not predict early stopping or actual output lengths, so it typically overestimates by 2-4x for runs that stop early.
+If the instruction contains an explicit word or page count, the estimate uses that instead, but the regex may also match counts that describe the input rather than the desired output.
+
 **No streaming.**
 Responses are received in full.
 
@@ -305,6 +322,5 @@
 ### Ideas for improvements
 
 - Resume: restart interrupted runs from the last completed round
-- Cost prediction: estimate total run costs upfront from prompt sizes and model pricing, reported as part of `--dry-run` output
 - Local UI: browser-based interface with live progress, output panel, and searchable run history
 - Library and containerization: extract a reusable library and Docker image so Crossfire can run as a service (switch to UTC-based logs)