
Phase 7 docs #576

Open
DanielKim03 wants to merge 25 commits into 666ghj:main from DanielKim03:phase-7-docs

Conversation

@DanielKim03

No description provided.

MiroFish Migration and others added 23 commits April 23, 2026 15:50
Introduces backend/app/llm/ with an abstract LLMBackend interface and three
concrete implementations: Ollama (local), OpenAI-SDK-compat (OpenAI, Anthropic,
Together, DeepInfra, Groq, Fireworks), and vLLM (thin specialization of
openai_compat). ModelRouter resolves task roles (fast/balanced/heavy/embed) to
backends, wraps every call with exponential-backoff retry + configurable
fallback chain, and persists per-call token/latency/cost to a SQLite
llm_calls table for the acceptance-check cache-hit-rate metric.

Also defers flask imports in app/__init__.py so `import app.llm` works in
unit tests without installing the full Flask stack.
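The retry-plus-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration of the shape, not the actual ModelRouter API; `call_with_fallback` and the `retryable` flag are hypothetical names.

```python
import time

class BackendError(Exception):
    """Error shape carrying a retryable flag (illustrative)."""
    def __init__(self, msg, retryable=True):
        super().__init__(msg)
        self.retryable = retryable

def call_with_fallback(backends, prompt, max_retries=3, base_delay=0.5):
    """Try each backend in the fallback chain in order; retry retryable
    errors with exponential backoff before moving on to the next one."""
    last_err = None
    for backend in backends:
        for attempt in range(max_retries):
            try:
                return backend(prompt)
            except BackendError as err:
                last_err = err
                if not err.retryable:
                    break  # non-retryable: skip straight to the next backend
                time.sleep(base_delay * (2 ** attempt))
    raise last_err
```

The key property the router tests exercise ("retry-then-succeed", "non-retryable skip", "fallback chain") falls out of the two nested loops.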

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLMClient becomes a thin back-compat shim over ModelRouter.default().
Its public methods (chat, chat_json) keep the same signatures, so every
existing caller continues to work. Adds chat_raw() for callers that need
token counts or finish_reason (used by profile / sim-config generators to
detect truncation).

Migrates the three duplicated OpenAI(...) instantiation sites:
  - oasis_profile_generator.py: direct client -> LLMClient(role=BALANCED)
  - simulation_config_generator.py: direct client -> LLMClient(role=BALANCED)
  - utils/llm_client.py: openai.OpenAI -> router delegate

Assigns task-appropriate roles at the remaining callers:
  - ontology_generator, zep_tools -> fast
  - report_agent -> heavy

Prompts are unchanged; only the transport moves. cache_key hints are
added at the two migrated sites so Anthropic / OpenAI prompt caching
actually kicks in on the stable system-prompt prefix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds BACKEND_MODE (local|cloud|custom), LLM_ROLE_<role>_BACKEND/MODEL/...,
VLLM_DRAFT_MODEL / VLLM_SPECULATIVE_TOKENS, LLM_MAX_RETRIES, LLM_CALLS_DB,
LLM_PRICING_JSON. Back-compat: legacy LLM_API_KEY/LLM_BASE_URL/LLM_MODEL_NAME
remain the default for any cloud role whose per-role keys are unset. .env.example
documents the full shape with commented Anthropic-Haiku/Opus example.
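The per-role-overrides-legacy-defaults resolution can be sketched like this; `resolve_role` and the default backend value are illustrative, only the env var names come from the commit.

```python
import os

def resolve_role(role, env=None):
    """Resolve config for one task role. Per-role keys
    (LLM_ROLE_<ROLE>_MODEL etc.) win; legacy flat keys
    (LLM_API_KEY / LLM_BASE_URL / LLM_MODEL_NAME) remain the
    default when the per-role keys are unset."""
    env = env if env is not None else os.environ
    prefix = f"LLM_ROLE_{role.upper()}_"
    return {
        "backend": env.get(prefix + "BACKEND", "openai_compat"),
        "model": env.get(prefix + "MODEL") or env.get("LLM_MODEL_NAME"),
        "api_key": env.get(prefix + "API_KEY") or env.get("LLM_API_KEY"),
        "base_url": env.get(prefix + "BASE_URL") or env.get("LLM_BASE_URL"),
    }
```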

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage per module:
  base.py           5 tests -- complete/stream defaults, BackendError shape
  openai_compat.py  7 tests -- usage parsing, cache_key routing by provider,
                               Anthropic cache_control tagging, error classification
  ollama.py         5 tests -- token counts, JSON mode, 5xx retryable,
                               network wrap, per-text embed loop
  vllm.py           2 tests -- extra_body forwarding, cache_key suppression
  accounting.py     7 tests -- cost table, cached-rate math, SQLite round-trip,
                               cache-hit-rate aggregation
  router.py         7 tests -- happy path, retry-then-succeed, non-retryable
                               skip, fallback chain, accounting wiring,
                               missing-role error, embed dispatch

conftest.py redirects LLM_CALLS_DB to a tmp path per test and resets the
ModelRouter default singleton between tests to keep runs hermetic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds backend/app/memory/ with:
  base.py          -- MemoryBackend abstract interface; Observation/Reflection
                      dataclasses; Namespace helpers (agent:<sim>:<id> /
                      public:<sim>:timeline); base cosine + recency + importance
                      scoring helpers shared across backends
  in_memory.py     -- dict-backed reference implementation (tests + minimal
                      local runs)
  zep_cloud.py     -- adapter around the existing Zep graph.add / graph.search
                      pipeline; stores observations as marker-prefixed episodes
                      so they can be parsed back on read
  neo4j_local.py   -- self-hosted Neo4j 5.x via bolt:// with a clean Cypher
                      schema (Observation/Reflection/Namespace nodes + IN /
                      DERIVED_FROM / CONTRADICTS edges); cosine sim computed
                      client-side for portability, with a commented upgrade
                      path to native vector indexes
  neo4j_aura.py    -- managed AuraDB subclass, warns on non-TLS URIs
  hierarchical.py  -- ImportanceScorer (fast LLM, 1-10, fallback 5),
                      ReflectionScheduler (every N rounds, top-K by importance,
                      balanced LLM, 3-5 beliefs with source pointers),
                      ContradictionDetector (fast LLM binary, top-3 neighbors,
                      writes conflict_edge on sentiment flip)
  router.py        -- MemoryRouter picks backend from MEMORY_BACKEND env with
                      auto-heuristic (NEO4J_URI > ZEP_API_KEY > in_memory)
  manager.py       -- MemoryManager wraps a backend + the three hierarchical
                      services. Enforces per-agent namespace isolation: public
                      posts get mirrored to public:<sim>:timeline, private
                      observations stay in agent:<sim>:<id>, cross-agent reads
                      never traverse another agent's private partition.
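The shared cosine + recency + importance scoring in base.py presumably combines three terms weighted by the MEMORY_ALPHA/BETA/GAMMA knobs; a sketch under that assumption (the exponential recency decay and the /10 importance normalization are guesses, not confirmed by the source):

```python
import math

def cosine(a, b):
    """Plain client-side cosine similarity (portable across backends)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def combined_score(query_vec, rec_vec, age_seconds, importance,
                   alpha=1.0, beta=1.0, gamma=1.0, half_life=3600.0):
    """alpha*similarity + beta*recency + gamma*importance.
    Recency decays with age; importance is the 1-10 scorer output
    normalized to [0, 1]. Decay curve and half-life are assumptions."""
    recency = 0.5 ** (age_seconds / half_life)
    return (alpha * cosine(query_vec, rec_vec)
            + beta * recency
            + gamma * (importance / 10.0))
```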

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Integration points:
  * ZepGraphMemoryManager grows .get_memory_manager(sim_id) which lazily
    instantiates a MemoryManager per simulation. The existing Zep batch
    updater keeps running for document-seeded graph enrichment; every
    add_activity() now also mirrors the activity into the MemoryManager
    so Phase-2 features (importance scoring, reflection, contradiction,
    retrieval) light up without touching simulation_runner.py.
  * CREATE_POST / QUOTE_POST / REPOST / CREATE_COMMENT are mirrored to
    the public:<sim>:timeline namespace so peer agents can see them via
    retrieve_for_agent(include_public=True). Non-public actions stay
    private. stop_updater() also closes the associated MemoryManager.

New blueprint /api/agents:
  GET  /api/agents/<id>/reflections?simulation_id=...
  GET  /api/agents/<id>/conflicts?simulation_id=...
  POST /api/agents/<id>/retrieve  (body: simulation_id, query, top_k, weights)

Config + env additions: MEMORY_BACKEND (auto/in_memory/zep_cloud/neo4j_local/
neo4j_aura), NEO4J_URI/USER/PASSWORD/DATABASE, REFLECTION_EVERY_N_ROUNDS,
REFLECTION_TOP_K_SOURCES, MEMORY_ALPHA/BETA/GAMMA, MEMORY_ENABLE_*. .env.example
documents the full surface with commented examples for Aura / local Neo4j.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage per module:
  base.py          11 tests -- namespace factories + parsing, record kind
                               defaults, recency/importance/cosine helpers,
                               error shape
  in_memory.py      9 tests -- namespace validation, combined-score ranking,
                               cross-agent isolation, vector KNN ignores
                               recency/importance, reflection source-id
                               validation, conflict edge persistence &
                               endpoint checks, summarize_window ordering
  hierarchical.py   9 tests -- importance parsing + verbose-reply recovery
                               + LLM-error fallback; reflection scheduler
                               writes beliefs, skips below cadence, skips
                               with too-few sources; contradiction detector
                               writes edges on positive, skips without
                               embedding, no-op on negative classification
  manager.py        7 tests -- private namespace writes, public mirroring,
                               cross-agent read isolation, reflection cadence
                               wiring, stance-flip end-to-end (Phase-2
                               acceptance criterion), close() propagation
  router.py         5 tests -- explicit selection, auto-heuristic picking
                               in_memory / neo4j_local / neo4j_aura /
                               zep_cloud based on env, unknown-kind error

LLM calls are fully stubbed via ScriptedRouter / FakeRouter — no network is
required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New package backend/app/transport/:
  base.py     -- Transport (backend) + ServerTransport (subprocess) ABCs;
                 Command / Response / Event dataclasses with JSON +
                 (topic, body) frame roundtrip; in-memory pair factory for tests
  file_ipc.py -- preserves the original file-poll protocol for back-compat.
                 Adds an append-only jsonl events channel so both sides can
                 at least tail events (not real-time)
  zmq_transport.py -- DEALER/ROUTER for commands (lets the backend issue
                 concurrent requests without REQ/REP turn-taking) + PUB/SUB
                 for events. Uses ipc:// sockets by default; TCP available
                 via env. A doc comment explains the grpc-vs-zmq tradeoff
                 per the phase ground rules
  factory.py  -- build_client_transport / build_server_transport pick the
                 right pair based on IPC_TRANSPORT env (default: zmq)

The legacy simulation_ipc.py is untouched. Callers migrate incrementally by
swapping SimulationIPCClient for build_client_transport(); the two can
coexist per-simulation during rollout.
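The (topic, body) frame roundtrip for events might look roughly like this; the `Event` fields and the `sim.<run_id>` topic scheme are illustrative, but prefix-filterable topics are what makes per-run PUB/SUB subscription cheap:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Event:
    run_id: str
    kind: str
    payload: dict = field(default_factory=dict)

    def to_frame(self):
        # Topic carries the run_id so SUB sockets can prefix-filter
        # per simulation; the body is the JSON-encoded event.
        topic = f"sim.{self.run_id}".encode()
        return topic, json.dumps(asdict(self)).encode()

    @classmethod
    def from_frame(cls, topic, body):
        return cls(**json.loads(body.decode()))
```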

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds backend/app/ws/:
  bridge.py    -- EventBridge: one background worker thread per run_id
                  tails the transport event stream and fans out to every
                  registered subscriber. Thread-safe; subscribe() returns
                  an unsubscribe closure; stop_run() tears down the worker
                  + transport on simulation teardown. Process-wide singleton
                  accessed via get_bridge().
  streaming.py -- flask-sock routes. No-op when flask-sock isn't installed
                  so the HTTP API keeps working on bare flask installs.
        /ws/simulation/<run_id>            live event feed
        /ws/simulation/<run_id>/interview  streaming token-by-token reply
                                           via router.stream_chat() — skips
                                           the subprocess round trip so
                                           latency drops from ~200ms to
                                           single-digit ms per token.
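The subscribe-returns-an-unsubscribe-closure pattern can be sketched as below (shape only, minus the per-run worker threads of the real class):

```python
import threading

class EventBridge:
    """Fans events out to subscribers; subscribe() hands back a closure
    that removes the subscriber again."""
    def __init__(self):
        self._lock = threading.Lock()
        self._subs = []

    def subscribe(self, callback):
        with self._lock:
            self._subs.append(callback)
        def unsubscribe():
            with self._lock:
                if callback in self._subs:
                    self._subs.remove(callback)
        return unsubscribe

    def publish(self, event):
        with self._lock:
            subs = list(self._subs)  # snapshot: callbacks may unsubscribe
        for cb in subs:
            cb(event)
```

Returning a closure keeps the API symmetric without exposing subscriber handles or list indices.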

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds backend/app/checkpoint/:
  serializer.py -- collect_checkpoint() walks every namespace the manager
                   has seen (per-agent + public timeline) and snapshots
                   every record + conflict edge into a CheckpointData
                   dataclass. restore_into() replays records in a safe
                   order (observations before reflections, conflicts last)
                   so source_ids always resolve. format_version mismatch
                   raises so stale archives can't silently corrupt state.
  archiver.py   -- save_checkpoint / restore_checkpoint pack the snapshot
                   into .tar.zst (with gzip fallback when zstandard is
                   unavailable). Archive layout is boring tar so operators
                   can `zstd -d | tar -tv` for inspection.
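The safe replay order (observations before reflections, conflicts last) amounts to a sort by record kind; a minimal sketch with an assumed record shape:

```python
# Replay order matters: a reflection's source_ids must already exist
# when it is inserted, and conflict edges reference records of either
# kind, so they go last. (Record dict shape is illustrative.)
_KIND_ORDER = {"observation": 0, "reflection": 1, "conflict": 2}

def replay_order(records):
    """Sort checkpoint records so every reference resolves on insert;
    within a kind, preserve timestamp order."""
    return sorted(records,
                  key=lambda r: (_KIND_ORDER[r["kind"]], r.get("ts", 0)))
```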

API endpoints mounted under /api/simulation/<sim_id>:
  POST /checkpoint   -- capture round state to disk
  POST /restore      -- restore by path or by round_num
  GET  /checkpoints  -- list archived checkpoints with size + mtime

Phase-3 config additions: IPC_TRANSPORT, IPC_CMD_ENDPOINT, IPC_EVENT_ENDPOINT.
requirements.txt adds pyzmq, flask-sock, zstandard, and neo4j (phase 2
backend driver; optional — not installed unless MEMORY_BACKEND=neo4j_*).
.env.example documents the transport and WebSocket endpoints.

create_app() now also calls register_ws_routes(app) so /ws/* endpoints are
attached automatically when flask-sock is installed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…green)

Coverage per module:
  transport/base        6 tests -- command/response/event JSON roundtrip,
                                   event frame encoding, in-memory pair
                                   happy-path + timeout-on-silent-server
  transport/file_ipc    3 tests -- file-based command roundtrip, append-only
                                   event tailing, per-run event filtering
  transport/zmq         3 tests -- inproc:// command roundtrip (DEALER/ROUTER),
                                   timeout when server is silent, PUB/SUB
                                   slow-joiner-aware multi-event fan-out
  ws/bridge             3 tests -- multi-subscriber fan-out correctness,
                                   unsubscribe halts delivery, stop_run
                                   releases transport + worker
  checkpoint            5 tests -- captures all namespaces (agent private +
                                   public timeline), tar.zst archive roundtrip,
                                   restore_into reproduces state in a fresh
                                   manager (Phase-3 acceptance criterion),
                                   format_version mismatch raises, archive
                                   path contains round number

Totals: 33 (phase 1) + 41 (phase 2) + 20 (phase 3) = 94 passing.
No network; ZMQ tests use inproc:// endpoints; WS bridge tests drive the
file transport for deterministic timing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds backend/app/personas/:
  schema.py      -- StructuredPersona dataclass (Big Five traits + conviction +
                    credibility + background + initial stance); Archetype enum
                    with defaults per archetype (conviction floor = 1.0 for
                    bots/trolls; credibility ceiling matching persona type).
                    Background hard-capped at 200 chars for prefix cacheability.
                    Clean JSON round-trip (acceptance criterion).
  prompts.py     -- persona_system_block(): the fixed template injected into
                    every agent prompt. Stable prefix (archetype rules + scoring
                    scales) first, volatile persona block last — this ordering
                    is what lets Anthropic / OpenAI prompt caching actually
                    catch the common prefix across every agent in the run.
  generator.py   -- PersonaGenerator uses the `balanced` LLM role + strict JSON
                    schema to fill in Big Five / stance / background. Procedural
                    fallback when the LLM fails so simulation doesn't die. BOT
                    and TROLL personas bypass the LLM entirely — their behavior
                    is dictated by the archetype.
  population.py  -- build_population() mixes normal / media / expert / bot /
                    troll agents by percentage, with deterministic seeding.
                    build_bot_persona / build_troll_persona produce procedural
                    personas with the right extras ({narrative} / {tone}).
  inertia.py     -- StanceInertia: per-agent counter of opposing vs supporting
                    posts seen. Valence threshold (0.2) filters out noise.
                    should_allow_flip(persona) enforces the
                    ceil(10*conviction) spec rule. Snapshot / restore for
                    checkpoints.
  credibility.py -- CredibilityWeighter: re-ranks retrieval results by author
                    credibility. Formula: base * (1 + weight * (cred - 0.5)).
                    Unknown authors get neutral (0.5) — posts are never
                    silently dropped.
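The two formulas quoted above are small enough to show directly; function names are illustrative, the arithmetic is taken from the commit text:

```python
import math

def opposing_posts_needed(conviction):
    """Spec rule: an agent flips stance only after seeing
    ceil(10 * conviction) opposing posts above the valence threshold."""
    return math.ceil(10 * conviction)

def reweight(base_score, author_cred, weight):
    """Credibility re-rank: base * (1 + weight * (cred - 0.5)).
    Unknown authors pass cred=0.5, so the score is unchanged rather
    than the post being silently dropped."""
    return base_score * (1 + weight * (author_cred - 0.5))
```

Note that at `weight=0` the re-rank is a no-op, which is what the CREDIBILITY_WEIGHT=0.0 disable knob relies on.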

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Integration points:
  * MemoryManager grows an optional credibility_weighter parameter +
    set_credibility_weighter() setter. retrieve_for_agent() applies the
    weighter after merging private + public records, before the final sort.
    When unset, behavior is identical to Phase 2.
  * OasisProfileGenerator gains two Phase-4 methods:
        generate_structured_persona_for_entity(entity, user_id, archetype,
                                                topic_summary)
            -> StructuredPersona via PersonaGenerator
        attach_structured_persona(profile, persona)
            -> splices the prompt block + JSON tag into the OASIS profile's
               `persona` field. The original prose-based path is untouched
               so legacy callers keep working.

Config + env additions:
    BOT_POPULATION_PCT, TROLL_POPULATION_PCT (default 0/0 — enabling these
        changes outcomes per the phase spec)
    MEDIA_POPULATION_PCT, EXPERT_POPULATION_PCT (institutional boosts)
    POPULATION_SEED (deterministic mixing for reproducible eval runs)
    CREDIBILITY_WEIGHT (re-rank strength; 0.0 disables)
.env.example documents each knob with the "enabling these changes outcomes"
warning called out by the spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage per module:
  schema.py           8 tests -- Big Five clamping, stance valence clamping,
                                 background truncation, JSON round-trip
                                 (Phase-4 acceptance), opposing_posts_needed
                                 scales with conviction, stance_is_opposed_by
                                 sign check + neutral-stance safeguard, bot
                                 archetype default floors, strict from_dict
  population.py       8 tests -- default all-normal, floor rounding, exact
                                 percentages, deterministic seeding, over-100
                                 rejection, negative rejection, bot narrative
                                 extras, troll tone extras
  inertia.py          8 tests -- high-conviction resists single opposing post
                                 (Phase-4 acceptance), resists 20 rounds with
                                 8 opposing (below threshold), flips once
                                 threshold crossed, low-conviction flips
                                 quickly, valence-threshold filters noise,
                                 supporting posts counted separately, reset
                                 clears counters, snapshot/restore round-trip
  credibility.py      5 tests -- high cred outranks low cred at tied base
                                 score, multiplier formula, weight=0 is noop,
                                 unknown author uses neutral, non-mutating
  prompts.py          6 tests -- stable prefix identical across agents (prefix
                                 cache correctness), bot narrative embedded,
                                 troll tone embedded, conviction + opposing-
                                 needed count appear in volatile, topic summary
                                 appended, archetype rules vary per archetype
  generator.py        5 tests -- LLM JSON -> persona assembly, code-fence
                                 stripping, fallback on any LLM error (runtime /
                                 network / parse), background length cap,
                                 archetype floor clamping even from LLM output
  integration.py      4 tests -- credibility reweights public-timeline
                                 retrieval (end-to-end), bot population
                                 changes retrievable content (Phase-4
                                 acceptance), high-conviction agent holds
                                 across 20 rounds (Phase-4 acceptance),
                                 deterministic population seed

Totals: 33 (p1) + 41 (p2) + 20 (p3) + 44 (p4) = 138 passing.
All LLM calls stubbed — no network required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…unner)

New package backend/eval/:
  determinism.py -- DeterministicClock + deterministic_run() context manager.
                    Replaces wall-clock now_ts(), seeds the global RNG, and
                    restores state on exit. DETERMINISTIC_VERSION constant
                    bumped on any math / mock-table change so CI catches drift.
  scoring.py     -- Pure functions: directional_accuracy, magnitude_error,
                    calibration. Composite = 0.5*dir + 0.3*(1-mag) + 0.2*cal.
                    Direction synonyms (support/oppose) accepted.
  verdict.py     -- verdict_from_public_timeline aggregates public-namespace
                    posts into a signed support_ratio, weighted by author
                    credibility when personas are supplied. verdict_from_report
                    parses the optional ReportAgent JSON surface.
  mocks.py       -- MockRouter: deterministic drop-in for ModelRouter. SHA-256
                    hashes prompt+salt for every decision; importance returns
                    an integer 1-10; reflection returns 3 canned beliefs;
                    contradiction is True ~25% via hash bucket; persona JSON
                    is synthesized from the entity name so downstream code
                    sees varied credibility / valence distributions.
  pipeline.py    -- run_case() orchestrates: build_population -> PersonaGenerator
                    -> MemoryManager (Phase-2/4 features per FeatureFlags) ->
                    rounds of posts with stance-anchored valence -> Verdict.
                    FeatureFlags is the knob the ablation tool sweeps over.
  storage.py     -- JSONL append / read for the eval-results dashboard.
                    EVAL_RESULTS_PATH env override.
  runner.py      -- CLI. `python -m backend.eval.runner --case <name>
                    --deterministic --mock-llm` produces a numeric score.
                    Two runs with those flags are BYTE-IDENTICAL (Phase-5
                    acceptance). Warns if --deterministic is passed without
                    --mock-llm.
  ablation.py    -- CLI. Sweeps baseline + 7 variants (no_importance,
                    no_reflection, no_contradiction, no_credibility,
                    no_conviction, no_phase2, no_phase4) and prints a
                    comparison table with Δ vs baseline.

backend/__init__.py added so `python -m backend.eval.runner` resolves and
so modules inside eval/ can reach backend/app/* via the sys.path guard
at the top of runner.py / ablation.py.
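The composite described under scoring.py is a weighted sum; a sketch using the weights from the commit message (the function name and the tuple-of-weights parameter are illustrative):

```python
def composite_score(directional, magnitude_error, calibration,
                    weights=(0.5, 0.3, 0.2)):
    """Composite = 0.5*dir + 0.3*(1 - mag_error) + 0.2*cal.
    Magnitude error is inverted so that all three terms reward
    accuracy in the same direction."""
    w_dir, w_mag, w_cal = weights
    return (w_dir * directional
            + w_mag * (1 - magnitude_error)
            + w_cal * calibration)
```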

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cases (backend/eval/datasets/<name>/{seed.md, question.md, truth.json}):
  sample_policy_carbon_tax        -- truth: negative, magnitude 0.45
  sample_product_vr_headset       -- truth: positive, magnitude 0.55
  sample_policy_remote_work       -- truth: positive, magnitude 0.35
  sample_product_ai_service       -- truth: neutral (polarized), magnitude 0.10
  sample_election_incumbent_mayor -- truth: positive, magnitude 0.30

Each truth.json cites a comparable real-world analog in its `notes` field.
README.md spells out the schema and flags these as starter fixtures to be
replaced with peer-reviewed cases before publishing benchmark numbers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
API:
  GET /api/eval/results?limit=50[&case=<name>]
    -> {"count": N, "results": [<record>, ...]} — newest first
  Reads from the JSONL store populated by `runner.py --persist`.

CI workflow (.github/workflows/eval-smoke.yml):
  On every PR against main/master:
    1. Install the minimal deterministic-path deps (no Zep / OASIS / Neo4j)
    2. pytest backend/tests/
    3. Run `backend.eval.runner` twice with --deterministic --mock-llm,
       --output-json; `diff -q` enforces byte-identical output
       (Phase-5 acceptance criterion)
    4. Smoke-run backend.eval.ablation to confirm the table format stays stable

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage per module:
  scoring.py       7 tests -- directional synonym handling, magnitude clipping,
                              calibration rewards/penalties, composite uses
                              weights, custom weights, clamping
  verdict.py       6 tests -- empty timeline -> neutral/0 confidence,
                              consistent positive maps to positive, credibility
                              tips split votes, ReportAgent JSON parsing with
                              code fences, None on garbage
  determinism.py   5 tests -- clock monotonic + reproducible, wall-clock
                              fallback outside block, global RNG state
                              restored, seeded_random isolation, version
                              constant exposed
  mocks.py         6 tests -- chat determinism, importance integer range,
                              reflection 3 beliefs, contradiction mixed
                              distribution, embed determinism, persona schema
                              validity
  storage.py       7 tests -- append creates file, recorded_ts auto-added,
                              newest-first ordering, limit honored, case
                              filter, missing file -> empty, malformed line
                              skipped
  runner.py + ablation.py  6 tests (subprocess-based) -- numeric score emitted
                              (Phase-5 acceptance #1), byte-identical output
                              across two runs (Phase-5 acceptance #2),
                              deterministic warning without mock-llm, ablation
                              emits table with all variants, --output-json
                              parseable (Phase-5 acceptance #3)

Totals: 33 (p1) + 41 (p2) + 20 (p3) + 44 (p4) + 37 (p5) = 175 passing
in 14s. Subprocess-based runner tests exercise the real CLI end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds backend/app/observability/:
  logging.py  -- configure_logging() configures structlog -> JSON on stdout
                 when structlog is installed; else falls back to a stdlib
                 JSON formatter that still honors bind_context(). Every log
                 line carries run_id / agent_id / phase via contextvars
                 without code changes at the call sites.
  metrics.py  -- Prometheus registry exposing the Phase-6-spec metrics:
                   llm_calls_total{role,provider,model,status}
                   llm_tokens_total{role,kind}
                   llm_cache_hit_ratio (rolling gauge, recomputed on emit)
                   memory_op_duration_seconds{op,backend} histogram
                   simulation_active_runs, simulation_rounds_total{platform}
                   auth_rejections_total{reason}
                 Degrades to a minimal in-process counter store with a
                 human-readable banner if prometheus_client is missing, so
                 /metrics still responds 200.
  tracing.py  -- OTel setup with OTLP/HTTP exporter. start_span() is a
                 no-op context manager when OTel SDK isn't installed or
                 OTEL_EXPORTER_OTLP_ENDPOINT is unset, so callers can use
                 it unconditionally.

Wires observe_llm_call() into the LLM router. Every successful and every
failed backend call (including retries) records prompt/completion/cached
tokens + status into Prometheus. Metric emission is wrapped in a bare
try/except so it can never break the LLM call path.
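The never-break-the-call-path guarantee is just a broad exception guard around emission; a sketch of the shape (decorator name is illustrative):

```python
def observe_safely(metric_fn):
    """Wrap metric emission so a failing registry can never break the
    LLM call path, mirroring the bare try/except described above."""
    def wrapped(*args, **kwargs):
        try:
            metric_fn(*args, **kwargs)
        except Exception:
            pass  # metrics are best-effort; never propagate
    return wrapped
```

The trade-off is that emission failures are silent; in practice one would at least count them, but the priority here is that the LLM call always completes.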

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds backend/app/auth/:
  keys.py       -- SQLite-backed ApiKeyStore. Plaintext key format is
                   `mf_<8-hex-id>_<~40-char-urlsafe-secret>`. Only the SHA-256
                   hash is stored; plaintext is returned ONCE at issue time.
                   Constant-time compare on verify to avoid timing leaks.
  quotas.py     -- QuotaTracker with atomic check-and-debit. Raises
                   QuotaExceeded on over-cap with a structured .to_dict()
                   for the 429 response body. Preview-mode non-mutating
                   check powers the cost-estimator approval flow.
                   30-day rolling window (env overridable).
  middleware.py -- @require_api_key Flask decorator. Accepts
                   X-MiroFish-Key header (preferred) or ?api_key= query
                   (fallback). ALLOW_ANONYMOUS_API=true bypasses with a
                   metric increment so dashboards see reliance on anon.
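The keys.py scheme above (hash-only storage, one-time plaintext, constant-time verify) can be sketched with the stdlib; function names are illustrative, the key format is from the commit:

```python
import hashlib
import hmac
import secrets

def issue_key():
    """Key format: mf_<8-hex-id>_<~40-char-urlsafe-secret>. Only the
    SHA-256 hash is stored; the plaintext is returned once, here."""
    key_id = secrets.token_hex(4)  # 8 hex chars
    plaintext = f"mf_{key_id}_{secrets.token_urlsafe(30)}"
    stored_hash = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, key_id, stored_hash

def verify(presented, stored_hash):
    """Constant-time compare on the digest to avoid timing leaks."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_hash)
```

Hashing before `hmac.compare_digest` also means the comparison length is fixed, so nothing about the stored secret's length leaks either.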

Adds backend/app/cost/:
  estimator.py  -- estimate_simulation_cost(agents, rounds, role_models)
                   multiplies per-role default token budgets (tuned against
                   observed qwen-plus runs) by agents × rounds × calls.
                   Resolves (provider, model) -> price via Phase-1's
                   _PRICING table; unknown pairs annotate `note` without
                   crashing. ApprovalRequired raised by require_approval()
                   when estimate exceeds user_cap_usd.
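The linear-scaling estimate reduces to one multiplication chain; a sketch with illustrative parameter names and prices (the real estimator resolves budgets per role and prices from the Phase-1 pricing table):

```python
def estimate_cost(agents, rounds, calls_per_agent_round,
                  tokens_in, tokens_out,
                  price_in_per_1k, price_out_per_1k):
    """Per-call token budgets multiplied by agents x rounds x calls,
    then priced per 1K tokens."""
    n_calls = agents * rounds * calls_per_agent_round
    usd = n_calls * (tokens_in / 1000 * price_in_per_1k
                     + tokens_out / 1000 * price_out_per_1k)
    return {"calls": n_calls, "usd": round(usd, 4)}

def needs_approval(estimate, user_cap_usd):
    """True when the pre-flight estimate exceeds the user's cap."""
    return estimate["usd"] > user_cap_usd
```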

New endpoints:
    GET  /metrics                          -- Prometheus scrape target
    POST /api/auth/keys                    -- issue (admin-only)
    GET  /api/auth/keys                    -- list (admin-only)
    DELETE /api/auth/keys/<id>             -- revoke (admin-only)
    GET  /api/auth/quota                   -- current key's usage
    POST /api/simulation/estimate-cost     -- pre-flight estimate

create_app() now calls configure_logging() + configure_tracing() at
startup. Admin endpoints require `X-MiroFish-Admin-Token: $ADMIN_TOKEN`
and return 503 when ADMIN_TOKEN is unset (makes misconfiguration loud).

Config + env additions: ADMIN_TOKEN, ALLOW_ANONYMOUS_API, AUTH_DB_PATH,
QUOTA_DB_PATH, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_SERVICE_NAME,
COST_BUDGET_<ROLE>_{CALLS,IN,OUT,CACHED} overrides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
deploy/helm/mirofish/:
  Chart.yaml   -- appVersion 0.6.0, kubeVersion >=1.24
  values.yaml  -- every key documented inline; groups: backend / frontend /
                  redis / vllm / memory / llm / auth / observability / ingress.
                  Defaults: backend x2 replicas, redis on, frontend + vllm
                  + ingress off. Neo4j expected external (Aura) per spec.
  templates/
    _helpers.tpl          -- fullname + labels
    backend-{dep,svc}.yaml
    redis.yaml             -- inline, toggleable via .Values.redis.enabled
    vllm.yaml              -- optional GPU deployment
    configmap.yaml         -- flattens values into the backend's envFrom
    secret.yaml            -- placeholder Secret for ADMIN_TOKEN / LLM_API_KEY /
                              ZEP_API_KEY / neo4j-password (populate externally)
    ingress.yaml           -- optional, ingressClassName-aware
  README.md   -- install + lint + values overview

requirements.txt: structlog, prometheus_client, opentelemetry-api + sdk +
otlp-proto-http. Neo4j 5.x driver stays optional (only installed when
MEMORY_BACKEND=neo4j_*).

.env.example: documents ALLOW_ANONYMOUS_API, ADMIN_TOKEN, AUTH_DB_PATH,
QUOTA_DB_PATH, OTEL_*, and COST_BUDGET_* overrides in a single Phase-6
block before the existing Phase-4 persona section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tal)

Coverage per module:
  observability/logging        5 tests -- bind_context roundtrip, nested
                                          LIFO unbind, stdlib JSON formatter
                                          w/ contextvars merge,
                                          configure_logging idempotency,
                                          structlog path selection
  observability/metrics        8 tests -- llm_call metric labels + cache
                                          ratio update, memory_op histogram
                                          buckets, active_run gauge, auth
                                          rejection counter, content-type,
                                          fallback when prometheus_client
                                          missing, singleton accessor
  observability/tracing        4 tests -- no-endpoint -> disabled, no-op span
                                          when disabled, configure is
                                          idempotent, attribute-setting span
                                          opens + shuts down cleanly
  auth/keys                    9 tests -- issue returns plaintext once,
                                          verify roundtrip, rejects garbage
                                          + tampered + revoked, list filters
                                          by owner, excludes revoked by
                                          default, to_dict strips secret,
                                          quotas stored on key
  auth/quotas                  8 tests -- unlimited key passthrough, token
                                          quota enforced, usd quota enforced,
                                          atomic debit, failed-debit doesn't
                                          apply (critical), preview non-
                                          mutating, reset, fresh-key zeros
  auth/middleware              6 tests -- missing header 401, valid key
                                          accepted, invalid key 401, revoked
                                          key 401, anonymous flag bypass,
                                          query-string fallback
  cost/estimator               8 tests -- linear scaling, unknown vendor ->
                                          zero cost + note, cached fraction
                                          discounts, approval flag when over
                                          cap, ApprovalRequired exception,
                                          zero cap disables, per-role
                                          breakdown present, env budget
                                          overrides merge
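The "atomic debit, failed-debit doesn't apply" behavior that the auth/quotas tests pin down can be sketched with a conditional SQLite UPDATE — the whole check-and-debit happens in one statement, so an over-quota request never partially applies. Table and column names here are illustrative, not the real schema:

```python
import sqlite3

def debit_tokens(conn: sqlite3.Connection, key_id: str, tokens: int) -> bool:
    """Atomically debit `tokens` from a key's remaining quota.

    The UPDATE only matches when enough quota remains, so a failed
    debit leaves the row untouched.
    """
    with conn:  # transaction: commit on success, rollback on error
        cur = conn.execute(
            "UPDATE quotas SET tokens_remaining = tokens_remaining - ? "
            "WHERE key_id = ? AND tokens_remaining >= ?",
            (tokens, key_id, tokens),
        )
        return cur.rowcount == 1  # True iff the debit actually applied

# Setup and usage:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotas (key_id TEXT PRIMARY KEY, tokens_remaining INTEGER)")
conn.execute("INSERT INTO quotas VALUES ('k1', 100)")
assert debit_tokens(conn, "k1", 60) is True    # 40 left
assert debit_tokens(conn, "k1", 60) is False   # over quota: row unchanged
assert conn.execute("SELECT tokens_remaining FROM quotas").fetchone()[0] == 40
```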

Totals: 33 (p1) + 41 (p2) + 20 (p3) + 44 (p4) + 37 (p5) + 48 (p6) = 223
passing in ~24s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MIGRATION.md at repo root: TL;DR + per-phase rundown, running in each
mode (local / cloud / vLLM / Kubernetes), full new-keys table with per-
phase origin, and notes on what's breaking vs additive. Flags that the
public HTTP surface has ZERO breaking changes — every upstream endpoint
behaves identically.

README.md: inserts a prominent "MiroFish-Cloud (Phase 1-6)" banner
above Quick Start pointing at MIGRATION / architecture / BENCHMARKS.
Adds Option 2 (multi-provider cloud), Option 3 (local-only via Ollama),
and Option 4 (Helm chart) alongside the existing npm-dev quickstart.

docs/architecture.md: full module tree for every phase, abstract-backend
diagrams (LLM router, memory layer, transport), request data-flow traces
for a normal round + streaming interview, the Phase-6 Kubernetes
deployment topology, and a cross-phase "notable design decisions" table.

BENCHMARKS.md: specifies the four benchmarks that matter (throughput,
interview latency, eval scores, cost per 1k-agent sim), gives exact
reproduction commands, carries the captured deterministic-ablation
table as an in-repo number (verified by CI), and holds ⚠️-marked
placeholder tables for the live-LLM numbers operators fill in after
their first production runs. Test-suite runtime table (223 tests,
~24s) ships as a CI regression guard baseline.

All docs-only; pytest still green at 223/223.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MiroFish Migration and others added 2 commits April 23, 2026 20:15
Config.validate() previously required LLM_API_KEY and ZEP_API_KEY
unconditionally, which blocked run.py from starting in Path A's local-only
mode (Ollama + in-memory backend, no cloud creds). Two relaxations:

  * When BACKEND_MODE=local, no cloud LLM key is required. The router
    uses Ollama defaults for every role.
  * When MEMORY_BACKEND is in_memory or neo4j_*, ZEP_API_KEY stops being
    required — Zep is only used under MEMORY_BACKEND=zep_cloud (or the
    `auto` default).
  * In cloud/custom mode, a per-role key such as LLM_ROLE_BALANCED_API_KEY
    now satisfies the check on its own — the legacy top-level LLM_API_KEY
    fallback is no longer mandatory when per-role keys are configured.
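The relaxed validation above amounts to two conditional branches. A minimal sketch, assuming the key names from this commit message (the function shape does not mirror the real Config.validate()):

```python
import os

def validate_config(env=os.environ):
    """Raise ValueError listing credentials still required at startup."""
    missing = []
    mode = env.get("BACKEND_MODE", "cloud")
    memory = env.get("MEMORY_BACKEND", "auto")

    # Local mode runs against Ollama defaults: no cloud LLM key needed.
    if mode != "local":
        if not (env.get("LLM_API_KEY") or env.get("LLM_ROLE_BALANCED_API_KEY")):
            missing.append("LLM_API_KEY (or LLM_ROLE_BALANCED_API_KEY)")

    # Zep is only the active memory layer under zep_cloud (or auto).
    if memory in ("zep_cloud", "auto") and not env.get("ZEP_API_KEY"):
        missing.append("ZEP_API_KEY")

    if missing:
        raise ValueError("missing required config: " + ", ".join(missing))

# Path A (local-only) now boots without any cloud credentials:
validate_config({"BACKEND_MODE": "local", "MEMORY_BACKEND": "in_memory"})
```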

backend/pyproject.toml grows entries for every phase-1-6 runtime dep that
already lives in requirements.txt (pyzmq, flask-sock, zstandard, neo4j,
structlog, prometheus_client, opentelemetry-*, requests). A single
`uv sync` now pulls the complete set — no follow-up `pip install` needed.

Verified live in local mode:
  GET  /health                         -> 200
  GET  /metrics                        -> 200 (Prometheus text + phase-6 metrics)
  POST /api/simulation/estimate-cost   -> 200 with full per-role breakdown
  Every phase 1-6 blueprint + WebSocket route registered at startup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.env.example restructured for production deployment:
  * Adds a prominent "CLOUD DEPLOYMENT QUICKSTART" header block at the top
    listing the 3 secret lines to replace (LLM_API_KEY, ADMIN_TOKEN, and
    optionally NEO4J_*).
  * Default LLM config is now single-vendor OpenAI (sk-REPLACE-ME /
    gpt-4o-mini) — simplest cloud path, one bill, built-in price table,
    works for both chat and embeddings. Aliyun DashScope kept as a
    commented alternative for existing upstream users.
  * Uncommented LLM_ROLE_HEAVY_* and LLM_ROLE_EMBED_* so the default setup
    uses gpt-4o for ReportAgent synthesis and text-embedding-3-large for
    vector retrieval.
  * FLASK_HOST=0.0.0.0 so containers / load balancers reach the backend.
  * FLASK_DEBUG=false — auto-reload off in production.
  * ADMIN_TOKEN gets a literal placeholder (was a commented hint) so
    cloud deployments fail loudly at boot when it's not filled in.
  * Neo4j Aura block moved above local CE (cloud-first ordering) with
    explicit connection-string format and REPLACE-ME placeholders.
  * ZEP_API_KEY placeholder updated to `zk-REPLACE-ME` to match the
    expected format.
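The per-role defaults above (gpt-4o-mini for everyday chat, gpt-4o for heavy synthesis, text-embedding-3-large for retrieval) suggest a lookup like the following. This is a sketch of the resolution order only — the fast-role default and the LLM_ROLE_<ROLE>_MODEL variable name are assumptions, and the real router's logic may differ:

```python
import os

# Defaults matching the .env described above (fast default is assumed).
DEFAULT_MODELS = {
    "fast": "gpt-4o-mini",
    "balanced": "gpt-4o-mini",
    "heavy": "gpt-4o",
    "embed": "text-embedding-3-large",
}

def resolve_model(role: str, env=os.environ) -> str:
    """Per-role env override falls back to the role's built-in default."""
    return env.get(f"LLM_ROLE_{role.upper()}_MODEL", DEFAULT_MODELS[role])

resolve_model("heavy", {})                                  # → 'gpt-4o'
resolve_model("fast", {"LLM_ROLE_FAST_MODEL": "qwen2.5"})   # → 'qwen2.5'
```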

backend/uv.lock: locks every phase 1-6 runtime dep added to pyproject.toml
in the previous commit — opentelemetry-{api,sdk,exporter-otlp-proto-http},
prometheus-client, structlog, flask-sock, pyzmq, zstandard, neo4j, etc.
Pinned versions match what `uv sync` produced in Path A.

frontend/package-lock.json: regenerated by `npm install` during Path A
setup; no version drift, just lockfile metadata refresh.

No functional code changes; .env itself stays gitignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>