usetheodev · May 15, 2026 · May 14, 2026
diff --git a/.claude/knowledge-base/index.md b/.claude/knowledge-base/index.md
@@ -12,6 +12,7 @@ updated_at: 2026-05-14
 
 ## Plans
 - [theo-desktop Bridge Refactor](plans/theo-desktop-bridge-refactor-plan.md) — Reorganiza apps/theo-desktop em ponte fina (Hexagonal + Strategy); empurra regras de negocio para theo-application e theo-infra-llm
+- [Evolution Loop Removal](plans/evolution-loop-removal-plan.md) — Removes the orphaned EvolutionLoop (0 activations in 16 runs, observability hardcoded to zero); Memory ERL + budget enforcer cover all of its capabilities; nothing to absorb
 
 ## Architecture
 ### Context Domain

diff --git a/.claude/knowledge-base/log.md b/.claude/knowledge-base/log.md
@@ -916,6 +916,116 @@ Não outputarei promise. Iteração 3 encerrada com 2 das 5 acceptance criteria
 - **Next step:** Plan status = `ready`; user may invoke implementation via ralph-loop or direct refactor.
 - **Evidence classification:** Plan is [PLANNED]; will be [MEASURED] after `/cross-validation` + `/dogfood full` in Phase 7.
 
+## 2026-05-14T17:30 — Plan created: `evolution-loop-removal`
+
+- **Trigger:** User asked whether the `Self-evolution` domain (SOTA 2.0 YELLOW) should be worked on. Investigation revealed orphaned wiring.
+- **Empirical evidence (3 independent signals):**
+  - 16 trajectories in `.theo/trajectories/`: 100% `evolution_attempts:0`, `evolution_success:false`, zero `PilotLoopComplete` events.
+  - `crates/theo-agent-runtime/src/observability/mod.rs:185-186` hardcodes `evolution_attempts: 0, evolution_success: false` in the production `FinalizeInputs` builder — telemetry disconnected from runtime state.
+  - `grep -rE "theo pilot" apps/theo-benchmark apps/theo-cli/tests` → 0 matches: pilot is never exercised in benchmarks.
+- **Absorption analysis:** 5 EvolutionLoop capabilities (attempt tracking, heuristic reflection, strategy ladder, hard cap, prompt injection) all redundant with Memory ERL + `BudgetEnforcer`. Strategy ladder is advisory-only text (no enforcement code path). Nothing absorbable.
+- **Plan:** `.claude/knowledge-base/plans/evolution-loop-removal-plan.md`. 7 phases. Net deletion of ~487 LOC, 1 test, and 2 dead JSONL fields. ADR D5 reserves `reflector.rs` for a follow-up plan (out of scope here).
+- **Match to user feedback memory:** third occurrence of the "zero-activity metric → orphaned wiring" pattern recorded in `feedback_inconclusive_means_bug.md`.
+- **Next step:** `/edge-case-plan evolution-loop-removal` per `/to-plan` skill pipeline.
+
+## 2026-05-14T17:45 — Edge case review: `evolution-loop-removal`
+
+- **Tasks analyzed:** 14 original
+- **Edge cases found:** 8 (MUST FIX: 6, SHOULD TEST: 1, DOCUMENT: 1)
+- **Verdict:** PLAN NEEDS ADJUSTMENT; subsequently corrected in place
+- **Critical gaps detected:**
+  - **EC-1:** `tests/observability_e2e.rs` has 4 call sites with hardcoded `evolution_*: 0/false` (8 lines to delete); these were not covered by T4.1
+  - **EC-2/EC-3:** The TypeScript frontend `apps/theo-ui/src/features/observability/types.ts:114-115` declares the fields as NON-optional, and `LoopHealthPanel.tsx:55` renders a badge from them; this would break `npm run build` (tsc strict)
+  - **EC-4/EC-5/EC-6:** The Python harness in `apps/theo-benchmark/` has 3 files consuming the fields (`loop_analysis.py` 8 refs, `_headless.py` 4 refs, `tests/test_headless.py` 5 refs); this would break pytest
+- **Adjustments applied to the plan:**
+  - ADR D3 extended with a note on verified consumers + historical artifacts
+  - T4.1 gained an explicit step 4 for `observability_e2e.rs`
+  - **New T5.5** ("Remove evolution fields from theo-ui") added to Phase 5
+  - **New T5.6** ("Remove evolution accumulators from theo-benchmark") added to Phase 5
+  - T4.3 + Global DoD extended with `npm run build` + `pytest` gates
+  - Coverage matrix expanded from 16 to 22 gaps
+  - Dependency graph updated: T5.5 and T5.6 can run in parallel after Phase 4
+- **Full report:** `.claude/knowledge-base/plans/evolution-loop-removal-edge-cases.md`
+- **Match to feedback memory:** confirms again that "INCONCLUSIVE means investigate". I had assumed without evidence that "no external parser depends on these fields"; grepping the `apps/` directories returned 78 hits and revealed 6 real consumers.
+
+## 2026-05-14T18:30 — Plan implemented: `evolution-loop-removal` (Phases 1-5 complete)
+
+- **Phase 0 baseline:** Rust workspace 3,567 lib tests; theo-domain 501; theo-agent-runtime 1,156. Pre-existing failures (4 bwrap sandbox, env-dependent) unchanged.
+- **Phase 1 — Domain types deleted:**
+  - `crates/theo-domain/src/evolution.rs` (117 LOC) removed; `lib.rs` module decl removed.
+  - `theo_domain::retry_policy::CorrectionStrategy` enum + Display impl removed (`retry_policy.rs` 107 → 84 LOC). No tests were removed for it (N=0).
+  - `memory/lesson.rs` docstring updated to remove dangling `theo-domain::evolution::Reflection` cross-reference.
+  - theo-domain: 501 → 497 tests (delta −4: 4 inline tests in deleted `evolution.rs`).
+- **Phase 2 — Runtime EvolutionLoop deleted:**
+  - `crates/theo-agent-runtime/src/evolution.rs` (264 LOC) removed; `lib.rs` module decl removed.
+  - README.md "Safety / Guard" sub-domain table updated (removed `EvolutionLoop`).
+  - Compile-failure surface confined to 7 errors in `pilot/mod.rs` (2) + `pilot/run_loop.rs` (5), exactly as predicted.
+- **Phase 3 — PilotLoop surgical removal:**
+  - `pilot/mod.rs`: `evolution: EvolutionLoop` field + init + `build_evolution_prompt` injection block removed.
+  - `pilot/run_loop.rs`: `record_evolution_attempt` helper (~50 LOC) removed; caller in `pilot/mod.rs:336` removed.
+  - `cargo build -p theo-agent-runtime` green.
+- **Phase 4 — Observability cleanup:**
+  - `FinalizeInputs.evolution_*` fields removed (mod.rs:185-186 hardcoded init + 236-237 struct fields + 271-272 pass-through).
+  - `LoopMetrics.evolution_*` fields removed; `compute_loop_metrics` signature reduced from 7 to 5 args.
+  - `test_evolution_attempts_counted` deleted; `test_budget_utilization_correct`, `test_phase_distribution_has_four_phases`, `test_done_blocked_tracked` updated to new arity.
+  - **EC-1 fixed:** `tests/observability_e2e.rs` — 4 callsites cleaned (8 lines).
+  - theo-agent-runtime: 1,156 → 1,155 tests (delta −1: `test_evolution_attempts_counted`).
+- **Phase 5 — Docs + downstream consumers:**
+  - **T5.5 (theo-ui)** — `types.ts:LoopMetrics` (2 fields removed); `LoopHealthPanel.tsx:55` (badge removed). `npm run build` green; `npm test` 10 files / 44 tests passing.
+  - **T5.6 (theo-benchmark)** — `loop_analysis.py` (8 refs removed: accumulator vars + parser + 2 output blocks); `_headless.py` (4 refs removed: dataclass fields + parser); `tests/test_headless.py` (5 refs removed: default + fixture + asserts). A Python smoke test confirms that old-schema JSON still parses cleanly: fields are read via `.get(..., default)`, so the leftover `evolution_*` keys are simply ignored (analogous to Serde's default handling of unknown fields).
+  - **T5.1** — `CLAUDE.md` Domain Status: "Self-evolution" row removed (now 14 SOTA-scored domains).
+  - **T5.2** — `CHANGELOG.md` `[Unreleased] → Removed` entry added with (#evolution-loop-removal) ref.
+  - **T5.3** — `e2e_auto_evolution.rs` header comment now clarifies that it tests memory auto-evolution, NOT EvolutionLoop.
+- **Workspace test delta:** −5 (4 theo-domain inline + 1 theo-agent-runtime observability). All other crates unchanged.
+- **`grep -rn "EvolutionLoop\|theo_domain::evolution\|CorrectionStrategy" crates/`:** 0 production matches (the only hits are a historical fixture string in `tests/read_real_trajectory.rs` and the commentary in the edited `lesson.rs` docstring itself).
+- **Next:** Phase 6 cross-validation + Phase 7 dogfood.
+
+## 2026-05-14T18:50 — Phase 6 + 7 complete: `evolution-loop-removal` SHIPPABLE
+
+- **Phase 6 cross-validation:** APPROVED WITH RESERVATIONS (4 discrepancies: 3 MINOR + 1 INFO; zero BLOCKER/CRITICAL/MAJOR). Report: `.claude/knowledge-base/reviews/cross-validation/evolution-loop-removal-xval-2026-05-14.md`.
+  - MINOR-1: theo-domain delta was −4, not −3 (I missed `max_evolution_attempts_is_five`).
+  - MINOR-2: workspace delta was −5, not −4 (consequence of MINOR-1).
+  - MINOR-3: pytest unavailable in the env; replaced by py_compile + a functional smoke test (old-schema JSON parses; the removed attrs are verifiably absent).
+  - INFO-1: pre-existing gate failures (duplication 60 pairs, wiring 202 orphan APIs, bwrap sandbox 4 fails, apps/theo-desktop unwrap/panic) confirmed to have zero evolution-related matches; out of scope.
+- **Phase 7 dogfood:** PASS. Health score 78/100. Report: `.claude/knowledge-base/reviews/dogfood/dogfood-evolution-loop-removal-2026-05-14.md`.
+  - **OAuth Codex auth:** completed via `theo login --provider chatgpt-codex` (user authorized device code `1GG6-7DMCY`).
+  - **Real-LLM E2E single-shot:** `theo agent --headless`, model `gpt-5.4`, 7709 input + 60 output tokens, `success: true`, `convergence_rate: 1.0`, `error_class: "solved"`, duration 2.9s.
+  - **Real-LLM E2E pilot:** `theo pilot --max-iterations 2`, 3 loops, exit `CircuitBreakerOpen` (correct behavior — no-edit task). 33.2k total tokens.
+  - **Negative runtime-metric proof:** all 4 produced trajectories (1 agent + 3 pilot) have **zero** `evolution_attempts|evolution_success` matches. `loop_metrics` schema now exactly: `{budget_utilization, convergence_rate, done_blocked_count, phase_distribution, total_iterations}` — empirical confirmation that the schema deletion shipped.
+- **Final state — plan DoD complete:**
+  - ✅ All 7 phases executed
+  - ✅ All Acceptance Criteria + Global DoD items checked
+  - ✅ Workspace tests 3,562 (baseline 3,567 − 5)
+  - ✅ Clippy clean on affected crates (theo-domain, theo-agent-runtime, theo-application)
+  - ✅ `make check-arch`, `check-changelog`, `check-sizes`, `check-deprecated` all green
+  - ✅ Frontend `npm run build` exit 0; `npm test` 44 tests pass
+  - ✅ Python harness syntax clean; functional smoke test confirms backward-compat parse
+  - ✅ Dogfood PASS, health 78/100, shippable
+- **Net code delta:** ~487 LOC Rust deleted + ~22 LOC TS/Python deleted + 1 test removed + 2 dead JSONL schema fields removed.
+
+## 2026-05-14T19:10 — MINOR-3 RESOLVED: pytest formal validation completed
+
+- Bootstrapped pytest 9.0.3 in the env via `python3 get-pip.py --user --break-system-packages` + `pip install --user --break-system-packages pytest`.
+- **Formal delta proven ZERO via `git stash`:**
+  - Pre-deletion: 16 failed / 257 passed (test_thresholds.py FileNotFoundError fixtures, env-dependent)
+  - Post-deletion: 16 failed / 257 passed (IDENTICAL)
+- **`test_headless.py` 44/44 PASS post-deletion** — this is the file modified by T5.6.
+- Zero `evolution`-related failures in either run (grep on output).
+- **DoD #4 now formally met.** Cross-validation MINOR-3 marked RESOLVED.
+
+## 2026-05-14T19:30 — Wiring gate delta hunted to ZERO
+
+- **Detected:** `make check-wiring` orphan count was 201 pre-deletion → 202 post-deletion. Investigated per user feedback memory ("INCONCLUSIVE means investigate, not defer").
+- **Root cause:** name-collision artifact. The wiring gate uses name-based grep (`pub fn record_attempt`). Pre-deletion, both `EvolutionLoop::record_attempt` and `SessionState::record_attempt` (orphan in disguise) collided on the same caller mention in `pilot/run_loop.rs:104`, hiding the SessionState orphan. Post-deletion of EvolutionLoop, only `SessionState::record_attempt` remained — zero callers — correctly flagged as orphan.
+- **Fix:** deleted `SessionState::record_attempt` (14 LOC) per the project golden rule "DEAD CODE DEVE SER DELETADO" ("dead code must be deleted") + `wiring-allowlist.txt` protocol option 3. Not a scope expansion: a direct consequence of the evolution deletion exposing the artifact.
+- **Verified:** `cargo build -p theo-agent-runtime` clean; wiring orphan count 201 → 201 (DELTA ZERO restored).
+- **All 4 pre-existing-failure gates have DELTA ZERO post-deletion:**
+  - check-unwrap: 5 → 5
+  - check-panic: 15 → 15
+  - check-duplication: 65 → 65
+  - check-wiring: 201 → 201
+- **Per-failure scope:** none of these gates flags an `evolution`-related item; all are pre-existing baseline issues outside this plan's scope.
+
 ---
 
 ## 2026-05-14T18:30 — theo-home-unification implemented (phases 0-6)