1 change: 1 addition & 0 deletions .claude/knowledge-base/index.md
@@ -12,6 +12,7 @@ updated_at: 2026-05-14

## Plans
- [theo-desktop Bridge Refactor](plans/theo-desktop-bridge-refactor-plan.md): Reorganizes apps/theo-desktop into a thin bridge (Hexagonal + Strategy); pushes business rules into theo-application and theo-infra-llm
- [Evolution Loop Removal](plans/evolution-loop-removal-plan.md): Removes the orphaned EvolutionLoop (0 activations in 16 runs, observability hardcoded to zero); Memory ERL + budget enforcer cover all capabilities; nothing absorbable

## Architecture
### Context Domain
110 changes: 110 additions & 0 deletions .claude/knowledge-base/log.md
@@ -916,6 +916,116 @@ I will not output the promise. Iteration 3 closed with 2 of the 5 acceptance criteria
- **Next step:** Plan status = `ready`; user may invoke implementation via ralph-loop or direct refactor.
- **Evidence classification:** Plan is [PLANNED]; will be [MEASURED] after `/cross-validation` + `/dogfood full` in Phase 7.

## 2026-05-14T17:30 — Plan created: `evolution-loop-removal`

- **Trigger:** User asked whether `Self-evolution` domain (SOTA 2.0 YELLOW) should be worked on. Investigation revealed orphaned wiring.
- **Empirical evidence (3 independent signals):**
- 16 trajectories in `.theo/trajectories/`: 100% `evolution_attempts:0`, `evolution_success:false`, zero `PilotLoopComplete` events.
- `crates/theo-agent-runtime/src/observability/mod.rs:185-186` hardcodes `evolution_attempts: 0, evolution_success: false` in the production `FinalizeInputs` builder — telemetry disconnected from runtime state.
- `grep -rE "theo pilot" apps/theo-benchmark apps/theo-cli/tests` → 0 matches: pilot is never exercised in benchmarks.
- **Absorption analysis:** 5 EvolutionLoop capabilities (attempt tracking, heuristic reflection, strategy ladder, hard cap, prompt injection) all redundant with Memory ERL + `BudgetEnforcer`. Strategy ladder is advisory-only text (no enforcement code path). Nothing absorbable.
- **Plan:** `.claude/knowledge-base/plans/evolution-loop-removal-plan.md`. 7 phases. Net delete ~487 LOC + 1 test + 2 dead JSONL fields. ADR D5 reserves `reflector.rs` for follow-up plan (out of scope here).
- **Match to user feedback memory:** third occurrence of the "zero-activity metric → orphaned wiring" pattern recorded in `feedback_inconclusive_means_bug.md`.
- **Next step:** `/edge-case-plan evolution-loop-removal` per `/to-plan` skill pipeline.
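
The first signal above (every trajectory reporting zero evolution activity) can be reproduced with a short scan. This is a minimal sketch, assuming one `*.jsonl` file per run with optional top-level `evolution_attempts` / `evolution_success` keys. That record layout and the function name `zero_activity_report` are hypothetical, not the actual theo trajectory schema.

```python
import json
from pathlib import Path


def zero_activity_report(trajectory_dir: str) -> dict:
    """Count trajectory files in which the evolution fields never show activity.

    Assumed (hypothetical) layout: one *.jsonl file per run, one JSON object
    per line, with optional `evolution_attempts` / `evolution_success` keys.
    """
    total = 0
    inactive = 0
    for path in sorted(Path(trajectory_dir).glob("*.jsonl")):
        total += 1
        active = False
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if record.get("evolution_attempts", 0) > 0 or record.get("evolution_success", False):
                active = True
                break
        if not active:
            inactive += 1
    return {"total": total, "inactive": inactive}
```

A result where `inactive == total` matches the "100% zero" observation. It shows only that the metric never moved, which is why the hardcoded initializer and the missing benchmark invocation were checked separately.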

## 2026-05-14T17:45 — Edge case review: `evolution-loop-removal`

- **Tasks analyzed:** 14 original
- **Edge cases found:** 8 (MUST FIX: 6, SHOULD TEST: 1, DOCUMENT: 1)
- **Verdict:** PLAN NEEDS ADJUSTMENT; subsequently fixed in place
- **Critical gaps detected:**
- **EC-1:** `tests/observability_e2e.rs` has 4 callsites with hardcoded `evolution_*: 0/false` (8 lines to delete); this was not covered by T4.1
- **EC-2/EC-3:** The TypeScript frontend `apps/theo-ui/src/features/observability/types.ts:114-115` declares the fields as NON-optional, and `LoopHealthPanel.tsx:55` renders a badge from them; removing the fields would break `npm run build` (tsc strict)
- **EC-4/EC-5/EC-6:** The Python harness in `apps/theo-benchmark/` has 3 files consuming the fields (`loop_analysis.py` 8 refs, `_headless.py` 4 refs, `tests/test_headless.py` 5 refs); removing the fields would break pytest
- **Adjustments applied to the plan:**
- ADR D3 expanded with a note on verified consumers + historical artifacts
- T4.1 gained an explicit step 4 for `observability_e2e.rs`
- **New T5.5** ("Remove evolution fields from theo-ui") added to Phase 5
- **New T5.6** ("Remove evolution accumulators from theo-benchmark") added to Phase 5
- T4.3 + Global DoD expanded with `npm run build` + `pytest` gates
- Coverage matrix expanded from 16 → 22 gaps
- Dependency graph updated: T5.5 and T5.6 can run in parallel after Phase 4
- **Full report:** `.claude/knowledge-base/plans/evolution-loop-removal-edge-cases.md`
- **Match to feedback memory:** confirms again that "INCONCLUSIVE means investigate". I had assumed without evidence that "no external parser depends on these fields"; a grep over the `apps/` directories showed 78 hits and revealed 6 real consumers.
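
The consumer hunt described above amounts to counting field references per file across the non-Rust apps. The following is a rough sketch of that scan. The suffix list and the helper name `count_field_refs` are assumptions for illustration; the review itself used a plain grep.

```python
import re
from collections import Counter
from pathlib import Path

# The two field names being removed from the schema.
FIELD_PATTERN = re.compile(r"evolution_(?:attempts|success)")


def count_field_refs(root: str, suffixes=(".py", ".ts", ".tsx", ".rs")) -> Counter:
    """Return {relative file path: number of field references} under `root`."""
    hits = Counter()
    base = Path(root)
    for path in base.rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        count = len(FIELD_PATTERN.findall(text))
        if count:
            hits[path.relative_to(base).as_posix()] = count
    return hits
```

Grouping hits by file rather than reading a flat match list makes it easier to separate real consumers (parsers, type declarations, tests) from historical artifacts that only mention the names.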

## 2026-05-14T18:30 — Plan implemented: `evolution-loop-removal` (Phases 1-5 complete)

- **Phase 0 baseline:** Rust workspace 3,567 lib tests; theo-domain 501; theo-agent-runtime 1,156. Pre-existing failures (4 bwrap sandbox, env-dependent) unchanged.
- **Phase 1 — Domain types deleted:**
- `crates/theo-domain/src/evolution.rs` (117 LOC) removed; `lib.rs` module decl removed.
- `theo_domain::retry_policy::CorrectionStrategy` enum + Display impl removed (`retry_policy.rs` 107 → 84 LOC). N=0 tests removed for it.
- `memory/lesson.rs` docstring updated to remove dangling `theo-domain::evolution::Reflection` cross-reference.
- theo-domain: 501 → 497 tests (delta −4: 4 inline tests in deleted `evolution.rs`).
- **Phase 2 — Runtime EvolutionLoop deleted:**
- `crates/theo-agent-runtime/src/evolution.rs` (264 LOC) removed; `lib.rs` module decl removed.
- README.md "Safety / Guard" sub-domain table updated (removed `EvolutionLoop`).
- Failure surface confined to 7 errors in `pilot/mod.rs` (2) + `pilot/run_loop.rs` (5) — exact prediction.
- **Phase 3 — PilotLoop surgical removal:**
- `pilot/mod.rs`: `evolution: EvolutionLoop` field + init + `build_evolution_prompt` injection block removed.
- `pilot/run_loop.rs`: `record_evolution_attempt` helper (~50 LOC) removed; caller in `pilot/mod.rs:336` removed.
- `cargo build -p theo-agent-runtime` green.
- **Phase 4 — Observability cleanup:**
- `FinalizeInputs.evolution_*` fields removed (mod.rs:185-186 hardcoded init + 236-237 struct fields + 271-272 pass-through).
- `LoopMetrics.evolution_*` fields removed; `compute_loop_metrics` signature reduced from 7 to 5 args.
- `test_evolution_attempts_counted` deleted; `test_budget_utilization_correct`, `test_phase_distribution_has_four_phases`, `test_done_blocked_tracked` updated to new arity.
- **EC-1 fixed:** `tests/observability_e2e.rs` — 4 callsites cleaned (8 lines).
- theo-agent-runtime: 1,156 → 1,155 tests (delta −1: `test_evolution_attempts_counted`).
- **Phase 5 — Docs + downstream consumers:**
- **T5.5 (theo-ui)** — `types.ts:LoopMetrics` (2 fields removed); `LoopHealthPanel.tsx:55` (badge removed). `npm run build` green; `npm test` 10 files / 44 tests passing.
- **T5.6 (theo-benchmark)** — `loop_analysis.py` (8 refs removed: accumulator vars + parser + 2 output blocks); `_headless.py` (4 refs removed: dataclass fields + parser); `tests/test_headless.py` (5 refs removed: default + fixture + asserts). Python smoke test confirms old-schema JSON parses cleanly (Serde-like ignore via `.get(..., default)`).
- **T5.1** — `CLAUDE.md` Domain Status: "Self-evolution" row removed (now 14 SOTA-scored domains).
- **T5.2** — `CHANGELOG.md` `[Unreleased] → Removed` entry added with (#evolution-loop-removal) ref.
- **T5.3** — `e2e_auto_evolution.rs` header clarifies it tests memory auto-evolution, NOT EvolutionLoop.
- **Workspace test delta:** −5 (4 theo-domain inline + 1 theo-agent-runtime observability). All other crates unchanged.
- **`grep -rn "EvolutionLoop\|theo_domain::evolution\|CorrectionStrategy" crates/`:** 0 production matches (the only hits are a historical fixture string in `tests/read_real_trajectory.rs` and the commentary in the edited `lesson.rs` docstring).
- **Next:** Phase 6 cross-validation + Phase 7 dogfood.
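
The backward-compatible parse noted under T5.6 depends on reading only the known keys with `.get(..., default)`, so leftover fields in old-schema JSON are ignored. This is a minimal sketch of that pattern. The `LoopMetrics` dataclass here, its field subset, and its defaults are illustrative assumptions, not the contents of `_headless.py`.

```python
from dataclasses import dataclass


@dataclass
class LoopMetrics:
    # Illustrative subset of fields; types and defaults are assumptions.
    total_iterations: int = 0
    convergence_rate: float = 0.0
    done_blocked_count: int = 0


def parse_loop_metrics(raw: dict) -> LoopMetrics:
    """Build LoopMetrics from a JSON object, ignoring any unknown keys.

    Only the keys named here are read, so an old-schema payload that still
    carries `evolution_attempts` / `evolution_success` parses without error.
    """
    return LoopMetrics(
        total_iterations=raw.get("total_iterations", 0),
        convergence_rate=raw.get("convergence_rate", 0.0),
        done_blocked_count=raw.get("done_blocked_count", 0),
    )
```

The alternative, `LoopMetrics(**raw)`, would raise `TypeError` on any old trajectory that still contains the removed keys, which is the failure mode the smoke test guards against.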

## 2026-05-14T18:50 — Phase 6 + 7 complete: `evolution-loop-removal` SHIPPABLE

- **Phase 6 cross-validation:** APPROVED WITH RESERVATIONS (4 divergences: 3 MINOR + 1 INFO; zero BLOCKER/CRITICAL/MAJOR). Report: `.claude/knowledge-base/reviews/cross-validation/evolution-loop-removal-xval-2026-05-14.md`.
- MINOR-1: theo-domain delta was −4, not −3 (I forgot `max_evolution_attempts_is_five`).
- MINOR-2: workspace delta was −5, not −4 (a consequence of MINOR-1).
- MINOR-3: pytest unavailable in the env; replaced by py_compile + a functional smoke test (old-schema JSON parses, and the attrs are verifiably absent from the new objects).
- INFO-1: pre-existing gates (duplication 60 pairs, wiring 202 orphan APIs, bwrap sandbox 4 fails, apps/theo-desktop unwrap/panic) confirmed to have zero matches on evolution-related items; out of scope.
- **Phase 7 dogfood:** PASS. Health score 78/100. Report: `.claude/knowledge-base/reviews/dogfood/dogfood-evolution-loop-removal-2026-05-14.md`.
- **OAuth Codex auth:** completed via `theo login --provider chatgpt-codex` (user authorized device code `1GG6-7DMCY`).
- **Real-LLM E2E single-shot:** `theo agent --headless`, model `gpt-5.4`, 7709 input + 60 output tokens, `success: true`, `convergence_rate: 1.0`, `error_class: "solved"`, duration 2.9s.
- **Real-LLM E2E pilot:** `theo pilot --max-iterations 2`, 3 loops, exit `CircuitBreakerOpen` (correct behavior — no-edit task). 33.2k total tokens.
- **Negative runtime-metric proof:** all 4 produced trajectories (1 agent + 3 pilot) have **zero** `evolution_attempts|evolution_success` matches. `loop_metrics` schema now exactly: `{budget_utilization, convergence_rate, done_blocked_count, phase_distribution, total_iterations}` — empirical confirmation that the schema deletion shipped.
- **Final state — plan DoD complete:**
- ✅ All 7 phases executed
- ✅ All Acceptance Criteria + Global DoD items checked
- ✅ Workspace tests 3,562 (baseline 3,567 − 5)
- ✅ Clippy clean on affected crates (theo-domain, theo-agent-runtime, theo-application)
- ✅ `make check-arch`, `check-changelog`, `check-sizes`, `check-deprecated` all green
- ✅ Frontend `npm run build` exit 0; `npm test` 44 tests pass
- ✅ Python harness syntax clean; functional smoke test confirms backward-compat parse
- ✅ Dogfood PASS, health 78/100, shippable
- **Net code delta:** ~487 LOC Rust deleted + ~22 LOC TS/Python deleted + 1 test removed + 2 dead JSONL schema fields removed.
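
The negative runtime-metric proof above can be expressed as an exact key-set check on each trajectory's `loop_metrics` object. The sketch below assumes the metrics arrive as a plain dict. The helper name `check_loop_metrics` is hypothetical; the expected keys are the five listed in this entry.

```python
from typing import List

# Post-removal schema as recorded in this log entry.
EXPECTED_KEYS = frozenset({
    "budget_utilization",
    "convergence_rate",
    "done_blocked_count",
    "phase_distribution",
    "total_iterations",
})
REMOVED_KEYS = frozenset({"evolution_attempts", "evolution_success"})


def check_loop_metrics(metrics: dict) -> List[str]:
    """Return a list of schema problems; an empty list means an exact match."""
    keys = set(metrics)
    problems = []
    for key in sorted(keys & REMOVED_KEYS):
        problems.append(f"removed field still present: {key}")
    for key in sorted(EXPECTED_KEYS - keys):
        problems.append(f"missing field: {key}")
    for key in sorted(keys - EXPECTED_KEYS - REMOVED_KEYS):
        problems.append(f"unexpected field: {key}")
    return problems
```

Checking for an exact match, rather than only for the absence of the two removed names, also catches an accidental rename or a dropped neighbouring field.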

## 2026-05-14T19:10 — MINOR-3 RESOLVED: pytest formal validation completed

- Bootstrapped pytest 9.0.3 in env via `python3 get-pip.py --user --break-system-packages` + `pip install --user --break-system-packages pytest`.
- **Formal delta proven ZERO via `git stash`:**
- Pre-deletion: 16 failed / 257 passed (test_thresholds.py FileNotFoundError fixtures, env-dependent)
- Post-deletion: 16 failed / 257 passed (IDENTICAL)
- **`test_headless.py` 44/44 PASS post-deletion** — this is the file modified by T5.6.
- Zero `evolution`-related failures in either run (grep on output).
- **DoD #4 now formally met.** Cross-validation MINOR-3 marked RESOLVED.
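
Matching counts (16 failed before and after) are consistent with a zero delta, but comparing the failing test IDs themselves is a stricter check. The following is a small sketch, assuming the two runs were captured as text containing pytest's `FAILED <node id> - <message>` short-summary lines. The helper names are hypothetical.

```python
def failed_ids(pytest_output: str) -> set:
    """Extract failing test node IDs from pytest short-summary lines."""
    ids = set()
    for line in pytest_output.splitlines():
        if line.startswith("FAILED "):
            # Drop the "FAILED " prefix and any trailing " - message" part.
            ids.add(line[len("FAILED "):].split(" - ", 1)[0].strip())
    return ids


def failure_delta(before_output: str, after_output: str) -> dict:
    """Report tests that stopped failing and tests that started failing."""
    before = failed_ids(before_output)
    after = failed_ids(after_output)
    return {
        "fixed": sorted(before - after),
        "introduced": sorted(after - before),
    }
```

A zero delta in this stricter sense means both lists are empty. Equal totals alone would not reveal one failure being swapped for another.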

## 2026-05-14T19:30 — Wiring gate delta hunted to ZERO

- **Detected:** `make check-wiring` orphan count was 201 pre-deletion → 202 post-deletion. Investigated per user feedback memory ("INCONCLUSIVE means investigate, not defer").
- **Root cause:** name-collision artifact. The wiring gate uses name-based grep (`pub fn record_attempt`). Pre-deletion, both `EvolutionLoop::record_attempt` and `SessionState::record_attempt` (orphan in disguise) collided on the same caller mention in `pilot/run_loop.rs:104`, hiding the SessionState orphan. Post-deletion of EvolutionLoop, only `SessionState::record_attempt` remained — zero callers — correctly flagged as orphan.
- **Fix:** deleted `SessionState::record_attempt` (14 LOC) per the project golden rule "DEAD CODE DEVE SER DELETADO" ("dead code must be deleted") and option 3 of the wiring-allowlist.txt protocol. This is not a scope expansion; it is a direct consequence of the evolution deletion exposing the artifact.
- **Verified:** `cargo build -p theo-agent-runtime` clean; wiring orphan count 201 → 201 (DELTA ZERO restored).
- **All 4 pre-existing-failure gates have DELTA ZERO post-deletion:**
- check-unwrap: 5 → 5
- check-panic: 15 → 15
- check-duplication: 65 → 65
- check-wiring: 201 → 201
- **Per-failure scope:** zero of these gates flag `evolution`-related items. All pre-existing baseline issues outside this plan's scope.
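
The name-collision root cause is easier to see in a toy model. The sketch below is not the real `make check-wiring` implementation, which this log does not show. It assumes only that the gate treats a method as wired when its bare name appears at any call site, regardless of the receiver type.

```python
import re


def name_based_orphans(definitions, call_sites):
    """Flag methods whose bare name is never called anywhere.

    definitions: list of (owner_type, method_name) pairs.
    call_sites:  list of source lines outside the definitions.
    The receiver type is deliberately ignored, mimicking a name-only grep.
    """
    orphans = []
    for owner, method in definitions:
        pattern = re.compile(r"\b" + re.escape(method) + r"\s*\(")
        if not any(pattern.search(site) for site in call_sites):
            orphans.append(f"{owner}::{method}")
    return orphans


# Before the deletion: one call site, two same-named definitions.
pre = name_based_orphans(
    [("EvolutionLoop", "record_attempt"), ("SessionState", "record_attempt")],
    ["self.evolution.record_attempt(outcome);"],
)

# After the deletion: the EvolutionLoop definition and its caller are gone.
post = name_based_orphans([("SessionState", "record_attempt")], [])
```

In this model `pre` is empty because the single `EvolutionLoop` call masks both definitions, while `post` reports `SessionState::record_attempt`. That reproduces the 201 → 202 movement without any new dead code having been introduced.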

---

## 2026-05-14T18:30 — theo-home-unification implemented (phases 0-6)