feat(test-maturity): no-mock OAuth Codex audit per domain (plan + imp… by usetheodev · Pull Request #24 · usetheodev/theo-code

usetheodev · 2026-05-15T12:52:16Z

…l + reviews)

Implements the test-maturity-audit-no-mock plan v1.1: a reproducible audit of test maturity across 20 domains (CLAUDE.md "Domain Status") that scores 6 dimensions (pyramid, failure_coverage, determinism, invariants, realism×2, boundary) and combines them into a LOC-weighted global score. Validation runs real OAuth Codex E2E scenarios; the audit code contains zero mocks (ADR D2, enforced by preflight-no-mock.sh).
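To make the scoring description concrete, here is a minimal sketch of a LOC-weighted global score. It is not the code shipped in `scripts/test-maturity/`. It assumes that "realism×2" means the realism dimension carries double weight, that each dimension is scored in [0, 1], and that a domain's score is the weighted mean of its dimensions. The names `DomainResult`, `domain_score`, and `global_score` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: "realism×2" is read as realism counting double; all other
# dimensions count once. The real audit may weight differently.
DIMENSION_WEIGHTS = {
    "pyramid": 1.0,
    "failure_coverage": 1.0,
    "determinism": 1.0,
    "invariants": 1.0,
    "realism": 2.0,
    "boundary": 1.0,
}


@dataclass
class DomainResult:
    """Hypothetical per-domain record: name, lines of code, dimension scores."""
    name: str
    loc: int
    scores: dict  # dimension name -> score in [0, 1] (assumed scale)


def domain_score(result: DomainResult) -> float:
    """Weighted mean of the six dimension scores for one domain."""
    total_weight = sum(DIMENSION_WEIGHTS.values())
    weighted = sum(
        weight * result.scores[dim] for dim, weight in DIMENSION_WEIGHTS.items()
    )
    return weighted / total_weight


def global_score(results: list) -> float:
    """Mean of domain scores, weighted by each domain's lines of code."""
    total_loc = sum(r.loc for r in results)
    if total_loc == 0:
        raise ValueError("no lines of code to weight by")
    return sum(r.loc * domain_score(r) for r in results) / total_loc
```

Under these assumptions a large domain moves the global score more than a small one, which is the point of weighting by LOC rather than averaging the 20 domains equally.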

Entry points (opt-in, costs USD per run, NOT in `make audit`):

- `make check-test-maturity`: full audit with OAuth Codex E2E (~$0.30, ~1min)
- `make check-test-maturity-skip-e2e`: structural-only (no cost)
- `make check-test-maturity-report`: print latest report

Plan: 13 tasks across 5 phases. 8 ADRs (D1-D8). 12 edge cases pinned by tests. Halt-on-anomaly via D8a (auth re-validation per scenario), D8b (trajectory mtime filter), D8c (RunCompleted gate — halt if cost unknown).
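Two of the halt-on-anomaly checks can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation. It assumes D8b means "only count trajectory files modified at or after the start of the current run" and D8c means "stop the audit when no completion event with a known cost is available". `AuditHalt`, `RunCompleted` as a dataclass, `fresh_trajectories`, `require_known_cost`, and the `*.json` file pattern are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


class AuditHalt(RuntimeError):
    """Hypothetical error raised to stop the audit on an anomaly."""


def fresh_trajectories(directory: Path, started_at: float,
                       pattern: str = "*.json") -> list:
    """D8b-style filter (assumed semantics): keep only files whose mtime is
    at or after the run start, so stale trajectories are ignored."""
    return sorted(
        path for path in directory.glob(pattern)
        if path.stat().st_mtime >= started_at
    )


@dataclass
class RunCompleted:
    """Hypothetical completion event; cost_usd is None when unreported."""
    cost_usd: Optional[float]


def require_known_cost(event: Optional[RunCompleted]) -> float:
    """D8c-style gate (assumed semantics): halt rather than guess when the
    scenario did not report a cost."""
    if event is None or event.cost_usd is None:
        raise AuditHalt("scenario finished without a known cost; halting")
    return event.cost_usd
```

Raising instead of defaulting the cost to zero keeps an opt-in, paid run from silently under-reporting its spend.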

Validation evidence (this session):

- 66/66 Python tests green
- 1 real OAuth Codex audit run: 9 scenarios, $0.286/$2.00, 1 PASS + 8 diagnostic FAILs (the 8 are genuine signals: the agent didn't dispatch the expected tools for short prompts)
- Cross-validation: APPROVED WITH CAVEATS (13/13 tasks, 0 BLOCKER/CRITICAL)
- Dogfood QA: SHIPPABLE WITH CAVEATS, health 85/100, 0 CRITICAL attributable (4 pre-existing bwrap host-kernel failures unattributable)

Scope: Rust workspace only. apps/theo-ui (TS) and apps/theo-benchmark (Python) have their own suites and are out of scope (declared in report methodology).

Files (38, +4953 LOC):

- `scripts/test-maturity/` (28 files: 12 source Python/bash, 4 data yaml, 1 jinja2 template, 11 `test_*.py`)
- `.claude/knowledge-base/plans/test-maturity-audit-no-mock-plan.md` (v1.1)
- 4 reviews under `.claude/knowledge-base/reviews/` (test-maturity report, cross-validation, edge-case, dogfood)
- `Makefile` (+3 targets, NOT in `make audit`)
- `CLAUDE.md` (+1 line in Extended gates)
- `index.md` + `log.md` updated

usetheodev · 2026-05-15T12:52:31Z

@copilot resolve the merge conflicts in this pull request

Copilot · 2026-05-15T12:55:53Z

> @copilot resolve the merge conflicts in this pull request

Done in commit bf6c847: I merged develop into this branch and resolved all merge conflicts in CLAUDE.md, .claude/knowledge-base/index.md, and .claude/knowledge-base/log.md while keeping both sides’ updates.

Copilot started work on behalf of usetheodev May 15, 2026 12:52

Merge develop and resolve PR conflicts

bf6c847

Co-authored-by: usetheodev <241186089+usetheodev@users.noreply.github.com>

Copilot finished work on behalf of usetheodev May 15, 2026 12:56

usetheodev merged commit ce42406 into develop May 15, 2026
4 of 8 checks passed

usetheodev merged 2 commits into develop from worktree-mellow-wiggling-karp

3 participants