make coverage deterministic and add hook branch tests (fixes flaky main)#41

Merged: pzverkov merged 3 commits into main from fix/daemon-test-reap-under-coverage on Jun 3, 2026
pzverkov (Member) commented Jun 3, 2026

Summary

Fixes the coverage CI job in two ways: it unblocks main (the flaky daemon failure) and makes the coverage measurement deterministic. It also adds sophisticated in-process branch tests for the action hooks.

What changed

Unblock main (tests/test_daemon.py)

  • Under coverage, the daemon shut down more slowly than the tests' 2s terminate wait, so kill() ran but the Popen was never reaped. The unreaped Popen then tripped a ResourceWarning in Popen.__del__, which filterwarnings=error escalated into a failure. All teardown now goes through a _stop_daemon helper that waits after kill (and raises the terminate timeout to 10s).

Deterministic coverage (pyproject.toml, .github/workflows/ci.yml)

  • The job measured the hooks and daemon via subprocess capture (COVERAGE_PROCESS_START + a .pth + combine), which was non-deterministic: two identical runs gave 60% and 67%, with daemon.py swinging 0-81%. A 65% gate on that is a latent flaky failure.
  • Switched to in-process measurement (no subprocess machinery). daemon.py is omitted: it only runs as a spawned subprocess, so in-process coverage can't see it, and it stays behavior-tested by tests/test_daemon.py. Dropping the subprocess instrumentation is also what removes the daemon-shutdown slowdown, so it is the primary fix for the red main.

Sophisticated branch tests (not happy-path)

  • New hook_runner fixture (tests/conftest.py): reloads _common + the hook, feeds a stdin payload, calls main(), asserts the outcome.
  • New tests/test_hook_post_tool_edit.py, tests/test_hook_pre_tool_bash.py, tests/test_hook_stop.py; extended tests/test_post_tool_bash.py. Each asserts behavior at a branch: gate warn/ask/block decisions, missing/non-zero exit code, verified vs unverified success claim, push/PR claim recording, transcript parsing.
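
To illustrate the in-process approach, here is a rough sketch of calling a hook's main() with a JSON payload on stdin and inspecting the outcome. It is not the actual hook_runner fixture: run_hook, toy_main, the "command" payload field, and the "allow" decision value are hypothetical stand-ins, and the module reload step (reloading _common and the hook before each call) is left out because the real modules are not available here.

```python
import contextlib
import io
import json
import sys


def run_hook(main, payload):
    """Call main() in-process with `payload` as JSON on stdin.

    Returns (exit_code, stdout_text). A SystemExit raised by the hook is
    turned into an exit code instead of propagating to the test.
    """
    saved_stdin = sys.stdin
    sys.stdin = io.StringIO(json.dumps(payload))
    captured = io.StringIO()
    code = 0
    try:
        with contextlib.redirect_stdout(captured):
            try:
                main()
            except SystemExit as exc:
                if isinstance(exc.code, int):
                    code = exc.code
                elif exc.code is not None:
                    code = 1
    finally:
        sys.stdin = saved_stdin  # always restore the real stdin
    return code, captured.getvalue()


def toy_main():
    """Hypothetical hook: blocks a dangerous command, allows the rest."""
    event = json.load(sys.stdin)
    if "rm -rf" in event.get("command", ""):
        print(json.dumps({"decision": "block"}))
        sys.exit(2)
    print(json.dumps({"decision": "allow"}))


if __name__ == "__main__":
    print(run_hook(toy_main, {"command": "rm -rf /"}))
    print(run_hook(toy_main, {"command": "ls"}))
```

Because the hook runs in the test process, ordinary in-process coverage sees every branch it takes, with no COVERAGE_PROCESS_START or combine step involved.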

Coverage

Coverage is now a stable 69% (deterministic across runs), and the gate is raised from 65% to 67%. Action hooks: hook_pre_tool_bash 93%, hook_post_tool_edit 89%, hook_stop 79%, hook_post_tool_bash 73% (previously ~15-33%). The earlier 65% gate was measuring subprocess-capture noise; the new number is lower in spirit but real and stable.

The climb to the Silver test_statement_coverage80 target is tracked in #39 (roadmap updated).

No runtime behavior change, no version bump (tests + CI + docs only).

pzverkov added 2 commits June 3, 2026 09:22
The coverage job ran red on main: under coverage, daemon shutdown is slow enough that terminate's 2s wait times out, kill runs, and the unreaped Popen later trips a ResourceWarning in Popen.__del__ that filterwarnings=error escalates into a failure. The plain test matrix passed because GC timing never surfaced it. Route every daemon teardown through a _stop_daemon helper that waits after kill (and bumps the terminate timeout to 10s).
The coverage job measured the hooks and daemon via subprocess capture, which was non-deterministic: two identical full runs gave 60% and 67%, daemon swinging 0-81%, so the 65% gate could flakily fail. Switch to in-process measurement. A hook_runner fixture reloads _common + the hook, feeds a stdin payload, calls main(), and asserts behavior at each branch (gate decisions, warn/ask/block, missing exit code, verified vs unverified claim). Drop COVERAGE_PROCESS_START/combine and omit daemon.py (subprocess-only, behavior-tested by test_daemon.py). Coverage is now a stable 69%; gate raised to 67. The action hooks go from ~15-33% to 73-93%. #39 tracks the climb to the Silver 80% target.
pzverkov merged commit 9dff41b into main on Jun 3, 2026
19 checks passed
pzverkov deleted the fix/daemon-test-reap-under-coverage branch on June 3, 2026 08:32