make coverage deterministic and add hook branch tests (fixes flaky main)#41
Merged
Conversation
The coverage job ran red on main: under coverage, daemon shutdown is slow enough that terminate's 2s wait times out, kill runs, and the unreaped Popen later trips a ResourceWarning in Popen.__del__ that filterwarnings=error escalates into a failure. The plain test matrix passed because GC timing never surfaced it. Route every daemon teardown through a _stop_daemon helper that waits after kill (and bumps the terminate timeout to 10s).
The coverage job measured the hooks and daemon via subprocess capture, which was non-deterministic: two identical full runs gave 60% and 67%, daemon swinging 0-81%, so the 65% gate could flakily fail. Switch to in-process measurement. A hook_runner fixture reloads _common + the hook, feeds a stdin payload, calls main(), and asserts behavior at each branch (gate decisions, warn/ask/block, missing exit code, verified vs unverified claim). Drop COVERAGE_PROCESS_START/combine and omit daemon.py (subprocess-only, behavior-tested by test_daemon.py). Coverage is now a stable 69%; gate raised to 67. The action hooks go from ~15-33% to 73-93%. #39 tracks the climb to the Silver 80% target.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the coverage CI job two ways - unblocks
main(the flaky daemon failure) and makes the coverage measurement deterministic - and adds sophisticated in-process branch tests for the action hooks.What changed
Unblock main (
tests/test_daemon.py)kill()ran but thePopenwas never reaped, tripping aResourceWarninginPopen.__del__thatfilterwarnings=errorescalated. All teardown now goes through a_stop_daemonhelper that waits after kill.Deterministic coverage (
pyproject.toml,.github/workflows/ci.yml)COVERAGE_PROCESS_START+ a.pth+ combine), which was non-deterministic: two identical runs gave 60% and 67%, withdaemon.pyswinging 0-81%. A 65% gate on that is a latent flaky failure.daemon.pyis omitted - it only runs as a spawned subprocess, so in-process coverage can't see it; it stays behavior-tested bytests/test_daemon.py. Dropping the subprocess instrumentation is also what removes the daemon-shutdown slowdown, so it is the primary fix for the red main.Sophisticated branch tests (not happy-path)
hook_runnerfixture (tests/conftest.py): reloads_common+ the hook, feeds a stdin payload, callsmain(), asserts the outcome.tests/test_hook_post_tool_edit.py,tests/test_hook_pre_tool_bash.py,tests/test_hook_stop.py; extendedtests/test_post_tool_bash.py. Each asserts behavior at a branch: gatewarn/ask/blockdecisions, missing/non-zero exit code, verified vs unverified success claim, push/PR claim recording, transcript parsing.Coverage
Now a stable 69% (deterministic across runs), gate raised 65 -> 67. Action hooks:
hook_pre_tool_bash93%,hook_post_tool_edit89%,hook_stop79%,hook_post_tool_bash73% (were ~15-33%). The earlier 65% gate was measuring subprocess-capture noise; the new number is lower in spirit but real and stable.The climb to the Silver
test_statement_coverage80target is tracked in #39 (roadmap updated).No runtime behavior change, no version bump (tests + CI + docs only).