What
P3 depth bundle — the production-hardening work that depends on the keep-or-kill decision and the unified foundation. Tracks four sub-streams; split into separate issues if any grows large.
Why
These close the remaining gap between "strong skeleton" and "operable in production": enforced (or honestly retired) V&V, real runtime resilience, leak-proof secret masking, and tests on the riskiest paths. Each gets blocking-CI acceptance (the draft plan had none).
Sub-streams
1. V&V enforce-or-demote (depends on keep-or-kill decision)
StageVerifier/RtmBuilder/compliance writers are never invoked by the orchestrator and haltOnVerificationFailure has no consumer. Per the decision: either wire V&V into the run loop as an enforced gate (thread haltOnVerificationFailure into stage execution) or demote/extract the TS stack. Acceptance: if enforced, an e2e test shows a failing verification halts the pipeline; if demoted, the unused surface is extracted/documented and the dangling config is removed.
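To make the "enforced" branch concrete, here is a minimal sketch of a run loop that consults a verifier after each stage and honors haltOnVerificationFailure. Everything except the flag name is hypothetical: the real StageVerifier API, stage shape, and config type are not shown in this issue, and the loop is synchronous only to keep the sketch short.

```typescript
// Hypothetical shapes for illustration; not the project's actual types.
interface VerificationResult {
  passed: boolean;
  failures: string[];
}
interface Verifier {
  verify(stageName: string, output: unknown): VerificationResult;
}
interface Stage {
  name: string;
  run(): unknown;
}
interface PipelineConfig {
  haltOnVerificationFailure: boolean;
}

class VerificationHaltError extends Error {
  stageName: string;
  failures: string[];
  constructor(stageName: string, failures: string[]) {
    super(`Verification failed for stage "${stageName}": ${failures.join("; ")}`);
    this.name = "VerificationHaltError";
    this.stageName = stageName;
    this.failures = failures;
  }
}

// Runs stages in order and verifies each one's output.
// With the flag set, the first failed verification throws and later stages never run.
// With the flag unset, failures are ignored and the stage still counts as completed.
function runPipeline(stages: Stage[], verifier: Verifier, config: PipelineConfig): string[] {
  const completed: string[] = [];
  for (const stage of stages) {
    const output = stage.run();
    const result = verifier.verify(stage.name, output);
    if (!result.passed && config.haltOnVerificationFailure) {
      throw new VerificationHaltError(stage.name, result.failures);
    }
    completed.push(stage.name);
  }
  return completed;
}
```

An e2e acceptance test along these lines would assert two things: the error surfaces, and no stage after the failing one executes.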
2. Resilience + cost + blocking perf gate
Honor maxParallelAgents; thread AbortController into the timeout path so a StageTimeoutError cancels in-flight SDK work; stop rebuilding the adapter per stage. Fix performance.yml: remove continue-on-error: true from BOTH the benchmark step and the regression-check step (currently the gate cannot fail even after the key mismatch is fixed) and fix the per-results key/path mismatch. Acceptance: tests assert maxParallelAgents is honored and AbortController aborts in-flight work on timeout; an intentional perf regression FAILS CI.
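The two runtime pieces can be sketched as follows, assuming stage work accepts an AbortSignal. The helper names runWithTimeout and runLimited are hypothetical, and the StageTimeoutError class here is a stand-in for the project's own; only AbortController, setTimeout, and Promise are real platform APIs.

```typescript
// Stand-in for the project's StageTimeoutError (its real definition is not shown here).
class StageTimeoutError extends Error {
  constructor(stageName: string, timeoutMs: number) {
    super(`Stage "${stageName}" timed out after ${timeoutMs} ms`);
    this.name = "StageTimeoutError";
  }
}

// Runs `work` with a signal that is aborted when the timeout elapses,
// so in-flight calls that observe the signal can stop early.
function runWithTimeout<T>(
  stageName: string,
  timeoutMs: number,
  work: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      // Reject first so the caller sees the timeout error rather than
      // whatever error the aborted work produces.
      reject(new StageTimeoutError(stageName, timeoutMs));
      controller.abort();
    }, timeoutMs);
    work(controller.signal).then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Runs tasks with at most `maxParallel` in flight; results keep task order.
async function runLimited<T>(
  tasks: Array<() => Promise<T>>,
  maxParallel: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workerCount = Math.max(1, Math.min(maxParallel, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

The acceptance tests can follow the same shape: record whether the signal's abort event fired after a timeout, and track the peak number of concurrently active tasks against the configured maxParallelAgents.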
3. Secret masking (two sites)
Two sites need work: CommandSanitizer.maskSensitiveArg (CommandSanitizer.ts:829, brittle fixed-length regex) and SecretManager.mask (SecretManager.ts:163, value-based). Add the github_pat_, gho_, ghs_, ghr_, and ghp_ prefixes and match them as variable-length tokens rather than at a fixed length; decide whether SecretManager.mask needs token-format masking in addition to known-value masking. Acceptance: a fine-grained or classic GitHub token in a command arg is redacted regardless of length; optionally flip security.yml Gitleaks to blocking on a planted secret.
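A minimal sketch of the two masking styles, for comparison. The function names and the "***" placeholder are hypothetical and do not reflect the current CommandSanitizer or SecretManager code; the prefix list is exactly the one named above, and the pattern deliberately avoids any length constraint, which favors over-redaction.

```typescript
// Token-format masking: prefix plus one or more token characters, any length.
const GITHUB_TOKEN_PATTERN = /(github_pat_|gh[posr]_)[A-Za-z0-9_]+/g;

// Keeps the prefix (useful when debugging which token type leaked) and hides the rest.
function maskGithubTokens(arg: string): string {
  return arg.replace(GITHUB_TOKEN_PATTERN, "$1***");
}

// Known-value masking: replaces every occurrence of each registered secret.
// Longest secrets go first so a secret containing another one is masked whole.
function maskKnownValues(text: string, secrets: string[]): string {
  let masked = text;
  const ordered = [...secrets].sort((a, b) => b.length - a.length);
  for (const secret of ordered) {
    if (secret.length > 0) {
      masked = masked.split(secret).join("***");
    }
  }
  return masked;
}
```

Format-based masking catches tokens the process was never told about, while value-based masking catches secrets with no recognizable shape, which is the trade-off behind the open question about SecretManager.mask.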
4. Test-gap closure + e2e de-simulation (depends on keep-or-kill execution)
Add tests for subsystems the decision resolved to keep/wire (monitoring/controller). Replace the two e2e tests that assert on test-local simulations with real pipeline-execution assertions (extend orchestration.e2e.test.ts / pipeline.e2e.test.ts). Acceptance: kept subsystems have e2e reachability tests; the two simulation-asserting e2e tests now exercise the real pipeline path.
Part of #866.