P3: production-hardening — V&V enforce-or-demote, resilience/perf gate, secret masking, e2e de-simulation #877

@kcenon

What

P3 depth bundle: the production-hardening work that depends on the keep-or-kill decision and the unified foundation. This issue tracks four sub-streams; split any that grows large into its own issue.

Why

These close the remaining gap between "strong skeleton" and "operable in production": enforced (or honestly retired) V&V, real runtime resilience, leak-proof secret masking, and tests on the riskiest paths. Each gets blocking-CI acceptance (the draft plan had none).

Sub-streams

1. V&V enforce-or-demote (depends on keep-or-kill decision)

StageVerifier/RtmBuilder/compliance writers are never invoked by the orchestrator and haltOnVerificationFailure has no consumer. Per the decision: either wire V&V into the run loop as an enforced gate (thread haltOnVerificationFailure into stage execution) or demote/extract the TS stack. Acceptance: if enforced, an e2e test shows a failing verification halts the pipeline; if demoted, the unused surface is extracted/documented and the dangling config is removed.
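For the "enforce" branch, the gate could look roughly like the sketch below. Only `haltOnVerificationFailure` and the idea of a stage verifier come from this issue; the `Stage`, `Verifier`, `PipelineConfig`, `VerificationHaltError`, and `runPipeline` names and shapes are hypothetical stand-ins, and the real StageVerifier API may differ.

```typescript
// Hypothetical shapes for illustration only; not the project's actual types.
interface VerificationResult {
  passed: boolean;
  reason?: string;
}

interface Stage {
  name: string;
  execute(): Promise<void>;
}

interface Verifier {
  verify(stageName: string): Promise<VerificationResult>;
}

interface PipelineConfig {
  haltOnVerificationFailure: boolean;
}

class VerificationHaltError extends Error {
  constructor(stageName: string, reason: string) {
    super(`Verification failed at stage "${stageName}": ${reason}`);
    this.name = "VerificationHaltError";
  }
}

// Runs stages in order and verifies each one after it executes.
// When haltOnVerificationFailure is set, a failed verification throws
// and no later stage runs; otherwise the failure is tolerated.
async function runPipeline(
  stages: Stage[],
  verifier: Verifier,
  config: PipelineConfig,
): Promise<string[]> {
  const executed: string[] = [];
  for (const stage of stages) {
    await stage.execute();
    const result = await verifier.verify(stage.name);
    if (!result.passed && config.haltOnVerificationFailure) {
      throw new VerificationHaltError(stage.name, result.reason ?? "unknown");
    }
    executed.push(stage.name);
  }
  return executed;
}
```

An e2e acceptance test along these lines would feed a deliberately failing verifier and assert that the stages after the failing one never execute.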

2. Resilience + cost + blocking perf gate

Honor maxParallelAgents; thread AbortController into the timeout path so a StageTimeoutError cancels in-flight SDK work; stop rebuilding the adapter per stage. Fix performance.yml: remove continue-on-error: true from BOTH the benchmark step and the regression-check step (currently the gate cannot fail even after the key mismatch is fixed) and fix the per-results key/path mismatch. Acceptance: tests assert maxParallelAgents is honored and AbortController aborts in-flight work on timeout; an intentional perf regression FAILS CI.
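The two runtime pieces above can be sketched as follows. This is a minimal illustration under assumptions, not the orchestrator's code: `runStageWithTimeout` and `runWithLimit` are hypothetical helper names, the `StageTimeoutError` class here is a stand-in for the project's own, and the SDK call is assumed to accept an `AbortSignal`.

```typescript
// Stand-in definition; the project's StageTimeoutError may carry more fields.
class StageTimeoutError extends Error {
  constructor(stage: string, ms: number) {
    super(`Stage "${stage}" timed out after ${ms} ms`);
    this.name = "StageTimeoutError";
  }
}

// Runs `work` with an AbortSignal that fires when the timeout elapses,
// so in-flight work is cancelled instead of merely abandoned.
async function runStageWithTimeout<T>(
  stage: string,
  timeoutMs: number,
  work: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      // Reject first so the caller sees StageTimeoutError rather than
      // whatever error the aborted work rejects with.
      reject(new StageTimeoutError(stage, timeoutMs));
      controller.abort();
    }, timeoutMs);
  });
  try {
    return await Promise.race([work(controller.signal), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Runs tasks with at most `limit` in flight at once (e.g. maxParallelAgents).
// Results keep the order of the input tasks.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

The acceptance tests could then assert two observable facts: the signal passed to the work function is aborted when the timeout fires, and the peak number of concurrently running tasks never exceeds the configured limit.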

3. Secret masking (two sites)

Two sites mask secrets: CommandSanitizer.maskSensitiveArg (CommandSanitizer.ts:829, brittle fixed-length regex) and SecretManager.mask (SecretManager.ts:163, value-based). Add the github_pat_, gho_, ghs_, ghr_, and ghp_ prefixes and match tokens of any length; decide whether SecretManager.mask needs token-format masking in addition to known-value masking. Acceptance: a fine-grained/classic GitHub token in a command arg is redacted regardless of length; optionally flip security.yml Gitleaks to blocking on a planted secret.
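A prefix-based, length-agnostic pattern might look like the sketch below. It is not the existing `maskSensitiveArg` implementation: `maskGithubTokens` and `GITHUB_TOKEN_PATTERN` are hypothetical names, the prefix list is taken from this issue, and the token body character set (letters, digits, underscore) and the choice to keep the prefix in the output are assumptions.

```typescript
// Prefixes listed in this issue. The body is matched as one or more
// [A-Za-z0-9_] characters, so no fixed length is assumed.
const GITHUB_TOKEN_PATTERN = /\b(github_pat_|gh[opsr]_)[A-Za-z0-9_]+/g;

// Replaces the token body with "***" and keeps the prefix, so logs still
// show which kind of token was present without leaking its value.
function maskGithubTokens(arg: string): string {
  return arg.replace(
    GITHUB_TOKEN_PATTERN,
    (_match: string, prefix: string) => `${prefix}***`,
  );
}
```

For example, `maskGithubTokens("--token=ghp_abc123")` yields `--token=ghp_***`, and an argument containing no token is returned unchanged.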

4. Test-gap closure + e2e de-simulation (depends on keep-or-kill execution)

Add tests for subsystems the decision resolved to keep/wire (monitoring/controller). Replace the two e2e tests that assert on test-local simulations with real pipeline-execution assertions (extend orchestration.e2e.test.ts / pipeline.e2e.test.ts). Acceptance: kept subsystems have e2e reachability tests; the two simulation-asserting e2e tests now exercise the real pipeline path.

Part of #866.

Labels: performance, security, testing, vnv
