Epic: Remediation — integration & consolidation pass to production-readiness #866

@kcenon

What

Tracking epic for the remediation plan produced by a deep analysis and an adversarial re-review of the ad-sdlc codebase. The system has a strong, code-verified core — a single SdkExecutionAdapter execution seam, defense-in-depth security (shell-less execFile + CommandSanitizer, path-traversal defense), a 228-code error taxonomy used across ~151 modules, and an honest test pyramid — but it is not production-ready as-is. The gap is concentrated, not diffuse.

Why

A real end-to-end run currently hits several integration-boundary failures, runtime observability is effectively absent, and the documentation describes a prior generation of the architecture (pre-v0.1-cutover). For a system whose premise is doc-to-code traceability, that drift is audit-relevant. This epic sequences the fixes so the project can credibly claim the production-grade, audit-traceable posture its docs imply.

Scope and phasing

  • P0 — boundary correctness (parallel, low blast radius): the defects a first real run will hit.
  • P1 — foundation + decisions: unify the divergent SSOT contracts, add enforcement gates, and make the keep-or-kill decision (decision only, no execution).
  • P2 — execution + security + dedupe: execute the keep-or-kill dispositions, harden SSRF, reconcile docs, collapse duplicated retry/circuit-breaker engines.
  • P3 — depth: V&V enforce-or-demote, resilience/cost, secret masking, test-gap closure.

Critical correction from the re-review (read before any deletion)

The four largest "orphaned" subsystems are NOT dead code — they are part of the published library API via src/index.ts (export * of control-plane / data-plane / agents / utilities into dist/index.d.ts):

  • monitoring/ (10,872 LOC) — public via utilities/index.ts:319 export * as Monitoring; the reason the optional @opentelemetry/* peerDeps exist.
  • controller/ (8,820 LOC) — public via control-plane/index.ts:64-91 + agents/index.ts:725; worker/ statically imports its types.
  • ControlPlane / DataPlane facades (1,672 LOC) — public via src/index.ts:19,24; zero production consumers but live named exports.
  • SQLite/Redis scratchpad backends — selectable via BackendFactory.create (case 'sqlite'/'redis'), public via scratchpad/index.ts:168-171, backed by optional peerDeps.
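A minimal sketch of why a string-selected backend counts as live even with no static caller: any configuration value can reach the `case` arm at runtime. Only the `BackendFactory.create` name and the `'sqlite'`/`'redis'` cases come from the text above. The `ScratchpadBackend` shape, the loader map, and the error messages are hypothetical and are not taken from the ad-sdlc source.

```typescript
// Hypothetical shapes for illustration; not copied from the ad-sdlc source.
interface ScratchpadBackend {
  readonly kind: string;
}

// Stand-in for resolving an optional peer dependency.
// Returns undefined when the dependency is not installed.
type OptionalLoader = () => object | undefined;

class BackendFactory {
  private readonly loaders: Record<string, OptionalLoader>;

  constructor(loaders: Record<string, OptionalLoader>) {
    this.loaders = loaders;
  }

  create(kind: string): ScratchpadBackend {
    switch (kind) {
      case "sqlite":
      case "redis": {
        // Reachable purely through configuration, so static call-graph
        // analysis alone would wrongly report these arms as dead.
        const loader = this.loaders[kind];
        const driver = loader ? loader() : undefined;
        if (driver === undefined) {
          throw new Error(`Backend "${kind}" requires its optional peer dependency`);
        }
        return { kind };
      }
      default:
        throw new Error(`Unknown backend "${kind}"`);
    }
  }
}

// Only the sqlite driver is "installed" in this example.
const factory = new BackendFactory({ sqlite: () => ({}) });
console.log(factory.create("sqlite").kind); // "sqlite"
```

Requesting `"redis"` from this factory throws, which mirrors how an optional peerDep behaves when it is absent.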

Deleting any of them is a SemVer-breaking change. The default disposition is therefore extract-to-optional-subpath, gated on the keep-or-kill decision child issue. The deletion gate is "public-export OR run-loop reachability", not run-loop reachability alone.
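The gate can be written as a single predicate. The sketch below is illustrative only: `SubsystemFacts`, `Disposition`, and `deletionGate` are hypothetical names, and the two boolean inputs are assumed to come from whatever export and reachability analysis the project uses.

```typescript
// Hypothetical model of one subsystem under review; field names are illustrative.
interface SubsystemFacts {
  name: string;
  publiclyExported: boolean; // reachable from src/index.ts, directly or via export *
  runLoopReachable: boolean; // reachable from the production run loop
}

type Disposition = "delete-candidate" | "needs-decision";

// A subsystem is a plain deletion candidate only when it is neither publicly
// exported nor reachable from the run loop. Either condition alone sends it
// to the keep-or-kill decision instead.
function deletionGate(s: SubsystemFacts): Disposition {
  const mustBeRetained = s.publiclyExported || s.runLoopReachable;
  return mustBeRetained ? "needs-decision" : "delete-candidate";
}

// The facades have zero production consumers but are live named exports.
const facades: SubsystemFacts = {
  name: "ControlPlane / DataPlane facades",
  publiclyExported: true,
  runLoopReachable: false,
};

console.log(deletionGate(facades)); // "needs-decision"
```

A gate based on run-loop reachability alone would have marked the facades for deletion, which is the mistake the re-review corrects.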

Children

P0 — boundary correctness (parallel, low blast radius):

P1 — foundation + decision:

P2 — security + dedupe (execution issues filed after #874):

P3 — depth:

Note: WS2 keep-or-kill execution issues (extract-to-subpath / keep-wire / delete per subsystem) are intentionally deferred until #874 records the per-subsystem disposition and the SemVer decision.

Context

Acceptance
