Skip to content

feat(runtime): controlled test/build/run command executor (#1387)#1546

Merged
oscharko merged 1 commit into
feat/keiko-agent-native-editor-foundation-and-runtimefrom
claude/issue-1387-command-executor
Jun 26, 2026
Merged

feat(runtime): controlled test/build/run command executor (#1387)#1546
oscharko merged 1 commit into
feat/keiko-agent-native-editor-foundation-and-runtimefrom
claude/issue-1387-command-executor

Conversation

@oscharko

Copy link
Copy Markdown
Contributor

Summary

Implements the governed test/build/run command executor for Issue #1387. Keiko can run a closed catalog of workspace tasks (discovered from package.json scripts) through governed local processes with cancellation, output caps, and audit metadata. A run names a discovered task id — never free-form argv — and executes through the single governed spawn boundary (keiko-tools runCommand, ADR-0043 D2), inheriting no-shell execution, name-allowlisted env, ephemeral HOME, workspace-contained cwd, byte-capped + redacted output, timeout (SIGTERM→SIGKILL), and AbortController cancellation.

Refs #1387

Scope

  • In scope: command policy contract; BFF command runner + catalog/result/event routes; UI command surface; tests and security notes.
  • Out of scope: arbitrary shell terminal; background daemon management; remote execution; container execution (separate child issue).

Reuse And No-Duplication

  • Existing Keiko functionality was inspected before implementation.
  • This PR reuses, extends, generalizes, or consolidates existing functionality where practical.
  • Any new implementation is limited to a documented capability gap in the linked issue.
  • This PR does not introduce a parallel execution/policy/evidence/UI subsystem where an existing subsystem can be extended.
  • Refactoring or consolidation was considered when existing functionality was close but not shaped for this change.

Reuse / gap rationale: the runner reuses runCommand (the only spawn boundary, ADR-0043 D2), DEFAULT_SANDBOX_POLICY, readWorkspaceFile + path containment (keiko-workspace), EvidenceStore (keiko-evidence), deepRedactStrings, and mirrors the ADR-0018 TerminalExecutionManager/routes/evidence pattern. The genuine gap — absent today — is a task-oriented layer: package-script discovery, a separate narrow executor allowlist (COMMAND_TASK_RULES, distinct from DEFAULT_COMMAND_RULES/TERMINAL_COMMAND_RULES), discovered-task-id validation (closes the npm run <untrusted-name> RCE), and a structured result with failureReason. The read-only terminal surface is an arbitrary-operand inspection tool and does not satisfy this scope.

Acceptance criteria mapping

Issue AC Where
Only allowed commands can run COMMAND_TASK_RULES + discovered-task-id validation; unknown id → TASK_NOT_FOUND before spawn (command-runner.ts, tests)
Commands can be cancelled AbortController per run; DELETE /api/commands/runs/:runId; UI Cancel armed via SSE run-started/requestId (manager + routes + widget tests)
Output bounded, does not freeze UI runCommand byte cap + truncated; UI renders already-capped output (truncation test, e2e)
Results include exit code, duration, truncation, failure reason CommandTaskRunResult (contract + tests)
Agents can reuse the same command API policy + result types in keiko-contracts; executor is the shared CommandRunnerManager composing runCommand (no second path)

Deliverables

  • Command policy contract — keiko-contracts/src/command-runner.ts
  • BFF command runner — keiko-server/src/command-runner.ts
  • Command result and event routes — keiko-server/src/command-runner-routes.ts
  • UI command surface — keiko-ui/.../CommandsWidget.tsx + lib/commands-api.ts
  • Tests and security notes — unit/integration/e2e + docs/command-runner/security-notes.md

Verification

Required:

  • Required GitHub checks expected to pass before merge.
  • Local verification commands listed below.
  • Reuse/extension/generalization evidence listed above.

Local verification:

npm run typecheck                  PASS (build:packages + check:package-graph + tsc)
npm run lint                       PASS
npm run arch:check                 PASS (0 dependency violations; import-policy PASS)
npm run arch:check:negative        PASS (40 negative fixtures caught)
npm run test:coverage:quality      PASS (12571 tests; keiko-server 87.55% / keiko-contracts within floor)
npm run test:coverage:ui + check:coverage:ui   PASS (keiko-ui 88.99% lines)
npm run check:version-consistency  PASS
npm run build:ui + check:editor-bundle-size --require-static-export   PASS
playwright --grep @command-runner-1387   1 passed (real Studio shell, mocked BFF)
new tests: contracts 19, server manager 16, routes 13, evidence 3, ui 13, descriptor-meta 7
command-runner file coverage: server manager 97.59% L / 98.80% B; errors/evidence 100%; routes 96.42% L; contract 93.38% L

Select only what applies:

  • UI behavior verified by component (jsdom + axe) tests and a Playwright browser smoke.
  • Core logic covered by unit, integration, and fixture tests.
  • Security-sensitive change reviewed for trust boundaries, secrets, external calls, and generated artifacts (see docs/command-runner/security-notes.md).
  • Not applicable items are explained below.

Not applicable rationale:

  • Supply-chain/package-surface: no new package dependencies or exports entries (additive index.ts re-exports only).
  • Release release:check: no release-path semantics changed.

Review And Closure

  • The PR implements only the linked issue scope.
  • Issue acceptance criteria and closure evidence are updated only where evidence exists.
  • Delivery board status updated (In Progress; PR Open to follow).
  • Use Resolves #1387 only when this PR should close the issue — kept as Refs #1387; the issue is closed manually after merge once every Definition-of-Done gate is satisfied.

Risk Notes

  • Execution is gated to discovered package-script task ids mapped to a frozen npm run <script> argv; script bodies are the project's own trusted workspace content, run under the minimized env + containment of runCommand.
  • Dual-layer secret redaction (env-value in runCommand; structural in the BFF for response and SSE frames); audit evidence is content-free (counts/enums only, taskType: "command-run").
  • commands window is restorable (fs-reference) and holds only a project-path reference; no secrets persisted.

🤖 Generated with Claude Code

Implements the governed command runner for Issue #1387: a closed catalog of
test/build/run tasks discovered from package scripts, executed through the
single governed spawn boundary (keiko-tools runCommand) with workspace
containment, output byte caps, timeout, cancellation, content-free evidence,
and dual-layer secret redaction. Reuses the ADR-0043/ADR-0006 spawn boundary
and the ADR-0018 terminal manager pattern rather than adding a second
execution path; a run names a discovered task id, never free-form argv.

- contracts: command-runner wire contract (catalog, run request/result,
  events, COMMAND_TASK_RULES allowlist) + hand-rolled validators; adds the
  additive "command-run" EvidenceTaskType
- server: CommandRunnerManager (package-script discovery + governed execution
  composing runCommand) + routes (/api/commands/catalog|runs|events) +
  content-free run evidence, wired into deps/routes
- ui: CommandsWidget task surface + commands-api client + window registration
- tests: contracts/server/ui unit + integration coverage and an e2e browser
  smoke for catalog discovery, run, and structured result
- docs: command-runner security notes

Refs #1387

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@oscharko oscharko merged commit 8be49af into feat/keiko-agent-native-editor-foundation-and-runtime Jun 26, 2026
12 checks passed
@oscharko oscharko deleted the claude/issue-1387-command-executor branch June 26, 2026 09:33
oscharko added a commit that referenced this pull request Jun 26, 2026
Addresses review findings on PR #1546 (security audit + verifier + PR review):

- contracts: cap taskId (<=256) and requestId (<=128, token charset) at the
  parse boundary so an oversized/non-token id cannot reach the manager, audit
  ledger, or SSE fan-out (the 16 KB body cap was the only prior backstop) + tests
- server: document the per-run catalog re-discovery as an intentional
  untrusted-taskId re-validation; add a test proving a throwing SSE subscriber
  does not block fan-out to other subscribers
- routes: assert Layer-2 redaction is applied to every SSE event frame
- ui: render the project path as the Tasks window subtitle (connectionUtils)
- e2e: deterministically mock the SSE channel and assert a run lifecycle event
  reaches the bounded events log
- docs: correct the audit-evidence note (the standard manifest workspaceRoot
  path is retained; argv values, output bytes, and secrets are excluded)

Refs #1387

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant