v0.8.61: Default independent shell and verifier work to background jobs #3212

Description

@Hmbown

Problem

The runtime still treats too much shell, verifier, and worker work as foreground work. Agents start checks, searches, builds, verifier gates, or child workers and then wait even when the result is not an immediate dependency. This makes the main turn feel stuck and prevents useful parallel progress.

This is the main UX pain point: "allowed shell" is not enough if the agent starts allowed work and then blocks its entire turn waiting for it.

CodeWhale already provides much of the needed substrate:

  • exec_shell supports background = true and returns a task id.
  • task_shell_start wraps shell work as a background task.
  • exec_shell_wait and task_shell_wait can inspect a background task.
  • Turn cancellation during exec_shell_wait can leave the background process running.
  • Tool-batch tests already prove background shell/verifier starts can join parallel read-only work.

The missing slice is making this the default path the model actually chooses, and making waits nonblocking unless the model explicitly asks for a barrier.

Target Design

Independent shell and verifier work should default to background execution with visible job tracking and automatic completion notification. Foreground execution should be reserved for short commands whose result is required before the next decision.
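
One way to read this policy is as a small decision function. The sketch below is illustrative only: `ShellRequest`, `choose_mode`, and the 5-second threshold are hypothetical names and values, not part of CodeWhale's current API.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FOREGROUND = "foreground"
    BACKGROUND = "background"


@dataclass(frozen=True)
class ShellRequest:
    """Hypothetical request shape; field names are assumptions."""
    command: str
    # True when the next decision cannot be made without this result.
    result_needed_now: bool
    # Rough caller-supplied estimate of how long the command runs.
    expected_seconds: float


# Assumed cutoff for "short"; a real runtime would tune or configure this.
SHORT_COMMAND_SECONDS = 5.0


def choose_mode(req: ShellRequest) -> Mode:
    """Foreground only for short, dependent commands; background otherwise."""
    if req.result_needed_now and req.expected_seconds <= SHORT_COMMAND_SECONDS:
        return Mode.FOREGROUND
    return Mode.BACKGROUND
```

Under these assumptions, a quick `git status` that gates the next edit stays in the foreground, while a long dependent build still starts in the background and is joined later with an explicit blocking wait.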

Rules of thumb for the model/runtime:

  • Start long or independent commands in the background immediately.
  • Keep inspecting, editing, or coordinating while they run.
  • Let completion arrive automatically into the transcript/status stream.
  • Poll only when current progress is genuinely needed before continuing.
  • Use a deliberate blocking wait only at final verification gates or true dependencies.
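
The rules above can be sketched with plain subprocesses: start the job, get a task id back immediately, and let a watcher thread push a completion event into a queue that stands in for the transcript/status stream. `BackgroundJobs`, its `events` queue, and the event fields are hypothetical and do not describe CodeWhale's real implementation.

```python
import queue
import subprocess
import sys
import threading
import uuid


class BackgroundJobs:
    """Hypothetical job tracker: start() returns at once, completion is pushed."""

    def __init__(self):
        # Stand-in for the transcript/status stream.
        self.events = queue.Queue()
        self._procs = {}

    def start(self, argv):
        """Launch argv in the background and return a task id immediately."""
        task_id = uuid.uuid4().hex[:8]
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        self._procs[task_id] = proc
        threading.Thread(
            target=self._watch, args=(task_id, proc), daemon=True
        ).start()
        return task_id

    def _watch(self, task_id, proc):
        # Runs off the main thread, so the caller is never blocked here.
        output, _ = proc.communicate()
        self.events.put(
            {"task_id": task_id, "exit_code": proc.returncode, "output": output}
        )


def demo():
    jobs = BackgroundJobs()
    task_id = jobs.start([sys.executable, "-c", "print('verifier ok')"])
    # ... keep inspecting, editing, or coordinating here ...
    # Deliberate barrier, as at a final verification gate.
    event = jobs.events.get(timeout=30)
    return task_id, event
```

The only blocking call in `demo()` is the final `events.get`, which models the deliberate wait at a verification gate; everything before it is free to proceed while the job runs.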

This should apply to:

  • exec_shell / task_shell_start
  • run_verifiers and cargo-style gates
  • Fleet workers and the user-facing "sub-agent" surface once it is backed by Fleet
  • structured user questions / steering where the parent can keep working safely

Fit With Current CodeWhale

  • Change model-visible tool descriptions/results so background tasks explicitly say: "returns immediately; you will be notified when done; do not poll/wait unless you need early output."
  • Change wait tool defaults toward snapshot/nonblocking behavior. Today exec_shell_wait defaults to wait = true; either make wait = false the default, or add a canonical nonblocking read path and teach the model to use it.
  • Make task_shell_wait and exec_shell_wait visibly different from join: they inspect progress by default; blocking requires an explicit flag such as block = true / wait = true.
  • When a foreground command times out or looks long-running, the recovery hint should prefer "restart in background and continue other work," not merely "rerun with a longer timeout."
  • Do not let agent_eval block:true, foreground Agent/Fleet workers, or foreground verifier gates become the default for independent evidence collection.
  • Preserve existing user controls: /jobs, cancel, wait, output tail, and foreground-to-background detach.
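
As an illustration of the "inspect by default, block only on request" semantics proposed above, here is a minimal sketch. The `Job` class, the `Snapshot` shape, and the `block` parameter name are assumptions for this example and are not the current exec_shell_wait or task_shell_wait signatures.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    running: bool
    exit_code: Optional[int]
    output: str


class Job:
    """Hypothetical background job whose wait() is a snapshot by default."""

    def __init__(self, argv):
        # Output goes to a log file opened in append mode, so reading a
        # snapshot never blocks on a pipe or disturbs the child's writes.
        fd, self._path = tempfile.mkstemp(suffix=".log")
        os.close(fd)
        self._log = open(self._path, "ab")
        self._proc = subprocess.Popen(
            argv, stdout=self._log, stderr=subprocess.STDOUT
        )

    def wait(self, block: bool = False, timeout: Optional[float] = None) -> Snapshot:
        """Return current state; join the job only when block=True."""
        if block:
            self._proc.wait(timeout=timeout)
        exit_code = self._proc.poll()
        with open(self._path, "r", errors="replace") as f:
            output = f.read()
        return Snapshot(running=exit_code is None, exit_code=exit_code, output=output)

    def close(self):
        """Release the log handle and delete the temporary log file."""
        self._log.close()
        os.remove(self._path)
```

With this shape, `job.wait()` is safe to call at any time and returns whatever output exists so far, while `job.wait(block=True)` is the explicit barrier.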

Acceptance Criteria

  • Shell execution has a clear scheduling policy: foreground only when dependent, background for long-running or independent work.
  • The model/tool descriptions nudge toward background execution for builds, tests, verifiers, servers, broad searches, polling, sleeps, and long diagnostics.
  • Starting a background shell/verifier returns a task id immediately and includes metadata stating that a completion notification will arrive automatically.
  • Parallel read-only shell commands can run concurrently when safe.
  • run_verifiers and cargo-style gates can start while the agent continues inspecting files or implementing unrelated follow-up work.
  • A cancelled/interrupted wait does not kill the underlying background job unless explicitly requested.
  • Background task completion produces a transcript/status event the agent can consume without manual polling.
  • Tests cover background-start hints, wait default behavior, wait/cancel behavior, automatic-completion metadata, and at least one parallel read-only shell scenario.
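
A test for the wait/cancel criterion might look roughly like the sketch below. It uses a short timeout as a stand-in for turn cancellation and plain `subprocess` rather than CodeWhale's own tools; `interrupted_wait` and `cancel` are hypothetical helpers.

```python
import subprocess
import sys


def interrupted_wait(proc, timeout):
    """Return True if the wait completed, False if it was interrupted."""
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Popen.wait does not kill the child when the timeout expires.
        return False


def cancel(proc):
    """Explicit cancellation: the only path that stops the job."""
    proc.terminate()
    proc.wait(timeout=10)


def test_interrupted_wait_leaves_job_running():
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # The wait is cut short, but the job must still be alive.
        assert interrupted_wait(proc, timeout=0.2) is False
        assert proc.poll() is None
    finally:
        cancel(proc)
    # Only the explicit cancel ends the process.
    assert proc.poll() is not None
```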

Related

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request
    reliability: Reliability, flaky behavior, retries, fallbacks, and robustness
    tui: Terminal UI behavior, rendering, or interaction
    v0.8.61: Targeted for CodeWhale v0.8.61
    workflow-runtime: Workflow IR, executor, control flow, and replay runtime

    Projects

    Status
    Done
