Problem
The runtime still treats too much shell, verifier, and worker work as foreground work. Agents start checks, searches, builds, verifier gates, or child workers and then wait even when the result is not an immediate dependency. This makes the main turn feel stuck and prevents useful parallel progress.
This is the main UX pain point: "allowed shell" is not enough if the agent starts allowed work and then blocks its entire turn waiting for it.
Current CodeWhale already has important substrate:
- exec_shell supports background = true and returns a task id.
- task_shell_start wraps shell work as a background task.
- exec_shell_wait and task_shell_wait can inspect a background task.
- Turn cancellation during exec_shell_wait can leave the background process running.
- Tool-batch tests already prove background shell/verifier starts can join parallel read-only work.
The missing slice is making this the default path the model actually chooses, and making waits nonblocking unless the model explicitly asks for a barrier.
Target Design
Independent shell and verifier work should default to background execution with visible job tracking and automatic completion notification. Foreground execution should be reserved for short commands whose result is required before the next decision.
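A minimal sketch of that scheduling decision, assuming the runtime has some estimate of command duration and knows whether the next step depends on the result. The names `CommandPlan`, `choose_mode`, and the 5-second cutoff are hypothetical and only illustrate the policy; they are not CodeWhale APIs.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FOREGROUND = "foreground"
    BACKGROUND = "background"


@dataclass
class CommandPlan:
    # Hypothetical fields; the real runtime may track different signals.
    expected_seconds: float     # rough estimate of how long the command runs
    needed_for_next_step: bool  # True if the next decision depends on the result


# Hypothetical cutoff for what counts as a "short" command.
SHORT_COMMAND_SECONDS = 5.0


def choose_mode(plan: CommandPlan) -> Mode:
    """Foreground only for short commands whose result is needed right away."""
    if plan.needed_for_next_step and plan.expected_seconds <= SHORT_COMMAND_SECONDS:
        return Mode.FOREGROUND
    # Long or independent work goes to the background by default.
    return Mode.BACKGROUND
```

Under this sketch a long command whose result is needed still starts in the background; the dependency is expressed later as an explicit blocking wait rather than by holding the whole turn.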
Rules of thumb for the model/runtime:
- Start long or independent commands in the background immediately.
- Keep inspecting, editing, or coordinating while they run.
- Let completion arrive automatically into the transcript/status stream.
- Poll only when current progress is genuinely needed before continuing.
- Use a deliberate blocking wait only at final verification gates or true dependencies.
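The rules above can be illustrated with plain Python's `subprocess` module. This is only a sketch of the intended turn shape (start, keep working, block once at the end); `start_background` and `run_turn` are hypothetical helpers, and a child Python process stands in for a real build or verifier gate.

```python
import subprocess
import sys


def start_background(argv):
    """Start a command without waiting; the process handle is returned immediately."""
    return subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )


def run_turn():
    # Stand-in for a long build or verifier gate.
    job = start_background(
        [sys.executable, "-c", "import time; time.sleep(0.2); print('build ok')"]
    )

    # Independent work continues while the job runs.
    notes = [f"inspected file {i}" for i in range(3)]

    # Deliberate barrier: block only at the final verification gate.
    output, _ = job.communicate(timeout=30)
    return notes, job.returncode, output.strip()
```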
This should apply to:
- exec_shell / task_shell_start
- run_verifiers and cargo-style gates
- Fleet workers and the user-facing "sub-agent" surface once it is backed by Fleet
- structured user questions / steering where the parent can keep working safely
Fit With Current CodeWhale
- Change model-visible tool descriptions/results so background tasks explicitly say: "returns immediately; you will be notified when done; do not poll/wait unless you need early output."
- Change wait tool defaults toward snapshot/nonblocking behavior. Today exec_shell_wait defaults wait = true; make the safer default wait = false or add a canonical nonblocking read path and teach the model to use it.
- Make task_shell_wait and exec_shell_wait visibly different from join: they inspect progress by default; blocking requires an explicit flag such as block = true / wait = true.
- When a foreground command times out or looks long-running, the recovery hint should prefer "restart in background and continue other work," not merely "rerun with a longer timeout."
- Do not let agent_eval block:true, foreground Agent/Fleet workers, or foreground verifier gates become the default for independent evidence collection.
- Preserve existing user controls: /jobs, cancel, wait, output tail, and foreground-to-background detach.
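The inspect-by-default wait described above might look like the following sketch. `inspect_task` and `Snapshot` are hypothetical names, and `subprocess.Popen` stands in for whatever handle the runtime keeps per task; the point is only that the default call returns a snapshot and blocking is opt-in.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    running: bool
    exit_code: Optional[int]  # None while the task is still running


def inspect_task(proc: subprocess.Popen, block: bool = False,
                 timeout: Optional[float] = None) -> Snapshot:
    """Return the task's current state; wait for exit only when block=True."""
    if block:
        # Explicit barrier requested by the caller.
        proc.wait(timeout=timeout)
    code = proc.poll()  # nonblocking: None if the process has not exited
    return Snapshot(running=code is None, exit_code=code)
```

Calling `inspect_task(proc)` never stalls the turn; a caller that truly depends on the result passes `block=True`, which makes the barrier visible in the call itself.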
Acceptance Criteria
- Shell execution has a clear scheduling policy: foreground only when dependent, background for long-running or independent work.
- The model/tool descriptions nudge toward background execution for builds, tests, verifiers, servers, broad searches, polling, sleeps, and long diagnostics.
- Starting a background shell/verifier returns a task id immediately and includes metadata that a completion notification will arrive automatically.
- Parallel read-only shell commands can run concurrently when safe.
- run_verifiers and cargo-style gates can start while the agent continues inspecting files or implementing unrelated follow-up work.
- A cancelled/interrupted wait does not kill the underlying background job unless explicitly requested.
- Background task completion produces a transcript/status event the agent can consume without manual polling.
- Tests cover background-start hints, wait default behavior, wait/cancel behavior, automatic-completion metadata, and at least one parallel read-only shell scenario.
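Two of these criteria, automatic completion events and waits that can be abandoned without killing the job, can be sketched together. Everything here is an assumption for illustration: `BackgroundJob`, the event dictionary shape, and the use of a `queue.Queue` as the transcript/status stream are hypothetical, not the runtime's actual design.

```python
import queue
import subprocess
import threading


class BackgroundJob:
    """Runs a command in the background and posts one event when it exits."""

    def __init__(self, task_id, argv, events):
        self.task_id = task_id
        self.proc = subprocess.Popen(argv)
        self.done = threading.Event()
        self._events = events
        # A watcher thread is the only place that blocks on the process.
        threading.Thread(target=self._watch, daemon=True).start()

    def _watch(self):
        code = self.proc.wait()
        # Completion arrives in the event stream without any polling.
        self._events.put(
            {"task_id": self.task_id, "event": "completed", "exit_code": code}
        )
        self.done.set()

    def wait(self, timeout):
        """Return True if the job finished in time; giving up never kills it."""
        return self.done.wait(timeout)
```

A test along these lines could start a job, abandon a short `wait`, assert that the process is still alive, and then assert that exactly one completion event shows up in the queue.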
Related