
Recover abandoned started agent runs after runtime shutdown #4633

Description

@KyleAMathews

Problem

The UI shows "Thinking" while the latest agent run remains in the started state. We fixed one runtime failure path, where a handler throws after starting a run, but a hard process kill, laptop sleep, or runtime shutdown can still leave a started run without any terminal update, because the process never gets a chance to run its catch/finally cleanup.

Proposed second layer

Add recovery semantics for abandoned started runs, e.g. one of:

  • a run lease / heartbeat renewed while a run is active, plus a janitor/startup sweep that marks stale started runs failed, or
  • a simpler startup recovery sweep that marks old started runs failed with a clear interrupted/abandoned error.
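The lease/heartbeat option could look roughly like the TypeScript sketch below. It is a minimal in-memory illustration, not the project's implementation: the `AgentRun` shape, its `status`, `startedAt`, and `lastHeartbeatAt` fields, and the `heartbeat` / `sweepStaleRuns` helpers are all hypothetical, since the issue does not describe the run schema or storage. A run that never recorded a heartbeat falls back to `startedAt`, which reduces to the simpler "mark old started runs failed" sweep.

```typescript
// Hypothetical run shape; the real schema is not described in the issue.
type RunStatus = "started" | "completed" | "failed";

interface AgentRun {
  id: string;
  status: RunStatus;
  startedAt: number;        // epoch ms when the run started
  lastHeartbeatAt?: number; // epoch ms, renewed while the run is active
  error?: string;
}

// Renew the lease for an active run. The process that owns the run would
// call this periodically (e.g. every leaseMs / 3) while work is in progress.
function heartbeat(run: AgentRun, now: number): void {
  if (run.status === "started") {
    run.lastHeartbeatAt = now;
  }
}

// Janitor / startup sweep: any run still "started" whose lease has expired
// is treated as abandoned and moved to the terminal "failed" status with an
// explanatory error. Returns the ids of the runs it recovered.
function sweepStaleRuns(runs: AgentRun[], now: number, leaseMs: number): string[] {
  const recovered: string[] = [];
  for (const run of runs) {
    if (run.status !== "started") continue;
    // Without heartbeats, fall back to the start time (simple sweep variant).
    const lastSeen = run.lastHeartbeatAt ?? run.startedAt;
    if (now - lastSeen > leaseMs) {
      run.status = "failed";
      run.error = "Run interrupted: the runtime stopped before the run finished (abandoned).";
      recovered.push(run.id);
    }
  }
  return recovered;
}

// Example: one abandoned run, one run still heartbeating, one finished run.
const demoRuns: AgentRun[] = [
  { id: "a", status: "started", startedAt: 0 },
  { id: "b", status: "started", startedAt: 0 },
  { id: "c", status: "completed", startedAt: 0 },
];
heartbeat(demoRuns[1], 55_000);
console.log(sweepStaleRuns(demoRuns, 60_000, 30_000)); // returns ["a"]
```

With only the startup-sweep variant (no heartbeats), `leaseMs` has to exceed the longest legitimate run, or the sweep should run only at startup before the new process starts any runs of its own. Assuming a single runtime process owns all runs, anything still marked started at that point was left behind by the previous process and cannot still be active.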

Acceptance criteria

  • A runtime restart after an interrupted run does not leave the UI stuck on Thinking indefinitely.
  • Stale started runs are marked terminal (failed, or an equivalent terminal status) with an explanatory error message, e.g. an interrupted/abandoned error.
  • Long-running active runs are not incorrectly failed if heartbeat/lease support is implemented.
  • Add tests covering an abandoned started run recovery path.
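Tests for the first and third criteria could follow the pattern below. This is a hypothetical TypeScript sketch: `FakeClock`, `InMemoryRunStore`, and its `start` / `heartbeat` / `recover` methods are illustrative stand-ins, not the project's API. A fake clock lets the test simulate a crash (time passes with no heartbeats) and a long-running run (time passes with regular heartbeats) without real timers.

```typescript
// Hypothetical types and store; status names are assumptions.
type Status = "started" | "completed" | "failed";
interface Run { id: string; status: Status; lastSeen: number; error?: string }

// Manually advanced clock so tests control the passage of time.
class FakeClock {
  now = 0;
  advance(ms: number): void { this.now += ms; }
}

class InMemoryRunStore {
  readonly runs: Record<string, Run> = {};
  private readonly clock: FakeClock;
  private readonly leaseMs: number;

  constructor(clock: FakeClock, leaseMs: number) {
    this.clock = clock;
    this.leaseMs = leaseMs;
  }

  start(id: string): void {
    this.runs[id] = { id, status: "started", lastSeen: this.clock.now };
  }

  heartbeat(id: string): void {
    const run = this.runs[id];
    if (run && run.status === "started") run.lastSeen = this.clock.now;
  }

  // Startup / janitor sweep: fail started runs whose lease has expired.
  recover(): void {
    for (const id of Object.keys(this.runs)) {
      const run = this.runs[id];
      if (run.status === "started" && this.clock.now - run.lastSeen > this.leaseMs) {
        run.status = "failed";
        run.error = "interrupted: runtime stopped before the run finished";
      }
    }
  }
}

function check(cond: boolean, msg: string): void {
  if (!cond) throw new Error(msg);
}

// Abandoned run: the process "dies" (no more heartbeats), then the
// restarted runtime runs recovery and the run becomes terminal.
function testAbandonedRunIsRecovered(): void {
  const clock = new FakeClock();
  const store = new InMemoryRunStore(clock, 30_000);
  store.start("run-1");
  clock.advance(60_000); // simulated crash or sleep: nothing renews the lease
  store.recover();
  const run = store.runs["run-1"];
  check(run.status === "failed", "abandoned run should be failed");
  check(run.error !== undefined, "abandoned run should carry an error message");
}

// Long-running run: heartbeats keep the lease fresh, so recovery must
// leave it alone even though it started long ago.
function testLongRunningRunIsNotFailed(): void {
  const clock = new FakeClock();
  const store = new InMemoryRunStore(clock, 30_000);
  store.start("run-2");
  for (let i = 0; i < 10; i++) {
    clock.advance(10_000);
    store.heartbeat("run-2");
  }
  store.recover(); // 100 s after start, but the last heartbeat was 0 s ago
  check(store.runs["run-2"].status === "started", "active run must not be failed");
}

testAbandonedRunIsRecovered();
testLongRunningRunIsNotFailed();
```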

Context

The current PR handles exceptions thrown after a run starts by failing the newly started run in the handler's catch path. This issue tracks crash/shutdown recovery, where no in-process catch/finally block gets to run.
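For contrast, the catch-path fix described above works roughly like the hypothetical TypeScript sketch below; the `handleRequest` signature and run shape are assumptions, not the PR's actual code. It can only mark the run failed if execution reaches the `catch` block, which is exactly what hard process death or a shutdown without cleanup prevents.

```typescript
// Hypothetical run shape; status names are assumptions.
type Status = "started" | "completed" | "failed";
interface Run { id: string; status: Status; error?: string }

// Start the run, do the work, and if the work throws, mark the newly
// started run as failed so the UI stops showing "Thinking".
// If the process is killed mid-await, neither branch below ever runs and
// the run stays "started", which is the gap this issue tracks.
async function handleRequest(
  runs: Run[],
  id: string,
  work: () => Promise<void>,
): Promise<Run> {
  const run: Run = { id, status: "started" };
  runs.push(run);
  try {
    await work();
    run.status = "completed";
  } catch (err) {
    run.status = "failed";
    run.error = err instanceof Error ? err.message : String(err);
  }
  return run;
}
```

This sketch records the error and returns the failed run; a real handler may also rethrow or report it.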

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests