Agent halts after Edit/Write phase with stop_reason: end_turn, never invokes Bash; multi-step prompts fail to complete (claude-sonnet-4-6) #1292

@danilvat7

Summary

A previously working multi-step prompt no longer completes. The agent reads files, makes Edits/Writes, then voluntarily halts with stop_reason: end_turn and an empty result string. It never invokes Bash for the verification step, git ops, or gh pr create that the prompt explicitly directs. TodoWrite items for the unfinished steps are left pending.

Reproducible across seven runs in ~14 hours; the immediately preceding baseline ran the same prompt successfully ~16 hours before the first failure.

Environment

  • Action SHAs tested: 8a953dedac4f533f912f13656070914693ed0575 (now-dangling — gh api repos/anthropics/claude-code-action/git/commits/8a953ded... returns 404) and 62238ddb33772a079b0a6d8665a1ff3043583067 (current v1 tag target as of 2026-05-06 07:48Z).
  • Bundled @anthropic-ai/claude-agent-sdk versions tested: 0.2.114 and 0.2.131. Both confirmed via bun install output.
  • Model: claude-sonnet-4-6 (unchanged across all runs).
  • --max-turns 75. --allowedTools includes the relevant Bash patterns: Bash(pnpm install:*), Bash(pnpm run lint:*), Bash(pnpm run typecheck:*), Bash(pnpm run test:unit:*), Bash(pnpm run build:*), Bash(pnpm exec prettier:*), Bash(pnpm exec prisma generate:*), Bash(git checkout:*), Bash(git add:*), Bash(git commit:*), Bash(git push:*), Bash(gh pr create:*).
  • permission_denials: [] in the result envelope (no permission blocks).

Symptom

In the working baseline (~16h before first failure):

  • 42 Bash invocations
  • Multi-step prompt completed: install → prisma generate → prettier → lint → typecheck → test:unit → build → branch → commit → push → gh pr create → done
  • Drafted PR successfully

In all seven subsequent runs:

  • 0 Bash invocations
  • Agent reads spec + relevant files (Read/Grep/Glob/ToolSearch)
  • Creates TodoWrite items including "Run full verification suite" and "Create branch, commit, push, open draft PR"
  • Makes a sequence of Edit/Write tool calls implementing parts of the spec
  • Halts with stop_reason: end_turn, result: "" (empty), is_error: false
  • Pending TodoWrite items remain pending; nothing else happens
  • Action returns outcome: success
  • Net effect: silently broken — looks like a normal completion to the orchestrating workflow, but no work product is produced

Tool-name histogram of a typical broken run

(post-bump, SDK 0.2.131)

   ~16  Read
     5  TodoWrite
     5  Grep
     4  Glob
     4  Edit
     2  Write
     1  ToolSearch
     0  Bash         ← here's the problem
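
For anyone who wants to reproduce this histogram from their own transcripts, here is a rough sketch of how it can be counted. It assumes the streamed JSONL has one event per line and that tool calls appear as `tool_use` content blocks inside events of type `assistant`. That shape is an assumption about the transcript format, and `tool_histogram` is a hypothetical helper name, not part of the action or the SDK.

```python
import json
from collections import Counter
from typing import Iterable


def tool_histogram(lines: Iterable[str]) -> Counter:
    """Count tool_use blocks by tool name across streamed JSONL events.

    Assumed event shape (not confirmed by the SDK docs):
    {"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Read"}, ...]}}
    """
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore log lines that are not JSON
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue
        message = event.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                counts[block.get("name", "<unnamed>")] += 1
    return counts
```

Calling `tool_histogram(open("transcript.jsonl"))` (the file name is a placeholder) returns a `Counter`; a tool that was never called, such as Bash in the broken runs, reads as 0.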

Reproducibility matrix

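| Run          | Action SHA | SDK     | Prompt state                | Bash | PR  |
|--------------|------------|---------|-----------------------------|------|-----|
| 1 (baseline) | 8a953ded   | 0.2.114 | working baseline            | 42   | yes |
| 2            | 8a953ded   | 0.2.114 | + 5-line skip-rerun rule    | 0    | no  |
| 3            | 8a953ded   | 0.2.114 | + 5-line skip-rerun rule    | 0    | no  |
| 4            | 8a953ded   | 0.2.114 | skip-rerun removed          | 0    | no  |
| 5            | 8a953ded   | 0.2.114 | byte-identical to baseline  | 0    | no  |
| 6            | 8a953ded   | 0.2.114 | byte-identical to baseline  | 0    | no  |
| 7            | 62238dd    | 0.2.131 | byte-identical to baseline  | 0    | no  |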

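The only invariant across all seven runs is model: claude-sonnet-4-6, which the working baseline also used. Runs 5 and 6 match the baseline in Action SHA, SDK version, and prompt, yet still made zero Bash calls, so nothing on the client side accounts for the change.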

Hypothesis

A behavioral change occurred in claude-sonnet-4-6 (or its serving stack) between ~2026-05-05 21:38Z (last working run) and ~2026-05-05 22:45Z (first failed run) that affects how the model handles long agentic loops with our prompt shape. The model decides it's "done" after the Edit/Write phase even though the prompt's procedure explicitly requires verification + git ops afterwards.

Asks

  1. Is there a known regression in claude-sonnet-4-6 around end_turn semantics for long agentic loops?
  2. Should claude-code-action surface a warning when result is empty AND stop_reason is end_turn AND TodoWrite items are pending? Right now outcome: success masks this failure mode — orchestrating workflows think the run succeeded.
  3. If reproducible on your side, can you suggest a temporary workaround (different model, different prompt shape, etc.)?
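
To make ask 2 concrete, here is a minimal sketch of the check an orchestrating workflow could run on its side in the meantime. The `stop_reason`, `result`, and `is_error` fields are the ones seen in the result envelope above. The shape of the todo items (a `status` field holding `pending`, `in_progress`, or `completed`) is an assumption, and `is_silent_halt` is a hypothetical name, not something claude-code-action provides.

```python
from typing import Any, Mapping, Sequence

# Todo statuses treated as "work not done" (assumed values).
UNFINISHED = {"pending", "in_progress"}


def is_silent_halt(
    envelope: Mapping[str, Any],
    todos: Sequence[Mapping[str, Any]],
) -> bool:
    """Return True when a run looks successful but stopped with work outstanding.

    All four conditions must hold: empty result text, stop_reason of
    end_turn, no reported error, and at least one unfinished todo item.
    """
    result_text = envelope.get("result")
    empty_result = not (isinstance(result_text, str) and result_text.strip())
    ended_turn = envelope.get("stop_reason") == "end_turn"
    reported_ok = not envelope.get("is_error", False)
    unfinished = any(t.get("status") in UNFINISHED for t in todos)
    return empty_result and ended_turn and reported_ok and unfinished
```

A workflow step could fail the job (or at least emit a warning annotation) when this returns True, instead of trusting outcome: success.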

Happy to share the full streamed transcripts (1–1.5 MB each, JSONL) — let me know how you'd like them. Also happy to share the full prompt text if helpful.

Labels: bug (Something isn't working), p2 (Non-showstopper bug or popular feature request), provider:1p (Anthropic First-Party API)
