Summary
A previously-working multi-step prompt no longer completes. The agent reads files, makes Edits/Writes, then voluntarily halts with stop_reason: end_turn and an empty result string — never invoking Bash for the verification step, git ops, or gh pr create that the prompt explicitly directs. TodoWrite items for the unfinished steps are left in pending.
Reproducible across seven runs in ~14 hours; the immediately preceding baseline ran the same prompt successfully ~16 hours before the first failure.
Environment
- Action SHAs tested:
8a953dedac4f533f912f13656070914693ed0575 (now-dangling — gh api repos/anthropics/claude-code-action/git/commits/8a953ded... returns 404) and 62238ddb33772a079b0a6d8665a1ff3043583067 (current v1 tag target as of 2026-05-06 07:48Z).
- Bundled
@anthropic-ai/claude-agent-sdk versions tested: 0.2.114 and 0.2.131. Both confirmed via bun install output.
- Model:
claude-sonnet-4-6 (unchanged across all runs).
--max-turns 75. --allowedTools includes the relevant Bash patterns: Bash(pnpm install:*), Bash(pnpm run lint:*), Bash(pnpm run typecheck:*), Bash(pnpm run test:unit:*), Bash(pnpm run build:*), Bash(pnpm exec prettier:*), Bash(pnpm exec prisma generate:*), Bash(git checkout:*), Bash(git add:*), Bash(git commit:*), Bash(git push:*), Bash(gh pr create:*).
permission_denials: [] in the result envelope (no permission blocks).
Symptom
In the working baseline (~16h before first failure):
- 42 Bash invocations
- Multi-step prompt completed: install → prisma generate → prettier → lint → typecheck → test:unit → build → branch → commit → push →
gh pr create → done
- Drafted PR successfully
In all seven subsequent runs:
- 0 Bash invocations
- Agent reads spec + relevant files (Read/Grep/Glob/ToolSearch)
- Creates TodoWrite items including "Run full verification suite" and "Create branch, commit, push, open draft PR"
- Makes a sequence of Edit/Write tool calls implementing parts of the spec
- Halts with
stop_reason: end_turn, result: "" (empty), is_error: false
- Pending TodoWrite items remain pending; nothing else happens
- Action returns
outcome: success
- Net effect: silently broken — looks like a normal completion to the orchestrating workflow, but no work product is produced
Tool-name histogram of a typical broken run
(post-bump, SDK 0.2.131)
~16 Read
5 TodoWrite
5 Grep
4 Glob
4 Edit
2 Write
1 ToolSearch
0 Bash ← here's the problem
Reproducibility matrix
| Run |
Action SHA |
SDK |
Prompt state |
Bash |
PR |
| 1 (baseline) |
8a953ded |
0.2.114 |
working baseline |
42 |
yes |
| 2 |
8a953ded |
0.2.114 |
+ 5-line skip-rerun rule |
0 |
no |
| 3 |
8a953ded |
0.2.114 |
+ 5-line skip-rerun rule |
0 |
no |
| 4 |
8a953ded |
0.2.114 |
skip-rerun removed |
0 |
no |
| 5 |
8a953ded |
0.2.114 |
byte-identical to baseline |
0 |
no |
| 6 |
8a953ded |
0.2.114 |
byte-identical to baseline |
0 |
no |
| 7 |
62238dd |
0.2.131 |
byte-identical to baseline |
0 |
no |
The only invariant across all seven runs is model: claude-sonnet-4-6. The working baseline used the same model.
Hypothesis
A behavioral change occurred in claude-sonnet-4-6 (or its serving stack) between ~2026-05-05 21:38Z (last working run) and ~2026-05-05 22:45Z (first failed run) that affects how the model handles long agentic loops with our prompt shape. The model decides it's "done" after the Edit/Write phase even though the prompt's procedure explicitly requires verification + git ops afterwards.
Asks
- Is there a known regression in
claude-sonnet-4-6 around end_turn semantics for long agentic loops?
- Should
claude-code-action surface a warning when result is empty AND stop_reason is end_turn AND TodoWrite items are pending? Right now outcome: success masks this failure mode — orchestrating workflows think the run succeeded.
- If reproducible on your side, can you suggest a temporary workaround (different model, different prompt shape, etc.)?
Happy to share the full streamed transcripts (1–1.5 MB each, JSONL) — let me know how you'd like them. Also happy to share the full prompt text if helpful.
Summary
A previously-working multi-step prompt no longer completes. The agent reads files, makes Edits/Writes, then voluntarily halts with
stop_reason: end_turnand an emptyresultstring — never invoking Bash for the verification step, git ops, orgh pr createthat the prompt explicitly directs. TodoWrite items for the unfinished steps are left inpending.Reproducible across seven runs in ~14 hours; the immediately preceding baseline ran the same prompt successfully ~16 hours before the first failure.
Environment
8a953dedac4f533f912f13656070914693ed0575(now-dangling —gh api repos/anthropics/claude-code-action/git/commits/8a953ded...returns 404) and62238ddb33772a079b0a6d8665a1ff3043583067(currentv1tag target as of 2026-05-06 07:48Z).@anthropic-ai/claude-agent-sdkversions tested:0.2.114and0.2.131. Both confirmed viabun installoutput.claude-sonnet-4-6(unchanged across all runs).--max-turns 75.--allowedToolsincludes the relevant Bash patterns:Bash(pnpm install:*),Bash(pnpm run lint:*),Bash(pnpm run typecheck:*),Bash(pnpm run test:unit:*),Bash(pnpm run build:*),Bash(pnpm exec prettier:*),Bash(pnpm exec prisma generate:*),Bash(git checkout:*),Bash(git add:*),Bash(git commit:*),Bash(git push:*),Bash(gh pr create:*).permission_denials: []in the result envelope (no permission blocks).Symptom
In the working baseline (~16h before first failure):
gh pr create→ doneIn all seven subsequent runs:
stop_reason: end_turn,result: ""(empty),is_error: falseoutcome: successTool-name histogram of a typical broken run
(post-bump, SDK 0.2.131)
Reproducibility matrix
The only invariant across all seven runs is
model: claude-sonnet-4-6. The working baseline used the same model.Hypothesis
A behavioral change occurred in
claude-sonnet-4-6(or its serving stack) between ~2026-05-05 21:38Z (last working run) and ~2026-05-05 22:45Z (first failed run) that affects how the model handles long agentic loops with our prompt shape. The model decides it's "done" after the Edit/Write phase even though the prompt's procedure explicitly requires verification + git ops afterwards.Asks
claude-sonnet-4-6aroundend_turnsemantics for long agentic loops?claude-code-actionsurface a warning whenresultis empty ANDstop_reasonisend_turnAND TodoWrite items are pending? Right nowoutcome: successmasks this failure mode — orchestrating workflows think the run succeeded.Happy to share the full streamed transcripts (1–1.5 MB each, JSONL) — let me know how you'd like them. Also happy to share the full prompt text if helpful.