Skip to content

Fix #279: worker-queue ESC-cancel for Anthropic/Minimax providers#311

Open
ericleepi314 wants to merge 1 commit into
fix/issue-278-bounded-cancel-queuefrom
feature/issue-279-provider-stream-worker
Open

Fix #279: worker-queue ESC-cancel for Anthropic/Minimax providers#311
ericleepi314 wants to merge 1 commit into
fix/issue-278-bounded-cancel-queuefrom
feature/issue-279-provider-stream-worker

Conversation

@ericleepi314

Copy link
Copy Markdown
Collaborator

Closes #279

Stacked on #310 (← #308#307#306#305#304). Merge in order; GitHub retargets automatically.

Summary

PR #148's worker-thread + queue ESC-unwind only covered OpenAICompatibleProvider; Anthropic and Minimax relied solely on the close-on-abort listener, which is best-effort — behind a buffering proxy (LiteLLM, corporate, mitmproxy) the anthropic SDK's sync httpx read doesn't unblock on a cross-thread response.close(), so ESC hung.

  • New src/providers/_stream_worker.pyrun_stream_on_worker(produce, on_item, guard): the OpenAICompatibleProvider ESC-cancel uses an unbounded queue.Queue; pathological streams can accumulate megabytes after ESC #278-hardened pattern (bounded queue at 64, drop-on-abort, consumer_gone release, items-before-abort delivered / nothing after) generalized so each provider only owns its SDK iteration shape. Error relay preserves exception chaining (AbortError from original) and re-raises KeyboardInterrupt/SystemExit as-is.
  • OpenAI-compatible: inline machinery replaced by the helper — behavior pinned unchanged by the existing 11 abort tests.
  • Anthropic: only the iteration + get_final_message move to the worker. Stream lifecycle, guard.attach, and the WI-5.2 watchdog arm/fallback stay on the consumer thread; a consumer-side finally: watchdog.disarm() guarantees the 90s timer can't leak when ESC abandons a stuck worker (critic find). Watchdog state crosses threads via a dict written before the queue relay (queue mutex is the happens-before edge).
  • Minimax: same shape minus the watchdog.

Test plan

  • New _Stuck*Stream regression tests for both providers (iterator never honors close(), blocks on a never-set Event): ESC unwinds in <3s where it previously hung forever
  • All 480 watchdog/provider/stream/abort tests pass; both watchdog-fallback paths verified through the worker relay
  • Full suite on the stack: 7799 passed, 0 failed, 5 skipped
  • Critic review loop: APPROVE (error-ordering + consumer-side disarm folded in from review)

🤖 Generated with Claude Code

…max (#279)

The close-on-abort listener alone is best-effort: behind a buffering
proxy (LiteLLM, corporate proxy, mitmproxy) the anthropic SDK's sync
httpx read doesn't unblock on a cross-thread response.close(), so ESC
hung — the same failure the OpenAI-compatible path fixed in PR #148.

- extract the #278-hardened worker+bounded-queue pattern into
  src/providers/_stream_worker.py (run_stream_on_worker): emit/on_item
  relay, bounded queue, drop-on-abort/consumer-gone, error relay that
  preserves exception chaining and KeyboardInterrupt semantics
- openai_compatible: replaced inline machinery with the helper
  (behavior pinned by the existing 11 abort tests)
- anthropic: only the iteration + get_final_message move to the
  worker; stream lifecycle, guard.attach, watchdog arm/fallback stay
  on the consumer thread, with a consumer-side disarm guarantee so a
  stuck worker can't leak the 90s timer; watchdog state crosses
  threads via a dict written before the queue relay
- minimax: same shape minus the watchdog

Closes #279

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant