File: src/providers/openai_compatible.py:655
chunk_queue: _queue.Queue = _queue.Queue()
Background: PR #148 introduced the worker-thread + queue pattern to make ESC unwind promptly under LiteLLM. Its own PR body flags:
Memory: queue.Queue is unbounded, so a worst-case "stuck iterator AND chunks flowing" path could accumulate megabytes. Flagged as a follow-up; not blocking on current evidence.
Impact: If a proxy disconnects non-gracefully and keeps sending bytes after ESC without ever closing the SDK iterator, the worker keeps calling chunk_queue.put(c) indefinitely. The main thread has already raised AbortError and dropped its reference, but the worker still references the queue, so every queued chunk stays alive and nothing drains it. Memory grows until the upstream socket closes.
Fix sketch: Either bound the queue (e.g. maxsize=64) and, once the abort signal trips, switch to put_nowait() and drop chunks when the queue is full; or have the worker check guard.aborted between chunks and exit early.