mirror of https://github.com/nesquena/hermes-webui.git
synced 2026-05-25 19:20:16 +00:00
604b44a254
The new _handle_clarify_sse_stream handler in #1355 holds clarify._lock and then calls clarify.get_pending(sid) under that lock. get_pending also acquires _lock internally, and clarify._lock is a non-reentrant threading.Lock(), so the second acquisition deadlocks the SSE handler thread as soon as any client connects to /api/clarify/stream.

Existing tests pass because they exercise sse_subscribe, sse_unsubscribe, _clarify_sse_notify, and submit_pending directly; none of them invoke the route handler. The deadlock only manifests when a real EventSource opens the connection.

Reproduced with a small harness that holds _lock and calls get_pending: the worker thread is still blocked after a 2s timeout. With the fix, both the empty-queue and populated-queue cases complete in <1ms.

Fix: read clarify._gateway_queues / clarify._pending inline under the same _lock acquisition, mirroring the approval SSE handler's pattern at api/routes.py:2785-2793. There is no recursive lock acquisition, and the head-of-queue snapshot is identical to what get_pending would have returned.

Added tests/test_pr1355_sse_handler_no_deadlock.py with three tests:
- behavioural: the empty-queue snapshot completes within 2s
- behavioural: the populated-queue snapshot returns the head entry
- source-level invariant: routes.py must not call get_clarify_pending() inside a `with _clarify_lock:` block (locks the regression in)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
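A minimal Python sketch of the deadlock and the fix described in this commit, under stated assumptions. ClarifyState, snapshot_buggy, snapshot_fixed and completes_within are hypothetical names for illustration only. The shape of _gateway_queues (session id to a deque of entries) and _pending (session id to a single entry) is also an assumption and not taken from the real clarify module. The harness follows the approach in the commit message: run the snapshot in a worker thread and check whether it finishes within a timeout.

```python
import threading
from collections import deque


class ClarifyState:
    """Hypothetical stand-in for the clarify module's shared state.

    Assumption: _gateway_queues maps sid -> deque of pending entries and
    _pending maps sid -> one entry. The real module may differ.
    """

    def __init__(self):
        self._lock = threading.Lock()  # non-reentrant, as in the commit
        self._gateway_queues = {}
        self._pending = {}

    def get_pending(self, sid):
        # Like clarify.get_pending(sid): acquires _lock internally.
        with self._lock:
            queue = self._gateway_queues.get(sid)
            return queue[0] if queue else self._pending.get(sid)


def snapshot_buggy(state, sid):
    # Pre-fix pattern: hold _lock, then call get_pending, which tries to
    # acquire the same non-reentrant Lock and blocks forever.
    with state._lock:
        return state.get_pending(sid)


def snapshot_fixed(state, sid):
    # Post-fix pattern: read the structures inline under a single
    # acquisition; the result matches what get_pending would return.
    with state._lock:
        queue = state._gateway_queues.get(sid)
        return queue[0] if queue else state._pending.get(sid)


def completes_within(fn, timeout):
    """Run fn in a daemon worker thread; return (finished, result)."""
    result = {}
    worker = threading.Thread(
        target=lambda: result.setdefault("value", fn()), daemon=True
    )
    worker.start()
    worker.join(timeout)
    return (not worker.is_alive(), result.get("value"))


if __name__ == "__main__":
    # Each demo uses its own state: the buggy worker never releases its lock.
    buggy_state = ClarifyState()
    done, _ = completes_within(lambda: snapshot_buggy(buggy_state, "s1"), 1.0)
    print("buggy finished:", done)  # prints "buggy finished: False"

    fixed_state = ClarifyState()
    fixed_state._gateway_queues["s1"] = deque([{"question": "Proceed?"}])
    done, head = completes_within(lambda: snapshot_fixed(fixed_state, "s1"), 1.0)
    print("fixed finished:", done, head)
```

The worker is a daemon thread, so the stuck buggy case does not keep the interpreter alive at exit. Swapping threading.Lock for threading.RLock would also stop the hang, but reading the structures inline, as the commit does, keeps the lock non-reentrant and matches the approval handler.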
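The source-level invariant in this commit (no get_clarify_pending() call inside a `with _clarify_lock:` block) could be checked with Python's ast module. This is a hypothetical sketch, not the actual contents of tests/test_pr1355_sse_handler_no_deadlock.py. The helper name calls_under_lock is invented for illustration, and the sample sources below are toy snippets rather than real routes.py code.

```python
import ast


def _name_of(node):
    # Matches both bare names (foo) and attribute access (mod.foo).
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return None


def calls_under_lock(source, lock_name, func_name):
    """Return sorted line numbers where func_name(...) is called
    lexically inside a `with lock_name:` block in source."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.With, ast.AsyncWith)):
            continue
        if not any(_name_of(item.context_expr) == lock_name for item in node.items):
            continue
        for stmt in node.body:
            for inner in ast.walk(stmt):
                if isinstance(inner, ast.Call) and _name_of(inner.func) == func_name:
                    hits.add(inner.lineno)
    return sorted(hits)


BUGGY = '''
def handler(sid):
    with _clarify_lock:
        head = get_clarify_pending(sid)
    return head
'''

FIXED = '''
def handler(sid):
    with _clarify_lock:
        q = clarify._gateway_queues.get(sid)
        head = q[0] if q else clarify._pending.get(sid)
    return head
'''

if __name__ == "__main__":
    print(calls_under_lock(BUGGY, "_clarify_lock", "get_clarify_pending"))  # [4]
    print(calls_under_lock(FIXED, "_clarify_lock", "get_clarify_pending"))  # []
```

Walking the AST instead of grepping means comments and string literals that mention get_clarify_pending are ignored. The check is purely lexical, so it would not catch an indirect call through a helper function; the behavioural tests cover that case.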