Files
hermes-webui/api
Hermes Bot 51a87ebdc7 fix(sqlite): close state.db connections explicitly to stop FD leak in sidebar polling (#1494)
Production WebUI on macOS launchd reproduced an HTTP-unhealthy wedge after
#1483 closed the bootstrap supervisor double-fork: process alive, port
listening, every HTTP request reset by peer before a response. The reporter
(@insecurejezza) traced it to FD exhaustion — 366 open FDs on the wedged
process, 238 of them `~/.hermes/state.db`, `state.db-wal`, and `state.db-shm`.

Root cause: four sqlite callsites use `with sqlite3.connect(...) as conn:`.
Python's sqlite3 connection context manager only commits or rolls back on
exit; it does NOT close the connection. `/api/sessions` polling calls these
on every sidebar refresh, so each poll leaked one or more open state.db FDs
until the process hit macOS's soft FD limit and new sqlite3.connect() calls
inside fresh request handlers raised before any response bytes were written.

Fix: wrap each `sqlite3.connect(...)` in `contextlib.closing(...)` so the
connection is explicitly closed on scope exit, in addition to the auto-
commit / rollback semantics that `Connection.__exit__` already provides.

Callsites patched:
- api/agent_sessions.py:read_importable_agent_session_rows
- api/agent_sessions.py:read_session_lineage_metadata
- api/models.py:get_cli_session_messages
- api/models.py:delete_cli_session

Reporter's verification (post-patch, 100-request stress loop against
/api/sessions and /api/projects):

  batch=1 fd=92 state_handles=0
  batch=2 fd=92 state_handles=0
  ...
  batch=5 fd=92 state_handles=0

Pre-patch the same loop made FD count and state.db handle count climb
monotonically.

4 regression tests in tests/test_issue1494_state_db_fd_leak.py monkeypatch
sqlite3.connect with a tracking wrapper that records .close() calls and
assert every connection opened by each of the four functions is explicitly
closed. Verified to fail (catching the original bug) when the closing()
wrap is reverted: "leaked 5 of 5 sqlite connection(s) — context-manager-
only `with sqlite3.connect()` does not close. Wrap in contextlib.closing()."

This addresses Bug #2 of the umbrella issue #1458. Bug #3 (HTTP-unhealthy
wedge in the absence of FD exhaustion) remains open pending separate
diagnostic data — explicit scope discipline.

Closes #1494
Refs #1458 (Bug #2 of 3)

Co-authored-by: insecurejezza <70424851+insecurejezza@users.noreply.github.com>
2026-05-03 01:15:26 +00:00
..
2026-04-29 19:54:07 -07:00