Harden Mimir subprocess lifecycle (deadlock, restart, timeout, key collisions) #6

Merged
tcconnally merged 2 commits into master from fix/subprocess-lifecycle
Jun 12, 2026

@tcconnally

Collaborator

Fixes the four subprocess lifecycle failure modes from the 2026-06-12 deep-dive review (A-1 through A-4) and closes the test/CI gap it flagged (A-5).

Fixes

  • A-1 (stderr deadlock): Mimir was spawned with stderr=PIPE that nothing drained; once the OS pipe buffer filled, Mimir blocked on its next stderr write and the plugin hung forever. Now stderr=DEVNULL.
  • A-2 (no recovery when Mimir dies): previously a failed startup left _mimir_proc = None, and every invocation leaked AttributeError: 'NoneType' object has no attribute 'stdin'; after a mid-session death, every call raised with no restart attempt. Now _mcp_send lazily restarts Mimir once, and if that fails the agent gets a clean JSON-RPC error: Memory backend unavailable — ….
  • A-3 (unbounded read): _mcp_send blocked indefinitely on readline(). A reader thread now feeds a queue, and reads enforce a deadline (RECALL_MCP_TIMEOUT, default 30s), so a stuck call raises instead of freezing every memory tool.
  • A-4 (silent key collisions): auto-generated keys were the first ~50 normalized chars of the fact, so distinct facts sharing a prefix overwrote each other (Mimir updates on key match). Keys now end with an 8-char SHA-256 content hash.
  • A-5 (no tests/CI): added 15 unit tests covering the Executa protocol seam (handle()), key generation, timeout/restart machinery, and the reader thread; no Mimir binary is required. GitHub Actions workflow added.
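
For readers who want the shape of the A-1 and A-3 fixes, here is a minimal sketch of the general pattern, not the code in this PR. The helper names (spawn_child, start_reader, send) and the echo child standing in for Mimir are hypothetical, and reading RECALL_MCP_TIMEOUT from the environment is an assumption about how that setting is supplied.

```python
import os
import queue
import subprocess
import sys
import threading

# Assumption: the timeout is configured through an environment variable.
DEFAULT_TIMEOUT = float(os.environ.get("RECALL_MCP_TIMEOUT", "30"))

# Hypothetical stand-in for the Mimir binary: echoes each stdin line back.
ECHO_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)


def spawn_child(code=ECHO_CHILD):
    """Start the child with stderr discarded so an undrained pipe cannot fill."""
    return subprocess.Popen(
        [sys.executable, "-c", code],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # nothing reads stderr, so do not use PIPE
        text=True,
        bufsize=1,
    )


def start_reader(proc):
    """Pump the child's stdout into a queue from a daemon thread."""
    lines = queue.Queue()

    def pump():
        for line in iter(proc.stdout.readline, ""):
            lines.put(line)
        lines.put(None)  # sentinel: the child closed stdout

    threading.Thread(target=pump, daemon=True).start()
    return lines


def send(proc, lines, message, timeout=DEFAULT_TIMEOUT):
    """Write one line and wait at most `timeout` seconds for one reply line."""
    proc.stdin.write(message + "\n")
    proc.stdin.flush()
    try:
        reply = lines.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no reply within {timeout}s") from None
    if reply is None:
        raise RuntimeError("backend exited before replying")
    return reply.rstrip("\n")
```

The blocking readline() still exists, but it lives in the daemon thread; the caller only ever blocks on queue.Queue.get, which accepts a timeout.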

Verification

$ python -m pytest tests/ -q
15 passed in 0.24s
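
As an illustration of what a key-generation check for A-4 could look like, here is a hedged sketch. The function name make_key, the normalization rule, and the underscore separator are assumptions made for the example; only the roughly 50-char prefix and the 8-char SHA-256 suffix come from the description above.

```python
import hashlib
import re


def make_key(fact: str, prefix_len: int = 50) -> str:
    """Build a key from a normalized prefix plus a short content hash."""
    # Assumed normalization: lowercase, non-alphanumeric runs become "_".
    normalized = re.sub(r"[^a-z0-9]+", "_", fact.lower()).strip("_")
    # The hash covers the whole fact, so differences past the prefix still count.
    digest = hashlib.sha256(fact.encode("utf-8")).hexdigest()[:8]
    return f"{normalized[:prefix_len]}_{digest}"


def test_shared_prefix_does_not_collide():
    shared = "the deployment pipeline for the staging cluster runs nightly and "
    first = make_key(shared + "uses blue-green rollout")
    second = make_key(shared + "uses canary rollout")
    assert first[:50] == second[:50]  # prefix-only keys would have collided
    assert first != second


def test_same_fact_is_stable():
    assert make_key("Mimir stores facts") == make_key("Mimir stores facts")
```

Because the suffix is derived from content rather than a counter or timestamp, storing the same fact twice still maps to the same key, so intentional updates keep working.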

🤖 Generated with Claude Code

tcconnally and others added 2 commits June 12, 2026 13:36
- stderr=DEVNULL (was PIPE, never drained): prevents pipe-buffer
  deadlock that silently hung the plugin on long sessions
- reader thread + queue with deadline: _mcp_send now times out
  (RECALL_MCP_TIMEOUT, default 30s) instead of blocking forever
- lazy single restart when mimir died; clean "Memory backend
  unavailable" JSON-RPC error instead of leaked AttributeError
- content-hash suffix on auto-generated keys: distinct facts with a
  shared 50-char prefix no longer silently overwrite each other
- guarded shutdown wait (TimeoutExpired -> kill)
- tests: 15 unit tests over the protocol seam + lifecycle machinery
- CI: GitHub Actions pytest workflow

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
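
The "guarded shutdown wait (TimeoutExpired -> kill)" item follows a common subprocess pattern. A minimal sketch is below; stop_process and the grace value are hypothetical names chosen for the example, not identifiers from this PR.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace: float = 2.0) -> int:
    """Ask the child to exit, then force-kill it if it ignores the request."""
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to do
    proc.terminate()
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()  # reap the child so no zombie is left behind


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(child))
```

The final wait() after kill() matters: without it the shutdown path can return while the child is still unreaped.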