Files
hermes-webui/api
nesquena-hermes 1d9a0cbba1 fix(P0 #1557): metadata-only Session.save() was wiping conversation history
v0.50.279 introduced api.routes._clear_stale_stream_state() (#1525) which
calls session.save() to clear stale active_stream_id/pending_* fields. The
helper is called from /api/session and /api/session/status — both of which
load the session with metadata_only=True. Session.load_metadata_only()
synthesizes a stub with messages=[] (its whole purpose: fast metadata read
without parsing the 400KB+ messages array). Session.save() unconditionally
writes self.messages to disk via os.replace(), so saving a metadata-only
stub atomically overwrites the on-disk JSON with messages=[], wiping the
entire conversation.

Production trigger: every SSE reconnect cycle after a server restart polls
/api/session/status, which fans out to _clear_stale_stream_state, which
saves the metadata-only stub. The user reported losing 1000+ message
conversations and seeing 'Reconnecting…' loops on every prompt — the
reconnect loop kept the cycle running until the conversation was empty.

Fix: three layers, defense in depth.

(1) api/models.py: load_metadata_only() now sets _loaded_metadata_only=True
    on the returned stub. Session.save() raises RuntimeError if that flag
    is set — a hard guard so any future caller making the same mistake
    cannot wipe data, only crash visibly.

(2) api/routes.py: _clear_stale_stream_state() now detects the metadata-only
    flag and re-loads the full session with metadata_only=False before
    mutating persisted state. The full-load path also runs
    _repair_stale_pending() which independently clears the stream flags,
    so the explicit clear becomes a no-op in most cases — but messages
    stay intact.

(3) api/models.py + api/session_recovery.py: every save() that would
    SHRINK the messages array (the precise failure shape of #1557) first
    snapshots the previous file to <sid>.json.bak. Server.py runs
    recover_all_sessions_on_startup() at boot — any session whose live
    JSON has fewer messages than its .bak is restored automatically.
    Idempotent on clean state. Backup overhead is zero on the normal
    grow-the-conversation path.

Reproducer (master): test_metadata_only_save_does_not_wipe_messages goes
from 1000 messages to 0 in a single save() call. After the fix, 1000
messages survive.

Tests: 6 new regression tests in tests/test_metadata_save_wipe_1557.py
covering all three layers. Full pytest: 4019 → 4025 (+6, all green).

Live verified on port 8789: write 1000-msg session with stale active_stream_id,
hit /api/session/status, /api/session — file ends with 1002 messages
(_repair_stale_pending injects an error-marker pair on full reload, harmless
existing behavior), active_stream_id cleared, pending cleared, no Reconnecting
loop.

Closes #1557.

Reported by AvidFuturist via user feedback on v0.50.282.
2026-05-03 19:45:10 +00:00
..
2026-04-29 19:54:07 -07:00