Skip to content

fix(session-codec): validate session ID format in serialize/deserialize#116

Open
LucidPaths wants to merge 1 commit into
NousResearch:mainfrom
LucidPaths:fix/session-codec-format-validation
Open

fix(session-codec): validate session ID format in serialize/deserialize#116
LucidPaths wants to merge 1 commit into
NousResearch:mainfrom
LucidPaths:fix/session-codec-format-validation

Conversation

@LucidPaths
Copy link
Copy Markdown

What does this PR do?

Adds format validation to the sessionCodec in src/server/index.ts. The codec currently accepts any non-empty string as a session ID — including the word "from", which is false-captured from Hermes error output by SESSION_ID_REGEX_LEGACY. Once stored, the corrupted value poisons --resume on every subsequent run, creating an unrecoverable agent stall.

This is a defense-in-depth complement to #112 (regex anchoring). Even if a future output format fools the regex, the codec rejects invalid values at the storage boundary before they reach the database.

Related Issue

Complements #112 — that PR anchors the regex to prevent the false capture; this PR validates at the codec layer so no garbage can be persisted regardless of how it was parsed.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • src/server/index.ts — Added HERMES_SESSION_ID_RE (/^\d{8}_\d{6}_[a-f0-9]+$/) and readValidSessionId() helper. Both serialize() and deserialize() now validate session IDs match the Hermes YYYYMMDD_HHMMSS_hex format before accepting them. Non-matching values (e.g. "from", truncated IDs like "20260513_091310_", empty strings) return null.
  • src/server/session-codec.test.mjs — 10 regression tests covering both serialize and deserialize: valid IDs accepted, "from" rejected, truncated IDs rejected, empty/null handling, snake_case key fallback.

How to Test

node --test src/server/session-codec.test.mjs

All 10 tests pass. To reproduce the original bug without this fix:

// The legacy regex false-captures "from" from error help text
"Use a session ID from a previous CLI run"
  .match(/session[_ ](?:id|saved)[:\s]+([a-zA-Z0-9_-]+)/i)?.[1]
// → "from"

// Without codec validation, this gets stored and poisons --resume:
sessionCodec.serialize({ sessionId: "from" })
// Before: { sessionId: "from" }  ← stored, breaks next run
// After:  null                    ← rejected at boundary

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(session-codec):)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've added tests for my changes (10 test cases in session-codec.test.mjs)
  • I've tested on my platform: Scaleway Kapsule (K8s v1.34.6, Node.js v24.15.0, Paperclip v2026.428.0)
  • tsc --noEmit passes with zero errors

Documentation & Housekeeping

  • No config keys added/changed — N/A
  • No architecture changes — N/A
  • Cross-platform: pure regex + string validation, no platform-specific code — N/A

Context

Discovered in production on a Paperclip instance running 7 Hermes agents (4 claude_local + 3 hermes_local). When any Hermes run failed, the legacy regex captured "from" from the error help text "Use a session ID from a previous CLI run". The codec stored it without validation. Every subsequent run tried --resume from, failed instantly with "Session not found: from", re-captured "from" from the new error output, and the loop never broke. Two of three Hermes agents were stuck in this state. Claude agents on the same instance were unaffected (different adapter, different codec).

The regex anchoring in #112 prevents the specific false capture. This codec validation prevents the class of corruption — any non-Hermes-format string is rejected before it can poison stored state.

The sessionCodec accepts any non-empty string as a session ID. When
parseHermesOutput false-captures the word "from" from the error message
"Use a session ID from a previous CLI run", the codec stores it without
question. The corrupted value then poisons --resume on every subsequent
run, creating an unrecoverable stall.

Add HERMES_SESSION_ID_RE (/^\d{8}_\d{6}_[a-f0-9]+$/) validation to
readValidSessionId() — used by both serialize() and deserialize(). Any
value that does not match the Hermes YYYYMMDD_HHMMSS_hex format is
rejected at the storage boundary, regardless of how it was parsed.

Complements NousResearch#112 (regex anchoring in parseHermesOutput) as a
defense-in-depth layer: even if a future output format fools the regex,
the codec rejects it before it reaches the database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@iyurinok
Copy link
Copy Markdown

This is a good defense-in-depth layer and should land with the parser fix, not instead of it.

I reproduced the current failure class locally from Paperclip hermes_local runs:

  • session_id can be poisoned to from by parser output from Hermes' Use a session ID from... help text.
  • A truncated display-ish id like 20260513_144718_ can also be passed back into resume and fail even though the full Hermes session exists.

This PR blocks invalid values at the codec boundary, which is exactly the right invariant for adapter runtime state. I would recommend adding/confirming tests for both invalid examples:

  • from
  • 20260513_144718_

Then merge alongside #83 or after #112; otherwise one future parser regression can re-poison durable runtime state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants