import-jsonl (and live capture) hangs indefinitely on observations containing multibyte/CJK text — request never completes, no CPU activity
Summary
A single observation whose content contains CJK (Korean) text causes the import/ingestion request to stall forever (the CLI gives up at its hard-coded 2-minute AbortSignal.timeout(12e4)). During the stall both the Node worker and the native iii engine are completely idle — this is not a CPU/regex blowup; it is a deadlock / lost-response stall. Removing the CJK characters from the offending line makes the exact same import complete in ~1 second.
- Version:
@agentmemory/agentmemory v0.9.27, iii-engine v0.11.2 (pinned)
- Platform: macOS (Darwin arm64), Node v24.16.0
- Provider: no LLM key (no-op provider), local embeddings enabled
Impact
agentmemory import-jsonl cannot ingest transcripts that contain such a line (the whole file/chunk aborts at 2 min).
- More importantly, the same ingestion path is used by the live auto-capture hooks, so a normal session containing such content stalls capture of that observation for ~120s before the hook's own timeout fires and drops it.
Reproduction
The triggering line is an assistant tool_use (Bash) whose command contains Korean text mixed with an auth-header/token pattern, e.g.:
# 한글 주석: Python으로 JSON 생성 후 curl 전송 테스트 (Korean comment)
export MY_BOT_TOKEN=$(grep MY_BOT_TOKEN ~/.env | cut -d= -f2)
JSON_PAYLOAD=$(python3 -c "
import json
message = ('🤖 **작업 완료 알림** ...\n' ...) # contains Hangul + emoji
print(json.dumps({'content': message}))
")
HTTP_CODE=$(curl -s ... -H "Authorization: Bot ${MY_BOT_TOKEN}" ...)
Note: the actual reproduction only requires an assistant tool_use whose command
string mixes Korean (Hangul) text with a typical auth-header/token pattern. The
exact tool/service is irrelevant; values above are anonymized.
Steps:
- Put that single JSONL line in
one.jsonl.
agentmemory import-jsonl one.jsonl → hangs, fails with import timed out after 2 minutes.
- Strip the Korean (Hangul) characters from
command only → re-run → imported 1 file(s), N observation(s) in ~1s.
The line is small (~2.3 KB; longest unbroken string ~59 chars), so it is not a size issue.
What I ruled out (with isolated timing)
- Secret-redaction regexes (
SECRET_PATTERN_SOURCES / stripPrivateData): all 14 patterns run in 0 ms on the line.
- BM25 tokenizer (
SearchIndex.tokenize → segmentCjk/segmentHangul/stem): reproduced in isolation, 0–1 ms.
@node-rs/jieba dict load: ~70 ms (and the line is Hangul, not Han, so jieba isn't even on the path).
LESSON_PATTERNS in deriveCrystalAndLessons: 0 ms, and they don't run for a tool_use-only line (empty toText).
Profiling evidence (the key finding)
sample taken while the import is actually stalled (using a correctly-extracted line, not a shell-mangled one):
- Node worker: main thread in
uv__io_poll / kevent (waiting on I/O); V8 workers idle. Essentially no JS executing.
- iii engine: all
tokio-rt-worker threads in park_internal / wait_for_task; only ~10 samples in hyper keeping the HTTP connection open.
So the request is dispatched, the HTTP connection is held open, but neither process does any work — it waits forever. Combined with the fact that removing CJK characters fixes it, this points to a UTF-8 / message-framing bug in the worker ↔ native-engine RPC (a multibyte char landing on a buffer/chunk boundary corrupting framing → response never delivered → deadlock), rather than anything CPU-bound.
Suggested directions
- Audit the worker↔engine message (de)serialization / chunking for non-ASCII (multibyte UTF-8) payload handling.
- As a safety net, give the server-side ingestion of a single observation its own timeout so one stalled observation can't hang the whole request indefinitely.
- Consider making the CLI's 2-minute timeout configurable (currently hard-coded
AbortSignal.timeout(12e4) in cli.mjs).
Workaround for users
Import transcripts file-by-file; if one file stalls, bisect it down to the offending line (10-line → 1-line chunks) and drop/neutralize that line. Re-importing the same path duplicates observations (sessions dedupe by id, observations do not), so always reset the store between attempts.
import-jsonl(and live capture) hangs indefinitely on observations containing multibyte/CJK text — request never completes, no CPU activitySummary
A single observation whose content contains CJK (Korean) text causes the import/ingestion request to stall forever (the CLI gives up at its hard-coded 2-minute
AbortSignal.timeout(12e4)). During the stall both the Node worker and the nativeiiiengine are completely idle — this is not a CPU/regex blowup; it is a deadlock / lost-response stall. Removing the CJK characters from the offending line makes the exact same import complete in ~1 second.@agentmemory/agentmemoryv0.9.27, iii-engine v0.11.2 (pinned)Impact
agentmemory import-jsonlcannot ingest transcripts that contain such a line (the whole file/chunk aborts at 2 min).Reproduction
The triggering line is an
assistanttool_use(Bash) whosecommandcontains Korean text mixed with an auth-header/token pattern, e.g.:Steps:
one.jsonl.agentmemory import-jsonl one.jsonl→ hangs, fails withimport timed out after 2 minutes.commandonly → re-run →imported 1 file(s), N observation(s)in ~1s.The line is small (~2.3 KB; longest unbroken string ~59 chars), so it is not a size issue.
What I ruled out (with isolated timing)
SECRET_PATTERN_SOURCES/stripPrivateData): all 14 patterns run in 0 ms on the line.SearchIndex.tokenize→segmentCjk/segmentHangul/stem): reproduced in isolation, 0–1 ms.@node-rs/jiebadict load: ~70 ms (and the line is Hangul, not Han, so jieba isn't even on the path).LESSON_PATTERNSinderiveCrystalAndLessons: 0 ms, and they don't run for atool_use-only line (emptytoText).Profiling evidence (the key finding)
sampletaken while the import is actually stalled (using a correctly-extracted line, not a shell-mangled one):uv__io_poll/kevent(waiting on I/O); V8 workers idle. Essentially no JS executing.tokio-rt-workerthreads inpark_internal/wait_for_task; only ~10 samples inhyperkeeping the HTTP connection open.So the request is dispatched, the HTTP connection is held open, but neither process does any work — it waits forever. Combined with the fact that removing CJK characters fixes it, this points to a UTF-8 / message-framing bug in the worker ↔ native-engine RPC (a multibyte char landing on a buffer/chunk boundary corrupting framing → response never delivered → deadlock), rather than anything CPU-bound.
Suggested directions
AbortSignal.timeout(12e4)incli.mjs).Workaround for users
Import transcripts file-by-file; if one file stalls, bisect it down to the offending line (10-line → 1-line chunks) and drop/neutralize that line. Re-importing the same path duplicates observations (sessions dedupe by id, observations do not), so always reset the store between attempts.