Problem
Thread memory persists stale or incorrect context across threads. When the agent hallucinates during one conversation (e.g., calls a web app a "macOS weather app"), that gets saved to thread memory and poisons every future conversation in that channel.
Currently the only fix is manually deleting .shellack/thread-memory/*.md.
Solution
A memory calibration agent (Haiku) that runs on every thread memory load:
- Reads the thread memory
- Compares against the project config (name, language, platform, description from projects.yaml)
- Flags or removes contradictions
- Returns only validated context
Flow
New thread starts → load thread memory → calibration check:
Haiku:
"Does this thread memory match the project config?
Project: Atmos, platform: web, language: typescript
Memory says: macOS app, Swift
→ INVALID. Discard memory, start fresh."
Calibration rules
- If memory references a different platform than config → discard
- If memory references a different language than config → discard
- If memory contains open questions that STATE.md already answers → resolve them
- If memory is older than 24h → flag for review, prefer STATE.md
- If memory aligns with config → pass through
Cost
One Haiku call per new thread (~$0.0005). Only runs when thread memory exists. Skip if no thread memory file found.
Alternative: TTL-based expiry
Simpler approach — thread memory files expire after a configurable TTL (e.g., 24h). Stale files are deleted on load. No Haiku call needed.
Could combine both: TTL for age-based cleanup + calibration for content validation.
Priority
High — this is actively causing bad UX. Every wrong answer gets persisted and repeated.
Problem
Thread memory persists stale or incorrect context across threads. When the agent hallucinates during one conversation (e.g., calls a web app a "macOS weather app"), that gets saved to thread memory and poisons every future conversation in that channel.
Currently the only fix is manually deleting
.shellack/thread-memory/*.md.Solution
A memory calibration agent (Haiku) that runs on every thread memory load:
Flow
Calibration rules
Cost
One Haiku call per new thread (~$0.0005). Only runs when thread memory exists. Skip if no thread memory file found.
Alternative: TTL-based expiry
Simpler approach — thread memory files expire after a configurable TTL (e.g., 24h). Stale files are deleted on load. No Haiku call needed.
Could combine both: TTL for age-based cleanup + calibration for content validation.
Priority
High — this is actively causing bad UX. Every wrong answer gets persisted and repeated.