fix: align the LLM's conversational view with the SMS transcript by 1xabhay · Pull Request #37 · wwbp/texet

1xabhay · 2026-06-11T20:23:15Z

Why

Prod utterance eb02e4ed (Bedrock / Llama 4 Maverick) showed the bot repeatedly denying it had chat history it actually had, hallucinating the content of a hub opening it couldn't see, and confusing times across days. Root causes: hub openings were stripped from history (only the first-ever one was injected into the system prompt — stale since #35), failed sends stayed in history, multi-day history had no day boundaries, and the prompt never told the model what it remembers.

What

Five test-gated steps, in deploy-safe order:

Bedrock engine normalization — normalize_converse_messages merges consecutive same-role turns, prepends a [start of conversation] user placeholder for assistant-first history, drops empty/None content. Owns the Converse user-first/alternating constraint at the engine boundary so the transcript layer doesn't have to lie. No-op for current traffic.
Transcript fidelity — history = exactly what crossed SMS: hub openings included as assistant turns ([Opening message] injection and get_opening_message removed), bot messages only when status=sent (failed/queued dropped), moderated excluded on both sides.
Day markers + conventions — first message of each user-local calendar day is prefixed [Tuesday, June 9] (offset from per-utterance user_local_time, UTC fallback); every system prompt ends with a code-owned [Conversation history conventions] section telling the model its context is a real multi-day SMS thread it does remember. texet_generation snapshot version → 2.
Prompt v2 + label — daily section becomes [Today's Activity (Day N)]; docs/prompts/charla-system-prompt-v2.md is the deployable base prompt (memory self-knowledge, SMS constraints, anti-repetition, instruction privacy without amnesia claims) — paste via admin console after this deploys.
Verification — e2e tests through the real Kani round (assistant-first history survives generation; two openings merge correctly for Bedrock), plus scripts/replay_generation.py to diff any stored generation snapshot against current-code context.

Testing

203 tests pass (make test), mypy clean, no new lint errors vs main.
Post-deploy: replay eb02e4ed on prod and inspect the context diff before any prompt/model change.

🤖 Generated with Claude Code

Merge consecutive same-role turns, prepend a placeholder user turn when history starts with an assistant message, and drop empty/None content. Converse requires user-first, strictly alternating, non-empty turns; owning that constraint in the engine lets build_chat_history stay a faithful transcript. No-op for current alternating histories. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The LLM now sees exactly what was exchanged over SMS: hub opening messages (texet_hub_initial) stay in history as assistant turns, bot messages count only once delivered (sent) — dropping failed/queued — and moderated exchanges remain withheld on both sides. Previously only the first-ever opening was injected into the system prompt; since conversations merged to one-per-user (#35) every later daily opening was invisible to the model, which then hallucinated their content or denied having context. Remove the [Opening message] injection and get_opening_message entirely; Bedrock's user-first/ alternating constraint is owned by the engine-boundary normalization. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

History spanning multiple days was an undifferentiated blob: the model could not map 'what we talked about this week' onto its context, and stale time references in old replies contradicted [User's Local Time]. - build_chat_history(annotate_days=True) prefixes the first message of each user-local calendar day with a [Tuesday, June 9] marker; the offset comes from per-utterance user_local_time meta (bot rows via their generation snapshot), backfilled for leading messages, UTC fallback. Only the LLM view is annotated — stored text and exports are untouched, and the moderation-email caller keeps the default. - compose_instruction_prompt always appends a code-owned [Conversation history conventions] section telling the model what its context actually is: a real SMS thread since Sunday, day-marked, with its own openings included and safety-withheld messages absent. - texet_generation snapshot version bumped to 2 (history semantics changed). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

compose_instruction_prompt takes day_number and labels the daily section [Today's Activity (Day N)] so the model can tie the curriculum to the study day. docs/prompts/charla-system-prompt-v2.md is the deployable base prompt (paste via admin console; latest row wins). It adds what v1 lacked: a memory self-knowledge section (the model sees this week's real SMS thread + last week's summary — never deny it, never invent beyond it), usage guidance for the activity/summary sections, SMS length and anti-repetition rules, stale-time handling, and instruction privacy decoupled from memory denial. Also recommends moving off Llama 4 Maverick 17B. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The autouse kani_stub bypassed kani entirely, so nothing proved an assistant-first history survives a real chat round. Two new e2e tests restore the real _generate_reply: one drives a capture engine through the full Kani round (hub opening reaches the engine assistant-first, day-marked, reply persisted and sent), the other drives a stubbed BedrockEngine and asserts two back-to-back openings reach the Converse payload merged behind the placeholder user turn. scripts/replay_generation.py loads a bot utterance's texet_generation snapshot and prints unified diffs of the snapshot system prompt/history vs what current code would build — read-only, for replaying prod generations like eb02e4ed against context changes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Abhay Singh and others added 6 commits June 11, 2026 15:50

docs: clarify prompt v2 deploy ordering (code first, paste after)

574e673

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

1xabhay merged commit 31a5d58 into main Jun 11, 2026
1 check passed

1xabhay deleted the fix/llm-context-fidelity branch June 11, 2026 20:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: align the LLM's conversational view with the SMS transcript#37

fix: align the LLM's conversational view with the SMS transcript#37
1xabhay merged 6 commits into
mainfrom
fix/llm-context-fidelity

1xabhay commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant