fix(prompts): force valid_at fallback to REFERENCE_TIME #1492

Open
brentkearney wants to merge 1 commit into getzep:main from brentkearney:feat/extract-edges-prompt-no-today

@brentkearney brentkearney commented May 15, 2026

Resolves part of #1489 (Gap 3).

Summary

The fact-extraction prompt offers the LLM an escape clause — "Leave both fields null if no explicit or resolvable time is stated" — and gpt-4.1 / gpt-5-class models reliably read it as permission to guess. Empirically they choose today @ 00:00 UTC instead of falling back to the REFERENCE_TIME the prompt also provides.

The plumbing is correct: Graphiti.add_episode(reference_time=…) sets episode.valid_at, and extract_edges already passes latest_episode.valid_at into the prompt context as REFERENCE_TIME (graphiti_core/utils/maintenance/edge_operations.py:195). The bug is prompt-shape — the rules contradict each other and the LLM picks the loosest interpretation.

This PR tightens three prompts (edge, extract_timestamps, extract_timestamps_batch) to remove the "leave null" escape for valid_at, mandate REFERENCE_TIME as the fallback, and explicitly prohibit substituting "today" / "now" / any inferred current time.

Why this matters

Graphiti's bi-temporal model is the differentiator that lets agents reason about what was true when. That guarantee silently breaks for any caller that backfills historical data (memory consolidators, migration scripts, ingesting old logs/conversations) — every fact extracted from an ambiguous past-tense sentence collapses to "valid as of today," erasing the temporal axis.

Scale observed in production

On a graph with 1384 edges extracted from 264 historical-backfill episodes (all called with explicit historical reference_time):

| Field | Pattern | Count | % |
|---|---|---|---|
| valid_at | Today midnight UTC (LLM hallucination) | 771 | 56% |
| valid_at | Legitimate in-text date | 610 | 44% |
| valid_at | null | 3 | 0.2% |
| invalid_at | null (correct) | 1016 | 73% |
| invalid_at | Today midnight UTC (LLM hallucination) | 100 | 7% |
| invalid_at | Today with microseconds (graphiti's auto-invalidation chain, legitimate) | 60 | 4% |
| invalid_at | Legitimate in-text date | 208 | 15% |

The hallucination signature is clean midnight UTC on the same date as created_at (hour=minute=second=nanosecond=0). Microsecond-precision today values are NOT hallucinations — they're graphiti's own supersede-chain timestamps and are preserved.
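The signature described above can be expressed as a small predicate. This is a minimal sketch, not code from graphiti: the function names `is_clean_midnight_utc` and `looks_hallucinated` are hypothetical, the inputs are assumed to be timezone-aware `datetime` objects, and because Python's `datetime` stops at microsecond precision the "nanosecond = 0" condition is approximated by `microsecond == 0`.

```python
from datetime import datetime, timezone
from typing import Optional


def is_clean_midnight_utc(ts: datetime) -> bool:
    """Return True when ts is exactly 00:00:00.000000 in UTC.

    Assumes ts is timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    ts_utc = ts.astimezone(timezone.utc)
    return (ts_utc.hour, ts_utc.minute, ts_utc.second, ts_utc.microsecond) == (0, 0, 0, 0)


def looks_hallucinated(ts: Optional[datetime], created_at: datetime) -> bool:
    """Match the signature from the PR: clean midnight UTC on created_at's date.

    Hypothetical helper: null values and microsecond-precision
    timestamps (the supersede chain) are never flagged.
    """
    if ts is None:
        return False
    same_day = ts.astimezone(timezone.utc).date() == created_at.astimezone(timezone.utc).date()
    return same_day and is_clean_midnight_utc(ts)
```

A timestamp such as `2026-05-15T00:00:00Z` on an edge created on 2026-05-15 would match, while a same-day value carrying microseconds or a midnight value on a different date would not.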

Changes

Single file, three prompts, prompt-only:

edge prompt — DATETIME RULES section:

 # DATETIME RULES

 - Use ISO 8601 with "Z" suffix (UTC) (e.g., 2025-04-30T00:00:00Z).
-- If the fact is ongoing (present tense), set `valid_at` to the timestamp of the episode the fact originates from. If no per-episode timestamp is available, use REFERENCE_TIME.
-- If a change/termination is expressed, set `invalid_at` to the relevant timestamp.
-- Leave both fields `null` if no explicit or resolvable time is stated.
+- `valid_at` MUST always be set. Resolution order:
+  1. An explicit or resolvable date in the source text (resolved against the originating episode's timestamp, or REFERENCE_TIME if no per-episode timestamp).
+  2. If the text contains no explicit/resolvable date, fall back to the originating episode's per-episode timestamp, or REFERENCE_TIME if no per-episode timestamp is available.
+- `invalid_at`: set when the text explicitly states the fact ended or was superseded; otherwise leave `null`.
+- NEVER use today's date, the current date, "now", or any inferred "current" time as `valid_at` or `invalid_at`. REFERENCE_TIME is the ONLY acceptable default — do not substitute your own notion of the present.
 - If only a date is mentioned (no time), assume 00:00:00.
 - If only a year is mentioned, use January 1st at 00:00:00.

extract_timestamps and extract_timestamps_batch — same loophole, same fix (see diff).
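To make the new resolution order concrete, here is a minimal sketch of the same rule written as ordinary Python. It only illustrates what the prompt now asks the LLM to do and is not part of this PR's diff; `resolve_valid_at` and `to_iso_z` are hypothetical names, and all inputs are assumed to be timezone-aware.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_valid_at(
    text_date: Optional[datetime],
    episode_timestamp: Optional[datetime],
    reference_time: datetime,
) -> datetime:
    """Apply the prompt's resolution order for valid_at.

    1. A date stated in (or resolved from) the source text.
    2. Otherwise the originating episode's timestamp.
    3. Otherwise REFERENCE_TIME.
    The function never consults the wall clock, mirroring the
    "NEVER use today's date" rule.
    """
    if text_date is not None:
        return text_date
    if episode_timestamp is not None:
        return episode_timestamp
    return reference_time


def to_iso_z(ts: datetime) -> str:
    """Format as ISO 8601 with a "Z" suffix, as the prompt requires."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

With no in-text date and no per-episode timestamp, a `reference_time` of 2024-01-15 12:00 UTC resolves to `2024-01-15T12:00:00Z`, which is the value the manual verification in this PR expects.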

Design notes

REFERENCE_TIME is always populated. Graphiti.add_episode takes reference_time: datetime as a required parameter; extract_edges derives REFERENCE_TIME from episode.valid_at (edge_operations.py:195). There is no code path where the prompt sees a missing REFERENCE_TIME, so making it the mandatory fallback is always satisfiable.

Backwards compatibility for callers that don't supply a meaningful reference_time. If a caller passes datetime.now() (the common case for "ingest a thing happening right now"), REFERENCE_TIME is today, and the new rules cause valid_at to land on today's actual timestamp instead of null or today-midnight. That is the same date the LLM was already guessing, but anchored on a real value instead of an invented one, so these callers see no regression.

invalid_at stays null-by-default. The fix only mandates a fallback for valid_at. invalid_at is still set only when the source text explicitly states a fact ended — and the "NEVER today" prohibition still applies to it.

Code logic and schema untouched. Prompt-only diff. The Pydantic schemas (Edge.valid_at: str | None) are unchanged; the LLM is still free to return null for invalid_at. Tests for extract_edges are unaffected because they don't assert on prompt text.

Verification

  • ruff check + ruff format --check clean.
  • Existing tests/utils/maintenance/test_edge_operations.py passes (8/8).
  • Behavioral verification requires an LLM round-trip. Manually verified on a self-hosted deployment: ingesting an episode with reference_time = 2024-01-15T12:00:00Z and body "Brent installed Plex on minimind" (a past-tense fact with no in-text date) now produces an extracted edge with valid_at = 2024-01-15T12:00:00Z instead of today-midnight.
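The manual check in the last bullet could be scripted once the extracted edge is in hand. The sketch below is an assumption about how one might do that, not an existing test in the repository: `parse_iso_z` and `fell_back_to_reference` are hypothetical helpers, and the `valid_at` string is assumed to be the ISO 8601 "Z"-suffixed value the prompt asks for.

```python
from datetime import datetime, timezone


def parse_iso_z(value: str) -> datetime:
    """Parse an ISO 8601 timestamp that may end in "Z".

    datetime.fromisoformat only accepts a trailing "Z" from Python 3.11
    onwards, so it is rewritten to an explicit UTC offset first.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def fell_back_to_reference(valid_at: str, reference_time: datetime) -> bool:
    """True when the extracted valid_at equals the episode's reference_time."""
    return parse_iso_z(valid_at) == reference_time


reference_time = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
# Expected after this PR for "Brent installed Plex on minimind":
ok = fell_back_to_reference("2024-01-15T12:00:00Z", reference_time)
# The pre-fix behaviour (a clean midnight on the ingestion day) fails the check:
bad = fell_back_to_reference("2026-05-15T00:00:00Z", reference_time)
```

Because the comparison is between timezone-aware values, an equivalent timestamp written with a different UTC offset would still count as a match.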

Notes for operators with existing bad data

Repairing rows already affected by this bug is per-operator and out of scope for this PR. The signature is narrow enough to fix with two Cypher statements (see "Notes for operators" in #1489 for the full pattern):

  • Set r.valid_at = episode.valid_at for edges whose valid_at is clean-midnight on the same date as created_at.
  • Set r.invalid_at = NULL for edges whose invalid_at matches the same clean-midnight signature.

The clean-midnight precision filter is important — a looser date()-only filter overwrites legitimate same-day text extractions. On the production graph used to characterize this, the tight filter spared 7 legitimate same-day extractions from the loose filter's blast radius.
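The difference between the tight and loose filters can be illustrated on in-memory rows. This is a sketch under stated assumptions, not the Cypher from #1489: edges are modelled as plain dicts with timezone-aware `created_at`, `valid_at`, and `invalid_at` values, and `loose_match`, `tight_match`, and `repair_edge` are hypothetical names.

```python
from datetime import datetime, time, timezone
from typing import Optional

UTC = timezone.utc


def loose_match(ts: Optional[datetime], created_at: datetime) -> bool:
    """Loose filter: same calendar day (UTC) as created_at."""
    if ts is None:
        return False
    return ts.astimezone(UTC).date() == created_at.astimezone(UTC).date()


def tight_match(ts: Optional[datetime], created_at: datetime) -> bool:
    """Tight filter: same day AND exactly 00:00:00.000000 UTC."""
    return loose_match(ts, created_at) and ts.astimezone(UTC).time() == time(0, 0)


def repair_edge(edge: dict, episode_valid_at: datetime) -> dict:
    """Return a repaired copy of edge using the tight filter only."""
    fixed = dict(edge)
    if tight_match(edge["valid_at"], edge["created_at"]):
        fixed["valid_at"] = episode_valid_at  # restore the episode's timestamp
    if tight_match(edge["invalid_at"], edge["created_at"]):
        fixed["invalid_at"] = None  # drop the invented end date
    return fixed


created = datetime(2026, 5, 15, 9, 30, 12, 123456, tzinfo=UTC)
episode_valid_at = datetime(2026, 3, 28, 0, 0, tzinfo=UTC)

hallucinated = {"created_at": created,
                "valid_at": datetime(2026, 5, 15, tzinfo=UTC),
                "invalid_at": None}
legit_same_day = {"created_at": created,
                  "valid_at": datetime(2026, 5, 15, 14, 30, tzinfo=UTC),
                  "invalid_at": None}

repaired = repair_edge(hallucinated, episode_valid_at)
untouched = repair_edge(legit_same_day, episode_valid_at)
```

In this example the loose filter matches both rows, while the tight filter matches only the clean-midnight one, so the same-day 14:30 extraction keeps its original `valid_at`.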

Related PRs

This is PR 3 of 3 from #1489. The companion PRs:

The three PRs are independent of each other and can be reviewed in any order.

Edge fact extraction was hallucinating today's date for past-tense
events that lack an explicit in-text date. The prompt offered the LLM
an escape hatch ("Leave both fields null if no explicit or resolvable
time is stated"), and gpt-4.1-mini/gpt-5-mini-class models consistently
chose to invent today's midnight UTC instead of falling back to the
REFERENCE_TIME the prompt provides.

This is observable in the wild: an episode created with reference_time
= 2026-03-28 produces edges whose valid_at lands at 2026-05-15T00:00Z
(today midnight) for any fact whose source sentence doesn't name a
date — breaking bi-temporal invalidation for historical backfill.

Tightens three prompts (edge, extract_timestamps,
extract_timestamps_batch):

- Remove the "leave null" escape hatch.
- Make REFERENCE_TIME the mandatory fallback for valid_at.
- Add explicit prohibition: NEVER use today's date, "now", or any
  inferred "current" time.

invalid_at semantics are unchanged: still null unless the text states
the fact ended.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>