fix(prompts): force valid_at fallback to REFERENCE_TIME#1492
Open
brentkearney wants to merge 1 commit into
Edge fact extraction was hallucinating today's date for past-tense
events that lack an explicit in-text date. The prompt offered the LLM
an escape hatch ("Leave both fields null if no explicit or resolvable
time is stated"), and gpt-4.1-mini/gpt-5-mini-class models consistently
chose to invent today's midnight UTC instead of falling back to the
REFERENCE_TIME the prompt provides.
This is observable in the wild: an episode created with reference_time
= 2026-03-28 produces edges whose valid_at lands at 2026-05-15T00:00Z
(today midnight) for any fact whose source sentence doesn't name a
date — breaking bi-temporal invalidation for historical backfill.
Tightens three prompts (edge, extract_timestamps,
extract_timestamps_batch):
- Remove the "leave null" escape hatch.
- Make REFERENCE_TIME the mandatory fallback for valid_at.
- Add explicit prohibition: NEVER use today's date, "now", or any
inferred "current" time.
invalid_at semantics are unchanged: still null unless the text states
the fact ended.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
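The fallback rules in the commit message above can be summarized as a small resolution policy. The sketch below is illustrative only: the actual change lives in prompt text, and `resolve_edge_times`, `stated_valid_at`, and `stated_invalid_at` are hypothetical names that do not exist in graphiti.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def resolve_edge_times(
    reference_time: datetime,
    stated_valid_at: Optional[datetime] = None,
    stated_invalid_at: Optional[datetime] = None,
) -> Tuple[datetime, Optional[datetime]]:
    """Hypothetical helper: return (valid_at, invalid_at) under the tightened rules."""
    # valid_at: a time resolved from the source text wins; otherwise fall back
    # to REFERENCE_TIME. The wall clock (datetime.now()) is never consulted.
    valid_at = stated_valid_at if stated_valid_at is not None else reference_time
    # invalid_at: unchanged semantics, null unless the text says the fact ended.
    return valid_at, stated_invalid_at


# A past-tense fact with no in-text date, ingested during a historical backfill.
reference_time = datetime(2026, 3, 28, tzinfo=timezone.utc)
valid_at, invalid_at = resolve_edge_times(reference_time)
```

With no date in the text, `valid_at` equals the episode's reference time and `invalid_at` stays null, which is the behaviour the prompts are now meant to force.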
Resolves part of #1489 (Gap 3).
Summary
The fact-extraction prompt offers the LLM an escape clause, "Leave both fields null if no explicit or resolvable time is stated", and gpt-4.1 / gpt-5-class models reliably read it as permission to guess. Empirically they choose today @ 00:00 UTC instead of falling back to the `REFERENCE_TIME` the prompt also provides.
The plumbing is correct: `Graphiti.add_episode(reference_time=…)` sets `episode.valid_at`, and `extract_edges` already passes `latest_episode.valid_at` into the prompt context as `REFERENCE_TIME` (`graphiti_core/utils/maintenance/edge_operations.py:195`). The bug is prompt-shape: the rules contradict each other and the LLM picks the loosest interpretation.
This PR tightens three prompts (`edge`, `extract_timestamps`, `extract_timestamps_batch`) to remove the "leave null" escape for `valid_at`, mandate REFERENCE_TIME as the fallback, and explicitly prohibit substituting "today" / "now" / any inferred current time.
Why this matters
Graphiti's bi-temporal model is the differentiator that lets agents reason about what was true when. That guarantee silently breaks for any caller that backfills historical data (memory consolidators, migration scripts, ingesting old logs/conversations) — every fact extracted from an ambiguous past-tense sentence collapses to "valid as of today," erasing the temporal axis.
Scale observed in production
On a graph with 1384 edges extracted from 264 historical-backfill episodes (all called with explicit historical `reference_time`):
[Table: breakdown of affected `valid_at` and `invalid_at` values; cell contents not preserved.]
The hallucination signature is clean midnight UTC on the same date as `created_at` (hour=minute=second=nanosecond=0). Microsecond-precision today values are NOT hallucinations: they're graphiti's own supersede-chain timestamps and are preserved.
Changes
Single file, three prompts, prompt-only:
- `edge` prompt: DATETIME RULES section.
- `extract_timestamps` and `extract_timestamps_batch`: same loophole, same fix (see diff).
Design notes
`REFERENCE_TIME` is always populated. `Graphiti.add_episode` takes `reference_time: datetime` as a required parameter; `extract_edges` derives `REFERENCE_TIME` from `episode.valid_at` (`edge_operations.py:195`). There is no code path where the prompt sees a missing REFERENCE_TIME, so making it the mandatory fallback is always satisfiable.
Backwards compatibility for callers that don't supply a meaningful `reference_time`. If a caller passes today's `datetime.now()` (the common case for "ingest a thing happening right now"), REFERENCE_TIME is today, and the new rules cause `valid_at` to land on today's actual timestamp instead of null or today-midnight. That's the same date the LLM was already guessing, just anchored on a real value instead of an invented one, so these callers see no regression.
`invalid_at` stays null-by-default. The fix only mandates a fallback for `valid_at`. `invalid_at` is still set only when the source text explicitly states a fact ended, and the "NEVER today" prohibition still applies to it.
Code logic and schema untouched. Prompt-only diff. The Pydantic schemas (`Edge.valid_at: str | None`) are unchanged; the LLM is still free to return null for `invalid_at`. Tests for `extract_edges` are unaffected because they don't assert on prompt text.
Verification
- `ruff check` + `ruff format --check` clean.
- `tests/utils/maintenance/test_edge_operations.py` passes (8/8).
- An episode with `reference_time = 2024-01-15T12:00:00Z` and body `"Brent installed Plex on minimind"` (a past-tense fact with no in-text date) now produces an extracted edge with `valid_at = 2024-01-15T12:00:00Z` instead of today-midnight.
Notes for operators with existing bad data
Repairing rows already affected by this bug is per-operator and out of scope for this PR. The signature is narrow enough to fix with two Cypher statements (see "Notes for operators" in #1489 for the full pattern):
- Set `r.valid_at = episode.valid_at` for edges whose `valid_at` is clean-midnight on the same date as `created_at`.
- Set `r.invalid_at = NULL` for edges whose `invalid_at` matches the same clean-midnight signature.
The clean-midnight precision filter is important: a looser `date()`-only filter overwrites legitimate same-day text extractions. On the production graph used to characterize this, the tight filter spared 7 legitimate same-day extractions from the loose filter's blast radius.
Related PRs
This is PR 3 of 3 from #1489. The companion PRs:
The three PRs are independent of each other and can be reviewed in any order.
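For operators auditing existing data, the clean-midnight signature described in the operator notes above can be sketched as a predicate. This is a minimal illustration, not the Cypher from #1489: `is_clean_midnight_same_day` is a hypothetical name, inputs are assumed to be timezone-aware, and Python's `datetime` only resolves to microseconds, so the nanosecond check is approximated.

```python
from datetime import datetime, timezone


def is_clean_midnight_same_day(value: datetime, created_at: datetime) -> bool:
    """Hypothetical check: True if `value` matches the hallucination signature.

    Assumes both arguments are timezone-aware datetimes.
    """
    value_utc = value.astimezone(timezone.utc)
    created_utc = created_at.astimezone(timezone.utc)
    # Tight filter: every sub-day component must be exactly zero in UTC.
    at_midnight = (
        value_utc.hour,
        value_utc.minute,
        value_utc.second,
        value_utc.microsecond,
    ) == (0, 0, 0, 0)
    # ...and the date must be the day the edge row was created.
    return at_midnight and value_utc.date() == created_utc.date()


created_at = datetime(2026, 5, 15, 14, 3, 22, 123456, tzinfo=timezone.utc)
hallucinated = datetime(2026, 5, 15, tzinfo=timezone.utc)
same_day_real = datetime(2026, 5, 15, 9, 30, tzinfo=timezone.utc)

flagged = is_clean_midnight_same_day(hallucinated, created_at)
spared = is_clean_midnight_same_day(same_day_real, created_at)
```

A date-only comparison would flag both example values; requiring exact midnight is what spares same-day timestamps that carry a real time of day.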