fix(prompts): force valid_at fallback to REFERENCE_TIME by brentkearney · Pull Request #1492 · getzep/graphiti

brentkearney · 2026-05-15T22:21:30Z

Resolves part of #1489 (Gap 3).

Summary

The fact-extraction prompt offers the LLM an escape clause ("Leave both fields null if no explicit or resolvable time is stated"), and gpt-4.1 / gpt-5-class models reliably treat it as permission to guess. In practice they emit today's date at 00:00 UTC instead of falling back to the REFERENCE_TIME that the prompt also provides.

The plumbing is correct: Graphiti.add_episode(reference_time=…) sets episode.valid_at, and extract_edges already passes latest_episode.valid_at into the prompt context as REFERENCE_TIME (graphiti_core/utils/maintenance/edge_operations.py:195). The bug is prompt-shape — the rules contradict each other and the LLM picks the loosest interpretation.

This PR tightens three prompts (edge, extract_timestamps, extract_timestamps_batch) to remove the "leave null" escape for valid_at, mandate REFERENCE_TIME as the fallback, and explicitly prohibit substituting "today" / "now" / any inferred current time.

Why this matters

Graphiti's bi-temporal model is the differentiator that lets agents reason about what was true when. That guarantee silently breaks for any caller that backfills historical data (memory consolidators, migration scripts, ingesting old logs/conversations) — every fact extracted from an ambiguous past-tense sentence collapses to "valid as of today," erasing the temporal axis.

Scale observed in production

On a graph with 1384 edges extracted from 264 historical-backfill episodes (all called with explicit historical reference_time):

| Field | Pattern | Count | % |
|---|---|---|---|
| `valid_at` | Today midnight UTC (LLM hallucination) | 771 | 56% |
| `valid_at` | Legitimate in-text date | 610 | 44% |
| `valid_at` | null | 3 | 0.2% |
| `invalid_at` | null (correct) | 1016 | 73% |
| `invalid_at` | Today midnight UTC (LLM hallucination) | 100 | 7% |
| `invalid_at` | Today with microseconds (graphiti's auto-invalidation chain; legitimate) | 60 | 4% |
| `invalid_at` | Legitimate in-text date | 208 | 15% |

The hallucination signature is clean midnight UTC on the same date as created_at (hour=minute=second=nanosecond=0). Microsecond-precision today values are NOT hallucinations — they're graphiti's own supersede-chain timestamps and are preserved.
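The signature can be written as a small predicate. This is a minimal sketch, not code from Graphiti: the function name `is_midnight_hallucination` and the sample timestamps are hypothetical, and it assumes both values are timezone-aware `datetime` objects. Python's `datetime` resolves only to microseconds, so any finer-grained check would have to happen in the database.

```python
from datetime import datetime, timezone
from typing import Optional


def is_midnight_hallucination(ts: Optional[datetime], created_at: datetime) -> bool:
    """Return True if ts is exactly 00:00:00.000000 UTC on created_at's UTC date.

    Hypothetical helper; assumes timezone-aware inputs.
    """
    if ts is None:
        return False
    ts_utc = ts.astimezone(timezone.utc)
    created_utc = created_at.astimezone(timezone.utc)
    clean_midnight = (
        ts_utc.hour == 0
        and ts_utc.minute == 0
        and ts_utc.second == 0
        and ts_utc.microsecond == 0
    )
    return clean_midnight and ts_utc.date() == created_utc.date()


# Illustrative values only.
created = datetime(2026, 5, 15, 22, 21, 30, tzinfo=timezone.utc)
guessed = datetime(2026, 5, 15, tzinfo=timezone.utc)  # clean midnight, same day
auto_invalidated = datetime(2026, 5, 15, 22, 21, 30, 123456, tzinfo=timezone.utc)
in_text = datetime(2024, 1, 15, tzinfo=timezone.utc)  # midnight, different day

flags = [
    is_midnight_hallucination(t, created)
    for t in (guessed, auto_invalidated, in_text, None)
]
```

Only the first value is flagged: the microsecond-precision timestamp and the historical in-text date both fall outside the signature.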

Changes

Single file, three prompts, prompt-only:

edge prompt — DATETIME RULES section:

 # DATETIME RULES

 - Use ISO 8601 with "Z" suffix (UTC) (e.g., 2025-04-30T00:00:00Z).
-- If the fact is ongoing (present tense), set `valid_at` to the timestamp of the episode the fact originates from. If no per-episode timestamp is available, use REFERENCE_TIME.
-- If a change/termination is expressed, set `invalid_at` to the relevant timestamp.
-- Leave both fields `null` if no explicit or resolvable time is stated.
+- `valid_at` MUST always be set. Resolution order:
+  1. An explicit or resolvable date in the source text (resolved against the originating episode's timestamp, or REFERENCE_TIME if no per-episode timestamp).
+  2. If the text contains no explicit/resolvable date, fall back to the originating episode's per-episode timestamp, or REFERENCE_TIME if no per-episode timestamp is available.
+- `invalid_at`: set when the text explicitly states the fact ended or was superseded; otherwise leave `null`.
+- NEVER use today's date, the current date, "now", or any inferred "current" time as `valid_at` or `invalid_at`. REFERENCE_TIME is the ONLY acceptable default — do not substitute your own notion of the present.
 - If only a date is mentioned (no time), assume 00:00:00.
 - If only a year is mentioned, use January 1st at 00:00:00.

extract_timestamps and extract_timestamps_batch — same loophole, same fix (see diff).
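The resolution order that the revised DATETIME RULES ask the LLM to follow can be sketched in ordinary code. This is purely illustrative: the PR changes prompt text only, and `resolve_valid_at` and its parameter names are hypothetical, not part of Graphiti.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_valid_at(
    text_date: Optional[datetime],
    episode_timestamp: Optional[datetime],
    reference_time: datetime,
) -> datetime:
    """Mirror the prompt's resolution order for valid_at (hypothetical helper)."""
    # 1. An explicit or resolvable date in the source text wins.
    if text_date is not None:
        return text_date
    # 2. Otherwise fall back to the originating episode's timestamp...
    if episode_timestamp is not None:
        return episode_timestamp
    # ...or to REFERENCE_TIME when no per-episode timestamp is available.
    # Note that the current wall-clock time is never consulted.
    return reference_time


reference_time = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)

# A past-tense fact with no in-text date resolves to REFERENCE_TIME.
no_date = resolve_valid_at(None, None, reference_time)

# An in-text date takes precedence over any fallback.
stated = datetime(2023, 6, 1, tzinfo=timezone.utc)
with_date = resolve_valid_at(stated, None, reference_time)
```

Because `reference_time` is a required, non-optional argument here, the function always returns a value, which matches the "`valid_at` MUST always be set" rule.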

Design notes

REFERENCE_TIME is always populated. Graphiti.add_episode takes reference_time: datetime as a required parameter; extract_edges derives REFERENCE_TIME from episode.valid_at (edge_operations.py:195). There is no code path where the prompt sees a missing REFERENCE_TIME, so making it the mandatory fallback is always satisfiable.

Backwards compatibility for callers that don't supply a meaningful reference_time. If a caller passes datetime.now() (the common case for "ingest a thing happening right now"), REFERENCE_TIME is the current time, and the new rules cause valid_at to land on that actual timestamp instead of null or today-midnight. That is the same date the LLM was already guessing, just anchored on a real value instead of an invented one, so this case does not regress.

invalid_at stays null-by-default. The fix only mandates a fallback for valid_at. invalid_at is still set only when the source text explicitly states a fact ended — and the "NEVER today" prohibition still applies to it.

Code logic and schema untouched. Prompt-only diff. The Pydantic schemas (Edge.valid_at: str | None) are unchanged; the LLM is still free to return null for invalid_at. Tests for extract_edges are unaffected because they don't assert on prompt text.

Verification

ruff check + ruff format --check clean.
Existing tests/utils/maintenance/test_edge_operations.py passes (8/8).
Behavioral verification requires an LLM round-trip. Manually verified on a self-hosted deployment: ingesting an episode with reference_time = 2024-01-15T12:00:00Z and body "Brent installed Plex on minimind" (a past-tense fact with no in-text date) now produces an extracted edge with valid_at = 2024-01-15T12:00:00Z instead of today-midnight.

Notes for operators with existing bad data

Repairing rows already affected by this bug is per-operator and out of scope for this PR. The signature is narrow enough to fix with two Cypher statements (see "Notes for operators" in #1489 for the full pattern):

Set r.valid_at = episode.valid_at for edges whose valid_at is clean-midnight on the same date as created_at.
Set r.invalid_at = NULL for edges whose invalid_at matches the same clean-midnight signature.

The clean-midnight precision filter is important — a looser date()-only filter overwrites legitimate same-day text extractions. On the production graph used to characterize this, the tight filter spared 7 legitimate same-day extractions from the loose filter's blast radius.
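The difference between the tight and loose filters can be illustrated in memory. This is a sketch only: `EdgeRow`, `tight_match`, `loose_match`, `repair_valid_at`, and the sample rows are hypothetical and are not the Cypher from #1489. The real repair runs against the graph database, and the sketch assumes all timestamps are already in UTC.

```python
from dataclasses import dataclass, replace
from datetime import datetime, time, timezone
from typing import Callable, List


@dataclass(frozen=True)
class EdgeRow:
    """Hypothetical in-memory stand-in for an edge and its source episode."""
    valid_at: datetime
    created_at: datetime
    episode_valid_at: datetime


def tight_match(row: EdgeRow) -> bool:
    # Same date as created_at AND exactly 00:00:00.000000.
    same_day = row.valid_at.date() == row.created_at.date()
    return same_day and row.valid_at.time() == time(0, 0)


def loose_match(row: EdgeRow) -> bool:
    # Date-only comparison: also catches legitimate same-day extractions.
    return row.valid_at.date() == row.created_at.date()


def repair_valid_at(rows: List[EdgeRow], match: Callable[[EdgeRow], bool]) -> List[EdgeRow]:
    # Replace valid_at with the episode's valid_at for rows the filter selects.
    return [replace(r, valid_at=r.episode_valid_at) if match(r) else r for r in rows]


utc = timezone.utc
created = datetime(2026, 5, 15, 22, 21, 30, tzinfo=utc)
rows = [
    # Hallucinated clean midnight on the creation date.
    EdgeRow(datetime(2026, 5, 15, tzinfo=utc), created, datetime(2026, 3, 28, tzinfo=utc)),
    # Legitimate same-day time taken from the text.
    EdgeRow(datetime(2026, 5, 15, 9, 30, tzinfo=utc), created, datetime(2026, 5, 15, 8, 0, tzinfo=utc)),
]

tight_fixed = repair_valid_at(rows, tight_match)
loose_fixed = repair_valid_at(rows, loose_match)
```

With the tight filter only the first row is rewritten; the loose filter also overwrites the second row's legitimate 09:30 value.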

Related PRs

This is PR 3 of 3 from #1489. The companion PRs:

The three PRs are independent of each other and can be reviewed in any order.

Edge fact extraction was hallucinating today's date for past-tense events that lack an explicit in-text date. The prompt offered the LLM an escape hatch ("Leave both fields null if no explicit or resolvable time is stated"), and gpt-4.1-mini/gpt-5-mini-class models consistently chose to invent today's midnight UTC instead of falling back to the REFERENCE_TIME the prompt provides.

This is observable in the wild: an episode created with reference_time = 2026-03-28 produces edges whose valid_at lands at 2026-05-15T00:00Z (today midnight) for any fact whose source sentence doesn't name a date, breaking bi-temporal invalidation for historical backfill.

Tightens three prompts (edge, extract_timestamps, extract_timestamps_batch):
- Remove the "leave null" escape hatch.
- Make REFERENCE_TIME the mandatory fallback for valid_at.
- Add explicit prohibition: NEVER use today's date, "now", or any inferred "current" time.

invalid_at semantics are unchanged: still null unless the text states the fact ended.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

This was referenced May 15, 2026

feat(mcp): add optional reference_time to add_memory #1490

Open

fix(mcp): delete_episode cascade to extracted edges and orphan entities #1491

Open

[Bug + Feature] Historical backfill exposes three temporal-correctness gaps #1489

Open

brentkearney wants to merge 1 commit into getzep:main from brentkearney:feat/extract-edges-prompt-no-today
