[Bug + Feature] Historical backfill exposes three temporal-correctness gaps #1489

@brentkearney

Summary

While attempting to backfill historical data into Graphiti (ingesting old session log files so that the resulting facts are stamped with the time the underlying event actually happened, not the time ingestion ran), I uncovered three independent gaps. The bi-temporal model is the differentiator that makes backfilling work: agents can reason about "what was true when," and contradictions get resolved via temporal supersedence. When importing structured data, however, we know precisely when each event happened, so there is no need to infer dates from the episode body, provided the timestamp can be passed in as a parameter.

I propose adding reference_time to the MCP interface, updating the prompt to tell the model to use it, and closing gaps in the existing prompt that cause the model to guess at the dates of some related edges. I also suggest updating delete_episode so that it does not leave dangling references to no-longer-existing nodes in the graph.

The use case

A "memory consolidation" script runs over old Claude session logs, extracts useful context from each one, and imports it into the graph by calling add_memory on the Graphiti MCP server. Each save passes the session's actual date as the temporal reference.

For this to work end to end, three things must behave correctly:

  1. The MCP add_memory tool must accept and propagate a caller-supplied reference_time.
  2. The MCP delete_episode tool must cascade to the edges and entities derived from the episode, so that a bad ingest (a mistake or something needing correction) can be cleanly undone and retried.
  3. The fact-extraction LLM must honor REFERENCE_TIME when text contains no explicit date, instead of guessing.

Currently, Graphiti lacks all three.


Gap 1: MCP add_memory throws away reference_time

graphiti_core.Graphiti.add_episode already accepts a reference_time parameter and uses it correctly. The MCP wrapper at mcp_server/src/graphiti_mcp_server.py does not expose it — add_memory has no such parameter, and the internal queue service hardcodes reference_time=datetime.now(timezone.utc).

Result: every MCP-driven ingest is stamped with the ingestion time, instead of the time when the underlying event happened. This blocks the backfill use case.

Proposed fix:

  • Add optional reference_time: str | None = None (ISO-8601) to the MCP add_memory tool.
  • Parse the string to datetime inside the tool (MCP can't carry datetime across the JSON boundary).
  • Plumb through QueueService.add_episode, defaulting to datetime.now(timezone.utc) when the caller omits it — preserves current behavior for every existing caller.
  • Unit test that an explicit reference_time reaches the underlying add_episode call and that None falls back to a fresh now().
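
A minimal sketch of the parsing and fallback step described above, under stated assumptions: the helper name parse_reference_time is hypothetical (it is not an existing Graphiti function), and treating timezone-less input as UTC is a choice made for this example, not something the proposal specifies.

```python
from __future__ import annotations

from datetime import datetime, timezone


def parse_reference_time(value: str | None) -> datetime:
    """Return a timezone-aware UTC datetime for an optional ISO-8601 string.

    Hypothetical helper for illustration only.
    """
    if value is None:
        # Caller omitted the parameter: keep the current ingestion-time behavior.
        return datetime.now(timezone.utc)

    text = value.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit offset first.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"

    parsed = datetime.fromisoformat(text)  # raises ValueError on malformed input
    if parsed.tzinfo is None:
        # Assumption for this sketch: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Letting ValueError propagate on malformed input (instead of silently falling back to now()) seems preferable for backfill, since a silent fallback would reintroduce the ingestion-time stamp this gap is about.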

Gap 2: MCP delete_episode is shallow — leaves orphan edges and entities

The MCP delete_episode tool calls EpisodicNode.delete(client.driver), which drops only the Episodic node. The RELATES_TO edges that were extracted from the episode remain in the graph, with stale episodes arrays pointing at the now-deleted UUID. Entity nodes that were mentioned only by the deleted episode are also left behind as orphans.

graphiti_core.Graphiti.remove_episode already exists and does the cascading delete correctly: it removes the episode, the edges where this episode is the first provenance entry, and any entity nodes mentioned only by the deleted episode.

This blocks the use case at "if I make a bad ingest, I can't cleanly undo it and retry."

Proposed fix:

  • One-line change in mcp_server/src/graphiti_mcp_server.py: replace the EpisodicNode.get_by_uuid + episodic_node.delete pair with await client.remove_episode(uuid).
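
To make the difference between the two delete behaviors concrete, here is a toy in-memory model. Everything in it (ToyGraph, its fields, and its methods) is hypothetical and exists only for illustration; it mirrors the cascade rules described above and is not Graphiti's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Hypothetical stand-in for the graph; not Graphiti's data model."""

    episodes: set[str] = field(default_factory=set)
    # edge uuid -> ordered list of episode uuids (provenance)
    edges: dict[str, list[str]] = field(default_factory=dict)
    # entity uuid -> set of episode uuids that mention it
    mentions: dict[str, set[str]] = field(default_factory=dict)

    def shallow_delete(self, episode_uuid: str) -> None:
        """Drop only the episode; edges and entities are left untouched."""
        self.episodes.discard(episode_uuid)

    def cascading_delete(self, episode_uuid: str) -> None:
        """Drop the episode, edges it first produced, and entities only it mentions."""
        self.episodes.discard(episode_uuid)
        self.edges = {
            uuid: eps for uuid, eps in self.edges.items()
            if not (eps and eps[0] == episode_uuid)
        }
        self.mentions = {
            uuid: eps for uuid, eps in self.mentions.items()
            if eps != {episode_uuid}
        }

    def dangling_edges(self) -> list[str]:
        """Edges whose provenance references an episode that no longer exists."""
        return sorted(
            uuid for uuid, eps in self.edges.items()
            if any(ep not in self.episodes for ep in eps)
        )


def build() -> ToyGraph:
    return ToyGraph(
        episodes={"ep1", "ep2"},
        edges={"e1": ["ep1"], "e2": ["ep2"]},
        mentions={"alice": {"ep1"}, "bob": {"ep1", "ep2"}},
    )


shallow = build()
shallow.shallow_delete("ep1")
print(shallow.dangling_edges())   # edge e1 still points at the deleted episode

cascaded = build()
cascaded.cascading_delete("ep1")
print(cascaded.dangling_edges())  # nothing left pointing at ep1
```

In this toy, the shallow delete leaves edge e1 and entity alice behind, while the cascading delete removes both and keeps bob, who is still mentioned by ep2.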

Gap 3: Fact extractor hallucinates today's date for ambiguous past-tense facts

When an episode is ingested with a historical reference_time (so episode.valid_at is correctly set in the past), graphiti-core correctly passes latest_episode.valid_at to the fact-extraction LLM as REFERENCE_TIME (graphiti_core/utils/maintenance/edge_operations.py:195). But the LLM frequently ignores it and stamps extracted edges with today @ 00:00 UTC as valid_at.

The root cause is the shape of the prompt, not the plumbing. The current prompt at graphiti_core/prompts/extract_edges.py contains conflicting rules:

- If the fact is ongoing (present tense), set `valid_at` to the timestamp of the episode the fact originates from. If no per-episode timestamp is available, use REFERENCE_TIME.
- ...
- Leave both fields `null` if no explicit or resolvable time is stated.

For past-tense completed actions without an in-text date ("Bob called Roger"), the "ongoing" trigger doesn't apply and the LLM falls through to the "leave null" escape. Models in the gpt-4.1 / gpt-5 class reliably read this as "guess" and choose today's midnight rather than emit null. The same loophole exists in the extract_timestamps and extract_timestamps_batch prompts.

Scale observed in the wild (1384 edges from 264 historical-backfill episodes on a production graph):

| Field | Pattern | Count | % |
|---|---|---|---|
| valid_at | Today midnight UTC (LLM hallucination) | 771 | 56% |
| valid_at | Legitimate in-text date | 610 | 44% |
| valid_at | null | 3 | 0.2% |
| invalid_at | null (correct) | 1016 | 73% |
| invalid_at | Today midnight UTC (LLM hallucination) | 100 | 7% |
| invalid_at | Today with microseconds (graphiti's auto-invalidation chain, legitimate) | 60 | 4% |
| invalid_at | Legitimate in-text date | 208 | 15% |

The detection signature is a clean midnight UTC timestamp on the same date as created_at (hour = minute = second = nanosecond = 0). Values on today's date with microsecond precision are NOT hallucinations; they are graphiti's own supersede-chain timestamps.
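
The signature can be expressed as a small predicate. This is a sketch with a hypothetical function name; it assumes both timestamps are timezone-aware, and because Python datetimes stop at microsecond precision, it checks microseconds where the database check uses nanoseconds.

```python
from __future__ import annotations

from datetime import datetime, time, timezone


def looks_like_midnight_hallucination(
    value: datetime | None, created_at: datetime
) -> bool:
    """True when `value` is exactly 00:00:00.000000 UTC on created_at's UTC date.

    Hypothetical helper; assumes timezone-aware inputs.
    """
    if value is None:
        return False
    value_utc = value.astimezone(timezone.utc)
    created_utc = created_at.astimezone(timezone.utc)
    same_day = value_utc.date() == created_utc.date()
    clean_midnight = value_utc.time() == time(0, 0, 0, 0)
    return same_day and clean_midnight
```

A midnight value on a different date than created_at is deliberately not flagged, since date-only mentions in the text would plausibly resolve to midnight as well.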

This is the gap that breaks the use case at step 3: even with reference_time plumbed and delete_episode cascading, ambiguous past-tense facts still collapse to today, erasing the temporal axis.

Proposed fix: Tighten the DATETIME RULES in all three prompts (edge, extract_timestamps, extract_timestamps_batch):

  1. Remove the "leave null" escape for valid_at. Make REFERENCE_TIME, which is always set by Graphiti, the mandatory fallback whenever the text has no resolvable date.
  2. Keep invalid_at null-by-default — only set when the text explicitly states the fact ended or was superseded.
  3. Add an explicit prohibition: "NEVER use today's date, the current date, 'now', or any inferred 'current' time as valid_at or invalid_at. REFERENCE_TIME is the ONLY acceptable default — do not substitute your own notion of the present."

Prompt-only change; no code logic or schema impact.
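
Because the change lives entirely in the prompt, one way to evaluate it is to compare model output against the outcome the tightened rules are meant to produce. The sketch below encodes that expected outcome; the function and parameter names are hypothetical, and it is an evaluation aid only, not a proposed change to Graphiti's code.

```python
from __future__ import annotations

from datetime import datetime, timezone


def expected_edge_times(
    text_valid_at: datetime | None,
    text_invalid_at: datetime | None,
    reference_time: datetime,
) -> tuple[datetime, datetime | None]:
    """Outcome the tightened DATETIME RULES are meant to produce (hypothetical).

    text_valid_at / text_invalid_at are the dates resolvable from the episode
    text, or None when the text states none.
    """
    # Rule 1: REFERENCE_TIME is the mandatory fallback for valid_at.
    valid_at = text_valid_at if text_valid_at is not None else reference_time
    # Rule 2: invalid_at stays null unless the text says the fact ended.
    invalid_at = text_invalid_at
    # Rule 3 is implicit: the current wall-clock time is never consulted.
    return valid_at, invalid_at


reference = datetime(2023, 11, 2, 9, 0, tzinfo=timezone.utc)
# "Bob called Roger" has no in-text date, so valid_at falls back to REFERENCE_TIME.
print(expected_edge_times(None, None, reference))
```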


Why these three are related but separable

All three are needed end-to-end for historical backfill, but each fix is independently valuable and independently mergeable:

  • The reference_time MCP param is useful to any caller that wants to backfill — even if the prompt still hallucinates, at least the episode lands at the right time.
  • The delete_episode cascade is a correctness fix that applies to any caller, not just backfill — it stops silent graph corruption from any ingest that needs to be retried.
  • The prompt fix improves temporal accuracy for all ingests, not just historical ones. (Even today-aligned ingests benefit when REFERENCE_TIME genuinely is today and the LLM should anchor on it.)

Submitting as separate PRs keeps review focused and lets each merge on its own timeline.

PR checklist

  • PR 1 — feat(mcp): add reference_time param to add_memory -- #1490
  • PR 2 — fix(mcp): delete_episode cascade to edges and orphan entities -- #1491
  • PR 3 — fix(prompts): force valid_at fallback to REFERENCE_TIME, ban "today" -- #1492

Notes for operators with existing bad data

Repairing rows already affected by Gap 3 is left to each operator (it is not part of any of the three PRs). The signature is narrow enough to fix with two Cypher statements:

// Fix valid_at hallucinations: clean-midnight on same date as created_at
MATCH (e:Episodic)
MATCH ()-[r:RELATES_TO]->()
WHERE r.episodes[0] = e.uuid
  AND date(r.valid_at) = date(r.created_at)
  AND r.valid_at.hour = 0 AND r.valid_at.minute = 0
  AND r.valid_at.second = 0 AND r.valid_at.nanosecond = 0
  AND r.valid_at <> e.valid_at
SET r.valid_at = e.valid_at;

// Null out invalid_at hallucinations: clean-midnight only (preserves graphiti's
// auto-invalidation chain, which uses microsecond precision)
MATCH (e:Episodic)
MATCH ()-[r:RELATES_TO]->()
WHERE r.episodes[0] = e.uuid
  AND date(r.invalid_at) = date(r.created_at)
  AND r.invalid_at.hour = 0 AND r.invalid_at.minute = 0
  AND r.invalid_at.second = 0 AND r.invalid_at.nanosecond = 0
SET r.invalid_at = NULL;

The clean-midnight precision filter in the first statement (the valid_at fix) is important: a looser date()-only filter would overwrite legitimate same-day text extractions.
