fix: skip redundant name_embedding computation in create_entity_node_embeddings by GraphiteEdgeR · Pull Request #1457 · getzep/graphiti

GraphiteEdgeR · 2026-04-30T09:02:49Z

Summary

create_entity_node_embeddings() unconditionally computes embeddings for all nodes with a non-empty name, even if they already have a valid name_embedding. This PR adds a simple filter to skip nodes that already have their embedding set, making the function idempotent and avoiding redundant API calls.

Problem

During add_episode(), resolved nodes (merged to existing graph entities) go through create_entity_node_embeddings() which calls embedder.create_batch() for all nodes. Since get_entity_node_return_query() deliberately excludes name_embedding from its return fields (to reduce query payload), resolved nodes arrive with name_embedding=None and get re-embedded unnecessarily.

Even when callers pre-load the embedding (via node.load_name_embedding() or node_load_embeddings_bulk()), the current code still re-computes it, because nothing checks for an existing value.

Change

```diff
-    # filter out falsey values from nodes
-    filtered_nodes = [node for node in nodes if node.name]
+    # Only compute embeddings for nodes that need them (have a name but no existing embedding)
+    filtered_nodes = [node for node in nodes if node.name and node.name_embedding is None]
```
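To make the effect of the new predicate concrete, here is a small sketch comparing which nodes each version selects. The `Node` dataclass is a hypothetical stand-in that models only the two fields the filter reads; it is not the project's real node class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an entity node; only the fields the filter reads."""
    name: str
    name_embedding: Optional[list[float]] = None


nodes = [
    Node(name="Alice"),                           # new node, needs an embedding
    Node(name="Bob", name_embedding=[0.1, 0.2]),  # embedding already loaded
    Node(name=""),                                # empty name, never embedded
]

# Old predicate: every named node is selected, including the pre-loaded one.
old_selection = [n.name for n in nodes if n.name]

# New predicate: only named nodes that still lack an embedding are selected.
new_selection = [n.name for n in nodes if n.name and n.name_embedding is None]

print(old_selection)  # ['Alice', 'Bob']
print(new_selection)  # ['Alice']
```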

Benefits

- Idempotent: Safe to call multiple times without redundant API calls
- Enables optimization: Callers can now pre-load embeddings from DB (via load_name_embedding() / node_load_embeddings_bulk()) before calling this function, and the pre-loaded values will be respected
- Zero risk: Nodes without embeddings still get computed as before
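The idempotency claim can be checked with a stubbed embedder. The sketch below is an assumption-laden approximation: `CountingEmbedder`, the simplified `Node`, the function signature, and the early return are all illustrative and only mirror the filter from this PR, not the full function in the repository.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an entity node."""
    name: str
    name_embedding: Optional[list[float]] = None


class CountingEmbedder:
    """Hypothetical embedder stub that records how many texts it was asked to embed."""

    def __init__(self) -> None:
        self.texts_embedded = 0

    async def create_batch(self, texts: list[str]) -> list[list[float]]:
        self.texts_embedded += len(texts)
        # Fake one-dimensional vectors; a real embedder would call an API here.
        return [[float(len(t))] for t in texts]


async def create_entity_node_embeddings(embedder: CountingEmbedder, nodes: list[Node]) -> None:
    # Sketch of the patched behavior: skip nodes that already carry an embedding.
    pending = [n for n in nodes if n.name and n.name_embedding is None]
    if not pending:
        return  # assumed early exit; avoids an empty batch call
    vectors = await embedder.create_batch([n.name for n in pending])
    for node, vector in zip(pending, vectors):
        node.name_embedding = vector


async def demo() -> int:
    embedder = CountingEmbedder()
    nodes = [Node("Alice"), Node("Bob", name_embedding=[0.5])]
    await create_entity_node_embeddings(embedder, nodes)
    await create_entity_node_embeddings(embedder, nodes)  # second call finds nothing to do
    return embedder.texts_embedded


texts_embedded = asyncio.run(demo())
print(texts_embedded)  # 1
```

Only "Alice" is ever sent to the embedder: "Bob" arrives with an embedding, and the second call sees that "Alice" already has one.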

Suggested follow-up

For maximum benefit, a follow-up PR could add a pre-loading step in extract_attributes_from_nodes() before calling create_entity_node_embeddings():

```python
# Pre-load existing name_embeddings for resolved nodes
nodes_missing = [n for n in nodes if n.name and n.name_embedding is None]
if nodes_missing:
    await clients.driver.graph_operations_interface.node_load_embeddings_bulk(
        clients.driver, nodes_missing
    )
```

This would eliminate redundant embedding API calls for all resolved nodes (typically 2-4 per episode).
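As a rough illustration of how the pre-loading step and the new filter would interact, the sketch below replaces the database with an in-memory dict. `STORED`, `load_embeddings_bulk`, `names_still_to_embed`, and the `uuid` keys are all hypothetical stand-ins; the real node_load_embeddings_bulk() signature and lookup key may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an entity node."""
    uuid: str
    name: str
    name_embedding: Optional[list[float]] = None


# Hypothetical in-memory stand-in for embeddings already stored in the graph.
STORED = {"u-bob": [0.1, 0.2]}


async def load_embeddings_bulk(nodes: list[Node]) -> None:
    """Stub for the bulk loader: fills in embeddings the store already has."""
    for node in nodes:
        if node.uuid in STORED:
            node.name_embedding = STORED[node.uuid]


async def names_still_to_embed(nodes: list[Node]) -> list[str]:
    # Step 1: try to pre-load embeddings for nodes that arrived without one.
    missing = [n for n in nodes if n.name and n.name_embedding is None]
    if missing:
        await load_embeddings_bulk(missing)
    # Step 2: the patched filter now selects only genuinely new nodes.
    return [n.name for n in nodes if n.name and n.name_embedding is None]


remaining = asyncio.run(
    names_still_to_embed([Node("u-alice", "Alice"), Node("u-bob", "Bob")])
)
print(remaining)  # ['Alice']
```

In this sketch the resolved node ("Bob") costs one bulk read instead of an embedding API call, which is the trade the follow-up aims for.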

…dding

In create_entity_node_embeddings(), filter out nodes that already have name_embedding set. This avoids redundant embedding API calls for nodes that were resolved to existing graph entities whose embeddings were pre-loaded from the database.

Previously, all nodes with a non-empty name were unconditionally re-embedded, even if they already had a valid name_embedding from a prior computation or database load. This wasted API calls and tokens for resolved (merged) nodes.

GraphiteEdgeR wants to merge 1 commit into getzep:main from GraphiteEdgeR:fix/skip-existing-node-embeddings
