fix: skip redundant name_embedding computation in create_entity_node_embeddings#1457
Open
GraphiteEdgeR wants to merge 1 commit into
Open
fix: skip redundant name_embedding computation in create_entity_node_embeddings#1457GraphiteEdgeR wants to merge 1 commit into
GraphiteEdgeR wants to merge 1 commit into
Conversation
…dding In create_entity_node_embeddings(), filter out nodes that already have name_embedding set. This avoids redundant embedding API calls for nodes that were resolved to existing graph entities whose embeddings were pre-loaded from the database. Previously, all nodes with a non-empty name were unconditionally re-embedded, even if they already had a valid name_embedding from a prior computation or database load. This wasted API calls and tokens for resolved (merged) nodes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
create_entity_node_embeddings()unconditionally computes embeddings for all nodes with a non-empty name, even if they already have a validname_embedding. This PR adds a simple filter to skip nodes that already have their embedding set, making the function idempotent and avoiding redundant API calls.Problem
During
add_episode(), resolved nodes (merged to existing graph entities) go throughcreate_entity_node_embeddings()which callsembedder.create_batch()for all nodes. Sinceget_entity_node_return_query()deliberately excludesname_embeddingfrom its return fields (to reduce query payload), resolved nodes arrive withname_embedding=Noneand get re-embedded unnecessarily.However, if callers pre-load the embedding (via
node.load_name_embedding()ornode_load_embeddings_bulk()), the current code still re-computes it because there's no check for existing values.Change
`diff
filter out falsey values from nodes
Only compute embeddings for nodes that need them (have a name but no existing embedding)
`
Benefits
load_name_embedding()/node_load_embeddings_bulk()) before calling this function, and the pre-loaded values will be respectedSuggested follow-up
For maximum benefit, a follow-up PR could add a pre-loading step in
extract_attributes_from_nodes()before callingcreate_entity_node_embeddings():`python
Pre-load existing name_embeddings for resolved nodes
nodes_missing = [n for n in nodes if n.name and n.name_embedding is None]
if nodes_missing:
await clients.driver.graph_operations_interface.node_load_embeddings_bulk(
clients.driver, nodes_missing
)
`
This would eliminate redundant embedding API calls for all resolved nodes (typically 2-4 per episode).