fix: force batch_size=1 for gemini-embedding-2 models by Maanik23 · Pull Request #1474 · getzep/graphiti

Maanik23 · 2026-05-07T14:10:00Z

Summary

Fix GeminiEmbedder.create_batch() silently returning fewer vectors than inputs when using gemini-embedding-2-preview or gemini-embedding-2 models.

Closes #1467

Problem

The gemini-embedding-2* models do not support batching via embed_content(contents=[...]). They treat the list as parts of a single document and return only one embedding vector regardless of how many strings are passed.

This caused create_batch() to silently return fewer vectors than inputs, leading to ValueError in downstream zip(..., strict=True) calls during entity deduplication (e.g. in _semantic_candidate_search).

Fix

Extend the batch_size=1 special case: the same defense already applied to gemini-embedding-001 now covers the -2 and -2-preview models, so each input is sent individually, matching the API's actual behavior.
Add a defensive length-mismatch check: if a future model (or a misconfiguration) causes the API to return fewer embeddings than requested, create_batch() now detects this and triggers the existing per-item fallback path instead of silently returning partial results.
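The two changes can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code in this PR: `resolve_batch_size`, `SINGLE_ITEM_MODELS`, and the `embed_fn` callable are hypothetical names, and the real `GeminiEmbedder` may match model names and reach its fallback path differently.

```python
from typing import Callable, Optional

DEFAULT_BATCH_SIZE = 100  # default for other models, per the tests listed below

# Assumption: an explicit set of models that must be sent one input at a time.
SINGLE_ITEM_MODELS = {
    "gemini-embedding-001",
    "gemini-embedding-2",
    "gemini-embedding-2-preview",
}

EmbedFn = Callable[[list[str]], list[list[float]]]


def resolve_batch_size(model: str, batch_size: Optional[int] = None) -> int:
    # An explicit batch_size always wins over the model-specific default.
    if batch_size is not None:
        return batch_size
    return 1 if model in SINGLE_ITEM_MODELS else DEFAULT_BATCH_SIZE


def create_batch(
    texts: list[str], embed_fn: EmbedFn, batch_size: int
) -> list[list[float]]:
    results: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start : start + batch_size]
        vectors = embed_fn(chunk)
        if len(vectors) != len(chunk):
            # Length mismatch: re-embed each item on its own rather than
            # returning a partial result.
            vectors = [embed_fn([text])[0] for text in chunk]
        results.extend(vectors)
    return results
```

With this shape, a backend that collapses a list into one vector still yields one vector per input, at the cost of extra requests for the affected chunk.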

Tests

Added 6 new tests covering:

gemini-embedding-2-preview defaults to batch_size=1
gemini-embedding-2 defaults to batch_size=1
gemini-embedding-001 still defaults to batch_size=1 (existing behavior)
Other models use default batch size (100)
Explicit batch_size overrides model-specific default
Length-mismatch triggers fallback to individual processing

The gemini-embedding-2 and gemini-embedding-2-preview models do not support batching via embed_content(contents=[...]). They treat the list as parts of a single document and return only one embedding vector regardless of how many strings are passed.

This caused create_batch() to silently return fewer vectors than inputs, leading to ValueError in downstream zip(..., strict=True) calls during entity deduplication.

Fix:
- Extend the existing batch_size=1 special case (already applied to gemini-embedding-001) to cover the -2 and -2-preview models.
- Add a defensive length-mismatch check in create_batch() that triggers the per-item fallback path when the API returns fewer embeddings than requested.

Closes getzep#1467

prasmussen15 requested a review from paul-paliychuk May 7, 2026 17:23

Maanik23 wants to merge 1 commit into getzep:main from Maanik23:fix/gemini-embedding-2-batch-size
