fix: validate LanceDB vector dimensions before batch writes #2414

Open
Kevin-Li-2025 wants to merge 1 commit into microsoft:main from Kevin-Li-2025:kevin/fix-embedding-batch-vectors

Kevin-Li-2025 commented Jun 24, 2026

Summary

Fixes #2265 by validating each LanceDB document vector against the configured `vector_size` before constructing the PyArrow fixed-size-list column.

When the embedding model returns vectors whose dimension differs from the one configured for the LanceDB index, the current code still reaches `FixedSizeListArray.from_arrays(...)`, and the write fails later with a confusing Arrow row-count error such as `Column 1 named vector expected length 44 but got length 11`. With this change, the write fails earlier with an error that names the document id, the actual vector dimension, the index name, and the configured vector size.
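
For reviewers, here is a minimal sketch of the kind of check described above. It is not the code in this PR: `Document`, `validate_vector_dimensions`, `implied_row_count`, and the error wording are hypothetical names chosen for illustration, and the sketch assumes that documents without a vector are handled elsewhere. The second helper shows one plausible way the original message can arise: if the flattened values are grouped by the configured size rather than the actual one, the vector column ends up with fewer rows than the id column. The 768 and 3072 dimensions are illustrative and not taken from the issue.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Document:
    """Hypothetical stand-in for a vector store document."""

    id: str
    vector: Optional[List[float]]


def validate_vector_dimensions(
    documents: Sequence[Document], vector_size: int, index_name: str
) -> None:
    """Raise ValueError for the first vector whose length differs from vector_size."""
    for doc in documents:
        if doc.vector is None:
            # Assumption: documents without vectors are skipped or handled elsewhere.
            continue
        actual = len(doc.vector)
        if actual != vector_size:
            msg = (
                f"Document {doc.id!r} has vector dimension {actual}, but index "
                f"{index_name!r} is configured with vector_size={vector_size}."
            )
            raise ValueError(msg)


def implied_row_count(num_documents: int, actual_dim: int, vector_size: int) -> int:
    """Rows obtained when flattened values are grouped by the configured size."""
    flat_length = num_documents * actual_dim
    return flat_length // vector_size


if __name__ == "__main__":
    docs = [Document("a", [0.1] * 4), Document("b", [0.1] * 2)]
    try:
        validate_vector_dimensions(docs, vector_size=4, index_name="example_index")
    except ValueError as err:
        print(err)
    # 44 documents, illustrative dimensions: 44 * 768 // 3072 == 11
    print(implied_row_count(44, 768, 3072))
```

Running the check before the column is built means the failure points at a specific document and configuration value instead of at a row-count mismatch between columns.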

Tests

  • UV_CACHE_DIR=/private/tmp/uv-cache-graphrag uv run pytest -q tests/integration/vector_stores/test_lancedb.py -k "load_documents"
  • UV_CACHE_DIR=/private/tmp/uv-cache-graphrag uv run pytest -q tests/integration/vector_stores/test_lancedb.py
  • UV_CACHE_DIR=/private/tmp/uv-cache-graphrag uv run ruff format --check packages/graphrag-vectors/graphrag_vectors/lancedb.py tests/integration/vector_stores/test_lancedb.py
  • UV_CACHE_DIR=/private/tmp/uv-cache-graphrag uv run ruff check packages/graphrag-vectors/graphrag_vectors/lancedb.py tests/integration/vector_stores/test_lancedb.py

Kevin-Li-2025 requested a review from a team as a code owner June 24, 2026 01:00

Development

Successfully merging this pull request may close these issues.

[Bug]: Pipeline error: Column 1 named vector expected length 44 but got length 11