Fix invalid L2 distances from IndexHNSWFlat with stale cached_l2norms #5326

Open
notandruu wants to merge 1 commit into facebookresearch:main from notandruu:fix/indexflatl2-stale-l2norms-5320

@notandruu

Fixes #5320

Root cause

IndexFlatL2::get_FlatCodesDistanceComputer() selects the cached-norms distance computer (FlatL2WithNormsDis) whenever cached_l2norms is non-empty. That computer reads cached_l2norms[i] for every vector index i below ntotal. However, calling add() after sync_l2norms() neither extends nor invalidates cached_l2norms, so the cache can hold fewer than ntotal entries. The computer then reads stale or out-of-range norms and returns invalid squared L2 distances (including negative values), which makes the top-k results wrong and inconsistent. The bug is reachable from IndexHNSWFlat (METRIC_L2), whose storage is an IndexFlatL2.
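To illustrate why a wrong norm can push a squared distance below zero, here is a minimal pure-Python sketch. It assumes the cached-norms computer relies on the standard identity ||q - y||^2 = ||q||^2 + ||y||^2 - 2<q, y>; the helper names are hypothetical and this is not the faiss code. The stale norm is modelled as 0.0 purely for illustration, whereas in C++ an out-of-range read could yield any value.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2sqr_direct(q, y):
    """Squared L2 distance computed on the fly; never negative."""
    return sum((a - b) ** 2 for a, b in zip(q, y))


def l2sqr_from_norms(q, y, y_sqnorm):
    """Squared L2 distance via the norm identity, trusting y_sqnorm as given."""
    return dot(q, q) + y_sqnorm - 2.0 * dot(q, y)


q = [3.0, 4.0]
y = [3.0, 4.0]  # identical to the query, so the true distance is 0

direct = l2sqr_direct(q, y)
fresh = l2sqr_from_norms(q, y, dot(y, y))  # correct cached norm (25.0)
stale = l2sqr_from_norms(q, y, 0.0)        # assumed stale value standing in for a bad read
```

With a correct norm the identity agrees with the direct computation; with the wrong norm the result is negative even though the vectors are identical.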

Fix

Take the cached path only when cached_l2norms.size() == ntotal, i.e. when the cache covers every vector; otherwise fall back to the on-the-fly L2 computer. The fast path is unchanged for normal usage (add all vectors, then call sync_l2norms() once). The change is one line plus a comment.
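The guard can be sketched with a toy model in pure Python. ToyFlatL2 and its methods are hypothetical stand-ins that only mimic the behaviour described in this PR (add() leaves the cache untouched, sync_l2norms() rebuilds it); they are not the faiss classes, and the real change lives in the C++ IndexFlatL2.

```python
class ToyFlatL2:
    """Toy flat L2 store used only to illustrate the size check."""

    def __init__(self):
        self.vectors = []         # len(self.vectors) plays the role of ntotal
        self.cached_l2norms = []  # squared norms, filled only by sync_l2norms()

    def add(self, batch):
        # As described in the PR, adding vectors does not touch the cache.
        self.vectors.extend(batch)

    def sync_l2norms(self):
        self.cached_l2norms = [sum(x * x for x in v) for v in self.vectors]

    def distance_computer(self, query):
        q_sqnorm = sum(x * x for x in query)
        if len(self.cached_l2norms) == len(self.vectors):
            # Cache covers every vector: the norm-based fast path is safe.
            def dist(i):
                dp = sum(a * b for a, b in zip(query, self.vectors[i]))
                return q_sqnorm + self.cached_l2norms[i] - 2.0 * dp
        else:
            # Cache is missing entries: compute the distance on the fly.
            def dist(i):
                return sum((a - b) ** 2 for a, b in zip(query, self.vectors[i]))
        return dist


index = ToyFlatL2()
index.add([[1.0, 0.0], [0.0, 2.0]])
index.sync_l2norms()
index.add([[3.0, 4.0]])  # cache has 2 entries, but 3 vectors are stored

dist = index.distance_computer([3.0, 4.0])
distances = [dist(i) for i in range(len(index.vectors))]
```

Because the cache is shorter than the number of stored vectors, the toy falls back to the direct computation and every distance is non-negative. Calling sync_l2norms() again re-enables the fast path with the same results.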

Reproduction and validation

Repro: add a partial batch to an IndexHNSWFlat, sync_l2norms() on the downcast IndexFlatL2, add the rest, then search. Validated with a from-source build (BLAS=Accelerate, libomp) using a C++ negative control:

| build | negative squared-L2 distances | min distance |
| --- | --- | --- |
| before fix | 46 / 50 | -7628.0 |
| after fix | 0 | 739.0 |

Regression test TestSyncL2Norms.test_indexflat_l2_sync_norms_stale_after_add is added next to the existing sync_l2norms test. It reproduces the scenario and asserts that no distance is negative and that each query's nearest neighbour matches an exact IndexFlatL2 oracle. Run against the released wheel (which has the bug), the test fails (min distance -11114); this fix removes the underlying cause.

Note: I could not build the Python bindings locally (no swig), so the C++ fix was validated via a from-source C++ build and the Python regression test was validated to fail on the unpatched build; CI will exercise it against the patched bindings.

IndexFlatL2::get_FlatCodesDistanceComputer() used the cached-norms
distance computer whenever cached_l2norms was non-empty. That computer
indexes cached_l2norms[i] for every vector up to ntotal, but add() after
a sync_l2norms() does not extend (or invalidate) cached_l2norms, so its
size can be smaller than ntotal. The computer then read stale or
out-of-range norms and returned invalid squared L2 distances, including
negative values, with inconsistent top-k results.

Only take the cached path when cached_l2norms covers every vector
(size == ntotal); otherwise fall back to the on-the-fly L2 computer. The
fast path is unchanged for the normal case (add all vectors, then call
sync_l2norms once).

Fixes facebookresearch#5320
@meta-cla

meta-cla Bot commented Jun 19, 2026

Hi @notandruu!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla meta-cla Bot added the CLA Signed label Jun 19, 2026
@meta-cla

meta-cla Bot commented Jun 19, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

Development

Successfully merging this pull request may close these issues.

IndexHNSWFlat can return invalid L2 results after IndexFlatL2 sync_l2norms() followed by add()
