[BugFix] Fix NPE in nested kNN search when index contains documents without nested object#3368
[BugFix] Fix NPE in nested kNN search when index contains documents without nested object#3368naykudev wants to merge 11 commits into
Conversation
PR Code Suggestions ✨Latest suggestions up to d2d7700 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 951e5e3
Suggestions up to commit a39e645
Suggestions up to commit 1a4dc98
Suggestions up to commit 1a4dc98
Suggestions up to commit 2243d0e
|
…-project#3359) When a segment contains documents without nested vector fields, Lucene's TimeLimitingKnnCollectorManager wraps a null collector in a Decorator, causing NPE on topDocs(). Override approximateSearch in OSDiversifyingChildrenFloatKnnVectorQuery and OSDiversifyingChildrenByteKnnVectorQuery to catch the NPE and return empty results for segments with no vectors. Signed-off-by: ved naykude <vnaykude@amazon.com>
Signed-off-by: ved naykude <vnaykude@amazon.com>
28062ad to
5bfc3b6
Compare
navneet1v
left a comment
There was a problem hiding this comment.
Thanks @naykudev for raising the PR. Few things:
- As per the GH issue the issue is only happening in 3.7 . My question would be is this a problem with older versions too? like 3.6 etc? can we validate that please.
- Please add Integration Test and Unit for this change.
- Is this change only needed for Lucene engine or we need change in Faiss engine with/without Memory Optimized Search
…-project#3359) When a segment contains no nested documents (e.g., only an empty doc), DiversifyingNearestChildrenKnnCollectorManager.newCollector() returns null because parentBitSet is null. Lucene's TimeLimitingKnnCollectorManager wraps this null in a Decorator, causing NPE on topDocs(). Fix: Check parentFilter.getBitSet(context) before calling super.approximateSearch(). If the segment has no parent documents, return empty results immediately. This is the same null check that the collector manager does internally, but applied earlier to avoid the TimeLimitingKnnCollectorManager wrapping issue. This fix is only needed for the Lucene engine path (OSDiversifyingChildrenFloatKnnVectorQuery and OSDiversifyingChildrenByteKnnVectorQuery). Faiss engine uses a different query path (NativeEngineKnnVectorQuery -> KNNWeight) that manages its own collectors and is not affected. Signed-off-by: ved naykude <vnaykude@amazon.com>
ec8bec6 to
0d85b0b
Compare
PR Reviewer Guide 🔍(Review updated until commit d2d7700)Here are some key observations to aid the review process:
|
|
NPE catch removed — Replaced with null check on parentFilter.getBitSet(context) before calling super.approximateSearch(). If segment has no parent docs, return empty TopDocs. Same check DiversifyingNearestChildrenKnnCollectorManager does internally, applied earlier to avoid TimeLimitingKnnCollectorManager wrapping null. Faiss / MOS not affected — Tested with same repro scenario, works without fix. Faiss uses NativeEngineKnnVectorQuery → KNNWeight which doesn't go through Lucene's AbstractKnnVectorQuery.approximateSearch() where the wrapping happens. Older versions — Validated via code analysis: the vulnerable code path exists on the 3.6 and 3.5 branches. OSDiversifyingChildrenFloatKnnVectorQuery on 3.6 extends the same Lucene DiversifyingChildrenFloatKnnVectorQuery without overriding approximateSearch, and Lucene 10.1 (used by 3.6) has TimeLimitingKnnCollectorManager with the same null-wrapping behavior. The bug is present on 3.5+ wherever this code path + separate segments + empty docs condition exists. The reporter likely didn't hit it on 3.6 due to different data/segment patterns. Tests — Will add integration + unit tests. |
- Integration test: verifies nested kNN search succeeds when an empty document (without nested object) exists in a separate segment, for both Lucene and Faiss engines. - Unit test: verifies approximateSearch returns empty TopDocs when parentBitSet is null (segment has no nested docs). Signed-off-by: ved naykude <vnaykude@amazon.com>
|
Persistent review updated to latest commit 056b585 |
Signed-off-by: ved naykude <vnaykude@amazon.com>
|
Persistent review updated to latest commit 1f4e9bb |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3368 +/- ##
============================================
+ Coverage 83.73% 83.76% +0.03%
- Complexity 4358 4366 +8
============================================
Files 453 454 +1
Lines 15773 15784 +11
Branches 2056 2057 +1
============================================
+ Hits 13207 13222 +15
+ Misses 1782 1776 -6
- Partials 784 786 +2 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
- Unit test: verifies byte query returns empty TopDocs when parentBitSet is null (covers null branch) - Integration test: verifies Lucene byte vector nested kNN search succeeds with empty doc in separate segment (covers non-null branch) Signed-off-by: ved naykude <vnaykude@amazon.com>
|
Persistent review updated to latest commit b01d3de |
- Import TotalHits and ScoreDoc at top instead of fully qualified names - Extract static EMPTY_TOP_DOCS constant for reuse - Create NestedKnnUtil.hasNoParentDocs() utility method to avoid duplicated null check logic across float and byte query classes - Add unit tests for NestedKnnUtil covering both null and non-null cases Signed-off-by: ved naykude <vnaykude@amazon.com>
|
Persistent review updated to latest commit b811036 |
- Move static EMPTY_TOP_DOCS constant to NestedKnnUtil to avoid duplication across both query classes - Add integration test for Faiss with MOS (ON_DISK mode) + empty doc in separate segment Signed-off-by: ved naykude <vnaykude@amazon.com>
|
Persistent review updated to latest commit 2243d0e |
The MOS test should validate memory-optimized search using an explicit index.knn.memory_optimized_search setting on a fp32 index, not by relying on ON_DISK mode which uses MOS implicitly. Signed-off-by: ved naykude <vnaykude@amazon.com>
9d330db to
1a4dc98
Compare
|
Persistent review updated to latest commit 1a4dc98 |
1 similar comment
|
Persistent review updated to latest commit 1a4dc98 |
Signed-off-by: ved naykude <vnaykude@amazon.com>
|
Persistent review updated to latest commit a39e645 |
|
Persistent review updated to latest commit 951e5e3 |
The createKnnIndex(String, Settings, String) overload doesn't automatically add index.knn=true. Include getKNNDefaultIndexSettings() as the base settings when creating the MOS test index. Signed-off-by: ved naykude <vnaykude@amazon.com>
951e5e3 to
d2d7700
Compare
|
Persistent review updated to latest commit d2d7700 |
Summary
Fixes #3359
NullPointerException: Cannot invoke "org.apache.lucene.search.KnnCollector.topDocs()" because "this.collector" is nullwhen the index contains documents without the nested object in a separate segment.TimeLimitingKnnCollectorManager.newCollector()wraps a null collector (returned when a segment has no vector values) in aTimeLimitingKnnCollectordecorator. WhentopDocs()is called on this decorator, it delegates to the null inner collector → NPE.approximateSearchinOSDiversifyingChildrenFloatKnnVectorQueryandOSDiversifyingChildrenByteKnnVectorQueryto catch the NPE and return empty results for segments with no vectors.Test plan
index.knn: true, Lucene engine){})