[BugFix] Fix NPE in nested kNN search when index contains documents without nested object #3368

Open
naykudev wants to merge 11 commits into opensearch-project:main from naykudev:fix/nested-knn-npe-3359

Conversation

@naykudev

@naykudev naykudev commented Jun 16, 2026

Summary

Fixes #3359

  • KNN search on a nested vector field throws NullPointerException: Cannot invoke "org.apache.lucene.search.KnnCollector.topDocs()" because "this.collector" is null when the index contains documents without the nested object in a separate segment.
  • Root cause: Lucene's TimeLimitingKnnCollectorManager.newCollector() wraps a null collector (returned when a segment has no vector values) in a TimeLimitingKnnCollector decorator. When topDocs() is called on this decorator, it delegates to the null inner collector → NPE.
  • Fix: Override approximateSearch in OSDiversifyingChildrenFloatKnnVectorQuery and OSDiversifyingChildrenByteKnnVectorQuery to catch the NPE and return empty results for segments with no vectors.
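
The root cause described above can be illustrated without Lucene. The following is a minimal sketch with hypothetical stand-in types (`SimpleCollector`, `TimeLimitedCollector`, and `NullWrappingDemo` are illustrative names, not Lucene classes). It only mirrors the shape of the failure: a decorator accepts a null delegate and fails the first time it delegates.

```java
// Hypothetical stand-ins for illustration; these are NOT Lucene's KnnCollector classes.
interface SimpleCollector {
    int[] topDocs();
}

// Decorator that delegates without checking whether the delegate is null.
final class TimeLimitedCollector implements SimpleCollector {
    private final SimpleCollector collector;

    TimeLimitedCollector(SimpleCollector collector) {
        this.collector = collector; // a null delegate is accepted silently
    }

    @Override
    public int[] topDocs() {
        return collector.topDocs(); // throws NullPointerException when collector is null
    }
}

final class NullWrappingDemo {
    // Models a collector manager that returns null for a segment with nothing to search.
    static SimpleCollector newCollector(boolean segmentHasNestedDocs) {
        return segmentHasNestedDocs ? () -> new int[] { 0 } : null;
    }

    // Returns true if wrapping the collector and calling topDocs() raises an NPE.
    static boolean wrappingThrowsNpe(boolean segmentHasNestedDocs) {
        SimpleCollector wrapped = new TimeLimitedCollector(newCollector(segmentHasNestedDocs));
        try {
            wrapped.topDocs();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

In this sketch the failure surfaces at the `topDocs()` call rather than at construction, which matches the stack trace quoted in the first bullet.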

Test plan

  • Create index with nested knn_vector fields (index.knn: true, Lucene engine)
  • Index one doc with nested vectors and one empty doc ({})
  • Force flush between docs to create separate segments
  • Run nested kNN search — should return hits without NPE
  • Verify correct result (hits=1, score=1.0 for the doc with vectors)

@github-actions

github-actions Bot commented Jun 16, 2026

PR Code Suggestions ✨

Latest suggestions up to d2d7700

Explore these optional code suggestions:

Category    Suggestion    Impact
Possible issue
Add null check for parentFilter parameter

Add null check for parentFilter parameter to prevent potential NPE. If parentFilter
is null, calling getBitSet() will throw a NullPointerException. Consider returning
true or throwing an IllegalArgumentException when parentFilter is null.

src/main/java/org/opensearch/knn/index/query/lucenelib/NestedKnnUtil.java [36-39]

 static boolean hasNoParentDocs(BitSetProducer parentFilter, LeafReaderContext context) throws IOException {
+    if (parentFilter == null) {
+        return true;
+    }
     BitSet parentBitSet = parentFilter.getBitSet(context);
     return parentBitSet == null;
 }
Suggestion importance[1-10]: 3

Why: While adding a null check for parentFilter could prevent NPE, the parentFilter is passed from the constructor of the calling classes where it's a required parameter. The suggestion addresses a theoretical edge case but doesn't reflect a realistic scenario in the current codebase context.

Low
Add null check for context parameter

Add null check for context parameter before passing it to hasNoParentDocs(). The
method could receive a null context in edge cases, which would cause NPE when
parentFilter.getBitSet(context) is called.

src/main/java/org/opensearch/knn/index/query/lucenelib/OSDiversifyingChildrenByteKnnVectorQuery.java [43-54]

 @Override
 protected TopDocs approximateSearch(
     LeafReaderContext context,
     AcceptDocs acceptDocs,
     int visitedLimit,
     KnnCollectorManager knnCollectorManager
 ) throws IOException {
-    if (NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
+    if (context == null || NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
         return NestedKnnUtil.EMPTY_TOP_DOCS;
     }
     return super.approximateSearch(context, acceptDocs, visitedLimit, knnCollectorManager);
 }
Suggestion importance[1-10]: 2

Why: The suggestion to check for null context is overly defensive. The context parameter comes from Lucene's internal search framework and is not expected to be null during normal operation. This check would add unnecessary overhead without addressing a real issue.

Low

Previous suggestions

Suggestions up to commit 951e5e3
Category    Suggestion    Impact
Possible issue
Add null check for parentFilter parameter

Add null check for parentFilter parameter to prevent NPE if a null BitSetProducer is
passed. This defensive check ensures robustness when the method is called from
different contexts where the parent filter might not be properly initialized.

src/main/java/org/opensearch/knn/index/query/lucenelib/NestedKnnUtil.java [36-38]

 static boolean hasNoParentDocs(BitSetProducer parentFilter, LeafReaderContext context) throws IOException {
+    if (parentFilter == null) {
+        return true;
+    }
     BitSet parentBitSet = parentFilter.getBitSet(context);
     return parentBitSet == null;
 }
Suggestion importance[1-10]: 3

Why: While adding a null check for parentFilter could improve defensive programming, the parentFilter is passed from the constructor where it's already initialized and stored as a field. The likelihood of it being null in practice is very low, making this a minor defensive improvement rather than fixing an actual issue.

Low
Add null check for context parameter

Add null check for context parameter before calling hasNoParentDocs. If context is
null, the method could fail when parentFilter.getBitSet(context) is invoked, leading
to unexpected behavior during search operations.

src/main/java/org/opensearch/knn/index/query/lucenelib/OSDiversifyingChildrenByteKnnVectorQuery.java [43-53]

 @Override
 protected TopDocs approximateSearch(
     LeafReaderContext context,
     AcceptDocs acceptDocs,
     int visitedLimit,
     KnnCollectorManager knnCollectorManager
 ) throws IOException {
-    if (NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
+    if (context == null || NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
         return NestedKnnUtil.EMPTY_TOP_DOCS;
     }
     return super.approximateSearch(context, acceptDocs, visitedLimit, knnCollectorManager);
 }
Suggestion importance[1-10]: 2

Why: Adding a null check for context is overly defensive. The LeafReaderContext is provided by Lucene's search framework and should never be null during normal search operations. This check would only mask potential framework-level issues and is not a practical improvement.

Low
Suggestions up to commit a39e645
Category    Suggestion    Impact
Possible issue
Add null check for parentFilter parameter

Add null check for parentFilter parameter before calling getBitSet(). If
parentFilter is null, the method will throw a NullPointerException, which defeats
the purpose of preventing NPEs in nested kNN search.

src/main/java/org/opensearch/knn/index/query/lucenelib/NestedKnnUtil.java [36-38]

 static boolean hasNoParentDocs(BitSetProducer parentFilter, LeafReaderContext context) throws IOException {
+    if (parentFilter == null) {
+        return true;
+    }
     BitSet parentBitSet = parentFilter.getBitSet(context);
     return parentBitSet == null;
 }
Suggestion importance[1-10]: 3

Why: While adding a null check for parentFilter could prevent NPEs, the suggestion assumes parentFilter might be null without evidence from the PR context. The parentFilter is passed from constructors where it's already validated, making this a defensive but low-impact suggestion.

Low
General
Add null check for context parameter

Consider adding a null check for context parameter before passing it to
hasNoParentDocs(). While the parent class likely validates this, defensive
programming would prevent potential NPEs if the contract changes or if this method
is called directly.

src/main/java/org/opensearch/knn/index/query/lucenelib/OSDiversifyingChildrenByteKnnVectorQuery.java [43-53]

 @Override
 protected TopDocs approximateSearch(
     LeafReaderContext context,
     AcceptDocs acceptDocs,
     int visitedLimit,
     KnnCollectorManager knnCollectorManager
 ) throws IOException {
-    if (NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
+    if (context == null || NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
         return NestedKnnUtil.EMPTY_TOP_DOCS;
     }
     return super.approximateSearch(context, acceptDocs, visitedLimit, knnCollectorManager);
 }
Suggestion importance[1-10]: 2

Why: This suggestion adds defensive null checking for context, but there's no evidence in the PR that context can be null. The parent class contract likely ensures non-null parameters, and returning EMPTY_TOP_DOCS when context is null may not be semantically correct behavior.

Low
Suggestions up to commit 1a4dc98
Category    Suggestion    Impact
Possible issue
Add null check for parentFilter parameter

Add null check for parentFilter parameter to prevent NPE if a null BitSetProducer is
passed. This defensive check ensures robustness when the method is called from
different contexts where the parent filter might not be initialized.

src/main/java/org/opensearch/knn/index/query/lucenelib/NestedKnnUtil.java [36-39]

 static boolean hasNoParentDocs(BitSetProducer parentFilter, LeafReaderContext context) throws IOException {
+    if (parentFilter == null) {
+        return true;
+    }
     BitSet parentBitSet = parentFilter.getBitSet(context);
     return parentBitSet == null;
 }
Suggestion importance[1-10]: 3

Why: While adding a null check for parentFilter could improve defensive programming, the parentFilter is passed from the constructor where it's already initialized and stored as a field. The likelihood of it being null in practice is very low, making this a minor defensive improvement rather than a critical fix.

Low
Add null check for context parameter

Add null check for context parameter before calling hasNoParentDocs to prevent
potential NPE. The LeafReaderContext could be null in edge cases, and this would
cause a failure when passed to parentFilter.getBitSet().

src/main/java/org/opensearch/knn/index/query/lucenelib/OSDiversifyingChildrenByteKnnVectorQuery.java [43-54]

 @Override
 protected TopDocs approximateSearch(
     LeafReaderContext context,
     AcceptDocs acceptDocs,
     int visitedLimit,
     KnnCollectorManager knnCollectorManager
 ) throws IOException {
-    if (NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
+    if (context == null || NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
         return NestedKnnUtil.EMPTY_TOP_DOCS;
     }
     return super.approximateSearch(context, acceptDocs, visitedLimit, knnCollectorManager);
 }
Suggestion importance[1-10]: 2

Why: The LeafReaderContext parameter is provided by Lucene's search framework and is not expected to be null during normal operation. Adding this check would be overly defensive without evidence of actual null cases. The suggestion addresses a theoretical edge case rather than a real issue.

Low
Suggestions up to commit 1a4dc98
Category    Suggestion    Impact
Possible issue
Add null check for parentFilter parameter

Add null check for parentFilter parameter to prevent potential NPE. If parentFilter
is null, calling getBitSet() will throw a NullPointerException. Consider returning
true or throwing an IllegalArgumentException when parentFilter is null.

src/main/java/org/opensearch/knn/index/query/lucenelib/NestedKnnUtil.java [36-38]

 static boolean hasNoParentDocs(BitSetProducer parentFilter, LeafReaderContext context) throws IOException {
+    if (parentFilter == null) {
+        return true;
+    }
     BitSet parentBitSet = parentFilter.getBitSet(context);
     return parentBitSet == null;
 }
Suggestion importance[1-10]: 3

Why: While adding a null check for parentFilter could prevent NPE, the parameter is passed from constructors where it's already validated and stored as a field. The suggestion assumes parentFilter could be null without evidence from the PR context. This is a defensive programming practice but not a critical issue given the current usage pattern.

Low
Add null check for context parameter

Add null check for context parameter before passing to hasNoParentDocs(). The method
could receive a null context which would cause NPE when
parentFilter.getBitSet(context) is called. Validate the parameter at the method
entry point.

src/main/java/org/opensearch/knn/index/query/lucenelib/OSDiversifyingChildrenByteKnnVectorQuery.java [43-53]

 @Override
 protected TopDocs approximateSearch(
     LeafReaderContext context,
     AcceptDocs acceptDocs,
     int visitedLimit,
     KnnCollectorManager knnCollectorManager
 ) throws IOException {
-    if (NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
+    if (context == null || NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
         return NestedKnnUtil.EMPTY_TOP_DOCS;
     }
     return super.approximateSearch(context, acceptDocs, visitedLimit, knnCollectorManager);
 }
Suggestion importance[1-10]: 2

Why: The suggestion to check if context is null is overly defensive. The approximateSearch method is an override of a parent class method, and LeafReaderContext is provided by Lucene's search framework which would not pass null. Adding this check would be unnecessary and inconsistent with standard Lucene API usage patterns.

Low
Suggestions up to commit 2243d0e
Category    Suggestion    Impact
Possible issue
Check for empty BitSet

The method should also check if the BitSet is empty (has no set bits) in addition to
checking for null. A non-null but empty BitSet would still cause issues in the kNN
search as there are no parent documents to process.

src/main/java/org/opensearch/knn/index/query/lucenelib/NestedKnnUtil.java [36-39]

 static boolean hasNoParentDocs(BitSetProducer parentFilter, LeafReaderContext context) throws IOException {
     BitSet parentBitSet = parentFilter.getBitSet(context);
-    return parentBitSet == null;
+    return parentBitSet == null || parentBitSet.cardinality() == 0;
 }
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that checking only for null BitSet may not be sufficient. An empty BitSet (with no set bits) could also indicate no parent documents exist. However, the score is moderate because the current implementation may be intentionally checking only for null based on Lucene's behavior, and the PR description specifically mentions preventing NPE rather than handling empty BitSets. The suggestion is valid but may require verification of whether cardinality() == 0 is a realistic scenario that needs handling.

Medium
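
The difference between the two predicates in this suggestion can be sketched as follows. This is an illustration only: `java.util.BitSet` stands in for Lucene's `org.apache.lucene.util.BitSet`, and `ParentBitSetChecks` is a hypothetical class name. The thread does not establish whether a non-null, empty bit set actually occurs for the parent filter, so the sketch shows only how the two checks would differ if it did.

```java
import java.util.BitSet;

// Sketch only: java.util.BitSet stands in for org.apache.lucene.util.BitSet.
final class ParentBitSetChecks {
    // Mirrors the check in the PR: only a null bit set counts as "no parent docs".
    static boolean nullOnly(BitSet parents) {
        return parents == null;
    }

    // Mirrors the suggested variant: a bit set with no bits set also counts.
    static boolean nullOrEmpty(BitSet parents) {
        return parents == null || parents.cardinality() == 0;
    }
}
```

The two checks agree for a null bit set and for one with at least one bit set; they differ only for a non-null bit set with no bits set.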

naykudev added 2 commits June 16, 2026 13:05
…-project#3359)

When a segment contains documents without nested vector fields,
Lucene's TimeLimitingKnnCollectorManager wraps a null collector
in a Decorator, causing NPE on topDocs(). Override approximateSearch
in OSDiversifyingChildrenFloatKnnVectorQuery and
OSDiversifyingChildrenByteKnnVectorQuery to catch the NPE and
return empty results for segments with no vectors.

Signed-off-by: ved naykude <vnaykude@amazon.com>
Signed-off-by: ved naykude <vnaykude@amazon.com>
@naykudev naykudev force-pushed the fix/nested-knn-npe-3359 branch from 28062ad to 5bfc3b6 Compare June 16, 2026 20:05

@navneet1v navneet1v left a comment

Collaborator

Thanks @naykudev for raising the PR. A few things:

  1. According to the GitHub issue, the problem only occurs in 3.7. Is it also present in older versions such as 3.6? Can we validate that, please?
  2. Please add an integration test and a unit test for this change.
  3. Is this change needed only for the Lucene engine, or do we also need a change for the Faiss engine, with and without Memory Optimized Search?

…-project#3359)

When a segment contains no nested documents (e.g., only an empty doc),
DiversifyingNearestChildrenKnnCollectorManager.newCollector() returns
null because parentBitSet is null. Lucene's TimeLimitingKnnCollectorManager
wraps this null in a Decorator, causing NPE on topDocs().

Fix: Check parentFilter.getBitSet(context) before calling
super.approximateSearch(). If the segment has no parent documents,
return empty results immediately. This is the same null check that
the collector manager does internally, but applied earlier to avoid
the TimeLimitingKnnCollectorManager wrapping issue.

This fix is only needed for the Lucene engine path
(OSDiversifyingChildrenFloatKnnVectorQuery and
OSDiversifyingChildrenByteKnnVectorQuery). Faiss engine uses a
different query path (NativeEngineKnnVectorQuery -> KNNWeight) that
manages its own collectors and is not affected.
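
The guard described in this commit message can be sketched in isolation. The types below are hypothetical stand-ins (`GuardSketch`, `ParentFilter`, `baseSearch`, and `guardedSearch` are illustrative names, not Lucene or k-NN plugin APIs), and `baseSearch` simply assumes that the unguarded path fails when the parent bit set is null, as the commit message describes.

```java
// Hypothetical stand-ins for illustration; not the Lucene or k-NN plugin classes.
final class GuardSketch {
    // Models BitSetProducer.getBitSet(context): null means the segment has no parent docs.
    interface ParentFilter {
        long[] bitsFor(int segment);
    }

    // Shared empty result, analogous in spirit to an EMPTY_TOP_DOCS constant.
    static final int[] EMPTY_RESULTS = new int[0];

    // Models the unguarded base search, assumed to fail when there are no parent docs.
    static int[] baseSearch(ParentFilter filter, int segment) {
        long[] bits = filter.bitsFor(segment);
        if (bits == null) {
            throw new NullPointerException("collector is null");
        }
        return new int[] { segment };
    }

    // Guarded variant: return empty results before delegating to the base search.
    static int[] guardedSearch(ParentFilter filter, int segment) {
        if (filter.bitsFor(segment) == null) {
            return EMPTY_RESULTS;
        }
        return baseSearch(filter, segment);
    }
}
```

Checking before delegating, rather than catching the exception afterwards, keeps an unrelated `NullPointerException` in the base search from being silently turned into an empty result.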

Signed-off-by: ved naykude <vnaykude@amazon.com>
@naykudev naykudev force-pushed the fix/nested-knn-npe-3359 branch from ec8bec6 to 0d85b0b Compare June 16, 2026 21:20
@github-actions

github-actions Bot commented Jun 16, 2026

PR Reviewer Guide 🔍

(Review updated until commit d2d7700)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Incomplete check

The hasNoParentDocs method only checks if parentBitSet is null, but does not verify if the BitSet is empty (no bits set). A non-null but empty BitSet would still cause the same NPE when no parent documents exist in the segment. This occurs when BitSetProducer.getBitSet() returns an empty BitSet instead of null for segments with no matching documents.

static boolean hasNoParentDocs(BitSetProducer parentFilter, LeafReaderContext context) throws IOException {
    BitSet parentBitSet = parentFilter.getBitSet(context);
    return parentBitSet == null;
}
Possible Issue

The approximateSearch override catches the case where parentFilter.getBitSet(context) returns null, but the actual NPE described in the PR occurs when TimeLimitingKnnCollectorManager wraps a null collector. This fix may not prevent the NPE if the null collector is created for other reasons (e.g., no vector values in segment) while a valid parent BitSet exists. The check should verify vector field presence, not just parent document presence.

@Override
protected TopDocs approximateSearch(
    LeafReaderContext context,
    AcceptDocs acceptDocs,
    int visitedLimit,
    KnnCollectorManager knnCollectorManager
) throws IOException {
    if (NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
        return NestedKnnUtil.EMPTY_TOP_DOCS;
    }
    return super.approximateSearch(context, acceptDocs, visitedLimit, knnCollectorManager);
}
Possible Issue

Same issue as in the byte variant: the fix checks for parent documents but the root cause is a null collector from segments without vector values. A segment could have parent documents but no vectors, still triggering the NPE. The check should verify vector field presence in the segment, not parent document presence.

@Override
protected TopDocs approximateSearch(
    LeafReaderContext context,
    AcceptDocs acceptDocs,
    int visitedLimit,
    KnnCollectorManager knnCollectorManager
) throws IOException {
    if (NestedKnnUtil.hasNoParentDocs(parentFilter, context)) {
        return NestedKnnUtil.EMPTY_TOP_DOCS;
    }
    return super.approximateSearch(context, acceptDocs, visitedLimit, knnCollectorManager);
}

@naykudev

Author

NPE catch removed: Replaced it with a null check on parentFilter.getBitSet(context) before calling super.approximateSearch(). If the segment has no parent docs, the override returns an empty TopDocs. This is the same check DiversifyingNearestChildrenKnnCollectorManager does internally, applied earlier so that TimeLimitingKnnCollectorManager never wraps a null collector.

Faiss / MOS not affected: Tested with the same repro scenario; it works without the fix. Faiss uses NativeEngineKnnVectorQuery → KNNWeight, which doesn't go through Lucene's AbstractKnnVectorQuery.approximateSearch(), where the wrapping happens.

Older versions: Validated via code analysis. The vulnerable code path exists on the 3.6 and 3.5 branches. OSDiversifyingChildrenFloatKnnVectorQuery on 3.6 extends the same Lucene DiversifyingChildrenFloatKnnVectorQuery without overriding approximateSearch, and Lucene 10.1 (used by 3.6) has TimeLimitingKnnCollectorManager with the same null-wrapping behavior. The bug is present on 3.5+ whenever this code path is combined with separate segments and empty documents. The reporter likely didn't hit it on 3.6 because of different data or segment patterns.

Tests: Will add integration and unit tests.

- Integration test: verifies nested kNN search succeeds when an empty
  document (without nested object) exists in a separate segment, for
  both Lucene and Faiss engines.
- Unit test: verifies approximateSearch returns empty TopDocs when
  parentBitSet is null (segment has no nested docs).

Signed-off-by: ved naykude <vnaykude@amazon.com>
@github-actions

Persistent review updated to latest commit 056b585

Signed-off-by: ved naykude <vnaykude@amazon.com>
@github-actions

Persistent review updated to latest commit 1f4e9bb

@naykudev naykudev requested a review from navneet1v June 16, 2026 21:43
@codecov

codecov Bot commented Jun 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.76%. Comparing base (80c40d9) to head (d2d7700).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3368      +/-   ##
============================================
+ Coverage     83.73%   83.76%   +0.03%     
- Complexity     4358     4366       +8     
============================================
  Files           453      454       +1     
  Lines         15773    15784      +11     
  Branches       2056     2057       +1     
============================================
+ Hits          13207    13222      +15     
+ Misses         1782     1776       -6     
- Partials        784      786       +2     


- Unit test: verifies byte query returns empty TopDocs when
  parentBitSet is null (covers null branch)
- Integration test: verifies Lucene byte vector nested kNN search
  succeeds with empty doc in separate segment (covers non-null branch)

Signed-off-by: ved naykude <vnaykude@amazon.com>
@github-actions

Persistent review updated to latest commit b01d3de

- Import TotalHits and ScoreDoc at top instead of fully qualified names
- Extract static EMPTY_TOP_DOCS constant for reuse
- Create NestedKnnUtil.hasNoParentDocs() utility method to avoid
  duplicated null check logic across float and byte query classes
- Add unit tests for NestedKnnUtil covering both null and non-null cases

Signed-off-by: ved naykude <vnaykude@amazon.com>
@naykudev naykudev requested a review from kotwanikunal June 17, 2026 20:00
@github-actions

Persistent review updated to latest commit b811036

@naveentatikonda naveentatikonda changed the title from "Fix NPE in nested kNN search when docs lack nested object" to "[BugFix] Fix NPE in nested kNN search when index contains documents without nested object" Jun 17, 2026
Comment thread src/test/java/org/opensearch/knn/integ/NestedSearchIT.java
- Move static EMPTY_TOP_DOCS constant to NestedKnnUtil to avoid
  duplication across both query classes
- Add integration test for Faiss with MOS (ON_DISK mode) + empty doc
  in separate segment

Signed-off-by: ved naykude <vnaykude@amazon.com>
@github-actions

Persistent review updated to latest commit 2243d0e

@naykudev naykudev requested a review from naveentatikonda June 17, 2026 22:17
Comment thread src/test/java/org/opensearch/knn/integ/NestedSearchIT.java Outdated
The MOS test should validate memory-optimized search using an explicit
index.knn.memory_optimized_search setting on a fp32 index, not by
relying on ON_DISK mode which uses MOS implicitly.

Signed-off-by: ved naykude <vnaykude@amazon.com>
@naykudev naykudev force-pushed the fix/nested-knn-npe-3359 branch from 9d330db to 1a4dc98 Compare June 22, 2026 18:14
@github-actions

Persistent review updated to latest commit 1a4dc98

Signed-off-by: ved naykude <vnaykude@amazon.com>
@github-actions

Persistent review updated to latest commit a39e645

@github-actions

Persistent review updated to latest commit 951e5e3

The createKnnIndex(String, Settings, String) overload doesn't
automatically add index.knn=true. Include getKNNDefaultIndexSettings()
as the base settings when creating the MOS test index.

Signed-off-by: ved naykude <vnaykude@amazon.com>
@naykudev naykudev force-pushed the fix/nested-knn-npe-3359 branch from 951e5e3 to d2d7700 Compare June 23, 2026 00:53
@github-actions

Persistent review updated to latest commit d2d7700

Development

Successfully merging this pull request may close these issues.

[BUG] Running a KNN search query on a vector field inside a nested object results in a NullPointerException

4 participants