Skip to content

fix: skip missing sparse term lists#1985

Open
jac0626 wants to merge 6 commits into
antgroup:mainfrom
jac0626:codex/fix-sparse-term-null-list
Open

fix: skip missing sparse term lists#1985
jac0626 wants to merge 6 commits into
antgroup:mainfrom
jac0626:codex/fix-sparse-term-null-list

Conversation

@jac0626
Copy link
Copy Markdown
Collaborator

@jac0626 jac0626 commented May 7, 2026

Summary

Skip missing sparse-term posting lists in SparseTermDataCell::InsertHeapByTermLists.

The search path now ignores terms whose posting-list pointer is absent and clamps retained term size to the actual posting-list length before indexing. Regression coverage exercises both a null term list and an oversized recorded term size.

Fixes #1965.
Part of #1960.

Validation

Known CI State

The fork PR CI run failed in Test X86 Functests on an unrelated HNSW recall assertion:

tests/test_index.cpp:535: REQUIRE( cur_recall > expected_recall * query_count * RECALL_THRESHOLD ), with nanf > 72.25f.

The same log still contains existing non-target runtime errors from fast_bitset.cpp and inner_index_interface.cpp, which are handled by separate PRs.

Notes

Local macOS rebuild was blocked by the existing CMake configuration error: gcc does not support using libc++.

Signed-off-by: JiangChao <jacllovey@qq.com>
Copilot AI review requested due to automatic review settings May 7, 2026 07:15
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 7, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Require kind label

Wonderful, this rule succeeded.
  • label~=^kind/

🟢 Require version label

Wonderful, this rule succeeded.
  • label~=^version/

🟢 Require linked issue for feature/bug PRs

Wonderful, this rule succeeded.
  • body~=(?im)(?:^|[\s\-\*])(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s*:?\s+(?:#\d+|[\w.\-]+/[\w.\-]+#\d+|https?://github\.com/[\w.\-]+/[\w.\-]+/issues/\d+)

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the safety of SparseTermDataCell::InsertHeapByTermLists by adding bounds checks, null pointer validation, and clamping term sizes to prevent out-of-bounds access. It also includes unit tests for these scenarios. The review feedback suggests simplifying the new conditional check by removing a redundant size comparison to improve readability.

Comment thread src/datacell/sparse_term_datacell.cpp
@jac0626 jac0626 marked this pull request as ready for review May 7, 2026 07:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Skip missing sparse-term posting lists during search by guarding against null posting-list pointers and clamping the effective term size to the actual posting-list length to avoid out-of-bounds indexing.

Changes:

  • Add safety checks in SparseTermDataCell::InsertHeapByTermLists for missing/empty term lists and clamp retained term_size to the posting-list length.
  • Add regression tests covering (1) null term list skip and (2) oversized recorded term size clamping.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/datacell/sparse_term_datacell.cpp Skips missing sparse-term lists and clamps retained term size to posting-list length before indexing.
src/datacell/sparse_term_datacell_test.cpp Adds unit coverage for null posting-list pointers and oversized recorded term sizes.

Comment thread src/datacell/sparse_term_datacell_test.cpp Outdated
Comment thread src/datacell/sparse_term_datacell_test.cpp Outdated
Comment thread src/datacell/sparse_term_datacell_test.cpp Outdated
Comment thread src/datacell/sparse_term_datacell.cpp Outdated
Signed-off-by: JiangChao <jacllovey@qq.com>
@jac0626 jac0626 requested a review from Copilot May 7, 2026 07:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread src/datacell/sparse_term_datacell.cpp Outdated
Comment thread src/datacell/sparse_term_datacell_test.cpp
Comment thread src/datacell/sparse_term_datacell_test.cpp
Signed-off-by: JiangChao <jacllovey@qq.com>
@jac0626 jac0626 requested a review from Copilot May 7, 2026 08:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread src/datacell/sparse_term_datacell.cpp Outdated
Signed-off-by: JiangChao <jacllovey@qq.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Signed-off-by: JiangChao <jacllovey@qq.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread src/datacell/sparse_term_datacell.cpp Outdated
Comment thread src/datacell/sparse_term_datacell_test.cpp
Signed-off-by: JiangChao <jacllovey@qq.com>
@jac0626 jac0626 requested a review from Copilot May 8, 2026 04:38
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle null term id lists in SparseTermDataCell search

2 participants