fix: skip missing sparse term lists #1985
Signed-off-by: JiangChao <jacllovey@qq.com>
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
- 🟢 Require kind label: this rule succeeded.
- 🟢 Require version label: this rule succeeded.
- 🟢 Require linked issue for feature/bug PRs: this rule succeeded.
Code Review
This pull request improves the safety of SparseTermDataCell::InsertHeapByTermLists by adding bounds checks, null pointer validation, and clamping term sizes to prevent out-of-bounds access. It also includes unit tests for these scenarios. The review feedback suggests simplifying the new conditional check by removing a redundant size comparison to improve readability.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Skip missing sparse-term posting lists during search by guarding against null posting-list pointers and clamping the effective term size to the actual posting-list length to avoid out-of-bounds indexing.
Changes:
- Add safety checks in `SparseTermDataCell::InsertHeapByTermLists` for missing/empty term lists and clamp retained `term_size` to the posting-list length.
- Add regression tests covering (1) null term list skip and (2) oversized recorded term size clamping.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/datacell/sparse_term_datacell.cpp | Skips missing sparse-term lists and clamps retained term size to posting-list length before indexing. |
| src/datacell/sparse_term_datacell_test.cpp | Adds unit coverage for null posting-list pointers and oversized recorded term sizes. |
Summary
Skip missing sparse-term posting lists in `SparseTermDataCell::InsertHeapByTermLists`.
The search path now ignores terms whose posting-list pointer is absent and clamps retained term size to the actual posting-list length before indexing. Regression coverage exercises both a null term list and an oversized recorded term size.
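For readers who want the shape of the guard without opening the diff, here is a minimal sketch of the skip-and-clamp pattern described above. It is not the code from `sparse_term_datacell.cpp`: the `PostingList` and `TermIndex` types, their field names, and the `CollectCandidates` function are hypothetical stand-ins, and the real function inserts into a heap rather than returning a vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified posting list: parallel arrays of doc ids and values.
struct PostingList {
    std::vector<uint32_t> ids;
    std::vector<float> values;
};

// Hypothetical index layout: one (possibly null) posting list per term,
// plus a separately recorded "retained" size per term.
struct TermIndex {
    std::vector<std::unique_ptr<PostingList>> term_lists;
    std::vector<std::size_t> term_sizes;
};

// Walk the posting lists of the query terms and emit (doc id, weighted value).
// Terms with an out-of-range id or a null posting list are skipped, and the
// recorded size is clamped to the real list length before indexing.
std::vector<std::pair<uint32_t, float>> CollectCandidates(
    const TermIndex& index,
    const std::vector<std::pair<uint32_t, float>>& query) {
    std::vector<std::pair<uint32_t, float>> out;
    for (const auto& entry : query) {
        const std::size_t term = entry.first;
        const float weight = entry.second;

        // Bounds check: the term may not exist in this index at all.
        if (term >= index.term_lists.size() || term >= index.term_sizes.size()) {
            continue;
        }

        // Null check: the slot exists but no posting list was ever built.
        const PostingList* list = index.term_lists[term].get();
        if (list == nullptr) {
            continue;
        }

        // Clamp: never trust the recorded size beyond the actual storage.
        const std::size_t actual = std::min<std::size_t>(list->ids.size(), list->values.size());
        const std::size_t n = std::min<std::size_t>(index.term_sizes[term], actual);

        for (std::size_t i = 0; i < n; ++i) {
            out.emplace_back(list->ids[i], weight * list->values[i]);
        }
    }
    return out;
}
```

An empty posting list needs no separate branch in this sketch, because the clamped size is zero and the inner loop does not run.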
Fixes #1965.
Part of #1960.
Validation
- `git diff --check`
- `sparse_term_datacell.cpp` runtime error is gone in this branch.
Known CI State
The fork PR CI run failed in `Test X86 Functests` on an unrelated HNSW recall assertion: `tests/test_index.cpp:535: REQUIRE( cur_recall > expected_recall * query_count * RECALL_THRESHOLD )`, with `nanf > 72.25f`.
The same log still contains existing non-target runtime errors from `fast_bitset.cpp` and `inner_index_interface.cpp`, which are handled by separate PRs.
Notes
Local macOS rebuild was blocked by the existing CMake configuration error: `gcc does not support using libc++.`