perf(io): optimize CRC32 with lookup table and reduce ReaderIO heap allocation#2158
perf(io): optimize CRC32 with lookup table and reduce ReaderIO heap allocation#2158jfeng18 wants to merge 2 commits into
Conversation
…llocation in ReaderIO - Replace bit-by-bit CRC32 implementation (8 iterations per byte) with a 256-entry lookup table (1 iteration per byte), resolving the existing TODO(wxyu) for performance optimization - Also fix a latent sign-extension bug in the old calculate_checksum: char was implicitly sign-extended before XOR, producing non-standard CRC32 for bytes >= 0x80; now correctly cast to uint8_t (backward compatible since checksummed metadata is ASCII-only JSON) - ReaderIO::MultiReadImpl: replace std::vector heap allocation with a stack buffer (1KB) for counts <= 128, with std::unique_ptr heap fallback for larger counts to maintain exception safety Signed-off-by: 水芝 <fengjiangtian.fjt@alibaba-inc.com> Assisted-by: Claude:claude-opus-4-6
|
Hi maintainers, could you please add |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Require kind labelWonderful, this rule succeeded.
🟢 Require version labelWonderful, this rule succeeded.
|
There was a problem hiding this comment.
Code Review
This pull request optimizes performance in ReaderIO::MultiReadImpl by using a stack-allocated buffer for small offset counts and optimizes Footer::calculate_checksum by implementing a lookup-table-based CRC32 calculation. The review feedback suggests using the custom allocator_ for heap allocations in ReaderIO to maintain consistent memory tracking, and using static constexpr for the CRC32 lookup table to guarantee compile-time initialization.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| constexpr uint64_t kStackThreshold = 128; | ||
| uint64_t local_buf[kStackThreshold]; | ||
| std::unique_ptr<uint64_t[]> heap_buf; | ||
| uint64_t* real_offsets; | ||
| if (count <= kStackThreshold) { | ||
| real_offsets = local_buf; | ||
| } else { | ||
| heap_buf.reset(new uint64_t[count]); | ||
| real_offsets = heap_buf.get(); | ||
| } |
There was a problem hiding this comment.
The current implementation uses new uint64_t[count] to allocate memory on the heap when count exceeds kStackThreshold. This bypasses the custom allocator_ provided to ReaderIO, which is used for memory tracking, custom alignment, or pool allocation in the vsag library. To maintain consistency and ensure proper memory management, please use the custom allocator_ for heap allocations.
constexpr uint64_t kStackThreshold = 128;
uint64_t local_buf[kStackThreshold];
struct AllocatorDeleter {
Allocator* allocator;
void operator()(uint64_t* ptr) const {
allocator->Deallocate(ptr);
}
};
std::unique_ptr<uint64_t[], AllocatorDeleter> heap_buf(nullptr, AllocatorDeleter{allocator_});
uint64_t* real_offsets;
if (count <= kStackThreshold) {
real_offsets = local_buf;
} else {
auto* ptr = static_cast<uint64_t*>(allocator_->Allocate(count * sizeof(uint64_t)));
heap_buf.reset(ptr);
real_offsets = heap_buf.get();
}There was a problem hiding this comment.
Good point — fixed in commit 97726ea. Now using allocator_->Allocate/Deallocate for consistency with the rest of the IO layer.
| static const uint32_t* | ||
| crc32_table() { | ||
| static const uint32_t table[256] = { |
There was a problem hiding this comment.
Using static const for the lookup table inside crc32_table() can be improved by using static constexpr. This guarantees compile-time initialization of the table and avoids any potential runtime overhead or thread-safety initialization guards (magic statics) that the compiler might otherwise generate.
| static const uint32_t* | |
| crc32_table() { | |
| static const uint32_t table[256] = { | |
| static const uint32_t* | |
| crc32_table() { | |
| static constexpr uint32_t table[256] = { |
There was a problem hiding this comment.
Done — changed to static constexpr uint32_t table[256] in the same commit (97726ea).
|
Hi @jfeng18, please check the review comments and resolve them. |
- ReaderIO::MultiReadImpl: use allocator_->Allocate/Deallocate instead of raw new[]/delete[] for consistency with the rest of the IO layer - CRC32 lookup table: use static constexpr for compile-time initialization Signed-off-by: 水芝 <fengjiangtian.fjt@alibaba-inc.com> Assisted-by: Claude:claude-opus-4-6
Summary
std::vector<uint64_t>heap allocation with a 1KB stack buffer for counts <= 128, withstd::unique_ptrheap fallback for larger counts to maintain exception safety.Test plan
🤖 Generated with Claude Code