Skip to content

perf(io): optimize CRC32 with lookup table and reduce ReaderIO heap allocation#2158

Open
jfeng18 wants to merge 2 commits into
antgroup:mainfrom
jfeng18:fix/io-storage-optimizations
Open

perf(io): optimize CRC32 with lookup table and reduce ReaderIO heap allocation#2158
jfeng18 wants to merge 2 commits into
antgroup:mainfrom
jfeng18:fix/io-storage-optimizations

Conversation

@jfeng18

@jfeng18 jfeng18 commented Jun 8, 2026

Copy link
Copy Markdown

Summary

  • CRC32 checksum: replace bit-by-bit implementation (8 branch-iterations per byte) with a 256-entry lookup table (1 table lookup per byte), ~8x speedup for serialization/deserialization checksum computation. Also fixes a latent sign-extension bug for non-ASCII bytes.
  • ReaderIO::MultiReadImpl: replace std::vector<uint64_t> heap allocation with a 1KB stack buffer for counts <= 128, with std::unique_ptr heap fallback for larger counts to maintain exception safety.

Test plan

  • Footer tests: 21 assertions passed
  • ReaderIO tests: 1,031 assertions passed
  • Full unit test suite: 441 cases / 85M assertions passed (release)
  • Adversarial review: CRC32 table verified against Python-generated reference (256 entries, all match), known CRC32 check value 0xCBF43926 confirmed
  • CI format/lint check

🤖 Generated with Claude Code

…llocation in ReaderIO

- Replace bit-by-bit CRC32 implementation (8 iterations per byte) with a
  256-entry lookup table (1 iteration per byte), resolving the existing
  TODO(wxyu) for performance optimization
- Also fix a latent sign-extension bug in the old calculate_checksum:
  char was implicitly sign-extended before XOR, producing non-standard
  CRC32 for bytes >= 0x80; now correctly cast to uint8_t (backward
  compatible since checksummed metadata is ASCII-only JSON)
- ReaderIO::MultiReadImpl: replace std::vector heap allocation with a
  stack buffer (1KB) for counts <= 128, with std::unique_ptr heap
  fallback for larger counts to maintain exception safety

Signed-off-by: 水芝 <fengjiangtian.fjt@alibaba-inc.com>
Assisted-by: Claude:claude-opus-4-6
@jfeng18

jfeng18 commented Jun 8, 2026

Copy link
Copy Markdown
Author

Hi maintainers, could you please add kind/improvement and version/1.0 labels? I don't have label permissions as a fork contributor. Thanks!

@mergify

mergify Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Require kind label

Wonderful, this rule succeeded.
  • label~=^kind/

🟢 Require version label

Wonderful, this rule succeeded.
  • label~=^version/

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes performance in ReaderIO::MultiReadImpl by using a stack-allocated buffer for small offset counts and optimizes Footer::calculate_checksum by implementing a lookup-table-based CRC32 calculation. The review feedback suggests using the custom allocator_ for heap allocations in ReaderIO to maintain consistent memory tracking, and using static constexpr for the CRC32 lookup table to guarantee compile-time initialization.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/io/reader_io.cpp
Comment on lines +95 to +104
constexpr uint64_t kStackThreshold = 128;
uint64_t local_buf[kStackThreshold];
std::unique_ptr<uint64_t[]> heap_buf;
uint64_t* real_offsets;
if (count <= kStackThreshold) {
real_offsets = local_buf;
} else {
heap_buf.reset(new uint64_t[count]);
real_offsets = heap_buf.get();
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation uses new uint64_t[count] to allocate memory on the heap when count exceeds kStackThreshold. This bypasses the custom allocator_ provided to ReaderIO, which is used for memory tracking, custom alignment, or pool allocation in the vsag library. To maintain consistency and ensure proper memory management, please use the custom allocator_ for heap allocations.

    constexpr uint64_t kStackThreshold = 128;
    uint64_t local_buf[kStackThreshold];
    struct AllocatorDeleter {
        Allocator* allocator;
        void operator()(uint64_t* ptr) const {
            allocator->Deallocate(ptr);
        }
    };
    std::unique_ptr<uint64_t[], AllocatorDeleter> heap_buf(nullptr, AllocatorDeleter{allocator_});
    uint64_t* real_offsets;
    if (count <= kStackThreshold) {
        real_offsets = local_buf;
    } else {
        auto* ptr = static_cast<uint64_t*>(allocator_->Allocate(count * sizeof(uint64_t)));
        heap_buf.reset(ptr);
        real_offsets = heap_buf.get();
    }

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — fixed in commit 97726ea. Now using allocator_->Allocate/Deallocate for consistency with the rest of the IO layer.

Comment thread src/storage/serialization.h Outdated
Comment on lines +159 to +161
static const uint32_t*
crc32_table() {
static const uint32_t table[256] = {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using static const for the lookup table inside crc32_table() can be improved by using static constexpr. This guarantees compile-time initialization of the table and avoids any potential runtime overhead or thread-safety initialization guards (magic statics) that the compiler might otherwise generate.

Suggested change
static const uint32_t*
crc32_table() {
static const uint32_t table[256] = {
static const uint32_t*
crc32_table() {
static constexpr uint32_t table[256] = {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — changed to static constexpr uint32_t table[256] in the same commit (97726ea).

@wxyucs wxyucs added version/1.0 kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) labels Jun 8, 2026
@wxyucs

wxyucs commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Hi @jfeng18, please check the review comments and resolve them.

- ReaderIO::MultiReadImpl: use allocator_->Allocate/Deallocate instead
  of raw new[]/delete[] for consistency with the rest of the IO layer
- CRC32 lookup table: use static constexpr for compile-time initialization

Signed-off-by: 水芝 <fengjiangtian.fjt@alibaba-inc.com>
Assisted-by: Claude:claude-opus-4-6
@jfeng18

jfeng18 commented Jun 8, 2026

Copy link
Copy Markdown
Author

Hi @wxyucs, review comments addressed in commit 97726ea:

  • Use allocator_ (high): Fixed — now uses allocator_->Allocate/Deallocate instead of raw new[]/delete[]
  • static constexpr (medium): Fixed — changed to static constexpr uint32_t table[256]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) size/M version/1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants