refactor(simd): unify cross-ISA implementations via traits + kernel templates by LHT129 · Pull Request #2156 · antgroup/vsag

LHT129 · 2026-06-08T03:13:56Z

Summary

Resolves #2131

Introduce a type-traits abstraction layer and shared kernel templates to eliminate
copy-paste SIMD implementations across ISAs (SSE/AVX/AVX2/AVX512/NEON).

Key changes:

Add per-ISA traits headers (src/simd/traits/simd_traits_*.h) exposing a uniform
static interface over each ISA's native vector types
Add 17 kernel templates (src/simd/kernels/*.h) that implement algorithms once
in terms of traits, replacing 5-way duplicated hand-written intrinsics
Covered function families: FP32 compute/batch/binary-ops/reduce, SQ8/SQ4 quantized
compute, SQ4/SQ8 uniform code IP, INT8 compute, BF16/FP16 half-precision compute,
bit operations, normalize, RaBitQ binary/batch/split-code IP, butterfly/rotate,
and scalar ops

Net effect:

ISA implementation files shrink from ~11,049 to ~4,628 lines (-58%)
New SIMD function cost drops from ~7 files x 30 lines to 1 kernel template + 0-2 trait extensions
Zero runtime overhead: all traits are always_inline, verified via CPU-pinned benchmarks

Out of scope: SVE (scalable vectors), AMX (tile/matrix model), simd_dispatch.h (unchanged)

Test plan

All 51 SIMD unit test cases pass (30,591,891 assertions)
CPU-pinned (taskset -c 0) benchmarks show no regression on AVX2/AVX512 paths
Release build compiles cleanly with all ISA flags

mergify · 2026-06-08T03:14:32Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Require kind label

Wonderful, this rule succeeded.

label~=^kind/

🟢 Require version label

Wonderful, this rule succeeded.

label~=^version/

gemini-code-assist

Code Review

This pull request refactors the SIMD codebase by introducing a unified, template-based kernel architecture across all instruction set architectures (ISAs), replacing duplicate ISA-specific implementations with common templates parameterized by SIMD traits. While this significantly improves code maintainability, several critical issues were identified during the review. Specifically, the AVX trait implementation of load_u8_as_float incorrectly uses _mm_set_epi8 to load from a pointer, which will corrupt data. Additionally, multiple fallback functions in avx.cpp and avx2.cpp bypass intermediate SIMD tiers (e.g., falling back directly to generic or sse instead of avx), breaking the optimization hierarchy and risking performance regressions on tail data. Furthermore, RotateOpImpl in butterfly.h assumes the step size is a multiple of the vector width, potentially leaving tail elements unprocessed, and the NEON trait implementation of load_nibbles_2x_as_float relies on inefficient scalar operations that should be vectorized.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

wxyucs

Thanks for the large SIMD cleanup. The traits/kernel direction is good, but I found two correctness issues in the new NEON path that should be fixed before merge. I’m leaving the fallback-tier/perf-only items out of the blocking list; these two can produce wrong results or unsafe reads.

Copilot

Pull request overview

This PR refactors the SIMD layer to eliminate cross-ISA duplication by introducing per-ISA trait specializations and shared, trait-parameterized kernel templates. ISA-specific .cpp files become thin wrappers that select the appropriate traits and preserve the existing fallback cascade (e.g., AVX512 → AVX2 → AVX → SSE → generic; NEON → generic).

Changes:

Added ISA traits headers (src/simd/traits/simd_traits_*.h) to provide a uniform, static interface over each ISA’s vector types and operations.
Added shared kernel templates (src/simd/kernels/*.h) implementing core FP32, INT8, SQ4/SQ8, BF16/FP16, bit-ops, normalization, and RaBitQ routines once in terms of traits.
Updated SIMD backends (generic.cpp, sse.cpp, avx*.cpp, neon.cpp) to call the shared kernels with the correct trait specialization and fallback.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
src/simd/generic.cpp	Switches generic implementations to trait-based kernels (no-ISA baseline + fallback root).
src/simd/sse.cpp	Refactors SSE backend to shared kernels via `SimdTraits<SSE_Tag>` and related trait families.
src/simd/avx.cpp	Refactors AVX backend to shared kernels via `SimdTraits<AVX_Tag>` and AVX-specific traits.
src/simd/avx2.cpp	Refactors AVX2 backend to shared kernels via `SimdTraits<AVX2_Tag>` and AVX2-specific traits.
src/simd/avx512.cpp	Refactors AVX512 backend to shared kernels via `SimdTraits<AVX512_Tag>` and AVX512-specific traits.
src/simd/neon.cpp	Refactors NEON backend to shared kernels via `SimdTraits<NEON_Tag>` and NEON-specific traits.
src/simd/traits/simd_traits_generic.h	Defines generic (scalar) traits and primary templates for all trait families.
src/simd/traits/simd_traits_sse.h	Adds SSE traits for FP32/INT8/bit-ops/SQ4/SQ8/BF16/uniform-code ops.
src/simd/traits/simd_traits_avx.h	Adds AVX traits for FP32/bit-ops/SQ4/SQ8/BF16/FP16/RaBitQ ops.
src/simd/traits/simd_traits_avx2.h	Adds AVX2 trait overrides and AVX2-specific INT8/bit/SQ4/SQ8/BF16/FP16/uniform/RaBitQ traits.
src/simd/traits/simd_traits_avx512.h	Adds AVX512 traits for FP32/INT8/bit/SQ4/SQ8/BF16/FP16/uniform/RaBitQ ops.
src/simd/traits/simd_traits_neon.h	Adds NEON traits for FP32/INT8/bit/SQ4/SQ8/BF16/FP16 ops.
src/simd/kernels/kernels.h	Aggregator header for all kernel templates.
src/simd/kernels/compute_ip.h	Trait-parameterized FP32 inner-product kernel with optional unroll and fallback.
src/simd/kernels/compute_l2.h	Trait-parameterized FP32 L2-squared kernel with optional unroll and fallback.
src/simd/kernels/compute_batch4.h	Batch-of-4 IP/L2 kernels to reuse query loads across 4 code vectors.
src/simd/kernels/binary_op.h	Shared FP32 elementwise binary ops (add/sub/mul/div).
src/simd/kernels/reduce_add.h	Shared FP32 reduce-add kernel with fallback for sub-vector tails.
src/simd/kernels/pq_distance.h	Shared PQ distance-table update kernel (256 centers).
src/simd/kernels/int8_compute.h	Shared INT8 IP/L2 kernels parameterized on `Int8Traits`.
src/simd/kernels/half_compute.h	Shared BF16/FP16 IP/L2 kernels parameterized on half-traits.
src/simd/kernels/sq8_compute.h	Shared SQ8 quantized compute kernels (IP/L2/codes-IP/codes-L2).
src/simd/kernels/sq4_compute.h	Shared SQ4 quantized compute kernels (IP/L2/codes-IP/codes-L2).
src/simd/kernels/sq8_uniform_compute.h	Shared SQ8 “uniform code” integer-IP kernel via `UniformCodeTraits`.
src/simd/kernels/sq4_uniform_compute.h	Shared SQ4 “uniform code” integer-IP kernel via `UniformCodeTraits`.
src/simd/kernels/bit_op.h	Shared bitwise AND/OR/XOR/NOT kernels via `BitTraits`.
src/simd/kernels/scalar_op.h	Shared scalar-broadcast kernels (DivScalar, VecRescale) via `SimdTraits`.
src/simd/kernels/butterfly.h	Shared butterfly kernels for FHT (RotateOp, KacsWalk) via `SimdTraits`.
src/simd/kernels/normalize.h	Shared NormalizeWithCentroid / inverse normalization kernels via `SimdTraits`.
src/simd/kernels/rabitq_compute.h	Shared RaBitQ float binary/split-code kernels via `RaBitQTraits`.

…emplates Introduce a type-traits abstraction layer and shared kernel templates to eliminate copy-paste SIMD implementations across ISAs (SSE/AVX/AVX2/AVX512/NEON). Changes: - Add per-ISA traits headers (traits/simd_traits_*.h) exposing a uniform static interface (load/store/fmadd/reduce_add/etc.) over each ISA's native vector types. - Add 17 kernel templates (kernels/*.h) that implement the algorithm once in terms of traits, replacing 5-way duplicated hand-written intrinsics. - Covered function families: FP32 compute/batch/binary-ops/reduce, SQ8/SQ4 quantized compute, SQ4/SQ8 uniform code IP, INT8 compute, BF16/FP16 half-precision compute, bit operations, normalize, RaBitQ binary/batch/split-code IP, butterfly/rotate, and scalar ops. - SVE and AMX are explicitly out of scope (different vector model). Net effect: ISA implementation files shrink from ~11,049 to ~4,628 lines (-58%). New SIMD function cost drops from ~7 files x 30 lines to 1 kernel template + 0-2 trait extensions. All 51 SIMD test cases pass (30,591,891 assertions). Signed-off-by: LHT129 <tianlan.lht@antgroup.com>

LHT129 requested review from inabao, jiaweizone and wxyucs as code owners June 8, 2026 03:13

LHT129 self-assigned this Jun 8, 2026

pull-request-size Bot added the size/XXL label Jun 8, 2026

LHT129 added kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) version/1.0 labels Jun 8, 2026

mergify Bot added the module/simd label Jun 8, 2026

gemini-code-assist Bot reviewed Jun 8, 2026

View reviewed changes

wxyucs requested changes Jun 8, 2026

View reviewed changes

Comment thread src/simd/kernels/butterfly.h Outdated

Comment thread src/simd/traits/simd_traits_neon.h Outdated

LHT129 force-pushed the 2026-06-01-重构-simd-层用模板消除跨isa重复实现 branch from 64b4e79 to f8fad58 Compare June 8, 2026 05:01

Copilot AI review requested due to automatic review settings June 8, 2026 07:02

LHT129 force-pushed the 2026-06-01-重构-simd-层用模板消除跨isa重复实现 branch from f8fad58 to 6f681b4 Compare June 8, 2026 07:02

Copilot started reviewing on behalf of LHT129 June 8, 2026 07:02 View session

Copilot AI reviewed Jun 8, 2026

View reviewed changes

Comment thread src/simd/kernels/int8_compute.h

Comment thread src/simd/kernels/int8_compute.h

Comment thread src/simd/traits/simd_traits_avx.h

Comment thread src/simd/kernels/rabitq_compute.h

Comment thread src/simd/traits/simd_traits_avx.h

LHT129 requested a review from wxyucs June 8, 2026 07:17

LHT129 force-pushed the 2026-06-01-重构-simd-层用模板消除跨isa重复实现 branch from 6f681b4 to 5c5540e Compare June 8, 2026 07:27

Conversation

LHT129 commented Jun 8, 2026

Summary

Test plan

Uh oh!

mergify Bot commented Jun 8, 2026

Merge Protections

🟢 Require kind label

🟢 Require version label

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wxyucs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants