WIP: feat(probe-subop): normalized-key probe AArch64 SVE on HashAgg#566

Open
LeiRui wants to merge 1 commit into bytedance:main from LeiRui:pr-probe

LeiRui commented May 17, 2026

Copy link
Copy Markdown

What problem does this PR solve?

HashAgg with normalized-key hash tables can spend significant time in groupNormalizedKeyProbe on AArch64. This PR adds an opt-in, SVE-vectorized probe/insert path (17-byte slot layout plus a tag region), wires it through the aggregation hash-table factory, and adds scalar-parity tests and link stubs for non-aarch64 builds. Join hash tables (createForJoin / joinProbe) are unchanged and remain on the native Bolt path.

Issue Number: N/A

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

What

  • Add SVE normalized-key probe for HashAgg: groupNormalizedKeyProbeSVE, insertForGroupBySve, 17B slot+tag table layout; scalar path uses BaseHashTable::NormalizedKeySlot (16B key+pointer, in HashTable.h).
  • Split the SVE code into its own translation unit, HashTableNormalizedKeyProbeSve.cpp (aarch64 only, compiled with -march=armv8-a+sve); on other platforms, HashTableNormalizedKeyProbeStubs.cpp provides stub symbols so the build still links.
  • HashTableSveRuntime.h: boltHashAggSveNormalizedKeyProbeEnabledFromEnv() (env parser) and linuxAarch64RuntimeHasSve() (checks HWCAP_SVE in AT_HWCAP).
  • Extension hooks (defaulting to stock behavior): HashAggregation::createGroupingSetForHashAggregation and GroupingSet::createAggregationHashTable are override points for a custom BaseHashTable or an env-driven table type, without copying HashAggregation.
  • NormalizedKeyMode: kNativeBolt (default) / kScalar (16B slots) / kSve (17B+tag); Join build forces kNativeBolt via createForJoin(..., sve=false) and isJoinBuild_.
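To make the 16B vs 17B+tag distinction concrete, here is a minimal sketch. It assumes the 17 bytes per entry are the same 16-byte key+pointer slot plus a 1-byte tag kept in a separate tag region; the PR text does not spell out the exact layout or tag scheme. TaggedTable, tagOf, probe, and insert are hypothetical names, and the probe is a plain scalar linear-probing loop, i.e. the per-entry tag check that a vector kernel could evaluate for many entries at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same shape as the PR's BaseHashTable::NormalizedKeySlot: a 64-bit
// normalized key plus a row pointer (16 bytes with 64-bit pointers).
struct NormalizedKeySlot {
  uint64_t key;
  char* row;
};
static_assert(sizeof(NormalizedKeySlot) == 16, "expects 64-bit pointers");

// Hypothetical kSve-style table: the 16-byte slots plus a parallel 1-byte
// tag region, i.e. 16 + 1 = 17 bytes per entry. Tag 0 marks an empty entry.
class TaggedTable {
 public:
  explicit TaggedTable(std::size_t capacity)
      : slots_(capacity, NormalizedKeySlot{0, nullptr}), tags_(capacity, 0) {
    // Power-of-two capacity lets `hash & mask` replace a modulo.
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
  }

  // Top 7 hash bits with the high bit forced on, so a live tag is never 0.
  static uint8_t tagOf(uint64_t hash) {
    return static_cast<uint8_t>((hash >> 57) | 0x80);
  }

  // Linear probing: compare the 1-byte tag first and the full key only on a
  // tag match. Assumes the table is never completely full.
  char* probe(uint64_t key, uint64_t hash) const {
    const uint8_t tag = tagOf(hash);
    for (std::size_t i = hash & mask();; i = (i + 1) & mask()) {
      if (tags_[i] == 0) return nullptr;  // empty entry ends the chain
      if (tags_[i] == tag && slots_[i].key == key) return slots_[i].row;
    }
  }

  // Inserts a key assumed to be absent into a table assumed to have room.
  void insert(uint64_t key, uint64_t hash, char* row) {
    std::size_t i = hash & mask();
    while (tags_[i] != 0) i = (i + 1) & mask();
    tags_[i] = tagOf(hash);
    slots_[i] = NormalizedKeySlot{key, row};
  }

 private:
  std::size_t mask() const { return slots_.size() - 1; }

  std::vector<NormalizedKeySlot> slots_;
  std::vector<uint8_t> tags_;
};
```

Keeping tags in their own contiguous byte array (rather than inside each slot) is what lets one vector load cover many entries' tags; the full 16-byte slot is touched only on a tag hit.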

Env opt-in (BOLT_HASH_AGG_SVE_NORMALIZED_KEY_PROBE)

| Env value | Aggregation hash table |
|---|---|
| unset, empty, or unknown | off: kNativeBolt (same as upstream default) |
| 1 | enable the SVE path: kSve on Linux aarch64 with runtime SVE, otherwise kScalar |
| true, yes, or on (ASCII, case-insensitive) | same as 1 |
| 0 | off |
| false, no, or off (ASCII, case-insensitive) | same as 0 |

Executor example (Gluten / Spark): spark.executorEnv.BOLT_HASH_AGG_SVE_NORMALIZED_KEY_PROBE=1.
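The env table above can be read as a small parser. The following is a minimal sketch of that mapping, not the PR's boltHashAggSveNormalizedKeyProbeEnabledFromEnv() itself; parseSveProbeEnvValue and sveProbeRequestedFromEnv are hypothetical names, and the exact treatment of unusual inputs (for example surrounding whitespace) in the real parser is an assumption here.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for boltHashAggSveNormalizedKeyProbeEnabledFromEnv():
// turns the raw env value into on/off following the env table.
bool parseSveProbeEnvValue(const char* raw) {
  if (raw == nullptr) {
    return false;  // unset -> off
  }
  std::string v(raw);
  // ASCII-only lowercasing (no locale), so "TRUE", "On" and "yes" all match.
  for (char& c : v) {
    if (c >= 'A' && c <= 'Z') {
      c = static_cast<char>(c - 'A' + 'a');
    }
  }
  // "0", "false", "no", "off", "" and any unknown value fall through to off.
  return v == "1" || v == "true" || v == "yes" || v == "on";
}

bool sveProbeRequestedFromEnv() {
  return parseSveProbeEnvValue(
      std::getenv("BOLT_HASH_AGG_SVE_NORMALIZED_KEY_PROBE"));
}
```

Treating anything unrecognized as off keeps a typo in the executor config from silently switching table layouts.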

Registration / runtime chain (HashAgg only)

createForAggregation does not call updateNormalizedKeyModeForAggregation() directly; it only sets sveNormalizedKeyProbeRequested_ in the ctor. Mode is chosen later when the table switches to HashMode::kNormalizedKey via setHashMode (from checkSize / analyze).

GroupingSet::createAggregationHashTable()  [override hook]
  └─ sveFlag = boltHashAggSveNormalizedKeyProbeEnabledFromEnv()
        └─ HashTable::createForAggregation(..., sveFlag)
              └─ ctor: sveNormalizedKeyProbeRequested_ = sveFlag
                  (normalizedKeyMode_ stays kNativeBolt until setHashMode)

addInput → checkSize / analyze → setHashMode(kNormalizedKey)   [when applicable]
  └─ updateNormalizedKeyModeForAggregation()
        ├─ isJoinBuild_ || !sveNormalizedKeyProbeRequested_ → kNativeBolt
        ├─ linuxAarch64RuntimeHasSve() → kSve
        └─ else → kScalar

groupProbe (hashMode_ == kNormalizedKey)
  └─ groupNormalizedKeyProbe()
        ├─ kScalar  → groupNormalizedKeyProbeScalar
        ├─ kSve     → groupNormalizedKeyProbeSVE  (SVE TU)
        └─ kNativeBolt → ProbeState / fullProbe (existing)

Join (out of scope for SVE)

HashTable::createForJoin(..., sveNormalizedKeyProbeEnabled=false, isJoinBuild=true)
  → normalizedKeyMode_ = kNativeBolt always
joinProbe → joinNormalizedKeyProbe   (never groupNormalizedKeyProbeSVE)
Flowchart: HashAgg probe mode (mermaid)
flowchart TD
  A[groupProbe + kNormalizedKey] --> B[groupNormalizedKeyProbe]
  B --> C{normalizedKeyMode_}
  C -->|kScalar| D[groupNormalizedKeyProbeScalar]
  C -->|kSve| E[groupNormalizedKeyProbeSVE]
  C -->|kNativeBolt| F[ProbeState / fullProbe]
  G[createForJoin / joinProbe] --> H[kNativeBolt + joinNormalizedKeyProbe]
File roles (HashTable + GroupingSet) (mermaid)
flowchart TB
  subgraph hooks["Extension hooks"]
    HA["HashAggregation.h/cpp\ncreateGroupingSetForHashAggregation"]
    GS["GroupingSet.h/cpp\ncreateAggregationHashTable"]
  end

  subgraph hashtable["HashTable"]
    HT["HashTable.cpp / .h\ndispatch, allocateTables, insert"]
    RT["HashTableSveRuntime.h\nenv + HWCAP_SVE"]
    SVE["HashTableNormalizedKeyProbeSve.cpp\nSVE probe + insertForGroupBySve"]
    STUB["HashTableNormalizedKeyProbeStubs.cpp\nnon-aarch64"]
  end

  GS -->|env| RT
  GS --> HT
  HA --> GS
  HT --> SVE
  HT --> STUB
  classDef prNew fill:#d4edda,stroke:#28a745,stroke-width:2px,color:#000
  classDef prModified fill:#fff3cd,stroke:#ffc107,stroke-width:2px,color:#000
  class SVE,STUB,RT prNew
  class HT,HA,GS prModified

Legend: green = new; yellow = modified.

| Step | File / symbol | Role |
|---|---|---|
| 0 | GroupingSet::createAggregationHashTable | Reads env; calls createForAggregation(..., sveFlag). Override point for plugin/custom tables. |
| 1 | HashTableSveRuntime.h | Env parser; linuxAarch64RuntimeHasSve() on Linux aarch64. |
| 2 | HashTable ctor | Stores sveNormalizedKeyProbeRequested_ from createForAggregation(..., sveFlag). |
| 2b | setHashMode(kNormalizedKey) → updateNormalizedKeyModeForAggregation | Sets kSve / kScalar / kNativeBolt; Join build → always kNativeBolt. |
| 3 | allocateTables / clear / insertForGroupBy | 16B vs 17B+tag vs F14 layout; SVE insert via insertForGroupBySve when kSve. |
| 4 | groupNormalizedKeyProbe | Dispatches scalar / SVE / native probe. |
| 5 | HashTableNormalizedKeyProbeSve.cpp | SVE intrinsics; not linked on non-aarch64. |
| 6 | HashTableNormalizedKeyProbeStubs.cpp | Non-aarch64: SVE probe → scalar fallback; insert → BOLT_FAIL if called. |
| 7 | createForJoin | Always sveNormalizedKeyProbeEnabled=false; joinProbe unchanged. |
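Steps 1, 5 and 6 combine a runtime CPU check with a per-platform translation-unit split. The sketch below shows that pattern under stated assumptions: runtimeHasSve uses the standard Linux getauxval(AT_HWCAP) / HWCAP_SVE check the PR names, while probeScalar, probeSve and insertSve are hypothetical simplified signatures (a flat key vector instead of the real table), and std::logic_error stands in for Bolt's BOLT_FAIL.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP; aarch64 glibc also defines HWCAP_SVE
#endif

// Sketch of a linuxAarch64RuntimeHasSve()-style check: read the kernel's
// hardware-capability bits from the ELF auxiliary vector.
bool runtimeHasSve() {
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
  return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
  return false;  // not Linux aarch64 (or headers too old to know HWCAP_SVE)
#endif
}

// Hypothetical scalar probe used as the fallback target: index of key or -1.
int probeScalar(const std::vector<uint64_t>& keys, uint64_t key) {
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (keys[i] == key) {
      return static_cast<int>(i);
    }
  }
  return -1;
}

#if defined(__aarch64__)
// On aarch64 these would come from the SVE translation unit
// (built with -march=armv8-a+sve); that code is not reproduced here.
int probeSve(const std::vector<uint64_t>& keys, uint64_t key);
void insertSve(std::vector<uint64_t>& keys, uint64_t key);
#else
// Stub translation unit: the symbols exist so the dispatcher links.
int probeSve(const std::vector<uint64_t>& keys, uint64_t key) {
  return probeScalar(keys, key);  // SVE probe -> scalar fallback
}

void insertSve(std::vector<uint64_t>& /*keys*/, uint64_t /*key*/) {
  // Stand-in for BOLT_FAIL: kSve is never selected on this platform, so
  // reaching this insert indicates a bug.
  throw std::logic_error("SVE insert called on a non-aarch64 build");
}
#endif
```

The asymmetry mirrors step 6: a stray probe call can still return a correct answer through the scalar path, while a stray insert would write a layout the table was never allocated for, so failing loudly is the safer default.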

Tests

  • HashTableTest.boltHashAggSveNormalizedKeyProbeEnv — env parser (setenv / unset / 0 / 1)
  • HashTableTest.normalizedKeySveMatchesScalar (guarded by #if __aarch64__ && __linux__): a helper forces kSve vs kScalar and compares probe hits; GTEST_SKIP if the CPU has no runtime SVE
make unittest_release TEST=bbolt_exec_test \
  TEST_ARGS='--gtest_filter=HashTableTest.boltHashAggSveNormalizedKeyProbeEnv:HashTableTest.normalizedKeySveMatchesScalar*'

Performance Impact

  • No Impact
  • Positive Impact: I have run benchmarks.
  • Negative Impact

Release Note

Release Note:
- HashAgg normalized-key probe: optional AArch64 SVE path (17B slot + tag layout).
- Opt-in: set BOLT_HASH_AGG_SVE_NORMALIZED_KEY_PROBE=1 (or true/yes/on) on executors; default off (native Bolt).
- Join hash tables (createForJoin / joinProbe) unchanged.
- Extension hooks on HashAggregation / GroupingSet for custom aggregation hash tables.

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No
  • Yes (Description: ...)

LeiRui changed the title from "WIP: feat(probe-subop): HashAgg normalized-key probe AArch64 SVE on HashAgg" to "WIP: feat(probe-subop): normalized-key probe AArch64 SVE on HashAgg" on May 18, 2026.
Add env-gated BOLT_HASH_AGG_SVE_NORMALIZED_KEY_PROBE (default off) with 17B normalized-key table layout, SVE probe/insert TU on aarch64, non-aarch64 link stubs, HashAggregation/GroupingSet extension hooks, and HashTableTest parity coverage. Join hash tables (createForJoin / joinProbe) remain on native Bolt layout and probe; SVE applies only to HashAgg groupProbe.

Co-authored-by: Old-Li883 <lichenhao9@huawei.com>