Support JIT execution of aggregate functions to improve performance of aggregate operator #660

Open
taiyang-li wants to merge 98 commits into bytedance:main from taiyang-li:hash_aggr_jit_oss

taiyang-li commented Jun 23, 2026

Collaborator

What problem does this PR solve?

Issue Number: close #xxx

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Hash Aggregation JIT is a compilation optimization for the hash aggregation operator in the Bolt engine. The core idea is to generate one JIT kernel for a group of similar numeric aggregate functions in HashAgg (for example, several sum(xi) calls). The kernel replaces multiple virtual function calls, each with its own small scattered loop, with a single compact, vectorizable update loop, which reduces CPU overhead. The gain grows with the number of aggregates: see the benchmark report under Performance Impact.
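
To make the idea concrete, here is a minimal sketch of the two loop shapes, written as plain C++ rather than generated code. Everything in it is an assumption for illustration: `Aggregate`, `SumAt`, `updateInterpreted`, `updateFused`, and the flat row layout (one `int64_t` accumulator per aggregate, `W` accumulators per group) are hypothetical names and do not come from the Bolt code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using Column = std::vector<int64_t>;

// Hypothetical stand-in for the non-JIT path: one virtual call per aggregate.
struct Aggregate {
  virtual ~Aggregate() = default;
  virtual void addBatch(std::vector<int64_t>& rows, std::size_t width,
                        const std::vector<uint32_t>& groupIds,
                        const Column& input) const = 0;
};

// sum(x_k): adds each input value into accumulator `slot_` of its group's row.
struct SumAt final : Aggregate {
  explicit SumAt(std::size_t slot) : slot_(slot) {}
  void addBatch(std::vector<int64_t>& rows, std::size_t width,
                const std::vector<uint32_t>& groupIds,
                const Column& input) const override {
    for (std::size_t i = 0; i < input.size(); ++i) {
      rows[groupIds[i] * width + slot_] += input[i];
    }
  }
  std::size_t slot_;
};

// Non-JIT shape: W virtual calls, each making its own pass over the batch.
void updateInterpreted(const std::vector<std::unique_ptr<Aggregate>>& aggs,
                       std::vector<int64_t>& rows,
                       const std::vector<uint32_t>& groupIds,
                       const std::vector<Column>& inputs) {
  assert(aggs.size() == inputs.size());
  for (std::size_t k = 0; k < aggs.size(); ++k) {
    aggs[k]->addBatch(rows, aggs.size(), groupIds, inputs[k]);
  }
}

// Fused shape: a single pass over the batch. The inner loop has a
// compile-time trip count W, so the compiler can unroll or vectorize it.
template <std::size_t W>
void updateFused(std::vector<int64_t>& rows,
                 const std::vector<uint32_t>& groupIds,
                 const std::vector<Column>& inputs) {
  assert(inputs.size() == W);
  for (std::size_t i = 0; i < groupIds.size(); ++i) {
    int64_t* acc = rows.data() + groupIds[i] * W;  // this group's accumulators
    for (std::size_t k = 0; k < W; ++k) {
      acc[k] += inputs[k][i];
    }
  }
}
```

Both functions produce the same accumulator values. The fused version looks up each group row once per input row instead of once per aggregate, which is the kind of saving the description attributes to the JIT kernel.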

Performance Impact

benchmark report

$ ./_build/Release/bolt/exec/benchmarks/bolt_hashaggr_jit_benchmark                      
============================================================================
[...]c/benchmarks/HashAggrJitBenchmark.cpp     relative  time/iter   iters/s
============================================================================
width4_merge_sum_nojit                                      3.86ms    259.40
width4_merge_sum_jit                                        3.67ms    272.25
----------------------------------------------------------------------------
width4_merge_avg_nojit                                      4.70ms    212.54
width4_merge_avg_jit                                        3.98ms    251.08
----------------------------------------------------------------------------
width4_merge_min_nojit                                      3.50ms    286.02
width4_merge_min_jit                                        3.57ms    280.20
----------------------------------------------------------------------------
width4_merge_max_nojit                                      3.64ms    274.82
width4_merge_max_jit                                        3.60ms    277.98
----------------------------------------------------------------------------
width4_merge_count_nojit                                    3.56ms    281.03
width4_merge_count_jit                                      3.04ms    328.56
----------------------------------------------------------------------------
width8_merge_sum_nojit                                      6.38ms    156.76
width8_merge_sum_jit                                        5.00ms    199.99
----------------------------------------------------------------------------
width8_merge_avg_nojit                                      7.59ms    131.80
width8_merge_avg_jit                                        6.40ms    156.14
----------------------------------------------------------------------------
width8_merge_min_nojit                                      5.39ms    185.63
width8_merge_min_jit                                        4.93ms    202.81
----------------------------------------------------------------------------
width8_merge_max_nojit                                      5.50ms    181.70
width8_merge_max_jit                                        4.89ms    204.56
----------------------------------------------------------------------------
width8_merge_count_nojit                                    5.81ms    172.17
width8_merge_count_jit                                      3.71ms    269.33
----------------------------------------------------------------------------
width16_merge_sum_nojit                                    11.54ms     86.69
width16_merge_sum_jit                                       8.10ms    123.43
----------------------------------------------------------------------------
width16_merge_avg_nojit                                    14.83ms     67.45
width16_merge_avg_jit                                      11.60ms     86.18
----------------------------------------------------------------------------
width16_merge_min_nojit                                    10.15ms     98.50
width16_merge_min_jit                                       8.40ms    119.07
----------------------------------------------------------------------------
width16_merge_max_nojit                                    10.26ms     97.46
width16_merge_max_jit                                       8.10ms    123.48
----------------------------------------------------------------------------
width16_merge_count_nojit                                   9.79ms    102.12
width16_merge_count_jit                                     5.46ms    182.99
----------------------------------------------------------------------------
width32_merge_sum_nojit                                    22.10ms     45.24
width32_merge_sum_jit                                      15.79ms     63.34
----------------------------------------------------------------------------
width32_merge_avg_nojit                                    27.53ms     36.32
width32_merge_avg_jit                                      22.57ms     44.31
----------------------------------------------------------------------------
width32_merge_min_nojit                                    19.51ms     51.26
width32_merge_min_jit                                      15.89ms     62.93
----------------------------------------------------------------------------
width32_merge_max_nojit                                    20.37ms     49.09
width32_merge_max_jit                                      15.87ms     63.03
----------------------------------------------------------------------------
width32_merge_count_nojit                                  18.95ms     52.78
width32_merge_count_jit                                    10.56ms     94.69
----------------------------------------------------------------------------
width4_high_card_merge_sum_nojit                           43.30ms     23.09
width4_high_card_merge_sum_jit                             39.52ms     25.30
----------------------------------------------------------------------------
width4_high_card_merge_avg_nojit                           54.65ms     18.30
width4_high_card_merge_avg_jit                             50.12ms     19.95
----------------------------------------------------------------------------
width4_high_card_merge_min_nojit                           41.63ms     24.02
width4_high_card_merge_min_jit                             37.68ms     26.54
----------------------------------------------------------------------------
width4_high_card_merge_max_nojit                           42.74ms     23.40
width4_high_card_merge_max_jit                             38.78ms     25.78
----------------------------------------------------------------------------
width4_high_card_merge_count_nojit                         39.19ms     25.51
width4_high_card_merge_count_jit                           40.49ms     24.70
----------------------------------------------------------------------------
width8_high_card_merge_sum_nojit                           62.27ms     16.06
width8_high_card_merge_sum_jit                             51.89ms     19.27
----------------------------------------------------------------------------
width8_high_card_merge_avg_nojit                           85.79ms     11.66
width8_high_card_merge_avg_jit                             80.49ms     12.42
----------------------------------------------------------------------------
width8_high_card_merge_min_nojit                           61.46ms     16.27
width8_high_card_merge_min_jit                             53.26ms     18.78
----------------------------------------------------------------------------
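
The report above leaves the `relative` column empty, so the speedup has to be read off by hand as nojit time divided by jit time. The sketch below does that arithmetic for a few rows copied from the report. The helper names (`Row`, `speedup`, `sampleRows`, `printSpeedups`) are hypothetical and are not part of the benchmark binary.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One benchmark pair: time per iteration without and with JIT, in ms.
struct Row {
  std::string name;
  double nojitMs;
  double jitMs;
};

// Speedup > 1 means the JIT variant is faster; < 1 means it is slower.
inline double speedup(const Row& r) {
  assert(r.jitMs > 0.0);
  return r.nojitMs / r.jitMs;
}

// Times copied from the report above (a subset, not the full table).
inline std::vector<Row> sampleRows() {
  return {
      {"width4_merge_min", 3.50, 3.57},
      {"width8_merge_count", 5.81, 3.71},
      {"width32_merge_sum", 22.10, 15.79},
      {"width32_merge_count", 18.95, 10.56},
      {"width4_high_card_merge_count", 39.19, 40.49},
  };
}

inline void printSpeedups(const std::vector<Row>& rows) {
  for (const Row& r : rows) {
    std::printf("%-32s %.2fx\n", r.name.c_str(), speedup(r));
  }
}
```

On these rows the ratio is about 1.79x for count at width 32 and about 1.40x for sum at width 32, while min at width 4 and high-cardinality count at width 4 come out slightly below 1.0, meaning the JIT variant is marginally slower there.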

Release Note

Please describe the changes in this PR

Release Note:
- Fixed a crash in `substr` when input is null.
- Optimized `group by` performance by 20%.

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

liyang.127 added 26 commits June 23, 2026 16:53
  • Derive chunk function names from compact hashed slot descriptions and avoid storing derived function-name members. Simplify JIT planning debug output to reuse slot and chunk descriptions.
  • Keep only planned JIT chunks on GroupingSet and allocate transient add/extract runtimes locally so per-call state does not persist on the operator.
  • Submit each chunk's codegen to the global CPU executor instead of compiling synchronously on the first batch. Chunks start not-ready (std::atomic<bool> ready_) and the query thread falls back to the non-JIT path until compilation completes, then switches to JIT. Also removes benchmark warmup so first-batch compile latency is measured.
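
The asynchronous-compile commit describes a ready flag with a fallback path. A minimal sketch of that pattern follows, under stated assumptions: `JitChunk`, `generatedSum`, `compileAsync`, and `waitForCompile` are hypothetical names, `std::async` stands in for the global CPU executor, and a plain function pointer stands in for the compiled kernel. Only the `std::atomic<bool> ready_` member is taken from the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled kernel.
inline int64_t generatedSum(const int64_t* data, std::size_t n) {
  int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

class JitChunk {
 public:
  using Kernel = int64_t (*)(const int64_t*, std::size_t);

  // Runs codegen on a background task (assumption: std::async replaces the
  // executor). The release store publishes kernel_ to readers of ready_.
  void compileAsync(std::function<Kernel()> codegen) {
    task_ = std::async(std::launch::async,
                       [this, codegen = std::move(codegen)] {
                         kernel_ = codegen();
                         ready_.store(true, std::memory_order_release);
                       });
  }

  bool ready() const { return ready_.load(std::memory_order_acquire); }

  // Query-thread entry point: use the kernel once it is ready, otherwise
  // take the non-JIT fallback. It never blocks on compilation.
  int64_t sum(const std::vector<int64_t>& batch) const {
    if (ready()) {
      assert(kernel_ != nullptr);
      return kernel_(batch.data(), batch.size());
    }
    return fallbackSum(batch);
  }

  // Only for tests and shutdown; the query path does not call this.
  void waitForCompile() {
    if (task_.valid()) task_.wait();
  }

 private:
  static int64_t fallbackSum(const std::vector<int64_t>& batch) {
    int64_t total = 0;
    for (int64_t v : batch) total += v;
    return total;
  }

  Kernel kernel_ = nullptr;
  std::atomic<bool> ready_{false};
  std::future<void> task_;  // declared last so it is joined before the rest
};
```

The acquire load in `ready()` pairs with the release store in the compile task, so a query thread that observes `ready_ == true` also observes the written `kernel_`. Batches that arrive before compilation finishes simply take the fallback path.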
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


liyang.127 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.