Skip to content

perf: BytesView pattern matching for hash + benchmarks#3278

Open
bobzhang wants to merge 3 commits into
mainfrom
hongbo/perf
Open

perf: BytesView pattern matching for hash + benchmarks#3278
bobzhang wants to merge 3 commits into
mainfrom
hongbo/perf

Conversation

@bobzhang

Copy link
Copy Markdown
Contributor

Summary

  • Rewrites Hasher::combine_bytes and Hash for BytesView to use [u32le(x), .. rest] pattern matching instead of manual index arithmetic with endian32
  • Removes the now-unused endian32 helper function
  • Adds hash benchmarks for Bytes and BytesView at 16B / 256B / 4KB sizes

Benchmark results (native)

The pattern matching version is currently ~3x slower than the original:

Size Before (manual index) After (pattern match)
Bytes 16B 0.18 µs 0.33 µs
Bytes 256B 1.41 µs 4.09 µs
Bytes 4KB 20.71 µs 64.20 µs
BytesView 16B 0.16 µs 0.32 µs
BytesView 256B 1.41 µs 4.09 µs
BytesView 4KB 20.59 µs 64.37 µs

This is filed as a compiler optimization opportunity — the [u32le(x), .. rest] view pattern should ideally compile down to the same code as manual indexing, but currently creates a new BytesView on each loop iteration.

Test plan

  • All 2666 builtin tests pass
  • Benchmarks run successfully on native target

🤖 Generated with Claude Code

@bobzhang bobzhang requested a review from Yu-zh March 13, 2026 08:52

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

@Yu-zh

Yu-zh commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator

Updated the PR branch to commit 1f0ea7c6 with the u64le version of the BytesView-pattern hasher loop.

Benchmarked with:

moon version
# moon 0.1.20260423 (48eeec5 2026-04-23)
moon bench --target native -p moonbitlang/core/bench

Results on native:

case baseline manual u32le pattern u64le pattern
Bytes 16B 144.35 ns 131.91 ns 133.52 ns
Bytes 256B 1.25 us 1.08 us 951.93 ns
Bytes 4KB 18.22 us 15.50 us 13.72 us
BytesView 16B 135.45 ns 120.46 ns 117.89 ns
BytesView 256B 1.24 us 1.07 us 957.91 ns
BytesView 4KB 18.17 us 15.52 us 13.82 us

Also checked semantics with:

moon test builtin/hasher_test.mbt --target native
# Total tests: 18, passed: 18, failed: 0.

So on this toolchain the u64le variant improves the larger inputs further than the u32le pattern version while preserving the existing hash test results.

@Yu-zh Yu-zh marked this pull request as draft April 29, 2026 03:55
@Yu-zh

Yu-zh commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

I noticed regression on Js backend so I just made this draft.

@Yu-zh

Yu-zh commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

The problem with js/wasm-gc backend is caused by the unnecessary view wrapper compared with the original implementation. Should we make the optimization platform specific? @bobzhang

@Yu-zh

Yu-zh commented May 8, 2026

Copy link
Copy Markdown
Collaborator

Updated the PR branch to commit bda8c1f6 with the u32le version of the Bytes/BytesView hasher loop, rebased onto origin/main (22dbc697).

Benchmarked with:

moon version
# moon 0.1.20260506 (b2155af 2026-05-06)
moonc -v
# v0.9.1+a6975c97d-dev (2026-05-07)
moon bench --target <backend> -p moonbitlang/core/bench

Results, comparing baseline/manual, u32le, and u64le:

backend case baseline u32le u64le
native Bytes 16B 17.20 ns 17.07 ns 16.74 ns
native Bytes 256B 125.95 ns 116.33 ns 129.58 ns
native Bytes 4KB 2.23 us 2.06 us 2.34 us
native BytesView 16B 17.10 ns 16.94 ns 17.08 ns
native BytesView 256B 126.72 ns 115.55 ns 134.92 ns
native BytesView 4KB 2.21 us 2.05 us 2.35 us
wasm-gc Bytes 16B 19.91 ns 20.02 ns 19.54 ns
wasm-gc Bytes 256B 157.60 ns 159.22 ns 155.70 ns
wasm-gc Bytes 4KB 2.24 us 2.24 us 2.52 us
wasm-gc BytesView 16B 18.44 ns 18.50 ns 18.04 ns
wasm-gc BytesView 256B 153.89 ns 154.80 ns 184.11 ns
wasm-gc BytesView 4KB 2.23 us 2.24 us 2.51 us
wasm Bytes 16B 53.65 ns 49.88 ns 49.37 ns
wasm Bytes 256B 339.65 ns 315.03 ns 244.71 ns
wasm Bytes 4KB 4.98 us 4.53 us 3.04 us
wasm BytesView 16B 50.42 ns 43.99 ns 45.36 ns
wasm BytesView 256B 336.39 ns 235.41 ns 216.29 ns
wasm BytesView 4KB 3.48 us 2.92 us 3.02 us
js Bytes 16B 19.14 ns 19.08 ns 285.45 ns
js Bytes 256B 137.35 ns 135.39 ns 4.19 us
js Bytes 4KB 2.24 us 2.24 us 68.05 us
js BytesView 16B 12.99 ns 13.52 ns 274.72 ns
js BytesView 256B 128.58 ns 127.24 ns 4.19 us
js BytesView 4KB 2.24 us 2.15 us 67.57 us

Takeaway: u32le is the better cross-backend choice. It improves native medium/large, is neutral on wasm-gc and js, and improves wasm. u64le helps wasm, but regresses native medium/large and is catastrophic on js due to the generated BigInt path.

Checked semantics/build after rebasing onto origin/main:

moon test builtin/hasher_test.mbt --target native
# Total tests: 18, passed: 18, failed: 0.
moon check --target all
# passed, existing warnings only

@Yu-zh Yu-zh marked this pull request as ready for review May 8, 2026 09:55
Copilot AI review requested due to automatic review settings May 8, 2026 10:00

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

@coveralls

Copy link
Copy Markdown
Collaborator

Coverage Report for CI Build 4242

Coverage decreased (-0.002%) to 94.613%

Details

  • Coverage decreased (-0.002%) from the base build.
  • Patch coverage: 14 of 14 lines across 1 file are fully covered (100%).
  • No coverage regressions found.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

No coverage regressions found.


Coverage Stats

Coverage Status
Relevant Lines: 15668
Covered Lines: 14824
Line Coverage: 94.61%
Coverage Strength: 219015.07 hits per line

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants