perf(bigint): speed up large BigInt x small scalar multiplication by mizchi · Pull Request #3620 · moonbitlang/core

mizchi · 2026-05-23T09:20:54Z

Summary

BigInt::mul currently dispatches into two general-purpose multiplication routines:

grade_school_mul - the O(n * m) classical algorithm: a nested loop over both operands' limbs with per-iteration carry propagation and a j < other_len || carry != 0 guard. Used when at least one operand has fewer than karatsuba_threshold (= 50) limbs.
karatsuba_mul - the recursive O(n^log2(3)) algorithm: splits both operands in half, computes three half-size products, and recombines via the Karatsuba identity. Used when both operands cross the threshold.

For asymmetric cases where one operand fits in a single radix limb, neither routine is ideal. grade_school_mul still pays the general nested-loop overhead, and karatsuba_mul is never reached because the 1-limb operand is below karatsuba_threshold.

This PR adds a mul_single_limb fast path that runs a single carry-propagating loop in O(self.len). The dispatch in Mul::mul checks both operands, so the fast path fires regardless of operand order.
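
The fast path described above can be sketched as follows. This is a minimal Python model, not the MoonBit code from the PR: it assumes 32-bit limbs stored least-significant first, and the names to_limbs, from_limbs, and mul_magnitudes are hypothetical helpers introduced only for illustration. The general-purpose fallback is replaced by Python's built-in integers as a stand-in.

```python
RADIX_BIT_LEN = 32                      # limb width quoted in the review thread
RADIX_MASK = (1 << RADIX_BIT_LEN) - 1   # radix - 1

def to_limbs(value):
    """Hypothetical helper: non-negative int -> little-endian 32-bit limbs."""
    limbs = []
    while True:
        limbs.append(value & RADIX_MASK)
        value >>= RADIX_BIT_LEN
        if value == 0:
            return limbs

def from_limbs(limbs):
    """Hypothetical helper: little-endian limbs -> int."""
    return sum(limb << (RADIX_BIT_LEN * i) for i, limb in enumerate(limbs))

def mul_single_limb(limbs, x):
    """Multiply a limb list by one limb x in a single pass (magnitude only)."""
    out = [0] * (len(limbs) + 1)
    carry = 0
    for i, limb in enumerate(limbs):
        product = limb * x + carry          # fits in 64 bits for 32-bit limbs
        out[i] = product & RADIX_MASK       # low limb of the product
        carry = product >> RADIX_BIT_LEN    # high part carries into the next limb
    out[len(limbs)] = carry
    # Assumption: normalize by dropping high zero limbs; the real code tracks len.
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out

def mul_magnitudes(a, b):
    """Check both operands so the fast path fires regardless of operand order."""
    if len(b) == 1:
        return mul_single_limb(a, b[0])
    if len(a) == 1:
        return mul_single_limb(b, a[0])
    # Stand-in for the grade-school / Karatsuba routines.
    return to_limbs(from_limbs(a) * from_limbs(b))
```

The loop touches each limb of the large operand exactly once, which is where the O(self.len) bound comes from.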

Real-world impact

This is not specific to factorial. It optimizes the common shape where a large BigInt is repeatedly multiplied by a machine-word-sized scalar.

One concrete core-library path is arbitrary-radix parsing. For non-decimal / non-power-of-two radices, the value is naturally built as:

acc = acc * base + digit

Here base <= 36, so it is a 1-limb BigInt, while acc grows with the input length. Long base-3/base-5/base-36 inputs therefore repeatedly hit the n-limb * 1-limb case optimized by this PR.
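
To make the shape of that loop concrete, here is a small Python sketch of radix parsing over 32-bit limbs. It is an illustration under assumptions, not the core library's parser: the function names are hypothetical, and the multiply and the digit addition are fused into one pass by seeding the carry with the digit, whereas the PR only specializes the multiplication.

```python
RADIX_BIT_LEN = 32
RADIX_MASK = (1 << RADIX_BIT_LEN) - 1

def mul_small_add_in_place(limbs, base, digit):
    """Compute limbs = limbs * base + digit, with base and digit below 2**32."""
    carry = digit                            # seeding the carry adds the digit
    for i in range(len(limbs)):
        product = limbs[i] * base + carry
        limbs[i] = product & RADIX_MASK
        carry = product >> RADIX_BIT_LEN
    if carry:
        limbs.append(carry)                  # the accumulator grew by one limb

def parse_radix(text, base):
    """Hypothetical parser: digit string -> little-endian 32-bit limbs."""
    limbs = [0]
    for ch in text:
        # int(ch, base) rejects characters that are not digits in this base.
        mul_small_add_in_place(limbs, base, int(ch, base))
    return limbs

def limbs_to_int(limbs):
    return sum(limb << (RADIX_BIT_LEN * i) for i, limb in enumerate(limbs))
```

Every input digit costs one pass over the current accumulator, so each step is an n-limb by 1-limb multiplication.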

The same shape appears in user code for exact combinatorics and product accumulation, for example factorials, permutations, binomial/product formulas, and loops of the form:

acc = acc * k

where acc grows beyond one limb but k remains a small integer.
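
A rough way to see how often a factorial loop lands in this case is to count limbs with Python's arbitrary-precision integers. The sketch below assumes 32-bit limbs; limb_count and factorial_shape are hypothetical names used only for this illustration.

```python
RADIX_BIT_LEN = 32

def limb_count(value):
    """Number of 32-bit limbs needed for a non-negative int (at least one)."""
    return max(1, -(-value.bit_length() // RADIX_BIT_LEN))

def factorial_shape(n):
    """Return (multiplications with a multi-limb acc, limbs in the final acc)."""
    acc = 1
    multi_limb_steps = 0
    for k in range(2, n + 1):
        if limb_count(k) != 1:
            raise ValueError("multiplier no longer fits in one limb")
        if limb_count(acc) > 1:
            multi_limb_steps += 1            # an n-limb * 1-limb multiplication
        acc *= k
    return multi_limb_steps, limb_count(acc)

steps, final_limbs = factorial_shape(800)
print(steps, final_limbs)
```

For factorial(800) almost every step has a multi-limb accumulator and a one-limb multiplier, which is the asymmetric case this PR targets.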

The profiler found this because, in the factorial-style workload, grade_school_mul dominated wasm-gc self time. The new fast path removes the general nested-loop overhead for the asymmetric case and turns it into a single carry-propagating pass over the large operand. Workloads with this scalar-growth profile therefore see whole-program speedups on wasm, wasm-gc, and native backends, while balanced n * n multiplication and the JS backend remain effectively unchanged.

Benchmarks

moonbit 0.1.20260522 + this patch, Linux x86_64, wall time (3-run median, no GuestProfiler).

factorial(800) (100 iterations), which repeatedly multiplies a growing accumulator by a 1-limb integer. Scenario: bench/cmd/bigint_ops/main.mbt.

backend	baseline	patched	delta
wasm	255.1 ms	80.2 ms	-68.6%
wasm-gc	93.5 ms	25.2 ms	-73.0%
native	48.9 ms	20.0 ms	-59.1%
js	21.2 ms	22.3 ms	noise

JS is unchanged because the JS backend transpiles BigInt to V8's native BigInt rather than going through grade_school_mul.
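
For reference, the delta column can be reproduced from the baseline and patched medians. The snippet below is only arithmetic on the numbers reported in the table; delta_percent is a hypothetical helper name.

```python
def delta_percent(baseline_ms, patched_ms):
    """Relative change of patched vs. baseline, in percent, to one decimal."""
    return round((patched_ms - baseline_ms) / baseline_ms * 100, 1)

# (baseline, patched) wall-time medians from the factorial(800) table
results = {
    "wasm": (255.1, 80.2),
    "wasm-gc": (93.5, 25.2),
    "native": (48.9, 20.0),
}

for backend, (baseline, patched) in results.items():
    print(backend, delta_percent(baseline, patched), "%")
```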

Balanced-multiplication probe (repeated squaring of a 30-digit seed x 11 iterations, both operands grow together so the existing karatsuba_mul path is exercised) confirms the n * n path is not regressed. Scenario: bench/cmd/bigint_square/main.mbt.

backend	baseline	patched	delta
wasm	257.1 ms	271.9 ms	noise
wasm-gc	84.8 ms	78.2 ms	-7.8%
native	57.0 ms	58.0 ms	noise

Test results

moon test against this branch on all four targets (full core suite):

target	result
wasm	6500 / 6500 pass
wasm-gc	6500 / 6500 pass
js	6459 / 6459 pass
native	6411 / 6411 pass

Notes on the helper

mul_single_limb returns sign: Positive regardless of self.sign. This matches the existing convention of grade_school_mul and karatsuba_mul: they return magnitude-only results, and Mul::mul overwrites sign with the combined sign of the operands at the dispatch site. The doc comment on mul_single_limb spells this out.
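
The contract can be modeled in a few lines of Python. This is a sketch under assumptions rather than the MoonBit source: the Big class, mul_magnitude_only, and the treatment of zero as Positive are all hypothetical, and a plain int stands in for the limb array.

```python
from dataclasses import dataclass

POSITIVE, NEGATIVE = "Positive", "Negative"

@dataclass
class Big:
    sign: str
    magnitude: int                    # stand-in for the limb array

def mul_magnitude_only(a, b):
    """Models the helpers: always tagged Positive, whatever the input signs."""
    return Big(POSITIVE, a.magnitude * b.magnitude)

def mul(a, b):
    """Models the dispatch site: the sign is overwritten after the helper runs."""
    result = mul_magnitude_only(a, b)
    if result.magnitude == 0:
        result.sign = POSITIVE        # assumption: zero is stored as Positive
    else:
        result.sign = POSITIVE if a.sign == b.sign else NEGATIVE
    return result
```

Calling the helper directly on a negative operand returns a Positive result, which is why it should not be mistaken for a general signed multiply.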

Add a mul_single_limb fast path that runs a single carry-propagating loop in O(self.len). The dispatch in Mul::mul checks both operands so the fast path fires regardless of operand order.

For factorial(800)-style chains where one operand is always 1 limb:

wasm    : 255.1 -> 80.2 ms (-68.6%)
wasm-gc : 93.5 -> 25.2 ms (-73.0%)
native  : 48.9 -> 20.0 ms (-59.1%)

The n*n path (Karatsuba) is untouched; bigint_square is within noise.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR optimizes BigInt multiplication for the common “n-limb × 1-limb” case (e.g., factorial-style multiplication chains) by adding a specialized fast path and helper routine.

Changes:

Adds a len == 1 fast path in Mul::mul to avoid the general grade-school loop overhead.
Introduces BigInt::mul_single_limb to efficiently multiply by a single radix limb.

mizchi · 2026-05-23T09:48:01Z

+    limbs[n] = carry.to_uint()
+    n + 1
+  }
+  { limbs, sign: Positive, len }


Documented in ada88f8. The magnitude-only contract here mirrors grade_school_mul (line 447 in this file returns sign: Positive the same way) and karatsuba_mul — Mul::mul always overwrites sign with the combined sign of the operands at the dispatch site. Renaming to mul_single_limb_abs would break the naming symmetry with those other helpers, so I kept the name and added a doc comment that spells out the contract and warns against direct callers needing a signed product.

mizchi · 2026-05-23T09:48:05Z

+  let mut carry = 0UL
+  for i in 0..<n {
+    let product = self.limbs[i].to_uint64() * xq + carry
+    limbs[i] = (product & radix_mask).to_uint()
+    carry = product >> radix_bit_len


radix_mask and radix_bit_len are already in the right types: let radix_mask : UInt64 = radix - 1 (line 89) and let radix_bit_len = 32 (line 81, Int, which is the expected shift-width type for UInt64 >> Int in MoonBit). So product & radix_mask is UInt64 & UInt64 and product >> radix_bit_len is UInt64 >> Int, with no implicit conversion. The existing grade_school_mul, karatsuba_mul, and div helpers in this same file all use these constants the same way (see e.g. lines 595, 597, 670–688), so the new code follows established convention and stays in sync if those constants are ever retyped.
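
A related check, sketched here in Python, is that the 64-bit intermediate cannot overflow with 32-bit limbs: the worst case is (2^32 - 1) * (2^32 - 1) + (2^32 - 1) = 2^64 - 2^32. The constant names mirror those quoted above, but split_product is a hypothetical helper, and Python integers are unbounded, so the 64-bit limit is asserted explicitly.

```python
RADIX_BIT_LEN = 32
RADIX = 1 << RADIX_BIT_LEN
RADIX_MASK = RADIX - 1
U64_MAX = (1 << 64) - 1

def split_product(limb, x, carry):
    """One loop step: return (low limb, outgoing carry) of limb * x + carry."""
    product = limb * x + carry
    assert product <= U64_MAX, "would overflow a UInt64"
    return product & RADIX_MASK, product >> RADIX_BIT_LEN

# Worst case: every input is the largest 32-bit value.
worst = RADIX_MASK * RADIX_MASK + RADIX_MASK
low, carry = split_product(RADIX_MASK, RADIX_MASK, RADIX_MASK)
print(worst == (1 << 64) - RADIX, low, carry == RADIX_MASK)
```

Because the outgoing carry never exceeds radix - 1, the same bound holds on the next iteration.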

Address review comment: mul_single_limb always returns sign: Positive, which is the same convention as grade_school_mul and karatsuba_mul -- Mul::mul overwrites sign with the combined sign of the operands. Spell out that contract in the doc so the helper is not mistaken for a general signed scalar multiply.

Copilot AI review requested due to automatic review settings May 23, 2026 09:20

Copilot AI reviewed May 23, 2026

mizchi added 5 commits May 23, 2026 18:47

Merge branch 'main' into pr-bigint-mul-single-limb

f57187e

Merge branch 'main' into pr-bigint-mul-single-limb

451a442

Merge branch 'main' into pr-bigint-mul-single-limb

074616a

Merge branch 'main' into pr-bigint-mul-single-limb

de08ec0

mizchi changed the title ~~perf(bigint): specialize (n-limb) x (1-limb) multiplication~~ perf(bigint): speed up large BigInt x small scalar multiplication May 27, 2026

mizchi added 2 commits May 28, 2026 01:00

Merge branch 'main' into pr-bigint-mul-single-limb

4a5fc0b

Merge branch 'main' into pr-bigint-mul-single-limb

d580abe

mizchi wants to merge 8 commits into moonbitlang:main from mizchi:pr-bigint-mul-single-limb

2 participants
