perf(string): drop per-call Iter alloc in to_lower by mizchi · Pull Request #3635 · moonbitlang/core

mizchi · 2026-05-26T13:39:22Z

Summary

String::to_lower and StringView::to_lower both fall into a
for c in self.view(start_offset=idx) loop after the all-lowercase
fast-path guard. On the native target, that for-loop desugars into a
heap-allocated Iter (a closure + size_hint pair) — one per call.
Combined with the StringBuilder and result String they already
allocate, every uppercase-containing string passed through to_lower
costs one extra heap object.

Replace the iterator loop with a plain UTF-16 code-unit loop over the
view via unsafe_get. The conversion is unchanged (still ASCII-only
per the existing TODO); any non-ASCII unit — including unpaired
surrogate halves — is written through bit-identically because
write_char of a code unit ≤ 0xFFFF emits exactly one UTF-16 unit.

Numbers

Measured by running mizchi/pprof-mbt's memprofile-native against
moonbitlang/async's http_server_benchmark
under wrk -t 8 -c 128 -d 8s (~108 k requests served, ~4 headers each
→ ~430 k to_lower calls):

metric	before	after	Δ
total allocations	14 381 000	12 607 200	−12.3 % (−1.77 M)
total bytes	473.67 MB	459.68 MB	−3.0 % (−14 MB)
`to_lower` attribution	8.44 MB / 466 100 allocs	gone from top sites	—
`StringView::iter2` (mostly the to_lower loop)	8.04 MB / 469 300	4.0 % / 434 500	partial

StringBuilder::to_string and the StringBuilder itself are still
allocated; only the per-iter overhead goes away.

Why this is safe

The original iterator decoded the StringView into Char values
(BMP and surrogate-pair-merged), then write_char would re-emit
them — surrogate pairs as two code units, BMP chars as one.

The new loop iterates UTF-16 code units directly and writes each
through write_char of the same code unit. Since every code unit
(including surrogate halves 0xD800..0xDFFF) is ≤ 0xFFFF, the
<= 0xFFFFU arm of write_char runs and writes the unit verbatim.
Output is byte-identical to the original on all inputs.

ASCII uppercase detection is the same 0x41..=0x5A range, just
expressed as raw code-unit comparison instead of Char.is_ascii_uppercase().

Test plan

moon test --target native -p builtin — 2806 / 2806 pass
moon test --target wasm-gc -p builtin — 2848 / 2848 pass
moon fmt --check clean

Same workflow as #3632 / #3633 / #3634 (native alloc profile a real
workload → attack the top site), this time exercising a live HTTP
server instead of a parser bench.

`String::to_lower` and `StringView::to_lower` both fall into a `for c in self.view(start_offset=idx)` loop after the all-lowercase fast-path guard. On the native target, that for-loop desugars into a heap-allocated `Iter` (a closure + size_hint pair) — one per call. Combined with the `StringBuilder` and result `String` they already allocate, every uppercase-containing string passed through `to_lower` costs one extra heap object. Replace the iterator loop with a plain UTF-16 code-unit loop over the view via `unsafe_get`. The conversion is unchanged (still ASCII-only per the existing `TODO`); any non-ASCII unit, including unpaired surrogate halves, is written through bit-identically because `write_char` of a code unit ≤ 0xFFFF emits exactly one UTF-16 unit. ## Numbers Measured by running mizchi/pprof-mbt's `memprofile-native` against moonbitlang/async's `http_server_benchmark` under `wrk -t 8 -c 128 -d 8s` (~108k requests served, 4 headers each → ~430k `to_lower` calls): * total allocations: 14 381 000 → 12 607 200 (−12.3 %, 1.77 M fewer) * total bytes: 473.67 MB → 459.68 MB (−3.0 %, −14 MB) * `to_lower` site: 8.44 MB / 466 100 allocs → gone from top sites * `StringView::iter2` site (mostly the to_lower loop): 8.04 MB → 4.0 % `StringBuilder::to_string` and the `StringBuilder` itself are still allocated; only the per-iter overhead goes away. Tests: `moon test --target native -p builtin` (2806/2806) and `moon test --target wasm-gc -p builtin` (2848/2848). `moon fmt --check` clean.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Optimizes to_lower for StringView and String by replacing iterator-based traversal with a manual loop intended to avoid a native-target heap allocation, while keeping the ASCII-only lowercase mapping behavior.

Changes:

Replaced for c in self.view(start_offset=idx) with an index-based loop using unsafe_get.
Implemented ASCII uppercase detection via integer range checks (0x41..0x5A) and conversion via + 32.
Added comments explaining the performance motivation and UTF-16/code-unit assumptions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mizchi · 2026-05-27T08:27:19Z

+  let len = self.length()
+  for i = idx; i < len; i = i + 1 {
+    let cui = self.unsafe_get(i).to_int()
+    if cui >= 0x41 && cui <= 0x5A {
+      // 'A' is 65, 'a' is 97; difference is 32.
+      buf.write_char((cui + 32).unsafe_to_char())
    } else {
-      buf.write_char(c)
+      buf.write_char(cui.unsafe_to_char())
    }
  }


The current version avoids the surrogate-half issue by not writing non-ASCII code units through write_char(unsafe_to_char(...)).

write_ascii_lowercase_code_unit is only called after is_ascii_uppercase_code_unit, so the unsafe conversion there is limited to ASCII A..Z -> a..z. All non-ASCII ranges, including valid surrogate pairs and unpaired surrogate halves, are copied with write_substring, preserving the original UTF-16 units verbatim.

I added coverage for both valid non-BMP surrogate-pair paths ("🤣A", "A🤣B", sliced StringView) and unpaired surrogate passthrough in 522ffda. Verified with:

moon test --target native -p builtin — 2806 / 2806 pass

moon test --target wasm-gc -p builtin — 2848 / 2848 pass

+    let cui = self.unsafe_get(i).to_int()
+    if cui >= 0x41 && cui <= 0x5A {
+      // 'A' is 65, 'a' is 97; difference is 32.
+      buf.write_char((cui + 32).unsafe_to_char())
    } else {
-      buf.write_char(c)
+      buf.write_char(cui.unsafe_to_char())


bobzhang · 2026-05-27T14:41:17Z

Codex review pass: I reviewed the current head (522ffda) for correctness and did not find a blocker.

The refactor looks safe to me because it consistently uses UTF-16 code-unit offsets after the scan. That fixes the previous char-index/code-unit-index mismatch around non-BMP characters before uppercase ASCII. ASCII uppercase detection remains equivalent (0x41..0x5A), and non-ASCII spans are copied as substrings so valid surrogate pairs are preserved.

I also checked malformed-surrogate edge behavior around the added tests. The high-surrogate passthrough case is covered and preserved; a leading low-surrogate plus uppercase still aborts through the same substring-boundary class of path as the old implementation, so I do not see a compatibility regression there.

Validation run locally:

moon test --target all -p builtin
moon check --target all -p builtin
moon fmt --check
moon info
git diff --check

Note: a formal approval review could not be submitted because this account already has a pending review on the PR.

bobzhang · 2026-05-29T03:41:12Z

note we are working on to make string unicode iteration fast (may come next week), so put on hold to make a comparison later

Copilot AI review requested due to automatic review settings May 26, 2026 13:39

Copilot AI reviewed May 26, 2026

View reviewed changes

mizchi added 2 commits May 27, 2026 14:38

fix(string): preserve non-BMP chars in to_lower

46f9ae9

test(string): cover to_lower unpaired surrogate passthrough

522ffda

Merge branch 'main' into string/to-lower-no-iter-alloc

0dbc5a7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(string): drop per-call Iter alloc in to_lower#3635

perf(string): drop per-call Iter alloc in to_lower#3635
mizchi wants to merge 4 commits into
moonbitlang:mainfrom
mizchi:string/to-lower-no-iter-alloc

mizchi commented May 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

mizchi May 27, 2026

Uh oh!

bobzhang commented May 27, 2026

Uh oh!

bobzhang commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

mizchi commented May 26, 2026

Summary

Numbers

Why this is safe

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

mizchi May 27, 2026

Choose a reason for hiding this comment

Uh oh!

bobzhang commented May 27, 2026

Uh oh!

bobzhang commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants