<__msvc_int128.hpp>: use __umulh on ARM64/ARM64EC (#6184) #6281

Open
Adesh4477 wants to merge 1 commit into microsoft:main from Adesh4477:feat/int128-arm64-umulh

Conversation


@Adesh4477 Adesh4477 commented May 11, 2026

On ARM64, _Base128::_UMul128 falls through to the Knuth fallback because _STL_128_INTRINSICS is x64-only. ARM64 has umulh as a single instruction, so we can do this in two ops instead of ~thirty. This patch adds the obvious #elif branch for ARM64/ARM64EC, using __umulh for the high half and a regular 64-bit multiply for the low half.
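
For readers skimming the thread, here is a self-contained sketch of the shape of the change. The helper name umul128_sketch is made up; only the architecture check mirrors the actual patch (quoted in the review thread below), and the real change additionally guards with !defined(_M_CEE_PURE) and _Is_constant_evaluated():

#include <cstdint>
#if defined(_M_ARM64) || defined(_M_ARM64EC)
#include <intrin.h> // __umulh
#endif

// Returns the low 64 bits of left * right and stores the high 64 bits.
std::uint64_t umul128_sketch(
    const std::uint64_t left, const std::uint64_t right, std::uint64_t& high) {
#if defined(_M_ARM64) || defined(_M_ARM64EC)
    high = __umulh(left, right); // one umulh instruction for the high half
    return left * right;         // one mul instruction for the low half
#else
    // Knuth base-2^32 fallback: four 32x32 -> 64 partial products.
    const std::uint64_t ll  = (left & 0xFFFF'FFFF) * (right & 0xFFFF'FFFF);
    const std::uint64_t lh  = (left & 0xFFFF'FFFF) * (right >> 32);
    const std::uint64_t hl  = (left >> 32) * (right & 0xFFFF'FFFF);
    const std::uint64_t mid = (ll >> 32) + (lh & 0xFFFF'FFFF) + (hl & 0xFFFF'FFFF);
    high = (left >> 32) * (right >> 32) + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return (mid << 32) | (ll & 0xFFFF'FFFF);
#endif
}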

Tested locally with this microbenchmark on a Snapdragon X Elite (5M random uint64 pairs * 5 reps):
Knuth fallback : ~82 ms (~3.27 ns/op)
__umulh path   : ~27 ms (~1.08 ns/op)
Speedup        : ~3.03x
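
The harness itself isn't part of the PR; a plausible reconstruction of the stated methodology (5M pre-generated random pairs, 5 reps, results folded into a sink so the multiplies aren't optimized away), reusing the hypothetical umul128_sketch above:

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    constexpr std::size_t pairs = 5'000'000;
    constexpr int reps = 5;

    // Generate the operands up front so only the multiply is on the timed path.
    std::mt19937_64 rng{42};
    std::vector<std::uint64_t> lhs(pairs), rhs(pairs);
    for (std::size_t i = 0; i < pairs; ++i) {
        lhs[i] = rng();
        rhs[i] = rng();
    }

    std::uint64_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        for (std::size_t i = 0; i < pairs; ++i) {
            std::uint64_t hi;
            sink += umul128_sketch(lhs[i], rhs[i], hi) ^ hi;
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    std::printf("%.2f ns/op (sink=%llu)\n", ns / static_cast<double>(pairs * reps),
        static_cast<unsigned long long>(sink));
}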

Disassembly collapses from ~30 ops (incl. /GS cookie push) to 4 ops
(umulh / mul / str / ret).
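
For reference, the four remaining instructions plausibly look like this (a sketch; exact register choices depend on the compiler version, assuming x0 = _Left, x1 = _Right, x2 = &_High_result per the AArch64 calling convention):

umulh   x8, x0, x1      ; high 64 bits of the 128-bit product
mul     x0, x0, x1      ; low 64 bits, the return value
str     x8, [x2]        ; store the high half through the out-parameter
ret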

_STL_128_INTRINSICS is intentionally not enabled for ARM64; that macro
also gates _addcarry_u64, _subborrow_u64, __shiftleft128, __shiftright128,
and _udiv128/_div128, which have no direct single-instruction ARM64
equivalents and are out of scope for this change.

Per the issue author, x64 is intentionally not modified -- _umul128 remains
preferable there.
Copilot AI review requested due to automatic review settings May 11, 2026 10:41
@Adesh4477 Adesh4477 requested a review from a team as a code owner May 11, 2026 10:41
@github-project-automation github-project-automation Bot moved this to Initial Review in STL Code Reviews May 11, 2026

Copilot AI left a comment


Pull request overview

Adds an ARM64/ARM64EC runtime fast-path for _Base128::_UMul128() in <__msvc_int128.hpp> by computing the high 64 bits via __umulh() and the low 64 bits via a normal 64-bit multiply, avoiding the existing Knuth base-2^32 fallback in non-constant-evaluated code.

Changes:

  • Add an ARM64/ARM64EC __umulh-based implementation for the high half of the 128-bit product.
  • Keep the existing constexpr/Knuth fallback for constant evaluation and other targets (a sketch of why that path must stay follows below).
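
A minimal illustration of the gate the second bullet refers to, using std::is_constant_evaluated (the public counterpart of the _Is_constant_evaluated helper in the quoted diff): intrinsics like __umulh can't be evaluated at compile time, so the guard routes constant evaluation to the Knuth path. The probe name here is hypothetical, not the STL's code.

#include <type_traits>

// True during constant evaluation (where __umulh would be unusable),
// false at run time (where the intrinsic branch is taken).
constexpr bool takes_portable_path() {
    return std::is_constant_evaluated();
}

static_assert(takes_portable_path()); // constant evaluation -> portable path

int main() {
    return takes_portable_path() ? 1 : 0; // run time -> intrinsic path, returns 0
}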


Comment thread stl/inc/__msvc_int128.hpp
Comment on lines +161 to +165
#elif (defined(_M_ARM64) || defined(_M_ARM64EC)) && !defined(_M_CEE_PURE)
        if (!_Is_constant_evaluated()) {
            _High_result = __umulh(_Left, _Right);
            return _Left * _Right;
        }
Contributor


This appears to be the correct conclusion from the comment in stl/inc/xcharconv_ryu.h:57-62.

However, the comment itself seems to be obsolete: the intrinsics are in <intrin0.h> already.

So I'm advising against any changes here.

Contributor


Actually, after looking into this further, I think VSO-1918426 may be about something else entirely, not what Copilot led me to believe it means.

I'm still advising against any changes here.

Member


VSO-1918426 "ifdef guards for Arm64EC/CHPE are missing or not done correctly in C/C++ headers" was resolved as fixed on 2023-12-01.

Contributor

@AlexGuteniev AlexGuteniev left a comment


You'd also need to use power words to link this PR to the issue from the PR description.
Please edit the description to include "Fixes #6184", "Resolves #6184", or another accepted form.

Comment thread stl/inc/__msvc_int128.hpp
            _High_result = __umulh(_Left, _Right);
            return _Left * _Right;
        }
#endif // _STL_128_INTRINSICS / ARM64 __umulh
Contributor

@AlexGuteniev AlexGuteniev May 11, 2026


Per the latest practice for preprocessor comments, we mention just the last block, with arrows:

#endif // ^^^ (defined(_M_ARM64) || defined(_M_ARM64EC)) && !defined(_M_CEE_PURE) ^^^

@StephanTLavavej StephanTLavavej added the performance (Must go faster), ARM64 (Related to the ARM64 architecture), and ARM64EC (I can't believe it's not x64!) labels May 11, 2026