Motivation
For acceptable performance in library code compiled to Wasm, we will want RyuJIT to understand certain target-specific intrinsics, as is the case on other architectures. In particular, with the simd extension, WebAssembly provides a fixed-width v128 type, which can be used as a runtime representation for Vector128. WebAssembly's vector instructions are specified here. RyuJIT should have appropriate handling in place such that operations from System.Runtime.Intrinsics.Wasm.PackedSimd and the cross-platform System.Runtime.Intrinsics.Vector128 are mapped to appropriate Wasm SIMD instructions where possible.
System.Runtime.Intrinsics.Wasm.PackedSimd -> Wasm HW Intrinsic Mapping
The PackedSimd API exists already and is supported with acceleration in Mono. It exposes low-level Wasm SIMD operations directly. We should support expanding all methods in the PackedSimd API to native Wasm HW intrinsics. The mapping for calls to these intrinsics should be 1-1 in nearly all cases, since PackedSimd is already effectively a 1-1 mapping between API surface and Wasm-supported SIMD instructions. We should be able to use table-driven codegen to expand these intrinsics.
System.Runtime.Intrinsics.Vector128 -> Wasm HW Intrinsic Mapping
Many of the operations in the System.Runtime.Intrinsics.Vector128 API can be mapped 1-1 to Wasm using a table-driven approach. For example, vector arithmetic, comparison, conversion, bitwise operations, and even certain cases of element shuffle have direct Wasm mappings.
For operations that don't map 1-1, the JIT will need somewhat more involved codegen. In many of these cases, composing multiple Wasm instructions to implement a single intrinsic may be sufficient.
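As a rough illustration of the table-driven idea, the sketch below keys a small table on an operation and a lane shape and returns the Wasm mnemonic when a single instruction exists. The names `VectorOp`, `LaneShape`, `kDirectMap`, and `lookupDirect` are hypothetical and are not RyuJIT identifiers; the real tables would presumably follow the existing hwintrinsic list conventions. The empty entry for 8-bit multiply reflects that Wasm SIMD has no `i8x16.mul`, which is the kind of case that would need composed codegen.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical names for illustration only; these are not RyuJIT identifiers.
enum class LaneShape : std::size_t { I8x16, I16x8, I32x4, I64x2, F32x4, F64x2, Count };
enum class VectorOp : std::size_t { Add, Multiply, Count };

constexpr std::size_t kShapeCount = static_cast<std::size_t>(LaneShape::Count);
constexpr std::size_t kOpCount    = static_cast<std::size_t>(VectorOp::Count);

// One row per operation, one column per lane shape.
// An empty entry marks a combination with no single Wasm instruction.
using OpRow = std::array<std::string_view, kShapeCount>;

constexpr std::array<OpRow, kOpCount> kDirectMap = {{
    // VectorOp::Add
    {"i8x16.add", "i16x8.add", "i32x4.add", "i64x2.add", "f32x4.add", "f64x2.add"},
    // VectorOp::Multiply (Wasm SIMD defines no i8x16.mul)
    {"", "i16x8.mul", "i32x4.mul", "i64x2.mul", "f32x4.mul", "f64x2.mul"},
}};

// Returns the Wasm mnemonic for a 1-1 mapping, or nullopt when the
// operation would need multi-instruction (special) handling.
constexpr std::optional<std::string_view> lookupDirect(VectorOp op, LaneShape shape)
{
    std::string_view name =
        kDirectMap[static_cast<std::size_t>(op)][static_cast<std::size_t>(shape)];
    if (name.empty())
    {
        return std::nullopt;
    }
    return name;
}
```

With this shape, a caller would try `lookupDirect` first and fall back to a hand-written expansion only when it returns no value, which keeps the common 1-1 cases out of custom code paths.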
.NET types and Wasm SIMD Lane Mappings
The lane configurations for v128 on Wasm are: i8x16, i16x8, i32x4, i64x2, f32x4, and f64x2. The configuration will generally be selected based on the instantiated generic type of the intrinsic operation, e.g., Vector128<ushort> -> i16x8.
Instruction Groupings
The SIMD instructions to be implemented can be split into a few different categories, listed below with a few examples for each:
HW_Category_SimpleSIMD - Majority of SIMD ops (arithmetic, bitwise, shifts, compares, fp math)
HW_Category_IMM - GetElement/WithElement, Shuffle (with constant indices)
HW_Category_Scalar - EqualsAll/EqualsAny, LessThanAll/LessThanAny
HW_Category_MemoryLoad - Load, LoadUnsafe, LoadAligned
HW_Category_MemoryStore - Store, StoreUnsafe, StoreAligned
HW_Category_Helper - Create, Zero, One, As
HW_Category_Special - Shuffle, Narrow, Sum, Multiply, Min/Max (ulong)
The special category is a catch-all and encompasses operations that will need some special handling (generally, operations which aren't a 1-1 mapping).
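The lane selection described under ".NET types and Wasm SIMD Lane Mappings" could be sketched as a simple switch over the element type of Vector128<T>. This is only an illustration: `ElemType` and `laneShapeFor` are hypothetical names standing in for however the JIT represents the SIMD base type, and the sketch returns the lane shape as text purely for readability.

```cpp
#include <string_view>

// Hypothetical stand-in for the JIT's notion of the SIMD base type.
enum class ElemType { SByte, Byte, Short, UShort, Int, UInt, Long, ULong, Float, Double };

// Picks the Wasm lane shape for Vector128<T> from T.
// Signedness does not affect the shape; it only matters later, when an
// operation has separate signed and unsigned instruction variants.
constexpr std::string_view laneShapeFor(ElemType type)
{
    switch (type)
    {
        case ElemType::SByte:
        case ElemType::Byte:
            return "i8x16";
        case ElemType::Short:
        case ElemType::UShort:
            return "i16x8";
        case ElemType::Int:
        case ElemType::UInt:
            return "i32x4";
        case ElemType::Long:
        case ElemType::ULong:
            return "i64x2";
        case ElemType::Float:
            return "f32x4";
        case ElemType::Double:
            return "f64x2";
    }
    return ""; // Unreachable for valid enum values; keeps compilers quiet.
}
```

For example, `laneShapeFor(ElemType::UShort)` yields `i16x8`, matching the Vector128<ushort> case mentioned earlier.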
Named Intrinsics
Certain named intrinsics, such as those in the System.Math class, should also be expanded into native Wasm instructions. A tentative list of these follows below. Some of these were already implemented during basic codegen.
System.Math
Math.Max -> max
Math.Min -> min
Math.Floor -> floor
Math.Ceiling -> ceil
Math.Abs -> abs
Math.Sqrt -> sqrt
Math.Round -> nearest
Math.Truncate -> trunc
Math.CopySign -> copysign
System.Numerics
BitOperations.TrailingZeroCount -> ctz
BitOperations.LeadingZeroCount -> clz
First Implementation Steps (More to be added)
[…] hwintrinsicswasm.cpp in the JIT) to support PackedSimd instructions
[…] hwintrinsiccodegenwasm.cpp in the JIT)
[…] HW_Category_Special category on a case-by-case basis.
Testing
We have existing tests for PackedSimd here which should provide coverage for basic functionality. These tests should be very useful for initial codegen bring-up.
We additionally have library code which leverages intrinsics, so once we have intrinsics enabled, running library tests in a Wasm context should give fairly good coverage of SIMD operations. We may wish to write Wasm-specific smoke tests to ensure that SIMD instructions are actually being emitted when SIMD is enabled. Ideally, we'll want tests that confirm we're getting the benefit of acceleration and not falling back to managed implementations.
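One possible shape for such a smoke test, assuming the test harness can obtain a textual disassembly of the generated method, is a check that at least one SIMD mnemonic appears in the output. The helper `mentionsSimdInstruction` below is hypothetical and does not correspond to existing test infrastructure; a substring scan is deliberately crude and would only catch the "no SIMD emitted at all" failure mode.

```cpp
#include <array>
#include <string_view>

// Hypothetical helper: returns true if the disassembly text contains at
// least one instruction using a Wasm SIMD prefix (v128.* or a lane shape).
bool mentionsSimdInstruction(std::string_view disassembly)
{
    constexpr std::array<std::string_view, 7> simdPrefixes = {
        "v128.", "i8x16.", "i16x8.", "i32x4.", "i64x2.", "f32x4.", "f64x2."};

    for (std::string_view prefix : simdPrefixes)
    {
        if (disassembly.find(prefix) != std::string_view::npos)
        {
            return true;
        }
    }
    return false;
}
```

A test could then assert that the check passes for a method known to use Vector128 arithmetic and fails for a purely scalar method; verifying that a specific instruction was selected would need a stricter comparison.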