[Wasm RyuJIT] SIMD + Intrinsics #127665

## Motivation
For acceptable performance in library code compiled to Wasm, we will want RyuJIT to understand certain target-specific intrinsics, as it already does on other architectures. In particular, with the fixed-width SIMD extension, WebAssembly provides a 128-bit `v128` value type, which can serve as the runtime representation of `Vector128<T>`. WebAssembly's vector instructions are specified [here](https://webassembly.github.io/spec/core/syntax/instructions.html#vector-instructions). RyuJIT should map operations from `System.Runtime.Intrinsics.Wasm.PackedSimd` and from the cross-platform `System.Runtime.Intrinsics.Vector128` API to the appropriate Wasm SIMD instructions where possible.

## System.Runtime.Intrinsics.Wasm.PackedSimd -> Wasm HW Intrinsic Mapping

The `PackedSimd` API already exists and is accelerated in Mono. It exposes low-level Wasm SIMD operations directly. We should support expanding every method in the `PackedSimd` API to native Wasm HW intrinsics. In nearly all cases the mapping should be 1-1, since the `PackedSimd` API surface is already effectively a 1-1 mapping onto Wasm-supported SIMD instructions. We should be able to leverage table-driven codegen to expand these intrinsics.
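As a rough illustration of what "table-driven" could mean here, the sketch below keeps one row per intrinsic and one column per lane type, so expanding a `PackedSimd` call becomes a single lookup. The enum values (`Intrinsic::PackedSimd_Add` and friends), the table layout, and `lookupWasmIns` are hypothetical and are not the JIT's actual data structures; only the Wasm mnemonics come from the spec. Signed and unsigned lanes share a column because `add`, `sub`, and `mul` do not depend on signedness.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical intrinsic IDs (illustrative only; not the JIT's NamedIntrinsic values).
enum class Intrinsic { PackedSimd_Add, PackedSimd_Subtract, PackedSimd_Multiply, Count };

// Hypothetical lane types; signed and unsigned variants share one column.
enum class ElemType { Byte, Short, Int, Long, Float, Double, Count };

// One row per intrinsic, one column per lane type.
using Row = std::array<const char*, static_cast<std::size_t>(ElemType::Count)>;

// nullptr marks a combination with no direct Wasm instruction
// (Wasm SIMD has no i8x16.mul), which would need special handling instead.
constexpr std::array<Row, static_cast<std::size_t>(Intrinsic::Count)> kTable = {{
    // Byte        Short        Int          Long         Float        Double
    {{"i8x16.add", "i16x8.add", "i32x4.add", "i64x2.add", "f32x4.add", "f64x2.add"}},
    {{"i8x16.sub", "i16x8.sub", "i32x4.sub", "i64x2.sub", "f32x4.sub", "f64x2.sub"}},
    {{nullptr,     "i16x8.mul", "i32x4.mul", "i64x2.mul", "f32x4.mul", "f64x2.mul"}},
}};

// Returns the Wasm mnemonic for (intrinsic, lane type), or an empty string
// if that pair is not covered by the table.
std::string lookupWasmIns(Intrinsic id, ElemType type) {
    const char* ins = kTable[static_cast<std::size_t>(id)][static_cast<std::size_t>(type)];
    return ins ? std::string(ins) : std::string();
}
```

A real table would presumably also carry per-entry flags (category, immediate operands, and so on); this sketch only shows the instruction lookup.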

## System.Runtime.Intrinsics.Vector128 -> Wasm HW Intrinsic Mapping
Many of the operations in the `System.Runtime.Intrinsics.Vector128` API can be mapped 1-1 to Wasm using a table-driven approach. For example, vector arithmetic, comparison, conversion, and bitwise operations, as well as certain element shuffles, have direct Wasm equivalents.

Operations that don't map 1-1 will need more involved codegen in the JIT; for many of these, composing a short sequence of Wasm instructions to implement the intrinsic may be sufficient.
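As an example of composing multiple instructions, a horizontal sum such as `Vector128.Sum` on `Vector128<int>` has no single Wasm instruction. One possible expansion uses two shuffles, two `i32x4.add`s, and an `i32x4.extract_lane`. The C++ below is a host-side model of that sequence, not JIT code: `I32x4`, `shuffle32`, and the other helpers are hypothetical, and `shuffle32` models `i8x16.shuffle` at 32-bit-lane granularity for readability.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host-side model of a v128 viewed as i32x4 (unsigned so that adds wrap, as in Wasm).
using I32x4 = std::array<uint32_t, 4>;

// Models i32x4.add: lane-wise wrapping addition.
I32x4 i32x4_add(const I32x4& a, const I32x4& b) {
    I32x4 r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] + b[i];
    return r;
}

// Models an i8x16.shuffle whose byte indices move whole 32-bit lanes:
// idx[i] selects which source lane ends up in lane i.
I32x4 shuffle32(const I32x4& v, const std::array<int, 4>& idx) {
    I32x4 r{};
    for (int i = 0; i < 4; ++i) r[i] = v[idx[i]];
    return r;
}

// Models i32x4.extract_lane with a constant lane index.
uint32_t extract_lane(const I32x4& v, int lane) { return v[lane]; }

// Horizontal sum: step 1 adds the two 64-bit halves together, step 2 adds
// the remaining adjacent pair, leaving the total in lane 0.
uint32_t sumI32x4(const I32x4& v) {
    I32x4 t = i32x4_add(v, shuffle32(v, {2, 3, 0, 1}));
    t = i32x4_add(t, shuffle32(t, {1, 0, 3, 2}));
    return extract_lane(t, 0);
}
```

Each helper stands in for exactly one Wasm instruction, so the body of `sumI32x4` reads as the instruction sequence the JIT might emit.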

### .NET types and Wasm SIMD Lane Mappings

The lane configurations for `v128` on Wasm are `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, and `f64x2`. The configuration will generally be selected based on the instantiated generic type of the intrinsic operation, e.g., `Vector128<ushort>` -> `i16x8`.
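A minimal sketch of that selection, assuming the shape depends only on the element size and on whether `T` is floating-point. `ElemType`, `laneBytes`, `laneCount`, and `laneShape` are hypothetical helpers, and `nint`/`nuint` are left out for brevity.

```cpp
#include <cassert>
#include <string>

// .NET element types that can instantiate Vector128<T> (nint/nuint omitted).
enum class ElemType { Byte, SByte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Single, Double };

// Size in bytes of one lane for each element type.
constexpr unsigned laneBytes(ElemType t) {
    switch (t) {
        case ElemType::Byte: case ElemType::SByte: return 1;
        case ElemType::Int16: case ElemType::UInt16: return 2;
        case ElemType::Int32: case ElemType::UInt32: case ElemType::Single: return 4;
        default: return 8; // Int64, UInt64, Double
    }
}

// A v128 is always 16 bytes, so the lane count follows from the lane size.
constexpr unsigned laneCount(ElemType t) { return 16 / laneBytes(t); }

// Wasm lane shape name. Signedness is not part of the shape; it is chosen
// per instruction instead (e.g. i16x8.shr_s vs i16x8.shr_u).
std::string laneShape(ElemType t) {
    const bool isFloat = (t == ElemType::Single || t == ElemType::Double);
    return std::string(isFloat ? "f" : "i") + std::to_string(laneBytes(t) * 8) + "x" +
           std::to_string(laneCount(t));
}
```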

### Instruction Groupings
The SIMD instructions to be implemented can be split into a few different categories, listed below with a few examples for each:

* `HW_Category_SimpleSIMD`   - Majority of SIMD ops (arithmetic, bitwise, shifts, compares, fp math)
* `HW_Category_IMM`          - GetElement/WithElement, Shuffle (with constant indices)
* `HW_Category_Scalar`       - EqualsAll/EqualsAny, LessThanAll/LessThanAny
* `HW_Category_MemoryLoad`   - Load, LoadUnsafe, LoadAligned
* `HW_Category_MemoryStore`  - Store, StoreUnsafe, StoreAligned
* `HW_Category_Helper`       - Create, Zero, One, As
* `HW_Category_Special`      - Shuffle, Narrow, Sum, Multiply, Min/Max (ulong)

The special category is a catch-all for operations that need special handling (generally, operations that aren't a 1-1 mapping).
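Two of the cases above can be modeled on the host in the same way. `EqualsAll` reduces a lane-wise compare with `all_true`, and unsigned 64-bit `Min` can be built from a signed compare, since Wasm SIMD has `i64x2.lt_s` but no unsigned `i64x2` comparison. This is a sketch under those assumptions: the helper names are hypothetical stand-ins for the Wasm instructions they are named after, and the JIT may well choose a different sequence.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host-side model of a v128 viewed as i64x2 (two 64-bit lanes).
using I64x2 = std::array<uint64_t, 2>;

// Lane value that Wasm comparisons produce for "true".
constexpr uint64_t kAllOnes = ~uint64_t{0};

// Models v128.xor.
I64x2 v128_xor(const I64x2& a, const I64x2& b) { return {a[0] ^ b[0], a[1] ^ b[1]}; }

// Models v128.bitselect: bits from a where the mask bit is 1, otherwise from b.
I64x2 v128_bitselect(const I64x2& a, const I64x2& b, const I64x2& mask) {
    return {(a[0] & mask[0]) | (b[0] & ~mask[0]), (a[1] & mask[1]) | (b[1] & ~mask[1])};
}

// Models i64x2.eq: all-ones lanes where equal, zero lanes otherwise.
I64x2 i64x2_eq(const I64x2& a, const I64x2& b) {
    return {a[0] == b[0] ? kAllOnes : 0, a[1] == b[1] ? kAllOnes : 0};
}

// Models i64x2.lt_s: lanes are reinterpreted as two's-complement signed values.
I64x2 i64x2_lt_s(const I64x2& a, const I64x2& b) {
    I64x2 r{};
    for (int i = 0; i < 2; ++i)
        r[i] = static_cast<int64_t>(a[i]) < static_cast<int64_t>(b[i]) ? kAllOnes : 0;
    return r;
}

// Models i64x2.all_true: true only if every lane is non-zero.
bool i64x2_all_true(const I64x2& v) { return v[0] != 0 && v[1] != 0; }

// EqualsAll (HW_Category_Scalar): lane-wise compare, then reduce to one bool.
bool equalsAll(const I64x2& a, const I64x2& b) { return i64x2_all_true(i64x2_eq(a, b)); }

// Min for ulong (HW_Category_Special): flipping the sign bit maps unsigned
// order onto signed order, so i64x2.lt_s can stand in for the missing lt_u.
I64x2 minU64(const I64x2& a, const I64x2& b) {
    const I64x2 signBit = {uint64_t{1} << 63, uint64_t{1} << 63};
    const I64x2 aLessThanB = i64x2_lt_s(v128_xor(a, signBit), v128_xor(b, signBit));
    return v128_bitselect(a, b, aLessThanB);
}
```

`Max` for `ulong` would follow the same pattern with the `bitselect` operands swapped.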

## Named Intrinsics
Certain named intrinsics, such as methods on `System.Math`, should also be expanded into native Wasm instructions. A tentative list follows below; some of these were already implemented during basic codegen.

### System.Math
- [x] `Math.Max`              -> `max`
- [x] `Math.Min`              -> `min`
- [x] `Math.Floor`            -> `floor`
- [x] `Math.Ceiling`          -> `ceil`
- [x] `Math.Abs`              -> `abs`
- [x] `Math.Sqrt`             -> `sqrt`
- [x] `Math.Round`            -> `nearest`
- [x] `Math.Truncate`         -> `trunc`
- [ ] `Math.CopySign`         -> `copysign`
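One subtlety in this mapping is that `Math.Round(double)` defaults to `MidpointRounding.ToEven`, which is exactly what Wasm's `nearest` does (ties go to the even integer), whereas C's `round` sends halfway cases away from zero. The host-side sketch below uses `std::nearbyint` under the default rounding mode as a reference model for `f64.nearest`; the wrapper names are hypothetical.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Reference model of Wasm f64.nearest: round to nearest, ties to even.
// std::nearbyint behaves this way under the default FE_TONEAREST mode,
// which also matches Math.Round(double)'s default MidpointRounding.ToEven.
double wasmNearest(double x) { return std::nearbyint(x); }

// std::round rounds halfway cases away from zero (MidpointRounding.AwayFromZero),
// so it would NOT be a correct model of Math.Round's default behavior.
double roundAwayFromZero(double x) { return std::round(x); }
```

The two functions only disagree on exact halfway values, which is precisely where a wrong lowering would go unnoticed by casual testing.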

### System.Numerics
- [ ] `BitOperations.TrailingZeroCount` -> `ctz`
- [ ] `BitOperations.LeadingZeroCount`  -> `clz`
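These mappings can be direct because Wasm defines `i32.ctz` and `i32.clz` of zero to be 32, the same value that `BitOperations.TrailingZeroCount(0u)` and `BitOperations.LeadingZeroCount(0u)` return. No zero check is needed, unlike x86 `bsf` or GCC's `__builtin_ctz`, whose results are undefined for zero. A simple bit-by-bit reference model, with hypothetical names, might look like this:

```cpp
#include <cassert>
#include <cstdint>

// Reference model of Wasm i32.ctz: counts trailing zero bits and returns 32
// (the bit width) for an input of 0.
uint32_t wasm_i32_ctz(uint32_t x) {
    uint32_t n = 0;
    while (n < 32 && (x & (1u << n)) == 0) ++n;
    return n;
}

// Reference model of Wasm i32.clz: counts leading zero bits and likewise
// returns 32 for an input of 0.
uint32_t wasm_i32_clz(uint32_t x) {
    uint32_t n = 0;
    while (n < 32 && (x & (0x80000000u >> n)) == 0) ++n;
    return n;
}
```

The `n < 32` check comes first in each loop condition so the shift is never evaluated with a count of 32, which would be undefined behavior in C++.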

## First Implementation Steps (More to be added)
- [x] JIT emitter support for 128-bit vector instructions and the `v128` type
- [ ] Implement the hardware intrinsic table for simple mappings (create a `hwintrinsicswasm.cpp` in the JIT) to support [PackedSimd](https://github.com/dotnet/runtime/blob/f1ca590eaa790d84d8d9eebf3185bff65865a65d/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Wasm/PackedSimd.cs) instructions
- [ ] Codegen for table-driven cases (create a `hwintrinsiccodegenwasm.cpp` in the JIT)
- [ ] Codegen for table-driven cases in the `System.Runtime.Intrinsics.Vector128` API
- [ ] Codegen for `HW_Category_Special` category on a case-by-case basis.
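For the table-driven codegen steps, a rough sketch of the dispatch might look like the following. It assumes that, because Wasm is a stack machine, the operands are already on the value stack when the intrinsic node itself is generated, so a `SimpleSIMD` entry emits one instruction and a `Scalar` entry emits a compare followed by a reduction. `Category`, `Reduce`, `TableEntry`, and `genTableDriven` are hypothetical, and instructions are returned as mnemonic strings rather than emitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of the HW_Category_* groupings.
enum class Category { SimpleSIMD, Scalar, Special };

// Hypothetical reduction kind used by Scalar entries.
enum class Reduce { None, All, Any };

// Hypothetical table row.
struct TableEntry {
    Category category;
    std::string ins;        // main instruction, e.g. "i32x4.add" or "f32x4.eq"
    std::string maskShape;  // integer lane shape of the compare result, e.g. "i32x4"
    Reduce reduce;
};

// Operands are assumed to be on the Wasm value stack already, so codegen
// only appends instructions. An empty result means "needs a hand-written
// expansion" (HW_Category_Special).
std::vector<std::string> genTableDriven(const TableEntry& e) {
    switch (e.category) {
        case Category::SimpleSIMD:
            return {e.ins};
        case Category::Scalar:
            // Lane-wise compare, then reduce the mask to a single i32 boolean.
            if (e.reduce == Reduce::All) return {e.ins, e.maskShape + ".all_true"};
            if (e.reduce == Reduce::Any) return {e.ins, "v128.any_true"};
            return {};
        case Category::Special:
            break;
    }
    return {};
}
```

The reduction is spelled with the integer shape of the mask because a float compare such as `f32x4.eq` produces an `i32x4` mask and Wasm has no `f32x4.all_true`.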

## Testing
We have existing tests for `PackedSimd` [here](https://github.com/dotnet/runtime/blob/f1ca590eaa790d84d8d9eebf3185bff65865a65d/src/libraries/System.Runtime.Intrinsics/tests/Wasm/PackedSimdTests.cs#L4) that should provide coverage of basic functionality; these tests should be very useful for initial codegen bring-up.

We also have library code that uses intrinsics, so once intrinsics are enabled, running the library tests in a Wasm context should give fairly good coverage of SIMD operations. We may also want Wasm-specific smoke tests that verify SIMD instructions are actually emitted when SIMD is enabled. Ideally, we'll have tests ensuring that we get the benefit of acceleration rather than falling back to managed implementations.