[Wasm RyuJIT] SIMD + Intrinsics #127665

@adamperlin

Motivation

For acceptable performance in library code compiled to Wasm, RyuJIT will need to understand certain target-specific intrinsics, as it does on other architectures. In particular, the SIMD extension gives WebAssembly a fixed-width v128 type, which can be used as the runtime representation for Vector128. WebAssembly's vector instructions are specified here. RyuJIT should have appropriate handling in place so that operations from System.Runtime.Intrinsics.Wasm.PackedSimd and the cross-platform System.Runtime.Intrinsics.Vector128 are mapped to the corresponding Wasm SIMD instructions where possible.

System.Runtime.Intrinsics.Wasm.PackedSimd -> Wasm HW Intrinsic Mapping

The PackedSimd API already exists and is supported with acceleration in Mono. It exposes low-level Wasm SIMD operations directly. We should support expanding all methods in the PackedSimd API to native Wasm HW intrinsics. The mapping for calls to these intrinsics should be 1-1 in nearly all cases, since PackedSimd is already effectively a 1-1 mapping between API surface and Wasm-supported SIMD instructions. We should be able to use table-driven codegen to expand these intrinsics.
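As a rough illustration of what table-driven expansion means, the sketch below keys a small table on a PackedSimd method name and element type and returns a Wasm SIMD mnemonic. The type and function names (ElemType, WasmIntrinsicEntry, lookupWasmInstr) are hypothetical and are not the RyuJIT table format; only the method names and instruction mnemonics are real.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical names throughout: this is not the RyuJIT table format.
enum class ElemType { Int8, Int16, Int32, Int64, Float, Double };

struct WasmIntrinsicEntry {
    const char* method;    // PackedSimd method name
    ElemType    elemType;  // element type of the Vector128<T> operand
    const char* wasmInstr; // Wasm SIMD instruction mnemonic
};

// One row per (method, element type) pair, so expansion is a lookup.
static const WasmIntrinsicEntry kTable[] = {
    {"Add",      ElemType::Int8,   "i8x16.add"},
    {"Add",      ElemType::Int32,  "i32x4.add"},
    {"Add",      ElemType::Float,  "f32x4.add"},
    {"Subtract", ElemType::Int32,  "i32x4.sub"},
    {"Multiply", ElemType::Double, "f64x2.mul"},
    {"Sqrt",     ElemType::Float,  "f32x4.sqrt"},
};

// Returns the mnemonic for a table-driven intrinsic, or nullptr when the
// pair has no row (the caller would then need some other handling).
const char* lookupWasmInstr(const char* method, ElemType elemType) {
    assert(method != nullptr);
    for (const WasmIntrinsicEntry& e : kTable) {
        if (e.elemType == elemType && std::strcmp(e.method, method) == 0) {
            return e.wasmInstr;
        }
    }
    return nullptr;
}
```

A real table would be indexed by intrinsic ID rather than searched by string, and would carry opcode bytes rather than mnemonics; strings are used here only to keep the example readable.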

System.Runtime.Intrinsics.Vector128 -> Wasm HW Intrinsic Mapping

Many of the operations in the System.Runtime.Intrinsics.Vector128 API can be mapped 1-1 to Wasm using a table-driven approach. For example, vector arithmetic, comparison, conversion, bitwise operations, and even certain cases of element shuffle have direct Wasm mappings.

For operations that don't map 1-1, we will need slightly more complicated codegen in the JIT. In many of these cases, composing several Wasm instructions to implement a single intrinsic should be sufficient.
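To make "composing multiple instructions" concrete, here is a minimal sketch of one possible lowering of a horizontal sum over four 32-bit lanes: two shuffle/add pairs followed by a lane extract. It models the semantics of the Wasm instructions in plain C++ and is not JIT code; the helper names are hypothetical, and the actual sequence RyuJIT emits may differ.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V128i32 = std::array<int32_t, 4>; // models a v128 viewed as i32x4

// Models i32x4.add: lane-wise wrapping addition.
V128i32 i32x4_add(const V128i32& a, const V128i32& b) {
    V128i32 r{};
    for (int i = 0; i < 4; ++i) {
        r[i] = static_cast<int32_t>(static_cast<uint32_t>(a[i]) +
                                    static_cast<uint32_t>(b[i]));
    }
    return r;
}

// Models i8x16.shuffle restricted to whole 32-bit lanes:
// an index of 0-3 selects a lane of a, 4-7 selects a lane of b.
V128i32 shuffle_lanes(const V128i32& a, const V128i32& b,
                      const std::array<int, 4>& idx) {
    V128i32 r{};
    for (int i = 0; i < 4; ++i) {
        assert(idx[i] >= 0 && idx[i] < 8);
        r[i] = idx[i] < 4 ? a[idx[i]] : b[idx[i] - 4];
    }
    return r;
}

// Horizontal sum as: shuffle, add, shuffle, add, extract lane 0.
int32_t sum_i32x4(const V128i32& v) {
    V128i32 swappedHalves = shuffle_lanes(v, v, {2, 3, 0, 1});
    V128i32 s1 = i32x4_add(v, swappedHalves);   // lane 0: v0+v2, lane 1: v1+v3
    V128i32 swappedPairs = shuffle_lanes(s1, s1, {1, 0, 3, 2});
    V128i32 s2 = i32x4_add(s1, swappedPairs);   // every lane holds the total
    return s2[0];                               // models i32x4.extract_lane 0
}
```

For example, sum_i32x4 applied to lanes {1, 2, 3, 4} yields 10 after five modelled instructions.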

.NET types and Wasm SIMD Lane Mappings

The lane configurations for v128 on Wasm are i8x16, i16x8, i32x4, i64x2, f32x4, and f64x2. The configuration will generally be selected based on the instantiated generic type of the intrinsic operation, e.g., Vector128<ushort> -> i16x8.
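The mapping from element type to lane shape could be expressed as a simple switch. The sketch below assumes a hypothetical SimdBaseType enum standing in for however the JIT represents the element type of a Vector128<T>; it is not an existing RyuJIT type.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum standing in for the JIT's notion of a SIMD base type.
enum class SimdBaseType {
    SByte, Byte, Short, UShort, Int, UInt, Long, ULong, Float, Double
};

// Signed and unsigned element types of the same width share a lane shape.
// Signedness is carried by the instruction where it matters
// (e.g. i16x8.lt_s vs i16x8.lt_u), not by the shape.
std::string laneShapeFor(SimdBaseType t) {
    switch (t) {
        case SimdBaseType::SByte:
        case SimdBaseType::Byte:   return "i8x16";
        case SimdBaseType::Short:
        case SimdBaseType::UShort: return "i16x8";
        case SimdBaseType::Int:
        case SimdBaseType::UInt:   return "i32x4";
        case SimdBaseType::Long:
        case SimdBaseType::ULong:  return "i64x2";
        case SimdBaseType::Float:  return "f32x4";
        case SimdBaseType::Double: return "f64x2";
    }
    assert(false && "unhandled SimdBaseType");
    return "";
}
```

With this, Vector128<ushort> (SimdBaseType::UShort) selects i16x8, matching the example above.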

Instruction Groupings

The SIMD instructions to be implemented can be split into a few different categories, listed below with a few examples for each:

  • HW_Category_SimpleSIMD - Majority of SIMD ops (arithmetic, bitwise, shifts, compares, fp math)
  • HW_Category_IMM - GetElement/WithElement, Shuffle (with constant indices)
  • HW_Category_Scalar - EqualsAll/EqualsAny, LessThanAll/LessThanAny
  • HW_Category_MemoryLoad - Load, LoadUnsafe, LoadAligned
  • HW_Category_MemoryStore - Store, StoreUnsafe, StoreAligned
  • HW_Category_Helper - Create, Zero, One, As
  • HW_Category_Special - Shuffle, Narrow, Sum, Multiply, Min/Max (ulong)

The special category is a catch-all, and encompasses operations that will need some special handling (generally, operations which aren't a 1-1 mapping).

Named Intrinsics

Certain named intrinsics, such as those in the System.Math class, should also be expanded into native Wasm instructions. A tentative list of these follows. Some of these were already implemented during basic codegen.

System.Math

  • Math.Max -> max
  • Math.Min -> min
  • Math.Floor -> floor
  • Math.Ceiling -> ceil
  • Math.Abs -> abs
  • Math.Sqrt -> sqrt
  • Math.Round -> nearest
  • Math.Truncate -> trunc
  • Math.CopySign -> copysign

System.Numerics

  • BitOperations.TrailingZeroCount -> ctz
  • BitOperations.LeadingZeroCount -> clz
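
A mapping like this is only valid when edge-case semantics line up, so it can help to pin down what the Wasm instructions do. The sketch below models three of them in plain C++: nearest rounds ties to even (the same default midpoint rule as Math.Round(double)), and clz/ctz are defined as 32 for a zero 32-bit input. The function names are hypothetical, and the code models the instructions rather than anything in the JIT.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Models f64.nearest: round to the nearest integer, ties to even.
// std::nearbyint behaves this way under the default FE_TONEAREST mode.
double wasm_f64_nearest(double x) {
    return std::nearbyint(x);
}

// Models i32.clz: number of leading zero bits, 32 when x is 0.
uint32_t wasm_i32_clz(uint32_t x) {
    uint32_t n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
}

// Models i32.ctz: number of trailing zero bits, 32 when x is 0.
uint32_t wasm_i32_ctz(uint32_t x) {
    uint32_t n = 0;
    for (uint32_t mask = 1u; mask != 0 && (x & mask) == 0; mask <<= 1) {
        ++n;
    }
    return n;
}
```

Note that nearest differs from round-half-away-from-zero: 2.5 rounds to 2, while 3.5 rounds to 4.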

First Implementation Steps (More to be added)

  • JIT emitter support for v128 instructions and the v128 type
  • Implement hardware intrinsic table for simple mappings (create a hwintrinsicswasm.cpp in the JIT) to support PackedSimd instructions
  • Codegen for table-driven cases (create a hwintrinsiccodegenwasm.cpp in the JIT)
  • Codegen for table-driven cases in the System.Runtime.Intrinsics.Vector128 API
  • Codegen for HW_Category_Special category on a case-by-case basis.

Testing

We have existing tests for PackedSimd here which should provide coverage for basic functionality. These tests should be very useful for initial codegen bring-up.

We additionally have library code which leverages intrinsics, so once we have intrinsics enabled, running library tests in a Wasm context should give fairly good coverage of SIMD operations. We may wish to write Wasm-specific smoke tests to ensure that SIMD instructions are actually being emitted when SIMD is enabled. Ideally, we'll want tests that confirm we're getting the benefit of acceleration and not falling back to managed implementations.

Labels

  • Cost:L - Work that requires one engineer up to 4 weeks
  • area-CodeGen-coreclr - CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI