Add per-stage subgroup size control - step 1 to support subgroup-size-control proposal#9523

Open
ruihe774 wants to merge 4 commits into gfx-rs:trunk from ruihe774:subgroup_control

Conversation


ruihe774 (Contributor) commented May 7, 2026

Connections

None

Description

Currently wgpu unconditionally sets VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT whenever Features::SUBGROUP is enabled. The driver may pick any subgroup size in [subgroup_min_size, subgroup_max_size], and on Vulkan even different subgroups in the same dispatch may use different sizes. There is no way to specify the requested subgroup size.

That's a real limitation for performance-sensitive compute. Two motivating cases:

  1. Workgroup-memory sizing tied to subgroup count. A common reduction / scan / sort pattern is:

    @group(0) @binding(0) var<storage, read> input: array<f32>;
    
    const WG_SIZE: u32 = 256;
    const SG_SIZE: u32 = 32;                         // assumed
    const NUM_SG:  u32 = WG_SIZE / SG_SIZE;          // 8
    
    var<workgroup> partial: array<f32, NUM_SG>;      // one slot per subgroup
    
    @compute @workgroup_size(WG_SIZE)
    fn reduce(@builtin(local_invocation_id) lid: vec3<u32>) {
        let v = subgroupAdd(input[lid.x]);
        // assumes subgroups are laid out linearly in local invocation index
        if (subgroupElect()) { partial[lid.x / SG_SIZE] = v; }
        workgroupBarrier();
        // ... second-level reduction over partial[]
    }

    partial's size has to be known at compile time. Without SubgroupSize::Fixed(32), you must size for the worst case (WG_SIZE / subgroup_min_size), wasting shared memory and hurting occupancy on devices where the actual size is larger. Cross-vendor this is painful — NVIDIA = 32, AMD pre-RDNA = 64, AMD RDNA = 32 or 64 (may vary across dispatches or within a dispatch), Intel = 8/16/32. Fixed(n) lets the kernel pin the assumption it was written against; Varying keeps the current "implementation chooses" behavior.

  2. Avoiding partial trailing subgroups. When WG_SIZE isn't a multiple of the subgroup size, the last subgroup is partially populated. Subgroup ops on inactive lanes return implementation-defined values, so ballot / vote / shuffle patterns need explicit masking and tend to be subtly wrong. SubgroupSize::Full (Vulkan's REQUIRE_FULL_SUBGROUPS_BIT) guarantees every invocation in the workgroup belongs to a fully-populated subgroup — required for compute/task/mesh, hence the validation rejecting it on vertex/fragment.
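The hazard in case 2 is easy to quantify. A standalone sketch (the sizes are illustrative, not tied to any particular device):

```rust
// For a 1D workgroup: how many full subgroups fit, and how many lanes
// spill into a partial trailing subgroup.
fn subgroup_split(wg_size: u32, sg_size: u32) -> (u32, u32) {
    (wg_size / sg_size, wg_size % sg_size)
}

fn main() {
    // 256 invocations at subgroup size 32: eight full subgroups, no stragglers.
    assert_eq!(subgroup_split(256, 32), (8, 0));
    // 100 invocations at subgroup size 32: three full subgroups plus a
    // 4-lane partial subgroup whose remaining 28 lanes are inactive --
    // exactly the case SubgroupSize::Full rules out.
    assert_eq!(subgroup_split(100, 32), (3, 4));
}
```

Any nonzero remainder means ballot/vote/shuffle code in that trailing subgroup must mask inactive lanes by hand.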

This PR exposes both knobs as a single SubgroupSize enum on PipelineCompilationOptions:

  • Varying (default) — implementation chooses, within [subgroup_min_size, subgroup_max_size]. Matches today's behavior.
  • Full — require full subgroups. Compute / task / mesh stages only.
  • Fixed(u32) — must be a power of two within [subgroup_min_size, subgroup_max_size].

Gated behind a new Features::SUBGROUP_SIZE_CONTROL, which is only advertised on Vulkan with VK_EXT_subgroup_size_control (promoted to 1.3). Other backends (Metal / D3D12 / GL / WebGPU) don't advertise the feature, so non-Varying is rejected at pipeline creation rather than silently no-oping.

Validation lives in wgpu-core and rejects: non-Varying without the feature, Fixed(n) that isn't a power of two or is outside the adapter's range, and Full on vertex/fragment stages. The Vulkan adapter only advertises SUBGROUP_SIZE_CONTROL when the device supports both subgroupSizeControl and computeFullSubgroups, so all variants are honored once the feature is enabled (this combination is universal in practice; Vulkan 1.3 mandates both).
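Those rules can be sketched as a single free function (a minimal model of the checks described above; the names, error strings, and shape are illustrative, not wgpu-core's actual code):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SubgroupSize {
    Varying,    // implementation chooses; today's behavior
    Full,       // require fully-populated subgroups
    Fixed(u32), // pin an exact size
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Stage {
    Vertex,
    Fragment,
    Compute,
    Task,
    Mesh,
}

/// Non-`Varying` needs the feature; `Full` is compute/task/mesh only;
/// `Fixed(n)` must be a power of two inside the adapter's
/// [subgroup_min_size, subgroup_max_size] range.
fn validate_subgroup_size(
    requested: SubgroupSize,
    stage: Stage,
    feature_enabled: bool,
    min: u32,
    max: u32,
) -> Result<(), &'static str> {
    match requested {
        SubgroupSize::Varying => Ok(()),
        _ if !feature_enabled => Err("Features::SUBGROUP_SIZE_CONTROL is not enabled"),
        SubgroupSize::Full => match stage {
            Stage::Compute | Stage::Task | Stage::Mesh => Ok(()),
            Stage::Vertex | Stage::Fragment => Err("Full is invalid on vertex/fragment"),
        },
        SubgroupSize::Fixed(n) if !n.is_power_of_two() => Err("Fixed(n): not a power of two"),
        SubgroupSize::Fixed(n) if n < min || n > max => Err("Fixed(n): outside adapter range"),
        SubgroupSize::Fixed(_) => Ok(()),
    }
}

fn main() {
    assert!(validate_subgroup_size(SubgroupSize::Varying, Stage::Vertex, false, 8, 64).is_ok());
    assert!(validate_subgroup_size(SubgroupSize::Fixed(32), Stage::Compute, true, 8, 64).is_ok());
    assert!(validate_subgroup_size(SubgroupSize::Fixed(48), Stage::Compute, true, 8, 64).is_err());
    assert!(validate_subgroup_size(SubgroupSize::Full, Stage::Fragment, true, 8, 64).is_err());
}
```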

On the Vulkan side, Varying sets ALLOW_VARYING_SUBGROUP_SIZE_BIT, Full sets REQUIRE_FULL_SUBGROUPS_BIT, and Fixed(n) chains a VkPipelineShaderStageRequiredSubgroupSizeCreateInfo via p_next (boxed and stored on CompiledStage so the address survives moves).

Testing
A new validation test, subgroup_size.rs, is added. I also exercised the new API against a real compute pipeline on my graphics card.

Squash or Rebase?

Squash

Checklist

  • I self-reviewed and fully understand this PR.
  • WebGPU implementations built with wgpu may be affected behaviorally.
  • Validation and feature gates are in place to confine behavioral changes.
  • Tests demonstrate the validation and altered logic works.
  • CHANGELOG.md entries for the user-facing effects of this change are present.
  • The PR is minimal, and doesn't make sense to land as multiple PRs.
  • Commits are logically scoped and individually reviewable.
  • The PR description has enough context to understand the motivation and solution implemented.

ruihe774 force-pushed the subgroup_control branch from e69b6bb to 93ec9a4 on May 7, 2026 16:34
Adds `Features::SUBGROUP_SIZE_CONTROL` and a `SubgroupSize` enum
(`Varying` / `Full` / `Fixed(u32)`) wired through `PipelineCompilationOptions`
on every shader stage. On Vulkan, this maps to
`VK_EXT_subgroup_size_control` (`ALLOW_VARYING_SUBGROUP_SIZE_BIT`,
`REQUIRE_FULL_SUBGROUPS_BIT`, and `VkPipelineShaderStageRequiredSubgroupSizeCreateInfo`).

Validation in wgpu-core rejects non-`Varying` without the feature,
out-of-range or non-power-of-two `Fixed(n)`, and `Full` on vertex/fragment
stages. The Vulkan adapter only advertises `SUBGROUP_SIZE_CONTROL` when
the device supports both `subgroupSizeControl` and `computeFullSubgroups`,
so all variants are honored once the feature is enabled.
ruihe774 force-pushed the subgroup_control branch from 93ec9a4 to 0ddd336 on May 7, 2026 16:43
andyleiserson (Contributor) commented:

There is a draft proposal for subgroup size control discussed in gpuweb/gpuweb#5545. I haven't studied it in detail, but it would be worth aligning with that API where it makes sense and providing feedback where it doesn't.

…validate workgroup size

Without `ALLOW_VARYING_SUBGROUP_SIZE`, the Vulkan driver pins one
subgroup size at pipeline creation (typically `maxSubgroupSize`), forcing
`workgroup_size.x` to be a multiple of that value. That conflicts with
the WebGPU spec, where the WGSL `subgroup_size` builtin reflects the
actual size used at each invocation. With both flags, full subgroups are
guaranteed and the runtime is free to pick any size in
`[subgroup_min_size, subgroup_max_size]` that divides `workgroup_size.x`.

Also reject `SubgroupSize::Full` when `workgroup_size.x` is below
`subgroup_min_size`, since no full subgroup can fit. Adds
`Interface::workgroup_size` to expose the entry point's `@workgroup_size`
to pipeline validation, plus new error variants
`WorkgroupSizeTooSmallForFullSubgroups` on both compute and render
pipeline error types.

ruihe774 (Author) commented May 7, 2026

Thanks for the pointer to #5545 and the subgroup-size-control proposal. I've reframed this PR as a precursor to that proposal rather than a parallel design (latest commit: ff7a12d).

Scope changes

  • The subgroup_size field on PipelineCompilationOptions is now documented as intended for passthrough shaders. The proposal models the WGSL/Naga path via a @subgroup_size entry-point attribute, which a passthrough shader can't carry — so threading the size through the pipeline-creation API stays useful even after the WGSL attribute lands. For non-passthrough shaders, callers should leave it at Varying and (eventually) use the WGSL attribute.
  • Validation for Fixed(n) now matches the proposal's @subgroup_size rules: power of two, within [subgroup_min_size, subgroup_max_size], and workgroup_size.x (when present) must be a multiple of n. Shaders authored against Fixed(n) should remain valid once the WGSL attribute lands. Test coverage is extended accordingly.
  • SubgroupSize and Features::SUBGROUP_SIZE_CONTROL docs now point at the proposal; CHANGELOG is updated.
  • Naga is intentionally untouched in this PR — implementing the WGSL attribute is a separate piece of work.
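Under the proposal-aligned rules above, the set of acceptable Fixed(n) values for a given device and entry point is small and easy to enumerate. A sketch (the limit names follow this PR's subgroup_min_size/subgroup_max_size, not the proposal's new explicit limits):

```rust
/// All n accepted by the Fixed(n) rules: a power of two in [min, max]
/// that divides workgroup_size.x (when a workgroup size is declared;
/// a passthrough shader may carry none).
fn valid_fixed_sizes(min: u32, max: u32, wg_x: Option<u32>) -> Vec<u32> {
    (0..32)
        .map(|shift| 1u32 << shift)
        .filter(|&n| n >= min && n <= max)
        .filter(|&n| wg_x.map_or(true, |x| x % n == 0))
        .collect()
}

fn main() {
    // Device range [8, 64] with @workgroup_size(96): 64 is rejected
    // because 96 is not a multiple of 64.
    assert_eq!(valid_fixed_sizes(8, 64, Some(96)), vec![8, 16, 32]);
    // No known workgroup size (passthrough): the range alone applies.
    assert_eq!(valid_fixed_sizes(8, 64, None), vec![8, 16, 32, 64]);
}
```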

Where this still diverges from the proposal (intentionally, as feedback)

  • SubgroupSize::Full (Vulkan's REQUIRE_FULL_SUBGROUPS_BIT) has no counterpart in the proposal. It's a wgpu extension; the rationale is in the original PR description (avoiding partial trailing subgroups in patterns where masking inactive lanes is fragile).
  • The proposal targets compute only; this PR also accepts the field on task and mesh stages, which take @workgroup_size and so face the same partial-subgroup concerns. Full is still rejected on vertex/fragment. A compute-only restriction could be enforced for WGSL shaders, but for passthrough shaders I don't think it makes sense to impose it.
  • The proposal introduces new explicit limits minExplicitComputeSubgroupSize / maxExplicitComputeSubgroupSize separate from the existing range. This PR validates Fixed(n) against subgroup_min_size/subgroup_max_size and skips the new range. Adding the explicit limits is a wgt::AdapterInfo change I'd prefer to land separately so the proposal's lattice can be plumbed through cleanly.
  • The proposal's maxComputeWorkgroupSubgroups limit is also not enforced yet, for the same reason; this is called out in the Fixed doc.

This PR can now be considered a foundational step toward supporting the proposal. Once it lands, work can begin on the Naga side.

…roposal

Frame this as a precursor to the proposal: doc the
`PipelineCompilationOptions::subgroup_size` field as intended for
passthrough shaders (since the proposal uses a `@subgroup_size` WGSL
attribute for non-passthrough), and reject `Fixed(n)` when
`workgroup_size.x` is not a multiple of `n` per the proposal's Vulkan-
derived rule.
ruihe774 force-pushed the subgroup_control branch from ff7a12d to 8c1eed1 on May 7, 2026 17:52
ruihe774 changed the title from "Add per-stage subgroup size control" to "Add per-stage subgroup size control - step 1 to support subgroup-size-control proposal" on May 7, 2026
HLSL uses `[WaveSize(n)]` in the shader source, and Metal provides no
API to control the SIMD-group width, so this field has no effect on
HLSL or MetalLib/MSL passthrough shaders.