Add per-stage subgroup size control - step 1 to support subgroup-size-control proposal#9523

Open
ruihe774 wants to merge 4 commits into gfx-rs:trunk from ruihe774:subgroup_control

Conversation


ruihe774 (Contributor) commented May 7, 2026

Connections

None

Description

Currently wgpu unconditionally sets VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT whenever Features::SUBGROUP is enabled. The driver may pick any subgroup size in [subgroup_min_size, subgroup_max_size], and on Vulkan even different subgroups in the same dispatch may use different sizes. There is no way to specify the requested subgroup size.

That's a real limitation for performance-sensitive compute. Two motivating cases:

  1. Workgroup-memory sizing tied to subgroup count. A common reduction / scan / sort pattern is:

    @group(0) @binding(0) var<storage, read> input: array<f32>;
    
    const WG_SIZE: u32 = 256;
    const SG_SIZE: u32 = 32;                         // assumed
    const NUM_SG:  u32 = WG_SIZE / SG_SIZE;          // 8
    
    var<workgroup> partial: array<f32, NUM_SG>;      // one slot per subgroup
    
    @compute @workgroup_size(WG_SIZE)
    fn reduce(@builtin(local_invocation_id) lid: vec3<u32>) {
        let v = subgroupAdd(input[lid.x]);
        // assumes subgroups are laid out linearly in local invocation index
        if (subgroupElect()) { partial[lid.x / SG_SIZE] = v; }
        workgroupBarrier();
        // ... second-level reduction over partial[]
    }

    partial's size has to be known at compile time. Without SubgroupSize::Fixed(32), you must size for the worst case (WG_SIZE / subgroup_min_size), wasting shared memory and hurting occupancy on devices where the actual size is larger. Cross-vendor this is painful — NVIDIA = 32, AMD pre-RDNA = 64, AMD RDNA = 32 or 64 (may vary across dispatches or within a dispatch), Intel = 8/16/32. Fixed(n) lets the kernel pin the assumption it was written against; Varying keeps the current "implementation chooses" behavior.

  2. Avoiding partial trailing subgroups. When WG_SIZE isn't a multiple of the subgroup size, the last subgroup is partially populated. Subgroup ops on inactive lanes return implementation-defined values, so ballot / vote / shuffle patterns need explicit masking and tend to be subtly wrong. SubgroupSize::Full (Vulkan's REQUIRE_FULL_SUBGROUPS_BIT) guarantees every invocation in the workgroup belongs to a fully-populated subgroup — required for compute/task/mesh, hence the validation rejecting it on vertex/fragment.
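The hazard in case 2 is easy to quantify. A standalone sketch (the sizes are illustrative, not tied to any particular device):

```rust
// For a 1D workgroup: how many full subgroups fit, and how many lanes
// spill into a partial trailing subgroup.
fn subgroup_split(wg_size: u32, sg_size: u32) -> (u32, u32) {
    (wg_size / sg_size, wg_size % sg_size)
}

fn main() {
    // 256 invocations at subgroup size 32: eight full subgroups, no stragglers.
    assert_eq!(subgroup_split(256, 32), (8, 0));
    // 100 invocations at subgroup size 32: three full subgroups plus a
    // 4-lane partial subgroup whose remaining 28 lanes are inactive --
    // exactly the case SubgroupSize::Full rules out.
    assert_eq!(subgroup_split(100, 32), (3, 4));
}
```

Any nonzero remainder means ballot/vote/shuffle code in that trailing subgroup must mask inactive lanes by hand.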

This PR exposes both knobs as a single SubgroupSize enum on PipelineCompilationOptions:

  • Varying (default) — implementation chooses, within [subgroup_min_size, subgroup_max_size]. Matches today's behavior.
  • Full — require full subgroups. Compute / task / mesh stages only.
  • Fixed(u32) — must be a power of two within [subgroup_min_size, subgroup_max_size].

Gated behind a new Features::SUBGROUP_SIZE_CONTROL, which is only advertised on Vulkan with VK_EXT_subgroup_size_control (promoted to 1.3). Other backends (Metal / D3D12 / GL / WebGPU) don't advertise the feature, so non-Varying is rejected at pipeline creation rather than silently no-oping.

Validation lives in wgpu-core and rejects: non-Varying without the feature, Fixed(n) that isn't a power of two or is outside the adapter's range, and Full on vertex/fragment stages. The Vulkan adapter only advertises SUBGROUP_SIZE_CONTROL when the device supports both subgroupSizeControl and computeFullSubgroups, so all variants are honored once the feature is enabled (this combination is universal in practice; Vulkan 1.3 mandates both).
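Those rules can be sketched as a single free function (a minimal model of the checks described above; the names, error strings, and shape are illustrative, not wgpu-core's actual code):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SubgroupSize {
    Varying,    // implementation chooses; today's behavior
    Full,       // require fully-populated subgroups
    Fixed(u32), // pin an exact size
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Stage {
    Vertex,
    Fragment,
    Compute,
    Task,
    Mesh,
}

/// Non-`Varying` needs the feature; `Full` is compute/task/mesh only;
/// `Fixed(n)` must be a power of two inside the adapter's
/// [subgroup_min_size, subgroup_max_size] range.
fn validate_subgroup_size(
    requested: SubgroupSize,
    stage: Stage,
    feature_enabled: bool,
    min: u32,
    max: u32,
) -> Result<(), &'static str> {
    match requested {
        SubgroupSize::Varying => Ok(()),
        _ if !feature_enabled => Err("Features::SUBGROUP_SIZE_CONTROL is not enabled"),
        SubgroupSize::Full => match stage {
            Stage::Compute | Stage::Task | Stage::Mesh => Ok(()),
            Stage::Vertex | Stage::Fragment => Err("Full is invalid on vertex/fragment"),
        },
        SubgroupSize::Fixed(n) if !n.is_power_of_two() => Err("Fixed(n): not a power of two"),
        SubgroupSize::Fixed(n) if n < min || n > max => Err("Fixed(n): outside adapter range"),
        SubgroupSize::Fixed(_) => Ok(()),
    }
}

fn main() {
    assert!(validate_subgroup_size(SubgroupSize::Varying, Stage::Vertex, false, 8, 64).is_ok());
    assert!(validate_subgroup_size(SubgroupSize::Fixed(32), Stage::Compute, true, 8, 64).is_ok());
    assert!(validate_subgroup_size(SubgroupSize::Fixed(48), Stage::Compute, true, 8, 64).is_err());
    assert!(validate_subgroup_size(SubgroupSize::Full, Stage::Fragment, true, 8, 64).is_err());
}
```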

On the Vulkan side, Varying sets ALLOW_VARYING_SUBGROUP_SIZE_BIT, Full sets REQUIRE_FULL_SUBGROUPS_BIT, and Fixed(n) chains a VkPipelineShaderStageRequiredSubgroupSizeCreateInfo via p_next (boxed and stored on CompiledStage so the address survives moves).

Testing
A new validation test, subgroup_size.rs, is added. I also exercised the new API against a real compute pipeline on my graphics card.

Squash or Rebase?

Squash

Checklist

  • I self-reviewed and fully understand this PR.
  • WebGPU implementations built with wgpu may be affected behaviorally.
  • Validation and feature gates are in place to confine behavioral changes.
  • Tests demonstrate the validation and altered logic works.
  • CHANGELOG.md entries for the user-facing effects of this change are present.
  • The PR is minimal, and doesn't make sense to land as multiple PRs.
  • Commits are logically scoped and individually reviewable.
  • The PR description has enough context to understand the motivation and solution implemented.

ruihe774 force-pushed the subgroup_control branch from e69b6bb to 93ec9a4 on May 7, 2026 16:34
Adds `Features::SUBGROUP_SIZE_CONTROL` and a `SubgroupSize` enum
(`Varying` / `Full` / `Fixed(u32)`) wired through `PipelineCompilationOptions`
on every shader stage. On Vulkan, this maps to
`VK_EXT_subgroup_size_control` (`ALLOW_VARYING_SUBGROUP_SIZE_BIT`,
`REQUIRE_FULL_SUBGROUPS_BIT`, and `VkPipelineShaderStageRequiredSubgroupSizeCreateInfo`).

Validation in wgpu-core rejects non-`Varying` without the feature,
out-of-range or non-power-of-two `Fixed(n)`, and `Full` on vertex/fragment
stages. The Vulkan adapter only advertises `SUBGROUP_SIZE_CONTROL` when
the device supports both `subgroupSizeControl` and `computeFullSubgroups`,
so all variants are honored once the feature is enabled.
ruihe774 force-pushed the subgroup_control branch from 93ec9a4 to 0ddd336 on May 7, 2026 16:43
andyleiserson (Contributor) commented:

There is a draft proposal for subgroup size control discussed in gpuweb/gpuweb#5545. I haven't studied it in detail, but it would be worth aligning with that API where it makes sense and providing feedback where it doesn't.

…validate workgroup size

Without `ALLOW_VARYING_SUBGROUP_SIZE`, the Vulkan driver pins one
subgroup size at pipeline creation (typically `maxSubgroupSize`), forcing
`workgroup_size.x` to be a multiple of that value. That conflicts with
the WebGPU spec, where the WGSL `subgroup_size` builtin reflects the
actual size used at each invocation. With both flags, full subgroups are
guaranteed and the runtime is free to pick any size in
`[subgroup_min_size, subgroup_max_size]` that divides `workgroup_size.x`.

Also reject `SubgroupSize::Full` when `workgroup_size.x` is below
`subgroup_min_size`, since no full subgroup can fit. Adds
`Interface::workgroup_size` to expose the entry point's `@workgroup_size`
to pipeline validation, plus new error variants
`WorkgroupSizeTooSmallForFullSubgroups` on both compute and render
pipeline error types.

ruihe774 (Author) commented May 7, 2026

Thanks for the pointer to #5545 and the subgroup-size-control proposal. I've reframed this PR as a precursor to that proposal rather than a parallel design (latest commit: ff7a12d).

Scope changes

  • The subgroup_size field on PipelineCompilationOptions is now documented as intended for passthrough shaders. The proposal models the WGSL/Naga path via a @subgroup_size entry-point attribute, which a passthrough shader can't carry — so threading the size through the pipeline-creation API stays useful even after the WGSL attribute lands. For non-passthrough shaders, callers should leave it at Varying and (eventually) use the WGSL attribute.
  • Validation for Fixed(n) now matches the proposal's @subgroup_size rules: power of two, within [subgroup_min_size, subgroup_max_size], and workgroup_size.x (when present) must be a multiple of n. Shaders authored against Fixed(n) should remain valid once the WGSL attribute lands. Test coverage is extended accordingly.
  • SubgroupSize and Features::SUBGROUP_SIZE_CONTROL docs now point at the proposal; CHANGELOG is updated.
  • Naga is intentionally untouched in this PR — implementing the WGSL attribute is a separate piece of work.
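Under the proposal-aligned rules above, the set of acceptable Fixed(n) values for a given device and entry point is small and easy to enumerate. A sketch (the limit names follow this PR's subgroup_min_size/subgroup_max_size, not the proposal's new explicit limits):

```rust
/// All n accepted by the Fixed(n) rules: a power of two in [min, max]
/// that divides workgroup_size.x (when a workgroup size is declared;
/// a passthrough shader may carry none).
fn valid_fixed_sizes(min: u32, max: u32, wg_x: Option<u32>) -> Vec<u32> {
    (0..32)
        .map(|shift| 1u32 << shift)
        .filter(|&n| n >= min && n <= max)
        .filter(|&n| wg_x.map_or(true, |x| x % n == 0))
        .collect()
}

fn main() {
    // Device range [8, 64] with @workgroup_size(96): 64 is rejected
    // because 96 is not a multiple of 64.
    assert_eq!(valid_fixed_sizes(8, 64, Some(96)), vec![8, 16, 32]);
    // No known workgroup size (passthrough): the range alone applies.
    assert_eq!(valid_fixed_sizes(8, 64, None), vec![8, 16, 32, 64]);
}
```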

Where this still diverges from the proposal (intentionally, as feedback)

  • SubgroupSize::Full (Vulkan's REQUIRE_FULL_SUBGROUPS_BIT) has no counterpart in the proposal. It's a wgpu extension; the rationale is in the original PR description (avoiding partial trailing subgroups in patterns where masking inactive lanes is fragile).
  • The proposal targets compute only; this PR also accepts the field on task and mesh stages, which take @workgroup_size and so face the same partial-subgroup concerns. Full is still rejected on vertex/fragment. A compute-only restriction could be enforced for WGSL shaders, but for passthrough shaders I don't think it makes sense to impose it.
  • The proposal introduces new explicit limits minExplicitComputeSubgroupSize / maxExplicitComputeSubgroupSize separate from the existing range. This PR validates Fixed(n) against subgroup_min_size/subgroup_max_size and skips the new range. Adding the explicit limits is a wgt::AdapterInfo change I'd prefer to land separately so the proposal's lattice can be plumbed through cleanly.
  • The proposal's maxComputeWorkgroupSubgroups limit is also not enforced yet, for the same reason; this is called out in the Fixed doc.

This PR can now be considered a foundational step toward supporting the proposal. Once it lands, work can begin on the Naga side.

…roposal

Frame this as a precursor to the proposal: doc the
`PipelineCompilationOptions::subgroup_size` field as intended for
passthrough shaders (since the proposal uses a `@subgroup_size` WGSL
attribute for non-passthrough), and reject `Fixed(n)` when
`workgroup_size.x` is not a multiple of `n` per the proposal's Vulkan-
derived rule.
ruihe774 force-pushed the subgroup_control branch from ff7a12d to 8c1eed1 on May 7, 2026 17:52
ruihe774 changed the title from "Add per-stage subgroup size control" to "Add per-stage subgroup size control - step 1 to support subgroup-size-control proposal" on May 7, 2026
HLSL uses `[WaveSize(n)]` in the shader source, and Metal provides no
API to control the SIMD-group width, so this field has no effect on
HLSL or MetalLib/MSL passthrough shaders.