Add per-stage subgroup size control - step 1 to support subgroup-size-control proposal #9523
Open
ruihe774 wants to merge 4 commits into
Adds `Features::SUBGROUP_SIZE_CONTROL` and a `SubgroupSize` enum (`Varying` / `Full` / `Fixed(u32)`) wired through `PipelineCompilationOptions` on every shader stage. On Vulkan, this maps to `VK_EXT_subgroup_size_control` (`ALLOW_VARYING_SUBGROUP_SIZE_BIT`, `REQUIRE_FULL_SUBGROUPS_BIT`, and `VkPipelineShaderStageRequiredSubgroupSizeCreateInfo`). Validation in wgpu-core rejects non-`Varying` without the feature, out-of-range or non-power-of-two `Fixed(n)`, and `Full` on vertex/fragment stages. The Vulkan adapter only advertises `SUBGROUP_SIZE_CONTROL` when the device supports both `subgroupSizeControl` and `computeFullSubgroups`, so all variants are honorable once the feature is enabled.
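The validation rules listed in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `Stage`, `SubgroupSizeError`, and `validate_subgroup_size` are hypothetical names, and the feature flag and adapter limits are passed in as plain arguments.

```rust
/// Stand-in for the enum described above; `Varying` is the default.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum SubgroupSize {
    #[default]
    Varying,
    Full,
    Fixed(u32),
}

/// Hypothetical stage enum, used only for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Stage {
    Vertex,
    Fragment,
    Compute,
    Task,
    Mesh,
}

/// Hypothetical error type; the PR's real variants may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum SubgroupSizeError {
    MissingFeature,
    FullOnUnsupportedStage(Stage),
    FixedNotPowerOfTwo(u32),
    FixedOutOfRange { requested: u32, min: u32, max: u32 },
}

/// Applies the rules in the order the commit message lists them.
pub fn validate_subgroup_size(
    requested: SubgroupSize,
    stage: Stage,
    feature_enabled: bool,
    min: u32,
    max: u32,
) -> Result<(), SubgroupSizeError> {
    match requested {
        // `Varying` is always accepted and matches the existing behavior.
        SubgroupSize::Varying => Ok(()),
        // Anything else needs the feature to be enabled.
        _ if !feature_enabled => Err(SubgroupSizeError::MissingFeature),
        // `Full` is rejected on vertex and fragment stages.
        SubgroupSize::Full => match stage {
            Stage::Vertex | Stage::Fragment => {
                Err(SubgroupSizeError::FullOnUnsupportedStage(stage))
            }
            Stage::Compute | Stage::Task | Stage::Mesh => Ok(()),
        },
        // `Fixed(n)` must be a power of two inside the adapter's range.
        SubgroupSize::Fixed(n) => {
            if !n.is_power_of_two() {
                Err(SubgroupSizeError::FixedNotPowerOfTwo(n))
            } else if n < min || n > max {
                Err(SubgroupSizeError::FixedOutOfRange { requested: n, min, max })
            } else {
                Ok(())
            }
        }
    }
}
```

With hypothetical limits of `min = 8` and `max = 64`, `Fixed(32)` on a compute stage passes, while `Fixed(24)` (not a power of two) and `Fixed(128)` (above the range) are rejected.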
Contributor
There is a draft proposal for subgroup size control discussed in gpuweb/gpuweb#5545. I haven't studied it in detail, but it would be worth aligning with that API where it makes sense and providing feedback where it doesn't.
…validate workgroup size

Without `ALLOW_VARYING_SUBGROUP_SIZE`, the Vulkan driver pins one subgroup size at pipeline creation (typically `maxSubgroupSize`), forcing `workgroup_size.x` to be a multiple of that value. That conflicts with the WebGPU spec, where the WGSL `subgroup_size` builtin reflects the actual size used at each invocation. With both flags, full subgroups are guaranteed and the runtime is free to pick any size in `[subgroup_min_size, subgroup_max_size]` that divides `workgroup_size.x`.

Also reject `SubgroupSize::Full` when `workgroup_size.x` is below `subgroup_min_size`, since no full subgroup can fit.

Adds `Interface::workgroup_size` to expose the entry point's `@workgroup_size` to pipeline validation, plus new error variants `WorkgroupSizeTooSmallForFullSubgroups` on both compute and render pipeline error types.
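As a rough sketch of the flag mapping after this commit, the snippet below assumes (from the commit message, not from the diff) that `Full` now sets both flags and that `Fixed(n)` sets neither and instead supplies a required size to chain via `pNext`. The constants are the values of the corresponding `VkPipelineShaderStageCreateFlagBits`; `stage_flags` is a hypothetical helper and the real code goes through the Vulkan bindings.

```rust
/// Bit values of the corresponding `VkPipelineShaderStageCreateFlagBits`.
pub const ALLOW_VARYING_SUBGROUP_SIZE_BIT: u32 = 0x0000_0001;
pub const REQUIRE_FULL_SUBGROUPS_BIT: u32 = 0x0000_0002;

/// Stand-in for the enum added by the PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SubgroupSize {
    Varying,
    Full,
    Fixed(u32),
}

/// Hypothetical helper: returns the stage create flags and, for `Fixed(n)`,
/// the size to place in a required-subgroup-size struct on the pNext chain.
pub fn stage_flags(size: SubgroupSize) -> (u32, Option<u32>) {
    match size {
        // Implementation chooses the size.
        SubgroupSize::Varying => (ALLOW_VARYING_SUBGROUP_SIZE_BIT, None),
        // Assumption from the commit message: both flags are set together.
        SubgroupSize::Full => (
            ALLOW_VARYING_SUBGROUP_SIZE_BIT | REQUIRE_FULL_SUBGROUPS_BIT,
            None,
        ),
        // A required size is not combined with the varying flag.
        SubgroupSize::Fixed(n) => (0, Some(n)),
    }
}
```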
Contributor
Author
Thanks for the pointer to #5545 and the Scope changes
Where this still diverges from the proposal (intentionally, as feedback)
This PR can now be considered a foundational step toward supporting the proposal. After it lands, work can start on the naga side.
…roposal

Frame this as a precursor to the proposal: document the `PipelineCompilationOptions::subgroup_size` field as intended for passthrough shaders (since the proposal uses a `@subgroup_size` WGSL attribute for non-passthrough shaders), and reject `Fixed(n)` when `workgroup_size.x` is not a multiple of `n`, per the proposal's Vulkan-derived rule.
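The two workgroup-size checks introduced by the follow-up commits (`Full` needs `workgroup_size.x >= subgroup_min_size`, and `Fixed(n)` needs `workgroup_size.x` to be a multiple of `n`) might look roughly like this. Only the name `WorkgroupSizeTooSmallForFullSubgroups` comes from the commit messages; the other variant, the field names, and `check_workgroup_x` are hypothetical.

```rust
/// Stand-in for the enum added by the PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SubgroupSize {
    Varying,
    Full,
    Fixed(u32),
}

#[derive(Debug, PartialEq, Eq)]
pub enum WorkgroupSizeError {
    /// Variant name taken from the commit message; fields are illustrative.
    WorkgroupSizeTooSmallForFullSubgroups { workgroup_x: u32, subgroup_min_size: u32 },
    /// Hypothetical name for the `Fixed(n)` divisibility failure.
    WorkgroupSizeNotMultipleOfFixed { workgroup_x: u32, fixed: u32 },
}

/// Checks `workgroup_size.x` against the requested subgroup size.
pub fn check_workgroup_x(
    size: SubgroupSize,
    workgroup_x: u32,
    subgroup_min_size: u32,
) -> Result<(), WorkgroupSizeError> {
    match size {
        // No full subgroup fits if the workgroup is narrower than the minimum size.
        SubgroupSize::Full if workgroup_x < subgroup_min_size => {
            Err(WorkgroupSizeError::WorkgroupSizeTooSmallForFullSubgroups {
                workgroup_x,
                subgroup_min_size,
            })
        }
        // `Fixed(n)` requires `workgroup_x` to be a multiple of `n`;
        // the `n == 0` guard avoids a division by zero in the remainder.
        SubgroupSize::Fixed(n) if n == 0 || workgroup_x % n != 0 => {
            Err(WorkgroupSizeError::WorkgroupSizeNotMultipleOfFixed { workgroup_x, fixed: n })
        }
        _ => Ok(()),
    }
}
```

For instance, `Fixed(32)` with `workgroup_size.x = 48` is rejected, while `Fixed(32)` with `workgroup_size.x = 64` passes.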
HLSL uses `[WaveSize(n)]` in the shader source, and Metal provides no API to control the SIMD-group width, so this field has no effect on HLSL or MetalLib/MSL passthrough shaders.
Connections
None
Description
Currently wgpu unconditionally sets `VK_PIPELINE_SHADER_STAGE_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT` whenever `Features::SUBGROUP` is enabled. The driver may pick any subgroup size in `[subgroup_min_size, subgroup_max_size]`, and on Vulkan even different subgroups in the same dispatch may use different sizes. There is no way to specify the requested subgroup size.

That's a real limitation for performance-sensitive compute. Two motivating cases:

1. Workgroup-memory sizing tied to subgroup count. A common reduction / scan / sort pattern is:

   `partial`'s size has to be known at compile time. Without `SubgroupSize::Fixed(32)`, you must size for the worst case (`WG_SIZE / subgroup_min_size`), wasting shared memory and hurting occupancy on devices where the actual size is larger. Cross-vendor this is painful: NVIDIA = 32, AMD pre-RDNA = 64, AMD RDNA = 32 or 64 (may vary across dispatches or within a dispatch), Intel = 8/16/32. `Fixed(n)` lets the kernel pin the assumption it was written against; `Varying` keeps the current "implementation chooses" behavior.

2. Avoiding partial trailing subgroups. When `WG_SIZE` isn't a multiple of the subgroup size, the last subgroup is partially populated. Subgroup ops on inactive lanes return implementation-defined values, so ballot / vote / shuffle patterns need explicit masking and tend to be subtly wrong. `SubgroupSize::Full` (Vulkan's `REQUIRE_FULL_SUBGROUPS_BIT`) guarantees every invocation in the workgroup belongs to a fully-populated subgroup. Vulkan allows this flag only on compute/task/mesh stages, hence the validation rejecting it on vertex/fragment.

This PR exposes both knobs as a single `SubgroupSize` enum on `PipelineCompilationOptions`:

- `Varying` (default): implementation chooses, within `[subgroup_min_size, subgroup_max_size]`. Matches today's behavior.
- `Full`: require full subgroups. Compute / task / mesh stages only.
- `Fixed(u32)`: must be a power of two within `[subgroup_min_size, subgroup_max_size]`.

Gated behind a new `Features::SUBGROUP_SIZE_CONTROL`, which is only advertised on Vulkan with `VK_EXT_subgroup_size_control` (promoted to 1.3). Other backends (Metal / D3D12 / GL / WebGPU) don't advertise the feature, so non-`Varying` is rejected at pipeline creation rather than silently no-oping.

Validation lives in `wgpu-core` and rejects: non-`Varying` without the feature, `Fixed(n)` that isn't a power of two or is outside the adapter's range, and `Full` on vertex/fragment stages. The Vulkan adapter only advertises `SUBGROUP_SIZE_CONTROL` when the device supports both `subgroupSizeControl` and `computeFullSubgroups`, so all variants are honorable once the feature is enabled (this combination is universal in practice, since Vulkan 1.3 mandates both).

On the Vulkan side, `Varying` sets `ALLOW_VARYING_SUBGROUP_SIZE_BIT`, `Full` sets `REQUIRE_FULL_SUBGROUPS_BIT`, and `Fixed(n)` chains a `VkPipelineShaderStageRequiredSubgroupSizeCreateInfo` via `p_next` (boxed and stored on `CompiledStage` so the address survives moves).

Testing
A new validation test, `subgroup_size.rs`, is added. I also tested a real compute pipeline using the new API on my graphics card.

Squash or Rebase?
Squash
Checklist
- `wgpu` may be affected behaviorally.
- `CHANGELOG.md` entries for the user-facing effects of this change are present.