Commit ddd1bfc

Add in Sascha's suggested changes.
1 parent 1c9410e commit ddd1bfc

5 files changed

Lines changed: 28 additions & 20 deletions

antora/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -129,6 +129,7 @@
129129
*** xref:Building_a_Simple_Engine/Mobile_Development/05_vulkan_extensions.adoc[Vulkan extensions]
130130
*** xref:Building_a_Simple_Engine/Mobile_Development/06_conclusion.adoc[Conclusion]
131131
** Extra Courses
132+
*** xref:Building_a_Simple_Engine/Courses/00_courses_overview.adoc[Building with Course Modules]
132133
*** Opacity Micromaps
133134
**** xref:Building_a_Simple_Engine/Courses/Opacity_Micromaps/00_introduction.adoc[Introduction]
134135
**** xref:Building_a_Simple_Engine/Courses/Opacity_Micromaps/01_the_shadow_problem.adoc[The shadow problem]

en/Building_a_Simple_Engine/Courses/Opacity_Micromaps/00_introduction.adoc

Lines changed: 1 addition & 3 deletions
@@ -17,9 +17,7 @@ This is not a hypothetical problem. Alpha-tested foliage is one of the most comm
1717

1818
The result is that the expensive any-hit shader fires dramatically less often — only for that thin ring of edge pixels where certainty is genuinely ambiguous. For the vast majority of a leaf's surface, the answer is known in advance. The hardware acts on that knowledge in a single cycle.
1919

20-
A brief note on lineage is worth making before we go further. `VK_KHR_opacity_micromap` is the Khronos-ratified evolution of the original `VK_EXT_opacity_micromap` extension. The key architectural change is that micromaps are no longer a separate `VkMicromapEXT` object — they fold directly into `VkAccelerationStructureKHR`, using a dedicated type at creation time. Host-build commands are removed in favour of a pure device-side API: micromap construction is driven exclusively through `vkCmdBuildAccelerationStructuresKHR` on a command buffer. Ray query shaders now require an explicit `OpacityMicromapKHR` execution mode declaration to benefit from the optimisation; without it the hardware ignores the micromap entirely. These changes unify the API and reflect what hardware actually supports. If you have read older documentation or sample code that refers to `vkBuildMicromapsEXT` or `VkMicromapEXT`, be aware that those concepts have been superseded; this course covers the KHR API exclusively.
21-
22-
This course will give you a complete conceptual understanding of why this problem exists, how micromaps solve it, and what the implementation looks like in the simple engine's source code. We will begin with the visual language of shadows themselves — why they look the way they do, and why foliage breaks the assumptions that fast shadow algorithms rely on. We will then descend into the GPU's ray traversal hardware to understand exactly where the performance cost originates. From there we will build up the micromap concept from scratch, walking through the subdivision model, the three-state classification, and the way micromap data attaches to acceleration structures. Finally, we will tour the engine's `OpacityMicromapBuilder` implementation and discuss when this optimization earns its keep — and when it doesn't.
20+
This course will give you a complete conceptual understanding of why this problem exists, how micromaps solve it, and what the implementation looks like in the simple engine's source code. We will begin with the visual language of shadows themselves — why they look the way they do, and why foliage breaks the assumptions that fast shadow algorithms rely on. We will then descend into the GPU's ray traversal hardware to understand exactly where the performance cost originates. From there we will build up the micromap concept from scratch, walking through the subdivision model, the 2-state and 4-state classification formats, and the way micromap data attaches to acceleration structures. Finally, we will tour the engine's `OpacityMicromapBuilder` implementation and discuss when this optimization earns its keep — and when it doesn't.
2321

2422
No prior knowledge of GPU hardware internals is required, but you should be comfortable with the basics of ray tracing: what a BVH is, how shadow rays work, and what an acceleration structure does. If those concepts feel solid, you are ready to begin.
2523

en/Building_a_Simple_Engine/Courses/Opacity_Micromaps/03_what_are_micromaps.adoc

Lines changed: 20 additions & 11 deletions
@@ -6,19 +6,22 @@ Everything we have discussed so far converges on a single insight: the GPU is do
66

77
This is the complete description of an **Opacity Micromap**: a data structure, attached to a triangle mesh in the GPU's acceleration structure, that subdivides each triangle into a regular grid of smaller triangles and assigns each one a pre-computed opacity state. The word "micro" is used precisely: these are not additional geometric triangles that the ray tracer has to traverse. They are sub-triangle classifications that the traversal hardware reads as metadata, using them to accelerate the decision of whether a given ray-triangle intersection represents an actual hit.
88

9-
== Three States, Not Two
9+
== Two Formats: 2-State and 4-State
1010

11-
You might expect the micromap to store a simple binary value — opaque or transparent. But Opacity Micromaps use a three-state system, and the third state is the key to making the scheme both correct and practical.
11+
`VK_KHR_opacity_micromap` defines two opacity micromap formats. Which format you choose depends on how much precision your alpha content requires at boundaries.
1212

13-
The first state is **Opaque**. A micro-triangle marked Opaque always blocks rays. When traversal finds that a shadow ray intersects a triangle in a region marked Opaque, it commits the hit immediately, without running any shader. This is identical behavior to fully opaque geometry — fixed-function, zero shader cost.
13+
The **2-state format** (`VK_OPACITY_MICROMAP_FORMAT_2_STATE_EXT`) stores one bit per micro-triangle. Each micro-triangle is either Transparent (0) or Opaque (1). This binary classification is the simplest possible model: no shader is ever invoked — hardware immediately commits or discards every hit based on the stored bit. The 2-state format is ideal when you have full control over alpha content and can guarantee that no micro-triangle genuinely straddles an alpha boundary, or when you are willing to accept the small visual error that comes from forced binary classification.
1414

15-
The second state is **Transparent**. A micro-triangle marked Transparent never blocks rays. When traversal finds that the intersection falls in a Transparent region, it ignores the hit and continues traversal without running any shader. This too is zero shader cost.
15+
The **4-state format** (`VK_OPACITY_MICROMAP_FORMAT_4_STATE_EXT`) stores two bits per micro-triangle and supports four distinct states:
1616

17-
The third state is **Unknown**. A micro-triangle marked Unknown triggers normal any-hit shader behavior — the traversal falls back to the same shader-driven alpha test that the non-micromap path uses. The any-hit shader fires, samples the texture, makes the decision.
17+
* **Transparent (0)**: the hit is discarded immediately without running any shader. Zero cost.
18+
* **Unknown-Transparent (1)**: the any-hit shader is invoked; if the shader does not explicitly make a decision, traversal defaults to treating the hit as transparent.
19+
* **Unknown-Opaque (2)**: the any-hit shader is invoked; if the shader does not explicitly make a decision, traversal defaults to treating the hit as opaque.
20+
* **Opaque (3)**: the hit is committed immediately without running any shader. Zero cost.
1821

19-
The power of this system lies in what the Unknown state makes possible. Instead of trying to perfectly classify every pixel of every triangle in advance — which would require infinite subdivision — we can be conservative. Where the alpha texture is clearly solid across the entire micro-triangle, we say Opaque. Where it is clearly empty, we say Transparent. Only in the boundary regions, where a single micro-triangle straddles the alpha cutoff edge, do we admit uncertainty and fall back to the shader.
22+
The two Unknown states are the key to making the 4-state scheme both correct and practical. Instead of trying to perfectly classify every pixel of every triangle in advance — which would require infinite subdivision — we can be conservative. Where the alpha texture is clearly solid across the entire micro-triangle, we encode Opaque (3). Where it is clearly empty, we encode Transparent (0). Only in the boundary regions, where a single micro-triangle straddles the alpha cutoff edge, do we admit uncertainty and fall back to the any-hit shader using one of the two Unknown states. Unknown-Transparent is the typical choice for foliage: the hint biases the driver toward transparency, which matches the statistical expectation at an alpha boundary.
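The four states listed above can be sketched as a small C++ enumeration. This is only an illustration: the names `OpacityState`, `invokesAnyHit`, and `defaultsToOpaque` are hypothetical, and the numeric values simply mirror the numbering given in this section rather than constants taken from the Vulkan headers.

```cpp
#include <cstdint>

// Numeric values mirror the numbering stated in this section (an assumption,
// not verified against the extension headers).
enum class OpacityState : std::uint8_t {
    Transparent        = 0,  // hit discarded, no shader
    UnknownTransparent = 1,  // any-hit shader runs, default outcome transparent
    UnknownOpaque      = 2,  // any-hit shader runs, default outcome opaque
    Opaque             = 3,  // hit committed, no shader
};

// True for the two states that fall back to the any-hit shader.
constexpr bool invokesAnyHit(OpacityState s) {
    return s == OpacityState::UnknownTransparent ||
           s == OpacityState::UnknownOpaque;
}

// Outcome assumed when no explicit shader decision is made, as described above.
constexpr bool defaultsToOpaque(OpacityState s) {
    return s == OpacityState::UnknownOpaque || s == OpacityState::Opaque;
}
```

Only the two Unknown states reach programmable execution in this model; the other two resolve without a shader.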
2023

21-
For a well-designed leaf texture, the opaque and transparent regions dominate. The boundary between them is a thin perimeter. If the micro-triangle grid is fine enough, that perimeter region corresponds to perhaps five or ten percent of all micro-triangles. The other ninety to ninety-five percent are classified definitively, and the any-hit shader never fires for them. The cost reduction is proportional.
24+
For a well-designed leaf texture, the Opaque and Transparent regions dominate. The boundary between them is a thin perimeter. If the micro-triangle grid is fine enough, that perimeter region corresponds to perhaps five or ten percent of all micro-triangles. The other ninety to ninety-five percent are classified definitively, and the any-hit shader never fires for them. The cost reduction is proportional.
2225

2326
== Visualizing the Subdivision
2427

@@ -42,15 +45,21 @@ The right choice depends on the texture content and the performance targets of t
4245

4346
== Building the Micromap: A One-Time Investment
4447

45-
The micromap is built during scene loading, or as an offline pre-process before the application ships. The procedure is conceptually simple. For each triangle in an alpha-tested mesh, and for each micro-triangle at the chosen subdivision level, the builder samples the alpha texture at several points within the micro-triangle — typically at the centroid or at multiple jittered positions. It averages those samples and compares against a threshold.
48+
The micromap build process has two conceptually distinct phases that can be separated in time:
4649

47-
If the average alpha is above the upper threshold (close to 1.0), the micro-triangle is classified Opaque. If it is below the lower threshold (close to 0.0), it is classified Transparent. If it falls between the thresholds, it is classified Unknown. This is a one-time operation per mesh, per texture, per subdivision level. It runs in a GPU compute shader and its cost is paid at load time, not at runtime.
50+
**Phase 1 — Classification (CPU, one-time per asset):** For each triangle in an alpha-tested mesh, and for each micro-triangle at the chosen subdivision level, the builder samples the alpha texture at one or more points within the micro-triangle, typically the centroid or several jittered positions. It averages those samples and compares the average against a lower and an upper threshold. If the average alpha is above the upper threshold (close to 1.0), the micro-triangle is classified Opaque. If it is below the lower threshold (close to 0.0), it is classified Transparent. If it falls between the two thresholds, it is classified Unknown (specifically Unknown-Transparent or Unknown-Opaque, depending on application policy). The output is a compact array of 2 bits per micro-triangle, typically a few hundred kilobytes for a mesh.
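The thresholding step in Phase 1 can be sketched as follows. This is a minimal illustration, not the engine's `OpacityMicromapBuilder` code: the function name, the threshold parameters, and the choice of Unknown-Transparent for boundary cases are all assumptions, and the state values follow the numbering used in this document.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// State values follow the numbering used in the text (assumed, not taken
// from the Vulkan headers).
constexpr std::uint8_t kTransparent        = 0;
constexpr std::uint8_t kUnknownTransparent = 1;
constexpr std::uint8_t kOpaque             = 3;

// Classify one micro-triangle from alpha samples taken inside it.
// lowerThreshold and upperThreshold are hypothetical tuning parameters.
std::uint8_t classifyMicroTriangle(const std::vector<float>& alphaSamples,
                                   float lowerThreshold,
                                   float upperThreshold) {
    assert(!alphaSamples.empty());
    assert(lowerThreshold <= upperThreshold);

    const float sum =
        std::accumulate(alphaSamples.begin(), alphaSamples.end(), 0.0f);
    const float average = sum / static_cast<float>(alphaSamples.size());

    if (average > upperThreshold) return kOpaque;       // clearly solid
    if (average < lowerThreshold) return kTransparent;  // clearly empty
    return kUnknownTransparent;  // boundary: defer to the any-hit shader
}
```

For example, `classifyMicroTriangle({0.0f, 1.0f}, 0.05f, 0.95f)` averages to 0.5, which lies between the thresholds, so the micro-triangle is left for the any-hit shader.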
4851

49-
The output of this process is a compact array of 2-bit values: one for each micro-triangle. Opaque is encoded as 3, Transparent as 0, Unknown as 1 (and there is also an Unknown-Opaque variant that interacts with certain pipeline flags — but for our purposes, three conceptual states are all we need). For a mesh with 10,000 leaf triangles at subdivision level 3, the micromap data is 10,000 times 64 micro-triangles times 2 bits, which is about 160 kilobytes. This is negligible compared to the mesh data itself.
52+
**Phase 2 — GPU Build (device-only, runs at load time):** The classified 2-bit data is uploaded to GPU memory. A `vkCmdBuildAccelerationStructuresKHR` command is recorded and submitted to the device queue. The driver performs internal layout transformation and compaction, producing the micromap `VkAccelerationStructureKHR` in the hardware's native traversal format. `VK_KHR_opacity_micromap` provides no host-side build path — this GPU submission step is always required.
53+
54+
The two phases can be combined at load time (**online mode**) or separated (**offline mode**). In online mode, both classification and GPU build run when the scene is first loaded. This is useful during development but adds startup cost proportional to mesh and texture complexity. In offline mode, Phase 1 runs as a pre-process during content authoring; the resulting 2-bit arrays are saved to disk and shipped with the application. At runtime only Phase 2 executes, keeping startup cost minimal. The GPU build cannot itself be pre-baked: the driver needs to run it on the target hardware to produce its native internal representation.
55+
56+
Classification is a one-time operation per mesh, per texture, per subdivision level. For a shipping game or visualization application the payoff is large: the classification cost is paid once during authoring, and the benefit accumulates over every frame the application renders.
57+
58+
The output of classification is a compact array of 2-bit values (for the 4-state format): one per micro-triangle. Opaque is encoded as 3, Transparent as 0, Unknown-Transparent as 1, and Unknown-Opaque as 2. For most foliage, Unknown-Transparent (1) is used for boundary micro-triangles: it lets the any-hit shader make the final call while hinting that the outcome is more likely transparent. For a mesh with 10,000 leaf triangles at subdivision level 3, the micromap data is 10,000 times 64 micro-triangles times 2 bits, which is 1,280,000 bits, or about 160 kilobytes. Using the 2-state format halves this to approximately 80 kilobytes, at the cost of eliminating the any-hit fallback entirely. This memory footprint is negligible compared to the mesh data itself.
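The footprint arithmetic above, together with one possible way of packing the 2-bit values, can be sketched like this. The helper names are hypothetical, and the bit order inside each byte (first state in the lowest two bits) is an assumption made for illustration; consult the extension specification for the layout the driver actually expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Micro-triangles per triangle at a given subdivision level: 4^level.
constexpr std::uint64_t microTrianglesPerTriangle(unsigned level) {
    return std::uint64_t{1} << (2u * level);
}

// Packed size in bytes, rounded up to a whole byte.
constexpr std::uint64_t micromapBytes(std::uint64_t triangles,
                                      unsigned level,
                                      unsigned bitsPerState) {
    return (triangles * microTrianglesPerTriangle(level) * bitsPerState + 7) / 8;
}

// Pack 2-bit states four to a byte. The first state of each group lands in
// the lowest two bits (assumed bit order, for illustration only).
std::vector<std::uint8_t> packStates2Bit(const std::vector<std::uint8_t>& states) {
    std::vector<std::uint8_t> packed((states.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < states.size(); ++i) {
        assert(states[i] <= 3);  // each state must fit in two bits
        packed[i / 4] |= static_cast<std::uint8_t>(states[i] << (2 * (i % 4)));
    }
    return packed;
}
```

With these helpers, `micromapBytes(10000, 3, 2)` evaluates to 160,000 bytes and `micromapBytes(10000, 3, 1)` to 80,000 bytes, matching the figures quoted in the text.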
5059

5160
== Where the Data Lives
5261

53-
The classified micro-triangle data is uploaded to the GPU and used to construct a **`VkAccelerationStructureKHR`** whose type is set to `VK_ACCELERATION_STRUCTURE_TYPE_OPACITY_MICROMAP_KHR`. This is a deliberate unification in the KHR API: rather than introducing a separate object type for micromap data, `VK_KHR_opacity_micromap` folds the micromap directly into the existing acceleration structure abstraction. The same `VkAccelerationStructureKHR` handle you use for BLASes and TLASes is also the handle for a micromap — the `type` field at creation time is what distinguishes them. The micromap acceleration structure is allocated via `vkCreateAccelerationStructure2KHR` (from `VK_KHR_device_address_commands`), passing `VK_ACCELERATION_STRUCTURE_TYPE_OPACITY_MICROMAP_KHR` as the type. Building the micromap uses `vkCmdBuildAccelerationStructuresKHR` with `geometryType` set to `VK_GEOMETRY_TYPE_MICROMAP_KHR` — the same device-side command used for all acceleration structure builds. There is no separate host-side build path for micromaps in the KHR API; all micromap construction is GPU-driven.
62+
The classified micro-triangle data is uploaded to the GPU and used to construct a **`VkAccelerationStructureKHR`** whose type is set to `VK_ACCELERATION_STRUCTURE_TYPE_OPACITY_MICROMAP_KHR`. This is a deliberate unification in the KHR API: rather than introducing a separate object type for micromap data, `VK_KHR_opacity_micromap` folds the micromap directly into the existing acceleration structure abstraction. The same `VkAccelerationStructureKHR` handle you use for BLASes and TLASes is also the handle for a micromap — the `type` field at creation time is what distinguishes them. The micromap acceleration structure is allocated via `vkCreateAccelerationStructureKHR` (from `VK_KHR_acceleration_structure`), passing `VK_ACCELERATION_STRUCTURE_TYPE_OPACITY_MICROMAP_KHR` as the type. Building the micromap uses `vkCmdBuildAccelerationStructuresKHR` with `geometryType` set to `VK_GEOMETRY_TYPE_MICROMAP_KHR` — the same device-side command used for all acceleration structure builds. There is no separate host-side build path for micromaps in the KHR API; all micromap construction is GPU-driven.
5463

5564
Once the micromap acceleration structure is built, it is attached to the corresponding BLAS during the BLAS build or rebuild. The attachment is specified through the `VkAccelerationStructureTrianglesOpacityMicromapKHR` structure, which is chained into the geometry description passed to the acceleration structure build. The `micromap` field in this structure is the `VkAccelerationStructureKHR` handle of the micromap you just built. The `indexBuffer` field, which maps original triangles to their micromap entries, is a plain `VkDeviceAddress` — there is no host address variant in the KHR API, reflecting the device-only design philosophy. At that point, the micromap data becomes part of the acceleration structure itself — stored in GPU memory in the hardware's native format for traversal.
5665

en/Building_a_Simple_Engine/Courses/Opacity_Micromaps/04_hardware_traversal_with_omm.adoc

Lines changed: 2 additions & 2 deletions
@@ -53,7 +53,7 @@ In `VK_KHR_opacity_micromap`, the hardware micromap fast-path for ray queries is
5353
layout(constant_id = N) gl_EnableOpacityMicromapExt;
5454
----
5555

56-
where `N` is a specialisation constant ID. This is a one-line addition to the shader — not a logical change — but without it, the hardware silently ignores all micromap data during ray query traversal and falls back to the full any-hit shader path on every intersection, as if no micromaps were attached at all. This was a deliberate design change from `VK_EXT_opacity_micromap`, where the optimisation was implicit for ray queries. The explicit declaration makes the hardware intent unambiguous and allows compilers to reason about it correctly.
56+
where `N` is a specialisation constant ID. This declaration is required by the `SPV_KHR_opacity_micromap` SPIR-V extension: if a ray query shader traverses acceleration structures that contain opacity micromaps but does not declare `OpacityMicromapKHR` with a value of `true`, the result is **undefined behavior** under the SPIR-V specification, not a graceful fallback to the any-hit path. Do not rely on any particular outcome from omitting the declaration. The declaration makes the hardware intent unambiguous and allows drivers and compilers to reason about traversal behavior correctly.
5757

5858
== The Cascade Effect on Warp Efficiency
5959

@@ -77,7 +77,7 @@ image::images/omm_shadow_ray_lifecycle.svg[Flowchart showing the lifecycle of a
7777

7878
The complete lifecycle of a shadow ray in an OMM-enabled scene can be summarized as a decision tree. The ray enters BVH traversal. For each candidate triangle intersection, the hardware checks for an attached micromap. If no micromap is present (non-alpha-tested geometry), it uses the standard opaque-geometry path: commit the hit immediately. If a micromap is present, it looks up the micro-triangle state. Opaque: commit, no shader. Transparent: discard, no shader. Unknown: invoke the any-hit shader and let the shader make the final call.
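The decision tree above can be mirrored in a small software model. The real logic runs in fixed-function hardware, so this is only a sketch of the branching: `resolveCandidate`, `HitDecision`, and the `anyHitAccepts` callback are hypothetical names, and the state numbering (0 to 3) follows this document rather than the Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

enum class HitDecision { Commit, Discard };

// state: std::nullopt when the triangle has no micromap attached; otherwise
//        0 = Transparent, 1 or 2 = Unknown, 3 = Opaque (numbering assumed
//        from the text).
// anyHitAccepts: stands in for the any-hit shader's alpha test and returns
//        true when the shader accepts the hit.
HitDecision resolveCandidate(std::optional<std::uint8_t> state,
                             const std::function<bool()>& anyHitAccepts) {
    if (!state) return HitDecision::Commit;        // no micromap: opaque path
    assert(*state <= 3);
    if (*state == 3) return HitDecision::Commit;   // Opaque: no shader
    if (*state == 0) return HitDecision::Discard;  // Transparent: no shader
    // Unknown: the shader makes the final call.
    return anyHitAccepts() ? HitDecision::Commit : HitDecision::Discard;
}
```

In this model the callback is invoked only on the Unknown branch, which is the property that keeps shader work confined to the boundary micro-triangles.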
7979

80-
This decision tree lives entirely in fixed-function hardware for the Opaque and Transparent branches. Only the Unknown branch enters programmable execution. The result is that the GPU's shader execution units are freed from the overwhelming majority of alpha-testing work and can focus on the cases that genuinely require a shader. In `VK_KHR_opacity_micromap`, reaching this optimized path also depends on the shader having declared the `OpacityMicromapKHR` execution mode — the hardware and the shader must both be in agreement for the optimization to take effect.
80+
This decision tree lives entirely in fixed-function hardware for the Opaque and Transparent branches. Only the Unknown branch enters programmable execution. The result is that the GPU's shader execution units are freed from the overwhelming majority of alpha-testing work and can focus on the cases that genuinely require a shader. In `VK_KHR_opacity_micromap`, reaching this optimized path also requires that the shader has declared the `OpacityMicromapKHR` execution mode. Omitting that declaration when the acceleration structures contain opacity micromaps results in undefined behavior per the SPIR-V specification — the hardware and the shader must both be in agreement for correct results.
8181

8282
'''
8383
