[VectorDistribute] Rework LDS operand promotion by sommerlukas · Pull Request #24408 · iree-org/iree

sommerlukas · 2026-05-08T16:08:12Z

Rework how promotion of operands to LDS works in the VectorDistribute pipeline.

Previously, linalg.copy operations were inserted early in the pipeline. Now we skip inserting linalg.copy for operands (the behavior for results is unchanged). Instead, when we reach GPUVectorAlloc, the analysis from #24227 propagates the promotion types upwards, from the compute operations that configure operand promotion to the operations that access the data in memory (transfer_read, gather). Based on the propagated information, the necessary promotion to LDS is then inserted.
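To picture the upward propagation, here is a minimal standalone sketch. It is a toy model only, not the analysis from #24227: `ToyOp`, `PromotionRequest`, `propagateToReads` and the `"lds"` label are hypothetical names made up for illustration, and the real pass works on MLIR values rather than integer indices.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an operation; this is not an MLIR or IREE type.
struct ToyOp {
  std::string kind;          // e.g. "transfer_read", "gather", "elementwise"
  std::vector<int> operands; // indices of the ops producing each operand
};

// A compute op asking for one of its operands to be promoted (hypothetical).
struct PromotionRequest {
  int computeOp;
  int operandIndex;
  std::string promotionType;
};

bool isMemoryRead(const ToyOp &op) {
  return op.kind == "transfer_read" || op.kind == "gather";
}

// Walk producers upwards from each requested operand and record the promotion
// type on every memory read that is reached.
std::map<int, std::string>
propagateToReads(const std::vector<ToyOp> &ops,
                 const std::vector<PromotionRequest> &requests) {
  std::map<int, std::string> promotedReads;
  for (const PromotionRequest &request : requests) {
    const ToyOp &compute = ops[request.computeOp];
    assert(request.operandIndex < static_cast<int>(compute.operands.size()));
    std::vector<int> worklist = {compute.operands[request.operandIndex]};
    std::set<int> visited;
    while (!worklist.empty()) {
      int current = worklist.back();
      worklist.pop_back();
      if (!visited.insert(current).second)
        continue; // already handled for this request
      if (isMemoryRead(ops[current])) {
        // Simplification of this sketch: the first request to reach a read wins.
        promotedReads.emplace(current, request.promotionType);
        continue;
      }
      for (int producer : ops[current].operands)
        worklist.push_back(producer);
    }
  }
  return promotedReads;
}
```

For a chain such as `transfer_read -> elementwise -> contract`, a request on the contract's first operand ends up recorded on the `transfer_read`, while a read feeding an operand without a request stays unmarked.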

Layout conflicts are still resolved through an LDS roundtrip at the conflict point.

Assisted-by: Codex

sommerlukas · 2026-05-08T16:09:39Z

The changes in this file are mostly a code move to GPUNestedLayoutUtils to make computation of a derived_thread_config layout available as a shared helper.

sommerlukas · 2026-05-08T16:10:23Z

 }

+FailureOr<NestedLayoutAttr>
+getDerivedThreadLayout(MLIRContext *context, ArrayRef<int64_t> workgroupSize,


This code was moved here from LLVMGPUConfigureTensorLayouts to make computation of a derived_thread_config layout available as a shared helper.

sommerlukas · 2026-05-08T16:14:54Z

The failure of the gfx1100 pipeline test is expected; this PR needs to be rebased on top of #24402 once that lands.

kuhar

Just a drive-by nit; I haven't had time to review the logic.

keshavvinayak01

Needs some changes.

keshavvinayak01 · 2026-05-18T11:16:23Z

+    auto toLayout = IREE::VectorExt::ToLayoutOp::create(builder, op->getLoc(),
+                                                        vector, *readLayout);
+
+    FailureOr<Value> copied = allocateTensorAndWriteVector(
+        builder, op->getLoc(), toLayout.getResult(),
+        allocationLayout.getUndistributedShape());
+    if (failed(copied)) {
+      return failure();
+    }
+
+    auto synced =
+        IREE::GPU::ValueBarrierOp::create(builder, op->getLoc(), *copied);
+    Value newRead =
+        readVectorFromTensor(builder, readType, synced.getResult(0));


The expected IR is correct for the write-then-read roundtrip itself, but the new direct operand-promotion path does not insert the pre-write loop-iteration barrier that the existing shared_memory_conversion materialization path deliberately inserts. If promoted reads can appear in loop bodies and reuse the same workgroup allocation, this can race with reads from the previous iteration.

If we can prove these promoted read roundtrips never land in a loop, then I guess this is fine.
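The concern can be reproduced in a small deterministic model. This is only a sketch under assumptions: two "lanes" share one buffer across loop iterations, lane 0 is always scheduled as far ahead as the barriers allow, and `countStaleReads`, `Lane` and `Step` are hypothetical names, not IREE code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

enum class Step { Barrier, Write, Read };

struct Lane {
  int iteration = 0;      // current loop iteration
  std::size_t pc = 0;     // index into the loop body
  bool atBarrier = false; // blocked until the other lane arrives
};

// Runs two lanes over a shared buffer and returns how many reads observed a
// value that belongs to a different loop iteration.
int countStaleReads(bool preWriteBarrier, int iterations) {
  assert(iterations >= 0);
  std::vector<Step> body;
  if (preWriteBarrier)
    body.push_back(Step::Barrier); // barrier before reusing the allocation
  body.insert(body.end(), {Step::Write, Step::Barrier, Step::Read});

  std::array<int, 2> shared = {-1, -1};
  std::array<Lane, 2> lanes{};
  int staleReads = 0;

  auto finished = [&](const Lane &lane) {
    return lane.iteration == iterations;
  };
  auto advance = [&](Lane &lane) {
    if (++lane.pc == body.size()) {
      lane.pc = 0;
      ++lane.iteration;
    }
  };

  while (!finished(lanes[0]) || !finished(lanes[1])) {
    // Greedy schedule: each lane runs until it finishes or hits a barrier.
    for (int id = 0; id < 2; ++id) {
      Lane &lane = lanes[id];
      while (!finished(lane) && !lane.atBarrier) {
        switch (body[lane.pc]) {
        case Step::Barrier:
          lane.atBarrier = true;
          continue; // re-check the loop condition, which now stops the lane
        case Step::Write:
          shared[id] = lane.iteration * 10 + id;
          break;
        case Step::Read:
          // Each lane reads the slot written by the other lane.
          if (shared[1 - id] != lane.iteration * 10 + (1 - id))
            ++staleReads;
          break;
        }
        advance(lane);
      }
    }
    // Release the barrier once both lanes have arrived.
    if (lanes[0].atBarrier && lanes[1].atBarrier) {
      for (Lane &lane : lanes) {
        lane.atBarrier = false;
        advance(lane);
      }
    }
  }
  return staleReads;
}
```

In this model, `countStaleReads(false, 4)` reports mismatches because lane 0 overwrites its slot for the next iteration before lane 1 has read the previous value, while `countStaleReads(true, 4)` reports none.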

sommerlukas · 2026-05-18T13:06:16Z

I've added the barrier back (thanks for catching that) and updated the tests.

keshavvinayak01 · 2026-05-18T11:18:44Z

+    funcOp.walk([](IREE::VectorExt::ToLayoutOp op) {
+      op.removeSharedMemoryConversionAttr();
+    });


A shared_memory_conversion = #iree_gpu.use_global_load_dma marker is propagated by the analysis, then erased, then ignored by the late materializer. The resulting IR may be valid, but it no longer performs the requested operand promotion.

Please add a lit test to confirm this behaviour.

sommerlukas · 2026-05-18T13:05:36Z

Added a test. I also have the follow-up adding support for use_global_load_dma lined up locally; if you prefer, I can make it part of this PR.

keshavvinayak01

I think we should merge it here.

sommerlukas

Thanks for the feedback!

keshavvinayak01

Looks fine, please add the use_global_load_dma changes here.

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

sommerlukas · 2026-05-19T13:36:44Z

@keshavvinayak01 I've added the async DMA handling to the PR.

keshavvinayak01

Let's split the async_dma extension work into a follow up PR.

sommerlukas requested review from Groverkss, Max191, krzysz00, kuhar, nirvedhmeshram and qedawkins as code owners May 8, 2026 16:08

sommerlukas requested a review from keshavvinayak01 May 8, 2026 16:08

sommerlukas commented May 8, 2026

sommerlukas force-pushed the new-lds-promotion branch from 53bfe40 to 3753712 Compare May 11, 2026 08:23

kuhar reviewed May 11, 2026

Comment thread compiler/src/iree/compiler/Codegen/Common/GPU/GPUVectorAlloc.cpp Outdated

sommerlukas requested a review from kuhar May 12, 2026 09:45

kuhar reviewed May 12, 2026

Comment thread compiler/src/iree/compiler/Codegen/Common/GPU/GPUVectorAlloc.cpp Outdated

sommerlukas force-pushed the new-lds-promotion branch from 3b0974d to 7d0aeb7 Compare May 13, 2026 07:53

sommerlukas requested a review from kuhar May 13, 2026 16:09

keshavvinayak01 requested changes May 18, 2026

sommerlukas mentioned this pull request May 18, 2026

[Codegen][GPU] Add XOR swizzle hints to GPUVectorAlloc for bank conflict elimination #23778

Draft

sommerlukas commented May 18, 2026

sommerlukas requested a review from keshavvinayak01 May 18, 2026 13:11

keshavvinayak01 approved these changes May 18, 2026

sommerlukas requested a review from MaheshRavishankar as a code owner May 19, 2026 11:38

sommerlukas added 8 commits May 19, 2026 11:56

Skip operand promotion in VectorDistribute

daf1868

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Materialize promotion based on analysis in GPUVectorAlloc

7495143

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Fix gpu_vector_alloc test

ed66a55

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Documentation and test improvements

01d1faa

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Address PR feedback

a3af5d9

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Address nit

9a0c200

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

Address PR feedback

695beba

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

DMA promotion in GPUVectorAlloc

06c7ae7

Signed-off-by: Lukas Sommer <lukas.sommer@amd.com>

sommerlukas force-pushed the new-lds-promotion branch from 1c43e95 to 06c7ae7 Compare May 19, 2026 12:57

sommerlukas requested a review from keshavvinayak01 May 19, 2026 13:36

keshavvinayak01 reviewed May 19, 2026

keshavvinayak01 approved these changes May 19, 2026
