
[VPlan] Insert VPBlendRecipes in post order. NFC#201782

Merged
lukel97 merged 6 commits into llvm:main from lukel97:loop-vectorize/insert-blends-post-order on Jun 10, 2026

lukel97 commented Jun 5, 2026

Contributor

#201783 wants to optimize blend masks by peeking through the contents of other phi nodes. Currently we eagerly convert phis to blends in reverse post order, so switch it to post order so that phis at the bottom can see the phis in their uses.
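To make the ordering concrete, here is a minimal sketch on a toy, string-keyed CFG (the `CFG` alias and the `postOrder`/`reversePostOrder` helpers are illustrative, not VPlan classes, and the loop's back edge is left out). A depth-first post order emits a block only after its unvisited successors, so in a nested diamond such as header -> if.a -> if.a.inner -> merge.a.inner -> merge.a, the join block merge.a is visited before merge.a.inner. Converting phis in that order means the inner phi is still a plain phi while the outer phi that uses it is being converted.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy CFG: block name -> successor names (not VPlan's data structures).
using CFG = std::map<std::string, std::vector<std::string>>;

// Depth-first walk that appends a block only after all of its not-yet-visited
// successors have been appended, which is the definition of post order.
static void dfs(const CFG &G, const std::string &BB,
                std::map<std::string, bool> &Visited,
                std::vector<std::string> &Out) {
  Visited[BB] = true;
  auto It = G.find(BB);
  if (It != G.end())
    for (const std::string &Succ : It->second)
      if (!Visited[Succ])
        dfs(G, Succ, Visited, Out);
  Out.push_back(BB);
}

std::vector<std::string> postOrder(const CFG &G, const std::string &Entry) {
  std::map<std::string, bool> Visited;
  std::vector<std::string> Out;
  dfs(G, Entry, Visited, Out);
  return Out;
}

// Reverse post order is the post order reversed; conversely, the patch gets
// post order by walking reverse(RPOT).
std::vector<std::string> reversePostOrder(const CFG &G,
                                          const std::string &Entry) {
  std::vector<std::string> RPO = postOrder(G, Entry);
  std::reverse(RPO.begin(), RPO.end());
  return RPO;
}
```

For that nested diamond, `postOrder` yields merge.a, merge.a.inner, if.a.inner, if.a, header, while `reversePostOrder` starts at header and ends at merge.a.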

@llvmorg-github-actions


@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

An upcoming PR wants to optimize blend masks by peeking through the contents of other phi nodes. Currently we eagerly convert phis to blends in reverse post order, so switch it to post order so that phis at the bottom can see the phis in their uses.


Full diff: https://github.com/llvm/llvm-project/pull/201782.diff

1 File Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp (+12-3)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
index 2717b80e2eeaa..b01fef556f8ed 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
@@ -69,6 +69,8 @@ class VPPredicator {
     return EdgeMaskCache[{Src, Dst}] = Mask;
   }
 
+  DenseMap<const VPBasicBlock *, VPBasicBlock::iterator> InsertPoints;
+
 public:
   VPPredicator(VPlan &Plan) : VPDT(Plan), VPPDT(Plan) {}
 
@@ -136,6 +138,10 @@ void VPPredicator::createBlockInMask(VPBasicBlock *VPBB) {
   // Start inserting after the block's phis, which be replaced by blends later.
   Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
 
+  // Keep track of where in VPBB we are inserting the masks into.
+  scope_exit UpdateInsertPoint(
+      [this, &VPBB]() { InsertPoints[VPBB] = Builder.getInsertPoint(); });
+
   // Reuse the mask of the immediate dominator if the VPBB post-dominates the
   // immediate dominator.
   auto *IDom = VPDT.getNode(VPBB)->getIDom();
@@ -225,6 +231,7 @@ void VPPredicator::createSwitchEdgeMasks(const VPInstruction *SI) {
 }
 
 void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
+  Builder.setInsertPoint(VPBB, InsertPoints[VPBB]);
   SmallVector<VPPhi *> Phis;
   for (VPRecipeBase &R : VPBB->phis())
     Phis.push_back(cast<VPPhi>(&R));
@@ -276,10 +283,8 @@ void VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
     // Introduce the mask for VPBB, which may introduce needed edge masks, and
     // convert all phi recipes of VPBB to blend recipes unless VPBB is the
     // header.
-    if (VPBB != Header) {
+    if (VPBB != Header)
       Predicator.createBlockInMask(VPBB);
-      Predicator.convertPhisToBlends(VPBB);
-    }
 
     VPValue *BlockMask = Predicator.getBlockInMask(VPBB);
     if (!BlockMask)
@@ -292,6 +297,10 @@ void VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
     }
   }
 
+  for (VPBlockBase *VPB : reverse(RPOT))
+    if (VPB != Header)
+      Predicator.convertPhisToBlends(cast<VPBasicBlock>(VPB));
+
   // Linearize the blocks of the loop into one serial chain.
   VPBlockBase *PrevVPBB = nullptr;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
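A minimal sketch of the insert-point bookkeeping in the diff, assuming a `std::list` of recipe names in place of a `VPBasicBlock` and a hand-rolled `ScopeExit` guard standing in for the patch's `scope_exit` (`Block`, `ToyPredicator` and `ScopeExit` are illustrative names). The guard records the position on every return path of the mask-creation phase, including early returns, together with the clobber assert suggested in review; the later blend phase then inserts at the saved iterator, after the masks. Like LLVM's intrusive lists, `std::list` keeps iterators valid when other elements are inserted, which is what makes storing them across the two phases safe.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Minimal RAII guard in the spirit of scope_exit: runs the callback when the
// guard is destroyed, i.e. on every return path of the enclosing function.
template <typename Fn> class ScopeExit {
  Fn Callback;

public:
  explicit ScopeExit(Fn F) : Callback(std::move(F)) {}
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
  ~ScopeExit() { Callback(); }
};

// Toy stand-in for a VPBasicBlock: an ordered list of recipe names.
using Block = std::list<std::string>;

static bool isPhi(const std::string &R) { return R.rfind("phi", 0) == 0; }

struct ToyPredicator {
  std::map<const Block *, Block::iterator> InsertPoints;

  // Phase 1 (RPO in the patch): emit masks after the phis and remember where
  // insertion ended up.
  void createBlockInMask(Block &BB, int NumEdgeMasks) {
    Block::iterator InsertPt = BB.begin();
    while (InsertPt != BB.end() && isPhi(*InsertPt))
      ++InsertPt;
    ScopeExit Record([&] {
      assert(!InsertPoints.count(&BB) && "InsertPoint clobbered?");
      InsertPoints[&BB] = InsertPt;
    });
    if (NumEdgeMasks == 0)
      return; // The guard still records the insert point on this path.
    for (int I = 0; I < NumEdgeMasks; ++I)
      BB.insert(InsertPt, "edge.mask" + std::to_string(I));
    BB.insert(InsertPt, "block.in.mask");
  }

  // Phase 2 (post order in the patch): replace each phi with a blend placed
  // at the saved position, i.e. after the masks.
  void convertPhisToBlends(Block &BB) {
    Block::iterator InsertPt = InsertPoints.at(&BB);
    // Collect the phis first so the list is not mutated while scanning it.
    std::vector<Block::iterator> Phis;
    for (auto It = BB.begin(); It != BB.end() && isPhi(*It); ++It)
      Phis.push_back(It);
    for (Block::iterator Phi : Phis) {
      BB.insert(InsertPt, "blend(" + *Phi + ")");
      BB.erase(Phi);
    }
  }
};
```

In this toy every insertion lands before the saved iterator, so it never moves; whether the builder's position can move in the real predicator is what the review thread discusses.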


github-actions Bot commented Jun 5, 2026

🐧 Linux x64 Test Results

  • 175333 tests passed
  • 3453 tests skipped
  • 1 test failed

Failed Tests


Flang

Flang.Driver/omp-driver-offload.f90
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 10
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -### -fopenmp /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 2>&1 | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck --check-prefixes=CHECK-OPENMP /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang '-###' -fopenmp /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck --check-prefixes=CHECK-OPENMP /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90
# note: command had no output on stdout or stderr
# RUN: at line 15
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S -### /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/flang/test/Driver/Output/omp-driver-offload.f90.tmp 2>&1  -fopenmp --offload-arch=gfx90a --offload-arch=sm_70  --target=aarch64-unknown-linux-gnu -nogpulib   | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OFFLOAD-HOST-AND-DEVICE
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S '-###' /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/flang/test/Driver/Output/omp-driver-offload.f90.tmp -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --target=aarch64-unknown-linux-gnu -nogpulib
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OFFLOAD-HOST-AND-DEVICE
# note: command had no output on stdout or stderr
# RUN: at line 20
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S -### /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/flang/test/Driver/Output/omp-driver-offload.f90.tmp 2>&1  -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-device  --target=aarch64-unknown-linux-gnu -nogpulib   | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OFFLOAD-HOST-AND-DEVICE
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S '-###' /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/flang/test/Driver/Output/omp-driver-offload.f90.tmp -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-device --target=aarch64-unknown-linux-gnu -nogpulib
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OFFLOAD-HOST-AND-DEVICE
# note: command had no output on stdout or stderr
# RUN: at line 30
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S -### /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/flang/test/Driver/Output/omp-driver-offload.f90.tmp 2>&1  -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-only  --target=aarch64-unknown-linux-gnu -nogpulib   | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OFFLOAD-HOST
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S '-###' /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/flang/test/Driver/Output/omp-driver-offload.f90.tmp -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-host-only --target=aarch64-unknown-linux-gnu -nogpulib
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OFFLOAD-HOST
# note: command had no output on stdout or stderr
# RUN: at line 40
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S -### /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 2>&1  -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-device-only  --target=aarch64-unknown-linux-gnu -nogpulib   | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OFFLOAD-DEVICE
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S '-###' /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-device-only --target=aarch64-unknown-linux-gnu -nogpulib
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OFFLOAD-DEVICE
# note: command had no output on stdout or stderr
# RUN: at line 51
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -### -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 2>&1 | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang '-###' -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90
# note: command had no output on stdout or stderr
# RUN: at line 55
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S -### /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/flang/test/Driver/Output/omp-driver-offload.f90.tmp 2>&1  -fopenmp --offload-arch=gfx90a  --target=aarch64-unknown-linux-gnu -nogpulib   | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OPENMP-OFFLOAD-ARGS
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang -S '-###' /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 -o /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/flang/test/Driver/Output/omp-driver-offload.f90.tmp -fopenmp --offload-arch=gfx90a --target=aarch64-unknown-linux-gnu -nogpulib
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90 --check-prefix=OPENMP-OFFLOAD-ARGS
# .---command stderr------------
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90:64:24: error: OPENMP-OFFLOAD-ARGS: expected string not found in input
# | ! OPENMP-OFFLOAD-ARGS: "{{[^"]*}}llvm-offload-binary{{.*}}" {{.*}} "--image=file={{.*}}.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"
# |                        ^
# | <stdin>:7:649: note: scanning from here
# |  "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" "-emit-llvm" "-flto=full" "-mrelocation-model" "pic" "-pic-level" "2" "-target-cpu" "gfx90a" "-fopenmp" "-resource-dir" "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/clang/23" "-foffload-device" "-fopenmp-host-ir-file-path" "/tmp/lit-tmp-plhlbl3u/omp-driver-offload-514264.bc" "-fopenmp-is-target-device" "-nogpulib" "-mframe-pointer=all" "-o" "/tmp/lit-tmp-plhlbl3u/omp-driver-offload-gfx90a-1b8941.s" "-x" "f95" "/home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90"
# | <stdin>:8:185: note: possible intended match here
# |  "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/llvm-offload-binary" "-o" "/tmp/lit-tmp-plhlbl3u/omp-driver-offload-f732a9.out" "--image=file=/tmp/lit-tmp-plhlbl3u/omp-driver-offload-gfx90a-1b8941.s,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"
# | 
# | Input file: <stdin>
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: flang version 23.0.0git (https://github.com/llvm/llvm-project 839d1c014b9fb7d0332dd9cfa32d5e5eea042354) 
# |             2: Target: aarch64-unknown-linux-gnu 
# |             3: Thread model: posix 
# |             4: InstalledDir: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin 
# |             5: Build config: +assertions 
# |             6:  "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" "-emit-llvm-bc" "-mrelocation-model" "pic" "-pic-level" "2" "-pic-is-pie" "-target-cpu" "generic" "-target-feature" "+v8a" "-target-feature" "+fp-armv8" "-target-feature" "+neon" "-fopenmp" "-resource-dir" "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/clang/23" "--offload-targets=amdgcn-amd-amdhsa" "-mframe-pointer=non-leaf-no-reserve" "-o" "/tmp/lit-tmp-plhlbl3u/omp-driver-offload-514264.bc" "-x" "f95" "/home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90" 
# |             7:  "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang" "-fc1" "-triple" "amdgcn-amd-amdhsa" "-emit-llvm" "-flto=full" "-mrelocation-model" "pic" "-pic-level" "2" "-target-cpu" "gfx90a" "-fopenmp" "-resource-dir" "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/clang/23" "-foffload-device" "-fopenmp-host-ir-file-path" "/tmp/lit-tmp-plhlbl3u/omp-driver-offload-514264.bc" "-fopenmp-is-target-device" "-nogpulib" "-mframe-pointer=all" "-o" "/tmp/lit-tmp-plhlbl3u/omp-driver-offload-gfx90a-1b8941.s" "-x" "f95" "/home/gha/actions-runner/_work/llvm-project/llvm-project/flang/test/Driver/omp-driver-offload.f90" 
# | check:64'0     X error: no match found
# |             8:  "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/llvm-offload-binary" "-o" "/tmp/lit-tmp-plhlbl3u/omp-driver-offload-f732a9.out" "--image=file=/tmp/lit-tmp-plhlbl3u/omp-driver-offload-gfx90a-1b8941.s,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp" 
# | check:64'1     ? possible intended match
# |             9:  "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/flang" "-fc1" "-triple" "aarch64-unknown-linux-gnu" "-S" "-mrelocation-model" "pic" "-pic-level" "2" "-pic-is-pie" "-target-cpu" "generic" "-target-feature" "+v8a" "-target-feature" "+fp-armv8" "-target-feature" "+neon" "-fopenmp" "-resource-dir" "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/lib/clang/23" "-fembed-offload-object=/tmp/lit-tmp-plhlbl3u/omp-driver-offload-f732a9.out" "--offload-targets=amdgcn-amd-amdhsa" "-mframe-pointer=non-leaf-no-reserve" "-o" "/home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/flang/test/Driver/Output/omp-driver-offload.f90.tmp" "-x" "ir" "/tmp/lit-tmp-plhlbl3u/omp-driver-offload-514264.bc" 
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

Comment thread llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp Outdated

david-arm left a comment

Contributor

LGTM!

fhahn left a comment

Contributor

I think at least on the VPlan-level, there may be some changes with the patch.

Something like the function below, added to the predicator.ll VPlan test, should show changed blends. For this one everything gets folded and there's no IR change; I'm not sure there's any way to construct a test case with end-to-end changes, but it may be good to add the test.

define void @blend_chain_non_trivial(ptr noalias %a, ptr noalias %b) {
entry:
  br label %loop.header

loop.header:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
  %lb = load i64, ptr %b
  %v1 = add i64 %iv, %lb
  %v2 = mul i64 %iv, 3
  %gep = getelementptr i64, ptr %a, i64 %iv
  %c0 = icmp sle i64 %iv, 0
  br i1 %c0, label %if.a, label %merge.a

if.a:
  %ca = icmp sle i64 %iv, 8
  br i1 %ca, label %if.a.inner, label %merge.a.inner

if.a.inner:
  br label %merge.a.inner

merge.a.inner:
  %blend.a.inner = phi i64 [ %v1, %if.a ], [ %v1, %if.a.inner ]
  br label %merge.a

merge.a:
  %blend.a = phi i64 [ %v1, %loop.header ], [ %blend.a.inner, %merge.a.inner ]
  %d0 = icmp sgt i64 %iv, 0
  br i1 %d0, label %if.b, label %merge.b

if.b:
  %cb = icmp sle i64 %iv, 16
  br i1 %cb, label %if.b.inner, label %merge.b.inner

if.b.inner:
  br label %merge.b.inner

merge.b.inner:
  %blend.b.inner = phi i64 [ %v2, %if.b ], [ %v2, %if.b.inner ]
  br label %merge.b

merge.b:
  %blend.b = phi i64 [ %v2, %merge.a ], [ %blend.b.inner, %merge.b.inner ]
  %sum = add i64 %blend.a, %blend.b
  store i64 %sum, ptr %gep
  br label %loop.latch

loop.latch:
  %iv.next = add nuw nsw i64 %iv, 1
  %ec = icmp eq i64 %iv.next, 128
  br i1 %ec, label %exit, label %loop.header

exit:
  ret void
}

  scope_exit UpdateInsertPoint([this, &VPBB]() {
    assert(!InsertPoints.contains(VPBB) && "InsertPoint clobbered?");
    InsertPoints[VPBB] = Builder.getInsertPoint();
  });

Contributor

Do we need the scope_exit? I think we don't reset the insert point and only insert before it, so it should not change.

Contributor Author

We don't reset the insert point, but it changes when new edge masks or block-in masks are created. That said, I think we can just recalculate the insert point from the block-in mask. I've removed the InsertPoints map in 73427ee.
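One hypothetical way to recalculate the insert point from the block-in mask, sketched on a toy list of recipe names. This is not the code from 73427ee: `blendInsertPoint` is an illustrative name, and looking the mask up by name stands in for checking which block the mask's defining recipe lives in. If the mask was emitted into the block, blends go right after it; otherwise, for example when the mask is reused from the immediate dominator, they go after the phis.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy stand-in for a VPBasicBlock: an ordered list of recipe names.
using Block = std::list<std::string>;

static bool isPhi(const std::string &R) { return R.rfind("phi", 0) == 0; }

// Hypothetical: place blends right after the block-in mask if it was emitted
// into this block; otherwise fall back to the first non-phi recipe.
Block::iterator blendInsertPoint(Block &BB, const std::string &BlockInMask) {
  for (auto It = BB.begin(); It != BB.end(); ++It)
    if (*It == BlockInMask)
      return std::next(It);
  auto It = BB.begin();
  while (It != BB.end() && isPhi(*It))
    ++It;
  return It;
}
```

The appeal of this approach is that the position is derived from the plan itself, so there is no side table that could go stale between the two phases.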

lukel97 force-pushed the loop-vectorize/insert-blends-post-order branch from e6431bc to 73427ee June 8, 2026 07:33

lukel97 commented Jun 8, 2026

Contributor Author

> I think at least on the VPlan-level, there may be some changes with the patch.

For the cases where all the incoming values are equal, yeah there should be no end-to-end change because simplifyBlends will also fold those blends away to the underlying value.

#201783 will restore the VPlan-level changes, as it will be able to see through nested phis when all incoming values are equal.

Added the test in f07305e
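The kind of peek-through that post order enables can be sketched with a toy phi map built from the test above. `PhiMap`, `collectLeaves` and `uniqueUnderlyingValue` are hypothetical helpers, not the code from #201783. The walk only works if %blend.a.inner is still a phi when %blend.a is examined, which converting in post order ensures in this example, since merge.a.inner is a predecessor of merge.a and is therefore visited after it.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy SSA: a phi name maps to its incoming value names. Any name that is not
// a key (e.g. "%v1") is treated as a non-phi value.
using PhiMap = std::map<std::string, std::vector<std::string>>;

// Collects the non-phi values reachable from V through phi incoming values.
static void collectLeaves(const PhiMap &Phis, const std::string &V,
                          std::set<std::string> &Visited,
                          std::set<std::string> &Leaves) {
  if (!Visited.insert(V).second)
    return; // Already seen: guards against phi cycles.
  auto It = Phis.find(V);
  if (It == Phis.end()) {
    Leaves.insert(V); // Not a phi, so it is a real incoming value.
    return;
  }
  for (const std::string &In : It->second)
    collectLeaves(Phis, In, Visited, Leaves);
}

// Returns the single underlying value if all incoming values, looking through
// nested phis, are the same; otherwise std::nullopt.
std::optional<std::string> uniqueUnderlyingValue(const PhiMap &Phis,
                                                 const std::string &Phi) {
  std::set<std::string> Visited, Leaves;
  collectLeaves(Phis, Phi, Visited, Leaves);
  if (Leaves.size() == 1)
    return *Leaves.begin();
  return std::nullopt;
}
```

With the test's phis, `uniqueUnderlyingValue` resolves %blend.a to %v1 and %blend.b to %v2, whereas a phi mixing %v1 with a phi over %v2 yields no unique value.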

eas commented Jun 8, 2026

Contributor

I guess this is fine for now, so I definitely wouldn't want to block it, but I have a feeling RPOT would be better if we ever try to preserve some "uniform" branches as that would require placing SSA-phis sometimes.

lukel97 commented Jun 9, 2026

Contributor Author

> I guess this is fine for now, so I definitely wouldn't want to block it, but I have a feeling RPOT would be better if we ever try to preserve some "uniform" branches as that would require placing SSA-phis sometimes.

Do you mean as in for partial control flow linearisation? I don't think we need to insert new phis, only adjust existing ones by replacing the incoming values with blends where control flow was linearized. Is there any reason why post order wouldn't work for that?

Mel-Chen left a comment

Contributor

LG

Comment thread llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp Outdated
eas commented Jun 9, 2026

Contributor

> I guess this is fine for now, so I definitely wouldn't want to block it, but I have a feeling RPOT would be better if we ever try to preserve some "uniform" branches as that would require placing SSA-phis sometimes.

> Do you mean as in for partial control flow linearisation? I don't think we need to insert new phis, only adjust existing ones by replacing the incoming values with blends where control flow was linearized. Is there any reason why post order wouldn't work for that?

        BB0
       Uni = ...; (so that branching wouldn't be UB even if BB1.Div == all-true)
         |
      BB1 (Div)
        /  \    
     BB2 BB3 (Uni)
       |   |       \
       | Expensive1 Expensive 2
       \     |      /
       MergeBB (phi)

Into

  BB0
  BB1
  BB3
  / \
 E1 E2 (both are still masked with !BB1.Div)
   \ /
   NewBB (new SSA phi)
   |
  BB2
   |
  MergeBB (blend)

One could argue that we can go BB1->BB2->BB3, but it's dictated by the RPOT and you can simply mirror image the CFG to still have the linearization order where BB2 is processed last.

lukel97 commented Jun 9, 2026

Contributor Author

> One could argue that we can go BB1->BB2->BB3, but it's dictated by the RPOT and you can simply mirror image the CFG to still have the linearization order where BB2 is processed last.

Can we not do this processing in post order? I.e. after linearization (no new blocks are created AFAIK):

  BB0
  BB1
  BB3
  / \
 E1 E2 (both are still masked with !BB1.Div)
   \ /
   |
  BB2
   |
  MergeBB (phi)

We would visit MergeBB and see the original phi. E1 + E2 don't dominate MergeBB so we would decide to insert a phi in BB2, and then replace the original phi with a blend. The incoming values are the new phi and BB2's incoming value, blended.
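The dominance fact used here can be checked with a small sketch: a textbook iterative data-flow dominator computation (not VPlan's `VPDominatorTree`) over a toy, string-keyed copy of the linearized CFG above, where E1 and E2 both flow into BB2. `computeDominators` and `dominates` are illustrative names, and the sketch assumes every block is reachable from the entry.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using CFG = std::map<std::string, std::vector<std::string>>;
using DomSets = std::map<std::string, std::set<std::string>>;

// Iterative formulation:
//   Dom(entry) = {entry}
//   Dom(n)     = {n} union (intersection of Dom(p) over predecessors p of n)
// Non-entry blocks start at "all blocks" and shrink until a fixed point.
DomSets computeDominators(const CFG &G, const std::string &Entry) {
  std::set<std::string> All;
  std::map<std::string, std::vector<std::string>> Preds;
  for (const auto &[BB, Succs] : G) {
    All.insert(BB);
    for (const std::string &S : Succs) {
      All.insert(S);
      Preds[S].push_back(BB);
    }
  }
  DomSets Dom;
  for (const std::string &BB : All)
    Dom[BB] = (BB == Entry) ? std::set<std::string>{Entry} : All;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (const std::string &BB : All) {
      if (BB == Entry)
        continue;
      std::set<std::string> NewDom = All;
      for (const std::string &P : Preds[BB]) {
        std::set<std::string> Tmp;
        std::set_intersection(NewDom.begin(), NewDom.end(), Dom[P].begin(),
                              Dom[P].end(), std::inserter(Tmp, Tmp.begin()));
        NewDom = std::move(Tmp);
      }
      NewDom.insert(BB);
      if (NewDom != Dom[BB]) {
        Dom[BB] = std::move(NewDom);
        Changed = true;
      }
    }
  }
  return Dom;
}

// True if block A dominates block B.
bool dominates(const DomSets &Dom, const std::string &A, const std::string &B) {
  return Dom.at(B).count(A) != 0;
}
```

On that CFG neither E1 nor E2 dominates MergeBB, while BB3 and BB2 do, matching the reasoning above that a phi is needed at the BB2 join before the original phi can become a blend.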

lukel97 force-pushed the loop-vectorize/insert-blends-post-order branch from 73427ee to e40766f June 9, 2026 17:51

fhahn left a comment

Contributor

LGTM, thanks

lukel97 enabled auto-merge (squash) June 10, 2026 08:03
lukel97 merged commit 1a09ed1 into llvm:main Jun 10, 2026
9 of 10 checks passed
Jianhui-Li pushed a commit to Jianhui-Li/llvm-project that referenced this pull request Jun 11, 2026
llvm#201783 wants to optimize blend masks by peeking through the contents of
other phi nodes. Currently we eagerly convert phis to blends in reverse
post order, so switch it to post order so that phis at the bottom can
see the phis in their uses.
carlobertolli pushed a commit to carlobertolli/llvm-project that referenced this pull request Jun 11, 2026

5 participants