[VPlan] Compute blend masks from minimum set of edge masks by lukel97 · Pull Request #201783 · llvm/llvm-project

lukel97 · 2026-06-05T08:38:21Z

#201784 aims to preserve SSA in early exit loops, and in doing so inserts phi nodes. More phi nodes result in more VPBlendRecipes, so this PR optimizes the edge masks generated for those blend recipes to prevent regressions.
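Each blend operand carries its own mask, so the cost of a blend grows with the number of (value, mask) operands. The sketch below is a rough model of one vector iteration with four lanes. It assumes, for illustration only, that a blend is lowered to a chain of selects that uses the first incoming value as the fallback. `Lanes`, `Mask` and `lowerBlend` are hypothetical names, not VPlan APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of one vector iteration with VF = 4 lanes; not the
// VPBlendRecipe API. Each blend operand pairs an incoming value with a
// per-lane mask saying which lanes arrive along that operand's edges.
constexpr std::size_t VF = 4;
using Lanes = std::array<int, VF>;
using Mask = std::array<bool, VF>;

// Lower a blend as a chain of selects: start from the first incoming value,
// then let each later operand overwrite the lanes its mask selects. The
// first operand's mask is never read: if the masks are disjoint and cover
// all active lanes, every lane not claimed by a later operand belongs to it.
Lanes lowerBlend(const std::vector<std::pair<Lanes, Mask>> &Operands) {
  assert(!Operands.empty() && "a blend needs at least one operand");
  Lanes Result = Operands.front().first;
  for (std::size_t I = 1; I < Operands.size(); ++I)
    for (std::size_t L = 0; L < VF; ++L)
      if (Operands[I].second[L])
        Result[L] = Operands[I].first[L];
  return Result;
}
```

In this model, each additional operand costs one more select, plus whatever instructions are needed to compute its mask. This is why collapsing several edges that carry the same value into a single operand pays off.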

The idea is to compute a minimal set of edges that lead to each unique incoming value in a phi. It does this by iterating up through the post-dominance frontier until the outgoing edges no longer lead to the same value.
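A toy version of this walk is sketched below. It assumes that blocks and values are plain integers and that the post-dominator relation and post-dominance frontier have already been computed. `ToyCFG` and this `computeBlendEdges` are illustrative stand-ins rather than the VPlan code, and the sketch models only the basic walk. Whenever all outgoing edges of a block carry the same value, those edges are replaced by the edges that enter the region the block postdominates from its post-dominance frontier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Toy model, not the VPlan API: blocks and values are plain ints, and the
// post-dominator relation and post-dominance frontier are precomputed tables.
using Edge = std::pair<int, int>; // {From, To}

struct ToyCFG {
  std::vector<std::vector<int>> Succs;
  std::vector<std::uint64_t> PDoms;  // bit X of PDoms[B] set: X postdominates B
  std::vector<std::vector<int>> PDF; // post-dominance frontier of each block
  bool postDominates(int X, int B) const { return (PDoms[B] >> X) & 1; }
};

// Map each edge in the "furthest up" edge set to the incoming value it leads
// to. Incoming holds {value, incoming block} pairs of a phi in PhiBlock.
std::map<Edge, int>
computeBlendEdges(const ToyCFG &G, int PhiBlock,
                  const std::vector<std::pair<int, int>> &Incoming) {
  std::map<Edge, int> Edges;
  for (const auto &[Val, InBB] : Incoming)
    Edges[Edge{InBB, PhiBlock}] = Val;
  // Only phis that postdominate all of their incoming blocks are handled.
  for (const auto &[Val, InBB] : Incoming)
    if (!G.postDominates(PhiBlock, InBB))
      return Edges;

  // Worklist with set semantics, similar to llvm::SetVector.
  std::vector<int> Worklist;
  std::set<int> InWorklist;
  auto Push = [&](int B) {
    if (InWorklist.insert(B).second)
      Worklist.push_back(B);
  };
  for (const auto &[Val, InBB] : Incoming)
    Push(InBB);

  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();
    InWorklist.erase(BB);

    // Check whether every outgoing edge of BB already carries one value.
    bool AllEqual = !G.Succs[BB].empty();
    int Common = 0;
    for (std::size_t I = 0; I < G.Succs[BB].size() && AllEqual; ++I) {
      auto It = Edges.find(Edge{BB, G.Succs[BB][I]});
      if (It == Edges.end() || (I > 0 && It->second != Common))
        AllEqual = false;
      else
        Common = It->second;
    }
    // Extra guard in this sketch: a block with an empty frontier has nowhere
    // further up to move its edges, so they are kept.
    if (!AllEqual || G.PDF[BB].empty())
      continue;

    // Replace BB's outgoing edges with the edges from its frontier blocks
    // into the region that BB postdominates.
    for (int Succ : G.Succs[BB])
      Edges.erase(Edge{BB, Succ});
    for (int F : G.PDF[BB]) {
      for (int FSucc : G.Succs[F])
        if (G.postDominates(BB, FSucc))
          Edges[Edge{F, FSucc}] = Common;
      Push(F);
    }
  }
  return Edges;
}
```

The test uses a diamond nested inside another diamond (0 -> {1, 2}, 1 -> {3, 4}, and 2, 3, 4 -> 5). There, the phi in block 5 has three incoming edges, and the walk collapses them into the two edges out of block 0.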

This is a simpler, less general version of #184838, since it can't optimize away edges that aren't postdominated by the phi. This is fine for the early exit use case, though, since we only need to optimize phi nodes inserted in the latch.

The big advantage over #184838 is that it doesn't require several depth-first searches to compute the set of reachable nodes, and can be done entirely by iterating the post-dominator frontier.
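For reference, the post-dominance frontier of X is the set of blocks Y such that X postdominates a successor of Y but does not strictly postdominate Y. The brute-force sketch below computes this set from iterative post-dominator bitsets for a small acyclic CFG with a single exit. It is only meant to make the definition concrete. The `DominanceFrontierBase` in the diff instead derives the frontier from a (post-)dominator tree via `analyze`. `computePostDoms` and `computePDF` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helpers for a small acyclic CFG (at most 64 blocks) with a
// single exit block. Bit X of PDoms[B] is set iff X postdominates B.
std::vector<std::uint64_t>
computePostDoms(const std::vector<std::vector<int>> &Succs, int Exit) {
  const std::size_t N = Succs.size();
  assert(N <= 64 && "toy bitsets hold at most 64 blocks");
  const std::uint64_t All =
      N == 64 ? ~std::uint64_t(0) : (std::uint64_t(1) << N) - 1;
  std::vector<std::uint64_t> PDoms(N, All);
  PDoms[Exit] = std::uint64_t(1) << Exit;
  // Iterative data-flow: PDom(B) is {B} united with the intersection of
  // PDom(S) over all successors S, repeated until nothing changes.
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t B = 0; B < N; ++B) {
      if (static_cast<int>(B) == Exit)
        continue;
      std::uint64_t Meet = All;
      for (int S : Succs[B])
        Meet &= PDoms[S];
      const std::uint64_t New = Meet | (std::uint64_t(1) << B);
      if (New != PDoms[B]) {
        PDoms[B] = New;
        Changed = true;
      }
    }
  }
  return PDoms;
}

// PDF(X) = { Y : X postdominates some successor of Y, but X does not
//                strictly postdominate Y }.
std::vector<std::set<int>>
computePDF(const std::vector<std::vector<int>> &Succs,
           const std::vector<std::uint64_t> &PDoms) {
  const std::size_t N = Succs.size();
  std::vector<std::set<int>> PDF(N);
  for (std::size_t Y = 0; Y < N; ++Y)
    for (int S : Succs[Y])
      for (std::size_t X = 0; X < N; ++X) {
        const bool PDomsSucc = (PDoms[S] >> X) & 1;
        const bool StrictlyPDomsY = X != Y && ((PDoms[Y] >> X) & 1);
        if (PDomsSucc && !StrictlyPDomsY)
          PDF[X].insert(static_cast<int>(Y));
      }
  return PDF;
}
```

The test uses the CFG from the multi_early_exit_predicated_nested test, with loop, exiting1, join1, exiting2, join2 and latch mapped to 0-5 and the back edge dropped. In that CFG, join2's frontier is {loop, join1, exiting2}, which are exactly the blocks whose branches decide whether join2 executes.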

llvmorg-github-actions · 2026-06-05T08:39:00Z

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Stacked on #201782

Another PR aims to model the control flow of early exits explicitly, and in doing so inserts phi nodes to preserve SSA. Inserting phi nodes results in more VPBlendRecipes, so this PR optimizes the edge masks generated for those blend recipes to prevent regressions.

The shape of the CFG and the phis that would be emitted are precommitted in the predicator-early-exit.ll test.

The idea is to compute a minimal set of edges that lead to each unique incoming value in a phi. It does this by iterating up through the post-dominance frontier until the outgoing edges no longer lead to the same value. It also recursively looks through the incoming edges of any values that are themselves phi nodes.

This is a simpler, less general version of #184838, since it requires the phi node to postdominate its incoming blocks. This is fine for the early exit use case, though, since we only need to optimize phi nodes inserted in the latch.

The big advantage over #184838 is that it doesn't require several depth-first searches to compute the set of reachable nodes, and can be done entirely by iterating the post-dominator frontier.

Patch is 34.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/201783.diff

9 Files Affected:

(modified) llvm/include/llvm/Analysis/DominanceFrontier.h (+1)
(modified) llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h (+8)
(modified) llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp (+156-6)
(added) llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll (+331)
(modified) llvm/test/Transforms/LoopVectorize/VPlan/predicator.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+3-6)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+4-6)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+12-9)
(modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+4-6)

diff --git a/llvm/include/llvm/Analysis/DominanceFrontier.h b/llvm/include/llvm/Analysis/DominanceFrontier.h
index fd38891e901e3..4a8ab96cf71a7 100644
--- a/llvm/include/llvm/Analysis/DominanceFrontier.h
+++ b/llvm/include/llvm/Analysis/DominanceFrontier.h
@@ -78,6 +78,7 @@ class DominanceFrontierBase {
   const_iterator end() const { return Frontiers.end(); }
   iterator find(BlockT *B) { return Frontiers.find(B); }
   const_iterator find(BlockT *B) const { return Frontiers.find(B); }
+  const_iterator find(const BlockT *B) const { return Frontiers.find(B); }
 
   /// print - Convert to human readable form
   ///
diff --git a/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h b/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
index 2864670f44913..1ad522880c709 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
@@ -18,6 +18,8 @@
 #include "VPlan.h"
 #include "VPlanCFG.h"
 #include "llvm/ADT/GraphTraits.h"
+#include "llvm/Analysis/DominanceFrontier.h"
+#include "llvm/Analysis/DominanceFrontierImpl.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/Support/GenericDomTree.h"
 #include "llvm/Support/GenericDomTreeConstruction.h"
@@ -67,5 +69,11 @@ template <>
 struct GraphTraits<const VPDomTreeNode *>
     : public DomTreeGraphTraitsBase<const VPDomTreeNode,
                                     VPDomTreeNode::const_iterator> {};
+
+class VPPostDominanceFrontier
+    : public DominanceFrontierBase<VPBlockBase, true> {
+public:
+  explicit VPPostDominanceFrontier(const DomTreeT &VPDT) { analyze(VPDT); }
+};
 } // namespace llvm
 #endif // LLVM_TRANSFORMS_VECTORIZE_VPLANDOMINATORTREE_H
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
index 2717b80e2eeaa..2ec3df8ccf8c1 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
@@ -34,6 +34,9 @@ class VPPredicator {
   /// Post-dominator tree for the VPlan.
   VPPostDominatorTree VPPDT;
 
+  /// Post-dominator frontier for the VPlan.
+  VPPostDominanceFrontier VPPDF;
+
   /// When we if-convert we need to create edge masks. We have to cache values
   /// so that we don't end up with exponential recursion/IR.
   using EdgeMaskCacheTy =
@@ -69,8 +72,19 @@ class VPPredicator {
     return EdgeMaskCache[{Src, Dst}] = Mask;
   }
 
+  using EdgeTy = std::pair<const VPBasicBlock *, const VPBasicBlock *>;
+
+  /// Compute the "furthest up" set of edges for each incoming value of \Phi.
+  MapVector<EdgeTy, VPValue *> computeBlendEdges(VPPhi *Phi);
+
+  /// Given a set of \p Edges that lead to \p VPBB, return the OR of all edges
+  /// or an equivalent block in-mask.
+  VPValue *createMaskDisjunction(ArrayRef<EdgeTy> Edges, VPBasicBlock *VPBB);
+
+  DenseMap<const VPBasicBlock *, VPBasicBlock::iterator> InsertPoints;
+
 public:
-  VPPredicator(VPlan &Plan) : VPDT(Plan), VPPDT(Plan) {}
+  VPPredicator(VPlan &Plan) : VPDT(Plan), VPPDT(Plan), VPPDF(VPPDT) {}
 
   /// Returns the *entry* mask for \p VPBB.
   VPValue *getBlockInMask(const VPBasicBlock *VPBB) const {
@@ -136,6 +150,10 @@ void VPPredicator::createBlockInMask(VPBasicBlock *VPBB) {
   // Start inserting after the block's phis, which be replaced by blends later.
   Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
 
+  // Keep track of where in VPBB we are inserting the masks into.
+  scope_exit UpdateInsertPoint(
+      [this, &VPBB]() { InsertPoints[VPBB] = Builder.getInsertPoint(); });
+
   // Reuse the mask of the immediate dominator if the VPBB post-dominates the
   // immediate dominator.
   auto *IDom = VPDT.getNode(VPBB)->getIDom();
@@ -224,7 +242,117 @@ void VPPredicator::createSwitchEdgeMasks(const VPInstruction *SI) {
   setEdgeMask(Src, DefaultDst, DefaultMask);
 }
 
+// Compute the "furthest up" set of edges for each incoming value of a phi.
+//
+// Start by keeping track of what edges lead to which value. Then see if any
+// node has the same value for all outgoing edges. If so then propagate that
+// value up to every node it postdominates.
+MapVector<VPPredicator::EdgeTy, VPValue *>
+VPPredicator::computeBlendEdges(VPPhi *Phi) {
+  MapVector<EdgeTy, VPValue *> Edges;
+
+  // Mark the given edge as providing the value \p V.
+  auto AddEdge = [&Edges](const VPBlockBase *From, const VPBlockBase *To,
+                          VPValue *V) {
+    EdgeTy Edge = {cast<VPBasicBlock>(From), cast<VPBasicBlock>(To)};
+    assert((!Edges.contains(Edge) || Edges.lookup(Edge) == V) &&
+           "Clobbering an edge?");
+    Edges[Edge] = V;
+  };
+
+  for (auto [InVal, InVPBB] : Phi->incoming_values_and_blocks())
+    AddEdge(InVPBB, Phi->getParent(), InVal);
+
+  // The root phi must postdominate every incoming block. Also don't touch
+  // phis in a reduction chain since they need to be in a specific structure
+  // for handle*Reductions.
+  for (auto [InVal, InVPBB] : Phi->incoming_values_and_blocks())
+    if (!VPPDT.dominates(Phi->getParent(), InVPBB) ||
+        isa<VPReductionPHIRecipe>(InVal))
+      return Edges;
+
+  // Given a list of edges, check if they all have the same value and return it.
+  auto GetAllEqual = [&Edges](ArrayRef<EdgeTy> OutEdges) -> VPValue * {
+    VPValue *Common = nullptr;
+    for (EdgeTy E : OutEdges) {
+      VPValue *V = Edges.lookup(E);
+      if (!V)
+        return nullptr;
+      if (match(V, m_Poison()))
+        continue;
+      if (!Common)
+        Common = V;
+      else if (Common != V)
+        return nullptr;
+    }
+    return Common;
+  };
+
+  SetVector<const VPBlockBase *> Worklist(from_range, Phi->incoming_blocks());
+  while (!Worklist.empty()) {
+    auto *VPBB = cast<VPBasicBlock>(Worklist.pop_back_val());
+
+    // Check that all outgoing edges from VPBB have the same value.
+    SmallVector<EdgeTy> OutEdges;
+    for (const VPBlockBase *Succ : VPBB->getSuccessors())
+      OutEdges.emplace_back(VPBB, cast<VPBasicBlock>(Succ));
+    VPValue *Common = GetAllEqual(OutEdges);
+    if (!Common)
+      continue;
+
+    // They have the same value: we can move the edges up
+    for (EdgeTy Edge : OutEdges)
+      Edges.erase(Edge);
+
+    // Peek through phis that are postdominated by VPBB
+    if (auto *Phi = dyn_cast<VPPhi>(Common))
+      if (VPPDT.dominates(VPBB, Phi->getParent())) {
+        for (auto [InV, InVPBB] : Phi->incoming_values_and_blocks()) {
+          AddEdge(InVPBB, Phi->getParent(), InV);
+          Worklist.insert(InVPBB);
+        }
+        continue;
+      }
+
+    // Iterate up through the post dominance frontier
+    for (const VPBlockBase *Frontier : VPPDF.find(VPBB)->second) {
+      for (const VPBlockBase *FrontierSucc : Frontier->getSuccessors())
+        if (VPPDT.dominates(VPBB, FrontierSucc))
+          AddEdge(Frontier, FrontierSucc, Common);
+      Worklist.insert(cast<VPBasicBlock>(Frontier));
+    }
+  }
+
+  return Edges;
+}
+
+VPValue *VPPredicator::createMaskDisjunction(ArrayRef<EdgeTy> Edges,
+                                             VPBasicBlock *VPBB) {
+  auto Dsts = map_range(Edges, [](auto E) { return E.second; });
+  const VPBasicBlock *PostDom = *Dsts.begin();
+  for (const VPBasicBlock *VPBB : drop_begin(Dsts))
+    PostDom =
+        cast<VPBasicBlock>(VPPDT.findNearestCommonDominator(PostDom, VPBB));
+  assert(VPPDT.dominates(VPBB, PostDom) && "Edges don't postdominate VPBB");
+  if (PostDom != VPBB)
+    return getBlockInMask(PostDom);
+
+  VPValue *Mask = nullptr;
+  for (auto [Src, Dst] : Edges) {
+    VPValue *EdgeMask;
+    {
+      VPBuilder::InsertPointGuard Guard(Builder);
+      Builder.setInsertPoint(const_cast<VPBasicBlock *>(Dst),
+                             InsertPoints[Dst]);
+      EdgeMask = createEdgeMask(Src, Dst);
+    }
+    Mask = Mask ? Builder.createOr(Mask, EdgeMask) : EdgeMask;
+  }
+  return Mask;
+}
+
 void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
+  Builder.setInsertPoint(VPBB, InsertPoints[VPBB]);
   SmallVector<VPPhi *> Phis;
   for (VPRecipeBase &R : VPBB->phis())
     Phis.push_back(cast<VPPhi>(&R));
@@ -245,10 +373,30 @@ void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
       continue;
     }
 
+    MapVector<VPValue *, SmallVector<EdgeTy>> InValEdgesMap;
+    for (auto [Edge, Val] : computeBlendEdges(PhiR))
+      InValEdgesMap[Val].push_back(Edge);
+    auto InValEdges = InValEdgesMap.takeVector();
+
+    if (InValEdges.size() == 1) {
+      PhiR->replaceAllUsesWith(InValEdges[0].first);
+      PhiR->eraseFromParent();
+      continue;
+    }
+
+    // Sort the incoming value order to match PhiR as much as possible.
+    llvm::stable_sort(InValEdges, [&PhiR](auto &L, auto &R) {
+      auto InVs = PhiR->incoming_values();
+      return std::distance(InVs.begin(), find(InVs, L.first)) <
+             std::distance(InVs.begin(), find(InVs, R.first));
+    });
+
     SmallVector<VPValue *, 2> OperandsWithMask;
-    for (const auto &[InVPV, InVPBB] : PhiR->incoming_values_and_blocks()) {
+    for (const auto &[InVPV, Edges] : InValEdges) {
+      if (match(InVPV, m_Poison()))
+        continue;
       OperandsWithMask.push_back(InVPV);
-      OperandsWithMask.push_back(createEdgeMask(InVPBB, VPBB));
+      OperandsWithMask.push_back(createMaskDisjunction(Edges, VPBB));
     }
     PHINode *IRPhi = cast_or_null<PHINode>(PhiR->getUnderlyingValue());
     auto *Blend =
@@ -276,10 +424,8 @@ void VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
     // Introduce the mask for VPBB, which may introduce needed edge masks, and
     // convert all phi recipes of VPBB to blend recipes unless VPBB is the
     // header.
-    if (VPBB != Header) {
+    if (VPBB != Header)
       Predicator.createBlockInMask(VPBB);
-      Predicator.convertPhisToBlends(VPBB);
-    }
 
     VPValue *BlockMask = Predicator.getBlockInMask(VPBB);
     if (!BlockMask)
@@ -292,6 +438,10 @@ void VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
     }
   }
 
+  for (VPBlockBase *VPB : reverse(RPOT))
+    if (VPB != Header)
+      Predicator.convertPhisToBlends(cast<VPBasicBlock>(VPB));
+
   // Linearize the blocks of the loop into one serial chain.
   VPBlockBase *PrevVPBB = nullptr;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
diff --git a/llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll b/llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll
new file mode 100644
index 0000000000000..4d388a3be84e4
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll
@@ -0,0 +1,331 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt -disable-output < %s -p loop-vectorize -vplan-print-after=introduceMasksAndLinearize -vplan-print-vector-region-scope 2>&1 | FileCheck %s
+
+; Test various CFGs generated by early exits
+
+;     vector.body
+;        /   \
+;       /     \
+;   exiting1   \
+;     / \       \
+;    /   \       |
+;   /   join1    |
+;  /    /    \   |
+; / exiting2  \  |
+; \    |    \  \ |
+;  \   |     join2
+;   \  |    /
+;    \ |   /
+;     latch
+define void @multi_early_exit_predicated_nested(ptr %p1, ptr %p2, i1 %c1, i1 %c2, i1 %ee1, i1 %ee2, i32 %n) {
+; CHECK-LABEL: VPlan for loop in 'multi_early_exit_predicated_nested'
+; CHECK-NEXT:  <x1> vector loop: {
+; CHECK-NEXT:  vp<[[VP3:%[0-9]+]]> = CANONICAL-IV
+; CHECK-EMPTY:
+; CHECK-NEXT:    vector.body:
+; CHECK-NEXT:      ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<[[VP0:%[0-9]+]]>
+; CHECK-NEXT:    Successor(s): exiting1
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting1:
+; CHECK-NEXT:    Successor(s): join1
+; CHECK-EMPTY:
+; CHECK-NEXT:    join1:
+; CHECK-NEXT:      EMIT vp<[[VP4:%[0-9]+]]> = not ir<%ee1>
+; CHECK-NEXT:      EMIT vp<[[VP5:%[0-9]+]]> = logical-and ir<%c1>, vp<[[VP4]]>
+; CHECK-NEXT:    Successor(s): exiting2
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting2:
+; CHECK-NEXT:      EMIT vp<[[VP6:%[0-9]+]]> = logical-and vp<[[VP5]]>, ir<%c2>
+; CHECK-NEXT:    Successor(s): join2
+; CHECK-EMPTY:
+; CHECK-NEXT:    join2:
+; CHECK-NEXT:      EMIT vp<[[VP7:%[0-9]+]]> = not ir<%ee2>
+; CHECK-NEXT:      EMIT vp<[[VP8:%[0-9]+]]> = logical-and vp<[[VP6]]>, vp<[[VP7]]>
+; CHECK-NEXT:      EMIT vp<[[VP9:%[0-9]+]]> = not ir<%c2>
+; CHECK-NEXT:      EMIT vp<[[VP10:%[0-9]+]]> = logical-and vp<[[VP5]]>, vp<[[VP9]]>
+; CHECK-NEXT:      EMIT vp<[[VP11:%[0-9]+]]> = or vp<[[VP8]]>, vp<[[VP10]]>
+; CHECK-NEXT:      EMIT vp<[[VP12:%[0-9]+]]> = not ir<%c1>
+; CHECK-NEXT:      EMIT vp<[[VP13:%[0-9]+]]> = or vp<[[VP11]]>, vp<[[VP12]]>
+; CHECK-NEXT:      EMIT vp<[[VP14:%[0-9]+]]> = or vp<[[VP8]]>, vp<[[VP10]]>
+; CHECK-NEXT:      BLEND ir<%phi1.join2> = ir<1>/vp<[[VP14]]> ir<0>/vp<[[VP12]]>
+; CHECK-NEXT:      EMIT vp<[[VP15:%[0-9]+]]> = or vp<[[VP10]]>, vp<[[VP12]]>
+; CHECK-NEXT:      BLEND ir<%phi2.join2> = ir<1>/vp<[[VP8]]> ir<0>/vp<[[VP15]]>
+; CHECK-NEXT:    Successor(s): latch
+; CHECK-EMPTY:
+; CHECK-NEXT:    latch:
+; CHECK-NEXT:      BLEND ir<%phi1> = ir<1>/ir<%c1> ir<0>/vp<[[VP13]]>
+; CHECK-NEXT:      BLEND ir<%phi2> = ir<1>/vp<[[VP6]]> ir<0>/vp<[[VP13]]>
+; CHECK-NEXT:      EMIT ir<%gep1> = getelementptr ir<%p1>, ir<%iv>
+; CHECK-NEXT:      EMIT store ir<%phi1>, ir<%gep1>
+; CHECK-NEXT:      EMIT ir<%gep2> = getelementptr ir<%p2>, ir<%iv>
+; CHECK-NEXT:      EMIT store ir<%phi2>, ir<%gep2>
+; CHECK-NEXT:      EMIT ir<%iv.next> = add ir<%iv>, ir<1>
+; CHECK-NEXT:      EMIT ir<%ec> = icmp eq ir<%iv.next>, ir<%n>
+; CHECK-NEXT:      EMIT vp<%index.next> = add nuw vp<[[VP3]]>, vp<[[VP1:%[0-9]+]]>
+; CHECK-NEXT:      EMIT branch-on-count vp<%index.next>, vp<[[VP2:%[0-9]+]]>
+; CHECK-NEXT:    No successors
+; CHECK-NEXT:  }
+; CHECK-NEXT:  Successor(s): middle.block
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %latch]
+  br i1 %c1, label %exiting1, label %join2
+
+exiting1:
+  br i1 %ee1, label %latch, label %join1
+
+join1:
+  br i1 %c2, label %exiting2, label %join2
+
+exiting2:
+  br i1 %ee2, label %latch, label %join2
+
+join2:
+  %phi1.join2 = phi i32 [1, %exiting2], [1, %join1], [0, %loop]
+  %phi2.join2 = phi i32 [1, %exiting2], [0, %join1], [0, %loop]
+  br label %latch
+
+latch:
+  %phi1 = phi i32 [1, %exiting1], [1, %exiting2], [%phi1.join2, %join2]
+  %phi2 = phi i32 [poison, %exiting1], [1, %exiting2], [%phi2.join2, %join2]
+  %gep1 = getelementptr i32, ptr %p1, i32 %iv
+  store i32 %phi1, ptr %gep1
+  %gep2 = getelementptr i32, ptr %p2, i32 %iv
+  store i32 %phi2, ptr %gep2
+  %iv.next = add i32 %iv, 1
+  %ec = icmp eq i32 %iv.next, %n
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+;     vector.body
+;        /   \
+;       /     \
+;   exiting1  /
+;     / \    /
+;    /   \  /
+;   /   join1
+;  /    /    \
+; / exiting2  \
+; \    |    \  \
+;  \   |     join2
+;   \  |    /
+;    \ |   /
+;     latch
+define void @multi_early_exit_predicated_not_nested(ptr %p1, ptr %p2, i1 %c1, i1 %c2, i1 %ee1, i1 %ee2, i32 %n) {
+; CHECK-LABEL: VPlan for loop in 'multi_early_exit_predicated_not_nested'
+; CHECK-NEXT:  <x1> vector loop: {
+; CHECK-NEXT:  vp<[[VP3:%[0-9]+]]> = CANONICAL-IV
+; CHECK-EMPTY:
+; CHECK-NEXT:    vector.body:
+; CHECK-NEXT:      ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<[[VP0:%[0-9]+]]>
+; CHECK-NEXT:    Successor(s): exiting1
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting1:
+; CHECK-NEXT:    Successor(s): join1
+; CHECK-EMPTY:
+; CHECK-NEXT:    join1:
+; CHECK-NEXT:      EMIT vp<[[VP4:%[0-9]+]]> = not ir<%ee1>
+; CHECK-NEXT:      EMIT vp<[[VP5:%[0-9]+]]> = logical-and ir<%c1>, vp<[[VP4]]>
+; CHECK-NEXT:      EMIT vp<[[VP6:%[0-9]+]]> = not ir<%c1>
+; CHECK-NEXT:      EMIT vp<[[VP7:%[0-9]+]]> = or vp<[[VP5]]>, vp<[[VP6]]>
+; CHECK-NEXT:      BLEND ir<%phi.join1> = ir<1>/vp<[[VP5]]> ir<0>/vp<[[VP6]]>
+; CHECK-NEXT:    Successor(s): exiting2
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting2:
+; CHECK-NEXT:      EMIT vp<[[VP8:%[0-9]+]]> = logical-and vp<[[VP7]]>, ir<%c2>
+; CHECK-NEXT:    Successor(s): join2
+; CHECK-EMPTY:
+; CHECK-NEXT:    join2:
+; CHECK-NEXT:      EMIT vp<[[VP9:%[0-9]+]]> = not ir<%ee2>
+; CHECK-NEXT:      EMIT vp<[[VP10:%[0-9]+]]> = logical-and vp<[[VP8]]>, vp<[[VP9]]>
+; CHECK-NEXT:      EMIT vp<[[VP11:%[0-9]+]]> = not ir<%c2>
+; CHECK-NEXT:      EMIT vp<[[VP12:%[0-9]+]]> = logical-and vp<[[VP7]]>, vp<[[VP11]]>
+; CHECK-NEXT:      EMIT vp<[[VP13:%[0-9]+]]> = or vp<[[VP10]]>, vp<[[VP12]]>
+; CHECK-NEXT:      BLEND ir<%phi.join2> = ir<1>/vp<[[VP10]]> ir<0>/vp<[[VP12]]>
+; CHECK-NEXT:    Successor(s): latch
+; CHECK-EMPTY:
+; CHECK-NEXT:    latch:
+; CHECK-NEXT:      BLEND ir<%phi1> = ir<1>/ir<%c1> ir<0>/vp<[[VP7]]>
+; CHECK-NEXT:      BLEND ir<%phi2> = ir<1>/vp<[[VP8]]> ir<0>/vp<[[VP13]]>
+; CHECK-NEXT:      EMIT ir<%gep1> = getelementptr ir<%p1>, ir<%iv>
+; CHECK-NEXT:      EMIT store ir<%phi1>, ir<%gep1>
+; CHECK-NEXT:      EMIT ir<%gep2> = getelementptr ir<%p2>, ir<%iv>
+; CHECK-NEXT:      EMIT store ir<%phi2>, ir<%gep2>
+; CHECK-NEXT:      EMIT ir<%iv.next> = add ir<%iv>, ir<1>
+; CHECK-NEXT:      EMIT ir<%ec> = icmp eq ir<%iv.next>, ir<%n>
+; CHECK-NEXT:      EMIT vp<%index.next> = add nuw vp<[[VP3]]>, vp<[[VP1:%[0-9]+]]>
+; CHECK-NEXT:      EMIT branch-on-count vp<%index.next>, vp<[[VP2:%[0-9]+]]>
+; CHECK-NEXT:    No successors
+; CHECK-NEXT:  }
+; CHECK-NEXT:  Successor(s): middle.block
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %latch]
+  br i1 %c1, label %exiting1, label %join1
+
+exiting1:
+  br i1 %ee1, label %latch, label %join1
+
+join1:
+  %phi.join1 = phi i32 [1, %exiting1], [0, %loop]
+  br i1 %c2, label %exiting2, label %join2
+
+exiting2:
+  br i1 %ee2, label %latch, label %join2
+
+join2:
+  %phi.join2 = phi i32 [1, %exiting2], [0, %join1]
+  br label %latch
+
+latch:
+  %phi1 = phi i32 [1, %exiting1], [%phi.join1, %exiting2], [%phi.join1, %join2]
+  %phi2 = phi i32 [poison, %exiting1], [1, %exiting2], [%phi.join2, %join2]
+  %gep1 = getelementptr i32, ptr %p1, i32 %iv
+  store i32 %phi1, ptr %gep1
+  %gep2 = getelementptr i32, ptr %p2, i32 %iv
+  store i32 %phi2, ptr %gep2
+  %iv.next = add i32 %iv, 1
+  %ec = icmp eq i32 %iv.next, %n
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+;     vector.body
+;        /    \
+;       /      \
+;   exiting1  exiting2
+;     / \      /    \
+;    /   \    /      \
+;   /    join1        \
+;  /    /    \         \
+; / exiting3 exiting4  /
+; \   \  \   /  /     /
+;  \   \ join2 /     /
+;   \   \  |  /     /
+;    +---latch-----+
+define void @four_exits_2x2_diamond(ptr %p1, ptr %p2, ptr %p3, ptr %p4, i1 %c1, i1 %c2, i1 %ee1, i1 %ee2, i1 %ee3, i1 %ee4, i32 %n) {
+; CHECK-LABEL: VPlan for loop in 'four_exits_2x2_diamond'
+; CHECK-NEXT:  <x1> vector loop: {
+; CHECK-NEXT:  vp<[[VP3:%[0-9]+]]> = CANONICAL-IV
+; CHECK-EMPTY:
+; CHECK-NEXT:    vector.body:
+; CHECK-NEXT:      ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<[[VP0:%[0-9]+]]>
+; CHECK-NEXT:    Successor(s): exiting2
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting2:
+; CHECK-NEXT:      EMIT vp<[[VP4:%[0-9]+]]> = not ir<%c1>
+; CHECK-NEXT:    Successor(s): exiting1
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting1:
+; CHECK-NEXT:    Successor(s): join1
+; CHECK-EMPTY:
+; CHECK-NEXT:    join1:
+; CHECK-NEXT:      EMIT vp<[[VP5:%[0-9]+]]> = not ir<%ee2>
+; CHECK-NEXT:      EMIT vp<[[VP6:%[0-9]+]]> = logical-and vp<[[VP4]]>, vp<[[VP5]]>
+; CHECK-NEXT:      EMIT vp<[[VP7:%[0-9]+]]> = not ir<%ee1>
+; CHECK-NEXT:      EMIT vp<[[VP8:%[0-9]+]]> = logical-and ir<%c1>, vp<[[VP7]]>
+; CHECK-NEXT:      EMIT vp<[[VP9:%[0-9]+]]> = or vp<[[VP6]]>, vp<[[VP8]]>
+; CHECK-NEXT:      BLEND ir<%phi1.join1> = ir<0>/vp<[[VP6]]> ir<1>/vp<[[VP8]]>
+; CHECK-NEXT:      BLEND ir<%phi2.join1> = ir<1>/vp<[[VP6]]> ir<0>/vp<[[VP8]]>
+; CHECK-NEXT:    Successor(s): exiting4
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting4:
+; CHECK-NEXT:      EMIT vp<[[VP10:%[0-9]+]]> = not ir<%c2>
+; CHECK-NEXT:      EMIT vp<[[VP11:%[0-9]+]]> = logical-and vp<[[VP9]]>, vp<[[VP10]]>
+; CHECK-NEXT:    Successor(s): exiting3
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting3:
+; CHECK-NEXT:      EMIT vp<[[VP12:%[0-9]+]]> = logical-and vp<[[VP9]]>, ir<%c2>
+; CHECK-NEXT:    Successor(s): join2
+; CHECK-EMPTY:
+; CHECK-NEXT:    join2:
+; CHECK-NEXT:      EMIT vp<[[VP13:%[0-9]+]]> = not ir<%ee4>
+; CHECK-NEXT:      EMIT vp<[[VP14:%[0-9]+]]> = logical-and vp<[[VP11]]>, vp<[[VP13]]>
+; CHECK-NEXT:      EMIT vp<[[VP15:%[0-9]+]]> = not ir<%ee3>
+; CHECK-NEXT:      EMIT vp<[[VP16:%[0-9]...
[truncated]

github-actions · 2026-06-05T09:08:02Z

🐧 Linux x64 Test Results

197445 tests passed
5446 tests skipped

✅ The build succeeded and all tests passed.

#201783 wants to optimize blend masks by peeking through the contents of other phi nodes. Currently we eagerly convert phis to blends in reverse post order, so switch to post order. That way, phis further down the CFG are converted first, while the phis they use as incoming values have not yet been turned into blends.
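A minimal sketch of the ordering argument follows. It assumes an acyclic CFG (back edge removed) and uses a hypothetical `postOrder` helper rather than VPlan's traversal utilities. A DFS post order emits each block only after all of its successors, which is the order produced by iterating `reverse(RPOT)`. The latch is therefore handled before join2, so a latch phi whose incoming value is a phi in join2 still sees that value as a phi.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: DFS post order over an acyclic CFG given as successor
// lists. Each block is appended only after all of its successors.
std::vector<int> postOrder(const std::vector<std::vector<int>> &Succs,
                           int Entry) {
  std::vector<bool> Visited(Succs.size(), false);
  std::vector<int> Order;
  std::function<void(int)> Visit = [&](int B) {
    Visited[B] = true;
    for (int S : Succs[B])
      if (!Visited[S])
        Visit(S);
    Order.push_back(B); // Every successor of B has already been emitted.
  };
  Visit(Entry);
  return Order;
}
```

The test uses the nested early-exit CFG with loop, exiting1, join1, exiting2, join2 and latch numbered 0-5. In that CFG, the latch (5) comes first and join2 (4) comes second.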

lukel97 · 2026-06-10T09:25:21Z

Unstacked now that #201782 has landed

eas · 2026-06-16T16:25:09Z

+    SmallVector<EdgeTy> OutEdges;
+    for (const VPBlockBase *Succ : VPBB->getSuccessors())
+      OutEdges.emplace_back(VPBB, cast<VPBasicBlock>(Succ));


nit: maybe use from_range ctor + map_range for the cast? Not sure if that would actually work.

Are you suggesting turning this into a mapped iterator instead of storing it in a vector? OutEdges gets traversed twice, so we'd end up constructing the edges twice if we used map_range.

I was thinking SmallVector<EdgeTy> OutEdges(from_range, map_range(...));, but apparently it doesn't even need/have from_range and I think we can just do

SmallVector<EdgeTy> OutEdges(map_range(...));

(although I didn't actually try, just read through some interfaces).

I gave it a try and I'm not sure it's more readable:

SmallVector<EdgeTy> OutEdges(
    map_range(VPBB->getSuccessors(), [&VPBB](const VPBlockBase *Succ) {
      return std::make_pair(VPBB, cast<VPBasicBlock>(Succ));
    }));

I also think it's nice to use emplace_back where possible; it makes it more obvious that it's avoiding a copy.

Either is fine with me. I don't know if mapped_iterator is random access or not, or if SmallVector range ctor can pre-allocate storage for all elements, but theoretically that could be an argument for the ctor version.

After thinking about this for a bit, I don't think this is needed. If a phi doesn't postdominate an incoming block, that incoming block will have an outgoing edge with no value, so we won't propagate any further up from it anyway. The difference between this approach and llvm#184838 is that the latter performs a full inverse DFS to see which blocks are reachable, whereas this one only checks that the incoming values are the same at each post-dominance frontier. The test case phi_doesnt_postdom_incoming shows a scenario where the full inverse DFS approach could simplify the edges to just c1 and !c1, but this PR calculates conservative (but still correct) edges.
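The stopping condition can be illustrated with a small stand-alone version of the "all outgoing edges carry the same value" check. This is a hypothetical `getAllEqual` over a `std::map` of edges, not the PR's lambda, and it leaves out poison handling. If one successor of a block never reaches the phi, no value is recorded for that edge. The check then fails, and nothing is hoisted past that block.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>; // {From, To}

// Return the value shared by every outgoing edge of BB, or std::nullopt if
// any edge has no recorded value or two edges disagree. Edges that never
// reach the phi are simply absent from the map.
std::optional<int> getAllEqual(const std::map<Edge, int> &Edges, int BB,
                               const std::vector<int> &Succs) {
  std::optional<int> Common;
  for (int S : Succs) {
    auto It = Edges.find(Edge{BB, S});
    if (It == Edges.end())
      return std::nullopt; // This successor doesn't lead to the phi: stop.
    if (Common && *Common != It->second)
      return std::nullopt; // Successors lead to different values: stop.
    Common = It->second;
  }
  return Common;
}
```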

lukel97 · 2026-06-24T05:11:42Z

/test-suite

github-actions · 2026-06-24T07:05:49Z

test-suite diff from 9ee7bda...a6b156b: https://github.com/llvm/llvm-project/actions/runs/28076715654/artifacts/7842411900

lukel97 · 2026-06-25T08:18:37Z

Ping

llvmorg-github-actions bot added the vectorizers, llvm:analysis, and llvm:transforms labels Jun 5, 2026

lukel97 mentioned this pull request Jun 5, 2026

[VPlan] Use control flow to implement MaskedCond and preserve SSA #201784

Open

lukel97 requested review from Mel-Chen, arcbbb, ayalz, david-arm, eas and fhahn June 5, 2026 08:40

This was referenced Jun 5, 2026

[VPlan] Compute blend masks from minimum set of edge masks #184838

Closed

[VPlan] Insert VPBlendRecipes in post order. NFC #201782

Merged

lukel97 force-pushed the loop-vectorize/computeBlendMasks branch 2 times, most recently from 9366715 to aedc836 June 8, 2026 07:33

eas reviewed Jun 8, 2026

lukel97 force-pushed the loop-vectorize/computeBlendMasks branch from aedc836 to 672de9b June 9, 2026 11:19

lukel97 added 6 commits June 10, 2026 17:21

Precommit tests

3bdeab2

Simplify blend masks

e3b9c30

Remove unnecessary map_range

49eabde

Address review comments, add ASCII diagram and comments

3724b66

Remove peeking through phis

05ac6a4

Remove reduction phi check

de9c116

We don't need it for now

lukel97 force-pushed the loop-vectorize/computeBlendMasks branch from 672de9b to de9c116 June 10, 2026 09:23

lukel97 added 2 commits June 10, 2026 17:38

Fix assertion comment

d458213

Remove poison coalescing for now until we have test cases

0b553a5

This was referenced Jun 11, 2026

[VPlan] Peek through nested phi incoming values in computeBlendMasks #203164

Open

[VPlan] Use CFG to mask early exit loops with side-effects. NFC #203263

Open

Uncountable early exit loop vectorization with tail folding #179595

Open

eas reviewed Jun 16, 2026

lukel97 added 5 commits June 17, 2026 11:03

Merge branch 'main' of github.com:llvm/llvm-project into HEAD

168bf0a

Address review comments

a7081b6

* Reuse previous method in DominanceFrontier
* Replace GetAllEqual with a map_range

Add test case for switch with duplicate edges

af139d8

Rename createMaskDisjunction

25b8c28

Merge branch 'main' of github.com:llvm/llvm-project into HEAD

9ccf7a6
