[VPlan] Compute blend masks from minimum set of edge masks#201783

Open
lukel97 wants to merge 14 commits into llvm:main from lukel97:loop-vectorize/computeBlendMasks

Conversation

lukel97 commented Jun 5, 2026
Contributor

#201784 aims to preserve SSA in early exit loops, and in doing so inserts phi nodes. More phi nodes result in more VPBlendRecipes, so this PR optimizes the edge masks generated for those blend recipes to prevent regressions.

The idea is to compute a minimal set of edges that lead to each unique incoming value in a phi. It does this by iterating up the edges in the post dominance frontier till the outgoing edges no longer lead to the same value.

This is a simpler, less general version of #184838 since this can't optimize away edges that aren't postdominated by the phi. This is fine for the early exit use case though, since we only need to optimize phi nodes inserted in the latch.

The big advantage over #184838 is that it doesn't require several depth-first searches to compute the set of reachable nodes, and can be done entirely by iterating the post-dominator frontier.
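
The hoisting idea described above can be sketched on a toy CFG. This is a minimal standalone illustration, not the VPlan implementation: `hoistBlendEdges`, `Block`, `Edge` and `Cfg` are hypothetical names, values are plain integers, and the sketch walks up through direct predecessors instead of the post-dominance frontier, which is an assumption that only holds for an acyclic CFG where every block reaches the phi's block.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Block = std::string;
using Edge = std::pair<Block, Block>;
using Cfg = std::map<Block, std::vector<Block>>; // block -> successors

// Hypothetical helper: move each incoming value of a phi in `phiBlock` up to
// the highest edges that still determine it. `incoming` maps each predecessor
// of `phiBlock` to the value the phi takes along that edge.
std::map<Edge, int> hoistBlendEdges(const Cfg &succs, const Block &phiBlock,
                                    const std::map<Block, int> &incoming) {
  // Build predecessor lists once.
  std::map<Block, std::vector<Block>> preds;
  for (const auto &[from, tos] : succs)
    for (const Block &to : tos)
      preds[to].push_back(from);

  std::map<Edge, int> edges;
  std::vector<Block> worklist;
  for (const auto &[pred, value] : incoming) {
    edges[Edge{pred, phiBlock}] = value;
    worklist.push_back(pred);
  }

  while (!worklist.empty()) {
    Block block = worklist.back();
    worklist.pop_back();
    assert(succs.count(block) && "worklist blocks must have successors");

    // Check that every outgoing edge of `block` carries the same known value.
    std::optional<int> common;
    bool allEqual = true;
    for (const Block &out : succs.at(block)) {
      auto it = edges.find(Edge{block, out});
      if (it == edges.end() || (common && *common != it->second)) {
        allEqual = false;
        break;
      }
      common = it->second;
    }
    auto predIt = preds.find(block);
    if (!allEqual || !common || predIt == preds.end())
      continue; // Values differ, are unknown, or there is nowhere to hoist.

    // Same value on all outgoing edges: replace them with the incoming edges.
    for (const Block &out : succs.at(block))
      edges.erase(Edge{block, out});
    for (const Block &pred : predIt->second) {
      edges[Edge{pred, block}] = *common;
      worklist.push_back(pred);
    }
  }
  return edges;
}
```

For a CFG where `entry` branches to `a` or `b`, `a` branches to `x` or `y`, and `x`, `y`, `b` all jump to `latch`, a phi of `[1, x], [1, y], [0, b]` ends up with only two edges: `entry->a` for value 1 and `entry->b` for value 0.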

llvmorg-github-actions Bot added the vectorizers, llvm:analysis (includes value tracking, cost tables and constant folding), and llvm:transforms labels Jun 5, 2026
llvmorg-github-actions Bot commented Jun 5, 2026

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Stacked on #201782

Another PR aims to model the control flow of early exits explicitly, and in doing so insert phi nodes to preserve SSA. Inserting phi nodes results in more VPBlendRecipes, so this PR optimizes the edge masks generated for those blend recipes to prevent regressions.

The shape of the CFG and the phis that would be emitted are precommitted in the predicator-early-exit.ll test.

The idea is to compute a minimal set of edges that lead to each unique incoming value in a phi. It does this by iterating up the edges in the post dominance frontier till the outgoing edges no longer lead to the same value. It also recursively looks through the incoming edges of any values that are phi nodes themselves.

This is a simpler, less general version of #184838 since this requires the phi node to postdominate its incoming values. This is fine for the early exit use case though, since we only need to optimize phi nodes inserted in the latch.

The big advantage over #184838 is that it doesn't require several depth-first searches to compute the set of reachable nodes, and can be done entirely by iterating the post-dominator frontier.
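
As a sanity check on why replacing several edge masks with one higher edge mask is sound, the following sketch compares the two masks exhaustively. The CFG and the function names are hypothetical and not taken from the patch or its tests; the conditions are scalar booleans standing in for per-lane masks.

```cpp
#include <cassert>

// Hypothetical CFG (an assumption for illustration):
//   entry:   br c1 ? a : b
//   a:       br c2 ? x : y
//   x, y, b: br latch
//   latch:   phi [1, x], [1, y], [0, b]

// Mask for value 1 built from the phi's direct incoming edges x->latch and
// y->latch.
bool naiveMaskForOne(bool c1, bool c2) {
  bool edgeX = c1 && c2;  // path entry -> a -> x
  bool edgeY = c1 && !c2; // path entry -> a -> y
  return edgeX || edgeY;
}

// Mask for value 1 after hoisting to the single edge entry->a.
bool hoistedMaskForOne(bool c1) { return c1; }

// Exhaustively compare the two masks over every combination of conditions.
bool masksAgree() {
  for (int bits = 0; bits < 4; ++bits) {
    bool c1 = (bits & 1) != 0;
    bool c2 = (bits & 2) != 0;
    if (naiveMaskForOne(c1, c2) != hoistedMaskForOne(c1))
      return false;
  }
  return true;
}
```

Both masks select the same lanes, but the hoisted form needs no `or` and does not depend on `c2` at all.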


Patch is 34.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/201783.diff

9 Files Affected:

  • (modified) llvm/include/llvm/Analysis/DominanceFrontier.h (+1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h (+8)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp (+156-6)
  • (added) llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll (+331)
  • (modified) llvm/test/Transforms/LoopVectorize/VPlan/predicator.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+4-6)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+12-9)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+4-6)
diff --git a/llvm/include/llvm/Analysis/DominanceFrontier.h b/llvm/include/llvm/Analysis/DominanceFrontier.h
index fd38891e901e3..4a8ab96cf71a7 100644
--- a/llvm/include/llvm/Analysis/DominanceFrontier.h
+++ b/llvm/include/llvm/Analysis/DominanceFrontier.h
@@ -78,6 +78,7 @@ class DominanceFrontierBase {
   const_iterator end() const { return Frontiers.end(); }
   iterator find(BlockT *B) { return Frontiers.find(B); }
   const_iterator find(BlockT *B) const { return Frontiers.find(B); }
+  const_iterator find(const BlockT *B) const { return Frontiers.find(B); }
 
   /// print - Convert to human readable form
   ///
diff --git a/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h b/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
index 2864670f44913..1ad522880c709 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
@@ -18,6 +18,8 @@
 #include "VPlan.h"
 #include "VPlanCFG.h"
 #include "llvm/ADT/GraphTraits.h"
+#include "llvm/Analysis/DominanceFrontier.h"
+#include "llvm/Analysis/DominanceFrontierImpl.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/Support/GenericDomTree.h"
 #include "llvm/Support/GenericDomTreeConstruction.h"
@@ -67,5 +69,11 @@ template <>
 struct GraphTraits<const VPDomTreeNode *>
     : public DomTreeGraphTraitsBase<const VPDomTreeNode,
                                     VPDomTreeNode::const_iterator> {};
+
+class VPPostDominanceFrontier
+    : public DominanceFrontierBase<VPBlockBase, true> {
+public:
+  explicit VPPostDominanceFrontier(const DomTreeT &VPDT) { analyze(VPDT); }
+};
 } // namespace llvm
 #endif // LLVM_TRANSFORMS_VECTORIZE_VPLANDOMINATORTREE_H
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
index 2717b80e2eeaa..2ec3df8ccf8c1 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
@@ -34,6 +34,9 @@ class VPPredicator {
   /// Post-dominator tree for the VPlan.
   VPPostDominatorTree VPPDT;
 
+  /// Post-dominator frontier for the VPlan.
+  VPPostDominanceFrontier VPPDF;
+
   /// When we if-convert we need to create edge masks. We have to cache values
   /// so that we don't end up with exponential recursion/IR.
   using EdgeMaskCacheTy =
@@ -69,8 +72,19 @@ class VPPredicator {
     return EdgeMaskCache[{Src, Dst}] = Mask;
   }
 
+  using EdgeTy = std::pair<const VPBasicBlock *, const VPBasicBlock *>;
+
+  /// Compute the "furthest up" set of edges for each incoming value of \p Phi.
+  MapVector<EdgeTy, VPValue *> computeBlendEdges(VPPhi *Phi);
+
+  /// Given a set of \p Edges that lead to \p VPBB, return the OR of all edges
+  /// or an equivalent block in-mask.
+  VPValue *createMaskDisjunction(ArrayRef<EdgeTy> Edges, VPBasicBlock *VPBB);
+
+  DenseMap<const VPBasicBlock *, VPBasicBlock::iterator> InsertPoints;
+
 public:
-  VPPredicator(VPlan &Plan) : VPDT(Plan), VPPDT(Plan) {}
+  VPPredicator(VPlan &Plan) : VPDT(Plan), VPPDT(Plan), VPPDF(VPPDT) {}
 
   /// Returns the *entry* mask for \p VPBB.
   VPValue *getBlockInMask(const VPBasicBlock *VPBB) const {
@@ -136,6 +150,10 @@ void VPPredicator::createBlockInMask(VPBasicBlock *VPBB) {
   // Start inserting after the block's phis, which be replaced by blends later.
   Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
 
+  // Keep track of where in VPBB we are inserting the masks into.
+  scope_exit UpdateInsertPoint(
+      [this, &VPBB]() { InsertPoints[VPBB] = Builder.getInsertPoint(); });
+
   // Reuse the mask of the immediate dominator if the VPBB post-dominates the
   // immediate dominator.
   auto *IDom = VPDT.getNode(VPBB)->getIDom();
@@ -224,7 +242,117 @@ void VPPredicator::createSwitchEdgeMasks(const VPInstruction *SI) {
   setEdgeMask(Src, DefaultDst, DefaultMask);
 }
 
+// Compute the "furthest up" set of edges for each incoming value of a phi.
+//
+// Start by keeping track of what edges lead to which value. Then see if any
+// node has the same value for all outgoing edges. If so then propagate that
+// value up to every node it postdominates.
+MapVector<VPPredicator::EdgeTy, VPValue *>
+VPPredicator::computeBlendEdges(VPPhi *Phi) {
+  MapVector<EdgeTy, VPValue *> Edges;
+
+  // Mark the given edge as providing the value \p V.
+  auto AddEdge = [&Edges](const VPBlockBase *From, const VPBlockBase *To,
+                          VPValue *V) {
+    EdgeTy Edge = {cast<VPBasicBlock>(From), cast<VPBasicBlock>(To)};
+    assert((!Edges.contains(Edge) || Edges.lookup(Edge) == V) &&
+           "Clobbering an edge?");
+    Edges[Edge] = V;
+  };
+
+  for (auto [InVal, InVPBB] : Phi->incoming_values_and_blocks())
+    AddEdge(InVPBB, Phi->getParent(), InVal);
+
+  // The root phi must postdominate every incoming block. Also don't touch
+  // phis in a reduction chain since they need to be in a specific structure
+  // for handle*Reductions.
+  for (auto [InVal, InVPBB] : Phi->incoming_values_and_blocks())
+    if (!VPPDT.dominates(Phi->getParent(), InVPBB) ||
+        isa<VPReductionPHIRecipe>(InVal))
+      return Edges;
+
+  // Given a list of edges, check if they all have the same value and return it.
+  auto GetAllEqual = [&Edges](ArrayRef<EdgeTy> OutEdges) -> VPValue * {
+    VPValue *Common = nullptr;
+    for (EdgeTy E : OutEdges) {
+      VPValue *V = Edges.lookup(E);
+      if (!V)
+        return nullptr;
+      if (match(V, m_Poison()))
+        continue;
+      if (!Common)
+        Common = V;
+      else if (Common != V)
+        return nullptr;
+    }
+    return Common;
+  };
+
+  SetVector<const VPBlockBase *> Worklist(from_range, Phi->incoming_blocks());
+  while (!Worklist.empty()) {
+    auto *VPBB = cast<VPBasicBlock>(Worklist.pop_back_val());
+
+    // Check that all outgoing edges from VPBB have the same value.
+    SmallVector<EdgeTy> OutEdges;
+    for (const VPBlockBase *Succ : VPBB->getSuccessors())
+      OutEdges.emplace_back(VPBB, cast<VPBasicBlock>(Succ));
+    VPValue *Common = GetAllEqual(OutEdges);
+    if (!Common)
+      continue;
+
+    // They have the same value: we can move the edges up
+    for (EdgeTy Edge : OutEdges)
+      Edges.erase(Edge);
+
+    // Peek through phis that are postdominated by VPBB
+    if (auto *Phi = dyn_cast<VPPhi>(Common))
+      if (VPPDT.dominates(VPBB, Phi->getParent())) {
+        for (auto [InV, InVPBB] : Phi->incoming_values_and_blocks()) {
+          AddEdge(InVPBB, Phi->getParent(), InV);
+          Worklist.insert(InVPBB);
+        }
+        continue;
+      }
+
+    // Iterate up through the post dominance frontier
+    for (const VPBlockBase *Frontier : VPPDF.find(VPBB)->second) {
+      for (const VPBlockBase *FrontierSucc : Frontier->getSuccessors())
+        if (VPPDT.dominates(VPBB, FrontierSucc))
+          AddEdge(Frontier, FrontierSucc, Common);
+      Worklist.insert(cast<VPBasicBlock>(Frontier));
+    }
+  }
+
+  return Edges;
+}
+
+VPValue *VPPredicator::createMaskDisjunction(ArrayRef<EdgeTy> Edges,
+                                             VPBasicBlock *VPBB) {
+  auto Dsts = map_range(Edges, [](auto E) { return E.second; });
+  const VPBasicBlock *PostDom = *Dsts.begin();
+  for (const VPBasicBlock *VPBB : drop_begin(Dsts))
+    PostDom =
+        cast<VPBasicBlock>(VPPDT.findNearestCommonDominator(PostDom, VPBB));
+  assert(VPPDT.dominates(VPBB, PostDom) && "Edges don't postdominate VPBB");
+  if (PostDom != VPBB)
+    return getBlockInMask(PostDom);
+
+  VPValue *Mask = nullptr;
+  for (auto [Src, Dst] : Edges) {
+    VPValue *EdgeMask;
+    {
+      VPBuilder::InsertPointGuard Guard(Builder);
+      Builder.setInsertPoint(const_cast<VPBasicBlock *>(Dst),
+                             InsertPoints[Dst]);
+      EdgeMask = createEdgeMask(Src, Dst);
+    }
+    Mask = Mask ? Builder.createOr(Mask, EdgeMask) : EdgeMask;
+  }
+  return Mask;
+}
+
 void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
+  Builder.setInsertPoint(VPBB, InsertPoints[VPBB]);
   SmallVector<VPPhi *> Phis;
   for (VPRecipeBase &R : VPBB->phis())
     Phis.push_back(cast<VPPhi>(&R));
@@ -245,10 +373,30 @@ void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
       continue;
     }
 
+    MapVector<VPValue *, SmallVector<EdgeTy>> InValEdgesMap;
+    for (auto [Edge, Val] : computeBlendEdges(PhiR))
+      InValEdgesMap[Val].push_back(Edge);
+    auto InValEdges = InValEdgesMap.takeVector();
+
+    if (InValEdges.size() == 1) {
+      PhiR->replaceAllUsesWith(InValEdges[0].first);
+      PhiR->eraseFromParent();
+      continue;
+    }
+
+    // Sort the incoming value order to match PhiR as much as possible.
+    llvm::stable_sort(InValEdges, [&PhiR](auto &L, auto &R) {
+      auto InVs = PhiR->incoming_values();
+      return std::distance(InVs.begin(), find(InVs, L.first)) <
+             std::distance(InVs.begin(), find(InVs, R.first));
+    });
+
     SmallVector<VPValue *, 2> OperandsWithMask;
-    for (const auto &[InVPV, InVPBB] : PhiR->incoming_values_and_blocks()) {
+    for (const auto &[InVPV, Edges] : InValEdges) {
+      if (match(InVPV, m_Poison()))
+        continue;
       OperandsWithMask.push_back(InVPV);
-      OperandsWithMask.push_back(createEdgeMask(InVPBB, VPBB));
+      OperandsWithMask.push_back(createMaskDisjunction(Edges, VPBB));
     }
     PHINode *IRPhi = cast_or_null<PHINode>(PhiR->getUnderlyingValue());
     auto *Blend =
@@ -276,10 +424,8 @@ void VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
     // Introduce the mask for VPBB, which may introduce needed edge masks, and
     // convert all phi recipes of VPBB to blend recipes unless VPBB is the
     // header.
-    if (VPBB != Header) {
+    if (VPBB != Header)
       Predicator.createBlockInMask(VPBB);
-      Predicator.convertPhisToBlends(VPBB);
-    }
 
     VPValue *BlockMask = Predicator.getBlockInMask(VPBB);
     if (!BlockMask)
@@ -292,6 +438,10 @@ void VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
     }
   }
 
+  for (VPBlockBase *VPB : reverse(RPOT))
+    if (VPB != Header)
+      Predicator.convertPhisToBlends(cast<VPBasicBlock>(VPB));
+
   // Linearize the blocks of the loop into one serial chain.
   VPBlockBase *PrevVPBB = nullptr;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
diff --git a/llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll b/llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll
new file mode 100644
index 0000000000000..4d388a3be84e4
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll
@@ -0,0 +1,331 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt -disable-output < %s -p loop-vectorize -vplan-print-after=introduceMasksAndLinearize -vplan-print-vector-region-scope 2>&1 | FileCheck %s
+
+; Test various CFGs generated by early exits
+
+;     vector.body
+;        /   \
+;       /     \
+;   exiting1   \
+;     / \       \
+;    /   \       |
+;   /   join1    |
+;  /    /    \   |
+; / exiting2  \  |
+; \    |    \  \ |
+;  \   |     join2
+;   \  |    /
+;    \ |   /
+;     latch
+define void @multi_early_exit_predicated_nested(ptr %p1, ptr %p2, i1 %c1, i1 %c2, i1 %ee1, i1 %ee2, i32 %n) {
+; CHECK-LABEL: VPlan for loop in 'multi_early_exit_predicated_nested'
+; CHECK-NEXT:  <x1> vector loop: {
+; CHECK-NEXT:  vp<[[VP3:%[0-9]+]]> = CANONICAL-IV
+; CHECK-EMPTY:
+; CHECK-NEXT:    vector.body:
+; CHECK-NEXT:      ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<[[VP0:%[0-9]+]]>
+; CHECK-NEXT:    Successor(s): exiting1
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting1:
+; CHECK-NEXT:    Successor(s): join1
+; CHECK-EMPTY:
+; CHECK-NEXT:    join1:
+; CHECK-NEXT:      EMIT vp<[[VP4:%[0-9]+]]> = not ir<%ee1>
+; CHECK-NEXT:      EMIT vp<[[VP5:%[0-9]+]]> = logical-and ir<%c1>, vp<[[VP4]]>
+; CHECK-NEXT:    Successor(s): exiting2
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting2:
+; CHECK-NEXT:      EMIT vp<[[VP6:%[0-9]+]]> = logical-and vp<[[VP5]]>, ir<%c2>
+; CHECK-NEXT:    Successor(s): join2
+; CHECK-EMPTY:
+; CHECK-NEXT:    join2:
+; CHECK-NEXT:      EMIT vp<[[VP7:%[0-9]+]]> = not ir<%ee2>
+; CHECK-NEXT:      EMIT vp<[[VP8:%[0-9]+]]> = logical-and vp<[[VP6]]>, vp<[[VP7]]>
+; CHECK-NEXT:      EMIT vp<[[VP9:%[0-9]+]]> = not ir<%c2>
+; CHECK-NEXT:      EMIT vp<[[VP10:%[0-9]+]]> = logical-and vp<[[VP5]]>, vp<[[VP9]]>
+; CHECK-NEXT:      EMIT vp<[[VP11:%[0-9]+]]> = or vp<[[VP8]]>, vp<[[VP10]]>
+; CHECK-NEXT:      EMIT vp<[[VP12:%[0-9]+]]> = not ir<%c1>
+; CHECK-NEXT:      EMIT vp<[[VP13:%[0-9]+]]> = or vp<[[VP11]]>, vp<[[VP12]]>
+; CHECK-NEXT:      EMIT vp<[[VP14:%[0-9]+]]> = or vp<[[VP8]]>, vp<[[VP10]]>
+; CHECK-NEXT:      BLEND ir<%phi1.join2> = ir<1>/vp<[[VP14]]> ir<0>/vp<[[VP12]]>
+; CHECK-NEXT:      EMIT vp<[[VP15:%[0-9]+]]> = or vp<[[VP10]]>, vp<[[VP12]]>
+; CHECK-NEXT:      BLEND ir<%phi2.join2> = ir<1>/vp<[[VP8]]> ir<0>/vp<[[VP15]]>
+; CHECK-NEXT:    Successor(s): latch
+; CHECK-EMPTY:
+; CHECK-NEXT:    latch:
+; CHECK-NEXT:      BLEND ir<%phi1> = ir<1>/ir<%c1> ir<0>/vp<[[VP13]]>
+; CHECK-NEXT:      BLEND ir<%phi2> = ir<1>/vp<[[VP6]]> ir<0>/vp<[[VP13]]>
+; CHECK-NEXT:      EMIT ir<%gep1> = getelementptr ir<%p1>, ir<%iv>
+; CHECK-NEXT:      EMIT store ir<%phi1>, ir<%gep1>
+; CHECK-NEXT:      EMIT ir<%gep2> = getelementptr ir<%p2>, ir<%iv>
+; CHECK-NEXT:      EMIT store ir<%phi2>, ir<%gep2>
+; CHECK-NEXT:      EMIT ir<%iv.next> = add ir<%iv>, ir<1>
+; CHECK-NEXT:      EMIT ir<%ec> = icmp eq ir<%iv.next>, ir<%n>
+; CHECK-NEXT:      EMIT vp<%index.next> = add nuw vp<[[VP3]]>, vp<[[VP1:%[0-9]+]]>
+; CHECK-NEXT:      EMIT branch-on-count vp<%index.next>, vp<[[VP2:%[0-9]+]]>
+; CHECK-NEXT:    No successors
+; CHECK-NEXT:  }
+; CHECK-NEXT:  Successor(s): middle.block
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %latch]
+  br i1 %c1, label %exiting1, label %join2
+
+exiting1:
+  br i1 %ee1, label %latch, label %join1
+
+join1:
+  br i1 %c2, label %exiting2, label %join2
+
+exiting2:
+  br i1 %ee2, label %latch, label %join2
+
+join2:
+  %phi1.join2 = phi i32 [1, %exiting2], [1, %join1], [0, %loop]
+  %phi2.join2 = phi i32 [1, %exiting2], [0, %join1], [0, %loop]
+  br label %latch
+
+latch:
+  %phi1 = phi i32 [1, %exiting1], [1, %exiting2], [%phi1.join2, %join2]
+  %phi2 = phi i32 [poison, %exiting1], [1, %exiting2], [%phi2.join2, %join2]
+  %gep1 = getelementptr i32, ptr %p1, i32 %iv
+  store i32 %phi1, ptr %gep1
+  %gep2 = getelementptr i32, ptr %p2, i32 %iv
+  store i32 %phi2, ptr %gep2
+  %iv.next = add i32 %iv, 1
+  %ec = icmp eq i32 %iv.next, %n
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+;     vector.body
+;        /   \
+;       /     \
+;   exiting1  /
+;     / \    /
+;    /   \  /
+;   /   join1
+;  /    /    \
+; / exiting2  \
+; \    |    \  \
+;  \   |     join2
+;   \  |    /
+;    \ |   /
+;     latch
+define void @multi_early_exit_predicated_not_nested(ptr %p1, ptr %p2, i1 %c1, i1 %c2, i1 %ee1, i1 %ee2, i32 %n) {
+; CHECK-LABEL: VPlan for loop in 'multi_early_exit_predicated_not_nested'
+; CHECK-NEXT:  <x1> vector loop: {
+; CHECK-NEXT:  vp<[[VP3:%[0-9]+]]> = CANONICAL-IV
+; CHECK-EMPTY:
+; CHECK-NEXT:    vector.body:
+; CHECK-NEXT:      ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<[[VP0:%[0-9]+]]>
+; CHECK-NEXT:    Successor(s): exiting1
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting1:
+; CHECK-NEXT:    Successor(s): join1
+; CHECK-EMPTY:
+; CHECK-NEXT:    join1:
+; CHECK-NEXT:      EMIT vp<[[VP4:%[0-9]+]]> = not ir<%ee1>
+; CHECK-NEXT:      EMIT vp<[[VP5:%[0-9]+]]> = logical-and ir<%c1>, vp<[[VP4]]>
+; CHECK-NEXT:      EMIT vp<[[VP6:%[0-9]+]]> = not ir<%c1>
+; CHECK-NEXT:      EMIT vp<[[VP7:%[0-9]+]]> = or vp<[[VP5]]>, vp<[[VP6]]>
+; CHECK-NEXT:      BLEND ir<%phi.join1> = ir<1>/vp<[[VP5]]> ir<0>/vp<[[VP6]]>
+; CHECK-NEXT:    Successor(s): exiting2
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting2:
+; CHECK-NEXT:      EMIT vp<[[VP8:%[0-9]+]]> = logical-and vp<[[VP7]]>, ir<%c2>
+; CHECK-NEXT:    Successor(s): join2
+; CHECK-EMPTY:
+; CHECK-NEXT:    join2:
+; CHECK-NEXT:      EMIT vp<[[VP9:%[0-9]+]]> = not ir<%ee2>
+; CHECK-NEXT:      EMIT vp<[[VP10:%[0-9]+]]> = logical-and vp<[[VP8]]>, vp<[[VP9]]>
+; CHECK-NEXT:      EMIT vp<[[VP11:%[0-9]+]]> = not ir<%c2>
+; CHECK-NEXT:      EMIT vp<[[VP12:%[0-9]+]]> = logical-and vp<[[VP7]]>, vp<[[VP11]]>
+; CHECK-NEXT:      EMIT vp<[[VP13:%[0-9]+]]> = or vp<[[VP10]]>, vp<[[VP12]]>
+; CHECK-NEXT:      BLEND ir<%phi.join2> = ir<1>/vp<[[VP10]]> ir<0>/vp<[[VP12]]>
+; CHECK-NEXT:    Successor(s): latch
+; CHECK-EMPTY:
+; CHECK-NEXT:    latch:
+; CHECK-NEXT:      BLEND ir<%phi1> = ir<1>/ir<%c1> ir<0>/vp<[[VP7]]>
+; CHECK-NEXT:      BLEND ir<%phi2> = ir<1>/vp<[[VP8]]> ir<0>/vp<[[VP13]]>
+; CHECK-NEXT:      EMIT ir<%gep1> = getelementptr ir<%p1>, ir<%iv>
+; CHECK-NEXT:      EMIT store ir<%phi1>, ir<%gep1>
+; CHECK-NEXT:      EMIT ir<%gep2> = getelementptr ir<%p2>, ir<%iv>
+; CHECK-NEXT:      EMIT store ir<%phi2>, ir<%gep2>
+; CHECK-NEXT:      EMIT ir<%iv.next> = add ir<%iv>, ir<1>
+; CHECK-NEXT:      EMIT ir<%ec> = icmp eq ir<%iv.next>, ir<%n>
+; CHECK-NEXT:      EMIT vp<%index.next> = add nuw vp<[[VP3]]>, vp<[[VP1:%[0-9]+]]>
+; CHECK-NEXT:      EMIT branch-on-count vp<%index.next>, vp<[[VP2:%[0-9]+]]>
+; CHECK-NEXT:    No successors
+; CHECK-NEXT:  }
+; CHECK-NEXT:  Successor(s): middle.block
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [0, %entry], [%iv.next, %latch]
+  br i1 %c1, label %exiting1, label %join1
+
+exiting1:
+  br i1 %ee1, label %latch, label %join1
+
+join1:
+  %phi.join1 = phi i32 [1, %exiting1], [0, %loop]
+  br i1 %c2, label %exiting2, label %join2
+
+exiting2:
+  br i1 %ee2, label %latch, label %join2
+
+join2:
+  %phi.join2 = phi i32 [1, %exiting2], [0, %join1]
+  br label %latch
+
+latch:
+  %phi1 = phi i32 [1, %exiting1], [%phi.join1, %exiting2], [%phi.join1, %join2]
+  %phi2 = phi i32 [poison, %exiting1], [1, %exiting2], [%phi.join2, %join2]
+  %gep1 = getelementptr i32, ptr %p1, i32 %iv
+  store i32 %phi1, ptr %gep1
+  %gep2 = getelementptr i32, ptr %p2, i32 %iv
+  store i32 %phi2, ptr %gep2
+  %iv.next = add i32 %iv, 1
+  %ec = icmp eq i32 %iv.next, %n
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+;     vector.body
+;        /    \
+;       /      \
+;   exiting1  exiting2
+;     / \      /    \
+;    /   \    /      \
+;   /    join1        \
+;  /    /    \         \
+; / exiting3 exiting4  /
+; \   \  \   /  /     /
+;  \   \ join2 /     /
+;   \   \  |  /     /
+;    +---latch-----+
+define void @four_exits_2x2_diamond(ptr %p1, ptr %p2, ptr %p3, ptr %p4, i1 %c1, i1 %c2, i1 %ee1, i1 %ee2, i1 %ee3, i1 %ee4, i32 %n) {
+; CHECK-LABEL: VPlan for loop in 'four_exits_2x2_diamond'
+; CHECK-NEXT:  <x1> vector loop: {
+; CHECK-NEXT:  vp<[[VP3:%[0-9]+]]> = CANONICAL-IV
+; CHECK-EMPTY:
+; CHECK-NEXT:    vector.body:
+; CHECK-NEXT:      ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<[[VP0:%[0-9]+]]>
+; CHECK-NEXT:    Successor(s): exiting2
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting2:
+; CHECK-NEXT:      EMIT vp<[[VP4:%[0-9]+]]> = not ir<%c1>
+; CHECK-NEXT:    Successor(s): exiting1
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting1:
+; CHECK-NEXT:    Successor(s): join1
+; CHECK-EMPTY:
+; CHECK-NEXT:    join1:
+; CHECK-NEXT:      EMIT vp<[[VP5:%[0-9]+]]> = not ir<%ee2>
+; CHECK-NEXT:      EMIT vp<[[VP6:%[0-9]+]]> = logical-and vp<[[VP4]]>, vp<[[VP5]]>
+; CHECK-NEXT:      EMIT vp<[[VP7:%[0-9]+]]> = not ir<%ee1>
+; CHECK-NEXT:      EMIT vp<[[VP8:%[0-9]+]]> = logical-and ir<%c1>, vp<[[VP7]]>
+; CHECK-NEXT:      EMIT vp<[[VP9:%[0-9]+]]> = or vp<[[VP6]]>, vp<[[VP8]]>
+; CHECK-NEXT:      BLEND ir<%phi1.join1> = ir<0>/vp<[[VP6]]> ir<1>/vp<[[VP8]]>
+; CHECK-NEXT:      BLEND ir<%phi2.join1> = ir<1>/vp<[[VP6]]> ir<0>/vp<[[VP8]]>
+; CHECK-NEXT:    Successor(s): exiting4
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting4:
+; CHECK-NEXT:      EMIT vp<[[VP10:%[0-9]+]]> = not ir<%c2>
+; CHECK-NEXT:      EMIT vp<[[VP11:%[0-9]+]]> = logical-and vp<[[VP9]]>, vp<[[VP10]]>
+; CHECK-NEXT:    Successor(s): exiting3
+; CHECK-EMPTY:
+; CHECK-NEXT:    exiting3:
+; CHECK-NEXT:      EMIT vp<[[VP12:%[0-9]+]]> = logical-and vp<[[VP9]]>, ir<%c2>
+; CHECK-NEXT:    Successor(s): join2
+; CHECK-EMPTY:
+; CHECK-NEXT:    join2:
+; CHECK-NEXT:      EMIT vp<[[VP13:%[0-9]+]]> = not ir<%ee4>
+; CHECK-NEXT:      EMIT vp<[[VP14:%[0-9]+]]> = logical-and vp<[[VP11]]>, vp<[[VP13]]>
+; CHECK-NEXT:      EMIT vp<[[VP15:%[0-9]+]]> = not ir<%ee3>
+; CHECK-NEXT:      EMIT vp<[[VP16:%[0-9]...
[truncated]

github-actions Bot commented Jun 5, 2026

🐧 Linux x64 Test Results

  • 197445 tests passed
  • 5446 tests skipped

✅ The build succeeded and all tests passed.

@lukel97 lukel97 force-pushed the loop-vectorize/computeBlendMasks branch 2 times, most recently from 9366715 to aedc836 Compare June 8, 2026 07:33
@lukel97 lukel97 force-pushed the loop-vectorize/computeBlendMasks branch from aedc836 to 672de9b Compare June 9, 2026 11:19
lukel97 added a commit that referenced this pull request Jun 10, 2026
#201783 wants to optimize blend masks by peeking through the contents of
other phi nodes. Currently we eagerly convert phis to blends in reverse
post order, so switch it to post order so that phis at the bottom can
see the phis in their uses.
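
The ordering change described in this commit message can be illustrated with a small traversal sketch. It is a standalone approximation with hypothetical names (`postOrder`, `reversePostOrder`, `Cfg`), not the VPlan traversal utilities: in post order the bottom-most block is visited first, so a phi there is processed while the phis feeding it are still phis.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Block = std::string;
using Cfg = std::map<Block, std::vector<Block>>; // block -> successors

// Depth-first walk that records a block after all of its successors.
void postOrderVisit(const Cfg &succs, const Block &block,
                    std::set<Block> &seen, std::vector<Block> &order) {
  if (!seen.insert(block).second)
    return; // Already visited.
  auto it = succs.find(block);
  if (it != succs.end())
    for (const Block &succ : it->second)
      postOrderVisit(succs, succ, seen, order);
  order.push_back(block);
}

std::vector<Block> postOrder(const Cfg &succs, const Block &entry) {
  std::set<Block> seen;
  std::vector<Block> order;
  postOrderVisit(succs, entry, seen, order);
  return order;
}

// Reverse post order is the post order read backwards.
std::vector<Block> reversePostOrder(const Cfg &succs, const Block &entry) {
  std::vector<Block> order = postOrder(succs, entry);
  std::reverse(order.begin(), order.end());
  return order;
}
```

For a diamond `entry -> {a, b} -> latch`, `postOrder` starts at `latch` while `reversePostOrder` starts at `entry` and ends at `latch`.
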
@lukel97 lukel97 force-pushed the loop-vectorize/computeBlendMasks branch from 672de9b to de9c116 Compare June 10, 2026 09:23
lukel97 commented Jun 10, 2026

Contributor Author

Unstacked now that #201782 has landed

Comment thread llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
Comment on lines +302 to +304
SmallVector<EdgeTy> OutEdges;
for (const VPBlockBase *Succ : VPBB->getSuccessors())
OutEdges.emplace_back(VPBB, cast<VPBasicBlock>(Succ));

Contributor

nit: maybe use from_range ctor + map_range for the cast? Not sure if that would actually work.

Contributor Author

Are you suggesting turning this into a mapped iterator instead of storing it in a vector? OutEdges gets traversed twice, so we'll end up constructing the edges twice if we use map_range.

Contributor

I was thinking SmallVector<EdgeTy> OutEdges(from_range, map_range(...));, but apparently it doesn't even need/have from_range and I think we can just do

SmallVector<EdgeTy> OutEdges(map_range(...));

(although I didn't actually try, just read through some interfaces).

Contributor Author

I gave it a try and I'm not sure it's more readable:

    SmallVector<EdgeTy> OutEdges(
        map_range(VPBB->getSuccessors(), [&VPBB](const VPBlockBase *Succ) {
          return std::make_pair(VPBB, cast<VPBasicBlock>(Succ));
        }));

I also think it's nice to use emplace_back where possible, since it's more obvious that it's avoiding a copy.

Contributor

Either is fine with me. I don't know if mapped_iterator is random access or not, or if SmallVector range ctor can pre-allocate storage for all elements, but theoretically that could be an argument for the ctor version.
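
The two styles discussed in this thread can be compared with a standard-library analog. This sketch deliberately uses `std::vector` and `std::transform` rather than LLVM's `SmallVector` and `map_range`, so it says nothing about whether the `SmallVector` range constructor pre-allocates; the function names and the string-based `Edge` type are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;

// Explicit loop: each edge is constructed in place with emplace_back.
std::vector<Edge> outEdgesLoop(const std::string &src,
                               const std::vector<std::string> &succs) {
  std::vector<Edge> edges;
  edges.reserve(succs.size()); // One allocation up front.
  for (const std::string &succ : succs)
    edges.emplace_back(src, succ);
  return edges;
}

// Range-style construction: map every successor to an edge.
std::vector<Edge> outEdgesTransform(const std::string &src,
                                    const std::vector<std::string> &succs) {
  std::vector<Edge> edges;
  edges.reserve(succs.size());
  std::transform(succs.begin(), succs.end(), std::back_inserter(edges),
                 [&src](const std::string &succ) { return Edge(src, succ); });
  return edges;
}
```

Both functions materialize the edges once, so later code can traverse the vector twice without rebuilding the pairs, which is the concern raised above about iterating a lazily mapped range more than once.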

lukel97 added 5 commits June 17, 2026 11:03
* Reuse previous method in DominanceFrontier
* Replace GetAllEqual with a map_range
After thinking about this for a bit this isn't needed.

If a phi doesn't postdominate an incoming block, the incoming block
will have an outgoing edge with no value. So we won't propagate any
further up that incoming block anyway.

What differs between this approach and
llvm#184838 is that the latter
performs a full inverse DFS to see what blocks are reachable, whereas this
just checks that the incoming values are the same at each
postdominance frontier.

The test case phi_doesnt_postdom_incoming shows a scenario where the
full inverse DFS approach could simplify the edge to just c1 and !c1,
but we calculate the conservative (but still correct) edges in this
PR.
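
The argument in this commit message can be illustrated with a small sketch of the "all outgoing edges agree" check. It is a simplified stand-in with hypothetical names (`commonValue`, string-based `Edge`, integer values), and it omits the poison handling that the patch's own check performs: an outgoing edge that does not lead towards the phi has no recorded value, so the check fails and nothing is propagated further up.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;

// Return the single value shared by all `outEdges`, or nullopt if any edge
// has no recorded value or two edges disagree.
std::optional<int> commonValue(const std::map<Edge, int> &edgeValues,
                               const std::vector<Edge> &outEdges) {
  std::optional<int> common;
  for (const Edge &edge : outEdges) {
    auto it = edgeValues.find(edge);
    if (it == edgeValues.end())
      return std::nullopt; // Edge leads away from the phi: no value known.
    if (common && *common != it->second)
      return std::nullopt; // Two edges carry different values.
    common = it->second;
  }
  return common;
}
```

With only `in->phi` recorded, a block `in` that also branches to some other block yields no common value, which is the conservative outcome the message describes.
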
lukel97 commented Jun 24, 2026

Contributor Author

/test-suite

lukel97 commented Jun 25, 2026

Contributor Author

Ping
