[VPlan] Move tail folding out of VPlanPredicator. NFC by lukel97 · Pull Request #176143 · llvm/llvm-project

lukel97 · 2026-01-15T12:13:52Z

Currently the logic for introducing a header mask and predicating the vector loop region is done inside introduceMasksAndLinearize.

This splits the tail folding part out into an individual VPlan transform so that VPlanPredicator.cpp doesn't need to worry about tail folding, which seemed to be a temporary measure according to a comment in VPlanTransforms.h.

To perform tail folding independently, this splits the "body" of the vector loop region between the phis in the header and the branch + iv increment in the latch:

Before:

+-------------------------------------------+
|%iv = ...                                  |
|...                                        |
|%iv.next = add %iv, vfxuf                  |
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+

After:

+-------------------------------------------+
|%iv = ...                                  |
|%wide.iv = widen-canonical-iv ...          |
|%header-mask = icmp ule %wide.iv, BTC      |---+
|branch-on-cond %header-mask                |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|...                                        |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|%iv.next = add %iv, vfxuf                  |<--+
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+

Phis are then inserted in the latch for any value in the loop body that have outside uses, with poison as their incoming value from the header edge.

The motivation for this is to allow us to share the same "predicate all successor blocks" type of predication we do for tail folding, but for early-exit loops in #172454. This may also allow us to directly emit an EVL based header mask, instead of having to match + transform the existing header mask in addExplicitVectorLength.

This also allows us to eventually handle recurrences in the same transform, avoiding the need to special case tail folding in addReductionResultComputation.

Currently the logic for introducing a header mask and predicating the vector loop region is done inside introduceMasksAndLinearize. This splits the tail folding part out into an individual VPlan transform so that VPlanPredicator.cpp doesn't need to worry about tail folding, which seemed to be a temporary measure according to a comment in VPlanTransforms.h. To perform tail folding independently, this splits the "body" of the vector loop region between the phis in the header and the branch + iv increment in the latch: Before: +-------------------------------------------+ |%iv = ... | |... | |%iv.next = add %iv, vfxuf | |branch-on-count %iv.next, vector-trip-count| +-------------------------------------------+ After: +-------------------------------------------+ |%iv = ... | |%wide.iv = widen-canonical-iv ... | |%header-mask = icmp ult %wide.iv, BTC |---+ |branch-on-cond %header-mask | | +-------------------------------------------+ | | | v | +-------------------------------------------+ | |... | | +-------------------------------------------+ | | | v | +-------------------------------------------+ | |%iv.next = add %iv, vfxuf |<--+ |branch-on-count %iv.next, vector-trip-count| +-------------------------------------------+ The motivation for this is to align tail folding predication with early-exit predication in llvm#172454. This will also allow us to directly emit an EVL based header mask, instead of having to match + transform the existing header mask in addExplicitVectorLength.

llvmbot · 2026-01-15T12:14:24Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Currently the logic for introducing a header mask and predicating the vector loop region is done inside introduceMasksAndLinearize.

This splits the tail folding part out into an individual VPlan transform so that VPlanPredicator.cpp doesn't need to worry about tail folding, which seemed to be a temporary measure according to a comment in VPlanTransforms.h.

To perform tail folding independently, this splits the "body" of the vector loop region between the phis in the header and the branch + iv increment in the latch:

Before:

+-------------------------------------------+
|%iv = ...                                  |
|...                                        |
|%iv.next = add %iv, vfxuf                  |
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+

After:

+-------------------------------------------+
|%iv = ...                                  |
|%wide.iv = widen-canonical-iv ...          |
|%header-mask = icmp ult %wide.iv, BTC      |---+
|branch-on-cond %header-mask                |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|...                                        |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|%iv.next = add %iv, vfxuf                  |&lt;--+
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+

The motivation for this is to align tail folding predication with early-exit predication in #172454. This will also allow us to directly emit an EVL based header mask, instead of having to match + transform the existing header mask in addExplicitVectorLength.

Full diff: https://github.com/llvm/llvm-project/pull/176143.diff

5 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+6-3)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+5)
(modified) llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp (+55)
(modified) llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp (+3-64)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+36-6)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3f1e12e5d1cd0..1dcd33828445a 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8463,11 +8463,13 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
     InterleaveGroups.insert(IG);
   }
 
+  if (CM.foldTailByMasking())
+    VPlanTransforms::foldTailByMasking(*Plan);
+
   // ---------------------------------------------------------------------------
   // Predicate and linearize the top-level loop region.
   // ---------------------------------------------------------------------------
-  auto BlockMaskCache = VPlanTransforms::introduceMasksAndLinearize(
-      *Plan, CM.foldTailByMasking());
+  auto BlockMaskCache = VPlanTransforms::introduceMasksAndLinearize(*Plan);
 
   // ---------------------------------------------------------------------------
   // Construct wide recipes and apply predication for original scalar
@@ -8726,7 +8728,8 @@ void LoopVectorizationPlanner::addReductionResultComputation(
     auto *RR = dyn_cast<VPReductionRecipe>(OrigExitingVPV->getDefiningRecipe());
     if (!PhiR->isInLoop() && CM.foldTailByMasking() &&
         (!RR || !RR->isPartialReduction())) {
-      VPValue *Cond = RecipeBuilder.getBlockInMask(PhiR->getParent());
+      VPValue *Cond = RecipeBuilder.getBlockInMask(
+          cast<VPBasicBlock>(PhiR->getParent()->getSuccessors().front()));
       std::optional<FastMathFlags> FMFs =
           PhiTy->isFloatingPointTy()
               ? std::make_optional(RdxDesc.getFastMathFlags())
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index a6a46e36b397d..d8817e1d481a7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -566,6 +566,11 @@ VPBasicBlock *VPBasicBlock::splitAt(iterator SplitAt) {
   auto *SplitBlock = getPlan()->createVPBasicBlock(getName() + ".split");
   VPBlockUtils::insertBlockAfter(SplitBlock, this);
 
+  // If this is the exiting block, make the split the new exiting block.
+  auto *ParentRegion = getParent();
+  if (ParentRegion && ParentRegion->getExiting() == this)
+    ParentRegion->setExiting(SplitBlock);
+
   // Finally, move the recipes starting at SplitAt to new block.
   for (VPRecipeBase &ToMove :
        make_early_inc_range(make_range(SplitAt, this->end())))
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 7ee133545eeb9..97ad127658320 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -958,6 +958,61 @@ void VPlanTransforms::createLoopRegions(VPlan &Plan) {
   TopRegion->getEntryBasicBlock()->setName("vector.body");
 }
 
+void VPlanTransforms::foldTailByMasking(VPlan &Plan) {
+  assert(Plan.getExitBlocks().size() == 1 &&
+         "only a single-exit block is supported currently");
+  assert(Plan.getExitBlocks().front()->getSinglePredecessor() ==
+             Plan.getMiddleBlock() &&
+         "the exit block must have middle block as single predecessor");
+
+  VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
+  VPBasicBlock *Header = LoopRegion->getEntryBasicBlock();
+
+  Header->splitAt(Header->getFirstNonPhi());
+
+  // Create the header mask, insert it in the header and branch on it.
+  auto *IV =
+      new VPWidenCanonicalIVRecipe(Header->getParent()->getCanonicalIV());
+  VPBuilder Builder(Header, Header->getFirstNonPhi());
+  Builder.insert(IV);
+  VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
+  VPValue *HeaderMask = Builder.createICmp(CmpInst::ICMP_ULE, IV, BTC);
+  Builder.createNaryOp(VPInstruction::BranchOnCond, HeaderMask);
+
+  VPBasicBlock *Latch = LoopRegion->getExitingBasicBlock();
+  VPValue *IVInc;
+  [[maybe_unused]] bool TermBranchOnCount =
+      match(Latch->getTerminator(),
+            m_BranchOnCount(m_VPValue(IVInc),
+                            m_Specific(&Plan.getVectorTripCount())));
+  assert(TermBranchOnCount &&
+         match(IVInc, m_Add(m_Specific(LoopRegion->getCanonicalIV()),
+                            m_Specific(&Plan.getVFxUF()))) &&
+         "Unexpected terminator");
+
+  // Split the latch at the IV update, and branch to it from the header mask.
+  VPBasicBlock *LatchSplit =
+      Latch->splitAt(IVInc->getDefiningRecipe()->getIterator());
+  VPBlockUtils::connectBlocks(Header, LatchSplit);
+
+  // Any extract of the last element must be updated to extract from the last
+  // active lane of the header mask instead (i.e., the lane corresponding to the
+  // last active iteration).
+  Builder.setInsertPoint(Plan.getMiddleBlock()->getTerminator());
+  for (VPRecipeBase &R : *Plan.getMiddleBlock()) {
+    VPValue *Op;
+    if (!match(&R, m_ExtractLastLane(m_ExtractLastPart(m_VPValue(Op)))))
+      continue;
+
+    // Compute the index of the last active lane.
+    VPValue *LastActiveLane =
+        Builder.createNaryOp(VPInstruction::LastActiveLane, HeaderMask);
+    auto *Ext =
+        Builder.createNaryOp(VPInstruction::ExtractLane, {LastActiveLane, Op});
+    R.getVPSingleValue()->replaceAllUsesWith(Ext);
+  }
+}
+
 // Likelyhood of bypassing the vectorized loop due to a runtime check block,
 // including memory overlap checks block and wrapping/unit-stride checks block.
 static constexpr uint32_t CheckBypassWeights[] = {1, 127};
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
index f7e7fc29bc203..96bee083875af 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
@@ -14,13 +14,11 @@
 #include "VPRecipeBuilder.h"
 #include "VPlan.h"
 #include "VPlanCFG.h"
-#include "VPlanPatternMatch.h"
 #include "VPlanTransforms.h"
 #include "VPlanUtils.h"
 #include "llvm/ADT/PostOrderIterator.h"
 
 using namespace llvm;
-using namespace VPlanPatternMatch;
 
 namespace {
 class VPPredicator {
@@ -73,9 +71,6 @@ class VPPredicator {
     return EdgeMaskCache.lookup({Src, Dst});
   }
 
-  /// Compute and return the mask for the vector loop header block.
-  void createHeaderMask(VPBasicBlock *HeaderVPBB, bool FoldTail);
-
   /// Compute and return the predicate of \p VPBB, assuming that the header
   /// block of the loop is set to True, or to the loop mask when tail folding.
   VPValue *createBlockInMask(VPBasicBlock *VPBB);
@@ -156,28 +151,6 @@ VPValue *VPPredicator::createBlockInMask(VPBasicBlock *VPBB) {
   return BlockMask;
 }
 
-void VPPredicator::createHeaderMask(VPBasicBlock *HeaderVPBB, bool FoldTail) {
-  if (!FoldTail) {
-    setBlockInMask(HeaderVPBB, nullptr);
-    return;
-  }
-
-  // Introduce the early-exit compare IV <= BTC to form header block mask.
-  // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
-  // constructing the desired canonical IV in the header block as its first
-  // non-phi instructions.
-
-  auto &Plan = *HeaderVPBB->getPlan();
-  auto *IV =
-      new VPWidenCanonicalIVRecipe(HeaderVPBB->getParent()->getCanonicalIV());
-  Builder.setInsertPoint(HeaderVPBB, HeaderVPBB->getFirstNonPhi());
-  Builder.insert(IV);
-
-  VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
-  VPValue *BlockMask = Builder.createICmp(CmpInst::ICMP_ULE, IV, BTC);
-  setBlockInMask(HeaderVPBB, BlockMask);
-}
-
 void VPPredicator::createSwitchEdgeMasks(VPInstruction *SI) {
   VPBasicBlock *Src = SI->getParent();
 
@@ -232,7 +205,8 @@ void VPPredicator::createSwitchEdgeMasks(VPInstruction *SI) {
 void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
   SmallVector<VPPhi *> Phis;
   for (VPRecipeBase &R : VPBB->phis())
-    Phis.push_back(cast<VPPhi>(&R));
+    if (auto *PhiR = dyn_cast<VPPhi>(&R))
+      Phis.push_back(PhiR);
   for (VPPhi *PhiR : Phis) {
     // The non-header Phi is converted into a Blend recipe below,
     // so we don't have to worry about the insertion order and we can just use
@@ -261,7 +235,7 @@ void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
 }
 
 DenseMap<VPBasicBlock *, VPValue *>
-VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan, bool FoldTail) {
+VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
   VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
   // Scan the body of the loop in a topological order to visit each basic block
   // after having visited its predecessor basic blocks.
@@ -272,14 +246,6 @@ VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan, bool FoldTail) {
   for (VPBlockBase *VPB : RPOT) {
     // Non-outer regions with VPBBs only are supported at the moment.
     auto *VPBB = cast<VPBasicBlock>(VPB);
-    // Introduce the mask for VPBB, which may introduce needed edge masks, and
-    // convert all phi recipes of VPBB to blend recipes unless VPBB is the
-    // header.
-    if (VPBB == Header) {
-      Predicator.createHeaderMask(Header, FoldTail);
-      continue;
-    }
-
     Predicator.createBlockInMask(VPBB);
     Predicator.convertPhisToBlends(VPBB);
   }
@@ -300,32 +266,5 @@ VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan, bool FoldTail) {
 
     PrevVPBB = VPBB;
   }
-
-  // If we folded the tail and introduced a header mask, any extract of the
-  // last element must be updated to extract from the last active lane of the
-  // header mask instead (i.e., the lane corresponding to the last active
-  // iteration).
-  if (FoldTail) {
-    assert(Plan.getExitBlocks().size() == 1 &&
-           "only a single-exit block is supported currently");
-    assert(Plan.getExitBlocks().front()->getSinglePredecessor() ==
-               Plan.getMiddleBlock() &&
-           "the exit block must have middle block as single predecessor");
-
-    VPBuilder B(Plan.getMiddleBlock()->getTerminator());
-    for (VPRecipeBase &R : *Plan.getMiddleBlock()) {
-      VPValue *Op;
-      if (!match(&R, m_ExtractLastLane(m_ExtractLastPart(m_VPValue(Op)))))
-        continue;
-
-      // Compute the index of the last active lane.
-      VPValue *HeaderMask = Predicator.getBlockInMask(Header);
-      VPValue *LastActiveLane =
-          B.createNaryOp(VPInstruction::LastActiveLane, HeaderMask);
-      auto *Ext =
-          B.createNaryOp(VPInstruction::ExtractLane, {LastActiveLane, Op});
-      R.getVPSingleValue()->replaceAllUsesWith(Ext);
-    }
-  }
   return Predicator.getBlockMaskCache();
 }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index e0d09a099647a..e00c94558bf7a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -413,14 +413,44 @@ struct VPlanTransforms {
   static void narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
                                      TypeSize VectorRegWidth);
 
+  /// Adapts the vector loop region for tail folding by introducing a header
+  /// mask and predicating the region:
+  ///
+  /// Vector loop region before:
+  /// +-------------------------------------------+
+  /// |%iv = ...                                  |
+  /// |...                                        |
+  /// |%iv.next = add %iv, vfxuf                  |
+  /// |branch-on-count %iv.next, vector-trip-count|
+  /// +-------------------------------------------+
+  ///
+  /// Vector loop region after:
+  /// +-------------------------------------------+
+  /// |%iv = ...                                  |
+  /// |%wide.iv = widen-canonical-iv ...          |
+  /// |%header-mask = icmp ult %wide.iv, BTC      |
+  /// |branch-on-cond %header-mask                |---+
+  /// +-------------------------------------------+   |
+  ///                      |                          |
+  ///                      v                          |
+  /// +-------------------------------------------+   |
+  /// |                   ...                     |   |
+  /// +-------------------------------------------+   |
+  ///                      |                          |
+  ///                      v                          |
+  /// +-------------------------------------------+   |
+  /// |%iv.next = add %iv, vfxuf                  |<--+
+  /// |branch-on-count %iv.next, vector-trip-count|
+  /// +-------------------------------------------+
+  ///
+  /// Any VPInstruction::ExtractLastLanes are also updated to extract from the
+  /// last active lane of the header mask.
+  static void foldTailByMasking(VPlan &Plan);
+
   /// Predicate and linearize the control-flow in the only loop region of
-  /// \p Plan. If \p FoldTail is true, create a mask guarding the loop
-  /// header, otherwise use all-true for the header mask. Masks for blocks are
-  /// added to a block-to-mask map which is returned in order to be used later
-  /// for wide recipe construction. This argument is temporary and will be
-  /// removed in the future.
+  /// \p Plan.
   static DenseMap<VPBasicBlock *, VPValue *>
-  introduceMasksAndLinearize(VPlan &Plan, bool FoldTail);
+  introduceMasksAndLinearize(VPlan &Plan);
 
   /// Add branch weight metadata, if the \p Plan's middle block is terminated by
   /// a BranchOnCond recipe.

lukel97 · 2026-01-15T12:18:54Z

        (!RR || !RR->isPartialReduction())) {
-      VPValue *Cond = RecipeBuilder.getBlockInMask(PhiR->getParent());
+      VPValue *Cond = RecipeBuilder.getBlockInMask(
+          cast<VPBasicBlock>(PhiR->getParent()->getSuccessors().front()));


Cond here is supposed to be the header mask IIUC, and PhiR's parent is the header block. But the header block itself is no longer predicated, only the "body" block, so this gets the header mask from its successor instead.

I think that's essentially my question about SSA form. I would expect that select to be created by predicator automatically if we maintain proper SSA form, i.e.,

; after tail folding header: %red-phi = phi [ init, %preheader ], [ %new-phi, %latch ] ... br i1 %header-mask, label %body, label %latch body: %red-phi.next = ... ... br label %latch latch: ; Will be transformed into a VPBlend and subsequently into a select with existing predicator: new-phi = phi [ %red-phi, %header ], [ %red-phi.next, %body ] ... exit: %use = phi [ %new-phi, %latch ]

I gave this a try and I think it's possible if you explicitly handle the VPReductionPHIRecipes in foldTailByMasking, it would make for a nice tidy up. But it doesn't seem to be an NFC change. I can add a TODO to move this into VPlanTransforms::foldTailByMasking as a follow up?

What special handling? Is it still necessary after you've optimized blends per your comment in #176143 (comment)?

You need to check for reduction phis when creating the latch phi and set their incoming value to the reduction phi, as opposed to poison. It's still needed after optimizing the blends:

@@ -1014,7 +1014,12 @@ void VPlanTransforms::foldTailByMasking(VPlan &Plan) { continue; VPValue *Poison = Plan.getOrAddLiveIn( PoisonValue::get(V->getUnderlyingValue()->getType())); + auto *RedIt = find_if(V->users(), IsaPred<VPReductionPHIRecipe>); + if (RedIt != V->user_end()) + Poison = cast<VPRecipeBase>(*RedIt)->getVPSingleValue(); VPValue *Phi = Builder.createScalarPhi({V, Poison}, {}); + if (RedIt != V->user_end() && PredicatedReductionSelect) + cast<VPRecipeBase>(*RedIt)->setOperand(1, Phi); V->replaceUsesWithIf(Phi, NeedsPhi); } }

You're right. I now think that tail folding belongs to the predicator :) More specifically, handling of the divergent loop exit belongs to the predicator. Consider outer loop vectorization:

inner.header: %inner.iv = phi %inner.red = phi [ 0, %inner.preheader ], [ %inner.red.next, %inner.header ] ... %ec = icmp %inner.iv.next, %n br i1 %ec, %inner.exit, %inner.header inner.exit: %inner.red.lcssa = phi [ %inner.red.next ]

If %n is inner-loop-invariant and all lanes exit at the same time, no special handling for the %inner.red is needed - it can simply be widened and all SIMD lanes would operate independently but exactly the same, in a sense. If, for example, %n == %outer.iv and some lanes exit inner loop earlier than the others, then exactly the same processing as you're talking about would be necessary for %inner.red. However, it has nothing to do with the outer loop we're vectorizing and/or its tail folding, thus that special handling belongs to the predicator.

I think the right way to communicate tail masking is to tell predicator that [outer|our] loop backedge isn't uniform any more, and let it do the processing necessary. We'll reuse the same functionality later when we start supporting proper outer loop vectorization.

Early exits, on the other hand, are different. We don't take backedge if any lane has exited, so the backedge branch remains uniform. As such, early exits from the loop we're vectorizing are unlikely to belong to the predicator, even if there might be some similarities.

To clarify, the predicator is still ultimately doing performing the tail folding. The only difference is that we're now explicitly modelling the creation of the header mask and the predication of the blocks in VPlan in a separate, isolated step.

To the best of my understanding, this seems to match the original direction, see this comment here:

llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines 9277 to 9278 in dabc84c

// Create mask based on the IR BB corresponding to VPBB.

// TODO: Predicate directly based on VPlan.

// Create mask based on the IR BB corresponding to VPBB.
// TODO: Predicate directly based on VPlan.

Hope to find time to track this patch more closely, but regarding original directions, one laid out back in '23 was to first (always) tail-fold, when strip-mining the scalar VPlan by a symbolic strip-mine factor SF (later to become VF*UF), then predicate, as part of VPlan-creation, then tail-unfold as a cost-based VF-dependent decision:
https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf#page=37

There could be multiple ways to get there from where we are now.

Thanks for sharing that. I think this patch is somewhat aligned in that direction, since it models the CFG in the strip-mine step from that slide.

As a side note, if we want to tail-unfold then presumably there'll be a step where we'll want to remove the header mask by e.g. replacing it with all ones. I think expanding VPBlendRecipes to VPInstruction::Select would help with this since any select with an all-ones mask just gets folded away: #171851

eas

This breaks SSA form for the exiting values (outside uses might no longer be dominated by the defs) and headers phis' incoming values over the backedge. Subsequent predication would restore it, and it happens (at the moment of this patch) immediately, but I wonder what would happen if we make some future changes in the pipeline.

Do you think maintaining proper SSA form would be worthwhile here?

eas · 2026-01-20T19:31:01Z

+  // Split the latch at the IV update, and branch to it from the header mask.
+  VPBasicBlock *LatchSplit =
+      Latch->splitAt(IVInc->getDefiningRecipe()->getIterator());


Can we assert that there are no unexpected instructions in the LatchSplit (e.g. if someone tries to change pass ordering in future)?

I think that's what the asserts do above, that it only contains the canonical IV increment and the branch-on-count.

that it ONLY contains

Sorry, I don't see that. I'm expecting these:

assert(std::distance(Latch->begin(), Latch->end() == 2); assert(IVInc->getParent() == Latch); // Ensured by the verifier: assert(Latch->getTerminator->getParent() == Latch);

IIUC the asserts above check for that, but just stricter. The two asserts ensure that the latch is always of the form:

latch: ... arbitrary recipes %ivinc = add %can-iv, vfxuf %branch-on-count %ivinc, vector-trip-count successors: middle.block, header

So when we split at %ivinc, the new structure should always be:

latch: ... arbitrary recipes successors: latch.split latch.split: %ivinc = add %can-iv, vfxuf %branch-on-count %ivinc, vector-trip-count successors: middle.block, header

Woops, I was only considering the use def chain. Added an assert in df47d1e

eas · 2026-01-20T20:49:58Z

+  // Any extract of the last element must be updated to extract from the last
+  // active lane of the header mask instead (i.e., the lane corresponding to the
+  // last active iteration).


Is this something that we're ok with, or do we need a TODO to move tail folding even earlier in the pipeline before last lane extracts are introduced (or anything else, really, that needs explicit handling here)?

I think it's fine to handle it here with the rest of the tail folding transform, so it's all kept in one place. The header mask won't be available up until this point anyway.

In general I think it's best to do transforms atomically so you don't have an incorrect intermediate VPlan between transforms

One way to make it work automatically is to allow a mask on the ExtractLastElement. Then any instances introduced before predication would be masked automatically. And then we can either modify its codegen or lower into a more verbose representation by the end of the pipeline. #142285 could make that simple enough, maybe.

There's an invariant that the mask needs to be a prefix mask, i.e. all 1s before all 0s, see LastActiveLane, so we wouldn't be able to lower any arbitrary mask.

So, header-mask will work and that's the one we're creating here, right? And that can be asserted for in the predicator, I think, although somewhat ugly: Mask->getParent() == Header && match(Mask, m_Cmp(LT/LE, IV, m_LoopInvariant()). Or, if everything else, just change how we lower LastActiveLane to support arbitrary mask. That could help in other places, e.g. with generalizing #141900 to skip blocks that have masks other than the header mask.

Given that this is existing code, I think we should we defer this conversation to a separate PR if we want to change the approach

lukel97 · 2026-01-21T16:44:24Z

This breaks SSA form for the exiting values (outside uses might no longer be dominated by the defs) and headers phis' incoming values over the backedge. Subsequent predication would restore it, and it happens (at the moment of this patch) immediately, but I wonder what would happen if we make some future changes in the pipeline.

Do you think maintaining proper SSA form would be worthwhile here?

Yes, in practice the vector region's control flow is immediately linearized afterwards so everything is dominated.

This patch has a slightly weird use of the CFG, because the branch-on-cond doesn't use a scalar i1 condition but a vector mask instead. So it's never actually "branching" to begin with and relies on being lineraized and removed.

I think this is a good point though. In #172454 the original I approach I used was to insert custom VPInstructions to indicate to VPPredicator that a certain point needed to be masked, without touching the CFG itself and preserving SSA: da6eba9

We could try going back down this route. But this time I would maybe avoid the custom VPInstruction and instead just pass to VPPredicator a map of blocks that should be predicated down to the latch. Something like DenseMap<VPBasicBlock *, VPValue *> PredicationPoints;, where each successor of a basic block in the map has the mask value added to its edge mask.

…cator This preserves SSA

lukel97 · 2026-01-21T17:15:19Z

I've tried out the explicit DenseMap approach in ed18c28 which should avoid the faux-control flow now.

eas · 2026-01-21T18:05:38Z

because the branch-on-cond doesn't use a scalar i1 condition but a vector mask instead.

How is this a problem? Isn't it what predicator sees for the original control flow from VPlan construction (modulo "vector mask" -> "varying i1 condition" renaming, which are essentially the same anyway)? I think explicit control flow while maintaining SSA is a better choice:

VPlan is in consistent (and potentially verifiable) state
There is no implicit state being passed anywhere, the information is explicit in the VPlan and can be printed/CHECK'ed
We'd probably get phi->select transformation "for free", see my comment in [VPlan] Move tail folding out of VPlanPredicator. NFC #176143 (comment)

fhahn

Do you know if the verifier would have caught that?

Given that we now need to pass the extra state between transforms, I am wondering if it still helps for #172454? That would then also require passing more state between transforms further appart, right?

I think #172454 would be a good candidate for using the masked VPInsruction capabilities from #142285; for early exits, we can mask the relevant VPInstructions early.

lukel97 · 2026-01-22T04:29:19Z

Isn't it what predicator sees for the original control flow from VPlan construction (modulo "vector mask" -> "varying i1 condition" renaming, which are essentially the same anyway)?

Yes but the vector mask condition is the problem since VPInstruction::BranchOnCond isn't designed as a masking construct but for regular control flow, i.e. see VPInstruction::usesFirstLaneOnly where it states the condition operand is scalar. In retrospect I think it was more a coincidence that VPPredicator happened to work with it because it was automatically broadcasting scalars when predicating.

VPlan is in consistent (and potentially verifiable) state

Even if we do add PHIs in the latch to keep SSA, the BranchOnCond instructions will be invalid so it will fail the VPlanVerifier anyway. FWIW I'm not strongly opinionated on either approach

Do you know if the verifier would have caught that?

Yes it checks for domination, but at this stage during VPlan construction it doesn't pass anyway because e.g. unhandled VPInstruction opcodes that aren't yet widened to VPWidenRecipe.

I think #172454 would be a good candidate for using the masked VPInsruction capabilities from #142285; for early exits, we can mask the relevant VPInstructions early.

I didn't notice that PR before, I think we could use it for both this PR and #172454. But we would need to teach VPlanPredicator to logically-and masks with each VPInstructions' existing mask. I'm not sure if we would then need to keep around a cache or something so that we don't end up with a logical-and for each VPInstruction.

lukel97 · 2026-01-22T10:57:37Z

I tried reworking this PR and #172454 to do predication via VPInstruction mask operands in #142285, but it turned out to be very difficult since you need to merge the mask operands during both tail folding, early exits and linearization.

Not only do you have to worry about where to insert logical-ands and deduplicating them, the big issue I found is that if you e.g. want to predicate everything past the exiting block here, you have to start dealing with control flow joins:

flowchart
    foo --> exiting --> bar
    foo --> bar

And in that case you end up reimplementing what VPPredicator already does.

Given that we now need to pass the extra state between transforms, I am wondering if it still helps for #172454? That would then also require passing more state between transforms further appart, right?

I've removed the state in 81332b3, and instead used a new VPInstruction to indicate to VPPredicator that the successors should be predicated.

I was able to reuse this approach for #172454 too, and it works after rebasing on top of #142285.

eas · 2026-01-22T16:12:57Z

Yes but the vector mask condition is the problem since VPInstruction::BranchOnCond isn't designed as a masking construct but for regular control flow, i.e. see VPInstruction::usesFirstLaneOnly where it states the condition operand is scalar. In retrospect I think it was more a coincidence that VPPredicator happened to work with it because it was automatically broadcasting scalars when predicating.

For

header:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  %mask = icmp slt i64 %iv, %n
  br i1 %mask, label %body, label %latch

body:
  %gep = getelementptr i64, ptr %ptr, i64 %iv
  %ld = load i64, ptr %gep
  %add = add i64 %ld, %iv
  store i64 %add, ptr %gep
  br label %latch

latch:
  %iv.next = add nsw i64 %iv, 1
  %exitcond = icmp slt i64 %iv.next, %n
  br i1 %exitcond, label %header, label %exit

Right before predicator (current trunk):

<x1> vector loop: {
  vector.body:
    EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
    ir<%iv> = WIDEN-INDUCTION nsw ir<0>, ir<1>, vp<%0>
    EMIT ir<%mask> = icmp slt ir<%iv>, ir<%n>
    EMIT branch-on-cond ir<%mask>          ;; ! branch-on-cond too !
  Successor(s): body, latch

  body:
    EMIT ir<%gep> = getelementptr ir<%ptr>, ir<%iv>
    EMIT ir<%ld> = load ir<%gep>
    EMIT ir<%add> = add ir<%ld>, ir<%iv>
    EMIT store ir<%add>, ir<%gep>
  Successor(s): latch

  latch:
    EMIT ir<%iv.next> = add nsw ir<%iv>, ir<1>
    EMIT ir<%exitcond> = icmp slt ir<%iv.next>, ir<%n>
    EMIT vp<%5> = not ir<%exitcond>
    EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
    EMIT branch-on-count vp<%index.next>, vp<%2>
  No successors
}
Successor(s): middle.block

How is that different from manually created icmp? I don't think predicator looks into BranchOnCond->getOperand(0)->getUnderlyingValue() (and really there would be no reason for it to do that).

lukel97 · 2026-01-23T11:41:16Z

How is that different from manually created icmp? I don't think predicator looks into BranchOnCond->getOperand(0)->getUnderlyingValue() (and really there would be no reason for it to do that).

TIL the header phis are widened before linearization. I had presumed that they were just widened alongside regular phis in the loop body after linearization. In that case I feel better about using BranchOnCond, let me give it a second try but preserving SSA.

This reverts commit 81332b3.

… VPPredicator" This reverts commit ed18c28.

lukel97 · 2026-01-23T16:28:59Z

+    auto NotPoison = [&](VPValue *V) {
+      // Don't remove poison from phis from the original loop.
+      return PhiR->getUnderlyingValue() || !isa<VPIRValue>(V) ||
+             !isa<PoisonValue>(cast<VPIRValue>(V)->getValue());
+    };


@eas now that we have phis to keep SSA, the incoming value for the header block (i.e. the tail lanes) is poison. But if we don't remove the redundant phis a lot of stuff breaks around the reduction computation matching stuff, and we get other regressions with unnecessary blends. So this prevents the redundant blends from being created.

There is a patch (by Florian, I think) that changes predicator to emit selects instead of blends. With that, IIUC, we'd just do (pseudocode)

auto *Result = nullptr; for (auto [Val, Edge] : orig_phi->values()) { if (is_poison(Val) || Result == Val) continue; Result = Result ? createSelect(EdgeCondition, Result, Val) : Val; }

eliminating this whole block of special casing. Is that accurate enough? If so, then I think whatever this patch needs to do temporarily isn't an issue. I'm not sure if the same can be emulated with the VPBlend, but probably the above can be rewritten in terms of filling in the blend's operands.

Is it #171851, and the successor #150369? Those patches essentially do the code snippet above.

But for this patch at least we need this special casing, since we don't yet have a simplification for a select with a poison value. And we also don't run simplification this early on yet, so we would need to manually call it here. But all for another patch.

Oh, It was yours :) Yes, #150369 is the one I had in mind.

fhahn · 2026-01-25T23:27:28Z

Not only do you have to worry about where to insert logical-ands and deduplicating them, the big issue I found is that if you e.g. want to predicate everything past the exiting block here, you have to start dealing with control flow joins:

It seems with the the fold from #177887, the first part does not seem too bad: d15e2db

As for the second part, I need to think a bit more about it, but I think similarly with the explicit CFG we similarly need to make sure we insert phis to preserve SSA form?

eas · 2026-02-17T17:02:23Z

Regarding early exits, one way to support them may be to move predication before early exit handling (and before creating regions), retain the early-exiting edges, then handle early-exiting edges when handling early exits.

I have some strong doubts about this. My reasoning for this is that early exits from the loop we're vectorizing are vastly different from early exits from the inner loops (that the predicator would need to handle someday), so it's not "early exit" but an "early exit from that particular loop" and the early semantics lowering of the incoming source code (either implicit via legal or explicit if we implement more of OMP SIMD/other similar annotation). I also believe that such "import of semantics" should be done early on before more generic processing (like the linearization/predication).

This silently just happened to work because these phis are just replaced with the same original value.

…ags. NFC (#181694) In order to be able to create selects for reduction phis through tail folding in foldTailByMasking (#176143), make VPReductionPHIRecipe an instance of VPIRFlags and plumb the FMFs from the original RdxDesc. This allows us to remove more uses of the RecurrenceDescriptor in addReductionResultComputation, which should help untie it from LoopVectorizationLegality.

lukel97 · 2026-02-25T17:07:53Z

Ping

…ze/move-foldTailByMasking

github-actions · 2026-03-04T12:38:05Z

🐧 Linux x64 Test Results

170611 tests passed
3097 tests skipped
1 test failed

Failed Tests

(click on a test name to see its output)

lldb-api

lldb-api.functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py

Script:
--
/usr/bin/python3 /home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/./lib --env LLVM_INCLUDE_DIR=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/include --env LLVM_TOOLS_DIR=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin --libcxx-include-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/include/c++/v1 --libcxx-include-target-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/include/x86_64-unknown-linux-gnu/c++/v1 --libcxx-library-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./lib/x86_64-unknown-linux-gnu --arch x86_64 --build-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lldb-test-build.noindex --lldb-module-cache-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin/lldb --compiler /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin/clang --dsymutil /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin --lldb-obj-root /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/lldb --lldb-libs-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./lib --cmake-build-type Release /home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list -p TestDataFormatterGenericList.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 23.0.0git (https://github.com/llvm/llvm-project revision c080404e7c94e4ed758f822a346b42d048d16391)
  clang revision c080404e7c94e4ed758f822a346b42d048d16391
  llvm revision c080404e7c94e4ed758f822a346b42d048d16391
Skipping the following test categories: msvcstl, dsym, pdb, gmodules, debugserver, objc

--
Command Output (stderr):
--
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libcpp_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libcpp_dsym) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libcpp_dwarf)
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libcpp_dwo)
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libcpp_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libcpp_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libstdcpp_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libstdcpp_dsym) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libstdcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libstdcpp_dwarf)
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libstdcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libstdcpp_dwo)
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libstdcpp_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libstdcpp_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_msvcstl_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_msvcstl_dsym) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_msvcstl_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_msvcstl_dwarf) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_msvcstl_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_msvcstl_dwo) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_msvcstl_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_msvcstl_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libcpp_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libcpp_dsym) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libcpp_dwarf)
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libcpp_dwo)
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libcpp_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libcpp_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libstdcpp_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dsym) (test case does not fall in any category of interest for this run) 
FAIL: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libstdcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dwarf)
FAIL: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libstdcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dwo)
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libstdcpp_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_msvcstl_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_msvcstl_dsym) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_msvcstl_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_msvcstl_dwarf) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_msvcstl_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_msvcstl_dwo) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_msvcstl_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_msvcstl_pdb) (test case does not fall in any category of interest for this run) 
======================================================================
FAIL: test_with_run_command_libstdcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dwarf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1858, in test_method
    return attrvalue(self)
           ^^^^^^^^^^^^^^^
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py", line 302, in test_with_run_command_libstdcpp
    self.do_test_with_run_command(is_libstdcpp=True)
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py", line 64, in do_test_with_run_command
    self.expect(
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2487, in expect
    self.runCmd(
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1049, in runCmd
    self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Command 'frame variable &numbers_list._M_impl._M_node --raw' did not return successfully
Error output:
error: <user expression 0>:1:15: "_M_impl" is not a member of "(int_list) numbers_list"
   1 | &numbers_list._M_impl._M_node
     | ^

Config=x86_64-/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang
======================================================================
FAIL: test_with_run_command_libstdcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dwo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1858, in test_method
    return attrvalue(self)
           ^^^^^^^^^^^^^^^
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py", line 302, in test_with_run_command_libstdcpp
    self.do_test_with_run_command(is_libstdcpp=True)
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py", line 64, in do_test_with_run_command
    self.expect(
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2487, in expect
    self.runCmd(
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1049, in runCmd
    self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Command 'frame variable &numbers_list._M_impl._M_node --raw' did not return successfully
Error output:
error: <user expression 0>:1:15: "_M_impl" is not a member of "(int_list) numbers_list"
   1 | &numbers_list._M_impl._M_node
     | ^

Config=x86_64-/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang
----------------------------------------------------------------------
Ran 24 tests in 3.542s

FAILED (failures=2, skipped=16)

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

fhahn

LGTM, thanks!

fhahn · 2026-03-04T16:25:38Z

+  // Collect any values defined in the loop that need a phi. Currently this
+  // includes header phi backedges and live-outs extracted in the middle block.
+  // TODO: Handle early exits via Plan.getExitBlocks()
+  MapVector<VPValue *, SmallVector<VPUser *>> NeedsPhi;


May be good to have a slightly more descriptive name, although I don't really have any good suggestions.

Agreed but I'm also drawing a blank. I guess it's something like "live-out of the predicated section". I'll try and think of something in a follow up PR.

fhahn · 2026-03-04T16:38:08Z

+  /// +-------------------------------------------+
+  /// |%iv = ...                                  |
+  /// |%wide.iv = widen-canonical-iv ...          |
+  /// |%header-mask = icmp ult %wide.iv, BTC      |


Suggested change

/// |%header-mask = icmp ult %wide.iv, BTC |

/// |%header-mask = icmp ule %wide.iv, BTC |

Mel-Chen · 2026-03-05T07:27:01Z

@lukel97 Please land this patch before #173265. I guess we will get regression if we land #173265 before this patch.

lukel97 · 2026-03-05T07:30:02Z

@lukel97 Please land this patch before #173265. I guess we will get regression if we land #173265 before this patch.

Oh, I was actually waiting for #173265 to land first to see if we would need to generalize it to the IDom. I'll land this first though then, and if we need to generalize it then I guess you can just do that in #173265

Mel-Chen · 2026-03-05T07:34:47Z

@lukel97 Please land this patch before #173265. I guess we will get regression if we land #173265 before this patch.

Oh, I was actually waiting for #173265 to land first to see if we would need to generalize it to the IDom. I'll land this first though then, and if we need to generalize it then I guess you can just do that in #173265

#173265 can handle non-tail folding case first. We can create another patch to handle tail folding case. :)

…ze/move-foldTailByMasking

lukel97 requested review from Mel-Chen, arcbbb, ayalz, david-arm and fhahn January 15, 2026 12:13

llvmbot added vectorizers llvm:transforms labels Jan 15, 2026

lukel97 commented Jan 15, 2026

View reviewed changes

Comment thread llvm/lib/Transforms/Vectorize/VPlan.cpp

lukel97 commented Jan 15, 2026

View reviewed changes

eas reviewed Jan 20, 2026

View reviewed changes

eas mentioned this pull request Jan 21, 2026

[VPlan] Handle early exit loops with non-dereferenceable loads in latch #172454

Closed

Don't use branch-on-cond, explicitly pass mask in densemap to VPPredi…

ed18c28

…cator This preserves SSA

fhahn reviewed Jan 21, 2026

View reviewed changes

Don't pass state, use VPInstruction::PredicateSuccessors

81332b3

lukel97 added 3 commits January 23, 2026 19:43

Revert "Don't pass state, use VPInstruction::PredicateSuccessors"

5475f48

This reverts commit 81332b3.

Revert "Don't use branch-on-cond, explicitly pass mask in densemap to…

d7bb111

… VPPredicator" This reverts commit ed18c28.

Maintain SSA

b7842de

lukel97 commented Jan 23, 2026

View reviewed changes

Remove unused VPTypeAnalysis

b88529d

Assert no other recipes are in new latch

df47d1e

lukel97 added 3 commits February 18, 2026 00:38

Use VPTypeAnalysis, rework to collect users separately and then add phis

8ef954d

Add test cases for header with only phis, and live-out defined by phi

ae85948

Explicitly check for VPWidenInductionRecipe

23a6010

lukel97 mentioned this pull request Feb 17, 2026

[LV] Add initial support for partial alias masking #177599

Draft

Fix replacing users inside the loop

5e1e585

This silently just happened to work because these phis are just replaced with the same original value.

fhahn reviewed Feb 19, 2026

View reviewed changes

Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated

Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated

lukel97 added 2 commits February 20, 2026 13:17

Use MapVector and cast VPHeaderPHIRecipe

4df3ab1

Merge branch 'main' into loop-vectorize/move-foldTailByMasking

d0bd50b

lukel97 mentioned this pull request Feb 23, 2026

[VPlan] Supported conditionally executed single early exits. #182395

Merged

Merge branch 'main' into loop-vectorize/move-foldTailByMasking

27d24d0

fhahn reviewed Mar 1, 2026

View reviewed changes

lukel97 added 4 commits March 2, 2026 17:47

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

6ed250a

…ze/move-foldTailByMasking

Address review comments

355bbb3

Remove other ExitingIVValue match, use m_ExtractLastLaneOfLastPart

a89f7dc

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

12fa4a5

…ze/move-foldTailByMasking

fhahn approved these changes Mar 4, 2026

View reviewed changes

lukel97 added 2 commits March 5, 2026 15:39

Fix cmp predicate in diagram

0d841f7

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectori…

41afe8e

…ze/move-foldTailByMasking

lukel97 enabled auto-merge (squash) March 5, 2026 07:43

lukel97 merged commit 8251293 into llvm:main Mar 5, 2026
9 of 10 checks passed

eas mentioned this pull request Apr 10, 2026

[LV] Header masked condition of FindLastSelect when optimizeFindIVReductions when folding tail. #191166

Merged

lukel97 mentioned this pull request Jun 5, 2026

[VPlan] Use control flow to implement MaskedCond and preserve SSA #201784

Open

	// Create mask based on the IR BB corresponding to VPBB.
	// TODO: Predicate directly based on VPlan.

	/// \|%header-mask = icmp ult %wide.iv, BTC \|
	/// \|%header-mask = icmp ule %wide.iv, BTC \|

Uh oh!

Conversation

lukel97 commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukel97 Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukel97 Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eas Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eas left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukel97 Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eas Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukel97 commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lukel97 commented Jan 21, 2026

Uh oh!

eas commented Jan 21, 2026

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

lukel97 commented Jan 22, 2026

Uh oh!

lukel97 commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eas commented Jan 22, 2026

Uh oh!

lukel97 commented Jan 23, 2026

lukel97 commented Jan 15, 2026 •

edited

Loading

llvmbot commented Jan 15, 2026 •

edited

Loading

lukel97 Jan 23, 2026 •

edited

Loading

lukel97 Jan 23, 2026 •

edited

Loading

eas Jan 23, 2026 •

edited

Loading

lukel97 Jan 28, 2026 •

edited

Loading

eas Jan 23, 2026 •

edited

Loading

lukel97 commented Jan 21, 2026 •

edited

Loading

lukel97 commented Jan 22, 2026 •

edited

Loading

lukel97 Jan 23, 2026 •

edited

Loading