Skip to content

[VPlan] Move tail folding out of VPlanPredicator. NFC#176143

Merged
lukel97 merged 36 commits into
llvm:mainfrom
lukel97:loop-vectorize/move-foldTailByMasking
Mar 5, 2026
Merged

[VPlan] Move tail folding out of VPlanPredicator. NFC#176143
lukel97 merged 36 commits into
llvm:mainfrom
lukel97:loop-vectorize/move-foldTailByMasking

Conversation

@lukel97

@lukel97 lukel97 commented Jan 15, 2026

Copy link
Copy Markdown
Contributor

Currently the logic for introducing a header mask and predicating the vector loop region is done inside introduceMasksAndLinearize.

This splits the tail folding part out into an individual VPlan transform so that VPlanPredicator.cpp doesn't need to worry about tail folding, which seemed to be a temporary measure according to a comment in VPlanTransforms.h.

To perform tail folding independently, this splits the "body" of the vector loop region between the phis in the header and the branch + iv increment in the latch:

Before:

+-------------------------------------------+
|%iv = ...                                  |
|...                                        |
|%iv.next = add %iv, vfxuf                  |
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+

After:

+-------------------------------------------+
|%iv = ...                                  |
|%wide.iv = widen-canonical-iv ...          |
|%header-mask = icmp ule %wide.iv, BTC      |---+
|branch-on-cond %header-mask                |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|...                                        |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|%iv.next = add %iv, vfxuf                  |<--+
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+

Phis are then inserted in the latch for any value in the loop body that have outside uses, with poison as their incoming value from the header edge.

The motivation for this is to allow us to share the same "predicate all successor blocks" type of predication we do for tail folding, but for early-exit loops in #172454. This may also allow us to directly emit an EVL based header mask, instead of having to match + transform the existing header mask in addExplicitVectorLength.

This also allows us to eventually handle recurrences in the same transform, avoiding the need to special case tail folding in addReductionResultComputation.

Currently the logic for introducing a header mask and predicating the vector loop region is done inside introduceMasksAndLinearize.

This splits the tail folding part out into an individual VPlan transform so that VPlanPredicator.cpp doesn't need to worry about tail folding, which seemed to be a temporary measure according to a comment in VPlanTransforms.h.

To perform tail folding independently, this splits the "body" of the vector loop region between the phis in the header and the branch + iv increment in the latch:

Before:

    +-------------------------------------------+
    |%iv = ...                                  |
    |...                                        |
    |%iv.next = add %iv, vfxuf                  |
    |branch-on-count %iv.next, vector-trip-count|
    +-------------------------------------------+

After:

    +-------------------------------------------+
    |%iv = ...                                  |
    |%wide.iv = widen-canonical-iv ...          |
    |%header-mask = icmp ult %wide.iv, BTC      |---+
    |branch-on-cond %header-mask                |   |
    +-------------------------------------------+   |
                         |                          |
                         v                          |
    +-------------------------------------------+   |
    |...                                        |   |
    +-------------------------------------------+   |
                         |                          |
                         v                          |
    +-------------------------------------------+   |
    |%iv.next = add %iv, vfxuf                  |<--+
    |branch-on-count %iv.next, vector-trip-count|
    +-------------------------------------------+

The motivation for this is to align tail folding predication with early-exit predication in llvm#172454. This will also allow us to directly emit an EVL based header mask, instead of having to match + transform the existing header mask in addExplicitVectorLength.
@llvmbot

llvmbot commented Jan 15, 2026

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Currently the logic for introducing a header mask and predicating the vector loop region is done inside introduceMasksAndLinearize.

This splits the tail folding part out into an individual VPlan transform so that VPlanPredicator.cpp doesn't need to worry about tail folding, which seemed to be a temporary measure according to a comment in VPlanTransforms.h.

To perform tail folding independently, this splits the "body" of the vector loop region between the phis in the header and the branch + iv increment in the latch:

Before:

+-------------------------------------------+
|%iv = ...                                  |
|...                                        |
|%iv.next = add %iv, vfxuf                  |
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+

After:

+-------------------------------------------+
|%iv = ...                                  |
|%wide.iv = widen-canonical-iv ...          |
|%header-mask = icmp ult %wide.iv, BTC      |---+
|branch-on-cond %header-mask                |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|...                                        |   |
+-------------------------------------------+   |
                     |                          |
                     v                          |
+-------------------------------------------+   |
|%iv.next = add %iv, vfxuf                  |&lt;--+
|branch-on-count %iv.next, vector-trip-count|
+-------------------------------------------+

The motivation for this is to align tail folding predication with early-exit predication in #172454. This will also allow us to directly emit an EVL based header mask, instead of having to match + transform the existing header mask in addExplicitVectorLength.


Full diff: https://github.com/llvm/llvm-project/pull/176143.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+6-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp (+55)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp (+3-64)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+36-6)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3f1e12e5d1cd0..1dcd33828445a 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8463,11 +8463,13 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
     InterleaveGroups.insert(IG);
   }
 
+  if (CM.foldTailByMasking())
+    VPlanTransforms::foldTailByMasking(*Plan);
+
   // ---------------------------------------------------------------------------
   // Predicate and linearize the top-level loop region.
   // ---------------------------------------------------------------------------
-  auto BlockMaskCache = VPlanTransforms::introduceMasksAndLinearize(
-      *Plan, CM.foldTailByMasking());
+  auto BlockMaskCache = VPlanTransforms::introduceMasksAndLinearize(*Plan);
 
   // ---------------------------------------------------------------------------
   // Construct wide recipes and apply predication for original scalar
@@ -8726,7 +8728,8 @@ void LoopVectorizationPlanner::addReductionResultComputation(
     auto *RR = dyn_cast<VPReductionRecipe>(OrigExitingVPV->getDefiningRecipe());
     if (!PhiR->isInLoop() && CM.foldTailByMasking() &&
         (!RR || !RR->isPartialReduction())) {
-      VPValue *Cond = RecipeBuilder.getBlockInMask(PhiR->getParent());
+      VPValue *Cond = RecipeBuilder.getBlockInMask(
+          cast<VPBasicBlock>(PhiR->getParent()->getSuccessors().front()));
       std::optional<FastMathFlags> FMFs =
           PhiTy->isFloatingPointTy()
               ? std::make_optional(RdxDesc.getFastMathFlags())
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index a6a46e36b397d..d8817e1d481a7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -566,6 +566,11 @@ VPBasicBlock *VPBasicBlock::splitAt(iterator SplitAt) {
   auto *SplitBlock = getPlan()->createVPBasicBlock(getName() + ".split");
   VPBlockUtils::insertBlockAfter(SplitBlock, this);
 
+  // If this is the exiting block, make the split the new exiting block.
+  auto *ParentRegion = getParent();
+  if (ParentRegion && ParentRegion->getExiting() == this)
+    ParentRegion->setExiting(SplitBlock);
+
   // Finally, move the recipes starting at SplitAt to new block.
   for (VPRecipeBase &ToMove :
        make_early_inc_range(make_range(SplitAt, this->end())))
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 7ee133545eeb9..97ad127658320 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -958,6 +958,61 @@ void VPlanTransforms::createLoopRegions(VPlan &Plan) {
   TopRegion->getEntryBasicBlock()->setName("vector.body");
 }
 
+void VPlanTransforms::foldTailByMasking(VPlan &Plan) {
+  assert(Plan.getExitBlocks().size() == 1 &&
+         "only a single-exit block is supported currently");
+  assert(Plan.getExitBlocks().front()->getSinglePredecessor() ==
+             Plan.getMiddleBlock() &&
+         "the exit block must have middle block as single predecessor");
+
+  VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
+  VPBasicBlock *Header = LoopRegion->getEntryBasicBlock();
+
+  Header->splitAt(Header->getFirstNonPhi());
+
+  // Create the header mask, insert it in the header and branch on it.
+  auto *IV =
+      new VPWidenCanonicalIVRecipe(Header->getParent()->getCanonicalIV());
+  VPBuilder Builder(Header, Header->getFirstNonPhi());
+  Builder.insert(IV);
+  VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
+  VPValue *HeaderMask = Builder.createICmp(CmpInst::ICMP_ULE, IV, BTC);
+  Builder.createNaryOp(VPInstruction::BranchOnCond, HeaderMask);
+
+  VPBasicBlock *Latch = LoopRegion->getExitingBasicBlock();
+  VPValue *IVInc;
+  [[maybe_unused]] bool TermBranchOnCount =
+      match(Latch->getTerminator(),
+            m_BranchOnCount(m_VPValue(IVInc),
+                            m_Specific(&Plan.getVectorTripCount())));
+  assert(TermBranchOnCount &&
+         match(IVInc, m_Add(m_Specific(LoopRegion->getCanonicalIV()),
+                            m_Specific(&Plan.getVFxUF()))) &&
+         "Unexpected terminator");
+
+  // Split the latch at the IV update, and branch to it from the header mask.
+  VPBasicBlock *LatchSplit =
+      Latch->splitAt(IVInc->getDefiningRecipe()->getIterator());
+  VPBlockUtils::connectBlocks(Header, LatchSplit);
+
+  // Any extract of the last element must be updated to extract from the last
+  // active lane of the header mask instead (i.e., the lane corresponding to the
+  // last active iteration).
+  Builder.setInsertPoint(Plan.getMiddleBlock()->getTerminator());
+  for (VPRecipeBase &R : *Plan.getMiddleBlock()) {
+    VPValue *Op;
+    if (!match(&R, m_ExtractLastLane(m_ExtractLastPart(m_VPValue(Op)))))
+      continue;
+
+    // Compute the index of the last active lane.
+    VPValue *LastActiveLane =
+        Builder.createNaryOp(VPInstruction::LastActiveLane, HeaderMask);
+    auto *Ext =
+        Builder.createNaryOp(VPInstruction::ExtractLane, {LastActiveLane, Op});
+    R.getVPSingleValue()->replaceAllUsesWith(Ext);
+  }
+}
+
 // Likelyhood of bypassing the vectorized loop due to a runtime check block,
 // including memory overlap checks block and wrapping/unit-stride checks block.
 static constexpr uint32_t CheckBypassWeights[] = {1, 127};
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
index f7e7fc29bc203..96bee083875af 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
@@ -14,13 +14,11 @@
 #include "VPRecipeBuilder.h"
 #include "VPlan.h"
 #include "VPlanCFG.h"
-#include "VPlanPatternMatch.h"
 #include "VPlanTransforms.h"
 #include "VPlanUtils.h"
 #include "llvm/ADT/PostOrderIterator.h"
 
 using namespace llvm;
-using namespace VPlanPatternMatch;
 
 namespace {
 class VPPredicator {
@@ -73,9 +71,6 @@ class VPPredicator {
     return EdgeMaskCache.lookup({Src, Dst});
   }
 
-  /// Compute and return the mask for the vector loop header block.
-  void createHeaderMask(VPBasicBlock *HeaderVPBB, bool FoldTail);
-
   /// Compute and return the predicate of \p VPBB, assuming that the header
   /// block of the loop is set to True, or to the loop mask when tail folding.
   VPValue *createBlockInMask(VPBasicBlock *VPBB);
@@ -156,28 +151,6 @@ VPValue *VPPredicator::createBlockInMask(VPBasicBlock *VPBB) {
   return BlockMask;
 }
 
-void VPPredicator::createHeaderMask(VPBasicBlock *HeaderVPBB, bool FoldTail) {
-  if (!FoldTail) {
-    setBlockInMask(HeaderVPBB, nullptr);
-    return;
-  }
-
-  // Introduce the early-exit compare IV <= BTC to form header block mask.
-  // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
-  // constructing the desired canonical IV in the header block as its first
-  // non-phi instructions.
-
-  auto &Plan = *HeaderVPBB->getPlan();
-  auto *IV =
-      new VPWidenCanonicalIVRecipe(HeaderVPBB->getParent()->getCanonicalIV());
-  Builder.setInsertPoint(HeaderVPBB, HeaderVPBB->getFirstNonPhi());
-  Builder.insert(IV);
-
-  VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
-  VPValue *BlockMask = Builder.createICmp(CmpInst::ICMP_ULE, IV, BTC);
-  setBlockInMask(HeaderVPBB, BlockMask);
-}
-
 void VPPredicator::createSwitchEdgeMasks(VPInstruction *SI) {
   VPBasicBlock *Src = SI->getParent();
 
@@ -232,7 +205,8 @@ void VPPredicator::createSwitchEdgeMasks(VPInstruction *SI) {
 void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
   SmallVector<VPPhi *> Phis;
   for (VPRecipeBase &R : VPBB->phis())
-    Phis.push_back(cast<VPPhi>(&R));
+    if (auto *PhiR = dyn_cast<VPPhi>(&R))
+      Phis.push_back(PhiR);
   for (VPPhi *PhiR : Phis) {
     // The non-header Phi is converted into a Blend recipe below,
     // so we don't have to worry about the insertion order and we can just use
@@ -261,7 +235,7 @@ void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
 }
 
 DenseMap<VPBasicBlock *, VPValue *>
-VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan, bool FoldTail) {
+VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
   VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
   // Scan the body of the loop in a topological order to visit each basic block
   // after having visited its predecessor basic blocks.
@@ -272,14 +246,6 @@ VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan, bool FoldTail) {
   for (VPBlockBase *VPB : RPOT) {
     // Non-outer regions with VPBBs only are supported at the moment.
     auto *VPBB = cast<VPBasicBlock>(VPB);
-    // Introduce the mask for VPBB, which may introduce needed edge masks, and
-    // convert all phi recipes of VPBB to blend recipes unless VPBB is the
-    // header.
-    if (VPBB == Header) {
-      Predicator.createHeaderMask(Header, FoldTail);
-      continue;
-    }
-
     Predicator.createBlockInMask(VPBB);
     Predicator.convertPhisToBlends(VPBB);
   }
@@ -300,32 +266,5 @@ VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan, bool FoldTail) {
 
     PrevVPBB = VPBB;
   }
-
-  // If we folded the tail and introduced a header mask, any extract of the
-  // last element must be updated to extract from the last active lane of the
-  // header mask instead (i.e., the lane corresponding to the last active
-  // iteration).
-  if (FoldTail) {
-    assert(Plan.getExitBlocks().size() == 1 &&
-           "only a single-exit block is supported currently");
-    assert(Plan.getExitBlocks().front()->getSinglePredecessor() ==
-               Plan.getMiddleBlock() &&
-           "the exit block must have middle block as single predecessor");
-
-    VPBuilder B(Plan.getMiddleBlock()->getTerminator());
-    for (VPRecipeBase &R : *Plan.getMiddleBlock()) {
-      VPValue *Op;
-      if (!match(&R, m_ExtractLastLane(m_ExtractLastPart(m_VPValue(Op)))))
-        continue;
-
-      // Compute the index of the last active lane.
-      VPValue *HeaderMask = Predicator.getBlockInMask(Header);
-      VPValue *LastActiveLane =
-          B.createNaryOp(VPInstruction::LastActiveLane, HeaderMask);
-      auto *Ext =
-          B.createNaryOp(VPInstruction::ExtractLane, {LastActiveLane, Op});
-      R.getVPSingleValue()->replaceAllUsesWith(Ext);
-    }
-  }
   return Predicator.getBlockMaskCache();
 }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index e0d09a099647a..e00c94558bf7a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -413,14 +413,44 @@ struct VPlanTransforms {
   static void narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
                                      TypeSize VectorRegWidth);
 
+  /// Adapts the vector loop region for tail folding by introducing a header
+  /// mask and predicating the region:
+  ///
+  /// Vector loop region before:
+  /// +-------------------------------------------+
+  /// |%iv = ...                                  |
+  /// |...                                        |
+  /// |%iv.next = add %iv, vfxuf                  |
+  /// |branch-on-count %iv.next, vector-trip-count|
+  /// +-------------------------------------------+
+  ///
+  /// Vector loop region after:
+  /// +-------------------------------------------+
+  /// |%iv = ...                                  |
+  /// |%wide.iv = widen-canonical-iv ...          |
+  /// |%header-mask = icmp ult %wide.iv, BTC      |
+  /// |branch-on-cond %header-mask                |---+
+  /// +-------------------------------------------+   |
+  ///                      |                          |
+  ///                      v                          |
+  /// +-------------------------------------------+   |
+  /// |                   ...                     |   |
+  /// +-------------------------------------------+   |
+  ///                      |                          |
+  ///                      v                          |
+  /// +-------------------------------------------+   |
+  /// |%iv.next = add %iv, vfxuf                  |<--+
+  /// |branch-on-count %iv.next, vector-trip-count|
+  /// +-------------------------------------------+
+  ///
+  /// Any VPInstruction::ExtractLastLanes are also updated to extract from the
+  /// last active lane of the header mask.
+  static void foldTailByMasking(VPlan &Plan);
+
   /// Predicate and linearize the control-flow in the only loop region of
-  /// \p Plan. If \p FoldTail is true, create a mask guarding the loop
-  /// header, otherwise use all-true for the header mask. Masks for blocks are
-  /// added to a block-to-mask map which is returned in order to be used later
-  /// for wide recipe construction. This argument is temporary and will be
-  /// removed in the future.
+  /// \p Plan.
   static DenseMap<VPBasicBlock *, VPValue *>
-  introduceMasksAndLinearize(VPlan &Plan, bool FoldTail);
+  introduceMasksAndLinearize(VPlan &Plan);
 
   /// Add branch weight metadata, if the \p Plan's middle block is terminated by
   /// a BranchOnCond recipe.

Comment thread llvm/lib/Transforms/Vectorize/VPlan.cpp
(!RR || !RR->isPartialReduction())) {
VPValue *Cond = RecipeBuilder.getBlockInMask(PhiR->getParent());
VPValue *Cond = RecipeBuilder.getBlockInMask(
cast<VPBasicBlock>(PhiR->getParent()->getSuccessors().front()));

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cond here is supposed to be the header mask IIUC, and PhiR's parent is the header block. But the header block itself is no longer predicated, only the "body" block, so this gets the header mask from its successor instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's essentially my question about SSA form. I would expect that select to be created by predicator automatically if we maintain proper SSA form, i.e.,

; after tail folding
header:
  %red-phi = phi [ init, %preheader ], [ %new-phi, %latch ]
  ...
  br i1 %header-mask, label %body, label %latch

body:
  %red-phi.next = ...
  ...
  br label %latch

latch:
  ; Will be transformed into a VPBlend and subsequently into a select with existing predicator:
  new-phi = phi [ %red-phi, %header ], [ %red-phi.next, %body ]
  ...

exit:
  %use = phi [ %new-phi, %latch ]

@lukel97 lukel97 Jan 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a try and I think it's possible if you explicitly handle the VPReductionPHIRecipes in foldTailByMasking, it would make for a nice tidy up. But it doesn't seem to be an NFC change. I can add a TODO to move this into VPlanTransforms::foldTailByMasking as a follow up?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What special handling? Is it still necessary after you've optimized blends per your comment in #176143 (comment)?

@lukel97 lukel97 Jan 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check for reduction phis when creating the latch phi and set their incoming value to the reduction phi, as opposed to poison. It's still needed after optimizing the blends:

@@ -1014,7 +1014,12 @@ void VPlanTransforms::foldTailByMasking(VPlan &Plan) {
           continue;
         VPValue *Poison = Plan.getOrAddLiveIn(
             PoisonValue::get(V->getUnderlyingValue()->getType()));
+        auto *RedIt = find_if(V->users(), IsaPred<VPReductionPHIRecipe>);
+        if (RedIt != V->user_end())
+          Poison = cast<VPRecipeBase>(*RedIt)->getVPSingleValue();
         VPValue *Phi = Builder.createScalarPhi({V, Poison}, {});
+        if (RedIt != V->user_end() && PredicatedReductionSelect)
+          cast<VPRecipeBase>(*RedIt)->setOperand(1, Phi);
         V->replaceUsesWithIf(Phi, NeedsPhi);
       }
     }

@eas eas Jan 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I now think that tail folding belongs to the predicator :) More specifically, handling of the divergent loop exit belongs to the predicator. Consider outer loop vectorization:

inner.header:
  %inner.iv = phi
  %inner.red = phi [ 0, %inner.preheader ], [ %inner.red.next, %inner.header ]
  ...
  %ec = icmp %inner.iv.next, %n
  br i1 %ec, %inner.exit, %inner.header
  
inner.exit:
  %inner.red.lcssa = phi [ %inner.red.next ]

If %n is inner-loop-invariant and all lanes exit at the same time, no special handling for the %inner.red is needed - it can simply be widened and all SIMD lanes would operate independently but exactly the same, in a sense. If, for example, %n == %outer.iv and some lanes exit inner loop earlier than the others, then exactly the same processing as you're talking about would be necessary for %inner.red. However, it has nothing to do with the outer loop we're vectorizing and/or its tail folding, thus that special handling belongs to the predicator.

I think the right way to communicate tail masking is to tell predicator that [outer|our] loop backedge isn't uniform any more, and let it do the processing necessary. We'll reuse the same functionality later when we start supporting proper outer loop vectorization.

Early exits, on the other hand, are different. We don't take backedge if any lane has exited, so the backedge branch remains uniform. As such, early exits from the loop we're vectorizing are unlikely to belong to the predicator, even if there might be some similarities.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the predicator is still ultimately doing performing the tail folding. The only difference is that we're now explicitly modelling the creation of the header mask and the predication of the blocks in VPlan in a separate, isolated step.

To the best of my understanding, this seems to match the original direction, see this comment here:

// Create mask based on the IR BB corresponding to VPBB.
// TODO: Predicate directly based on VPlan.

// Create mask based on the IR BB corresponding to VPBB.
// TODO: Predicate directly based on VPlan.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope to find time to track this patch more closely, but regarding original directions, one laid out back in '23 was to first (always) tail-fold, when strip-mining the scalar VPlan by a symbolic strip-mine factor SF (later to become VF*UF), then predicate, as part of VPlan-creation, then tail-unfold as a cost-based VF-dependent decision:
https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf#page=37

There could be multiple ways to get there from where we are now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing that. I think this patch is somewhat aligned in that direction, since it models the CFG in the strip-mine step from that slide.

As a side note, if we want to tail-unfold then presumably there'll be a step where we'll want to remove the header mask by e.g. replacing it with all ones. I think expanding VPBlendRecipes to VPInstruction::Select would help with this since any select with an all-ones mask just gets folded away: #171851

@eas eas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks SSA form for the exiting values (outside uses might no longer be dominated by the defs) and headers phis' incoming values over the backedge. Subsequent predication would restore it, and it happens (at the moment of this patch) immediately, but I wonder what would happen if we make some future changes in the pipeline.

Do you think maintaining proper SSA form would be worthwhile here?

Comment on lines +993 to +995
// Split the latch at the IV update, and branch to it from the header mask.
VPBasicBlock *LatchSplit =
Latch->splitAt(IVInc->getDefiningRecipe()->getIterator());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert that there are no unexpected instructions in the LatchSplit (e.g. if someone tries to change pass ordering in future)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's what the asserts do above, that it only contains the canonical IV increment and the branch-on-count.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that it ONLY contains

Sorry, I don't see that. I'm expecting these:

assert(std::distance(Latch->begin(), Latch->end() == 2);
assert(IVInc->getParent() == Latch);
// Ensured by the verifier:
assert(Latch->getTerminator->getParent() == Latch);

@lukel97 lukel97 Jan 28, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the asserts above check for that, but just stricter. The two asserts ensure that the latch is always of the form:

latch:
... arbitrary recipes
%ivinc = add %can-iv, vfxuf
%branch-on-count %ivinc, vector-trip-count
successors: middle.block, header

So when we split at %ivinc, the new structure should always be:

latch:
... arbitrary recipes
successors: latch.split

latch.split:
%ivinc = add %can-iv, vfxuf
%branch-on-count %ivinc, vector-trip-count
successors: middle.block, header

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, I was only considering the use def chain. Added an assert in df47d1e

Comment on lines +998 to +1000
// Any extract of the last element must be updated to extract from the last
// active lane of the header mask instead (i.e., the lane corresponding to the
// last active iteration).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that we're ok with, or do we need a TODO to move tail folding even earlier in the pipeline before last lane extracts are introduced (or anything else, really, that needs explicit handling here)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to handle it here with the rest of the tail folding transform, so it's all kept in one place. The header mask won't be available up until this point anyway.

In general I think it's best to do transforms atomically so you don't have an incorrect intermediate VPlan between transforms

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to make it work automatically is to allow a mask on the ExtractLastElement. Then any instances introduced before predication would be masked automatically. And then we can either modify its codegen or lower into a more verbose representation by the end of the pipeline. #142285 could make that simple enough, maybe.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an invariant that the mask needs to be a prefix mask, i.e. all 1s before all 0s, see LastActiveLane, so we wouldn't be able to lower any arbitrary mask.

@eas eas Jan 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, header-mask will work and that's the one we're creating here, right? And that can be asserted for in the predicator, I think, although somewhat ugly: Mask->getParent() == Header && match(Mask, m_Cmp(LT/LE, IV, m_LoopInvariant()). Or, if everything else, just change how we lower LastActiveLane to support arbitrary mask. That could help in other places, e.g. with generalizing #141900 to skip blocks that have masks other than the header mask.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is existing code, I think we should we defer this conversation to a separate PR if we want to change the approach

@lukel97

lukel97 commented Jan 21, 2026

Copy link
Copy Markdown
Contributor Author

This breaks SSA form for the exiting values (outside uses might no longer be dominated by the defs) and headers phis' incoming values over the backedge. Subsequent predication would restore it, and it happens (at the moment of this patch) immediately, but I wonder what would happen if we make some future changes in the pipeline.

Do you think maintaining proper SSA form would be worthwhile here?

Yes, in practice the vector region's control flow is immediately linearized afterwards so everything is dominated.

This patch has a slightly weird use of the CFG, because the branch-on-cond doesn't use a scalar i1 condition but a vector mask instead. So it's never actually "branching" to begin with and relies on being lineraized and removed.

I think this is a good point though. In #172454 the original I approach I used was to insert custom VPInstructions to indicate to VPPredicator that a certain point needed to be masked, without touching the CFG itself and preserving SSA: da6eba9

We could try going back down this route. But this time I would maybe avoid the custom VPInstruction and instead just pass to VPPredicator a map of blocks that should be predicated down to the latch. Something like DenseMap<VPBasicBlock *, VPValue *> PredicationPoints;, where each successor of a basic block in the map has the mask value added to its edge mask.

@lukel97

lukel97 commented Jan 21, 2026

Copy link
Copy Markdown
Contributor Author

I've tried out the explicit DenseMap approach in ed18c28 which should avoid the faux-control flow now.

@eas

eas commented Jan 21, 2026

Copy link
Copy Markdown
Contributor

because the branch-on-cond doesn't use a scalar i1 condition but a vector mask instead.

How is this a problem? Isn't it what predicator sees for the original control flow from VPlan construction (modulo "vector mask" -> "varying i1 condition" renaming, which are essentially the same anyway)? I think explicit control flow while maintaining SSA is a better choice:

@fhahn fhahn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if the verifier would have caught that?

Given that we now need to pass the extra state between transforms, I am wondering if it still helps for #172454? That would then also require passing more state between transforms further appart, right?

I think #172454 would be a good candidate for using the masked VPInsruction capabilities from #142285; for early exits, we can mask the relevant VPInstructions early.

@lukel97

lukel97 commented Jan 22, 2026

Copy link
Copy Markdown
Contributor Author

Isn't it what predicator sees for the original control flow from VPlan construction (modulo "vector mask" -> "varying i1 condition" renaming, which are essentially the same anyway)?

Yes but the vector mask condition is the problem since VPInstruction::BranchOnCond isn't designed as a masking construct but for regular control flow, i.e. see VPInstruction::usesFirstLaneOnly where it states the condition operand is scalar. In retrospect I think it was more a coincidence that VPPredicator happened to work with it because it was automatically broadcasting scalars when predicating.

VPlan is in consistent (and potentially verifiable) state

Even if we do add PHIs in the latch to keep SSA, the BranchOnCond instructions will be invalid so it will fail the VPlanVerifier anyway. FWIW I'm not strongly opinionated on either approach

Do you know if the verifier would have caught that?

Yes it checks for domination, but at this stage during VPlan construction it doesn't pass anyway because e.g. unhandled VPInstruction opcodes that aren't yet widened to VPWidenRecipe.

I think #172454 would be a good candidate for using the masked VPInsruction capabilities from #142285; for early exits, we can mask the relevant VPInstructions early.

I didn't notice that PR before, I think we could use it for both this PR and #172454. But we would need to teach VPlanPredicator to logically-and masks with each VPInstructions' existing mask. I'm not sure if we would then need to keep around a cache or something so that we don't end up with a logical-and for each VPInstruction.

@lukel97

lukel97 commented Jan 22, 2026

Copy link
Copy Markdown
Contributor Author

I tried reworking this PR and #172454 to do predication via VPInstruction mask operands in #142285, but it turned out to be very difficult since you need to merge the mask operands during both tail folding, early exits and linearization.

Not only do you have to worry about where to insert logical-ands and deduplicating them, the big issue I found is that if you e.g. want to predicate everything past the exiting block here, you have to start dealing with control flow joins:

flowchart
    foo --> exiting --> bar
    foo --> bar
Loading

And in that case you end up reimplementing what VPPredicator already does.

Given that we now need to pass the extra state between transforms, I am wondering if it still helps for #172454? That would then also require passing more state between transforms further appart, right?

I've removed the state in 81332b3, and instead used a new VPInstruction to indicate to VPPredicator that the successors should be predicated.

I was able to reuse this approach for #172454 too, and it works after rebasing on top of #142285.

@eas

eas commented Jan 22, 2026

Copy link
Copy Markdown
Contributor

Yes but the vector mask condition is the problem since VPInstruction::BranchOnCond isn't designed as a masking construct but for regular control flow, i.e. see VPInstruction::usesFirstLaneOnly where it states the condition operand is scalar. In retrospect I think it was more a coincidence that VPPredicator happened to work with it because it was automatically broadcasting scalars when predicating.

For

header:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  %mask = icmp slt i64 %iv, %n
  br i1 %mask, label %body, label %latch

body:
  %gep = getelementptr i64, ptr %ptr, i64 %iv
  %ld = load i64, ptr %gep
  %add = add i64 %ld, %iv
  store i64 %add, ptr %gep
  br label %latch

latch:
  %iv.next = add nsw i64 %iv, 1
  %exitcond = icmp slt i64 %iv.next, %n
  br i1 %exitcond, label %header, label %exit

Right before predicator (current trunk):

<x1> vector loop: {
  vector.body:
    EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
    ir<%iv> = WIDEN-INDUCTION nsw ir<0>, ir<1>, vp<%0>
    EMIT ir<%mask> = icmp slt ir<%iv>, ir<%n>
    EMIT branch-on-cond ir<%mask>          ;; ! branch-on-cond too !
  Successor(s): body, latch

  body:
    EMIT ir<%gep> = getelementptr ir<%ptr>, ir<%iv>
    EMIT ir<%ld> = load ir<%gep>
    EMIT ir<%add> = add ir<%ld>, ir<%iv>
    EMIT store ir<%add>, ir<%gep>
  Successor(s): latch

  latch:
    EMIT ir<%iv.next> = add nsw ir<%iv>, ir<1>
    EMIT ir<%exitcond> = icmp slt ir<%iv.next>, ir<%n>
    EMIT vp<%5> = not ir<%exitcond>
    EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
    EMIT branch-on-count vp<%index.next>, vp<%2>
  No successors
}
Successor(s): middle.block

How is that different from manually created icmp? I don't think predicator looks into BranchOnCond->getOperand(0)->getUnderlyingValue() (and really there would be no reason for it to do that).

@lukel97

lukel97 commented Jan 23, 2026

Copy link
Copy Markdown
Contributor Author

How is that different from manually created icmp? I don't think predicator looks into BranchOnCond->getOperand(0)->getUnderlyingValue() (and really there would be no reason for it to do that).

TIL the header phis are widened before linearization. I had presumed that they were just widened alongside regular phis in the loop body after linearization. In that case I feel better about using BranchOnCond, let me give it a second try but preserving SSA.

Comment on lines +217 to +221
auto NotPoison = [&](VPValue *V) {
// Don't remove poison from phis from the original loop.
return PhiR->getUnderlyingValue() || !isa<VPIRValue>(V) ||
!isa<PoisonValue>(cast<VPIRValue>(V)->getValue());
};

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eas now that we have phis to keep SSA, the incoming value for the header block (i.e. the tail lanes) is poison. But if we don't remove the redundant phis a lot of stuff breaks around the reduction computation matching stuff, and we get other regressions with unnecessary blends. So this prevents the redundant blends from being created.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a patch (by Florian, I think) that changes predicator to emit selects instead of blends. With that, IIUC, we'd just do (pseudocode)

  auto *Result = nullptr;
  for (auto [Val, Edge] : orig_phi->values()) {
    if (is_poison(Val) || Result == Val) 
      continue;
    Result = Result ? createSelect(EdgeCondition, Result, Val) : Val;
  }

eliminating this whole block of special casing. Is that accurate enough? If so, then I think whatever this patch needs to do temporarily isn't an issue. I'm not sure if the same can be emulated with the VPBlend, but probably the above can be rewritten in terms of filling in the blend's operands.

@lukel97 lukel97 Jan 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it #171851, and the successor #150369? Those patches essentially do the code snippet above.

But for this patch at least we need this special casing, since we don't yet have a simplification for a select with a poison value. And we also don't run simplification this early on yet, so we would need to manually call it here. But all for another patch.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, It was yours :) Yes, #150369 is the one I had in mind.

@fhahn

fhahn commented Jan 25, 2026

Copy link
Copy Markdown
Contributor

Not only do you have to worry about where to insert logical-ands and deduplicating them, the big issue I found is that if you e.g. want to predicate everything past the exiting block here, you have to start dealing with control flow joins:

It seems with the the fold from #177887, the first part does not seem too bad: d15e2db

As for the second part, I need to think a bit more about it, but I think similarly with the explicit CFG we similarly need to make sure we insert phis to preserve SSA form?

@eas

eas commented Feb 17, 2026

Copy link
Copy Markdown
Contributor

Regarding early exits, one way to support them may be to move predication before early exit handling (and before creating regions), retain the early-exiting edges, then handle early-exiting edges when handling early exits.

I have some strong doubts about this. My reasoning for this is that early exits from the loop we're vectorizing are vastly different from early exits from the inner loops (that the predicator would need to handle someday), so it's not "early exit" but an "early exit from that particular loop" and the early semantics lowering of the incoming source code (either implicit via legal or explicit if we implement more of OMP SIMD/other similar annotation). I also believe that such "import of semantics" should be done early on before more generic processing (like the linearization/predication).

This silently just happened to work because these phis are just replaced with the same original value.
lukel97 added a commit that referenced this pull request Feb 19, 2026
…ags. NFC (#181694)

In order to be able to create selects for reduction phis through tail
folding in foldTailByMasking (#176143), make VPReductionPHIRecipe an
instance of VPIRFlags and plumb the FMFs from the original RdxDesc.

This allows us to remove more uses of the RecurrenceDescriptor in
addReductionResultComputation, which should help untie it from
LoopVectorizationLegality.
Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated
@lukel97

lukel97 commented Feb 25, 2026

Copy link
Copy Markdown
Contributor Author

Ping

Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp Outdated
Comment thread llvm/lib/Transforms/Vectorize/VPlanTransforms.h Outdated
@github-actions

github-actions Bot commented Mar 4, 2026

Copy link
Copy Markdown

🐧 Linux x64 Test Results

  • 170611 tests passed
  • 3097 tests skipped
  • 1 test failed

Failed Tests

(click on a test name to see its output)

lldb-api

lldb-api.functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py
Script:
--
/usr/bin/python3 /home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/./lib --env LLVM_INCLUDE_DIR=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/include --env LLVM_TOOLS_DIR=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin --libcxx-include-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/include/c++/v1 --libcxx-include-target-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/include/x86_64-unknown-linux-gnu/c++/v1 --libcxx-library-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./lib/x86_64-unknown-linux-gnu --arch x86_64 --build-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lldb-test-build.noindex --lldb-module-cache-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin/lldb --compiler /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin/clang --dsymutil /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin --lldb-obj-root /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/lldb --lldb-libs-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./lib --cmake-build-type Release /home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list -p TestDataFormatterGenericList.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 23.0.0git (https://github.com/llvm/llvm-project revision c080404e7c94e4ed758f822a346b42d048d16391)
  clang revision c080404e7c94e4ed758f822a346b42d048d16391
  llvm revision c080404e7c94e4ed758f822a346b42d048d16391
Skipping the following test categories: msvcstl, dsym, pdb, gmodules, debugserver, objc

--
Command Output (stderr):
--
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libcpp_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libcpp_dsym) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libcpp_dwarf)
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libcpp_dwo)
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libcpp_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libcpp_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libstdcpp_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libstdcpp_dsym) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libstdcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libstdcpp_dwarf)
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libstdcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libstdcpp_dwo)
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_libstdcpp_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_libstdcpp_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_msvcstl_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_msvcstl_dsym) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_msvcstl_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_msvcstl_dwarf) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_msvcstl_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_msvcstl_dwo) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_ptr_and_ref_msvcstl_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_ptr_and_ref_msvcstl_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libcpp_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libcpp_dsym) (test case does not fall in any category of interest for this run) 
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libcpp_dwarf)
PASS: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libcpp_dwo)
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libcpp_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libcpp_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libstdcpp_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dsym) (test case does not fall in any category of interest for this run) 
FAIL: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libstdcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dwarf)
FAIL: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libstdcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dwo)
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_libstdcpp_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_pdb) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_msvcstl_dsym (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_msvcstl_dsym) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_msvcstl_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_msvcstl_dwarf) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_msvcstl_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_msvcstl_dwo) (test case does not fall in any category of interest for this run) 
UNSUPPORTED: LLDB (/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang-x86_64) :: test_with_run_command_msvcstl_pdb (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_msvcstl_pdb) (test case does not fall in any category of interest for this run) 
======================================================================
FAIL: test_with_run_command_libstdcpp_dwarf (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dwarf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1858, in test_method
    return attrvalue(self)
           ^^^^^^^^^^^^^^^
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py", line 302, in test_with_run_command_libstdcpp
    self.do_test_with_run_command(is_libstdcpp=True)
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py", line 64, in do_test_with_run_command
    self.expect(
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2487, in expect
    self.runCmd(
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1049, in runCmd
    self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Command 'frame variable &numbers_list._M_impl._M_node --raw' did not return successfully
Error output:
error: <user expression 0>:1:15: "_M_impl" is not a member of "(int_list) numbers_list"
   1 | &numbers_list._M_impl._M_node
     | ^

Config=x86_64-/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang
======================================================================
FAIL: test_with_run_command_libstdcpp_dwo (TestDataFormatterGenericList.GenericListDataFormatterTestCase.test_with_run_command_libstdcpp_dwo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1858, in test_method
    return attrvalue(self)
           ^^^^^^^^^^^^^^^
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py", line 302, in test_with_run_command_libstdcpp
    self.do_test_with_run_command(is_libstdcpp=True)
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/list/TestDataFormatterGenericList.py", line 64, in do_test_with_run_command
    self.expect(
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2487, in expect
    self.runCmd(
  File "/home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1049, in runCmd
    self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Command 'frame variable &numbers_list._M_impl._M_node --raw' did not return successfully
Error output:
error: <user expression 0>:1:15: "_M_impl" is not a member of "(int_list) numbers_list"
   1 | &numbers_list._M_impl._M_node
     | ^

Config=x86_64-/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/clang
----------------------------------------------------------------------
Ran 24 tests in 3.542s

FAILED (failures=2, skipped=16)

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

@fhahn fhahn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

// Collect any values defined in the loop that need a phi. Currently this
// includes header phi backedges and live-outs extracted in the middle block.
// TODO: Handle early exits via Plan.getExitBlocks()
MapVector<VPValue *, SmallVector<VPUser *>> NeedsPhi;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be good to have a slightly more descriptive name, although I don't really have any good suggestions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed but I'm also drawing a blank. I guess it's something like "live-out of the predicated section". I'll try and think of something in a follow up PR.

/// +-------------------------------------------+
/// |%iv = ... |
/// |%wide.iv = widen-canonical-iv ... |
/// |%header-mask = icmp ult %wide.iv, BTC |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// |%header-mask = icmp ult %wide.iv, BTC |
/// |%header-mask = icmp ule %wide.iv, BTC |

@Mel-Chen

Mel-Chen commented Mar 5, 2026

Copy link
Copy Markdown
Contributor

@lukel97 Please land this patch before #173265. I guess we will get regression if we land #173265 before this patch.

@lukel97

lukel97 commented Mar 5, 2026

Copy link
Copy Markdown
Contributor Author

@lukel97 Please land this patch before #173265. I guess we will get regression if we land #173265 before this patch.

Oh, I was actually waiting for #173265 to land first to see if we would need to generalize it to the IDom. I'll land this first though then, and if we need to generalize it then I guess you can just do that in #173265

@Mel-Chen

Mel-Chen commented Mar 5, 2026

Copy link
Copy Markdown
Contributor

@lukel97 Please land this patch before #173265. I guess we will get regression if we land #173265 before this patch.

Oh, I was actually waiting for #173265 to land first to see if we would need to generalize it to the IDom. I'll land this first though then, and if we need to generalize it then I guess you can just do that in #173265

#173265 can handle non-tail folding case first. We can create another patch to handle tail folding case. :)

@lukel97 lukel97 enabled auto-merge (squash) March 5, 2026 07:43
@lukel97 lukel97 merged commit 8251293 into llvm:main Mar 5, 2026
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants