[VPlan] Compute blend masks from minimum set of edge masks #201783
lukel97 wants to merge 14 commits
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

Stacked on #201782

Another PR aims to model the control flow of early exits explicitly and, in doing so, insert phi nodes to preserve SSA. Inserting phi nodes results in more VPBlendRecipes, so this PR optimizes the edge masks generated for those blend recipes to prevent regressions. The shape of the CFG and the phis that would be emitted are precommitted in the predicator-early-exit.ll test.

The idea is to compute a minimal set of edges that lead to each unique incoming value in a phi. It does this by walking up the edges in the post-dominance frontier until the outgoing edges no longer lead to the same value. It also recursively looks through the incoming edges of any values that are themselves phi nodes.

This is a simpler, less general version of #184838, since this requires the phi node to postdominate its incoming values. This is fine for the early exit use case, though, since we only need to optimize phi nodes inserted in the latch.

The big advantage over #184838 is that it doesn't require several depth-first searches to compute the set of reachable nodes, and can be done entirely by iterating the post-dominator frontier.

Patch is 34.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/201783.diff

9 Files Affected:
diff --git a/llvm/include/llvm/Analysis/DominanceFrontier.h b/llvm/include/llvm/Analysis/DominanceFrontier.h
index fd38891e901e3..4a8ab96cf71a7 100644
--- a/llvm/include/llvm/Analysis/DominanceFrontier.h
+++ b/llvm/include/llvm/Analysis/DominanceFrontier.h
@@ -78,6 +78,7 @@ class DominanceFrontierBase {
const_iterator end() const { return Frontiers.end(); }
iterator find(BlockT *B) { return Frontiers.find(B); }
const_iterator find(BlockT *B) const { return Frontiers.find(B); }
+ const_iterator find(const BlockT *B) const { return Frontiers.find(B); }
/// print - Convert to human readable form
///
diff --git a/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h b/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
index 2864670f44913..1ad522880c709 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanDominatorTree.h
@@ -18,6 +18,8 @@
#include "VPlan.h"
#include "VPlanCFG.h"
#include "llvm/ADT/GraphTraits.h"
+#include "llvm/Analysis/DominanceFrontier.h"
+#include "llvm/Analysis/DominanceFrontierImpl.h"
#include "llvm/IR/Dominators.h"
#include "llvm/Support/GenericDomTree.h"
#include "llvm/Support/GenericDomTreeConstruction.h"
@@ -67,5 +69,11 @@ template <>
struct GraphTraits<const VPDomTreeNode *>
: public DomTreeGraphTraitsBase<const VPDomTreeNode,
VPDomTreeNode::const_iterator> {};
+
+class VPPostDominanceFrontier
+ : public DominanceFrontierBase<VPBlockBase, true> {
+public:
+ explicit VPPostDominanceFrontier(const DomTreeT &VPDT) { analyze(VPDT); }
+};
} // namespace llvm
#endif // LLVM_TRANSFORMS_VECTORIZE_VPLANDOMINATORTREE_H
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
index 2717b80e2eeaa..2ec3df8ccf8c1 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
@@ -34,6 +34,9 @@ class VPPredicator {
/// Post-dominator tree for the VPlan.
VPPostDominatorTree VPPDT;
+ /// Post-dominator frontier for the VPlan.
+ VPPostDominanceFrontier VPPDF;
+
/// When we if-convert we need to create edge masks. We have to cache values
/// so that we don't end up with exponential recursion/IR.
using EdgeMaskCacheTy =
@@ -69,8 +72,19 @@ class VPPredicator {
return EdgeMaskCache[{Src, Dst}] = Mask;
}
+ using EdgeTy = std::pair<const VPBasicBlock *, const VPBasicBlock *>;
+
+ /// Compute the "furthest up" set of edges for each incoming value of \p Phi.
+ MapVector<EdgeTy, VPValue *> computeBlendEdges(VPPhi *Phi);
+
+ /// Given a set of \p Edges that lead to \p VPBB, return the OR of all edges
+ /// or an equivalent block in-mask.
+ VPValue *createMaskDisjunction(ArrayRef<EdgeTy> Edges, VPBasicBlock *VPBB);
+
+ DenseMap<const VPBasicBlock *, VPBasicBlock::iterator> InsertPoints;
+
public:
- VPPredicator(VPlan &Plan) : VPDT(Plan), VPPDT(Plan) {}
+ VPPredicator(VPlan &Plan) : VPDT(Plan), VPPDT(Plan), VPPDF(VPPDT) {}
/// Returns the *entry* mask for \p VPBB.
VPValue *getBlockInMask(const VPBasicBlock *VPBB) const {
@@ -136,6 +150,10 @@ void VPPredicator::createBlockInMask(VPBasicBlock *VPBB) {
// Start inserting after the block's phis, which be replaced by blends later.
Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
+ // Keep track of where in VPBB we are inserting the masks into.
+ scope_exit UpdateInsertPoint(
+ [this, &VPBB]() { InsertPoints[VPBB] = Builder.getInsertPoint(); });
+
// Reuse the mask of the immediate dominator if the VPBB post-dominates the
// immediate dominator.
auto *IDom = VPDT.getNode(VPBB)->getIDom();
@@ -224,7 +242,117 @@ void VPPredicator::createSwitchEdgeMasks(const VPInstruction *SI) {
setEdgeMask(Src, DefaultDst, DefaultMask);
}
+// Compute the "furthest up" set of edges for each incoming value of a phi.
+//
+// Start by keeping track of what edges lead to which value. Then see if any
+// node has the same value for all outgoing edges. If so then propagate that
+// value up to every node it postdominates.
+MapVector<VPPredicator::EdgeTy, VPValue *>
+VPPredicator::computeBlendEdges(VPPhi *Phi) {
+ MapVector<EdgeTy, VPValue *> Edges;
+
+ // Mark the given edge as providing the value \p V.
+ auto AddEdge = [&Edges](const VPBlockBase *From, const VPBlockBase *To,
+ VPValue *V) {
+ EdgeTy Edge = {cast<VPBasicBlock>(From), cast<VPBasicBlock>(To)};
+ assert((!Edges.contains(Edge) || Edges.lookup(Edge) == V) &&
+ "Clobbering an edge?");
+ Edges[Edge] = V;
+ };
+
+ for (auto [InVal, InVPBB] : Phi->incoming_values_and_blocks())
+ AddEdge(InVPBB, Phi->getParent(), InVal);
+
+ // The root phi must postdominate every incoming block. Also don't touch
+ // phis in a reduction chain since they need to be in a specific structure
+ // for handle*Reductions.
+ for (auto [InVal, InVPBB] : Phi->incoming_values_and_blocks())
+ if (!VPPDT.dominates(Phi->getParent(), InVPBB) ||
+ isa<VPReductionPHIRecipe>(InVal))
+ return Edges;
+
+ // Given a list of edges, check if they all have the same value and return it.
+ auto GetAllEqual = [&Edges](ArrayRef<EdgeTy> OutEdges) -> VPValue * {
+ VPValue *Common = nullptr;
+ for (EdgeTy E : OutEdges) {
+ VPValue *V = Edges.lookup(E);
+ if (!V)
+ return nullptr;
+ if (match(V, m_Poison()))
+ continue;
+ if (!Common)
+ Common = V;
+ else if (Common != V)
+ return nullptr;
+ }
+ return Common;
+ };
+
+ SetVector<const VPBlockBase *> Worklist(from_range, Phi->incoming_blocks());
+ while (!Worklist.empty()) {
+ auto *VPBB = cast<VPBasicBlock>(Worklist.pop_back_val());
+
+ // Check that all outgoing edges from VPBB have the same value.
+ SmallVector<EdgeTy> OutEdges;
+ for (const VPBlockBase *Succ : VPBB->getSuccessors())
+ OutEdges.emplace_back(VPBB, cast<VPBasicBlock>(Succ));
+ VPValue *Common = GetAllEqual(OutEdges);
+ if (!Common)
+ continue;
+
+ // They have the same value: we can move the edges up
+ for (EdgeTy Edge : OutEdges)
+ Edges.erase(Edge);
+
+ // Peek through phis that are postdominated by VPBB
+ if (auto *Phi = dyn_cast<VPPhi>(Common))
+ if (VPPDT.dominates(VPBB, Phi->getParent())) {
+ for (auto [InV, InVPBB] : Phi->incoming_values_and_blocks()) {
+ AddEdge(InVPBB, Phi->getParent(), InV);
+ Worklist.insert(InVPBB);
+ }
+ continue;
+ }
+
+ // Iterate up through the post dominance frontier
+ for (const VPBlockBase *Frontier : VPPDF.find(VPBB)->second) {
+ for (const VPBlockBase *FrontierSucc : Frontier->getSuccessors())
+ if (VPPDT.dominates(VPBB, FrontierSucc))
+ AddEdge(Frontier, FrontierSucc, Common);
+ Worklist.insert(cast<VPBasicBlock>(Frontier));
+ }
+ }
+
+ return Edges;
+}
+
+VPValue *VPPredicator::createMaskDisjunction(ArrayRef<EdgeTy> Edges,
+ VPBasicBlock *VPBB) {
+ auto Dsts = map_range(Edges, [](auto E) { return E.second; });
+ const VPBasicBlock *PostDom = *Dsts.begin();
+ for (const VPBasicBlock *VPBB : drop_begin(Dsts))
+ PostDom =
+ cast<VPBasicBlock>(VPPDT.findNearestCommonDominator(PostDom, VPBB));
+ assert(VPPDT.dominates(VPBB, PostDom) && "Edges don't postdominate VPBB");
+ if (PostDom != VPBB)
+ return getBlockInMask(PostDom);
+
+ VPValue *Mask = nullptr;
+ for (auto [Src, Dst] : Edges) {
+ VPValue *EdgeMask;
+ {
+ VPBuilder::InsertPointGuard Guard(Builder);
+ Builder.setInsertPoint(const_cast<VPBasicBlock *>(Dst),
+ InsertPoints[Dst]);
+ EdgeMask = createEdgeMask(Src, Dst);
+ }
+ Mask = Mask ? Builder.createOr(Mask, EdgeMask) : EdgeMask;
+ }
+ return Mask;
+}
+
void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
+ Builder.setInsertPoint(VPBB, InsertPoints[VPBB]);
SmallVector<VPPhi *> Phis;
for (VPRecipeBase &R : VPBB->phis())
Phis.push_back(cast<VPPhi>(&R));
@@ -245,10 +373,30 @@ void VPPredicator::convertPhisToBlends(VPBasicBlock *VPBB) {
continue;
}
+ MapVector<VPValue *, SmallVector<EdgeTy>> InValEdgesMap;
+ for (auto [Edge, Val] : computeBlendEdges(PhiR))
+ InValEdgesMap[Val].push_back(Edge);
+ auto InValEdges = InValEdgesMap.takeVector();
+
+ if (InValEdges.size() == 1) {
+ PhiR->replaceAllUsesWith(InValEdges[0].first);
+ PhiR->eraseFromParent();
+ continue;
+ }
+
+ // Sort the incoming value order to match PhiR as much as possible.
+ llvm::stable_sort(InValEdges, [&PhiR](auto &L, auto &R) {
+ auto InVs = PhiR->incoming_values();
+ return std::distance(InVs.begin(), find(InVs, L.first)) <
+ std::distance(InVs.begin(), find(InVs, R.first));
+ });
+
SmallVector<VPValue *, 2> OperandsWithMask;
- for (const auto &[InVPV, InVPBB] : PhiR->incoming_values_and_blocks()) {
+ for (const auto &[InVPV, Edges] : InValEdges) {
+ if (match(InVPV, m_Poison()))
+ continue;
OperandsWithMask.push_back(InVPV);
- OperandsWithMask.push_back(createEdgeMask(InVPBB, VPBB));
+ OperandsWithMask.push_back(createMaskDisjunction(Edges, VPBB));
}
PHINode *IRPhi = cast_or_null<PHINode>(PhiR->getUnderlyingValue());
auto *Blend =
@@ -276,10 +424,8 @@ void VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
// Introduce the mask for VPBB, which may introduce needed edge masks, and
// convert all phi recipes of VPBB to blend recipes unless VPBB is the
// header.
- if (VPBB != Header) {
+ if (VPBB != Header)
Predicator.createBlockInMask(VPBB);
- Predicator.convertPhisToBlends(VPBB);
- }
VPValue *BlockMask = Predicator.getBlockInMask(VPBB);
if (!BlockMask)
@@ -292,6 +438,10 @@ void VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan) {
}
}
+ for (VPBlockBase *VPB : reverse(RPOT))
+ if (VPB != Header)
+ Predicator.convertPhisToBlends(cast<VPBasicBlock>(VPB));
+
// Linearize the blocks of the loop into one serial chain.
VPBlockBase *PrevVPBB = nullptr;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
diff --git a/llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll b/llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll
new file mode 100644
index 0000000000000..4d388a3be84e4
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/VPlan/predicator-early-exit.ll
@@ -0,0 +1,331 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt -disable-output < %s -p loop-vectorize -vplan-print-after=introduceMasksAndLinearize -vplan-print-vector-region-scope 2>&1 | FileCheck %s
+
+; Test various CFGs generated by early exits
+
+; vector.body
+; / \
+; / \
+; exiting1 \
+; / \ \
+; / \ |
+; / join1 |
+; / / \ |
+; / exiting2 \ |
+; \ | \ \ |
+; \ | join2
+; \ | /
+; \ | /
+; latch
+define void @multi_early_exit_predicated_nested(ptr %p1, ptr %p2, i1 %c1, i1 %c2, i1 %ee1, i1 %ee2, i32 %n) {
+; CHECK-LABEL: VPlan for loop in 'multi_early_exit_predicated_nested'
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT: vp<[[VP3:%[0-9]+]]> = CANONICAL-IV
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<[[VP0:%[0-9]+]]>
+; CHECK-NEXT: Successor(s): exiting1
+; CHECK-EMPTY:
+; CHECK-NEXT: exiting1:
+; CHECK-NEXT: Successor(s): join1
+; CHECK-EMPTY:
+; CHECK-NEXT: join1:
+; CHECK-NEXT: EMIT vp<[[VP4:%[0-9]+]]> = not ir<%ee1>
+; CHECK-NEXT: EMIT vp<[[VP5:%[0-9]+]]> = logical-and ir<%c1>, vp<[[VP4]]>
+; CHECK-NEXT: Successor(s): exiting2
+; CHECK-EMPTY:
+; CHECK-NEXT: exiting2:
+; CHECK-NEXT: EMIT vp<[[VP6:%[0-9]+]]> = logical-and vp<[[VP5]]>, ir<%c2>
+; CHECK-NEXT: Successor(s): join2
+; CHECK-EMPTY:
+; CHECK-NEXT: join2:
+; CHECK-NEXT: EMIT vp<[[VP7:%[0-9]+]]> = not ir<%ee2>
+; CHECK-NEXT: EMIT vp<[[VP8:%[0-9]+]]> = logical-and vp<[[VP6]]>, vp<[[VP7]]>
+; CHECK-NEXT: EMIT vp<[[VP9:%[0-9]+]]> = not ir<%c2>
+; CHECK-NEXT: EMIT vp<[[VP10:%[0-9]+]]> = logical-and vp<[[VP5]]>, vp<[[VP9]]>
+; CHECK-NEXT: EMIT vp<[[VP11:%[0-9]+]]> = or vp<[[VP8]]>, vp<[[VP10]]>
+; CHECK-NEXT: EMIT vp<[[VP12:%[0-9]+]]> = not ir<%c1>
+; CHECK-NEXT: EMIT vp<[[VP13:%[0-9]+]]> = or vp<[[VP11]]>, vp<[[VP12]]>
+; CHECK-NEXT: EMIT vp<[[VP14:%[0-9]+]]> = or vp<[[VP8]]>, vp<[[VP10]]>
+; CHECK-NEXT: BLEND ir<%phi1.join2> = ir<1>/vp<[[VP14]]> ir<0>/vp<[[VP12]]>
+; CHECK-NEXT: EMIT vp<[[VP15:%[0-9]+]]> = or vp<[[VP10]]>, vp<[[VP12]]>
+; CHECK-NEXT: BLEND ir<%phi2.join2> = ir<1>/vp<[[VP8]]> ir<0>/vp<[[VP15]]>
+; CHECK-NEXT: Successor(s): latch
+; CHECK-EMPTY:
+; CHECK-NEXT: latch:
+; CHECK-NEXT: BLEND ir<%phi1> = ir<1>/ir<%c1> ir<0>/vp<[[VP13]]>
+; CHECK-NEXT: BLEND ir<%phi2> = ir<1>/vp<[[VP6]]> ir<0>/vp<[[VP13]]>
+; CHECK-NEXT: EMIT ir<%gep1> = getelementptr ir<%p1>, ir<%iv>
+; CHECK-NEXT: EMIT store ir<%phi1>, ir<%gep1>
+; CHECK-NEXT: EMIT ir<%gep2> = getelementptr ir<%p2>, ir<%iv>
+; CHECK-NEXT: EMIT store ir<%phi2>, ir<%gep2>
+; CHECK-NEXT: EMIT ir<%iv.next> = add ir<%iv>, ir<1>
+; CHECK-NEXT: EMIT ir<%ec> = icmp eq ir<%iv.next>, ir<%n>
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<[[VP3]]>, vp<[[VP1:%[0-9]+]]>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<[[VP2:%[0-9]+]]>
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): middle.block
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [0, %entry], [%iv.next, %latch]
+ br i1 %c1, label %exiting1, label %join2
+
+exiting1:
+ br i1 %ee1, label %latch, label %join1
+
+join1:
+ br i1 %c2, label %exiting2, label %join2
+
+exiting2:
+ br i1 %ee2, label %latch, label %join2
+
+join2:
+ %phi1.join2 = phi i32 [1, %exiting2], [1, %join1], [0, %loop]
+ %phi2.join2 = phi i32 [1, %exiting2], [0, %join1], [0, %loop]
+ br label %latch
+
+latch:
+ %phi1 = phi i32 [1, %exiting1], [1, %exiting2], [%phi1.join2, %join2]
+ %phi2 = phi i32 [poison, %exiting1], [1, %exiting2], [%phi2.join2, %join2]
+ %gep1 = getelementptr i32, ptr %p1, i32 %iv
+ store i32 %phi1, ptr %gep1
+ %gep2 = getelementptr i32, ptr %p2, i32 %iv
+ store i32 %phi2, ptr %gep2
+ %iv.next = add i32 %iv, 1
+ %ec = icmp eq i32 %iv.next, %n
+ br i1 %ec, label %exit, label %loop
+
+exit:
+ ret void
+}
+
+; vector.body
+; / \
+; / \
+; exiting1 /
+; / \ /
+; / \ /
+; / join1
+; / / \
+; / exiting2 \
+; \ | \ \
+; \ | join2
+; \ | /
+; \ | /
+; latch
+define void @multi_early_exit_predicated_not_nested(ptr %p1, ptr %p2, i1 %c1, i1 %c2, i1 %ee1, i1 %ee2, i32 %n) {
+; CHECK-LABEL: VPlan for loop in 'multi_early_exit_predicated_not_nested'
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT: vp<[[VP3:%[0-9]+]]> = CANONICAL-IV
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<[[VP0:%[0-9]+]]>
+; CHECK-NEXT: Successor(s): exiting1
+; CHECK-EMPTY:
+; CHECK-NEXT: exiting1:
+; CHECK-NEXT: Successor(s): join1
+; CHECK-EMPTY:
+; CHECK-NEXT: join1:
+; CHECK-NEXT: EMIT vp<[[VP4:%[0-9]+]]> = not ir<%ee1>
+; CHECK-NEXT: EMIT vp<[[VP5:%[0-9]+]]> = logical-and ir<%c1>, vp<[[VP4]]>
+; CHECK-NEXT: EMIT vp<[[VP6:%[0-9]+]]> = not ir<%c1>
+; CHECK-NEXT: EMIT vp<[[VP7:%[0-9]+]]> = or vp<[[VP5]]>, vp<[[VP6]]>
+; CHECK-NEXT: BLEND ir<%phi.join1> = ir<1>/vp<[[VP5]]> ir<0>/vp<[[VP6]]>
+; CHECK-NEXT: Successor(s): exiting2
+; CHECK-EMPTY:
+; CHECK-NEXT: exiting2:
+; CHECK-NEXT: EMIT vp<[[VP8:%[0-9]+]]> = logical-and vp<[[VP7]]>, ir<%c2>
+; CHECK-NEXT: Successor(s): join2
+; CHECK-EMPTY:
+; CHECK-NEXT: join2:
+; CHECK-NEXT: EMIT vp<[[VP9:%[0-9]+]]> = not ir<%ee2>
+; CHECK-NEXT: EMIT vp<[[VP10:%[0-9]+]]> = logical-and vp<[[VP8]]>, vp<[[VP9]]>
+; CHECK-NEXT: EMIT vp<[[VP11:%[0-9]+]]> = not ir<%c2>
+; CHECK-NEXT: EMIT vp<[[VP12:%[0-9]+]]> = logical-and vp<[[VP7]]>, vp<[[VP11]]>
+; CHECK-NEXT: EMIT vp<[[VP13:%[0-9]+]]> = or vp<[[VP10]]>, vp<[[VP12]]>
+; CHECK-NEXT: BLEND ir<%phi.join2> = ir<1>/vp<[[VP10]]> ir<0>/vp<[[VP12]]>
+; CHECK-NEXT: Successor(s): latch
+; CHECK-EMPTY:
+; CHECK-NEXT: latch:
+; CHECK-NEXT: BLEND ir<%phi1> = ir<1>/ir<%c1> ir<0>/vp<[[VP7]]>
+; CHECK-NEXT: BLEND ir<%phi2> = ir<1>/vp<[[VP8]]> ir<0>/vp<[[VP13]]>
+; CHECK-NEXT: EMIT ir<%gep1> = getelementptr ir<%p1>, ir<%iv>
+; CHECK-NEXT: EMIT store ir<%phi1>, ir<%gep1>
+; CHECK-NEXT: EMIT ir<%gep2> = getelementptr ir<%p2>, ir<%iv>
+; CHECK-NEXT: EMIT store ir<%phi2>, ir<%gep2>
+; CHECK-NEXT: EMIT ir<%iv.next> = add ir<%iv>, ir<1>
+; CHECK-NEXT: EMIT ir<%ec> = icmp eq ir<%iv.next>, ir<%n>
+; CHECK-NEXT: EMIT vp<%index.next> = add nuw vp<[[VP3]]>, vp<[[VP1:%[0-9]+]]>
+; CHECK-NEXT: EMIT branch-on-count vp<%index.next>, vp<[[VP2:%[0-9]+]]>
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): middle.block
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [0, %entry], [%iv.next, %latch]
+ br i1 %c1, label %exiting1, label %join1
+
+exiting1:
+ br i1 %ee1, label %latch, label %join1
+
+join1:
+ %phi.join1 = phi i32 [1, %exiting1], [0, %loop]
+ br i1 %c2, label %exiting2, label %join2
+
+exiting2:
+ br i1 %ee2, label %latch, label %join2
+
+join2:
+ %phi.join2 = phi i32 [1, %exiting2], [0, %join1]
+ br label %latch
+
+latch:
+ %phi1 = phi i32 [1, %exiting1], [%phi.join1, %exiting2], [%phi.join1, %join2]
+ %phi2 = phi i32 [poison, %exiting1], [1, %exiting2], [%phi.join2, %join2]
+ %gep1 = getelementptr i32, ptr %p1, i32 %iv
+ store i32 %phi1, ptr %gep1
+ %gep2 = getelementptr i32, ptr %p2, i32 %iv
+ store i32 %phi2, ptr %gep2
+ %iv.next = add i32 %iv, 1
+ %ec = icmp eq i32 %iv.next, %n
+ br i1 %ec, label %exit, label %loop
+
+exit:
+ ret void
+}
+
+; vector.body
+; / \
+; / \
+; exiting1 exiting2
+; / \ / \
+; / \ / \
+; / join1 \
+; / / \ \
+; / exiting3 exiting4 /
+; \ \ \ / / /
+; \ \ join2 / /
+; \ \ | / /
+; +---latch-----+
+define void @four_exits_2x2_diamond(ptr %p1, ptr %p2, ptr %p3, ptr %p4, i1 %c1, i1 %c2, i1 %ee1, i1 %ee2, i1 %ee3, i1 %ee4, i32 %n) {
+; CHECK-LABEL: VPlan for loop in 'four_exits_2x2_diamond'
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT: vp<[[VP3:%[0-9]+]]> = CANONICAL-IV
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: ir<%iv> = WIDEN-INDUCTION ir<0>, ir<1>, vp<[[VP0:%[0-9]+]]>
+; CHECK-NEXT: Successor(s): exiting2
+; CHECK-EMPTY:
+; CHECK-NEXT: exiting2:
+; CHECK-NEXT: EMIT vp<[[VP4:%[0-9]+]]> = not ir<%c1>
+; CHECK-NEXT: Successor(s): exiting1
+; CHECK-EMPTY:
+; CHECK-NEXT: exiting1:
+; CHECK-NEXT: Successor(s): join1
+; CHECK-EMPTY:
+; CHECK-NEXT: join1:
+; CHECK-NEXT: EMIT vp<[[VP5:%[0-9]+]]> = not ir<%ee2>
+; CHECK-NEXT: EMIT vp<[[VP6:%[0-9]+]]> = logical-and vp<[[VP4]]>, vp<[[VP5]]>
+; CHECK-NEXT: EMIT vp<[[VP7:%[0-9]+]]> = not ir<%ee1>
+; CHECK-NEXT: EMIT vp<[[VP8:%[0-9]+]]> = logical-and ir<%c1>, vp<[[VP7]]>
+; CHECK-NEXT: EMIT vp<[[VP9:%[0-9]+]]> = or vp<[[VP6]]>, vp<[[VP8]]>
+; CHECK-NEXT: BLEND ir<%phi1.join1> = ir<0>/vp<[[VP6]]> ir<1>/vp<[[VP8]]>
+; CHECK-NEXT: BLEND ir<%phi2.join1> = ir<1>/vp<[[VP6]]> ir<0>/vp<[[VP8]]>
+; CHECK-NEXT: Successor(s): exiting4
+; CHECK-EMPTY:
+; CHECK-NEXT: exiting4:
+; CHECK-NEXT: EMIT vp<[[VP10:%[0-9]+]]> = not ir<%c2>
+; CHECK-NEXT: EMIT vp<[[VP11:%[0-9]+]]> = logical-and vp<[[VP9]]>, vp<[[VP10]]>
+; CHECK-NEXT: Successor(s): exiting3
+; CHECK-EMPTY:
+; CHECK-NEXT: exiting3:
+; CHECK-NEXT: EMIT vp<[[VP12:%[0-9]+]]> = logical-and vp<[[VP9]]>, ir<%c2>
+; CHECK-NEXT: Successor(s): join2
+; CHECK-EMPTY:
+; CHECK-NEXT: join2:
+; CHECK-NEXT: EMIT vp<[[VP13:%[0-9]+]]> = not ir<%ee4>
+; CHECK-NEXT: EMIT vp<[[VP14:%[0-9]+]]> = logical-and vp<[[VP11]]>, vp<[[VP13]]>
+; CHECK-NEXT: EMIT vp<[[VP15:%[0-9]+]]> = not ir<%ee3>
+; CHECK-NEXT: EMIT vp<[[VP16:%[0-9]...
[truncated]
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
9366715 to aedc836
aedc836 to 672de9b
#201783 wants to optimize blend masks by peeking through the contents of other phi nodes. Currently we eagerly convert phis to blends in reverse post-order, so switch to post-order so that phis at the bottom are converted first and can still see the (not yet converted) phis they use.
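To see why the order matters, here is a rough sketch of a depth-first post order over a toy, acyclic version of the multi_early_exit_predicated_nested CFG. It is not VPlan's traversal (the diff gets its order from reverse(RPOT)); the CFG alias, the postOrder helper and the std::function recursion are illustrative assumptions, block names are borrowed from the test, and the backedge to vector.body is left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy CFG: block name -> successor names.
using CFG = std::map<std::string, std::vector<std::string>>;

// Depth-first post order. On an acyclic CFG every block is appended after
// all of its successors; reversing the result gives reverse post-order.
std::vector<std::string> postOrder(const CFG &G, const std::string &Entry) {
  std::vector<std::string> Order;
  std::set<std::string> Visited;
  std::function<void(const std::string &)> Visit = [&](const std::string &B) {
    if (!Visited.insert(B).second)
      return; // Already visited.
    for (const std::string &S : G.at(B))
      Visit(S);
    Order.push_back(B); // All successors of B are finished by now.
  };
  Visit(Entry);
  return Order;
}
```

For that CFG the result is latch, join2, exiting2, join1, exiting1, vector.body, so the latch's phis are handled while %phi1.join2 in join2 is still a phi that can be looked through.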
672de9b to de9c116
Unstacked now that #201782 has landed.
SmallVector<EdgeTy> OutEdges;
for (const VPBlockBase *Succ : VPBB->getSuccessors())
  OutEdges.emplace_back(VPBB, cast<VPBasicBlock>(Succ));
nit: maybe use from_range ctor + map_range for the cast? Not sure if that would actually work.
Are you suggesting turning this into a mapped iterator instead of storing it in a vector? OutEdges gets traversed twice, so we'd end up constructing the edges twice if we used map_range.
I was thinking SmallVector<EdgeTy> OutEdges(from_range, map_range(...)); but apparently it doesn't even need/have from_range, and I think we can just do
SmallVector<EdgeTy> OutEdges(map_range(...));
(although I didn't actually try, just read through some interfaces).
I gave it a try and I'm not sure it's more readable:
SmallVector<EdgeTy> OutEdges(
    map_range(VPBB->getSuccessors(), [&VPBB](const VPBlockBase *Succ) {
      return std::make_pair(VPBB, cast<VPBasicBlock>(Succ));
    }));
I also think it's nice to use emplace_back where possible; it makes it more obvious that it's avoiding a copy.
Either is fine with me. I don't know whether mapped_iterator is random access, or whether the SmallVector range ctor can pre-allocate storage for all elements, but in theory that could be an argument for the ctor version.
* Reuse previous method in DominanceFrontier
* Replace GetAllEqual with a map_range
After thinking about this for a bit, this isn't needed. If a phi doesn't postdominate an incoming block, the incoming block will have an outgoing edge with no value, so we won't propagate any further up from that incoming block anyway. What differs between this approach and llvm#184838 is that the latter performs a full inverse DFS to see which blocks are reachable, whereas this just checks that the incoming values are the same at each post-dominance frontier. The test case phi_doesnt_postdom_incoming shows a scenario where the full inverse DFS approach could simplify the edges to just c1 and !c1, but we calculate the conservative (but still correct) edges in this PR.
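The "outgoing edge with no value" argument rests on the check that all of a block's outgoing edges carry the same value. Below is a minimal sketch of that check, modeled on the GetAllEqual lambda in the diff above. The helper name commonOutgoingValue and the encoding (std::nullopt for an edge with no recorded value, the string "poison" for a poison incoming value) are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical encoding: std::nullopt means no value has been recorded for
// that outgoing edge; the string "poison" stands for a poison value.
using EdgeValue = std::optional<std::string>;

// Returns the single non-poison value shared by every outgoing edge, or
// std::nullopt if some edge has no value, two values differ, or every edge
// is poison.
EdgeValue commonOutgoingValue(const std::vector<EdgeValue> &OutValues) {
  EdgeValue Common;
  for (const EdgeValue &V : OutValues) {
    if (!V)
      return std::nullopt; // A valueless edge blocks propagation.
    if (*V == "poison")
      continue; // Poison is compatible with any value.
    if (!Common)
      Common = V;
    else if (*Common != *V)
      return std::nullopt; // Two different values: stop here.
  }
  return Common;
}
```

Because a single valueless edge fails the whole check, the walk never moves past an incoming block that the phi does not postdominate.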
/test-suite
Ping
#201784 aims to preserve SSA in early exit loops and, in doing so, inserts phi nodes. More phi nodes result in more VPBlendRecipes, so this PR optimizes the edge masks generated for those blend recipes to prevent regressions.
The idea is to compute a minimal set of edges that lead to each unique incoming value in a phi. It does this by walking up the edges in the post-dominance frontier until the outgoing edges no longer lead to the same value.
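The walk can be sketched in standalone C++ as follows. This is a loose model of computeBlendEdges from the diff, not the actual implementation: blocks and values are plain strings ("0", "1", "poison", or the name of another phi), Phi is a hypothetical struct, the post-dominator sets and post-dominance frontier are assumed to be precomputed and passed in, a std::vector worklist stands in for SetVector, and VPlan-specific details such as reduction phis and mask creation are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy encoding: blocks and values are strings. A value is "0",
// "1", "poison", or the name of another phi. An edge is a (from, to) pair.
using Block = std::string;
using Edge = std::pair<Block, Block>;
using CFG = std::map<Block, std::vector<Block>>;
using BlockSets = std::map<Block, std::set<Block>>;

struct Phi {
  Block Parent;
  std::vector<std::pair<std::string, Block>> Incoming; // (value, from block)
};

// Returns the surviving edges and the value each one carries towards Root.
// PDom[B] holds every block that postdominates B (including B itself) and
// PDF[B] is B's post-dominance frontier; both are assumed precomputed.
std::map<Edge, std::string>
computeBlendEdges(const CFG &G, const BlockSets &PDom, const BlockSets &PDF,
                  const std::map<std::string, Phi> &Phis, const Phi &Root) {
  std::map<Edge, std::string> Edges;
  std::vector<Block> Worklist;
  auto PostDominates = [&](const Block &X, const Block &Y) {
    return PDom.at(Y).count(X) != 0; // Does X postdominate Y?
  };
  auto Push = [&](const Block &Blk) {
    if (std::find(Worklist.begin(), Worklist.end(), Blk) == Worklist.end())
      Worklist.push_back(Blk);
  };
  auto AddEdge = [&](const Block &From, const Block &To, const std::string &V) {
    Edge E{From, To};
    assert((!Edges.count(E) || Edges.at(E) == V) && "Clobbering an edge?");
    Edges[E] = V;
  };

  for (const auto &[V, From] : Root.Incoming) {
    AddEdge(From, Root.Parent, V);
    Push(From);
  }
  while (!Worklist.empty()) {
    Block B = Worklist.back();
    Worklist.pop_back();
    // Every outgoing edge of B needs a value and all non-poison values must
    // agree; otherwise B's edges are left alone.
    std::string Common;
    bool AllEqual = true;
    for (const Block &S : G.at(B)) {
      auto It = Edges.find(Edge{B, S});
      if (It == Edges.end() ||
          (It->second != "poison" && !Common.empty() && It->second != Common)) {
        AllEqual = false;
        break;
      }
      if (It->second != "poison")
        Common = It->second;
    }
    if (!AllEqual || Common.empty())
      continue;
    for (const Block &S : G.at(B))
      Edges.erase(Edge{B, S});

    // Look through a phi whose block is postdominated by B.
    if (auto P = Phis.find(Common);
        P != Phis.end() && PostDominates(B, P->second.Parent)) {
      for (const auto &[V, From] : P->second.Incoming) {
        AddEdge(From, P->second.Parent, V);
        Push(From);
      }
      continue;
    }
    // Otherwise move the value up to the frontier edges that enter the
    // region B postdominates, then revisit the frontier blocks.
    if (auto F = PDF.find(B); F != PDF.end()) {
      for (const Block &Frontier : F->second) {
        for (const Block &S : G.at(Frontier))
          if (PostDominates(B, S))
            AddEdge(Frontier, S, Common);
        Push(Frontier);
      }
    }
  }
  return Edges;
}
```

Fed the CFG from multi_early_exit_predicated_nested with %phi1 as the root, this leaves just two edges: loop to exiting1 carrying 1, and loop to join2 carrying 0.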
This is a simpler, less general version of #184838, since this can't optimize away edges that aren't postdominated by the phi. This is fine for the early exit use case, though, since we only need to optimize phi nodes inserted in the latch.
The big advantage over #184838 is that it doesn't require several depth-first searches to compute the set of reachable nodes, and can be done entirely by iterating the post-dominator frontier.
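For intuition about what that frontier contains, here is a textbook set-based computation of post-dominators and the post-dominance frontier on a small string-named CFG. It is only an illustration and not necessarily how DominanceFrontierBase computes the frontier; the names CFG, BlockSets, postDominators and postDominanceFrontier are hypothetical, and every block is assumed to reach a single exit block.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Block = std::string;
using CFG = std::map<Block, std::vector<Block>>;      // block -> successors
using BlockSets = std::map<Block, std::set<Block>>;

// Iterative dataflow: PDom(Exit) = {Exit}; PDom(B) = {B} plus the
// intersection of PDom(S) over all successors S of B.
BlockSets postDominators(const CFG &G, const Block &Exit) {
  std::set<Block> All;
  for (const auto &Entry : G)
    All.insert(Entry.first);
  BlockSets PDom;
  for (const Block &B : All)
    PDom[B] = B == Exit ? std::set<Block>{Exit} : All;
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (const auto &[B, Succs] : G) {
      if (B == Exit)
        continue;
      std::set<Block> New = All;
      for (const Block &S : Succs) {
        std::set<Block> Meet;
        std::set_intersection(New.begin(), New.end(), PDom[S].begin(),
                              PDom[S].end(), std::inserter(Meet, Meet.end()));
        New = std::move(Meet);
      }
      New.insert(B);
      if (New != PDom[B]) {
        PDom[B] = std::move(New);
        Changed = true;
      }
    }
  }
  return PDom;
}

// Y is in PDF(X) iff X postdominates some successor of Y but does not
// strictly postdominate Y itself.
BlockSets postDominanceFrontier(const CFG &G, const BlockSets &PDom) {
  BlockSets PDF;
  for (const auto &[Y, Succs] : G)
    for (const Block &S : Succs)
      for (const Block &X : PDom.at(S))
        if (X == Y || PDom.at(Y).count(X) == 0)
          PDF[X].insert(Y);
  return PDF;
}
```

On the multi_early_exit_predicated_nested CFG (loop, exiting1, join1, exiting2, join2 and latch, with latch as the exit), the frontier of exiting2 is {join1} and that of join2 is {exiting2, join1, loop}: these are the blocks whose outgoing edges the walk moves up to.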