
JIT: Merge all RETURN/THROW blocks #128515

Merged
EgorBo merged 11 commits into dotnet:main from BoyBaykiller:deduplicate-all-return-throw-blocks
Jun 5, 2026

Conversation

@BoyBaykiller
Contributor

Fix #128514

tailMergePreds(nullptr) was called only once, but my understanding is that it needs to be called repeatedly, as it only processes one set at a time.
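To illustrate why a single call leaves work behind, here is a minimal, self-contained C++ model. It is not the JIT's code: `Block`, `Kind`, `mergeOneSet`, and `mergeAllSets` are hypothetical names, and it assumes the merge routine handles only the first set of return/throw blocks whose last statements match. It also simplifies victim choice by always keeping the first block of a set; the real tailMergePreds chooses the victim differently and may split off a new block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified block model: only the kind and the text of the
// last statement matter for this illustration.
enum class Kind { Return, Throw, Always };

struct Block {
    Kind kind;
    std::string lastStmt;
    int jumpTarget = -1; // index of the shared block once this one is merged
};

// Assumed "one set per call" behavior: find the first return/throw block
// that has at least one later duplicate, redirect all duplicates to it,
// then stop without looking for further sets.
bool mergeOneSet(std::vector<Block>& blocks) {
    for (std::size_t i = 0; i < blocks.size(); i++) {
        if (blocks[i].kind == Kind::Always) {
            continue; // already merged away
        }
        bool merged = false;
        for (std::size_t j = i + 1; j < blocks.size(); j++) {
            if (blocks[j].kind == blocks[i].kind && blocks[j].lastStmt == blocks[i].lastStmt) {
                blocks[j].kind = Kind::Always; // now just jumps to blocks[i]
                blocks[j].jumpTarget = static_cast<int>(i);
                merged = true;
            }
        }
        if (merged) {
            return true; // only one set handled per call
        }
    }
    return false;
}

// Re-invoke until no set is merged; returns how many sets were merged.
int mergeAllSets(std::vector<Block>& blocks) {
    int sets = 0;
    while (mergeOneSet(blocks)) {
        sets++;
    }
    return sets;
}
```

On candidates [return 2, return 10, return 10, throw A, throw A], one call to `mergeOneSet` merges only the `return 10` pair, while `mergeAllSets` merges both sets. Each call rescans every block, which is the kind of repeated searching the review asks to avoid.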

@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 23, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label May 23, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BoyBaykiller
Contributor Author

@AndyAyersMS PTAL.

Comment thread src/coreclr/jit/fgopt.cpp Outdated
Member

@AndyAyersMS AndyAyersMS left a comment

Can we do this without repeatedly searching all blocks for returns and throws?

Comment thread src/coreclr/jit/fgopt.cpp Outdated
…of reinvoking and re-gathering candidates every time

* hack to suppress positive diffs
@AndyAyersMS
Member

I recommend keeping refactoring/renaming changes and functionality in separate PRs, otherwise reviews are more likely to miss important things.

Also, does tail merging returns lead to new tail merge opportunities like it does for other blocks (eg should we be populating "retry blocks")?

@BoyBaykiller
Contributor Author

BoyBaykiller commented May 27, 2026

Also, does tail merging returns lead to new tail merge opportunities like it does for other blocks (eg should we be populating "retry blocks")?

Yes, deduplicating return blocks often does expose new opportunities to tail merge. We are already pushing merged blocks to the retryBlocks stack:

// We should try tail merging the cross jump target.
//
retryBlocks.Push(crossJumpTarget);


Here is an example (also to solidify my own understanding):

static int Example(bool cond1, bool cond2, ref int x, ref int y)
{
    if (cond1)
    {
        y = 8;
        x = 9;
        return 10; 
    }
    if (cond2)
    {
        y = 8;
        x = 9;
        return 10;
    }
    return 2;
}

First we pull out the return 10; statement:

A set of 2 return/throw blocks end with the same tree
STMT00005 ( 0x017[E--] ... 0x019 )
               [000017] -----------                         *  RETURN    int   
               [000016] -----------                         \--*  CNS_INT   int    10
New Basic Block BB06 [0005] created.
setting likelihood of BB02 -> BB06 to 1
Will cross-jump to newly split off BB06

unlinking STMT00005 ( 0x017[E--] ... 0x019 )
               [000017] -----------                         *  RETURN    int   
               [000016] -----------                         \--*  CNS_INT   int    10
 from BB04
setting likelihood of BB04 -> BB06 to 1
Deduplicated 1 set of return/throw blocks

After that, we look at the predecessors of the new return 10; block, more specifically at their last statements, and discover that they are also the same. So that statement gets sunk into the return 10; block:

All 2 preds of BB06 end with the same tree, moving
STMT00004 ( 0x013[E--] ... 0x016 )
               [000015] -A-XG------                         *  STOREIND  int   
               [000013] -----------                         +--*  LCL_VAR   byref  V02 arg2         
               [000014] -----------                         \--*  CNS_INT   int    9

unlinking STMT00004 ( 0x013[E--] ... 0x016 )
               [000015] -A-XG------                         *  STOREIND  int   
               [000013] -----------                         +--*  LCL_VAR   byref  V02 arg2         
               [000014] -----------                         \--*  CNS_INT   int    9
 from BB04

unlinking STMT00007 ( 0x006[E--] ... 0x009 )
               [000023] -A-XG------                         *  STOREIND  int   
               [000021] -----------                         +--*  LCL_VAR   byref  V02 arg2         
               [000022] -----------                         \--*  CNS_INT   int    9
 from BB02
Merged 1 set of tails going into BB06

And so, one by one, we work through the equivalent statements, regathering predecessors at each step.
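The step-by-step sinking described above can be modeled as a small loop. This is a hedged sketch under simplifying assumptions: statements are plain strings, each predecessor is a list of statements that ends by jumping to the shared target, and `sinkCommonTails` is a hypothetical name. The real JIT moves one statement per step and regathers predecessors each time; here the loop simply repeats while every predecessor still ends with the same statement.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

using Stmts = std::vector<std::string>;

// Hypothetical model: every entry in `preds` is a predecessor's statement
// list, and each predecessor ends by jumping to `target`. While all of them
// end with the same statement, remove it from each and prepend one copy to
// the target. Returns the number of statements sunk.
int sinkCommonTails(std::vector<Stmts>& preds, std::deque<std::string>& target) {
    assert(preds.size() >= 2);
    int moved = 0;
    while (true) {
        // preds.front() is checked first, so front().back() is safe below.
        for (const Stmts& p : preds) {
            if (p.empty() || p.back() != preds.front().back()) {
                return moved;
            }
        }
        target.push_front(preds.front().back());
        for (Stmts& p : preds) {
            p.pop_back();
        }
        moved++;
    }
}
```

With both predecessors holding `y = 8; x = 9;` and the target holding `return 10;`, two statements are sunk and the target ends up as `y = 8; x = 9; return 10;`, matching the order of the dumps above.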

Note: in some cases we might be able to consider tails equivalent even though their statement order isn't exactly the same (?), provided the statements can be reordered accordingly.

Update: I just moved the deduplication of return/throw blocks before tail merging and stopped pushing them to retryBlocks; that change has no diffs.

…f using a BitVec to sparsely mark them as processed

* move de-duplication before tail-merging and then no longer add them to the retry list as it isn't needed
* use stl iterator tag to be able to call std::stable_partition
* add assert to vector indexer
…s in downstream phases because the way we choose the crossJumpVictim is order-dependent and non-optimal (for example we'd want to avoid new BBF_NEEDS_GCPOLL)

* also remove the std::reverse - same reason
@AndyAyersMS
Member

How about something like this (diff is vs main)? We could also collect returns and throws separately to save a bit of time.

diff --git a/src/coreclr/jit/fgopt.cpp b/src/coreclr/jit/fgopt.cpp
index ffa3d88cba3..d43078f66b4 100644
--- a/src/coreclr/jit/fgopt.cpp
+++ b/src/coreclr/jit/fgopt.cpp
@@ -5492,13 +5492,25 @@ PhaseStatus Compiler::fgHeadTailMerge(bool early)
         }
     }

-    predInfo.Reset();
-    for (BasicBlock* const block : retOrThrowBlocks.BottomUpOrder())
+    if (retOrThrowBlocks.Height() > 1)
     {
-        predInfo.Push(PredInfo(block, block->lastStmt()));
-    }
+        JITDUMP("Trying tail merge of return and throw blocks\n");
+
+        for (int i = 0; i < retOrThrowBlocks.Height() - 1; i++)
+        {
+            predInfo.Reset();
+            for (int j = i; j < retOrThrowBlocks.Height(); j++)
+            {
+                BasicBlock* const block = retOrThrowBlocks.TopRef(j);
+                predInfo.Push(PredInfo(block, block->lastStmt()));
+            }

-    tailMergePreds(nullptr);
+            if (tailMergePreds(nullptr))
+            {
+                numOpts++;
+            }
+        }
+    }

     // Work through any retries
     //

plus a check in tailMergePreds that the blocks are the same kind:

diff --git a/src/coreclr/jit/fgopt.cpp b/src/coreclr/jit/fgopt.cpp
index ffa3d88cba3..ae34f2c7df7 100644
--- a/src/coreclr/jit/fgopt.cpp
+++ b/src/coreclr/jit/fgopt.cpp
@@ -5166,6 +5166,11 @@ PhaseStatus Compiler::fgHeadTailMerge(bool early)
             {
                 BasicBlock* const otherBlock = predInfo.TopRef(j).m_block;

+                if (baseBlock->GetKind() != otherBlock->GetKind())
+                {
+                    continue;
+                }
+
                 // Consider: bypass this for statements that can't cause exceptions.
                 //
                 if (!BasicBlock::sameEHRegion(baseBlock, otherBlock))

@BoyBaykiller
Contributor Author

Even with the if (baseBlock->GetKind() != otherBlock->GetKind()) check, this doesn't skip already-processed candidates. Let's see what happens if we run it on this example again:

static int Example(bool cond1, bool cond2, ref int x, ref int y)
{
    if (cond1)
    {
        y = 8;
        x = 9;
        return 10; 
    }
    if (cond2)
    {
        y = 8;
        x = 9;
        return 10;
    }
    return 2;
}

On the first pass, predInfo contains [return 2, return 10, return 10].
tailMergePreds(nullptr) then de-duplicates the return 10:

A set of 2 return blocks end with the same tree
STMT00005 ( 0x017[E--] ... 0x019 )
               [000017] -----------                         *  RETURN    int   
               [000016] -----------                         \--*  CNS_INT   int    10
New Basic Block BB06 [0005] created.
setting likelihood of BB04 -> BB06 to 1
Will cross-jump to newly split off BB06

unlinking STMT00008 ( 0x00A[E--] ... 0x00C )
               [000025] -----------                         *  RETURN    int   
               [000024] -----------                         \--*  CNS_INT   int    10
 from BB02
setting likelihood of BB02 -> BB06 to 1

On the second pass, things start getting weird. predInfo now contains [store(9), store(9)]. The issue is that the blocks in retOrThrowBlocks aren't BBJ_RETURN/BBJ_THROW anymore! And then we get:

A set of 2 return blocks end with the same tree
STMT00004 ( 0x013[E--] ... 0x016 )
               [000015] -A-XG------                         *  STOREIND  int   
               [000013] -----------                         +--*  LCL_VAR   byref  V02 arg2         
               [000014] -----------                         \--*  CNS_INT   int    9

This is unexpected because that is, of course, not a return statement.
Ultimately I get Assertion failed '!"Invalid block preds"'.

@AndyAyersMS
Member

AndyAyersMS commented Jun 3, 2026

Yeah, just filter those out?

@@ -5492,13 +5497,40 @@ PhaseStatus Compiler::fgHeadTailMerge(bool early)
         }
     }

-    predInfo.Reset();
-    for (BasicBlock* const block : retOrThrowBlocks.BottomUpOrder())
+    if (retOrThrowBlocks.Height() > 1)
     {
-        predInfo.Push(PredInfo(block, block->lastStmt()));
-    }
+        JITDUMP("Trying tail merge of return and throw blocks\n");
+
+        for (int i = 0; i < retOrThrowBlocks.Height() - 1; i++)
+        {
+            BasicBlock* const block = retOrThrowBlocks.TopRef(i);
+
+            // If this block was already merged, skip it
+            //
+            if (!block->KindIs(BBJ_RETURN, BBJ_THROW))
+            {
+                continue;
+            }

-    tailMergePreds(nullptr);
+            predInfo.Reset();
+            for (int j = i; j < retOrThrowBlocks.Height(); j++)
+            {
+                BasicBlock* const otherBlock = retOrThrowBlocks.TopRef(j);
+
+                if (otherBlock->GetKind() != block->GetKind())
+                {
+                    continue;
+                }
+
+                predInfo.Push(PredInfo(otherBlock, otherBlock->lastStmt()));
+            }
+
+            if (tailMergePreds(nullptr))
+            {
+                numOpts++;
+            }
+        }
+    }

(then you don't need the other diff, as when tail merging throws or rets there are only throw or ret candidates, respectively)
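A simplified C++ model of the suggested loop shape follows. It is a sketch, not the JIT code: `Block`, `Kind`, `mergeWithFirst`, and `mergeReturnsAndThrows` are hypothetical; the merge step only compares candidates against the first one, and the victim simply stays in place instead of being split off into a new block. What it does show are the two filters: blocks that an earlier iteration turned into jumps are skipped, and returns are only ever paired with returns (and throws with throws).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Return, Throw, Always };

// Hypothetical block: kind plus statement list (the last entry is the tail).
struct Block {
    Kind kind;
    std::vector<std::string> stmts;
};

// Simplified stand-in for tailMergePreds: every candidate whose last statement
// matches the first candidate's loses that statement and becomes a jump.
bool mergeWithFirst(const std::vector<Block*>& cands) {
    assert(!cands[0]->stmts.empty());
    bool merged = false;
    for (std::size_t k = 1; k < cands.size(); k++) {
        assert(!cands[k]->stmts.empty());
        if (cands[k]->stmts.back() == cands[0]->stmts.back()) {
            cands[k]->stmts.pop_back();    // the tail now lives only in cands[0]
            cands[k]->kind = Kind::Always; // and this block jumps there
            merged = true;
        }
    }
    return merged;
}

// Loop shape of the suggested diff: skip blocks that are no longer returns or
// throws, and only gather candidates of the same kind as the current block.
int mergeReturnsAndThrows(std::vector<Block>& blocks) {
    int numOpts = 0;
    for (std::size_t i = 0; i + 1 < blocks.size(); i++) {
        const Kind kind = blocks[i].kind;
        if (kind != Kind::Return && kind != Kind::Throw) {
            continue; // already merged: its last statement is no longer a return/throw
        }
        std::vector<Block*> cands;
        for (std::size_t j = i; j < blocks.size(); j++) {
            if (blocks[j].kind == kind) {
                cands.push_back(&blocks[j]);
            }
        }
        if (cands.size() > 1 && mergeWithFirst(cands)) {
            numOpts++;
        }
    }
    return numOpts;
}
```

In this model, once the second `return 10;` block is merged its last statement becomes `x = 9` and its kind becomes a jump, so the kind check keeps it out of every later candidate list rather than letting it be compared as if it were still a return.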

@BoyBaykiller
Contributor Author

That makes sense and seems to work.

What I don't understand: Why do you prefer to add code on top, instead of improving the underlying tailMergePreds to handle all sets at once? Like the PR currently does.

@AndyAyersMS
Member

Why do you prefer to add code on top, instead of improving the underlying tailMergePreds to handle all sets at once? Like the PR currently does.

We generally prefer to keep refactoring/efficiency improvements separate from extending functionality. The first kind of change can be zero diff, which helps assure us that nothing got broken, and we can get a clean read on the TP improvement it offers.

@BoyBaykiller BoyBaykiller marked this pull request as ready for review June 3, 2026 22:50
@BoyBaykiller
Contributor Author

@AndyAyersMS PTAL.
Should be in its least invasive form now. diffs

@AndyAyersMS
Member

@AndyAyersMS PTAL. Should be in its least invasive form now. diffs

What led you to that last commit?

@BoyBaykiller
Contributor Author

What led you to that last commit?

The previous code had "hidden diffs" from populating predInfo in a different order than what was previously done. This matters because the way we choose the crossJumpVictim is sensitive to the order of items in predInfo:

for (PredInfo& info : matchedPredInfo.TopDownOrder())
{
    Statement* const  stmt      = info.m_stmt;
    BasicBlock* const predBlock = info.m_block;

    // Never pick the init block as the victim as that would
    // cause us to add a predecessor to it, which is invalid.
    if (predBlock == fgFirstBB)
    {
        continue;
    }

    bool const isNoSplit     = stmt == predBlock->firstStmt();
    bool const isFallThrough = (predBlock->KindIs(BBJ_ALWAYS) && predBlock->JumpsToNext());

    // Is this block possibly better than what we have?
    //
    bool useBlock = false;

    if (crossJumpVictim == nullptr)
    {
        // Pick an initial candidate.
        useBlock = true;
    }
    else if (isNoSplit && isFallThrough)
    {
        // This is the ideal choice.
        //
        useBlock = true;
    }
    else if (!haveNoSplitVictim && isNoSplit)
    {
        useBlock = true;
    }
    else if (!haveNoSplitVictim && !haveFallThroughVictim && isFallThrough)
    {
        useBlock = true;
    }

    if (useBlock)
    {
        crossJumpVictim       = predBlock;
        crossJumpStmt         = stmt;
        haveNoSplitVictim     = isNoSplit;
        haveFallThroughVictim = isFallThrough;
    }

    // If we have the perfect victim, stop looking.
    //
    if (haveNoSplitVictim && haveFallThroughVictim)
    {
        break;
    }
}

There may have been other things at play as well, but I am certain this is one factor.
I didn't look further into it because the way it's done in the "new commit" makes more sense to me overall.
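The order sensitivity is easy to see in a stripped-down copy of the selection logic quoted above. This sketch assumes each candidate's properties are precomputed flags (`Cand` and `pickVictim` are hypothetical names); the decision chain mirrors the quoted loop, visiting candidates in list order the way the real loop visits matchedPredInfo top-down. When two candidates are equally good, whichever is visited first wins, so reordering predInfo changes the victim.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a PredInfo entry, reduced to the facts the
// selection loop looks at.
struct Cand {
    int blockNum;
    bool isFirstBB;     // predBlock == fgFirstBB
    bool isNoSplit;     // matched statement is the block's first statement
    bool isFallThrough; // BBJ_ALWAYS that jumps to the next block
};

// Same decision structure as the crossJumpVictim loop quoted above.
int pickVictim(const std::vector<Cand>& cands) {
    int victim = -1;
    bool haveNoSplitVictim = false;
    bool haveFallThroughVictim = false;
    for (const Cand& c : cands) {
        if (c.isFirstBB) {
            continue; // never pick the init block
        }
        bool useBlock = false;
        if (victim == -1) {
            useBlock = true; // initial candidate
        } else if (c.isNoSplit && c.isFallThrough) {
            useBlock = true; // ideal choice
        } else if (!haveNoSplitVictim && c.isNoSplit) {
            useBlock = true;
        } else if (!haveNoSplitVictim && !haveFallThroughVictim && c.isFallThrough) {
            useBlock = true;
        }
        if (useBlock) {
            victim = c.blockNum;
            haveNoSplitVictim = c.isNoSplit;
            haveFallThroughVictim = c.isFallThrough;
        }
        if (haveNoSplitVictim && haveFallThroughVictim) {
            break; // perfect victim found
        }
    }
    return victim;
}
```

For two no-split, non-fall-through candidates BB02 and BB04, visiting BB02 first picks BB02 and visiting BB04 first picks BB04, even though nothing about the blocks themselves changed.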

@BoyBaykiller
Contributor Author

For the record, this order sensitivity is also why I was experimenting with std::stable_partition instead of std::partition (to preserve order). c4e8252

Also, deliberately changing the order and observing the regressions (as well as the improvements) is a great strategy for finding better heuristics for picking the crossJumpVictim.
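Here is a sketch of how std::stable_partition can group one set of matching tails at a time while keeping every candidate's original relative order. It is an illustrative model under assumptions, not the code from commit c4e8252: `Candidate` and `findDuplicateSets` are hypothetical, tails are compared as strings, and it only groups candidates rather than merging them. std::partition would also move the matches to the front, but it does not preserve relative order within either group, which is exactly what victim selection is sensitive to.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical candidate: a block number plus its tail, compared as text.
struct Candidate {
    int blockNum;
    std::string lastStmt;
};

// Repeatedly take the first unhandled candidate as the base, stable_partition
// the remaining range so candidates with the same tail move to the front, and
// treat that prefix as one set. Relative order is preserved in both halves.
std::vector<std::vector<int>> findDuplicateSets(std::vector<Candidate> cands) {
    std::vector<std::vector<int>> sets;
    auto first = cands.begin();
    while (first != cands.end()) {
        const std::string base = first->lastStmt; // copy: elements may be moved
        auto matchesEnd = std::stable_partition(first, cands.end(),
            [&](const Candidate& c) { return c.lastStmt == base; });
        assert(matchesEnd != first); // the base always matches itself
        if (matchesEnd - first > 1) {
            std::vector<int> set;
            for (auto it = first; it != matchesEnd; ++it) {
                set.push_back(it->blockNum);
            }
            sets.push_back(set);
        }
        first = matchesEnd; // everything before this point is handled
    }
    return sets;
}
```

Given candidates for blocks 1 to 6 with tails `return 10`, `return 2`, `return 10`, `throw A`, `return 2`, `throw B`, the result is the sets {1, 3} and {2, 5}, each in its original order; singletons are dropped. The candidates are gathered once and each one is consumed exactly once as part of a prefix.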

@AndyAyersMS
Member

Ok, thanks... if there are better ways of figuring out which block to cross jump to, then we should investigate too.

If this is an area you're interested in working on, I think the idea of partial tree merges during tail merging is likely to yield a nice benefit, especially merging of similar-looking calls with different (simple) args. Calls expand into a lot of code and also wreak havoc with LSRA (inevitably all the calls interfere with one another, so collapsing N into 1 would likely save quite a bit of code size and compile time, and might even produce faster code).

I couldn't find an easy way to do it (though I also didn't try very hard). When the tree compares fail you're left with no info about where the matches failed, and plumbing through partial match recording while allowing for commutative swaps seemed quite messy.

The rough idea would be to try to match and then assess in some (as yet unspecified) fashion whether introducing temps for all the things that now need to cross block boundaries would be "worth it".
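The matching half of that idea might look roughly like the hypothetical sketch below: two calls to the same target with the same arity are compared argument by argument, and the positions that differ are the values that would need temps assigned in each predecessor before a single shared call. The "worth it" test is deliberately a placeholder threshold (`maxTemps`), since the assessment is left unspecified above; `Call`, `differingArgs`, and `worthTrying` are made-up names, and commutative swaps and nested trees are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical call shape: callee name plus simple argument expressions.
struct Call {
    std::string target;
    std::vector<std::string> args;
};

// Partial match: if both calls have the same callee and arity, return the
// argument positions that differ (these would need temps set in each
// predecessor); otherwise return nullopt. Commutative swaps are not handled.
std::optional<std::vector<std::size_t>> differingArgs(const Call& a, const Call& b) {
    if (a.target != b.target || a.args.size() != b.args.size()) {
        return std::nullopt;
    }
    std::vector<std::size_t> diffs;
    for (std::size_t i = 0; i < a.args.size(); i++) {
        if (a.args[i] != b.args[i]) {
            diffs.push_back(i);
        }
    }
    return diffs;
}

// Placeholder "worth it" check: only partial (not identical) matches that
// need at most maxTemps temps. Identical calls are left to ordinary tail
// merging. A real cost model would have to weigh code size and LSRA effects.
bool worthTrying(const Call& a, const Call& b, std::size_t maxTemps) {
    const auto diffs = differingArgs(a, b);
    return diffs.has_value() && !diffs->empty() && diffs->size() <= maxTemps;
}
```

For `Foo(x, 1, y)` and `Foo(x, 2, y)` only argument position 1 differs, so one temp per predecessor would let both paths share a single `Foo(x, tmp, y)` call.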

@AndyAyersMS
Member

Diffs -- interestingly enough this causes size increases on Wasm.

Wasm is a bit of a strange beast, we might actually be better off not tail merging returns there (since epilogs are basically empty). But ok as is for now.

@AndyAyersMS
Member

Also tail merging may not be a performance improvement. We'll have to keep an eye out for regressions (will see them next week).

@dotnet/jit-contrib FYI

Member

@AndyAyersMS AndyAyersMS left a comment

@EgorBo want to do a secondary review?

@EgorBo
Member

EgorBo commented Jun 5, 2026

/ba-g timeouts

@EgorBo EgorBo merged commit 503cdd1 into dotnet:main Jun 5, 2026
136 of 139 checks passed
@dotnet-milestone-bot dotnet-milestone-bot Bot added this to the 11.0-preview6 milestone Jun 6, 2026

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JIT: Missing deduplication of RETURN block causes switch recognition to miss JTRUEs

3 participants