Skip to content

Constant-scalar and enum-case union fast path in TypeCombinator::intersect#5935

Merged
ondrejmirtes merged 5 commits into
phpstan:2.2.xfrom
SanderMuller:perf-intersect-constant-scalar-unions
Jun 25, 2026
Merged

Constant-scalar and enum-case union fast path in TypeCombinator::intersect#5935
ondrejmirtes merged 5 commits into
phpstan:2.2.xfrom
SanderMuller:perf-intersect-constant-scalar-unions

Conversation

@SanderMuller

@SanderMuller SanderMuller commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Problem

TypeCombinator::intersect() implements A & (B | C) by distributing the members of one union across the other and rebuilding the result with union(). For two unions of width n and m that is O(n*m).

This becomes pathological when a variable is assigned across many === narrowing branches:

function f(int $x): string {
    $v = 0;
    if ($x === 0) $v = "0";
    if ($x === 1) $v = "1";
    // ... one branch per constant ...
    return (string) $v;
}

The conditional-expression machinery repeatedly intersects $v's growing constant-value union with the narrowed consequence union (also a growing constant-value union), so intersect runs O(n) times on width-O(n) operands and analysis grows super-linearly. Reported as phpstan/phpstan#14869.

Fix

When both operands are plain unions whose members are all disjoint constant scalars, the intersection is just their value-keyed set intersection, computable in O(n). The fast path is restricted to the exact UnionType class so BenevolentUnionType and the template union types keep their dedicated handling, and it excludes class-string constant strings (the class-string flag is not captured by the value) and floats (-0.0 / NAN comparison quirks). Anything outside that set falls back to the existing distribution, so behaviour is unchanged.

Numbers

Single file, level 8, single process, result cache cold (hyperfine, N = number of branches):

N before after
100 0.79s 0.52s
200 3.07s 0.78s
400 21.27s ± 0.04 2.48s ± 0.01

That is 8.6x at N=400; the intersect call count on that case drops from ~21M to linear.

Tests

New dataIntersect cases in TypeCombinatorTest covering constant-int, constant-string and mixed-with-null overlaps, the empty (never) result, a single-member result, and a bail back to the general path. TypeCombinatorTest, UnionTypeTest and NodeScopeResolverTest are green, and make phpstan and the coding standard are clean.

This is independent of #5857 (it does not touch TypeCombinator), so it helps both current 2.2.x and the single-pass work once that lands.

Update (review feedback)

Extended the fast path to enum-case unions and mixed constant-scalar + enum-case unions, keyed by class + case name (the identity EnumCaseObjectType::equals() uses). The same blowup with enum cases (a variable assigned enum cases across N === branches over an N-case enum) ran ~30s at N=400 before, ~3.8s after. Backed enum cases are keyed as cases, not by their backing scalar. Added TypeCombinatorTest cases for enum-only, mixed, backed-enum-vs-int, and constant-string-vs-int; NodeScopeResolverTest green.

Intersecting two unions implements `A & (B|C)` by distributing the
members of one union across the other, which is O(n*m). When both
unions consist only of disjoint constant scalars, the result is just
their value-keyed set intersection, computable in O(n).

This shows up when a variable is assigned across many `===` narrowing
branches (phpstan/phpstan#14869): the conditional-expression machinery
repeatedly intersects the variable's growing constant-value union with
the narrowed consequence union, so analysis grows super-linearly. On
the reproducer from that issue, a single file with N=400 branches goes
from 21.3s to 2.5s (8.6x), with the intersect call count dropping from
~21M to linear.

The fast path is restricted to the exact UnionType class so
BenevolentUnionType and the template unions keep their dedicated
handling. Class-string constant strings and floats are excluded
because they are not safe to compare by value alone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@SanderMuller

Copy link
Copy Markdown
Contributor Author

Related: #5934 also speeds up TypeCombinator from this reproducer, but at a different layer (it makes the post-distribution usort conditional). This PR returns before the distribution for constant-scalar unions, so the two do not overlap and merge cleanly together (verified locally, TypeCombinatorTest green). Complementary, not competing.

@ondrejmirtes ondrejmirtes left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try to apply this to a union of enum cases as well. And potentially to unions that have both constant scalars + enum cases.

But first try to reproduce the same pathological case, should be slow before the optimization.

Per review feedback: the same intersect() distribution blowup happens
for unions of enum cases (a variable assigned enum cases across many
=== branches over a large enum), about 30s at N=400 before.

Key enum-case members by class + case name, the identity
EnumCaseObjectType::equals() uses (describe() would also fold in a
subtracted type, which equals() ignores, so a narrowed case and a bare
one would get different keys). The enum check runs before the constant
scalar branch so a backed enum case is keyed as a case, not by its
backing scalar. The fast path now also covers mixed constant-scalar +
enum-case unions. Same reproducer: ~30s -> 3.8s at N=400.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@SanderMuller SanderMuller changed the title Constant-scalar-union fast path in TypeCombinator::intersect Constant-scalar and enum-case union fast path in TypeCombinator::intersect Jun 25, 2026
@SanderMuller

Copy link
Copy Markdown
Contributor Author

Done, and I reproduced it first as you suggested.

Reproduction: the same shape with enum cases (a variable assigned enum cases across N === branches over an N-case enum) hits the same intersect() distribution, about 30s at N=400 before the optimization.

The fast path now keys enum-case members by class + case name, so it covers pure enum-case unions and mixed constant-scalar + enum-case unions. Same reproducer: about 30s to 3.8s at N=400, and the mixed int|enum shape drops similarly.

One note in case it is useful: I first keyed enum cases by describe(VerbosityLevel::cache()) the way union() does, but that folds in a subtracted type that EnumCaseObjectType::equals() ignores, so a narrowed Enum::Case and a bare one got different keys and the intersection came out as never. NodeScopeResolverTest caught it on the in_array/enum narrowing cases. Keying by class + case name fixes it and matches what equals() compares. Backed enum cases are keyed as cases, not by their backing scalar, so TestEnum::TWO (= 2) is not confused with the integer 2.

New TypeCombinatorTest cases cover enum-only unions, mixed scalar + enum, the backed-enum-vs-int distinction, and constant-string-vs-int. TypeCombinatorTest and NodeScopeResolverTest are green.

Comment thread src/Type/TypeCombinator.php
@ondrejmirtes

Copy link
Copy Markdown
Member

A failure in the test suite. I'm fine with updating the error message expectation tbh.

getEnumCaseObject() returns the case not only for a bare EnumCaseObjectType
but also for a refined member - an intersection like $this & Enum::Case, a
whole single-case enum, or an enum subtracted to one case. Keying those by
class + case name collapsed the refinement (e.g. $this(Foo)&Foo::BAR became
a bare Foo::BAR), which MatchExpressionRuleTest caught.

Guard the enum branch with EnumCaseObjectType::equals(): a bare case equals
its own getEnumCaseObject(), a refined member does not (equals() requires an
EnumCaseObjectType), so refined members bail to the slow path and keep their
form. Also skip building the second member map when the first already bails
(staabm review note).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@SanderMuller

Copy link
Copy Markdown
Contributor Author

Thanks. I looked into that failure and it turned out not to be cosmetic: the fast path was collapsing a refined enum-case type. getEnumCaseObject() returns the case not just for a bare enum case but also for a refined member like $this(Foo)&Foo::BAR (and whole single-case enums, and enums subtracted to one case), so keying those by class + case name dropped the refinement and the match arm's type changed.

I guarded the enum branch with EnumCaseObjectType::equals(): a bare case equals its own getEnumCaseObject(), a refined member does not (equals() requires an EnumCaseObjectType), so refined members fall back to the slow path and keep their form. MatchExpressionRuleTest::testLastCondition now passes unchanged, so no expectation update is needed. I also applied @staabm's note to skip building the second member map when the first already bails.

Sorry for the red: I had validated on a subset and missed the Rules-layer test, so I ran the full suite this time. It is green now (the one local failure I see is an unrelated, pre-existing signature-map test that also fails on the base, a dev-PHP-build artifact).

@ondrejmirtes

Copy link
Copy Markdown
Member

One more thing - please put a code sample which was slow and is fast now into tests/bench/data with a unique namespace. No more is required, I'll push a regenerated phpbench baseline once this is merged.

SanderMuller and others added 2 commits June 25, 2026 22:14
The #14869 reproducer: a variable assigned under many `$x === <const>`
branches. The conditional-expression machinery repeatedly intersects the
variable's growing constant-value union with the narrowed union, which the
finite-union fast path turns from super-linear into an O(n) set intersection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The enum-case branch of the fast path: a variable assigned enum cases under
many `$x === Enum::Case` branches over a large backed enum. Same intersect
blowup as the constant-scalar case (~30s at N=400 before), collapsed to an
O(n) set intersection by keying members on class + case name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@SanderMuller

Copy link
Copy Markdown
Contributor Author

One more thing - please put a code sample which was slow and is fast now into tests/bench/data with a unique namespace. No more is required, I'll push a regenerated phpbench baseline once this is merged.

Added one for the scalar and one for the enum case, can you review them?

@ondrejmirtes ondrejmirtes merged commit 041824d into phpstan:2.2.x Jun 25, 2026
671 of 672 checks passed
@ondrejmirtes

Copy link
Copy Markdown
Member

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants