Constant-scalar and enum-case union fast path in TypeCombinator::intersect#5935
Conversation
Intersecting two unions implements `A & (B|C)` by distributing the members of one union across the other, which is O(n*m). When both unions consist only of disjoint constant scalars, the result is just their value-keyed set intersection, computable in O(n). This shows up when a variable is assigned across many `===` narrowing branches (phpstan/phpstan#14869): the conditional-expression machinery repeatedly intersects the variable's growing constant-value union with the narrowed consequence union, so analysis grows super-linearly. On the reproducer from that issue, a single file with N=400 branches goes from 21.3s to 2.5s (8.6x), with the intersect call count dropping from ~21M to linear. The fast path is restricted to the exact UnionType class so BenevolentUnionType and the template unions keep their dedicated handling. Class-string constant strings and floats are excluded because they are not safe to compare by value alone. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Related: #5934 also speeds up |
ondrejmirtes
left a comment
There was a problem hiding this comment.
Please try to apply this to a union of enum cases as well. And potentially to unions that have both constant scalars + enum cases.
But first try to reproduce the same pathological case, should be slow before the optimization.
Per review feedback: the same intersect() distribution blowup happens for unions of enum cases (a variable assigned enum cases across many === branches over a large enum), about 30s at N=400 before. Key enum-case members by class + case name, the identity EnumCaseObjectType::equals() uses (describe() would also fold in a subtracted type, which equals() ignores, so a narrowed case and a bare one would get different keys). The enum check runs before the constant scalar branch so a backed enum case is keyed as a case, not by its backing scalar. The fast path now also covers mixed constant-scalar + enum-case unions. Same reproducer: ~30s -> 3.8s at N=400. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Done, and I reproduced it first as you suggested. Reproduction: the same shape with enum cases (a variable assigned enum cases across N The fast path now keys enum-case members by class + case name, so it covers pure enum-case unions and mixed constant-scalar + enum-case unions. Same reproducer: about 30s to 3.8s at N=400, and the mixed One note in case it is useful: I first keyed enum cases by New |
|
A failure in the test suite. I'm fine with updating the error message expectation tbh. |
getEnumCaseObject() returns the case not only for a bare EnumCaseObjectType but also for a refined member - an intersection like $this & Enum::Case, a whole single-case enum, or an enum subtracted to one case. Keying those by class + case name collapsed the refinement (e.g. $this(Foo)&Foo::BAR became a bare Foo::BAR), which MatchExpressionRuleTest caught. Guard the enum branch with EnumCaseObjectType::equals(): a bare case equals its own getEnumCaseObject(), a refined member does not (equals() requires an EnumCaseObjectType), so refined members bail to the slow path and keep their form. Also skip building the second member map when the first already bails (staabm review note). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks. I looked into that failure and it turned out not to be cosmetic: the fast path was collapsing a refined enum-case type. I guarded the enum branch with Sorry for the red: I had validated on a subset and missed the Rules-layer test, so I ran the full suite this time. It is green now (the one local failure I see is an unrelated, pre-existing signature-map test that also fails on the base, a dev-PHP-build artifact). |
|
One more thing - please put a code sample which was slow and is fast now into tests/bench/data with a unique namespace. No more is required, I'll push a regenerated phpbench baseline once this is merged. |
The #14869 reproducer: a variable assigned under many `$x === <const>` branches. The conditional-expression machinery repeatedly intersects the variable's growing constant-value union with the narrowed union, which the finite-union fast path turns from super-linear into an O(n) set intersection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The enum-case branch of the fast path: a variable assigned enum cases under many `$x === Enum::Case` branches over a large backed enum. Same intersect blowup as the constant-scalar case (~30s at N=400 before), collapsed to an O(n) set intersection by keying members on class + case name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Added one for the scalar and one for the enum case, can you review them? |
|
Thank you! |
Problem
TypeCombinator::intersect()implementsA & (B | C)by distributing the members of one union across the other and rebuilding the result withunion(). For two unions of widthnandmthat isO(n*m).This becomes pathological when a variable is assigned across many
===narrowing branches:The conditional-expression machinery repeatedly intersects
$v's growing constant-value union with the narrowed consequence union (also a growing constant-value union), sointersectrunsO(n)times on width-O(n)operands and analysis grows super-linearly. Reported as phpstan/phpstan#14869.Fix
When both operands are plain unions whose members are all disjoint constant scalars, the intersection is just their value-keyed set intersection, computable in
O(n). The fast path is restricted to the exactUnionTypeclass soBenevolentUnionTypeand the template union types keep their dedicated handling, and it excludes class-string constant strings (the class-string flag is not captured by the value) and floats (-0.0/NANcomparison quirks). Anything outside that set falls back to the existing distribution, so behaviour is unchanged.Numbers
Single file, level 8, single process, result cache cold (
hyperfine, N = number of branches):That is 8.6x at N=400; the
intersectcall count on that case drops from ~21M to linear.Tests
New
dataIntersectcases inTypeCombinatorTestcovering constant-int, constant-string and mixed-with-null overlaps, the empty (never) result, a single-member result, and a bail back to the general path.TypeCombinatorTest,UnionTypeTestandNodeScopeResolverTestare green, andmake phpstanand the coding standard are clean.This is independent of #5857 (it does not touch
TypeCombinator), so it helps both current 2.2.x and the single-pass work once that lands.Update (review feedback)
Extended the fast path to enum-case unions and mixed constant-scalar + enum-case unions, keyed by class + case name (the identity
EnumCaseObjectType::equals()uses). The same blowup with enum cases (a variable assigned enum cases across N===branches over an N-case enum) ran ~30s at N=400 before, ~3.8s after. Backed enum cases are keyed as cases, not by their backing scalar. AddedTypeCombinatorTestcases for enum-only, mixed, backed-enum-vs-int, and constant-string-vs-int;NodeScopeResolverTestgreen.