fix(render-plan): rewrite raw CTE-name qualifiers to FROM/JOIN alias #348
Merged
Unlocks 8 LDBC queries on Spark/Delta (Category C):
bi-1, bi-2, bi-5, complex-1, complex-4, complex-7, complex-9, complex-11.
Sweep: 15p / 21xf / 5s → 23p / 13xf / 5s.
Problem
-------
After a WITH → CTE barrier, downstream SELECT items could be emitted
qualified by the CTE table name (`with_friend_cte_1.p6_friend_id`)
even when the same CTE was JOINed under a user-facing alias
(`INNER JOIN with_friend_cte_1 AS friend`). ClickHouse silently accepts
either qualifier, but Spark/Delta resolves only against the alias
actually bound in FROM/JOIN, producing UNRESOLVED_COLUMN.
The two qualifier forms come from different render-plan construction
paths (e.g. `rewrite_with_aliases_to_cte` and `rewrite_table_aliases_to_cte`
both build `PropertyAccess { table_alias: cte_name, ... }`).
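
To make the difference between the two engines concrete, here is a toy model of qualifier resolution. It is only a sketch of the behaviour described above, not either engine's actual resolver; `Binding`, `resolves_strict`, and `resolves_lenient` are hypothetical names introduced for illustration.

```rust
/// Hypothetical model of one FROM/JOIN item: a table (or CTE) name,
/// optionally bound under an alias.
pub struct Binding {
    pub table: &'static str,
    pub alias: Option<&'static str>,
}

/// Strict resolution, modelling the Spark/Delta behaviour described above:
/// a qualifier must equal the name actually bound in FROM/JOIN, which is
/// the alias whenever one is given.
pub fn resolves_strict(bindings: &[Binding], qualifier: &str) -> bool {
    bindings
        .iter()
        .any(|b| b.alias.unwrap_or(b.table) == qualifier)
}

/// Lenient resolution, modelling the behaviour this PR attributes to
/// ClickHouse: either the alias or the underlying table name is accepted.
pub fn resolves_lenient(bindings: &[Binding], qualifier: &str) -> bool {
    bindings
        .iter()
        .any(|b| b.table == qualifier || b.alias == Some(qualifier))
}
```

With the single binding `with_friend_cte_1 AS friend`, the qualifier `friend` resolves under both functions, while `with_friend_cte_1` resolves only under `resolves_lenient`. That gap is what surfaces as UNRESOLVED_COLUMN on Spark/Delta.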
Fix
---
Extend `fix_orphan_table_aliases_impl` (variable_scope.rs) so its
`alias_replacements` map also includes `cte_name → from_alias` entries
for any CTE referenced from FROM/JOIN. The existing
`rewrite_expr_table_aliases` walker then normalizes any stray CTE-name
qualifier in SELECT/WHERE/GROUP BY/ORDER BY/HAVING/JOIN-ON to the bound
alias. Skips entries where `cte_name == from_alias` (no-op) and uses
`entry().or_insert_with` so prior orphan-alias mappings win on
collision.
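
The map extension and the rewrite can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in variable_scope.rs: `TableRef`, the two-variant `Expr`, the `column` field, and both function names are hypothetical stand-ins, and only the `table_alias` field and the `entry().or_insert_with` call are taken from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified FROM/JOIN item.
pub struct TableRef {
    pub table_name: String,
    pub alias: String,
}

/// Hypothetical, simplified expression tree; the real render plan has
/// many more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    PropertyAccess { table_alias: String, column: String },
    Eq(Box<Expr>, Box<Expr>),
}

/// Add `cte_name -> from_alias` entries for every CTE bound in FROM/JOIN.
pub fn add_cte_name_replacements(
    alias_replacements: &mut HashMap<String, String>,
    from_and_joins: &[TableRef],
    cte_names: &[&str],
) {
    for t in from_and_joins {
        // Only CTE references are of interest here.
        if !cte_names.contains(&t.table_name.as_str()) {
            continue;
        }
        // Bound under its own name: rewriting would be a no-op.
        if t.table_name == t.alias {
            continue;
        }
        // Keep any mapping already present, so earlier orphan-alias
        // entries win on collision.
        alias_replacements
            .entry(t.table_name.clone())
            .or_insert_with(|| t.alias.clone());
    }
}

/// Walk an expression and replace any qualifier found in the map.
pub fn rewrite_qualifiers(expr: &mut Expr, alias_replacements: &HashMap<String, String>) {
    match expr {
        Expr::PropertyAccess { table_alias, .. } => {
            if let Some(bound) = alias_replacements.get(table_alias.as_str()) {
                *table_alias = bound.clone();
            }
        }
        Expr::Eq(lhs, rhs) => {
            rewrite_qualifiers(lhs, alias_replacements);
            rewrite_qualifiers(rhs, alias_replacements);
        }
    }
}
```

Under these assumptions, a JOIN of `with_friend_cte_1` bound as `friend` produces the entry `with_friend_cte_1 → friend`, and a `PropertyAccess` qualified by `with_friend_cte_1` comes out qualified by `friend`. If the map already holds an entry for that CTE name, the existing value is left untouched.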
Test
----
`cargo test --lib` — 1370 passed.
`CLICKGRAPH_SPARK_TESTS=1 pytest tests/spark_smoke/test_ldbc_sweep.py`
— 23/13/5 (was 15/21/5).
Two remaining Category C entries retagged: bi-14 (same alias `person1`
rebound across 5 chained CTEs — different bug) and complex-3 (schema
mapping bug emitting `t5.CountryId` for Place→Place rel — different
bug).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
This PR fixes a Spark/Delta SQL compatibility issue in render-plan aliasing: after a WITH→CTE barrier, expressions could be qualified by the raw CTE table name even when the CTE was bound under a different FROM/JOIN alias, which Spark/Delta does not resolve (but ClickHouse does). The change extends the final alias-rewrite pass so stray with_*_cte_N.col qualifiers are normalized to the actual bound FROM/JOIN alias, unlocking additional LDBC queries in the Spark smoke sweep.
Changes:
- Extend `fix_orphan_table_aliases_impl` to also rewrite raw CTE-name qualifiers (cte_name → from/join alias) in downstream expressions.
- Update the Spark LDBC sweep xfail list to drop the 8 queries now passing and retag remaining Category C failures with more specific reasons.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/render_plan/variable_scope.rs` | Adds CTE-name→FROM/JOIN-alias entries to the alias replacement map so expression qualifiers match the bound alias on Spark/Delta. |
| `tests/spark_smoke/test_ldbc_sweep.py` | Updates EXPECTED_FAILURES to reflect the newly passing queries and reclassifies remaining C-resolution failures. |
Summary
Unlocks 8 LDBC queries on Spark/Delta (Category C from the sweep): `bi-1`, `bi-2`, `bi-5`, `complex-1`, `complex-4`, `complex-7`, `complex-9`, `complex-11`.

Sweep result: 15p / 21xf / 5s → 23p / 13xf / 5s.
Problem
After a WITH→CTE barrier, downstream SELECT items could be emitted qualified by the CTE table name (`with_friend_cte_1.p6_friend_id`) even when the same CTE was JOINed under a user-facing alias (`INNER JOIN with_friend_cte_1 AS friend`). ClickHouse silently accepts either qualifier; Spark/Delta resolves only against the alias actually bound in FROM/JOIN, producing `UNRESOLVED_COLUMN`.

Two render-plan construction paths (`rewrite_with_aliases_to_cte`, `rewrite_table_aliases_to_cte`) build `PropertyAccess { table_alias: cte_name, ... }` directly. Both leaked CH-only resolution behaviour into the emitted SQL.

Fix
Extend `fix_orphan_table_aliases_impl` (`src/render_plan/variable_scope.rs`) so its `alias_replacements` map also includes `cte_name → from_alias` entries for any CTE referenced from FROM/JOIN. The existing `rewrite_expr_table_aliases` walker then normalizes any stray CTE-name qualifier in SELECT/WHERE/GROUP BY/ORDER BY/HAVING/JOIN-ON to the bound alias.

- Skips entries where `cte_name == from_alias` (no-op)
- Uses `entry().or_insert_with` so prior orphan-alias mappings win on collision

Stragglers
Two queries that were tagged `[C]` in PR #347 remain failing for distinct reasons (now retagged):

- bi-14: `person1` rebound across 5 chained CTEs. `person1.score` in the 5th CTE doesn't resolve against the 4th CTE's schema.
- complex-3: `t5.CountryId` emitted for a `Place_isPartOf_Place` rel (no such column; the Place→Place edge uses `PlaceId`).

Test plan
- `cargo fmt --all && cargo clippy --all-targets --features databricks` clean
- `cargo test --lib`: 1370 passed
- `CLICKGRAPH_SPARK_TESTS=1 pytest tests/spark_smoke/test_ldbc_sweep.py`: 23p / 13xf / 5s (was 15/21/5); no regressions on the original 15 passing
- `pytest --runxfail` confirms the 8 newly-passing queries return correct results, not just no-error

🤖 Generated with Claude Code