fix(render-plan): rewrite raw CTE-name qualifiers to FROM/JOIN alias #348

Merged: genezhang merged 1 commit into main from fix/cte-name-qualifier-rewrite on May 18, 2026
@genezhang

Owner

Summary

Unlocks 8 LDBC queries on Spark/Delta (Category C from the sweep): bi-1, bi-2, bi-5, complex-1, complex-4, complex-7, complex-9, complex-11.

Sweep result: 15 passed / 21 xfailed / 5 skipped → 23 passed / 13 xfailed / 5 skipped.

Problem

After a WITH→CTE barrier, downstream SELECT items could be emitted qualified by the CTE table name (with_friend_cte_1.p6_friend_id) even when the same CTE was JOINed under a user-facing alias (INNER JOIN with_friend_cte_1 AS friend). ClickHouse silently accepts either qualifier; Spark/Delta resolves only against the alias actually bound in FROM/JOIN, producing UNRESOLVED_COLUMN.

Two render-plan construction paths (rewrite_with_aliases_to_cte, rewrite_table_aliases_to_cte) build PropertyAccess { table_alias: cte_name, ... } directly. Both leaked ClickHouse-only resolution behaviour into the emitted SQL.
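
The mismatch can be modelled with a minimal sketch. The types below are simplified, hypothetical stand-ins (the real `PropertyAccess` has more fields, and `TableRef`, `bound_qualifiers`, and `unresolved_qualifiers` are names invented for illustration). The sketch assumes the strict rule described above: a qualifier resolves only if it matches the alias bound in FROM/JOIN, or the table name when no alias is given.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified FROM/JOIN entry.
struct TableRef {
    table_name: String,
    alias: Option<String>,
}

/// Hypothetical, simplified stand-in for the real `PropertyAccess`.
struct PropertyAccess {
    table_alias: String,
    column: String,
}

/// Renders a qualified column reference, e.g. `friend.p6_friend_id`.
fn render(p: &PropertyAccess) -> String {
    format!("{}.{}", p.table_alias, p.column)
}

/// Qualifiers a strict engine would accept: the alias when one is
/// bound, otherwise the bare table name.
fn bound_qualifiers(refs: &[TableRef]) -> HashSet<&str> {
    refs.iter()
        .map(|r| r.alias.as_deref().unwrap_or(r.table_name.as_str()))
        .collect()
}

/// Returns the qualifiers used in `items` that are not bound in FROM/JOIN.
fn unresolved_qualifiers<'a>(
    refs: &[TableRef],
    items: &'a [PropertyAccess],
) -> Vec<&'a str> {
    let bound = bound_qualifiers(refs);
    items
        .iter()
        .map(|p| p.table_alias.as_str())
        .filter(|q| !bound.contains(*q))
        .collect()
}
```

With `with_friend_cte_1` joined `AS friend`, a select item qualified as `with_friend_cte_1.p6_friend_id` is reported as unresolved, while `friend.p6_friend_id` is not.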

Fix

Extend fix_orphan_table_aliases_impl (src/render_plan/variable_scope.rs) so its alias_replacements map also includes cte_name → from_alias entries for any CTE referenced from FROM/JOIN. The existing rewrite_expr_table_aliases walker then normalizes any stray CTE-name qualifier in SELECT/WHERE/GROUP BY/ORDER BY/HAVING/JOIN-ON to the bound alias.

  • Skips entries where cte_name == from_alias (no-op)
  • Uses entry().or_insert_with so prior orphan-alias mappings win on collision
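
The map extension described above might look roughly like the following. This is a sketch under stated assumptions, not the code in variable_scope.rs: `FromJoinRef`, `add_cte_name_replacements`, and `rewrite_qualifier` are hypothetical names, and the real walker (`rewrite_expr_table_aliases`) operates on expression trees rather than on a single qualifier string.

```rust
use std::collections::HashMap;

/// Hypothetical FROM/JOIN entry: the referenced table (possibly a CTE)
/// and the alias it is bound under.
struct FromJoinRef {
    table_name: String,
    alias: String,
}

/// Adds `cte_name -> from_alias` entries to an existing replacement map.
fn add_cte_name_replacements(
    alias_replacements: &mut HashMap<String, String>,
    cte_names: &[&str],
    refs: &[FromJoinRef],
) {
    for r in refs {
        // Only CTEs referenced from FROM/JOIN are of interest.
        if !cte_names.contains(&r.table_name.as_str()) {
            continue;
        }
        // cte_name == from_alias would be a no-op rewrite; skip it.
        if r.table_name == r.alias {
            continue;
        }
        // An entry already in the map (e.g. an orphan-alias mapping)
        // wins on collision.
        alias_replacements
            .entry(r.table_name.clone())
            .or_insert_with(|| r.alias.clone());
    }
}

/// Stand-in for the expression walker: maps one qualifier to its
/// replacement, or returns it unchanged when no mapping exists.
fn rewrite_qualifier<'a>(
    alias_replacements: &'a HashMap<String, String>,
    qualifier: &'a str,
) -> &'a str {
    alias_replacements
        .get(qualifier)
        .map(String::as_str)
        .unwrap_or(qualifier)
}
```

In this sketch, if the same CTE appears in FROM/JOIN under two different aliases, the first one encountered is kept; how the real implementation treats that case is not stated here.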

Stragglers

Two queries that were tagged [C] in PR #347 remain failing for distinct reasons (now retagged):

  • bi-14: the same alias person1 is rebound across 5 chained CTEs. person1.score in the 5th CTE doesn't resolve against the 4th CTE's schema.
  • complex-3: t5.CountryId is emitted for a Place_isPartOf_Place rel (no such column; the Place→Place edge uses PlaceId).

Test plan

  • cargo fmt --all && cargo clippy --all-targets --features databricks: clean
  • cargo test --lib: 1370 passed
  • CLICKGRAPH_SPARK_TESTS=1 pytest tests/spark_smoke/test_ldbc_sweep.py: 23p / 13xf / 5s (was 15/21/5); no regressions on the original 15 passing
  • pytest --runxfail confirms the 8 newly-passing queries return correct results, not just no-error

🤖 Generated with Claude Code

Unlocks 8 LDBC queries on Spark/Delta (Category C):
bi-1, bi-2, bi-5, complex-1, complex-4, complex-7, complex-9, complex-11.

Sweep: 15p / 21xf / 5s → 23p / 13xf / 5s.

Problem
-------
After a WITH → CTE barrier, downstream SELECT items could be emitted
qualified by the CTE table name (`with_friend_cte_1.p6_friend_id`)
even when the same CTE was JOINed under a user-facing alias
(`INNER JOIN with_friend_cte_1 AS friend`). ClickHouse silently accepts
either qualifier, but Spark/Delta resolves only against the alias
actually bound in FROM/JOIN, producing UNRESOLVED_COLUMN.

The two qualifier forms come from different render-plan construction
paths (e.g. `rewrite_with_aliases_to_cte` and `rewrite_table_aliases_to_cte`
both build `PropertyAccess { table_alias: cte_name, ... }`).

Fix
---
Extend `fix_orphan_table_aliases_impl` (variable_scope.rs) so its
`alias_replacements` map also includes `cte_name → from_alias` entries
for any CTE referenced from FROM/JOIN. The existing
`rewrite_expr_table_aliases` walker then normalizes any stray CTE-name
qualifier in SELECT/WHERE/GROUP BY/ORDER BY/HAVING/JOIN-ON to the bound
alias. Skips entries where `cte_name == from_alias` (no-op) and uses
`entry().or_insert_with` so prior orphan-alias mappings win on
collision.

Test
----
`cargo test --lib` — 1370 passed.
`CLICKGRAPH_SPARK_TESTS=1 pytest tests/spark_smoke/test_ldbc_sweep.py`
— 23/13/5 (was 15/21/5).

Two remaining Category C entries retagged: bi-14 (same alias `person1`
rebound across 5 chained CTEs — different bug) and complex-3 (schema
mapping bug emitting `t5.CountryId` for Place→Place rel — different
bug).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 18, 2026 17:15

Copilot AI left a comment

Pull request overview

This PR fixes a Spark/Delta SQL compatibility issue in render-plan aliasing: after a WITH→CTE barrier, expressions could be qualified by the raw CTE table name even when the CTE was bound under a different FROM/JOIN alias, which Spark/Delta does not resolve (but ClickHouse does). The change extends the final alias-rewrite pass so stray with_*_cte_N.col qualifiers are normalized to the actual bound FROM/JOIN alias, unlocking additional LDBC queries in the Spark smoke sweep.

Changes:

  • Extend fix_orphan_table_aliases_impl to also rewrite raw CTE-name qualifiers (cte_name → from/join alias) in downstream expressions.
  • Update the Spark LDBC sweep xfail list to drop the 8 queries now passing and retag remaining Category C failures with more specific reasons.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/render_plan/variable_scope.rs | Adds CTE-name→FROM/JOIN-alias entries to the alias replacement map so expression qualifiers match the bound alias on Spark/Delta. |
| tests/spark_smoke/test_ldbc_sweep.py | Updates EXPECTED_FAILURES to reflect the newly passing queries and reclassifies remaining C-resolution failures. |

@genezhang genezhang merged commit f3c2305 into main May 18, 2026
5 checks passed
@genezhang genezhang deleted the fix/cte-name-qualifier-rewrite branch May 18, 2026 17:23