feat(dialect): route alias quoting through FunctionMapper (Phase 1.4)#325
Merged
Conversation
Adds `FunctionMapper::quote_alias(&name)` so the rendering layer emits CH's `AS "alias"` (historical double-quote form) vs Spark's `AS \`alias\`` automatically based on the active task-local dialect. Spark parses `"name"` as a string literal, so backticks are mandatory there. CH accepts both — kept double-quote form to minimize diff against existing CH SQL output. Routed sites (~8 across 2 files): - `to_sql_query.rs`: 4 sites — AS-clause emission in SELECT items, agg-arg columns, agg-with-bare-alias fallback, TableAlias/UNWIND path - `plan_builder_helpers.rs`: 3 sites — ColumnAlias references in outer SELECT projection, aggregate args, and GROUP BY when rewriting inner-aggregation queries Spike test now asserts both: - CH: `AS "b.id"` (and NO backticks in AS position) - Databricks: `` AS `b.id` `` (and NO double quotes) Verified end-to-end Databricks output for the VLP test query now matches Spark identifier-quoting rules. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR advances dialect-aware SQL rendering by routing column alias quoting through FunctionMapper, preserving ClickHouse double-quoted aliases while emitting Spark/Databricks backtick aliases.
Changes:
- Adds
FunctionMapper::quote_aliaswith ClickHouse and Databricks implementations. - Replaces several hardcoded
AS "alias"render sites with dialect-aware alias quoting. - Extends Databricks spike tests and module notes for Phase 1.4 alias quoting.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/sql_generator/function_mapper/mod.rs |
Adds the alias quoting method to the mapper trait. |
src/sql_generator/function_mapper/databricks.rs |
Implements Databricks backtick alias quoting and tests it. |
src/sql_generator/function_mapper/clickhouse.rs |
Implements ClickHouse double-quote alias quoting. |
src/sql_generator/emitters/clickhouse/to_sql_query.rs |
Routes SELECT alias rendering through the current function mapper. |
src/render_plan/plan_builder_helpers.rs |
Routes aggregate/union alias reference rendering through the mapper. |
src/render_plan/tests/databricks_emit_spike_tests.rs |
Updates spike documentation and assertions for dialect-specific alias quoting. |
| // Spark parses `"foo"` as a string literal — backticks are the | ||
| // only valid identifier quote. `quote_identifier` uses backticks | ||
| // for both dialects so this stays consistent with that. | ||
| format!("`{name}`") |
| } | ||
|
|
||
| fn quote_alias(&self, name: &str) -> String { | ||
| format!("\"{name}\"") |
Comment on lines
+89
to
+95
| /// Quote a column alias for an `AS` clause. CH: `"name"` (also | ||
| /// accepts backticks but the existing pipeline emits double quotes | ||
| /// here historically). Spark: `` `name` `` — Spark parses `"name"` | ||
| /// as a string literal, so backticks are mandatory. The bare | ||
| /// `quote_identifier` helper in `common.rs` is a separate concern | ||
| /// (it already uses backticks for both dialects since both accept | ||
| /// them for column refs). |
Comment on lines
+49
to
+51
| //! the *function name* layer; Phase 1.3 routed array-literal shape; | ||
| //! Phase 1.4 routed identifier quoting. The remaining work — aggregate | ||
| //! registry routing and CAST type names — fits the same shape. |
Comment on lines
+302
to
+304
| assert!( | ||
| sql.contains("AS `b.id`"), | ||
| "expected Spark backtick alias `AS `b.id``; got:\n{sql}" |
Five inline comments on PR #325, all fixed: 1. `databricks.rs` `quote_alias`: backticks inside `name` are now escaped by doubling (Spark convention). Aliases derived from raw return text can contain `` ` `` and prior code would emit unclosed quoted identifiers. 2. `clickhouse.rs` `quote_alias`: same fix for `"` inside `name` — doubled per CH's quoted-identifier escape rule. 3. Trait doc clarifies `quote_alias` is for both `AS` clauses AND alias references (GROUP BY / agg args after inner-query rewrite) — matches actual call sites. Notes that each impl escapes its own delimiter. 4. Aggregation alias rendering paths in `build_outer_aggregate_select` and `extract_outer_aggregation_info` are now covered by two new spike tests: one asserts Databricks emits backtick aliases, one asserts CH keeps double-quoted aliases. Guards against future regressions silently re-emerging on either side. 5. (Spike-test docs already accurately list both remaining gaps — aggregate registry routing AND CAST type names. The original PR description undercounted; will update PR body alongside this push.) Plus unit tests for both `quote_alias` impls covering the escape behavior directly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Owner
Author
|
@copilot review — addressed all 5 inline comments in 8536b34. Summary in updated PR description. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
FunctionMapper::quote_alias(&name)so the rendering layer emits CH'sAS "alias"vs Spark/Databricks'sAS `alias`automatically.to_sql_query.rsandplan_builder_helpers.rs.Verification
Four spike tests now cover both dialects across two plan shapes:
build_outer_aggregate_select/extract_outer_aggregation_info(new in this PR)Actual Databricks output for
MATCH (a:User)-[:FOLLOWS*1..3]->(b:User) RETURN b.id:Plus unit tests for
quote_aliascovering embedded-delimiter escaping (x`y→`xy`` for Spark;x"y→"x""y"` for CH).Remaining gaps
Updated
databricks_emit_spike_tests.rsmodule docs. Two syntactic gaps remain:collect()hardcoded togroupArrayviaFunctionMapping.clickhouse_name. Phase 1.5.UInt32,Float64, etc., in various rendering paths. Not exercised by the spike yet.Review fixes (commit 8536b34)
Addressed 5 inline comments from Copilot:
databricks.rsquote_alias: escape embedded`by doublingclickhouse.rsquote_alias: escape embedded"by doublingquote_aliascovers AS clauses AND referencesTest plan
cargo buildcleancargo clippy --all-targetscleancargo fmt --allappliedquote_aliasimpls have escape-behavior unit tests🤖 Generated with Claude Code