
feat(dialect): route alias quoting through FunctionMapper (Phase 1.4) #325

Merged
genezhang merged 2 commits into main from feature/phase-1.4-alias-quoting
May 16, 2026

@genezhang genezhang commented May 16, 2026

Owner

Summary

  • Adds FunctionMapper::quote_alias(&name) so the rendering layer emits CH's AS "alias" vs Spark/Databricks's AS `alias` automatically.
  • Routes ~8 hardcoded sites in to_sql_query.rs and plan_builder_helpers.rs.
  • Third of four syntactic-layer gaps identified in the spike-test docs of PR #323 (spike(sql-gen): DatabricksEmitter wired end-to-end [Phase 1.2]), following Phase 1.3's array literals.
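
For readers without the diff open, the shape of the change can be sketched as follows. This is a minimal illustration and not the code merged in this PR: the trait is trimmed to a single method, and the struct names `ClickHouseMapper` and `DatabricksMapper` and the helper `render_select_item` are hypothetical. Only `FunctionMapper::quote_alias` and the two quoting styles come from the PR itself. The sketch includes the delimiter doubling described under "Review fixes".

```rust
/// Hypothetical, trimmed-down stand-in for the PR's `FunctionMapper` trait.
pub trait FunctionMapper {
    /// Quote `name` for use as a column alias, both in `AS` clauses and in
    /// later references to that alias. Each impl escapes its own delimiter.
    fn quote_alias(&self, name: &str) -> String;
}

/// Hypothetical mapper names; the real impl types are not shown in this PR page.
pub struct ClickHouseMapper;
pub struct DatabricksMapper;

impl FunctionMapper for ClickHouseMapper {
    fn quote_alias(&self, name: &str) -> String {
        // Double any embedded `"` so the quoted identifier stays closed.
        format!("\"{}\"", name.replace('"', "\"\""))
    }
}

impl FunctionMapper for DatabricksMapper {
    fn quote_alias(&self, name: &str) -> String {
        // Spark reads "name" as a string literal, so backticks are required;
        // embedded backticks are doubled.
        format!("`{}`", name.replace('`', "``"))
    }
}

/// Hypothetical render helper: one SELECT item with a dialect-quoted alias.
pub fn render_select_item(mapper: &dyn FunctionMapper, expr: &str, alias: &str) -> String {
    format!("{expr} AS {}", mapper.quote_alias(alias))
}
```

A call site that previously hardcoded `AS "alias"` would instead ask whichever mapper is active, so the same render path serves both dialects.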

Verification

Four spike tests now cover both dialects across two plan shapes:

  • VLP final-SELECT alias path (existing)
  • Aggregation alias references in build_outer_aggregate_select / extract_outer_aggregation_info (new in this PR)

Actual Databricks output for MATCH (a:User)-[:FOLLOWS*1..3]->(b:User) RETURN b.id:

SELECT
      t.end_id AS `b.id`
FROM vlp_a_b AS t

Plus unit tests for quote_alias covering embedded-delimiter escaping (x`y → `x``y` for Spark; x"y → "x""y" for CH).

Remaining gaps

Updated databricks_emit_spike_tests.rs module docs. Two syntactic gaps remain:

  • Aggregate function_registry: collect() is hardcoded to groupArray via FunctionMapping.clickhouse_name. Phase 1.5.
  • Type names in non-routed CASTs: UInt32, Float64, etc., in various rendering paths. Not exercised by the spike yet.
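
To make the first gap concrete, here is one possible shape for per-dialect aggregate names. It is a sketch under stated assumptions and not the planned Phase 1.5 design: the `Dialect` enum, the `AggregateMapping` struct, its `databricks_name` field, and `render_aggregate` are all hypothetical. The choice of Spark's `collect_list` as the counterpart of ClickHouse's `groupArray` is also an assumption, picked because both keep duplicates.

```rust
/// Hypothetical dialect selector for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Dialect {
    ClickHouse,
    Databricks,
}

/// Hypothetical per-dialect name table for one Cypher aggregate.
/// The PR only mentions a `clickhouse_name` field; `databricks_name`
/// is an assumption made for illustration.
pub struct AggregateMapping {
    pub cypher_name: &'static str,
    pub clickhouse_name: &'static str,
    pub databricks_name: &'static str,
}

/// `collect()` mapped to each dialect's array-building aggregate.
pub const COLLECT: AggregateMapping = AggregateMapping {
    cypher_name: "collect",
    clickhouse_name: "groupArray",
    databricks_name: "collect_list", // assumed Spark counterpart
};

impl AggregateMapping {
    /// Pick the function name for the requested dialect.
    pub fn name_for(&self, dialect: Dialect) -> &'static str {
        match dialect {
            Dialect::ClickHouse => self.clickhouse_name,
            Dialect::Databricks => self.databricks_name,
        }
    }
}

/// Render a single-argument aggregate call for the given dialect.
pub fn render_aggregate(mapping: &AggregateMapping, dialect: Dialect, arg: &str) -> String {
    format!("{}({arg})", mapping.name_for(dialect))
}
```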

Review fixes (commit 8536b34)

Addressed 5 inline comments from Copilot:

  1. databricks.rs quote_alias: escape embedded ` by doubling
  2. clickhouse.rs quote_alias: escape embedded " by doubling
  3. Trait doc clarifies quote_alias covers AS clauses AND references
  4. Aggregation alias path now covered by dedicated tests
  5. Updated PR description to match spike-test docs (both remaining gaps)

Test plan

  • cargo build clean
  • cargo clippy --all-targets clean
  • cargo fmt --all applied
  • All 1358 lib tests pass (5 new ones added)
  • Both quote_alias impls have escape-behavior unit tests

🤖 Generated with Claude Code

Adds `FunctionMapper::quote_alias(&name)` so the rendering layer
emits CH's `AS "alias"` (historical double-quote form) vs Spark's
`AS \`alias\`` automatically based on the active task-local dialect.
Spark parses `"name"` as a string literal, so backticks are mandatory
there. CH accepts both — kept double-quote form to minimize diff against
existing CH SQL output.

Routed sites (~8 across 2 files):
- `to_sql_query.rs`: 4 sites — AS-clause emission in SELECT items,
  agg-arg columns, agg-with-bare-alias fallback, TableAlias/UNWIND path
- `plan_builder_helpers.rs`: 3 sites — ColumnAlias references in outer
  SELECT projection, aggregate args, and GROUP BY when rewriting
  inner-aggregation queries

Spike test now asserts both:
- CH: `AS "b.id"` (and NO backticks in AS position)
- Databricks: `` AS `b.id` `` (and NO double quotes)

Verified end-to-end Databricks output for the VLP test query now
matches Spark identifier-quoting rules.
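
The paired positive and negative assertions above can be sketched as a small checker. This is an illustration only: the `Dialect` enum and `check_alias_quoting` are hypothetical names, and the real spike tests assert directly on generated SQL and are not shown here.

```rust
/// Hypothetical dialect selector for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Dialect {
    ClickHouse,
    Databricks,
}

/// Check that `sql` quotes `alias` the way `dialect` expects and does not
/// contain the other dialect's quoting style.
pub fn check_alias_quoting(sql: &str, dialect: Dialect, alias: &str) -> Result<(), String> {
    match dialect {
        Dialect::ClickHouse => {
            let expected = format!("AS \"{alias}\"");
            if !sql.contains(&expected) {
                return Err(format!("expected CH alias {expected}; got:\n{sql}"));
            }
            // No backticks in AS position.
            if sql.contains("AS `") {
                return Err(format!("unexpected backtick alias; got:\n{sql}"));
            }
        }
        Dialect::Databricks => {
            let expected = format!("AS `{alias}`");
            if !sql.contains(&expected) {
                return Err(format!("expected Spark alias {expected}; got:\n{sql}"));
            }
            // Spark treats "..." as a string literal, so no double quotes at all.
            if sql.contains('"') {
                return Err(format!("unexpected double quote; got:\n{sql}"));
            }
        }
    }
    Ok(())
}
```

Checking the forbidden form alongside the expected one is what catches a render site that was missed during routing and still emits the other dialect's quotes.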

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 16, 2026 00:47

Copilot AI left a comment

Pull request overview

This PR advances dialect-aware SQL rendering by routing column alias quoting through FunctionMapper, preserving ClickHouse double-quoted aliases while emitting Spark/Databricks backtick aliases.

Changes:

  • Adds FunctionMapper::quote_alias with ClickHouse and Databricks implementations.
  • Replaces several hardcoded AS "alias" render sites with dialect-aware alias quoting.
  • Extends Databricks spike tests and module notes for Phase 1.4 alias quoting.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `src/sql_generator/function_mapper/mod.rs` | Adds the alias quoting method to the mapper trait. |
| `src/sql_generator/function_mapper/databricks.rs` | Implements Databricks backtick alias quoting and tests it. |
| `src/sql_generator/function_mapper/clickhouse.rs` | Implements ClickHouse double-quote alias quoting. |
| `src/sql_generator/emitters/clickhouse/to_sql_query.rs` | Routes SELECT alias rendering through the current function mapper. |
| `src/render_plan/plan_builder_helpers.rs` | Routes aggregate/union alias reference rendering through the mapper. |
| `src/render_plan/tests/databricks_emit_spike_tests.rs` | Updates spike documentation and assertions for dialect-specific alias quoting. |

        // Spark parses `"foo"` as a string literal — backticks are the
        // only valid identifier quote. `quote_identifier` uses backticks
        // for both dialects so this stays consistent with that.
        format!("`{name}`")
    }

    fn quote_alias(&self, name: &str) -> String {
        format!("\"{name}\"")
Comment on lines +89 to +95
/// Quote a column alias for an `AS` clause. CH: `"name"` (also
/// accepts backticks but the existing pipeline emits double quotes
/// here historically). Spark: `` `name` `` — Spark parses `"name"`
/// as a string literal, so backticks are mandatory. The bare
/// `quote_identifier` helper in `common.rs` is a separate concern
/// (it already uses backticks for both dialects since both accept
/// them for column refs).
Comment on lines +49 to +51
//! the *function name* layer; Phase 1.3 routed array-literal shape;
//! Phase 1.4 routed identifier quoting. The remaining work — aggregate
//! registry routing and CAST type names — fits the same shape.
Comment on lines +302 to +304
assert!(
sql.contains("AS `b.id`"),
"expected Spark backtick alias `AS `b.id``; got:\n{sql}"
Five inline comments on PR #325, all fixed:

1. `databricks.rs` `quote_alias`: backticks inside `name` are now
   escaped by doubling (Spark convention). Aliases derived from raw
   return text can contain `` ` `` and prior code would emit unclosed
   quoted identifiers.

2. `clickhouse.rs` `quote_alias`: same fix for `"` inside `name` —
   doubled per CH's quoted-identifier escape rule.

3. Trait doc clarifies `quote_alias` is for both `AS` clauses AND
   alias references (GROUP BY / agg args after inner-query rewrite)
   — matches actual call sites. Notes that each impl escapes its
   own delimiter.

4. Aggregation alias rendering paths in `build_outer_aggregate_select`
   and `extract_outer_aggregation_info` are now covered by two new
   spike tests: one asserts Databricks emits backtick aliases, one
   asserts CH keeps double-quoted aliases. Guards against future
   regressions silently re-emerging on either side.

5. (Spike-test docs already accurately list both remaining gaps —
   aggregate registry routing AND CAST type names. The original PR
   description undercounted; will update PR body alongside this push.)

Plus unit tests for both `quote_alias` impls covering the escape
behavior directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@genezhang

Owner Author

@copilot review — addressed all 5 inline comments in 8536b34. Summary in updated PR description.

@genezhang genezhang merged commit f89aadf into main May 16, 2026
5 checks passed
@genezhang genezhang deleted the feature/phase-1.4-alias-quoting branch May 16, 2026 01:12
2 participants