Skip to content

GH-49985: Duplicate function aliases with same parameters#49986

Open
lriggs wants to merge 10 commits into
apache:mainfrom
lriggs:dualAliasFix
Open

GH-49985: Duplicate function aliases with same parameters#49986
lriggs wants to merge 10 commits into
apache:mainfrom
lriggs:dualAliasFix

Conversation

@lriggs
Copy link
Copy Markdown
Contributor

@lriggs lriggs commented May 18, 2026

Rationale for this change

Silently allowing duplicate functions that only differ by return type provides a confusing function registry. Callers do not typically expect the return type to be part of determining which function to call leading to ambiguity.

It is possible to register different Gandiva functions with the same alias and parameters but different return types, resulting in confusing function overloads.

For example, DATE_EXTRACTION_TRUNCATION_FNS in [cpp/src/gandiva/function_registry_datetime.cc] was invoked twice with the same SQL alias lists — once for extract* (returns int64) and once for date_trunc_* (returns the input date/timestamp type):


DATE_EXTRACTION_TRUNCATION_FNS(EXTRACT_SAFE_NULL_IF_NULL, extract)
DATE_EXTRACTION_TRUNCATION_FNS(TRUNCATE_SAFE_NULL_IF_NULL, date_trunc_)

As a result the registry contained four entries for day(...) where there should have been two:

int64 day(timestamp)   → extractDay_timestamp
int64 day(date)        → extractDay_date64
timestamp day(timestamp) → date_trunc_Day_timestamp
date day(date)         → date_trunc_Day_date64

The same problem existed for every calendar-unit alias: year, month, quarter, week, weekofyear, yearweek, dayofmonth, hour, minute, second. Resolution behavior depended on the caller's inferred return type, which is not the SQL semantics anyone expects from day(timestamp_col).

FunctionRegistry::Add was silently allowing these registrations: unordered_map::emplace keeps the first entry and discards subsequent ones with no warning.

What changes are included in this PR?

Diagnostics

Added two ARROW_LOG(ERROR) checks inside FunctionRegistry::Add:
Duplicate-signature check — when pc_registry_map_.emplace reports the entry already existed (same name + params + return type), log the conflict including both pc_names.
Alias-collision check — a new call_shape_map_ keyed on (lower(name), param_type_ids) (return type excluded) detects the case where the same name(args) could resolve to two functions with different return types. This is the check that catches the day family.
These run at registry-construction time, so future regressions surface in test output rather than being silently shadowed.

Macro split

DATE_EXTRACTION_TRUNCATION_FNS is split into two macros:

DATE_EXTRACTION_FNS — keeps the existing SQL alias lists ({"year"}, {"day", "dayofmonth"}, etc.).
DATE_TRUNCATION_FNS — every alias list is {}. The truncate functions are still reachable through their date_trunc_* base names.

Strengthened registry test

Added a third loop to TestNoDuplicates that mirrors the runtime alias-collision check: keys each registered signature on (lower(name), param_types) only and fails if two signatures share that key but disagree on return type. The two pre-existing loops still cover their original cases (same pc_name + params + ret, and same full signature appearing twice).

Cleanup

Removed the standalone NativeFunction("add", ..., date64+int64 → timestamp, "add_date64_int64"). DATE_ADD_FNS(add, {}) already provides the entry with the correct → date64 return type matching the precompiled symbol.

Dropped the {"convert_fromutf8"} and {"convert_replaceutf8"} aliases; the base names already match case-insensitively.

Are these changes tested?

Registry construction is now silent — no duplicate or alias-collision logs.
TestFunctionRegistry — 6/6 pass, including the strengthened TestNoDuplicates.
ctest -R gandiva — 4/4 binaries pass (gandiva-internals-test, gandiva-precompiled-test, gandiva-projector-test, gandiva-projector-test-static).

Are there any user-facing changes?

The Gandiva function list is different, with some aliases no longer appearing. Functionality is still available.

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #49985 has been automatically assigned in GitHub to PR creator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant