
Support distinct-from predicates in Parquet pruning #22084

Open
Dandandan wants to merge 4 commits into apache:main from Dandandan:parquet-distinct-pruning


Dandandan commented May 9, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

Parquet statistics pruning did not rewrite IS DISTINCT FROM or IS NOT DISTINCT FROM predicates, so row groups that min/max and null-count statistics prove cannot contain a matching row were still kept and read.
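The reasoning that lets statistics rule out a row group for these predicates can be sketched as follows. This is a minimal sketch, not the code from this PR: `RowGroupStats`, `may_match_not_distinct`, and `may_match_distinct` are hypothetical names, the column is assumed to be `Int64`, and `min`/`max` are assumed to cover only non-null values. Each function returns `false` only when the statistics prove that no row can satisfy the predicate, which is when the row group can be skipped.

```rust
/// Hypothetical statistics for one Int64 column in one Parquet row group.
/// `min`/`max` are assumed to cover non-null values only; `None` means the
/// statistic is unavailable (or there are no non-null values).
#[derive(Debug, Clone, Copy)]
pub struct RowGroupStats {
    pub min: Option<i64>,
    pub max: Option<i64>,
    pub null_count: Option<u64>,
    pub row_count: u64,
}

/// May any row satisfy `col IS NOT DISTINCT FROM lit`?
/// Returns `false` only when the statistics prove that no row can match.
pub fn may_match_not_distinct(s: &RowGroupStats, lit: Option<i64>) -> bool {
    match lit {
        // `col IS NOT DISTINCT FROM NULL` is `col IS NULL`.
        None => s.null_count.map_or(true, |n| n > 0),
        // Otherwise it acts like `col = v`, and NULL rows never match.
        Some(v) => {
            if s.null_count == Some(s.row_count) {
                return false; // every row is NULL, so none equals `v`
            }
            match (s.min, s.max) {
                (Some(min), Some(max)) => min <= v && v <= max,
                _ => true, // bounds unknown: keep the row group
            }
        }
    }
}

/// May any row satisfy `col IS DISTINCT FROM lit`?
/// Returns `false` only when the statistics prove that no row can match.
pub fn may_match_distinct(s: &RowGroupStats, lit: Option<i64>) -> bool {
    match lit {
        // `col IS DISTINCT FROM NULL` is `col IS NOT NULL`.
        None => s.null_count.map_or(true, |n| n < s.row_count),
        // Any NULL row or any value other than `v` matches, so the row group
        // is prunable only when it has no NULLs and min == max == v.
        Some(v) => match (s.null_count, s.min, s.max) {
            (Some(0), Some(min), Some(max)) => !(min == v && max == v),
            _ => true, // statistics incomplete: keep the row group
        },
    }
}
```

For a row group where every value is 5 and `null_count` is 0, `col IS DISTINCT FROM 5` cannot match and the group can be skipped, while `col IS NOT DISTINCT FROM 5` must be kept. Missing statistics always keep the row group, which is the conservative answer pruning requires.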

What changes are included in this PR?

  • Adds null-aware pruning rewrites for IS DISTINCT FROM and IS NOT DISTINCT FROM.
  • Treats distinct-from operators as symmetric when normalizing scalar-left predicates.
  • Refactors shared min/max and null-count pruning expression builders.
  • Adds unit tests for pruning predicate evaluation and Parquet row-group regression coverage.
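Pruning rewrites are simplest to express for the `column op literal` shape, so a scalar-left predicate such as `5 IS DISTINCT FROM a` is assumed to be normalized with the column on the left first. Ordering comparisons must be mirrored when their operands are swapped (`5 < a` becomes `a > 5`), whereas equality-style operators, including both distinct-from operators, keep the same operator. The sketch below illustrates that idea only; `Op`, `Expr`, and `normalize` are hypothetical stand-ins, not DataFusion's actual types or functions.

```rust
/// Hypothetical binary operators relevant to pruning.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op { Eq, NotEq, Lt, LtEq, Gt, GtEq, IsDistinctFrom, IsNotDistinctFrom }

impl Op {
    /// Operator to use after swapping operands: `a op b` == `b op.swap() a`.
    pub fn swap(self) -> Op {
        match self {
            Op::Lt => Op::Gt,
            Op::LtEq => Op::GtEq,
            Op::Gt => Op::Lt,
            Op::GtEq => Op::LtEq,
            // Symmetric operators: operand order does not change the result.
            Op::Eq | Op::NotEq | Op::IsDistinctFrom | Op::IsNotDistinctFrom => self,
        }
    }
}

/// Hypothetical expression leaves: a column reference or a nullable Int64 literal.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr { Column(String), Literal(Option<i64>) }

/// Rewrites `lit op col` into `col op' lit` so later pruning logic only has to
/// handle the column-on-the-left shape. Returns `None` for other shapes.
pub fn normalize(left: Expr, op: Op, right: Expr) -> Option<(String, Op, Option<i64>)> {
    match (left, right) {
        (Expr::Column(c), Expr::Literal(v)) => Some((c, op, v)),
        (Expr::Literal(v), Expr::Column(c)) => Some((c, op.swap(), v)),
        _ => None,
    }
}
```

Because `IS DISTINCT FROM` and `IS NOT DISTINCT FROM` are symmetric, `NULL IS DISTINCT FROM a` normalizes to `a IS DISTINCT FROM NULL` with no change of operator.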

Are these changes tested?

Are there any user-facing changes?

No API changes. Queries using IS DISTINCT FROM and IS NOT DISTINCT FROM can now benefit from Parquet statistics pruning.

github-actions bot added the logical-expr (Logical plan and expressions) and core (Core DataFusion crate) labels May 9, 2026
Dandandan marked this pull request as ready for review May 9, 2026 12:13
Dandandan added this pull request to the merge queue May 11, 2026
github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 11, 2026
Dandandan added this pull request to the merge queue May 11, 2026
github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 11, 2026