[fix](variant) allow inverted index pushdown for cast predicates on variant subcolumns #63118
Open
wuguowei1994 wants to merge 1 commit into apache:master from
Summary
On the current master branch, inverted index predicate pushdown does not work correctly when querying VARIANT fields with an explicit CAST.

This is a serious issue because it is not limited to a single target type. In our testing, predicates of the form CAST(variant_field["key"] AS <type>) = ... fail to use the inverted index across casted VARIANT access patterns.

This is especially problematic because the recommended way to extract typed values from VARIANT fields is an explicit CAST. In our internal production workloads, VARIANT is heavily used, and all business teams are required to query VARIANT subfields through an explicit CAST. As a result, these queries cannot benefit from inverted index filtering and scan significantly more rows than expected, which severely degrades performance.

For production workloads with large VARIANT columns, this effectively makes the inverted index unusable for the officially recommended query pattern, with a major impact on query latency and resource consumption.

Reproduction
Expected Behavior
The casted predicate in the reproduction query (of the form CAST(variant_field["key"] AS <type>) = ...) should be pushed down to the inverted index on the VARIANT column, and the query should use the inverted index to filter rows before scanning data.

Only the matching row should need to be read after index filtering.
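To make the expected behavior concrete, here is a minimal toy model (not Doris code) of what index filtering buys. It assumes a 20-row table where exactly one row matches; the names ROWS, build_inverted_index, rows_to_read and the value 7 are hypothetical and chosen only for illustration. With pushdown, the index lookup yields the single matching row id; without it, every row must be read and the predicate evaluated afterwards.

```python
from collections import defaultdict

# Hypothetical toy table: 20 rows, each holding one VARIANT-like document.
# Exactly one row has key == 7, mirroring "only the matching row" above.
ROWS = [{"key": i} for i in range(20)]

def build_inverted_index(rows, path):
    """Map each value found at `path` to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, doc in enumerate(rows):
        if path in doc:
            index[doc[path]].add(row_id)
    return index

def rows_to_read(rows, index, value, pushed_down):
    """Row ids the scan must read to evaluate `doc[path] == value`."""
    if pushed_down:
        # Index lookup: only rows whose value matches survive filtering.
        return sorted(index.get(value, set()))
    # No pushdown: every row is read and the predicate is evaluated later.
    return list(range(len(rows)))

index = build_inverted_index(ROWS, "key")
print(len(rows_to_read(ROWS, index, 7, pushed_down=True)))   # 1
print(len(rows_to_read(ROWS, index, 7, pushed_down=False)))  # 20
```

The gap between the two printed counts is the effect the query profile should show once the casted predicate is actually pushed down.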
Actual Behavior
The query result is correct, but the query profile shows that the inverted index does not effectively filter the data.
Instead of being pruned by the inverted index, all 20 rows are still read. This indicates that the predicate involving CAST on the VARIANT subfield is not handled by inverted index predicate pushdown.

Please check the query profile after running the reproduction SQL. The key point is that the inverted index does not reduce the number of scanned rows for the casted VARIANT predicate.
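One plausible way this kind of failure arises can be sketched as a pushdown rule that only recognizes a bare subcolumn reference on the left-hand side, so CAST(v["key"] AS INT) = 7 never matches and falls back to a full scan. The sketch below is a hypothetical toy model, not the Doris planner or storage code and not necessarily what this PR changes; SubcolumnRef, Cast, EqPredicate and both *_pushdown_target functions are invented names. The cast-aware variant looks through casts, but only when the cast target matches the type the index was built with (an assumption standing in for whatever compatibility check a real implementation needs, since some casts change equality semantics).

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class SubcolumnRef:
    column: str  # hypothetical VARIANT column name, e.g. "v"
    path: str    # hypothetical subcolumn path, e.g. "key"

@dataclass(frozen=True)
class Cast:
    child: "Expr"
    target_type: str  # e.g. "INT"

Expr = Union[SubcolumnRef, Cast]

@dataclass(frozen=True)
class EqPredicate:
    lhs: Expr
    literal: object

def strict_pushdown_target(pred: EqPredicate) -> Optional[SubcolumnRef]:
    """Accepts only a bare subcolumn reference; any CAST blocks pushdown."""
    return pred.lhs if isinstance(pred.lhs, SubcolumnRef) else None

def cast_aware_pushdown_target(pred: EqPredicate, indexed_type: str) -> Optional[SubcolumnRef]:
    """Looks through CAST wrappers whose target type matches the index type."""
    expr = pred.lhs
    while isinstance(expr, Cast):
        # Assumption: unwrapping is safe only when the cast targets the type the
        # index was built with, so index equality matches post-cast equality.
        if expr.target_type != indexed_type:
            return None
        expr = expr.child
    return expr if isinstance(expr, SubcolumnRef) else None

pred = EqPredicate(Cast(SubcolumnRef("v", "key"), "INT"), 7)
print(strict_pushdown_target(pred))             # None -> full scan
print(cast_aware_pushdown_target(pred, "INT"))  # SubcolumnRef(column='v', path='key')
```

Requiring a type match rather than stripping every cast keeps the toy rule conservative: a cast that could change which rows compare equal is simply not pushed down.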