
[fix](variant) allow inverted index pushdown for cast predicates on variant subcolumns #63118

Open
wuguowei1994 wants to merge 1 commit into apache:master from wuguowei1994:fix-variant-inverted-index-cast


@wuguowei1994

Summary

On the current master branch, inverted index predicate pushdown does not take effect when a VARIANT subfield is queried through an explicit CAST.

This is a serious issue because it is not limited to a single target type: in our testing, predicates of the form CAST(variant_field["key"] AS <type>) = ... fail to use the inverted index across all the casted VARIANT access patterns we tried.

This is especially problematic because an explicit CAST is the recommended way to extract typed values from VARIANT fields. VARIANT is heavily used in our internal production workloads, and all business teams are required to query VARIANT subfields through explicit CAST. As a result, these queries cannot benefit from inverted index filtering; they scan far more rows than necessary, which severely degrades performance.

For production workloads with large VARIANT columns, this effectively makes the inverted index unusable for the officially recommended query pattern, which has a major impact on query latency and resource consumption.


Reproduction

DROP TABLE IF EXISTS variant_inverted_intkey_test;

CREATE TABLE variant_inverted_intkey_test (
    row_id BIGINT,
    v VARIANT,
    INDEX idx_v(v) USING INVERTED
)
ENGINE=OLAP
DUPLICATE KEY(row_id)
DISTRIBUTED BY HASH(row_id) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "disable_auto_compaction" = "true",
    "inverted_index_storage_format" = "v2"
);

INSERT INTO variant_inverted_intkey_test VALUES
(1,  '{"int_key": 1}'),
(2,  '{"int_key": 2}'),
(3,  '{"int_key": 3}'),
(4,  '{"int_key": 4}'),
(5,  '{"int_key": 5}'),
(6,  '{"int_key": 6}'),
(7,  '{"int_key": 7}'),
(8,  '{"int_key": 8}'),
(9,  '{"int_key": 9}'),
(10, '{"int_key": 10}'),
(11, '{"int_key": 11}'),
(12, '{"int_key": 12}'),
(13, '{"int_key": 13}'),
(14, '{"int_key": 14}'),
(15, '{"int_key": 15}'),
(16, '{"int_key": 16}'),
(17, '{"int_key": 17}'),
(18, '{"int_key": 18}'),
(19, '{"int_key": 19}'),
(20, '{"int_key": 20}');

SELECT row_id, CAST(v["int_key"] AS INT) AS int_key
FROM variant_inverted_intkey_test
WHERE CAST(v["int_key"] AS INT) = 13;

Expected Behavior

The predicate:

CAST(v["int_key"] AS INT) = 13

should be pushed down to the inverted index on the VARIANT column, so that the index filters rows before the data is scanned.

After index filtering, only the single matching row (row_id = 13) should need to be read.


Actual Behavior

The query result is correct, but the query profile shows that the inverted index does not filter any rows.

Instead of being pruned by the inverted index, all 20 rows are still scanned. This indicates that inverted index predicate pushdown does not handle predicates that apply CAST to a VARIANT subfield.

To verify, check the query profile after running the reproduction SQL: the inverted index does not reduce the number of scanned rows for the casted VARIANT predicate.
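A minimal sketch of how the profile can be collected for this check. It assumes the `enable_profile` session variable and a scan-profile counter named `RowsInvertedIndexFiltered`; the counter name is an assumption and may differ between Doris versions, and the expected values in the comments are derived from the 20-row reproduction data, not from a captured profile.

```sql
-- Collect a query profile for the statements that follow in this session.
SET enable_profile = true;

-- Re-run the casted predicate from the reproduction.
SELECT row_id, CAST(v["int_key"] AS INT) AS int_key
FROM variant_inverted_intkey_test
WHERE CAST(v["int_key"] AS INT) = 13;

-- Then open the profile (for example from the FE web UI) and inspect the
-- scan counters. Assumed counter name: RowsInvertedIndexFiltered.
-- If the index were applied, about 19 of the 20 rows should be filtered
-- by the index; with the bug, no rows are filtered and all 20 are read.
```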

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

wuguowei1994 changed the title from "[fix](variant) VARIANT Inverted Index Predicate Pushdown Bug" to "[fix](variant) allow inverted index pushdown for cast predicates on variant subcolumns" on May 10, 2026
wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch from e75111a to 904d4c0 on May 10, 2026 13:04
