perf(expression): Eager full-base fill in evalWithMemo for cheap expressions by yingsu00 · Pull Request #2172 · IBM/velox

yingsu00 · 2026-06-23T03:40:40Z

Adds a fast path in Expr::evalWithMemo: on the second sighting of a dictionary base, when the expression is cheap to re-evaluate and non-throwing, fill every position of the base into dictionaryCache_ in one shot. Subsequent batches over the same base then hit the cache-covers-base bypass in peelEncodings and return the cached vector directly without per-row work.
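The flow described above can be modeled as a small, single-threaded sketch. Everything here is hypothetical and simplified: `EagerFillMemo`, `Base`, and `coversBase()` are illustrative names, not Velox's `Expr::evalWithMemo`, `dictionaryCache_`, or `peelEncodings`. The base is identified by its address, which only works while that object is alive (the real code keeps a reference to the base). The first sighting caches only the rows it touches. On the second sighting of a cheap expression, every base position is filled, and later batches become pure lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dictionary base vector that many
// dictionary-encoded batches index into.
using Base = std::vector<int>;

// Hypothetical model of memoized evaluation over a dictionary base.
class EagerFillMemo {
 public:
  EagerFillMemo(std::function<std::string(int)> fn, bool cheapToReevaluate)
      : fn_(std::move(fn)), cheap_(cheapToReevaluate) {}

  // Evaluates fn_ for the base positions referenced by one batch's indices.
  std::vector<std::string> eval(const Base& base, const std::vector<size_t>& indices) {
    if (&base != lastBase_) {
      // New base (identified by address in this sketch): reset the cache.
      lastBase_ = &base;
      sightings_ = 0;
      cache_.assign(base.size(), std::nullopt);
    }
    ++sightings_;
    if (sightings_ == 2 && cheap_ && !coversBase()) {
      // Second sighting of a cheap expression: fill every base position in one shot.
      for (size_t i = 0; i < base.size(); ++i) {
        fillIfMissing(base, i);
      }
    }
    std::vector<std::string> out;
    out.reserve(indices.size());
    for (size_t idx : indices) {
      assert(idx < base.size());
      // Once coversBase() holds, this never evaluates: it is a pure lookup.
      fillIfMissing(base, idx);
      out.push_back(*cache_[idx]);
    }
    return out;
  }

  // True once every base position has a cached result.
  bool coversBase() const {
    for (const auto& entry : cache_) {
      if (!entry) return false;
    }
    return true;
  }

 private:
  void fillIfMissing(const Base& base, size_t i) {
    if (!cache_[i]) cache_[i] = fn_(base[i]);
  }

  std::function<std::string(int)> fn_;
  bool cheap_;
  const Base* lastBase_ = nullptr;
  int sightings_ = 0;
  std::vector<std::optional<std::string>> cache_;
};
```

In this model a non-cheap expression never takes the eager path, so its cache grows only as batches happen to reference new positions. That incremental growth is the behavior this PR avoids for cheap expressions.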

The classification of "cheap" lives on Expr::isCheapToReevaluate():

* Expr's base implementation (in Expr.cpp) returns true for function calls whose registered name is in the curated cheapFunctionNames() set. The set is conservative: only entries that are both cheap per-row AND non-throwing on plausible inputs are included. Casts, arithmetic with divide / mod, parsing functions, regex, json, and crypto are deliberately omitted. Date / time accessors, date / time arithmetic and formatting, simple string ops, and non-throwing math (NaN / Inf instead of exceptions) are included. This covers common expressions like date_format(...), date_trunc(...), substr(...), length(...) over dictionary-encoded inputs.
* CastExpr overrides it to return true for fast numeric upcasts, DATE -> TIMESTAMP, and DATE -> VARCHAR.
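The two-level classification above (a name allow-list on the base class, plus a cast-specific override) can be sketched as follows. The classes are heavily simplified stand-ins, and the set contents and `TypeKind` enum are illustrative assumptions drawn from the categories listed above, not the actual `cheapFunctionNames()` list. Which numeric pairs count as "fast upcasts" is also assumed here.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>

// Illustrative allow-list only; NOT the real cheapFunctionNames() contents.
const std::unordered_set<std::string>& cheapFunctionNames() {
  static const std::unordered_set<std::string> kNames = {
      "date_format", "date_trunc", "year", "month", "substr", "length", "lower", "upper"};
  return kNames;
}

// Hypothetical subset of type kinds for the cast rule.
enum class TypeKind { kInteger, kBigint, kDouble, kDate, kTimestamp, kVarchar };

class Expr {
 public:
  explicit Expr(std::string name) : name_(std::move(name)) {}
  virtual ~Expr() = default;

  // Base rule: a call is cheap iff its registered name is in the curated set.
  virtual bool isCheapToReevaluate() const {
    return cheapFunctionNames().count(name_) > 0;
  }

 private:
  std::string name_;
};

class CastExpr : public Expr {
 public:
  CastExpr(TypeKind from, TypeKind to) : Expr("cast"), from_(from), to_(to) {}

  // "cast" is not in the allow-list, so only the pairs below are cheap.
  bool isCheapToReevaluate() const override {
    if (isNumericUpcast()) return true;
    return from_ == TypeKind::kDate &&
        (to_ == TypeKind::kTimestamp || to_ == TypeKind::kVarchar);
  }

 private:
  // Assumed set of widening conversions that cannot throw.
  bool isNumericUpcast() const {
    return from_ == TypeKind::kInteger &&
        (to_ == TypeKind::kBigint || to_ == TypeKind::kDouble);
  }

  TypeKind from_;
  TypeKind to_;
};
```

Keeping the default conservative (anything not listed is "not cheap") means that adding a function requires an explicit decision that it is both cheap per row and non-throwing.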

The eager-fill block also makes a deselect-vs-full-reeval choice. Deselecting the already-cached positions from the SelectivityVector costs O(base / 64), and the resulting sparse toFill makes the subsequent evalWithNulls iterate set bit by set bit instead of running over a dense range. When only a minority of base positions are cached, re-evaluating the cached positions costs little compared to the deselect plus the sparse iteration, so a full re-eval is faster. When the majority is cached, the deselect saves enough work to be worth it. The threshold is 50%: deselect when cachedCount * 2 >= baseSize. The same deselect-or-not decision drives both toFill (the rows to evaluate) and writable (the positions ensureWritable must make mutable on dictionaryCache_).
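The threshold decision can be illustrated with a minimal sketch. `Bitmap` is a hypothetical 64-bit-word bitmap standing in for SelectivityVector, and `planEagerFill` / `FillPlan` are illustrative names, not Velox APIs. The sketch shows the word-at-a-time deselect (one pass over `size / 64` words) and how a single boolean picks both the rows to evaluate and the writable positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for SelectivityVector: one bit per base position.
struct Bitmap {
  explicit Bitmap(size_t n) : size(n), words((n + 63) / 64, 0) {}

  void setAll() {
    for (auto& w : words) w = ~uint64_t{0};
    // Clear the unused high bits of the last word.
    if (size % 64 != 0) words.back() &= (uint64_t{1} << (size % 64)) - 1;
  }
  void set(size_t i) { words[i / 64] |= uint64_t{1} << (i % 64); }
  bool isSet(size_t i) const { return (words[i / 64] >> (i % 64)) & 1; }

  // Clears every bit set in `other`: one pass over the words, O(size / 64).
  void deselect(const Bitmap& other) {
    for (size_t i = 0; i < words.size(); ++i) words[i] &= ~other.words[i];
  }

  size_t count() const {
    size_t n = 0;
    for (uint64_t w : words) {
      while (w != 0) { w &= w - 1; ++n; }  // clear lowest set bit
    }
    return n;
  }

  size_t size;
  std::vector<uint64_t> words;
};

struct FillPlan {
  Bitmap toFill;    // rows to evaluate
  Bitmap writable;  // cache positions that must be made mutable
  bool deselected;
};

// Deselect cached positions only when at least half the base is cached;
// otherwise re-evaluate the full, dense range.
FillPlan planEagerFill(const Bitmap& cached) {
  const size_t baseSize = cached.size;
  const size_t cachedCount = cached.count();
  Bitmap rows(baseSize);
  rows.setAll();
  const bool deselect = cachedCount * 2 >= baseSize;
  if (deselect) rows.deselect(cached);
  // The same decision drives both toFill and writable.
  return FillPlan{rows, rows, deselect};
}
```

With few cached positions, the plan stays dense and simply recomputes the handful of cached rows. With a majority cached, the sparse plan skips them at the cost of the deselect pass and bit-by-bit iteration.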

Why now: a production query with `date_format(CAST(date_trunc(...) AS timestamp), '%Y-%m-%d')` on a hot column was showing ~2% of total process CPU in `FlatVector<StringView>::copy` -> `acquireSharedStringBuffers` -> `addStringBuffer`. The atomic refcount increment in `intrusive_ptr<Buffer>::push_back` is a full memory barrier; on a Buffer shared across drivers, the cache line bounces and each increment stalls for hundreds of cycles. The bypass in peelEncodings (separate commit) already sidesteps that entire chain on cache-hit batches, but it only fires after eager-fill has populated the whole base. Without eager-fill for date_format, the cache filled only incrementally, and for many production-sized bases the bypass never reached the "covers base" threshold.

1626/1626 velox_expression_test pass.


yingsu00 requested review from rui-mo and xin-zhang2 June 23, 2026 03:40

yingsu00 self-assigned this Jun 23, 2026

yingsu00 added the bolt label Jun 23, 2026

yingsu00 wants to merge 1 commit into IBM:bolt from yingsu00:cast-perf-03-memo-eager-fill

yingsu00 commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

yingsu00 commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant