Skip to content

VACCUM Event Driven Based Cache Invalidation #69

Description

@peterxcli

For row-group merge, targeted row-group eviction is fundamentally insufficient. QCC
lookup assumes row-group identity is row_id / DEFAULT_ROW_GROUP_SIZE in src/
query_condition_cache_filter.cpp:57. But DuckDB checkpoint vacuum can drop empty row
groups and explicitly mark that later row ids shift in duckdb/src/storage/table/
row_group_collection.cpp:1354, can merge multiple old row groups into fewer new ones in
duckdb/src/storage/table/row_group_collection.cpp:1465 and duckdb/src/storage/table/
row_group_collection.cpp:1264, then swaps in a new row-group tree in duckdb/src/
storage/table/row_group_collection.cpp:1768. After that, “old row group N” is no longer
a stable thing you can surgically evict against the existing cache entry.

So the right model is: row-local DML uses row-group eviction; layout rewrites use
whole-table structural invalidation. In practice I would add a per-table layout_epoch
to QCC entries and bump it whenever DuckDB performs a row-id-changing rewrite. On
lookup, if the entry epoch mismatches the table epoch, drop/rebuild the entry. If you
want the minimal safe behavior, clear all QCC entries for that table whenever
checkpoint/vacuum reports row_ids_changed. QCC is effectively another rowid-dependent
secondary structure, so it needs the same protection boundary as indexes.

by doing this, the #68 issue can be resolved at the same time.


https://dl.acm.org/doi/epdf/10.1145/3626246.3653395

4.3 Inserts, Deletes, and UpdatesA key advantage of predicate caching is that it is online and easyto build. In particular, the cached entries remain valid even if thecustomer inserts new rows, deletes, or updates them. As a result,the predicate cache is more versatile than other caching approacheslike the result cache or AutoMV.

4.3.1 Inserts. SQL statements that add new data to tables (insertand copy) are the most common operations after select statementson Redshift clusters. Since Redshift is designed as an analyticalsystem for petabyte-scale workloads, adding a large amount of newdata is highly optimized: Redshift does not enforce constraints like primary keys and foreign keys, and no indexes like B-trees are builton the data. Inserting new tuples is fast, even if the data is sorted.If the customer specifies an order key, we add tuples to the end ofthe table in the insert buffer, which are then merged as part of avacuum process.The insert buffer for staging new tuples is perfect for the predi-cate cache. As new tuples are appended at the end, Redshift assignsthem higher row numbers. Hence, the row ranges of the cachedpredicates remain valid, and we do not have to invalidate them.Instead, we remember the row number of the last cached row inthe data slice, and newer tuples are scanned using the normal com-bination of range-restricted and vectorized scans. We can then addthe new row ranges to the predicate cache to keep it up-to-date.

4.3.2 Deletes. Like PostgreSQL, Redshift implements multi-versionconcurrency control based on timestamps. For every row, it main-tains a creation and a deletion timestamps that determine the vis-ibility of the row. Deletions set the second timestamp and markthe tuple as invisible for subsequent transactions. The predicatecache effortlessly handles deletions: even if a deleted row is part ofa cached row range, subsequent scans eliminate the row during thevisibility check. Redshift checks the visibility of a tuple during thevectorized scan and tests if the timestamp of the current transactionis between the creating and deletion timestamp of the row.Like inserts, the vacuum removes the row from the table once it isglobally invisible and rewrites the compressed blocks. Tuples are nolonger globally visible if no transaction older than the transactionthat deleted the tuple exists. As before, vacuum changes the rowranges in the predicate cache, and we invalidate the cache.

4.3.3 Updates. Like many OLAP systems with a columnar storagelayout, Redshift implements updates out-of-place. The old tupleremains in the relation and is only marked as deleted. The new(updated) version of the tuple is inserted into the relation. By doingso, Redshift implements updates as a delete followed by an insertoperation. The predicate caches, therefore, remains valid as noneof the two operations changes the row numbers of the tuples.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions