fix(storage): flush all column families in RocksDbStorage::flush()#755
fix(storage): flush all column families in RocksDbStorage::flush()#755pauldelucia wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughThis PR fixes multi-column-family flush behavior in RocksDbStorage. The flush() method now calls flush_cfs_opt() to atomically persist all opened column families (default, aux, roots, meta) instead of only the default CF, ensuring consistency when atomic flush is enabled. A helper function and regression test are included. ChangesMulti-CF Flush Atomicity
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
storage/src/rocksdb_storage/storage.rs (1)
751-782: 💤 Low valueSolid regression test. Correctly reproduces the old default-only flush gap by asserting
rocksdb.num-entries-active-mem-table == 0across all four CFs afterflush().One optional hardening: the test asserts memtables are empty but doesn't assert the data is readable post-flush. Asserting
db.get_cf(...)returns the written values would also guard against a regression where a flush silently drops entries. Optional given the active-memtable check already proves the atomic flush ran.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@storage/src/rocksdb_storage/storage.rs` around lines 751 - 782, Add a post-flush readback check in flush_persists_all_column_families so the test verifies not only that rocksdb.num-entries-active-mem-table is 0 for each CF, but also that db.get_cf on cf_default, cf_aux, cf_roots, and cf_meta returns the values written before storage.flush(). This hardens the test against a regression where flush empties memtables but data is not actually persisted or becomes unreadable; use the existing helper CF accessors and TempStorage::flush to locate the spot.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@storage/src/rocksdb_storage/storage.rs`:
- Around line 751-782: Add a post-flush readback check in
flush_persists_all_column_families so the test verifies not only that
rocksdb.num-entries-active-mem-table is 0 for each CF, but also that db.get_cf
on cf_default, cf_aux, cf_roots, and cf_meta returns the values written before
storage.flush(). This hardens the test against a regression where flush empties
memtables but data is not actually persisted or becomes unreadable; use the
existing helper CF accessors and TempStorage::flush to locate the spot.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1bc53e4d-be9a-486c-bb38-c4aaaf368c91
📒 Files selected for processing (1)
storage/src/rocksdb_storage/storage.rs
a1592e7 to
71734ac
Compare
RocksDbStorage opens four column families (default, aux, roots, meta) with set_atomic_flush(true), but flush() called db.flush(), which only flushes the default column family. The roots/meta/aux memtables were left unpersisted, so a caller relying on flush() for durability before a non-clean shutdown -- a SIGTERM/SIGKILL where the Drop-time close never runs -- silently lost recently committed data. Flush all four column families together via flush_cfs_opt, consistent with set_atomic_flush(true). Adds a regression test (flush_persists_all_column_families) asserting every CF's active memtable is empty after flush(); it fails on the old default-only flush() and passes with this change.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #755 +/- ##
========================================
Coverage 91.44% 91.44%
========================================
Files 237 237
Lines 67298 67333 +35
========================================
+ Hits 61540 61575 +35
Misses 5758 5758
🚀 New features to boost your workflow:
|
71734ac to
49ff924
Compare
Problem
RocksDbStorage::flush()flushes only thedefaultcolumn family:But the storage spreads its data across four CFs —
default,aux,roots,meta— opened withset_atomic_flush(true). Soflush()leavesaux/roots/metasitting in volatile memtables, and any caller that relies on it for durability before a non-clean shutdown (SIGTERM/SIGKILL, where theDrop-time close never runs) silently loses recently-committed data.Under
atomic_flush, RocksDB expects every CF to be flushed together anyway — so today's single-CF flush is effectively a no-op for durability.Fix
Flush all four column families atomically via
flush_cfs_opt, consistent withset_atomic_flush(true).Test
Adds
flush_persists_all_column_families: writes into all four CFs, callsflush(), and asserts each CF's active memtable is empty (data moved to SST). It fails on the oldflush()(column family 'aux' was not flushed: 1 entries still in the memtable) and passes with this change.Why this stayed latent
The single-CF flush has been here since #13 (2022), so it's worth noting why it hasn't bitten before.
flush()only matters for durability when it's the last write before an unclean stop. Drive — grovedb's main consumer — calls it fromDrop, where RocksDB's own teardown persists every CF anyway, and an unclean kill is backstopped by Tenderdash replaying the missing blocks. The consumer that surfaced this (Willow indexer node) is the first to lean onflush()as a real crash barrier — called onSIGTERM, with no consensus layer behind it to regenerate lost state — which is the one condition under which the partial flush actually drops committed data.Summary by CodeRabbit