fix(storage): flush all column families in RocksDbStorage::flush() #755

Open
pauldelucia wants to merge 1 commit into develop from fix/flush-all-column-families

@pauldelucia

@pauldelucia pauldelucia commented Jun 3, 2026

Member

Problem

RocksDbStorage::flush() flushes only the default column family:

fn flush(&self) -> Result<(), Error> {
    self.db.flush().map_err(RocksDBError)
}

But the storage spreads its data across four CFs — default, aux, roots, meta — opened with set_atomic_flush(true). So flush() leaves aux/roots/meta sitting in volatile memtables, and any caller that relies on it for durability before a non-clean shutdown (SIGTERM/SIGKILL, where the Drop-time close never runs) silently loses recently-committed data.

Under atomic_flush, RocksDB expects every CF to be flushed together anyway, so today's single-CF flush gives no durability guarantee for the storage as a whole: the default CF reaches disk while aux, roots, and meta do not.

Fix

Flush all four column families atomically via flush_cfs_opt, consistent with set_atomic_flush(true).
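To make the difference concrete, here is a minimal toy model of the two flush behaviours. It is not RocksDB and not the code in this PR: `ToyStore` and all of its methods are hypothetical names, and the model deliberately ignores details such as the write-ahead log. It only assumes, as the description above does, that anything still in a memtable is lost on an unclean stop.

```rust
use std::collections::HashMap;

/// The four column families named in the PR description.
const CFS: [&str; 4] = ["default", "aux", "roots", "meta"];

/// Toy stand-in for a multi-CF store (hypothetical, NOT RocksDB).
#[derive(Default)]
struct ToyStore {
    /// Volatile writes per CF; lost on an unclean stop.
    memtables: HashMap<&'static str, HashMap<String, String>>,
    /// Durable data per CF; survives a crash.
    sst: HashMap<&'static str, HashMap<String, String>>,
}

impl ToyStore {
    /// Writes land in the CF's memtable first.
    fn put(&mut self, cf: &'static str, key: &str, value: &str) {
        self.memtables
            .entry(cf)
            .or_default()
            .insert(key.to_string(), value.to_string());
    }

    /// Moves one CF's memtable into durable storage.
    fn flush_cf(&mut self, cf: &'static str) {
        if let Some(mem) = self.memtables.remove(cf) {
            self.sst.entry(cf).or_default().extend(mem);
        }
    }

    /// Models the old behaviour: only the default CF is persisted.
    fn flush_default_only(&mut self) {
        self.flush_cf("default");
    }

    /// Models the new behaviour: every CF is persisted together.
    fn flush_all(&mut self) {
        for cf in CFS {
            self.flush_cf(cf);
        }
    }

    /// Simulates an unclean stop: everything still in a memtable is gone.
    fn crash(&mut self) {
        self.memtables.clear();
    }

    /// Reads a key, checking the memtable first and then durable storage.
    fn get(&self, cf: &str, key: &str) -> Option<&str> {
        self.memtables
            .get(cf)
            .and_then(|m| m.get(key))
            .or_else(|| self.sst.get(cf).and_then(|m| m.get(key)))
            .map(String::as_str)
    }
}
```

In this model, writing one key to each CF, calling `flush_default_only()`, and then `crash()` leaves only the default CF's key readable, while the same sequence with `flush_all()` keeps all four.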

Test

Adds flush_persists_all_column_families: writes into all four CFs, calls flush(), and asserts each CF's active memtable is empty (data moved to SST). It fails on the old flush() (column family 'aux' was not flushed: 1 entries still in the memtable) and passes with this change.
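The shape of that assertion can be sketched without RocksDB. The helper below is hypothetical (`check_all_flushed` is not a function in this PR); it takes per-CF active-memtable entry counts as plain numbers, whereas the real test reads them from RocksDB, and it reports the first CF that still holds entries using the failure message quoted above.

```rust
/// Hypothetical helper: given (column family name, active memtable entry count)
/// pairs, returns an error naming the first CF whose memtable is not empty.
/// In the real test the counts come from RocksDB; here they are plain inputs.
fn check_all_flushed(active_entries: &[(&str, u64)]) -> Result<(), String> {
    for (cf, n) in active_entries {
        if *n != 0 {
            return Err(format!(
                "column family '{cf}' was not flushed: {n} entries still in the memtable"
            ));
        }
    }
    Ok(())
}
```

Fed the counts expected after the old default-only flush (default at 0, the other three at 1), the helper fails on aux first, matching the message in the description; with all four counts at 0 it returns `Ok(())`.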

cargo test -p grovedb-storage --features rocksdb_storage flush_persists_all_column_families

Why this stayed latent

The single-CF flush has been here since #13 (2022), so it's worth noting why it hasn't bitten before. flush() only matters for durability when it's the last write before an unclean stop. Drive — grovedb's main consumer — calls it from Drop, where RocksDB's own teardown persists every CF anyway, and an unclean kill is backstopped by Tenderdash replaying the missing blocks. The consumer that surfaced this (Willow indexer node) is the first to lean on flush() as a real crash barrier — called on SIGTERM, with no consensus layer behind it to regenerate lost state — which is the one condition under which the partial flush actually drops committed data.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed a storage persistence issue where certain data types could remain in volatile memory instead of being properly written to disk during flush operations, improving overall data reliability.

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Contributor

Walkthrough

This PR fixes multi-column-family flush behavior in RocksDbStorage. The flush() method now calls flush_cfs_opt() to atomically persist all opened column families (default, aux, roots, meta) instead of only the default CF, ensuring consistency when atomic flush is enabled. A helper function and regression test are included.

Changes

Multi-CF Flush Atomicity

| Layer / File(s) | Summary |
| --- | --- |
| Multi-CF flush implementation and helper (storage/src/rocksdb_storage/storage.rs) | FlushOptions is imported from RocksDB. RocksDbStorage::flush() is refactored from db.flush() to db.flush_cfs_opt(...) to atomically flush the default, aux, roots, and meta column families together. A new cf_default() helper retrieves the default column family handle. |
| Flush atomicity regression test (storage/src/rocksdb_storage/storage.rs) | Test flush_persists_all_column_families populates all column families, invokes storage.flush(), and verifies that each CF's active memtable entry count (rocksdb.num-entries-active-mem-table) is zero. |

🎯 2 (Simple) | ⏱️ ~10 minutes

A rabbit hops through RocksDB's CF lanes,
Flushing all memtables, no more stray planes,
Atomicity reigns where columns once strayed,
Now aux and roots join the persisted parade! 🐰💾

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: fixing the flush() implementation to flush all column families instead of just the default one, which directly addresses the core bug in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
storage/src/rocksdb_storage/storage.rs (1)

751-782: 💤 Low value

Solid regression test. Correctly reproduces the old default-only flush gap by asserting rocksdb.num-entries-active-mem-table == 0 across all four CFs after flush().

One optional hardening: the test asserts memtables are empty but doesn't assert the data is readable post-flush. Asserting db.get_cf(...) returns the written values would also guard against a regression where a flush silently drops entries. Optional given the active-memtable check already proves the atomic flush ran.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@storage/src/rocksdb_storage/storage.rs` around lines 751 - 782, Add a
post-flush readback check in flush_persists_all_column_families so the test
verifies not only that rocksdb.num-entries-active-mem-table is 0 for each CF,
but also that db.get_cf on cf_default, cf_aux, cf_roots, and cf_meta returns the
values written before storage.flush(). This hardens the test against a
regression where flush empties memtables but data is not actually persisted or
becomes unreadable; use the existing helper CF accessors and TempStorage::flush
to locate the spot.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1bc53e4d-be9a-486c-bb38-c4aaaf368c91

📥 Commits

Reviewing files that changed from the base of the PR and between a18f792 and a1592e7.

📒 Files selected for processing (1)
  • storage/src/rocksdb_storage/storage.rs

@pauldelucia pauldelucia force-pushed the fix/flush-all-column-families branch from a1592e7 to 71734ac on June 3, 2026 17:17
RocksDbStorage opens four column families (default, aux, roots, meta) with set_atomic_flush(true), but flush() called db.flush(), which only flushes the default column family. The roots/meta/aux memtables were left unpersisted, so a caller relying on flush() for durability before a non-clean shutdown -- a SIGTERM/SIGKILL where the Drop-time close never runs -- silently lost recently committed data.

Flush all four column families together via flush_cfs_opt, consistent with set_atomic_flush(true).

Adds a regression test (flush_persists_all_column_families) asserting every CF's active memtable is empty after flush(); it fails on the old default-only flush() and passes with this change.
@codecov

codecov Bot commented Jun 3, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.44%. Comparing base (a18f792) to head (49ff924).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #755   +/-   ##
========================================
  Coverage    91.44%   91.44%           
========================================
  Files          237      237           
  Lines        67298    67333   +35     
========================================
+ Hits         61540    61575   +35     
  Misses        5758     5758           
| Components | Coverage Δ |
| --- | --- |
| grovedb-core | 88.97% <ø> (ø) |
| merk | 92.26% <ø> (ø) |
| storage | 86.52% <100.00%> (+0.31%) ⬆️ |
| commitment-tree | 96.03% <ø> (ø) |
| mmr | 96.79% <ø> (ø) |
| bulk-append-tree | 89.82% <ø> (ø) |
| element | 97.38% <ø> (ø) |

@pauldelucia pauldelucia force-pushed the fix/flush-all-column-families branch from 71734ac to 49ff924 on June 3, 2026 17:28