feat: snapshot copy of clarity side tables#7307
Conversation
There was a problem hiding this comment.
Pull request overview
Adds support for snapshotting the Clarity MARF’s side-storage into the offline “squashed” snapshot flow, so squashed Clarity DBs remain readable via the normal runtime path (MarfedKV) while only retaining side-table rows still reachable from the squashed trie.
Changes:
- Add
copy_clarity_side_tables()to schema-clone and copydata_table(leaf value-hash referenced) andmetadata_table(contracts still committed in trie) into the squashed Clarity MARF DB. - Add
SqliteConnection::{make_metadata_key, parse_metadata_key}helpers to centralize theclr-meta::...metadata key format handling. - Add Clarity snapshot tests covering pruning,
::in metadata keys, exclusion of uncommitted contracts, mismatched source DB behavior, and a schema drift guard.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| stackslib/src/clarity_vm/database/marf.rs | Minor import cleanup (rusqlite::{self, ...} → rusqlite::...). |
| stackslib/src/chainstate/stacks/db/snapshot/tests/mod.rs | Wires in new Clarity snapshot test module. |
| stackslib/src/chainstate/stacks/db/snapshot/tests/clarity.rs | Adds end-to-end tests validating copied Clarity side tables work via MarfedKV and guard against drift/mismatch. |
| stackslib/src/chainstate/stacks/db/snapshot/mod.rs | Adds clarity snapshot module and re-exports copy API + stats. |
| stackslib/src/chainstate/stacks/db/snapshot/clarity.rs | Implements side-table copy logic for Clarity (data_table, metadata_table) based on squashed trie reachability. |
| clarity/src/vm/database/sqlite.rs | Adds metadata key build/parse helpers and routes metadata operations through them. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Coverage Report for CI Build 27406271778Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage increased (+0.2%) to 85.937%Details
Uncovered Changes
Coverage Regressions7630 previously-covered lines in 122 files lost coverage.
Coverage Stats
💛 - Coveralls |
… helpers, open contract probes read-only
cylewitruk-stacks
left a comment
There was a problem hiding this comment.
Still looking through the new files, but dropping the comments I have so far ^^
| let mut stmt = conn.prepare("SELECT key, blockhash, value FROM metadata_table")?; | ||
| let mut rows = stmt.query(NO_PARAMS)?; | ||
| while let Some(row) = rows.next()? { | ||
| let key: String = row.get(0)?; |
There was a problem hiding this comment.
For all of these visitor methods, since you're passing references to the visitor closures anyway, I believe you should be able to use get_ref(n)?.as_str()? to get a reference into SQLite-owned memory instead of forcing an allocation/copy into String; then you can clone or do whatever at the callsite if it needs ownership there.
| use crate::util_lib::db::sqlite_open; | ||
|
|
||
| /// Clarity side-storage tables copied by [`copy_clarity_side_tables`]. | ||
| pub(super) const CLARITY_SIDE_TABLES: &[&str] = &["data_table", "metadata_table"]; |
There was a problem hiding this comment.
I'd probably move this into clarity's sqlite.rs as well, or expose constants there like DATA_TABLE_NAME and METADATA_TABLE_NAME and compose them into the array slice here -- just to keep the table names with the owning code.
| @@ -130,6 +130,20 @@ pub fn sqlite_get_metadata_manual( | |||
| } | |||
|
|
|||
| impl SqliteConnection { | |||
There was a problem hiding this comment.
Note for future (not specific to this PR): since this is used externally to the clarity crate, it'd probably be prudent to rename the type to ClaritySqliteConnection or similar -- or move SqliteConnection to a common place and implement these functions as a trait.
| value: &str, | ||
| ) -> Result<(), VmExecutionError> { | ||
| let key = format!("clr-meta::{contract_hash}::{key}"); | ||
| let key = Self::make_metadata_key(contract_hash, key); |
There was a problem hiding this comment.
This isn't actually a hash, so maybe we should rename this parameter while we're here?
| pub fn insert_metadata_row( | ||
| conn: &Connection, | ||
| key: &str, | ||
| blockhash: &str, |
There was a problem hiding this comment.
This screams "footgun" at me. Yes, the database column is called blockhash, but it's actually a block id. In the other places, the correctness is ensured because it's typed as StacksBlockId (as opposed to BlockHeaderHash), but here it's just a string. Maybe we should at least rename the parameter?
| /// Visit every `metadata_table` row on `conn` as `(key, blockhash, value)`. | ||
| pub fn visit_metadata_rows<F>(conn: &Connection, mut visit: F) -> Result<(), rusqlite::Error> | ||
| where | ||
| F: FnMut(&str, &str, &str) -> Result<(), rusqlite::Error>, |
There was a problem hiding this comment.
Nit: I'm not a fan of functions that just take a bunch of string arguments, because the type system isn't going to stop you if mess up the order.
Right now this problem is somewhat limited, but it still feels dangerous.
(Similar for the other functions here)
| key: &str, | ||
| ) -> Result<Option<String>, VmExecutionError> { | ||
| let key = format!("clr-meta::{contract_hash}::{key}"); | ||
| let key = Self::make_metadata_key(contract_hash, key); |
There was a problem hiding this comment.
same point as above re: this not being a hash
| } | ||
|
|
||
| /// The distinct contract ids appearing in `metadata_table` keys on `conn`. | ||
| /// Scanned in key order so the result is deterministic across runs. |
There was a problem hiding this comment.
This guaranteed ordering is very implicit. In particular the function names visit_metadata_keys and scan_metadata_contract_ids don't actually make it obvious that it's guaranteed to be in order.
Maybe we should have a test that makes sure this invariant is uphelp?
| SqliteConnection::visit_metadata_rows(src_conn, |key, blockhash, value| { | ||
| scanned += 1; | ||
| let Some((contract_id, _meta_key)) = SqliteConnection::parse_metadata_key(key) else { | ||
| return Ok(()); |
There was a problem hiding this comment.
Can this ever happen? As far as I can tell all keys in that table are clr-meta::, so if we find something else, that feels error-worthy? Especially because it indicates that there's some data in the source that we will not be copying.
| let Some((contract_id, _meta_key)) = SqliteConnection::parse_metadata_key(key) else { | ||
| return Ok(()); | ||
| }; | ||
| if !required.contains(contract_id) { |
There was a problem hiding this comment.
Pardon the ignorance -- why would this ever happen?
Description
Follow-up to #7254: adds the Clarity MARF's side storage to the offline snapshot copy.
copy_clarity_side_tablesclones thedata_table/metadata_tableschemas from the source and copies only what the squashed trie still references:data_tablerows by leaf value hash,metadata_tablerows by contracts still committed in the trie. Metadata key parsing moves behind newSqliteConnection::{make_metadata_key, parse_metadata_key}helpers so the clr-meta:: format stays owned by the Clarity store.Tests read the copied data back through MarfedKV (real runtime path), cover stale-value pruning, ::-containing metadata keys, exclusion of uncommitted contracts, and a schema drift guard.
Applicable issues
Additional info (benefits, drawbacks, caveats)
Checklist
docs/property-testing.md)changelog.d/README.md)rpc/openapi.yamlfor RPC endpoints,event-dispatcher.mdfor new events)clarity-benchmarkingrepo