
feat: snapshot copy of clarity side tables#7307

Open
francesco-stacks wants to merge 8 commits into stacks-network:develop from francesco-stacks:feat/marf-snapshot-clarity-v2

Conversation

@francesco-stacks

Contributor

Description

Follow-up to #7254: adds the Clarity MARF's side storage to the offline snapshot copy. copy_clarity_side_tables clones the data_table/metadata_table schemas from the source and copies only what the squashed trie still references: data_table rows by leaf value hash, metadata_table rows by contracts still committed in the trie. Metadata key parsing moves behind new SqliteConnection::{make_metadata_key, parse_metadata_key} helpers so the clr-meta:: format stays owned by the Clarity store.
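A minimal sketch of the key round-trip these helpers centralize. It assumes the clr-meta::{contract}::{key} layout from the format! calls in the review diffs, and it assumes contract ids never contain ::. Under that assumption, splitting on the first :: after the prefix recovers the contract id and leaves any :: inside the metadata key intact. The free functions below are hypothetical stand-ins; the PR's actual SqliteConnection::{make_metadata_key, parse_metadata_key} signatures may differ.

```rust
/// Prefix shared by every Clarity metadata key (from the
/// `format!("clr-meta::{contract_hash}::{key}")` calls in the diff).
const METADATA_PREFIX: &str = "clr-meta::";

/// Hypothetical stand-in for `SqliteConnection::make_metadata_key`.
pub fn make_metadata_key(contract_id: &str, key: &str) -> String {
    format!("{}{}::{}", METADATA_PREFIX, contract_id, key)
}

/// Hypothetical stand-in for `SqliteConnection::parse_metadata_key`.
/// Splits on the FIRST `::` after the prefix, so a metadata key that itself
/// contains `::` survives intact. Assumes contract ids never contain `::`.
/// Returns `None` for keys outside the `clr-meta::` layout.
pub fn parse_metadata_key(full_key: &str) -> Option<(&str, &str)> {
    let rest = full_key.strip_prefix(METADATA_PREFIX)?;
    rest.split_once("::")
}
```

The parse returns borrowed slices of its input, so scanning rows with it does not allocate per key.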

Tests read the copied data back through MarfedKV (real runtime path), cover stale-value pruning, ::-containing metadata keys, exclusion of uncommitted contracts, and a schema drift guard.

Applicable issues

  • fixes #

Additional info (benefits, drawbacks, caveats)

Checklist

  • Test coverage for new or modified code paths
  • For new Clarity features or consensus changes, add property tests (see docs/property-testing.md)
  • Changelog fragment(s) or "no changelog" label added (see changelog.d/README.md)
  • Required documentation changes (e.g., rpc/openapi.yaml for RPC endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo

Copilot AI left a comment


Pull request overview

Adds support for snapshotting the Clarity MARF’s side-storage into the offline “squashed” snapshot flow, so squashed Clarity DBs remain readable via the normal runtime path (MarfedKV) while only retaining side-table rows still reachable from the squashed trie.

Changes:

  • Add copy_clarity_side_tables() to schema-clone and copy data_table (leaf value-hash referenced) and metadata_table (contracts still committed in trie) into the squashed Clarity MARF DB.
  • Add SqliteConnection::{make_metadata_key, parse_metadata_key} helpers to centralize the clr-meta::... metadata key format handling.
  • Add Clarity snapshot tests covering pruning, :: in metadata keys, exclusion of uncommitted contracts, mismatched source DB behavior, and a schema drift guard.
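The two copy rules in the list above can be sketched as plain in-memory filters. This is an illustration only: the DataRow and MetadataRow structs, the function names, and the assumption that the referenced leaf hashes and committed contract ids have already been collected into sets are all hypothetical; the PR reads and writes these rows through SQLite.

```rust
use std::collections::HashSet;

/// Hypothetical `data_table` row: `hash` is the value hash a trie leaf points at.
pub struct DataRow {
    pub hash: String,
    pub value: String,
}

/// Hypothetical `metadata_table` row; `key` uses the `clr-meta::{contract}::{key}` layout.
pub struct MetadataRow {
    pub key: String,
    pub block_id: String,
    pub value: String,
}

/// Keep only data rows whose hash is still referenced by a leaf of the squashed trie.
pub fn retain_referenced<'a>(rows: &'a [DataRow], leaf_hashes: &HashSet<String>) -> Vec<&'a DataRow> {
    rows.iter().filter(|r| leaf_hashes.contains(&r.hash)).collect()
}

/// Keep only metadata rows whose contract is still committed in the squashed trie.
/// Keys that do not parse as `clr-meta::{contract}::...` are dropped.
pub fn retain_committed<'a>(rows: &'a [MetadataRow], committed: &HashSet<String>) -> Vec<&'a MetadataRow> {
    rows.iter()
        .filter(|r| {
            r.key
                .strip_prefix("clr-meta::")
                .and_then(|rest| rest.split_once("::"))
                .map_or(false, |(contract_id, _)| committed.contains(contract_id))
        })
        .collect()
}
```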

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

File | Description
--- | ---
stackslib/src/clarity_vm/database/marf.rs | Minor import cleanup (rusqlite::{self, ...} → rusqlite::...).
stackslib/src/chainstate/stacks/db/snapshot/tests/mod.rs | Wires in new Clarity snapshot test module.
stackslib/src/chainstate/stacks/db/snapshot/tests/clarity.rs | Adds end-to-end tests validating copied Clarity side tables work via MarfedKV and guard against drift/mismatch.
stackslib/src/chainstate/stacks/db/snapshot/mod.rs | Adds clarity snapshot module and re-exports copy API + stats.
stackslib/src/chainstate/stacks/db/snapshot/clarity.rs | Implements side-table copy logic for Clarity (data_table, metadata_table) based on squashed trie reachability.
clarity/src/vm/database/sqlite.rs | Adds metadata key build/parse helpers and routes metadata operations through them.


Comment thread stackslib/src/chainstate/stacks/db/snapshot/clarity.rs Outdated
@coveralls

coveralls commented Jun 11, 2026


Coverage Report for CI Build 27406271778

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR.

Coverage increased (+0.2%) to 85.937%

Details

  • Coverage increased (+0.2%) from the base build.
  • Patch coverage: 5 uncovered changes across 1 file (156 of 161 lines covered, 96.89%).
  • 7630 coverage regressions across 122 files.

Uncovered Changes

File | Changed | Covered | %
--- | --- | --- | ---
stackslib/src/chainstate/stacks/db/snapshot/clarity.rs | 116 | 111 | 95.69%
Total (2 files) | 161 | 156 | 96.89%

Coverage Regressions

7630 previously-covered lines in 122 files lost coverage.

Top 10 Files by Coverage Loss | Lines Losing Coverage | Coverage
--- | --- | ---
stackslib/src/chainstate/nakamoto/mod.rs | 403 | 84.62%
stackslib/src/config/mod.rs | 376 | 68.96%
stackslib/src/net/mod.rs | 310 | 78.12%
stackslib/src/chainstate/stacks/index/storage.rs | 277 | 82.41%
clarity/src/vm/database/clarity_db.rs | 268 | 82.11%
stackslib/src/chainstate/stacks/miner.rs | 253 | 83.4%
stackslib/src/chainstate/stacks/db/transactions.rs | 239 | 97.15%
stackslib/src/net/inv/epoch2x.rs | 222 | 79.44%
stackslib/src/net/chat.rs | 200 | 93.03%
stackslib/src/chainstate/stacks/db/mod.rs | 198 | 86.23%

Coverage Stats

Relevant Lines: 225273
Covered Lines: 193593
Line Coverage: 85.94%
Coverage Strength: 18730752.35 hits per line

💛 - Coveralls

Comment thread stackslib/src/chainstate/stacks/db/snapshot/clarity.rs Outdated
Comment thread clarity/src/vm/database/sqlite.rs Outdated

@cylewitruk-stacks left a comment

Contributor

Still looking through the new files, but dropping the comments I have so far ^^

let mut stmt = conn.prepare("SELECT key, blockhash, value FROM metadata_table")?;
let mut rows = stmt.query(NO_PARAMS)?;
while let Some(row) = rows.next()? {
let key: String = row.get(0)?;

Contributor

For all of these visitor methods, since you're passing references to the visitor closures anyway, I believe you should be able to use get_ref(n)?.as_str()? to get a reference into SQLite-owned memory instead of forcing an allocation/copy into String; then you can clone or do whatever at the callsite if it needs ownership there.

use crate::util_lib::db::sqlite_open;

/// Clarity side-storage tables copied by [`copy_clarity_side_tables`].
pub(super) const CLARITY_SIDE_TABLES: &[&str] = &["data_table", "metadata_table"];

Contributor

I'd probably move this into clarity's sqlite.rs as well, or expose constants there like DATA_TABLE_NAME and METADATA_TABLE_NAME and compose them into the array slice here -- just to keep the table names with the owning code.
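The suggestion amounts to something like the following. The constant names DATA_TABLE_NAME and METADATA_TABLE_NAME come from the comment; their placement, visibility, and showing both halves in one snippet are assumptions for illustration.

```rust
// Would live in clarity/src/vm/database/sqlite.rs, next to the code that
// creates and queries these tables (hypothetical placement).
pub const DATA_TABLE_NAME: &str = "data_table";
pub const METADATA_TABLE_NAME: &str = "metadata_table";

// Would stay in the snapshot module (as pub(super)), composed from the
// constants imported from the owning crate instead of repeating the literals.
pub const CLARITY_SIDE_TABLES: &[&str] = &[DATA_TABLE_NAME, METADATA_TABLE_NAME];
```

A slice built from consts is still a compile-time constant, so existing uses of CLARITY_SIDE_TABLES would not need to change.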

@@ -130,6 +130,20 @@ pub fn sqlite_get_metadata_manual(
}

impl SqliteConnection {

Contributor

Note for future (not specific to this PR): since this is used externally to the clarity crate, it'd probably be prudent to rename the type to ClaritySqliteConnection or similar -- or move SqliteConnection to a common place and implement these functions as a trait.

value: &str,
) -> Result<(), VmExecutionError> {
- let key = format!("clr-meta::{contract_hash}::{key}");
+ let key = Self::make_metadata_key(contract_hash, key);

Contributor

This isn't actually a hash, so maybe we should rename this parameter while we're here?

pub fn insert_metadata_row(
conn: &Connection,
key: &str,
blockhash: &str,

Contributor

This screams "footgun" at me. Yes, the database column is called blockhash, but it's actually a block id. In the other places, the correctness is ensured because it's typed as StacksBlockId (as opposed to BlockHeaderHash), but here it's just a string. Maybe we should at least rename the parameter?
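One way to go beyond a rename, sketched with a hypothetical BlockIdHex newtype and an in-memory MetadataSink standing in for the SQLite connection: once the block-id position has its own type, passing a bare string (or accidentally the key) there no longer compiles. Neither type exists in the PR; real code could instead take a StacksBlockId and format it at the SQL boundary.

```rust
/// Hypothetical newtype for the text stored in `metadata_table.blockhash`,
/// which holds a block id (as `StacksBlockId` does), not a block header hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockIdHex<'a>(pub &'a str);

/// Hypothetical in-memory stand-in for the destination connection.
#[derive(Default)]
pub struct MetadataSink {
    pub rows: Vec<(String, String, String)>,
}

impl MetadataSink {
    /// The block id is typed, so a call like
    /// `sink.insert_metadata_row(key, "ab12", value)` is rejected by the compiler.
    pub fn insert_metadata_row<'a>(&mut self, key: &str, block_id: BlockIdHex<'a>, value: &str) {
        self.rows
            .push((key.to_owned(), block_id.0.to_owned(), value.to_owned()));
    }
}
```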

/// Visit every `metadata_table` row on `conn` as `(key, blockhash, value)`.
pub fn visit_metadata_rows<F>(conn: &Connection, mut visit: F) -> Result<(), rusqlite::Error>
where
F: FnMut(&str, &str, &str) -> Result<(), rusqlite::Error>,

Contributor

Nit: I'm not a fan of functions that just take a bunch of string arguments, because the type system isn't going to stop you if you mess up the order.

Right now this problem is somewhat limited, but it still feels dangerous.

(Similar for the other functions here)
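A sketch of the alternative for the visitor: hand the closure one struct with named fields instead of three positional &str arguments, so the closure body reads row.block_id rather than depending on parameter order. MetadataRowRef and the slice-of-tuples input are hypothetical; the real visitor iterates SQLite rows and returns rusqlite::Error.

```rust
/// Hypothetical borrowed view of one `metadata_table` row.
#[derive(Debug, Clone, Copy)]
pub struct MetadataRowRef<'a> {
    pub key: &'a str,
    pub block_id: &'a str,
    pub value: &'a str,
}

/// Hypothetical visitor over in-memory rows; fields are accessed by name in
/// the closure, so swapping two strings is no longer a silent mistake.
pub fn visit_metadata_rows<F, E>(rows: &[(String, String, String)], mut visit: F) -> Result<(), E>
where
    F: for<'r> FnMut(MetadataRowRef<'r>) -> Result<(), E>,
{
    for (key, block_id, value) in rows {
        visit(MetadataRowRef {
            key: key.as_str(),
            block_id: block_id.as_str(),
            value: value.as_str(),
        })?;
    }
    Ok(())
}
```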

key: &str,
) -> Result<Option<String>, VmExecutionError> {
- let key = format!("clr-meta::{contract_hash}::{key}");
+ let key = Self::make_metadata_key(contract_hash, key);

Contributor

same point as above re: this not being a hash

}

/// The distinct contract ids appearing in `metadata_table` keys on `conn`.
/// Scanned in key order so the result is deterministic across runs.

Contributor

This guaranteed ordering is very implicit. In particular the function names visit_metadata_keys and scan_metadata_contract_ids don't actually make it obvious that it's guaranteed to be in order.

Maybe we should have a test that makes sure this invariant is upheld?
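One way to make the ordering explicit rather than inherited from the scan: collect the ids into a BTreeSet, which both dedupes and sorts, and pin that with a test that feeds the same keys in different orders. This is a hypothetical variant of scan_metadata_contract_ids. Note that it sorts by contract id, which is not always the order a scan sorted by key produces: clr-meta::a-b::x sorts before clr-meta::a::x because '-' precedes ':', while the contract id a sorts before a-b.

```rust
use std::collections::BTreeSet;

/// Hypothetical variant: the result order is defined by `BTreeSet` (sorted by
/// contract id), not by whatever order the rows were scanned in.
/// Keys outside the `clr-meta::{contract}::{key}` layout are skipped.
pub fn scan_metadata_contract_ids<'a, I>(keys: I) -> Vec<String>
where
    I: IntoIterator<Item = &'a str>,
{
    let ids: BTreeSet<&'a str> = keys
        .into_iter()
        .filter_map(|key| key.strip_prefix("clr-meta::")?.split_once("::"))
        .map(|(contract_id, _meta_key)| contract_id)
        .collect();
    ids.into_iter().map(|id| id.to_owned()).collect()
}
```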

SqliteConnection::visit_metadata_rows(src_conn, |key, blockhash, value| {
scanned += 1;
let Some((contract_id, _meta_key)) = SqliteConnection::parse_metadata_key(key) else {
return Ok(());

Contributor

Can this ever happen? As far as I can tell all keys in that table are clr-meta::, so if we find something else, that feels error-worthy? Especially because it indicates that there's some data in the source that we will not be copying.
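If unexpected keys should be an error, the parse step could return a Result instead of silently skipping. A minimal sketch with a hypothetical UnexpectedMetadataKey error and contract_of helper; inside the PR's visitor the error would still need mapping into the closure's rusqlite::Error return type, which is omitted here.

```rust
use std::fmt;

/// Hypothetical error for a `metadata_table` key outside the `clr-meta::` layout.
#[derive(Debug, PartialEq)]
pub struct UnexpectedMetadataKey(pub String);

impl fmt::Display for UnexpectedMetadataKey {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "unexpected metadata_table key: {}", self.0)
    }
}

impl std::error::Error for UnexpectedMetadataKey {}

/// Return the contract id of a `clr-meta::{contract}::{key}` key, or an error
/// naming the offending key so it is not silently left out of the copy.
pub fn contract_of(key: &str) -> Result<&str, UnexpectedMetadataKey> {
    key.strip_prefix("clr-meta::")
        .and_then(|rest| rest.split_once("::"))
        .map(|(contract_id, _meta_key)| contract_id)
        .ok_or_else(|| UnexpectedMetadataKey(key.to_owned()))
}
```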

let Some((contract_id, _meta_key)) = SqliteConnection::parse_metadata_key(key) else {
return Ok(());
};
if !required.contains(contract_id) {

Contributor

Pardon the ignorance -- why would this ever happen?


Labels

None yet

Projects

None yet


5 participants