celo-reth import-celo-state storage_v2#186
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eb0001be6b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| StaticFileSegment::AccountChangeSets, | ||
| StaticFileSegment::StorageChangeSets, | ||
| ] { | ||
| let mut writer = static_file_provider.latest_writer(segment)?; |
There was a problem hiding this comment.
Could we use writer.ensure_at_block(CEL2_MIGRATION_BLOCK_NUMBER) here instead of manually implementing the loop?
/// Ensures that the writer is positioned at the specified block number.
///
/// If the writer is positioned at a greater block number than the specified one, the writer
/// will NOT be unwound and the error will be returned.
pub fn ensure_at_block(&mut self, advance_to: BlockNumber) -> ProviderResult<()> {
let current_block = if let Some(current_block_number) = self.current_block_number() {
current_block_number
} else {
self.increment_block(0)?;
0
};
match current_block.cmp(&advance_to) {
Ordering::Less => {
for block in current_block + 1..=advance_to {
self.increment_block(block)?;
}
}
Ordering::Equal => {}
Ordering::Greater => {
return Err(ProviderError::UnexpectedStaticFileBlockNumber(
self.writer.user_header().segment(),
current_block,
advance_to,
));
}
}
Ok(())
}There was a problem hiding this comment.
Yep, looks like a better option.
|
Were you able to run this? I run into this error when testing: |
|
Heads up on the empty changesets: history stays empty after import, because IndexAccountHistory builds AccountsHistory entirely from changeset contents. So on an archive node (no |
Hey @palango, by testing do you mean running a full import or something else? I ran the state import with the first 100K lines of the full state dump. |
Exactly. Indenpently, a challenger tried this out and ran into the same issue. |
Hey @palango, I think we need to add this since as you say queries for any account untouched since the import will not return anything and eth_getProof does accept the block number as input. So the safe approach seems to be add a history entry at the migration block for every account. |
Ok I will try with a the full dump. Do you recall how long it took to fail? |
Alternatively, we could also just leave the migration as a v1 database and then run the v2-migration on top. But we need to check if this circumvents all of these issues. |
It happened in the final step of validating the state root. So probably 45-60mins. |
|
See also #192 (comment) |
Replace upstream init_from_state_dump with a local copy that uses single-transaction state root computation. The upstream chunked variant replays the boundary storage key when resuming across MDBX transaction commits, causing a panic in HashBuilder::add_leaf for accounts with large storage tries. Also adds v2 storage support: changeset static file segments are advanced through the dummy chain via ensure_at_block, and a v2 storage check rejects v1 datadirs.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4cdb017d36
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let mut acct_cs_cursor = tx.cursor_dup_write::<tables::AccountChangeSets>()?; | ||
| acct_cs_cursor.append_dup(block, AccountBeforeTx { address: *address, info: None })?; | ||
|
|
||
| tx.put::<tables::AccountsHistory>(ShardedKey::new(*address, u64::MAX), history_list.clone())?; |
There was a problem hiding this comment.
Write v2 history/change sets to their routed backends
When this command runs it now requires storage_v2, but these writes still bypass the v2 routing and put account changesets/history directly into MDBX. In storage v2, account changesets are read from static files and account history from RocksDB, so an imported datadir will have empty v2 backends for the migration state even though MDBX contains rows that normal v2 readers ignore; the same direct-write pattern is repeated for storage changesets/history below. This breaks historical lookups and any operation that needs the migration-block change/history data after import, so the import path should use the provider/EitherWriter routing or explicitly populate the v2 static/RocksDB destinations.
Useful? React with 👍 / 👎.
| StateRootProgress::Progress(state, _, updates) => { | ||
| let updated_len = provider.write_trie_updates(updates)?; | ||
| total_flushed_updates += updated_len; |
There was a problem hiding this comment.
Commit trie updates between progress batches
For a full Celo migration dump, root_with_progress() can return many Progress batches, but each batch is written into the same MDBX write transaction here and the transaction is not committed until the whole root finishes. MDBX retains dirty pages for the open transaction, so this can exhaust memory or the DB map during the import; the previous chunked path existed to release those pages between batches. Keep the boundary-key workaround, but still commit/reopen at a safe progress boundary or threshold before importing mainnet-sized state.
Useful? React with 👍 / 👎.
Switches support from storage_v1 to storage_v2 in the import-celo-state command.
The updated upstream reth (to v2.2.0) now skips writing to static files for the import and writes everything to the MDBX database.
This bypasses the previous problem we had (solved with a fork of upstream reth in celo-org/optimism#437) with writing static files where each write incremented the block and writes were batched so that we couldn't write the whole state in one go leading to mismatching block numbers.
Tested with a minimal import file, yet to test over the full import.