Skip to content

fix(celo-reth): correct import-celo-state + add celo-migrate-v2 (storage v1→v2)#198

Merged
seolaoh merged 13 commits into
mainfrom
jcortejoso/celo-migrate-v2
Jun 8, 2026
Merged

fix(celo-reth): correct import-celo-state + add celo-migrate-v2 (storage v1→v2)#198
seolaoh merged 13 commits into
mainfrom
jcortejoso/celo-migrate-v2

Conversation

@jcortejoso

Copy link
Copy Markdown
Member

Summary

This PR makes celo-reth's offline state-import path actually produce a working, reth-storage-v2 Celo Mainnet archive datadir. It contains two related pieces:

  1. Root-cause fixes for the corrupted import-celo-state output (the real cause behind Snapshot-restored op-reth diverges on CIP-64 execution despite identical pre-state #192).
  2. A new celo-reth celo-migrate-v2 subcommand that converts the imported v1 datadir to reth's storage-v2 layout without the upstream stage-reset that is incompatible with Celo's dummy pre-migration blocks.

Validated end-to-end on a real node (Hetzner fsn-1): import-celo-statecelo-migrate-v2op-reth + op-node synced cleanly from the migration block (31,056,500) to live tip (68M+). The resulting v2 datadir was additionally snapshot + clone-booted in isolation: reth's startup consistency check passes and it serves the correct head.

What was broken

import-celo-state opens the env via upstream EnvironmentArgs::init, which ends by writing the chainspec genesis alloc. Celo Mainnet's genesis alloc includes one storage slot on the Registry (0xce10, slot 0xb53127…). The L1 final-state dump we ingest writes a different value for that same slot, so both land in HashedStorages as a duplicate dup-entry and the state-root computation panics in alloy_trie HashBuilder::add_leaf (key == self.key). The datadir was unusable; the symptoms tracked in #192 were downstream of this.

Two more issues blocked the v1→v2 path:

  • The canonical L2 allocs contain zero-value storage slots; init_from_state_dump writes them verbatim, but op-geth's StateDB.Commit prunes SSTORE-to-zero slots, so the computed migration root didn't match canonical.
  • Upstream reth db migrate-v2 clears recomputable tables and resets 6 stage checkpoints to 0, expecting the pipeline to rebuild by re-walking blocks 1..tip. Celo's blocks 1..31M are header-only dummies (no bodies), so SenderRecovery et al. cannot rebuild → the node won't start.

Changes

Commit What
feat: add celo-migrate-v2 subcommand that preserves stages New Celo-aware v1→v2 migration: changeset/receipt migration + TransactionSenders seed + metadata flip, skipping the upstream table-clear/stage-reset.
fix: commit TransactionSenders seed per-segment + progress ensure_at_block(31M) in one shot grew the static-file writer's in-memory state past 18 GB and deadlocked the commit. Chunked 500k-block commits with progress/ETA.
fix: skip genesis alloc state writes in import-celo-state The #192 root-cause fix. New shared open_env_skip_init_and_consistency_check helper opens the env without init_genesis_with_settings; the import then emits only the non-state genesis side-effects (storage settings, canonical header, stage checkpoints, segment ranges).
fix(scripts): prune zero-value storage slots from L2 allocs Match op-geth StateDB.Commit SSTORE-to-zero semantics so the migration state root matches canonical.
fix: also bypass the consistency check in celo-migrate-v2 Reuse the shared helper so a freshly imported v1 datadir (SenderRecovery checkpoint at the migration block, empty TransactionSenders segment) doesn't trip the destructive Unwind(0) assertion before the seed step can fix it.
fix: write v1 storage settings (not base = v2) in import-celo-state StorageSettings::v1() explicitly (upstream base() now defaults to v2), so the import produces a coherent v1 datadir for celo-migrate-v2 to convert.
fix: guard celo-migrate-v2 to fresh imports; drop temp CI trigger; lint Review hardening (see below).

Validation

  • Computed import state root matches the canonical migration root 0xed980641….
  • db settings get: storage_v2: false after import → true after migrate.
  • Sample state queries (Registry/GoldToken/FeeHandler/Governance/FeeCurrencyDirectory) match forno.
  • op-reth + op-node (execution-layer syncmode, no --proofs-history) booted on the v2 datadir, consistency check passed, synced 31M → 68M.
  • v2 datadir snapshotted and clone-booted in isolation: Database opened, storage_v2: true, head = 68,157,296.
  • cargo test -p celo-reth: 131 pass. fmt --check + clippy -D warnings clean.

Reviewer notes / known limitations

A deep review flagged the following. The first is fixed in this PR; the rest are intentionally out of scope and noted for follow-up:

  • Fixed — misuse guard: celo-migrate-v2 now bails unless the Execution stage checkpoint equals the migration block. seed_transaction_senders writes a zero-entry senders segment up to tip, which is only correct when blocks 0..=tip have no transactions (the fresh post-import shape). The guard prevents running it on a synced datadir (which would silently corrupt the segment).
  • Upstream-copied helpers: migrate_account_changesets / migrate_storage_changesets / migrate_receipts / compact_mdbx / swap_compacted_db are byte-for-byte copies of upstream private fns (they can't be reached via a public API). They are pinned to reth rev 88505c7 and must be re-audited on every reth bump. The while let Some(Ok(..)) = walker.peek() changeset loops inherit upstream's behavior of breaking (rather than propagating) on a cursor error; acceptable here because the command is a one-shot offline tool gated to the known-good datadir shape, but worth tightening if these are ever generalized.
  • Partial-failure recovery: Phase 0 guards (already-v2, non-empty changeset segments) protect against double-appends but are not fully idempotent across a mid-run crash; recovery is "restore the datadir and re-run".
  • swap_compacted_db is not crash-atomic between the two renames (db → db_pre_compact, db_compact → db); a crash in that window leaves the DB at db_pre_compact. Acceptable for an offline maintenance command; recovery path is the backup dir.
  • Tests: unit tests cover CLI parsing. A higher-value follow-up is a fixture test asserting (a) import with a genesis/dump storage-slot conflict does not panic and produces the expected migration root, and (b) post-migrate storage_v2: true with the senders boundary at the migration block.

Relates to #192 (root cause + fix). The snapshot-subcommands work (#191) sits on top of a clean import produced by this PR.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07569efe61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/celo-reth/src/celo_migrate_v2.rs
Comment thread crates/celo-reth/src/state_import.rs
Comment thread crates/celo-reth/src/celo_migrate_v2.rs Outdated
@jcortejoso jcortejoso force-pushed the jcortejoso/celo-migrate-v2 branch from 07569ef to a18a516 Compare May 30, 2026 07:16

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e43509fa7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/celo-reth/src/celo_migrate_v2.rs
Comment thread crates/celo-reth/src/state_import.rs
Comment on lines +284 to +299
for block in first_block..=tip {
let mut entries = Vec::new();

while let Some(Ok((block_number, _))) = walker.peek() {
if *block_number != block {
break;
}
let (_, entry) = walker.next().expect("peeked")?;
entries.push(entry);
}

count += entries.len() as u64;
writer.append_account_changeset(entries, block)?;
}

writer.commit()?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a single commit for the whole block range acceptable here, or do we need to chunk the commits as in 4269c58?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied 6f30d23

Comment on lines +329 to +348
for block in first_block..=tip {
let mut entries = Vec::new();

while let Some(Ok((key, _))) = walker.peek() {
if key.block_number() != block {
break;
}
let (key, entry) = walker.next().expect("peeked")?;
entries.push(StorageBeforeTx {
address: key.address(),
key: entry.key,
value: entry.value,
});
}

count += entries.len() as u64;
writer.append_storage_changeset(entries, block)?;
}

writer.commit()?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above.

jcortejoso and others added 9 commits June 8, 2026 21:27
Upstream `reth db migrate-v2` clears the recomputable MDBX tables (senders,
indices, trie, plain state) and resets the SenderRecovery / TransactionLookup /
IndexAccountHistory / IndexStorageHistory / MerkleExecute / MerkleUnwind stage
checkpoints to 0, on the assumption that the pipeline can rebuild from block 0.

For a Celo datadir bootstrapped via `import-celo-state`, blocks 1..31_056_499
are header-only dummies with no bodies. After the upstream migrate-v2, the
pipeline fails on its first SenderRecovery iteration with "block meta not
found for block #1". Manually setting the cleared stages back to the
migration block then panics on a static-file <> database consistency check
because TransactionSenders is empty.

This commit adds `celo-reth celo-migrate-v2` as a sibling to the existing
`import-celo-state` interception. It runs the upstream changesets/receipts
migration + storage-settings flip + MDBX compaction unchanged, but skips
Phase 4 (clear_recomputable_tables) entirely: on a Celo-imported datadir
those tables are already in the correct post-migration state (empty for
the dummy range, populated at and beyond the migration header) and the
stages are already at the migration block.

The four upstream helper functions (migrate_account_changesets,
migrate_storage_changesets, migrate_receipts, compact_mdbx / swap_compacted_db)
are private associated functions on `reth_cli_commands::db::migrate_v2::Command`,
so they are copied verbatim into `celo_migrate_v2.rs` with a note to keep
them in sync when the reth pin is bumped.
A single `ensure_at_block(31_056_500)` call from an empty segment grew the
static-file writer's in-memory state past 18 GB and deadlocked the final
commit on a Celo Mainnet datadir, leaving the migration hung in the seed
step. The writer accumulates per-block segment metadata until the next
commit, and a single 31M-iteration call accumulated more than the writer
could flush in one shot.

Commit every 500 000 blocks (one segment file's worth) so the in-memory
metadata is bounded and freed between chunks. Also log progress per chunk
with elapsed + ETA so the operator has a usable signal during the multi-
hour seed phase.

Also: add this branch to the dev-image push workflow so CI builds an
image we can test on fsn-1 without going through PR review. To revert
before merging.
EnvironmentArgs::init ends with a call to init_genesis_with_settings, which
writes the chainspec's genesis allocation into PlainAccountState,
PlainStorageState, HashedAccounts, HashedStorages, AccountsHistory,
StoragesHistory and the trie tables. For Celo Mainnet that alloc includes
the Registry contract (0x000000000000000000000000000000000000ce10) with one
storage slot at key
0xb53127684a568b3173ae13b9f8a6016e243e63b6e8ee1178d6a717850b5d6103.

The L1 final state dump consumed by init_from_state_dump (via
scripts/append_l2_allocs.py) also writes that exact Registry slot, with a
different value. Both writes land in HashedStorages for the same
(keccak(0xce10), keccak(0xb53127…)) pair, and DbDupCursorRW::upsert on
HashedStorages appends a second dup-record instead of replacing. The
subsequent compute_state_root iterates HashedStorages by (outer_key,
subkey) and encounters the duplicate, panicking in
alloy_trie::hash_builder::HashBuilder::add_leaf with 'key == self.key' and
leaving the datadir unusable.

Replace the env_args.init() call with a local helper that performs the
same setup but emits only the non-state side-effects of init_genesis:

  * StorageSettings (so the provider knows the chosen layout — v1 here);
  * the canonical genesis header (so the Headers static-file segment has
    block 0, which setup_without_evm needs as a prerequisite for
    append_header(1, B256::ZERO));
  * stage checkpoints at the genesis block number (so setup_without_evm
    can advance them to the migration block);
  * empty block ranges for the Receipts / Transactions static-file
    segments (so writers that grow these later don't trip on missing
    block bounds).

It deliberately does NOT call insert_genesis_state, insert_genesis_hashes,
insert_genesis_history, or compute_state_root — those are the four steps
inside init_genesis_with_settings that touch the alloc and produce the
duplicate.

Reproduced empirically on fsn-1 with the full L1 dump
(https://fsn1.your-objectstorage.com/celo/l1-final-state.json.zst): the
unpatched import crashes during state-root computation after 26+ minutes
with the exact 'add_leaf key == self.key' panic on the Registry storage
slot above; with this patch the same import is expected to complete
cleanly. The HashedStorages discrepancy (360,672,585 entries vs
PlainStorageState 360,672,584) collapses to zero.

Refs #192.
…_l2_allocs

op-geth's StateDB.Commit deletes storage slots stored at value 0 (SSTORE
to 0 deletes the slot from the trie), whereas reth's init_from_state_dump
writes every entry from the input dump verbatim into HashedStorages. The
canonical L2 allocs file at storage.googleapis.com/cel2-rollup-files/celo/l2-allocs.json
contains four zero-value slots under the OP Stack implementation namespace
(0xc0d3...30007/0010/0012/0014); without filtering them out, the imported
datadir's computed state_root mismatches the canonical CEL2 migration root
0xed980641a4bd4d2e84c6c8db980b7f05e95733c92be2e0045db3735efeb1d807.

Add _filter_zero_storage and apply it to every L2 alloc immediately after
loading l2-allocs.json. This makes the merged JSONL feed init_from_state_dump
the same set of slots that op-geth would commit, so the post-import state
root verification passes.

Verified end-to-end on fsn-1 against the full Celo Mainnet L1 final dump:
import-celo-state on celo-kona-reth:sha-a881d63 + this fix produces a
datadir whose computed state_root equals the canonical migration root
exactly. HashedStorages and PlainStorageState end up with identical entry
counts (360,672,580).

Refs #192.
EnvironmentArgs::init runs a static-file/database consistency check in
create_provider_factory that asserts (assert_ne!(unwind_target,
PipelineTarget::Unwind(0), …)) when the only safe heal would be a full
unwind to block 0. A v1 datadir freshly produced by import-celo-state has
SenderRecovery's stage checkpoint at the migration block (31,056,500) but
an empty TransactionSenders static-file segment, so the check picks an
unwind-to-zero target and trips the assertion before
celo-migrate-v2::execute_with_factory's seed_transaction_senders step can
fix the inconsistency.

Extract the open_env_skip_genesis_state body into a shared helper
open_env_skip_init_and_consistency_check that skips BOTH
init_genesis_with_settings and the consistency check, leaving any post-open
initialization to the caller. Both Celo-only subcommands now use it:

* import-celo-state: still calls insert_genesis_header / writes
  StorageSettings / seeds stage checkpoints / sets static-file block ranges
  after the helper returns (the freshly created datadir has none of those
  set up yet). Behavior unchanged from the previous fix.

* celo-migrate-v2: switches from env_args.init() to the helper. The
  datadir is already initialized so there is nothing to write at open
  time; seed_transaction_senders later in the migration brings
  TransactionSenders into line with the SenderRecovery checkpoint so the
  resulting v2 datadir is consistent for any subsequent open.

Verified with cargo check + cargo test -p celo-reth --lib (all 131 unit
tests pass).

Refs #192.
…elo-state

open_env_skip_init_and_consistency_check was passing StorageSettings::base()
to write_storage_settings, but upstream reth's StorageSettings::base() returns
StorageSettings::v2() (default for new nodes). The intent of import-celo-state
is to produce a v1/legacy datadir — see the existing comment
"// v1 storage only — \`--storage.v2\` is not exposed by this subcommand" —
so that celo-migrate-v2 can later perform the v1 -> v2 migration cleanly.

With the v2 metadata mistakenly written, celo-migrate-v2's preflight short-
circuits on "Storage is already v2, nothing to do" while the underlying
tables still have the v1 layout (everything in MDBX, nothing in static files
or RocksDB). The datadir ends up inconsistent: v2 metadata + v1 data.

Write StorageSettings::v1() explicitly so the import produces a properly
v1-flagged datadir.

Refs #192.
…trigger; lint

Review hardening before opening the PR:

- celo-migrate-v2: bail unless the Execution stage checkpoint equals the
  migration block (CEL2_MIGRATION_BLOCK_NUMBER). seed_transaction_senders writes
  a zero-entry senders segment up to tip, which is only correct when blocks
  0..=tip contain no transactions (the fresh post-import shape). Running it on a
  synced datadir would silently corrupt the segment; the guard refuses that.

- Drop the temporary jcortejoso/celo-migrate-v2 branch trigger from
  push-container-dev.yaml (was an explicit 'drop before merging' dev-image hack).

- Lint: remove redundant args.chain.clone() in the shared env helper; annotate
  the required db_path owned-copy against a redundant_clone (nursery) false
  positive; apply nightly rustfmt.
…DBX tables

Address PR review:

- import-celo-state: hoist the empty-datadir guard above the bootstrap metadata
  writes. Previously v1 storage settings + stage-checkpoint resets were committed
  before the last_block_number != 0 check, so an accidental run on a populated
  datadir would clobber its metadata before erroring.

- celo-migrate-v2: replace the Phase 4 skip with a scoped clear of the MDBX tables
  the v2 flip makes dead (AccountChangeSets, StorageChangeSets now in static files;
  TransactionSenders in its segment; TransactionHashNumbers/AccountsHistory/
  StoragesHistory now read from RocksDB). Reclaims space the compaction otherwise
  copies. Plain state / hashed / trie are kept (still read from MDBX in v2) and
  stage checkpoints are NOT reset (Celo dummy blocks can't be re-walked).
write_storage_settings persists v1 metadata but does not update the
in-memory storage-settings cache. The cache is seeded by
ProviderFactory::new, which at the current reth pin defaults a fresh
datadir to v1, so it already matches -- set it explicitly so a future
reth that defaults the cache to v2 (or makes the state dump
cache-sensitive) cannot leave the cache diverging from the v1
metadata/import layout written here. Correct today, version-dependent.

Hardens the Codex r3328416058 thread on #198 across reth pin bumps.
@seolaoh seolaoh force-pushed the jcortejoso/celo-migrate-v2 branch from 6e43509 to 97661fe Compare June 8, 2026 13:03

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97661feba3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/celo-reth/src/celo_migrate_v2.rs
seolaoh added 3 commits June 8, 2026 22:15
proc-macro-error2 is a newly-flagged unmaintained advisory, pulled in at
build time via alloy-sol-macro (the sol! proc-macro); it never ships in
the runtime binary and has no safe upgrade (alloy-sol-macro must drop it
upstream). Ignore it, matching the existing unmaintained-advisory
entries, so `cargo deny check advisories` passes again.
migrate_account_changesets / migrate_storage_changesets advance the
static-file segment over Celo's ~31M dummy pre-migration blocks with a
single writer.commit(), accumulating per-block segment metadata until
flush -- the same pattern that OOM'd seed_transaction_senders at 18 GB
on a real Celo Mainnet datadir. Commit every 500k blocks (segment
aligned), mirroring the senders fix, so Phase 1 can't OOM before
reaching the (already chunked) sender seeding.

These two helpers now deviate from their upstream verbatim copies; the
in-file note flags re-applying the chunking after a reth pin bump.

Addresses the karlb and Codex (r3373429686) review threads on #198.
celo-protocol declares kona-derive and kona-proof but uses neither (not
in its src, features, or re-exports), so `cargo udeps` flags them as
unused. Remove the dead deps.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 20ad1d275a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/celo-reth/src/state_import.rs Outdated
The emptiness guard only checked `last_block_number() != 0`, which is 0
for BOTH a fresh datadir and a genesis-only / partially-initialized one
(after a stock `init`, or an aborted import that already committed the
bootstrap block). Importing onto existing genesis state mixes it with
the L1 dump and fails the state-root check after mutating the datadir.
Also reject when a genesis header is already present (we insert it
ourselves below), so only a truly uninitialized datadir is accepted.

Addresses Codex r3373644300 on #198.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e3cda608b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/celo-reth/src/celo_migrate_v2.rs
Comment thread crates/celo-reth/src/state_import.rs
@seolaoh seolaoh merged commit 7e3bfbe into main Jun 8, 2026
16 checks passed
@seolaoh seolaoh deleted the jcortejoso/celo-migrate-v2 branch June 8, 2026 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants