⚠️ Root cause superseded: see Update 5. The divergence is intermittent / concurrency-dependent (not deterministic on one "specific" tx) and is a reth in-memory state read-consistency bug, not snapshot/import-specific. Updates 1 & 3 below are retracted; the reproduction details remain valid.
Snapshot-restored op-reth diverges on CIP-64 execution despite identical pre-state
Summary
When celo-kona-reth is started from a snapshot of a previously-running canonical archive node and brought up to chain head via op-node, transaction execution intermittently diverges from the canonical chain on some CIP-64 blocks (see Update 5), even though the pre-state (the snapshot's state root) is byte-for-byte identical to the canonical state at the same block.
Two celo-kona-reth image versions both exhibit the bug, with different symptoms:
sha-04c2a3c (pre-#184): "fee currency not registered: 0x0E2A3e05bc9A16F5292A6170456A710cb89C6f72". Block is dropped; chain stalls.
sha-bc50e90 (includes #184): No outright rejection. State root computed locally differs from the block header's state root → block added to fork chain. Chain forks silently. Different runs produce different fork-block hashes.
This makes snapshot-based bootstrap (the workflow PR #191 enables) unusable for production without further fix — the snapshot mechanically reads & extracts correctly, op-reth boots, but the resulting node cannot follow canonical chain.
Reproduction
Producer side (one-time, already done)
1. Running celo-kona-reth sha-04c2a3c on /var/lib/op-reth (ZFS dataset tank/may18-fresh), in archive mode, with --proofs-history enabled.
2. podman stop -t 60 op-reth (clean shutdown, exit 0 in 4.3 s).
3. zfs snapshot tank/may18-fresh@export-2026-05-24 (captured at L2 block 67,728,745).
4. podman start op-reth (resumes normally).
5. zfs clone -o readonly=on tank/may18-fresh@export-2026-05-24 tank/snapshot-src for tooling.
6. Ran celo-reth snapshot-manifest (from PR #191, "feat(celo-reth): add download and snapshot-manifest subcommands", image sha-bc50e90) against the clone. This produced manifest.json + 818 component archives (354 GiB), uploaded to Hetzner Object Storage at https://fsn1.your-objectstorage.com/celo/snapshots/.
Consumer side (the bug)
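1. On a clean dataset, run celo-reth download --datadir /celo/data --chain celo --non-interactive --archive (image sha-bc50e90).
2. Completes successfully: emits "Snapshot download complete. Run celo-reth node to start syncing." Resulting datadir: db/ (100 GiB), static_files/ (487 GiB), rocksdb/ (47 GiB).
3. chown -R 10001:10001 <datadir> (download command creates files as root; op-reth runs as celo UID 10001).
4. Start celo-kona-reth node against this datadir (image sha-04c2a3c initially, the production image).
5. Start op-node (image celo-blockchain-public/op-node:celo-v2.2.1).
With sha-04c2a3c, the payload is rejected with the "fee currency not registered" error. After this rejection, every subsequent payload from op-node is rejected as "links to previously rejected block". Chain stalls indefinitely.
With sha-bc50e90 (includes PR #184)
Same setup, just swap the image. The "fee currency not registered" symptom goes away, but state computation is still wrong: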
WARN Changeset cache MISS, falling back to DB-based computation
block_hash: 0x88cacb50d7e5479317e95c582b6ccf35be32cf511e59be5d9818b1cc3c8b7a72
block_number: 67817511
INFO State root task finished
state_root: 0x4df15fd214e8cf7edc0207af2d13d482a47e673befa4bd0b738322f802537a15
elapsed: 409.559µs
WARN State root task returned incorrect state root
state_root: 0x4df15fd214e8cf7edc0207af2d13d482a47e673befa4bd0b738322f802537a15 ← computed by op-reth
block_state_root: 0xa1477420629fdbec1cf2c82a810a8f49d51292709b50864268d7f56e26616ec1 ← what the canonical block header says
WARN Failed to compute state root in parallel
INFO Block added to fork chain
INFO Canonical chain committed
The block is added with a hash different from canonical's. Different op-reth restarts produce different fork-block hashes (e.g. 0x88cacb…, 0x0fcc36…) — strongly suggesting non-deterministic state read from an uninitialized location.
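To compare the divergence across restarts, the mismatching roots can be pulled out of the logs mechanically. The following is a minimal sketch, assuming the log fields appear as `state_root`, `block_state_root`, and `block_number` followed by `:` or `=`, as in the excerpt above; the real journald layout may differ, and `find_mismatches` is a hypothetical helper, not part of op-reth.

```python
import re
from typing import Dict, Iterable, List, Optional

HASH = r"(0x[0-9a-fA-F]{64})"
BLOCK_NUMBER = re.compile(r"\bblock_number[:=]\s*(\d+)")
# \b prevents this pattern from matching inside "block_state_root".
COMPUTED = re.compile(r"\bstate_root[:=]\s*" + HASH)
EXPECTED = re.compile(r"\bblock_state_root[:=]\s*" + HASH)
MARKER = "State root task returned incorrect state root"


def find_mismatches(lines: Iterable[str]) -> List[Dict[str, Optional[object]]]:
    """Collect (block, computed root, header root) for each mismatch warning."""
    results: List[Dict[str, Optional[object]]] = []
    last_block: Optional[int] = None
    pending: Optional[Dict[str, Optional[object]]] = None
    for line in lines:
        number = BLOCK_NUMBER.search(line)
        if number:
            last_block = int(number.group(1))
        if MARKER in line:
            # Start collecting the two roots that follow the warning.
            pending = {"block_number": last_block, "computed": None, "expected": None}
        if pending is None:
            continue
        computed = COMPUTED.search(line)
        expected = EXPECTED.search(line)
        if computed and pending["computed"] is None:
            pending["computed"] = computed.group(1)
        if expected and pending["expected"] is None:
            pending["expected"] = expected.group(1)
        if pending["computed"] and pending["expected"]:
            results.append(pending)
            pending = None
    return results
```

Running this over the logs of several restarts and diffing the `computed` values per block would show directly whether the locally computed root changes from run to run.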
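What we verified
Pre-state (block 67,817,510) matches canonical byte-for-byte
FeeCurrencyDirectory storage matches canonical at block 67,817,510
Since the state root is computed over a Merkle Patricia Trie of all account states (including storage roots), an identical state root guarantees (barring a hash collision) that every account's storage is identical to canonical at that block. The on-chain state at 67,817,510 is provably canonical.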
Yet execution at block 67,817,511 produces different state
The only way to reconcile this is: block execution reads or modifies state that is not in the Merkle Patricia Trie.
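The argument can be illustrated with a toy model. This is only a sketch: `state_root` below is a SHA-256 over canonical JSON standing in for a real Merkle Patricia Trie root, and `execute` with its `hidden_rate` parameter is a hypothetical stand-in for any input that execution consumes but that is not committed to by the state root. It is not how celo-kona-reth is implemented.

```python
import hashlib
import json
from typing import Dict


def state_root(state: Dict[str, int]) -> str:
    """Toy commitment over the whole state (stand-in for an MPT root)."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def execute(state: Dict[str, int], tx: Dict[str, object], hidden_rate: int) -> Dict[str, int]:
    """Debit a fee whose conversion rate comes from outside the committed state."""
    post = dict(state)
    sender = str(tx["from"])
    post[sender] = state[sender] - int(tx["fee"]) * hidden_rate
    return post


pre_state = {"alice": 1000, "bob": 500}
tx = {"from": "alice", "fee": 3}

# Both nodes start from the same committed pre-state...
root_before = state_root(pre_state)
# ...but each one reads a different value for the uncommitted input.
post_a = execute(pre_state, tx, hidden_rate=2)
post_b = execute(pre_state, tx, hidden_rate=5)
roots_diverge = state_root(post_a) != state_root(post_b)
```

If execution were a pure function of the committed pre-state and the block, identical pre-state roots would force identical post-state roots; a divergence therefore points at some input outside the commitment.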
Hypothesis
Cip64Storage (the per-block transient state that PR #184 made per-EVM instead of factory-scoped) — or something analogous — is lazily populated during normal block-by-block sync and is not captured by ZFS/file-based snapshots. From-genesis nodes accumulate this transient state as they execute every block. Snapshot-restored nodes start with it missing/empty/uninitialized, and when they hit a CIP-64 transaction whose execution depends on that state, the result diverges.
Supporting evidence for this hypothesis:
The same sha-04c2a3c image had been running canonical for ~2 months prior on the same physical machine (no divergence from genesis sync). After a single stop/snapshot/restart cycle, it now fails.
sha-bc50e90 (which post-dates PR #184, "Per evm CIP-64 storage") doesn't fix the issue but does change the symptom, consistent with "different transient-state handling, same core missing-state bug".
The fork-block hash is non-deterministic across op-reth restarts on the same input — strongly suggests the divergent state is initialized from uninitialized memory or a non-deterministic source.
The commit message of PR #184 ("Per evm CIP-64 storage") explicitly mentions that Cip64Storage lives in CeloEvm (not in the trie) and that the proofs-history ExEx re-executes blocks through the factory: exactly the kind of side-channel state that would not survive a snapshot.
What we tried (none worked)
| Attempt | Result |
| --- | --- |
| ZFS rollback to @export-2026-05-24 and restart with sha-04c2a3c | "static file tip behind checkpoint" + key decode failure during auto-unwind. Different bug class (interaction with tank/op-reth-proofs dataset not being rolled back). |
| celo-reth download to a fresh datadir + sandbox boot (no op-node, no peers) | Boots cleanly, pipeline finishes at snapshot block. False-positive validation. |
| celo-reth download + production op-reth + op-node + chain catchup | Fork divergence as documented above. |
| Restart op-reth several times | Each restart produces a different fork-block hash at 67,817,511. |
| Swap image from sha-04c2a3c → sha-bc50e90 | Symptom changes from "invalid tx" to "wrong state root", but still divergent. |
| Wipe and recreate tank/op-reth-proofs (the proofs-history rocksdb) | No effect on divergence behaviour. |
Environment
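Host: Hetzner EX44 in FSN1 datacenter, Debian 12, ZFS root pool, sync=standard.
celo-kona-reth images: us-west1-docker.pkg.dev/devopsre/dev-images/celo-kona-reth:sha-04c2a3c and us-west1-docker.pkg.dev/devopsre/dev-images/celo-kona-reth:sha-bc50e90.
op-node image: us-west1-docker.pkg.dev/devopsre/celo-blockchain-public/op-node:celo-v2.2.1.
Other images in the environment: ghcr.io/paradigmxyz/reth:latest, docker.io/sigp/lighthouse:latest.
Chain: Celo Mainnet (--chain celo, network 42220). All hardforks active including Jovian (Mar 31 2026).
op-reth flags include --proofs-history --proofs-history.storage-path=/celo-proofs/data --proofs-history.window=1209600.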
Forensic artifacts available
The reproduction case is preserved on the affected node (celo-mainnet-archive-hetzner-fsn-1) for as long as needed:
tank/test-download — the celo-reth download-produced datadir, byte-for-byte equivalent to the published snapshot. Mounted at /var/lib/op-reth.
tank/may18-fresh — the previously-canonical broken state (forked at block 67,751,935 on May 24 from a separate incident). Mounted at /var/lib/op-reth.broken-2026-05-25.
tank/op-reth-proofs.broken-2026-05-25 — the proofs-history rocksdb from the broken state.
ZFS snapshots @backup-anchor-2026-05-17, @backup-anchor-2026-05-15, plus daily autosnaps back to early May.
Published snapshot artifacts at https://fsn1.your-objectstorage.com/celo/snapshots/ (manifest.json + 818 chunks, 354 GiB total).
Full op-reth + op-node journald logs from the reproduction window (2026-05-25 ~11:00–15:00 UTC).
Happy to grant access or attach raw logs to anyone who wants to reproduce or debug.
Impact on snapshot publication tooling
PR #191 added download and snapshot-manifest subcommands precisely to enable snapshot-based bootstrap. The file shuttling works correctly — the bug is downstream. Until this is resolved:
The published snapshot at https://fsn1.your-objectstorage.com/celo/snapshots/ should be marked experimental / not for production use.
The validation procedure for snapshots needs to extend beyond "op-reth boots cleanly" to "process N canonical blocks via engine API and verify every block's state root matches" — boot-only validation produced a false positive here.
A separate snapshot-export tool that also captures whatever non-trie state is being missed may be needed (or, alternately, op-reth should be able to lazily reconstruct it from MDBX/RocksDB on first start after restore).
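The extended validation described above could be sketched roughly as follows. This is a minimal sketch, not an existing tool: it assumes both the restored node and a trusted canonical node expose the standard `eth_getBlockByNumber` JSON-RPC method, and the names `rpc_fetcher` and `first_divergence` are hypothetical. It compares blocks after the node has processed them, so it checks the outcome of catch-up rather than driving the engine API itself.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional

# A fetcher maps a block number to its JSON-RPC block object (or None if absent).
Fetch = Callable[[int], Optional[dict]]


def rpc_fetcher(url: str) -> Fetch:
    """Build a fetcher that queries eth_getBlockByNumber on the given endpoint."""
    def fetch(number: int) -> Optional[dict]:
        payload = json.dumps({
            "jsonrpc": "2.0",
            "id": 1,
            "method": "eth_getBlockByNumber",
            "params": [hex(number), False],  # False: tx hashes only
        }).encode()
        request = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response).get("result")
    return fetch


def first_divergence(local: Fetch, canonical: Fetch, start: int, count: int) -> Optional[Dict[str, object]]:
    """Return details of the first block whose hash or state root differs, else None."""
    for number in range(start, start + count):
        ours, theirs = local(number), canonical(number)
        if ours is None or theirs is None:
            return {"block": number, "reason": "missing block"}
        for field in ("stateRoot", "hash"):
            if ours[field] != theirs[field]:
                return {
                    "block": number,
                    "reason": field + " mismatch",
                    "local": ours[field],
                    "canonical": theirs[field],
                }
    return None
```

A snapshot would count as validated only if `first_divergence(rpc_fetcher(local_url), rpc_fetcher(canonical_url), snapshot_block + 1, n)` returns `None` for a sufficiently large `n`, where both URLs are placeholders for the operator's own endpoints. Given that the divergence is intermittent, repeating the check across several restarts seems prudent.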