execution/bal: serve BALs up to and beyond the WSP #21764
Open
taratorio wants to merge 6 commits into
Pull request overview
Adds a regeneration-based fallback to serve EIP-7928 Block Access Lists (BALs) beyond Erigon’s reorg-depth DB retention window by re-executing blocks against historical state, and wires it into RPC, Engine API, and eth/71 devp2p serving paths. This aligns BAL availability with the weak subjectivity period requirements without changing existing BAL storage/pruning.
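To put the retention gap in numbers, here is a minimal arithmetic sketch. The epoch count, `MaxReorgDepth` and default prune distance are the figures quoted in the PR description; 32 slots per epoch and 12-second slots are the Ethereum mainnet constants, and the sketch assumes one block per slot with no missed slots. All identifiers are illustrative and not taken from Erigon.

```go
package main

import (
	"fmt"
	"time"
)

const (
	slotsPerEpoch = 32               // Ethereum mainnet constant
	slotDuration  = 12 * time.Second // Ethereum mainnet constant
	wspEpochs     = 3533             // weak subjectivity period, per the PR description
	maxReorgDepth = 96               // blocks of BALs kept in the DB, per the PR description
	pruneDistance = 262_144          // default history prune distance, per the PR description
)

// wspBlocks returns the weak subjectivity period in blocks,
// assuming one block per slot.
func wspBlocks() int { return wspEpochs * slotsPerEpoch }

// days converts a block count into days, assuming no missed slots.
func days(blocks int) float64 {
	return (time.Duration(blocks) * slotDuration).Hours() / 24
}

func main() {
	wsp := wspBlocks()
	fmt.Printf("WSP: %d blocks (%.1f days)\n", wsp, days(wsp))
	fmt.Printf("stored BALs: %d blocks, gap to cover by regeneration: %d blocks\n",
		maxReorgDepth, wsp-maxReorgDepth)
	fmt.Printf("history retained: %d blocks (%.1f days), covers WSP: %t\n",
		pruneDistance, days(pruneDistance), pruneDistance >= wsp)
}
```

Under these assumptions the stored window covers 96 of the 113,056 blocks in the period, so nearly all of the required range has to come from regeneration, while the default history retention comfortably exceeds the period.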
Changes:
- Introduces `execution/bal` with `RederiveBlockAccessList` + `Regenerator` (LRU cache, per-block de-dup, bounded replay concurrency, history reader positioning, header hash verification).
- Adds DB-first + regeneration fallback for BAL serving in:
  - RPC `eth_getBlockAccessList`
  - Engine API `engine_getPayloadBodiesBy{Hash,Range}V2`
  - eth/71 `GetBlockAccessLists`
- Adds tests covering regeneration after pruning across the three serving surfaces.
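The "DB-first + regeneration fallback" pattern listed above could look roughly like the following. This is a hedged sketch, not the PR's code: `storedReader`, `regenerator`, `serveBAL` and `ErrPruned` are hypothetical names, and mapping a pruned-history error to "not available" is just one possible policy for a serving path.

```go
package main

import (
	"errors"
	"fmt"
)

// Hash is a stand-in for a 32-byte block hash.
type Hash [32]byte

// ErrPruned is a hypothetical sentinel meaning "history is no longer available".
var ErrPruned = errors.New("history pruned")

// storedReader and regenerator are hypothetical interfaces, not Erigon's.
type storedReader interface {
	ReadBAL(h Hash) ([]byte, error) // empty result means "no stored row"
}

type regenerator interface {
	Regenerate(h Hash) ([]byte, error) // nil result means "cannot derive"
}

// serveBAL returns the stored BAL when present and falls back to
// regeneration otherwise. A nil result with a nil error means "not available".
func serveBAL(db storedReader, regen regenerator, h Hash) ([]byte, error) {
	b, err := db.ReadBAL(h)
	if err != nil {
		return nil, err
	}
	if len(b) > 0 {
		return b, nil // stored wins; the regenerator is not consulted
	}
	if regen == nil {
		return nil, nil
	}
	b, err = regen.Regenerate(h)
	if errors.Is(err, ErrPruned) {
		return nil, nil // beyond retained history: degrade to "not available"
	}
	return b, err
}

// mapStore and funcRegen are tiny in-memory fakes for the demo.
type mapStore map[Hash][]byte

func (m mapStore) ReadBAL(h Hash) ([]byte, error) { return m[h], nil }

type funcRegen func(Hash) ([]byte, error)

func (f funcRegen) Regenerate(h Hash) ([]byte, error) { return f(h) }

func main() {
	recent, old, ancient := Hash{1}, Hash{2}, Hash{3}
	db := mapStore{recent: []byte("stored")}
	regen := funcRegen(func(h Hash) ([]byte, error) {
		if h == ancient {
			return nil, ErrPruned
		}
		return []byte("regenerated"), nil
	})
	for _, h := range []Hash{recent, old, ancient} {
		b, err := serveBAL(db, regen, h)
		fmt.Printf("block %d: %q err=%v\n", h[0], b, err)
	}
}
```

Keeping the fallback behind a small interface lets each serving surface decide how "not available" is encoded in its own response format.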
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| rpc/jsonrpc/eth_block_access_list.go | Adds BAL regeneration fallback when stored BAL is missing/pruned. |
| rpc/jsonrpc/eth_block_access_list_test.go | Tests RPC BAL regeneration after pruning stored BAL rows. |
| rpc/jsonrpc/eth_api.go | Adds bal.Regenerator construction into BaseAPI when a full rules engine is available. |
| p2p/sentry/sentry_multi_client/sentry_multi_client.go | Wires BAL regeneration into eth/71 handler using temporal tx + BlockAccessListGetter. |
| p2p/protocols/eth/handlers.go | Extends eth/71 BAL handler to optionally regenerate missing/pruned BALs. |
| p2p/protocols/eth/handlers_test.go | Adds unit tests for handler regeneration fallback behavior and edge cases. |
| execution/execmodule/getters.go | Adds BAL regeneration fallback for payload bodies getters; switches getter tx to kv.TemporalTx. |
| execution/execmodule/exec_module.go | Instantiates a BAL regenerator inside ExecModule. |
| execution/execmodule/exec_module_test.go | Tests payload bodies BAL regeneration after pruning stored BAL rows. |
| execution/bal/regenerator.go | Implements BAL regeneration with caching, concurrency bounds, and historical state setup. |
| execution/bal/regenerator_test.go | Tests regenerator reproduces canonical BAL bytes and handles pre-Amsterdam. |
| execution/bal/rederive.go | Implements full-block replay to re-derive BALs consistent with builder recording sequence. |
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request on Jun 13, 2026:

…ture set (erigontech#21771)

## Problem

`eest-spec-enginextests-benchmark-30m-parallel` failed on erigontech#21764 with the Go linker dying mid-build ([run](https://github.com/erigontech/erigon/actions/runs/27421370747/job/81047607869)):

```
link: mapping output file failed: no space left on device
```

GitHub's `ubuntu-24.04` hosted pool currently serves two disk flavors: 72G and 145G root volumes. The same shard on the same branch passed 45 minutes earlier on a 145G runner (110G free after cleanup) and failed on a 72G one (36G free after cleanup). The job's disk demand sits right at that 36G line because every `eest-spec-*` shard provisions **all three** EEST corpora through the blanket `test-fixtures-eest` Make target, while `tools/run-eest-spec-test.sh` only ever reads one of them:

| corpus | tarball | extracted |
|---|---|---|
| eest_benchmark | 1.0G | 5.3G |
| eest_devnet | 0.6G | 10G |
| eest_stable | 0.4G | 8.1G |

36G free − ~8G (Go toolchain + mod/build cache restore) − 2G tarballs − 23.4G extraction ≈ 2.6G left when `go build` starts → ENOSPC at link. The ±1-2G run-to-run variance is why it flakes instead of failing deterministically.

## Fix

- Move fixture provisioning into `tools/run-eest-spec-test.sh`, which already resolves the shard's corpus from its name, so each shard now downloads/extracts only the set it reads. Worst-case footprint drops from ~35G to ~22G (devnet shards), ~17G for the benchmark shard that flaked.
- The zkevm/non-zkevm Makefile rule split existed solely to pick the fixture prerequisite, so the four pattern rules collapse to two (race/non-race).
- Add `--download-only` to `tools/test-fixtures.sh` and use it in `cache-warming-eest-fixtures.yml`. The warmer caches only the `.tar.gz` files but was extracting all four corpora (~27G) on a runner that doesn't even run the cleanup-space step, i.e. the same ENOSPC waiting to happen on a 72G runner whenever the probe misses.

Local `make eest-spec-*` behaviour is unchanged: provisioning is a sentinel-guarded no-op when the corpus is already extracted, and `make test-fixtures-eest`/`test-fixtures-zkevm` remain for prefetching everything.

## Verification

TDD note: infra/shell change with no Go behaviour; verified functionally instead of via Go tests:

- Synthetic end-to-end test of `test-fixtures.sh` (tiny `file://` manifest in a temp dir): download-only cold / idempotent re-run / normal-mode extract from existing tarball / sentinel no-op / download-only re-download after tarball deletion; all pass.
- `make -n` for one shard per class (plain, race, zkevm-race, benchmark): correct binary (`evm` vs `evm.race`), no blanket fixture extraction in any recipe.
- Smoke run of `run-eest-spec-test.sh statetests-stable` against a populated cache: in-script provisioning hits the cached fast-path and proceeds into the run stage.
- `actionlint` on both workflows; `shellcheck` adds no new findings; `make lint` clean twice.
Why

EIP-7928 requires nodes to serve BALs for the weak subjectivity period (3533 epochs ≈ 113,056 blocks ≈ 15.7 days). Erigon only stores BALs in MDBX for the last `MaxReorgDepth` (96) blocks; the exec-stage prune (#19110) deletes everything older, so eth/71, `engine_getPayloadBodiesBy*V2` and `eth_getBlockAccessList` could not serve anything beyond the reorg window.

Full/archive nodes keep account/storage/code domain history well past the WSP (default prune distance 262,144 blocks), so older BALs don't need to be stored at all: they can be re-derived on demand by re-executing the block against historical state, the same approach already used by the receipts generator for devp2p `GetReceipts`.

What
New `execution/bal` package:

- `RederiveBlockAccessList`: replays a full block (init system calls at access index −1, every transaction, finalize at `len(txns)`) recording I/O into a `VersionedIO`, mirroring the block builder's BAL recording sequence.
- `Regenerator`: caching wrapper around it with an LRU (`BAL_LRU`, default 1024 entries ≈ 100MB), a per-block-hash dedup mutex, a `BAL_EXEC_CONCURRENCY` replay semaphore (default GOMAXPROCS/2), and a history reader positioned at the block's first txNum. The re-derived BAL's hash is verified against `header.BlockAccessListHash` before it is cached or served, so the node only ever serves BALs it derived itself from canonical state; a mismatch (e.g. non-canonical block) degrades to "not available". Pruned history surfaces as `state.PrunedError`; pre-Amsterdam headers (no BAL commitment) return nil.

Wired into all three read paths, DB-first with regeneration fallback:

| Read path | Before | After |
|---|---|---|
| `GetBlockAccessLists` | `0x80` sentinel | `0x80` only beyond history |
| `engine_getPayloadBodiesBy{Hash,Range}V2` | `blockAccessList: null` | `null` only beyond history |
| `eth_getBlockAccessList` | | |

- eth/71: `eth.BlockAccessListGetter` interface, fallback inside `AnswerGetBlockAccessListsQuery`; `MultiClient` owns a `Regenerator` next to its receipts generator. The existing 2MiB/1024-lookup response limits bound per-request replay work.
- Engine API: `beginOverlayOrRo` now returns `kv.TemporalTx` so the getters' existing tx is reused for regeneration.
- RPC: `BaseAPI.balRegenerator`, constructed when the engine is a full `rules.Engine`.

BAL storage and pruning are unchanged; this PR only adds the serving fallback.
Testing
All new tests run with `t.Parallel()`:

- `execution/bal`: insert a blockgen Amsterdam chain (transfers, contract creation with SSTORE init code, multi-tx and empty blocks), prune all stored BAL rows (asserted 0 rows left), then require regenerated bytes byte-equal blockgen's independently computed canonical BALs and their hash equal the header commitment; pre-Amsterdam returns nil.
- `p2p/protocols/eth`: handler fallback unit tests (stored wins, regenerated served, getter empty/error/unknown → sentinel, getter not consulted when stored).
- `execution/execmodule`: payload bodies serve stored BALs, then after pruning regenerate identical bytes via both ByHash and ByRange.
- `rpc/jsonrpc`: `eth_getBlockAccessList` returns the regenerated BAL after the stored rows are pruned.

`make lint` clean, full test suites of all touched packages pass, race-clean.