db/state, db/config3: erigondb.toml as source of truth for commitment branch referencing#21452
db/state, db/config3: erigondb.toml as source of truth for commitment branch referencing#21452awskii wants to merge 41 commits into
Conversation
…ranches Mechanical rename of the commitment-branch referencing flag (field + the ForTest* toggle) repo-wide; no behavior change. Existing tests are the safety net. Behavioral decoupling of the merge-scheduling guards is a later task.
…gatorSqueezeCommitmentValues
… aggregator Apply the erigondb.toml-resolved commitment-refs flag to the global statecfg.Schema and the live commitment domain via Aggregator.applyReferencesInCommitmentBranches, called from ReloadErigonDBSettings and from the WithErigonDBSettings -> Open builder path (covering all standalone entry points).
Add a version.Version field to FilesItem, populated in Domain.openDirtyFiles from the version parsed by MatchVersionedFile, and expose it via a visibleFile.Version() accessor. This makes a per-file regime marker reachable at the commitment deref/merge sites (Tasks 7/8) via the existing range-match loop, without extending getLatestFromFiles.
…sion (Task 6) Bump commitment domain DataKV.Current to v2.1 in versions.yaml (regenerated version_schema_gen.go) so the read ceiling accepts both v2.0 referenced and v2.1 plain commitment files (range [v1.0, v2.1]). Add Domain.kvWriteVersion: the commitment .kv write version is decoupled from Current — v2.0 when ReferencesInCommitmentBranches is on, v2.1 when off. Other domains keep writing at DataKV.Current.
…write flag (Task 8)
…nditionally (Task 9) Decouple the commitment merge transformer's expand and re-shorten decisions so flipping references_in_commitment_branches off on a populated datadir stays correct: - vt closure: expand input short keys to plain whenever the input file is referenced (its own version+range), independent of the live flag; re-shorten into the merged output only when the flag is on AND the output range reaches the threshold. Never return a referenced valBuf as-is — that copied stale offsets into the merged file. - evaluate the threshold against the input file range for expansion and the output range for re-shortening (previously conflated on the output range). - SqueezeCommitmentFiles now writes its re-referenced output to the flag-derived kvNewFilePath (v2.0) rather than reusing the input filename. A rebuild writes plain v2.1 files during its flag-off window; reusing that name for squeezed referenced content produced a v2.1-named file with short keys, which reads as garbage after a disk reload. Tests (commitment_merge_version_test.go): - TestCommitmentMergeFlagOffExpandsReferencedInputs: disjoint early-only keys keep v2.0 referenced branches as merge winners; flag-off merge must produce a v2.1 plain file with no short offsets. - TestCommitmentRebuildSqueezeReadableAfterReload: full rebuild→squeeze cycle reopened from disk reads back to the same root with version/content consistent.
…cks (Task 10) Integrity checks decided whether a commitment file carried short references by range alone (ValuesPlainKeyReferencingThresholdReached). With v2.0-referenced and v2.1-plain files coexisting, a v2.1 file at/above the threshold was wrongly treated as referencing and silently under-verified. Expose Version() on the kv.VisibleFile interface, export state.CommitmentBranchReferenced, and route the three integrity decision sites through commitmentFileReferencing(file, stepSize) so they consult the file's own version+range.
- db/state/aggregator.go: gate applyReferencesInCommitmentBranches writes on actual change so the live commitment-domain flag read by background merges is mutated only at first resolution (before any merge spawns), never concurrently - db/state/merge.go: stamp merged commitment file's in-memory version from kvWriteVersion so freshly-merged files are classified by regime without a folder reopen (was zero-version, over-approximating as referenced until restart) - db/state/commitment_merge_version_test.go: add TestCommitmentReadDerefsReferencedFileWithFlagOff (reads a referenced v2.0 branch with the flag off through the real deref path and asserts short refs expand to plain — the central safety claim, previously only covered by self-consistent root equality) and TestMergedCommitmentFileVersionStampedInMemory; both mutation-verified red->green - docs/plans: correct the downgrade caveat — an old binary hard-rejects v2.1 via MustSupport panic, it does not silently misread
integrateDirtyFiles left FilesItem.version at the zero value, unlike mergeFiles and openDirtyFiles. The version-aware commitment read/merge logic treats a zero version as referenced (below v2.1), so a freshly built plain commitment file misclassified as referenced until restart, forcing the read down the deref/range-lookup path it should skip.
There was a problem hiding this comment.
Pull request overview
Migrates the commitment "references in branches" setting from a build-time constant (AggregatorSqueezeCommitmentValues) to a per-datadir field in snapshots/erigondb.toml (references_in_commitment_branches, defaults to true). Reads become per-file version-driven (a commitment .kv is referenced iff its version < v2.1 and its range reaches the referencing threshold), while writes follow the live flag. The commitment DataKV version is bumped v2.0 → v2.1, with the read window accepting both and the write version flag-derived.
Changes:
- Add
*boolReferencesInCommitmentBranchestoErigonDBSettings, thread it viaWithErigonDBSettings/ReloadErigonDBSettingsintostatecfg.Schemaand the live commitment domain; renameDomainCfg.ReplaceKeysInValues → ReferencesInCommitmentBranchesand the test helper. - Decouple commitment
.kvwrite version fromDataKV.Current(flag-on → v2.0, flag-off → v2.1); persist per-file parsed version onFilesItem, expose viaVisibleFile.Version(), stamp newly merged/integrated files; squeeze writes to a flag-derivedkvNewFilePathrather than reusing the input name. - Make read deref, integrity checks, and merge planning/transformer use the per-file version+range predicate (
CommitmentBranchReferenced), so referenced inputs are always expanded even with the flag off and plain v2.1 files are never re-dereferenced.
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| db/config3/config3.go | Adds DefaultReferencesInCommitmentBranches = true. |
| db/state/erigondb_settings.go | Adds the new TOML field, accessor with default, normalizes/writes it in resolution paths. |
| db/state/erigondb_settings_test.go | New tests for roundtrip, absent-field normalization, legacy/fresh write behavior. |
| db/state/aggregator2.go | Plumbs referencesInCommitmentBranches through AggOpts/Open and WithErigonDBSettings. |
| db/state/aggregator.go | Renames ForTest… helper, adds applyReferencesInCommitmentBranches, gates merge planning on resolved-referencing predicate. |
| db/state/aggregator_refs_test.go | New tests for reload/open propagation of the flag. |
| db/state/aggregator_test.go | New tests for commitmentMergeInputsReferenced / commitmentVisibleFilesReferenced. |
| db/state/aggregator_fuzz_test.go | Rename of ForTestReplaceKeysInValues. |
| db/state/statecfg/statecfg.go | Renames ReplaceKeysInValues field on DomainCfg. |
| db/state/statecfg/state_schema.go | Removes AggregatorSqueezeCommitmentValues, wires default into the commitment domain. |
| db/state/statecfg/state_schema_test.go | Asserts default and schema agree. |
| db/state/statecfg/versions.yaml, version_schema_gen.go | Bump commitment DataKV current to v2.1. |
| db/state/domain.go | Adds kvWriteVersion(); stamps integrated file version. |
| db/state/domain_test.go | New tests for write-version selection and acceptance window. |
| db/state/domain_committed.go | New CommitmentBranchReferenced predicate plus visible/merge/file helpers; version-aware deref and merge transformer. |
| db/state/domain_committed_test.go | New table tests for CommitmentBranchReferenced. |
| db/state/dirty_files.go | FilesItem.version field; populated on open; visibleFile.Version() implementation. |
| db/state/dirty_files_test.go | New tests for visibleFile.Version() and openDirtyFiles version population. |
| db/state/merge.go | Stamps merged output version. |
| db/state/squeeze.go | Renames flag usage; squeeze output now uses kvNewFilePath. |
| db/state/squeeze_test.go, squeeze_concurrent_rebuild_test.go | Rename of ForTest… helper. |
| db/state/commitment_merge_version_test.go, commitment_version_testutil_test.go | New mixed-regime / merge / rebuild+squeeze tests and shared helpers. |
| db/test/aggregator_ext_test.go | Helper rename. |
| db/state/trie_reader_integration_test.go | Helper rename. |
| db/kv/visible_file.go | Adds Version() to VisibleFile interface. |
| db/integrity/commitment_integrity.go | Switches integrity decisions to version-aware predicate; updated log messages. |
| db/integrity/commitment_version_test.go, commitment_version_integration_test.go | New unit + integration coverage of version-regime behavior. |
| cmd/integration/commands/commitment.go | Helper rename in rebuild path. |
| db/agents.md | Documents snapshots/erigondb.toml fields and read/write semantics. |
| .claude/skills/erigondb-sync-integration-test-plan/SKILL.md | Updates plan with new field, producer workflow. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Resolve 3 conflicts from main's overlapping refactors: - db/state/aggregator.go: drop frozenBlocks/FrozenBlocksProvider (removed repo-wide by #21415 block-catchup-recovery), keep commitmentRefsOverride. - db/state/domain_committed.go: keep both the PR's reshorten gate and #21303's per-merge findShortenedKey caches; the vt closure uses all three. - db/state/squeeze.go: keep the PR rename ForTestReferencesInCommitmentBranches and the wantsReferencesInBranches guard over main's old ForTestReplaceKeysInValues.
…ersion End-to-end guard for the two upgrade scenarios: an in-place upgrade (erigondb.toml without references_in_commitment_branches -> default true) keeps producing v2.0 referenced commitment files; a downloaded plain snapshot set (references=false) produces v2.1 plain files. Existing tests covered each link (resolution->live-domain bool, and domain bool->kvWriteVersion) separately but not the datadir-shape->produced-version chain.
If a datadir ends up with both a v2.0 (referenced) and v2.1 (plain) commitment file for the same step range (e.g. a download landing the other regime's file), openDirtyFiles must collapse them to one dirty item resolved via MatchVersionedFile to the highest version (v2.1), leaving the v2.0 twin on disk but unopened. Asserted order-independent.
…mmitment (fix Windows CI) wipeCommitment dir.RemoveFile'd commitment .kv files while the aggregator still mmapped them. Unix allows unlink-while-open, but Windows refuses (file in use), failing TestCommitmentRebuildSqueezeReadableAfterReload on the windows-2025 test legs. Release the handles via agg.Close() before deleting, then reopen a fresh agg over the wiped folder and return the new (db, agg); all 5 call sites updated to capture them.
kvWriteVersion special-cased the commitment domain by name and inlined the v2.0/v2.1 literals. Replace it with an optional KVWriteVersion func(*DomainCfg) on DomainCfg: the generic method delegates to the hook (else DataKV.Current), and the commitment regime->version mapping is declared once next to the schema. The hook reads the live ReferencesInCommitmentBranches flag, preserving single-source-of-truth (the flag is toggled at runtime during rebuild/squeeze, so the version must stay derived, not stored).
|
bucket |
I'll do the release. AFAIK you did noref on all chains. |
| valBuf, _ = r.Next(valBuf[:0]) | ||
| if bytes.Equal(keyBuf, commitmentdb.KeyCommitmentState) { | ||
| continue | ||
| } | ||
| if branchHasShortenedKey(valBuf) { | ||
| return true | ||
| } |
yperbasis
left a comment
There was a problem hiding this comment.
Requesting changes — three issues, all on the merge/concurrency path.
1. Data race on ReferencesInCommitmentBranches (blocker). applyReferencesInCommitmentBranches writes a.d[kv.CommitmentDomain].ReferencesInCommitmentBranches unsynchronized (called from ReloadErigonDBSettings ← SpawnStageSnapshots, stage_snapshots.go:210). The background MergeLoop goroutine (backend.go) reads it concurrently in findMergeRange/mergeFiles (via at.a.Cfg(...) — the struct copy still reads the bool unsynchronized) and in commitmentValTransformDomain's vt closure (dt.d.ReferencesInCommitmentBranches). The write is skipped on the default true→true path, so it's invisible until the flag actually flips at runtime — i.e. a plain-snapshot consumer with a downloader, which is the feature's main use case. Please make the flag an atomic.Bool read through an accessor in all merge/squeeze paths (or quiesce merges around the reload), and confirm under -race.
2. Merge-path sampling false-negative. Please switch commitmentFileReferenced to the deterministic classifier discussed in review (scan to the first key-bearing branch, classify by key length). A file's regime is uniform, so this is exact and typically cheaper than 200 strided samples — and it closes the only un-caught corruption vector: a missed referenced input merged with the flag off copies stale offsets verbatim. Reads are covered by the per-branch guard; merges are not.
3. Per-read cost on the plain regime. Once #2 makes referenced trustworthy, drop the per-branch branchHasShortenedKey(branch) parse from replaceShortenedKeysInBranch's fast path. As written, every read of a plain file — the regime this PR converges everyone toward — pays a full ReplacePlainKeys walk that main short-circuited on a cheap boolean.
Minor: RebuildCommitmentFiles now runs the squeeze pass whenever the flag is on regardless of squeeze=false — correct for the new model, but worth a note since it changes the parameter's meaning.
those v2.1 files are referenced, I'm 100% sure (also because I just ran a sepolia upgrade test) |
…kets Switch EmbeddedWebseeds for all chains from erigon34 to erigon36 buckets and DefaultSnapshotGitBranch from release/3.4 to release/3.6, so OtterSync pulls the v3.6 all-plain snapshot dataset.
…ferenced Greedy scanner: reads every branch value in a commitment domain .kv and reports referenced (first shortened key) or plain (all full keys).
Nope. Guess is wrong as well, there is no uniformity (if referenced - every branch data MAY have one or more shortened keys, but could be plain - touched as a siblings @sudeepdino008 i added tool to check plain/referenced files |
|
im still getting 404 on downloading snapshots, after setting branch to |
|
requested files are not exist on R2 bucket, some discrepancy on publishing block segments for very recent steps. seg retire bug? waiting for https://github.com/erigontech/erigon-snapshots-automation/actions/runs/27747302801 to finish |
|
sepolia is uploading now (first time), probably bucket is empty rn |
ReferencesInCommitmentBranches is flipped by ReloadErigonDBSettings (snapshots stage) while the background merge loop reads it in findMergeRange, mergeFiles and the commitmentValTransformDomain transformer — an unsynchronized read/write that surfaces when a downloader flips the flag at runtime (the plain-snapshot consumer path). Route every read through a locked accessor, take the lock on every write, and pass the flag into commitmentValTransformDomain since the transformer has no aggregator handle. Add a -race regression test.
|
erigontech/erigon-snapshot#1324 after it i was able to sync correctly and continue execution i suppose to cherry pick 246f8b9 into #21376 so release changes will be only there while this carrying right wiring. Also worth setting referencing enabled to true by default i think |
shouldn't it be false (for the upgrade to produce plain files) |
|
@AskAlexSharov #21376 supposed to be merged in release/3.6 because raises version and we wanted it to be separate, later, as we agreed this (
right, but to continue working in main with 3.4 until we ENABLE this change deliberately, we should flip default, to avoid issue you've got yesterday. |
…3.6) buckets" This reverts commit 246f8b9.
ResolveErigonDBSettings tests hardcoded true for the resolved/written references flag; assert config3.DefaultReferencesInCommitmentBranches instead so they verify resolve-to-default rather than a literal, and survive flipping the default.
That case asserts the explicit file value is kept, not the config default.
|
Re: the deterministic classifier — two clarifications. Non-uniformity doesn't sink it. Agreed that classifying by the first key-bearing branch is unsound (referenced files do carry plain, sibling-touched branches). But the classifier I mean is the one already in this PR — "Same cost or cheaper" is backwards for the referenced case. The sampler doesn't seek — it The actual gap is the merge path with the flag off. Reads are fine — the per-branch guard derefs a shortened key even on a sampled-plain file. But |
yperbasis
left a comment
There was a problem hiding this comment.
A few minor, non-blocking notes inline. The one blocking item — the merge-path false-negative when references_in_commitment_branches is off — is in the thread above, not here.
| a.ForTestReplaceKeysInValues(kv.CommitmentDomain, false) | ||
| // Capture the resolved flag before we temporarily flip it off for the rebuild loop; | ||
| // the squeeze gate below uses the captured value, not process-global schema state. | ||
| wantsReferencesInBranches := a.Cfg(kv.CommitmentDomain).ReferencesInCommitmentBranches |
There was a problem hiding this comment.
a.Cfg(...) copies DomainCfg by value without taking commitmentRefsMu, while writers (applyReferencesInCommitmentBranches, ForTestReferencesInCommitmentBranches) hold it. Offline-rebuild only, so low real risk — but it's an unsynchronized read of the mutex-guarded field, and -race would flag it if a rebuild ever overlaps a settings reload. Same at :833. Suggest reading through referencesInCommitmentBranches().
- Consolidate the len!=20/52 shortened-key check into commitment.BranchData.HasShortenedKeys; callers in state, integrity and cmd/commitment-regime delegate to it. - Guard the two remaining RebuildCommitmentFiles flag reads with the referencesInCommitmentBranches accessor (-race). - RefsInCommitmentBranches resolves nil to config3.DefaultReferencesInCommitmentBranches; drop the now-redundant resolve-time normalization. - Integrity skip log: 'no shortened keys found (full scan)' instead of 'sampled as plain'.
Race-test — data-race fix for the runtime commitment-refs flag (intermediate)Context: this PR makes Unit ( Live soak — sepolia archive,
Intermediate — the re-exec is still in progress under |
erigondb.tomlcarries the commitment "references in branches" regime (references_in_commitment_branches,*bool, absent →true), replacing theAggregatorSqueezeCommitmentValuesbuild const. The regime now ships with the snapshots, not the binary.Default
true⇒ no behavior change — safe to merge to main as-is. Turning the plain regime on (defaultfalse+ erigon36 webseeds + release/3.6) is a separate switch, #21376.Detecting a file's regime (content sample)
A commitment
.kvis "referenced" if any branch value carries a shortened (varint file-offset) key instead of a full one. Detected at decompressor open and cached onFilesItem.referenced.commitmentFileReferenced(domain_committed.go):totalPairs = decompressor.Count() / 2.commitmentReferenceSampleCount = 200evenly-spread pairs,stride = totalPairs / 200(min 1) — skips the rest.length.Addr(20, account) /length.Addr+length.Hash(52, storage) is an offset ⇒ referenced; first hit returns early.New
.kvstampDataKV.Current(v2.1) regardless of regime;MinSupportedv1.0, so v1.0/v2.0/v2.1 all load.Read / merge / flag
replaceShortenedKeysInBranch): derefs iff the file is referenced; a per-branch guard independently derefs any single branch carrying a shortened key — the file-level sample drives the fast path but never gates correctness.ReloadErigonDBSettingswrites it while the merge loop reads it (findMergeRange/mergeFiles/vt); verified under-race.Behavior
true: keeps writing referenced, derefs old. No migration.erigondb.toml(false) wins; merges produce plain.DataKV.Current < v2.1) rejects v2.1 viaMustSupport— loud panic, not silent misread.Validation
sepolia archive,
references_in_commitment_branches = falseon a v1.1-referenced datadir: re-executed the gap, caught up, follows live tip — derefs referenced inputs, produces plain merges, 0 crit / root-mismatch / panic over 1d+. Also soaked under-race(0 races). Test-report comment below.Known / open
CheckCommitmentKvDereffull-scan. A deterministic first-key scan instead of sampling is under review discussion.erigondb.tomlvs auto-converge-on-restart (db/state: move to noref commitment files production, works with 3.4 ref commitment snapshots #21814) — separate.ReplaceKeysInValues → ReferencesInCommitmentBranchesrename is mechanical.