Skip to content

db/state, db/config3: erigondb.toml as source of truth for commitment branch referencing#21452

Open
awskii wants to merge 41 commits into
mainfrom
awskii/erigondb-commitment-refs-flag
Open

db/state, db/config3: erigondb.toml as source of truth for commitment branch referencing#21452
awskii wants to merge 41 commits into
mainfrom
awskii/erigondb-commitment-refs-flag

Conversation

@awskii

@awskii awskii commented May 27, 2026

Copy link
Copy Markdown
Member

erigondb.toml carries the commitment "references in branches" regime (references_in_commitment_branches, *bool, absent → true), replacing the AggregatorSqueezeCommitmentValues build const. The regime now ships with the snapshots, not the binary.

Default true ⇒ no behavior change — safe to merge to main as-is. Turning the plain regime on (default false + erigon36 webseeds + release/3.6) is a separate switch, #21376.

Detecting a file's regime (content sample)

A commitment .kv is "referenced" if any branch value carries a shortened (varint file-offset) key instead of a full one. Detected at decompressor open and cached on FilesItem.referenced.

commitmentFileReferenced (domain_committed.go):

  • totalPairs = decompressor.Count() / 2.
  • Inspects up to commitmentReferenceSampleCount = 200 evenly-spread pairs, stride = totalPairs / 200 (min 1) — skips the rest.
  • A branch key whose length ≠ length.Addr (20, account) / length.Addr+length.Hash (52, storage) is an offset ⇒ referenced; first hit returns early.
  • Corrupt word stream (panic) ⇒ treated as referenced, never plain.
  • Runs at every open site (open / build / merge), commitment domain only.

New .kv stamp DataKV.Current (v2.1) regardless of regime; MinSupported v1.0, so v1.0/v2.0/v2.1 all load.

Read / merge / flag

  • Read (replaceShortenedKeysInBranch): derefs iff the file is referenced; a per-branch guard independently derefs any single branch carrying a shortened key — the file-level sample drives the fast path but never gates correctness.
  • Merge: expands a referenced input's offsets to plain per-input (even with the flag off — offsets are never copied verbatim); re-shortens the output only when the flag is on AND output range ≥ threshold. The account/storage→commitment barrier fires only when the merge needs the merged files; plain merges run in parallel.
  • Flag: governs new writes only (whether a merge re-shortens), never reads. Mutex-guarded — ReloadErigonDBSettings writes it while the merge loop reads it (findMergeRange / mergeFiles / vt); verified under -race.

Behavior

  • In-place upgrade (existing datadir, field absent) → true: keeps writing referenced, derefs old. No migration.
  • Plain set + downloader → downloaded erigondb.toml (false) wins; merges produce plain.
  • Flipping on a populated datadir is correct either way: per-file by content, so mixed referenced/plain files coexist and converge to the live regime lazily through merges.
  • Old binary (DataKV.Current < v2.1) rejects v2.1 via MustSupport — loud panic, not silent misread.

Validation

sepolia archive, references_in_commitment_branches = false on a v1.1-referenced datadir: re-executed the gap, caught up, follows live tip — derefs referenced inputs, produces plain merges, 0 crit / root-mismatch / panic over 1d+. Also soaked under -race (0 races). Test-report comment below.

Known / open

  • The file-level detector is a bounded sample, so a sparsely-referenced file could read as plain. Covered by the per-branch read guard (loud crit, never silent) and the offline CheckCommitmentKvDeref full-scan. A deterministic first-key scan instead of sampling is under review discussion.
  • erigondb.toml vs auto-converge-on-restart (db/state: move to noref commitment files production, works with 3.4 ref commitment snapshots #21814) — separate.
  • ReplaceKeysInValues → ReferencesInCommitmentBranches rename is mechanical.

awskii added 19 commits May 26, 2026 14:25
…ranches

Mechanical rename of the commitment-branch referencing flag (field + the
ForTest* toggle) repo-wide; no behavior change. Existing tests are the safety
net. Behavioral decoupling of the merge-scheduling guards is a later task.
… aggregator

Apply the erigondb.toml-resolved commitment-refs flag to the global statecfg.Schema
and the live commitment domain via Aggregator.applyReferencesInCommitmentBranches,
called from ReloadErigonDBSettings and from the WithErigonDBSettings -> Open builder
path (covering all standalone entry points).
Add a version.Version field to FilesItem, populated in Domain.openDirtyFiles
from the version parsed by MatchVersionedFile, and expose it via a
visibleFile.Version() accessor. This makes a per-file regime marker reachable
at the commitment deref/merge sites (Tasks 7/8) via the existing range-match
loop, without extending getLatestFromFiles.
…sion (Task 6)

Bump commitment domain DataKV.Current to v2.1 in versions.yaml (regenerated
version_schema_gen.go) so the read ceiling accepts both v2.0 referenced and
v2.1 plain commitment files (range [v1.0, v2.1]).

Add Domain.kvWriteVersion: the commitment .kv write version is decoupled from
Current — v2.0 when ReferencesInCommitmentBranches is on, v2.1 when off. Other
domains keep writing at DataKV.Current.
…nditionally (Task 9)

Decouple the commitment merge transformer's expand and re-shorten decisions so
flipping references_in_commitment_branches off on a populated datadir stays correct:

- vt closure: expand input short keys to plain whenever the input file is
  referenced (its own version+range), independent of the live flag; re-shorten
  into the merged output only when the flag is on AND the output range reaches
  the threshold. Never return a referenced valBuf as-is — that copied stale
  offsets into the merged file.
- evaluate the threshold against the input file range for expansion and the
  output range for re-shortening (previously conflated on the output range).
- SqueezeCommitmentFiles now writes its re-referenced output to the flag-derived
  kvNewFilePath (v2.0) rather than reusing the input filename. A rebuild writes
  plain v2.1 files during its flag-off window; reusing that name for squeezed
  referenced content produced a v2.1-named file with short keys, which reads as
  garbage after a disk reload.

Tests (commitment_merge_version_test.go):
- TestCommitmentMergeFlagOffExpandsReferencedInputs: disjoint early-only keys
  keep v2.0 referenced branches as merge winners; flag-off merge must produce a
  v2.1 plain file with no short offsets.
- TestCommitmentRebuildSqueezeReadableAfterReload: full rebuild→squeeze cycle
  reopened from disk reads back to the same root with version/content consistent.
…cks (Task 10)

Integrity checks decided whether a commitment file carried short references by range
alone (ValuesPlainKeyReferencingThresholdReached). With v2.0-referenced and v2.1-plain
files coexisting, a v2.1 file at/above the threshold was wrongly treated as referencing
and silently under-verified.

Expose Version() on the kv.VisibleFile interface, export
state.CommitmentBranchReferenced, and route the three integrity decision sites through
commitmentFileReferencing(file, stepSize) so they consult the file's own version+range.
- db/state/aggregator.go: gate applyReferencesInCommitmentBranches writes on
  actual change so the live commitment-domain flag read by background merges is
  mutated only at first resolution (before any merge spawns), never concurrently
- db/state/merge.go: stamp merged commitment file's in-memory version from
  kvWriteVersion so freshly-merged files are classified by regime without a
  folder reopen (was zero-version, over-approximating as referenced until restart)
- db/state/commitment_merge_version_test.go: add TestCommitmentReadDerefsReferencedFileWithFlagOff
  (reads a referenced v2.0 branch with the flag off through the real deref path and
  asserts short refs expand to plain — the central safety claim, previously only
  covered by self-consistent root equality) and TestMergedCommitmentFileVersionStampedInMemory;
  both mutation-verified red->green
- docs/plans: correct the downgrade caveat — an old binary hard-rejects v2.1 via
  MustSupport panic, it does not silently misread
integrateDirtyFiles left FilesItem.version at the zero value, unlike
mergeFiles and openDirtyFiles. The version-aware commitment read/merge
logic treats a zero version as referenced (below v2.1), so a freshly
built plain commitment file misclassified as referenced until restart,
forcing the read down the deref/range-lookup path it should skip.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Migrates the commitment "references in branches" setting from a build-time constant (AggregatorSqueezeCommitmentValues) to a per-datadir field in snapshots/erigondb.toml (references_in_commitment_branches, defaults to true). Reads become per-file version-driven (a commitment .kv is referenced iff its version < v2.1 and its range reaches the referencing threshold), while writes follow the live flag. The commitment DataKV version is bumped v2.0 → v2.1, with the read window accepting both and the write version flag-derived.

Changes:

  • Add *bool ReferencesInCommitmentBranches to ErigonDBSettings, thread it via WithErigonDBSettings/ReloadErigonDBSettings into statecfg.Schema and the live commitment domain; rename DomainCfg.ReplaceKeysInValues → ReferencesInCommitmentBranches and the test helper.
  • Decouple commitment .kv write version from DataKV.Current (flag-on → v2.0, flag-off → v2.1); persist per-file parsed version on FilesItem, expose via VisibleFile.Version(), stamp newly merged/integrated files; squeeze writes to a flag-derived kvNewFilePath rather than reusing the input name.
  • Make read deref, integrity checks, and merge planning/transformer use the per-file version+range predicate (CommitmentBranchReferenced), so referenced inputs are always expanded even with the flag off and plain v2.1 files are never re-dereferenced.

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated no comments.

Show a summary per file
File Description
db/config3/config3.go Adds DefaultReferencesInCommitmentBranches = true.
db/state/erigondb_settings.go Adds the new TOML field, accessor with default, normalizes/writes it in resolution paths.
db/state/erigondb_settings_test.go New tests for roundtrip, absent-field normalization, legacy/fresh write behavior.
db/state/aggregator2.go Plumbs referencesInCommitmentBranches through AggOpts/Open and WithErigonDBSettings.
db/state/aggregator.go Renames ForTest… helper, adds applyReferencesInCommitmentBranches, gates merge planning on resolved-referencing predicate.
db/state/aggregator_refs_test.go New tests for reload/open propagation of the flag.
db/state/aggregator_test.go New tests for commitmentMergeInputsReferenced / commitmentVisibleFilesReferenced.
db/state/aggregator_fuzz_test.go Rename of ForTestReplaceKeysInValues.
db/state/statecfg/statecfg.go Renames ReplaceKeysInValues field on DomainCfg.
db/state/statecfg/state_schema.go Removes AggregatorSqueezeCommitmentValues, wires default into the commitment domain.
db/state/statecfg/state_schema_test.go Asserts default and schema agree.
db/state/statecfg/versions.yaml, version_schema_gen.go Bump commitment DataKV current to v2.1.
db/state/domain.go Adds kvWriteVersion(); stamps integrated file version.
db/state/domain_test.go New tests for write-version selection and acceptance window.
db/state/domain_committed.go New CommitmentBranchReferenced predicate plus visible/merge/file helpers; version-aware deref and merge transformer.
db/state/domain_committed_test.go New table tests for CommitmentBranchReferenced.
db/state/dirty_files.go FilesItem.version field; populated on open; visibleFile.Version() implementation.
db/state/dirty_files_test.go New tests for visibleFile.Version() and openDirtyFiles version population.
db/state/merge.go Stamps merged output version.
db/state/squeeze.go Renames flag usage; squeeze output now uses kvNewFilePath.
db/state/squeeze_test.go, squeeze_concurrent_rebuild_test.go Rename of ForTest… helper.
db/state/commitment_merge_version_test.go, commitment_version_testutil_test.go New mixed-regime / merge / rebuild+squeeze tests and shared helpers.
db/test/aggregator_ext_test.go Helper rename.
db/state/trie_reader_integration_test.go Helper rename.
db/kv/visible_file.go Adds Version() to VisibleFile interface.
db/integrity/commitment_integrity.go Switches integrity decisions to version-aware predicate; updated log messages.
db/integrity/commitment_version_test.go, commitment_version_integration_test.go New unit + integration coverage of version-regime behavior.
cmd/integration/commands/commitment.go Helper rename in rebuild path.
db/agents.md Documents snapshots/erigondb.toml fields and read/write semantics.
.claude/skills/erigondb-sync-integration-test-plan/SKILL.md Updates plan with new field, producer workflow.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

awskii added 8 commits May 27, 2026 22:01
Resolve 3 conflicts from main's overlapping refactors:
- db/state/aggregator.go: drop frozenBlocks/FrozenBlocksProvider (removed
  repo-wide by #21415 block-catchup-recovery), keep commitmentRefsOverride.
- db/state/domain_committed.go: keep both the PR's reshorten gate and
  #21303's per-merge findShortenedKey caches; the vt closure uses all three.
- db/state/squeeze.go: keep the PR rename ForTestReferencesInCommitmentBranches
  and the wantsReferencesInBranches guard over main's old ForTestReplaceKeysInValues.
…ersion

End-to-end guard for the two upgrade scenarios: an in-place upgrade (erigondb.toml
without references_in_commitment_branches -> default true) keeps producing v2.0
referenced commitment files; a downloaded plain snapshot set (references=false) produces
v2.1 plain files. Existing tests covered each link (resolution->live-domain bool, and
domain bool->kvWriteVersion) separately but not the datadir-shape->produced-version chain.
If a datadir ends up with both a v2.0 (referenced) and v2.1 (plain) commitment file
for the same step range (e.g. a download landing the other regime's file), openDirtyFiles
must collapse them to one dirty item resolved via MatchVersionedFile to the highest version
(v2.1), leaving the v2.0 twin on disk but unopened. Asserted order-independent.
…mmitment (fix Windows CI)

wipeCommitment dir.RemoveFile'd commitment .kv files while the aggregator still
mmapped them. Unix allows unlink-while-open, but Windows refuses (file in use),
failing TestCommitmentRebuildSqueezeReadableAfterReload on the windows-2025 test legs.
Release the handles via agg.Close() before deleting, then reopen a fresh agg over the
wiped folder and return the new (db, agg); all 5 call sites updated to capture them.
kvWriteVersion special-cased the commitment domain by name and inlined the v2.0/v2.1
literals. Replace it with an optional KVWriteVersion func(*DomainCfg) on DomainCfg: the
generic method delegates to the hook (else DataKV.Current), and the commitment regime->version
mapping is declared once next to the schema. The hook reads the live ReferencesInCommitmentBranches
flag, preserving single-source-of-truth (the flag is toggled at runtime during rebuild/squeeze,
so the version must stay derived, not stored).
@awskii

awskii commented Jun 18, 2026

Copy link
Copy Markdown
Member Author

bucket erigon36-v1-snapshots-sepolia seems to be non-existent @sudeepdino008

@sudeepdino008

Copy link
Copy Markdown
Member

bucket erigon36-v1-snapshots-sepolia seems to be non-existent @sudeepdino008

I'll do the release. AFAIK you did noref on all chains.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

Comment on lines +99 to +105
valBuf, _ = r.Next(valBuf[:0])
if bytes.Equal(keyBuf, commitmentdb.KeyCommitmentState) {
continue
}
if branchHasShortenedKey(valBuf) {
return true
}

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes — three issues, all on the merge/concurrency path.

1. Data race on ReferencesInCommitmentBranches (blocker). applyReferencesInCommitmentBranches writes a.d[kv.CommitmentDomain].ReferencesInCommitmentBranches unsynchronized (called from ReloadErigonDBSettingsSpawnStageSnapshots, stage_snapshots.go:210). The background MergeLoop goroutine (backend.go) reads it concurrently in findMergeRange/mergeFiles (via at.a.Cfg(...) — the struct copy still reads the bool unsynchronized) and in commitmentValTransformDomain's vt closure (dt.d.ReferencesInCommitmentBranches). The write is skipped on the default true→true path, so it's invisible until the flag actually flips at runtime — i.e. a plain-snapshot consumer with a downloader, which is the feature's main use case. Please make the flag an atomic.Bool read through an accessor in all merge/squeeze paths (or quiesce merges around the reload), and confirm under -race.

2. Merge-path sampling false-negative. Please switch commitmentFileReferenced to the deterministic classifier discussed in review (scan to the first key-bearing branch, classify by key length). A file's regime is uniform, so this is exact and typically cheaper than 200 strided samples — and it closes the only un-caught corruption vector: a missed referenced input merged with the flag off copies stale offsets verbatim. Reads are covered by the per-branch guard; merges are not.

3. Per-read cost on the plain regime. Once #2 makes referenced trustworthy, drop the per-branch branchHasShortenedKey(branch) parse from replaceShortenedKeysInBranch's fast path. As written, every read of a plain file — the regime this PR converges everyone toward — pays a full ReplacePlainKeys walk that main short-circuited on a cheap boolean.

Minor: RebuildCommitmentFiles now runs the squeeze pass whenever the flag is on regardless of squeeze=false — correct for the new model, but worth a note since it changes the parameter's meaning.

@sudeepdino008

Copy link
Copy Markdown
Member

about upgrade (without seg reset) from 3.4

  1. #21814 (comment)
  2. consequently, commitment.kv version will be 2.1 and they'll be ref
  3. independent of the above 2 issues, we need to update erigondb.toml on snapshotter to include the new flag.

Sepolia node:

-r--r--r-- 1 ubuntu ubuntu  44G Dec 12  2025 v1.1-commitment.0-512.kv
-rw------- 1 ubuntu ubuntu  19G Mar  1 14:41 v1.1-commitment.512-576.kv
-rw------- 1 ubuntu ubuntu  18G Jun 17 01:00 v2.1-commitment.576-640.kv
-rw------- 1 ubuntu ubuntu 4.1G Jun 17 01:58 v2.1-commitment.640-656.kv
-rw------- 1 ubuntu ubuntu 2.1G Jun 17 21:00 v2.1-commitment.656-660.kv

first two are old referenced, v2.1-commitment.576-640.kv is the first ref->plain merge, latter 2 were built plain

those v2.1 files are referenced, I'm 100% sure (also because I just ran a sepolia upgrade test)

awskii added 2 commits June 18, 2026 19:11
…kets

Switch EmbeddedWebseeds for all chains from erigon34 to erigon36 buckets and
DefaultSnapshotGitBranch from release/3.4 to release/3.6, so OtterSync pulls the
v3.6 all-plain snapshot dataset.
…ferenced

Greedy scanner: reads every branch value in a commitment domain .kv and reports
referenced (first shortened key) or plain (all full keys).
@awskii

awskii commented Jun 18, 2026

Copy link
Copy Markdown
Member Author
  1. Merge-path sampling false-negative. Please switch commitmentFileReferenced to the deterministic classifier discussed in review (scan to the first key-bearing branch, classify by key length). A file's regime is uniform, so this is exact and typically cheaper than 200 strided samples — and it closes the only un-caught corruption vector: a missed referenced input merged with the flag off copies stale offsets verbatim. Reads are covered by the per-branch guard; merges are not.

Nope. Guess is wrong as well, there is no uniformity (if referenced - every branch data MAY have one or more shortened keys, but could be plain - touched as a siblings

@sudeepdino008 i added tool to check plain/referenced files cmd/commitment-regime in this branch. and it reports clearly:

v1.1-commitment.0-512.kv                 referenced (first shortened key at pair 4 of 4)
v1.1-commitment.512-576.kv               referenced (first shortened key at pair 4 of 4)
v2.1-commitment.576-640.kv               plain (scanned all 57763381 pairs)
v2.1-commitment.640-656.kv               plain (scanned all 11486172 pairs)
v2.1-commitment.656-660.kv               plain (scanned all 5092803 pairs)

@awskii

awskii commented Jun 18, 2026

Copy link
Copy Markdown
Member Author

@sudeepdino008 @lystopad

im still getting 404 on downloading snapshots, after setting branch to release/3.6 and updating webseed url to 36:

WARN[06-18|12:18:51.901] [Downloader] error fetching metainfo from webseeds err="all webseed urls failed. last error: fetching from \"https://erigon36-v1-snapshots-sepolia.erigon.network/v1.1-010340-010350-bodies.seg.torrent\": unexpected http response status code: 404" name=v1.1-010340-010350-bodies.seg infohash=68bd975960b0f29a90d0b59e318e6ecd7c2c6255
WARN[06-18|12:18:51.903] [Downloader] error fetching metainfo from webseeds err="all webseed urls failed. last error: fetching from \"https://erigon36-v1-snapshots-sepolia.erigon.network/v2.0-010350-010360-bodies.idx.torrent\": unexpected http response status code: 404" name=v2.0-010350-010360-bodies.idx infohash=2f92157ebeaaeca848b09e4610653f347e22b001

@awskii

awskii commented Jun 18, 2026

Copy link
Copy Markdown
Member Author

requested files are not exist on R2 bucket, some discrepancy on publishing block segments for very recent steps. seg retire bug?

waiting for https://github.com/erigontech/erigon-snapshots-automation/actions/runs/27747302801 to finish

@sudeepdino008

Copy link
Copy Markdown
Member

sepolia is uploading now (first time), probably bucket is empty rn

ReferencesInCommitmentBranches is flipped by ReloadErigonDBSettings (snapshots
stage) while the background merge loop reads it in findMergeRange, mergeFiles and
the commitmentValTransformDomain transformer — an unsynchronized read/write that
surfaces when a downloader flips the flag at runtime (the plain-snapshot consumer
path). Route every read through a locked accessor, take the lock on every write,
and pass the flag into commitmentValTransformDomain since the transformer has no
aggregator handle. Add a -race regression test.
@awskii

awskii commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

erigontech/erigon-snapshot#1324 after it i was able to sync correctly and continue execution

i suppose to cherry pick 246f8b9 into #21376 so release changes will be only there while this carrying right wiring.

Also worth setting referencing enabled to true by default i think

@AskAlexSharov

AskAlexSharov commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

@awskii #21376 has wrong base branch (3.4 will not able to open 3.6 files - new .bt/.kvi) - if you want release non-referenced .kv to 3.4 better do it from 3.4 snapshotter machines

@sudeepdino008

Copy link
Copy Markdown
Member

Also worth setting referencing enabled to true by default i think

shouldn't it be false (for the upgrade to produce plain files)

@awskii

awskii commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

@AskAlexSharov #21376 supposed to be merged in release/3.6 because raises version and we wanted it to be separate, later, as we agreed

this (awskii/erigondb-commitment-refs-flag) branch is used to test "what it is like on 3.6 erigon" but since we are going to merge it in main - those 3.6 changes has to be moved in 21376

@sudeepdino008

shouldn't it be false (for the upgrade to produce plain files)

right, but to continue working in main with 3.4 until we ENABLE this change deliberately, we should flip default, to avoid issue you've got yesterday.

awskii added 2 commits June 19, 2026 15:48
ResolveErigonDBSettings tests hardcoded true for the resolved/written
references flag; assert config3.DefaultReferencesInCommitmentBranches instead so
they verify resolve-to-default rather than a literal, and survive flipping the
default.
That case asserts the explicit file value is kept, not the config default.
@yperbasis

Copy link
Copy Markdown
Member

Re: the deterministic classifier — two clarifications.

Non-uniformity doesn't sink it. Agreed that classifying by the first key-bearing branch is unsound (referenced files do carry plain, sibling-touched branches). But the classifier I mean is the one already in this PR — commitmentFileReferencing (integrity) and commitment-regime's scan(): scan to the first shortened key anywhere, or EOF. That's exact regardless of uniformity.

"Same cost or cheaper" is backwards for the referenced case. The sampler doesn't seek — it Skip-walks the stride grid, so it can't confirm a file referenced until it reaches a sampled pair that happens to carry a shortened key (pair ≈ stride, ~287k on the 0-512 file). The exact scan stops at the first shortened key anywhere — pair 4 in your own output above. So exact is cheaper on referenced files (today's default), and both are a single full pass on plain files. The sampler's only edge is Skip vs Next on that plain-file pass — a constant factor, not an order.

The actual gap is the merge path with the flag off. Reads are fine — the per-branch guard derefs a shortened key even on a sampled-plain file. But mergeFiles gates the transformer on the file-level verdict alone (commitmentMergeNeedsTransform), so a missed referenced input + flag off copies stale offsets verbatim into the merged file. That's the producer path, where nobody reads those branches until a consumer does. Swapping commitmentFileReferenced → the exact scan when the flag is off (the only regime that can corrupt) closes it with code we already trust, and is faster on the referenced files we have today.

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor, non-blocking notes inline. The one blocking item — the merge-path false-negative when references_in_commitment_branches is off — is in the thread above, not here.

Comment thread db/state/squeeze.go Outdated
a.ForTestReplaceKeysInValues(kv.CommitmentDomain, false)
// Capture the resolved flag before we temporarily flip it off for the rebuild loop;
// the squeeze gate below uses the captured value, not process-global schema state.
wantsReferencesInBranches := a.Cfg(kv.CommitmentDomain).ReferencesInCommitmentBranches

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.Cfg(...) copies DomainCfg by value without taking commitmentRefsMu, while writers (applyReferencesInCommitmentBranches, ForTestReferencesInCommitmentBranches) hold it. Offline-rebuild only, so low real risk — but it's an unsynchronized read of the mutex-guarded field, and -race would flag it if a rebuild ever overlaps a settings reload. Same at :833. Suggest reading through referencesInCommitmentBranches().

Comment thread db/state/erigondb_settings.go Outdated
Comment thread db/integrity/commitment_integrity.go Outdated
Comment thread cmd/commitment-regime/main.go Outdated
- Consolidate the len!=20/52 shortened-key check into commitment.BranchData.HasShortenedKeys; callers in state, integrity and cmd/commitment-regime delegate to it.
- Guard the two remaining RebuildCommitmentFiles flag reads with the referencesInCommitmentBranches accessor (-race).
- RefsInCommitmentBranches resolves nil to config3.DefaultReferencesInCommitmentBranches; drop the now-redundant resolve-time normalization.
- Integrity skip log: 'no shortened keys found (full scan)' instead of 'sampled as plain'.
@awskii

awskii commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

Race-test — data-race fix for the runtime commitment-refs flag (intermediate)

Context: this PR makes references_in_commitment_branches flippable at runtime. ReloadErigonDBSettings (snapshots stage) writes it while the background merge loop reads it (findMergeRange / mergeFiles / the commitmentValTransformDomain vt closure) — an unsynchronized read/write that only surfaces once the flag actually flips (the plain-snapshot-consumer path). Fixed by guarding the field with a mutex + accessor.

Unit (-race): TestReferencesInCommitmentBranchesConcurrent — without the lock it reports WARNING: DATA RACE (write in applyReferencesInCommitmentBranches vs read in referencesInCommitmentBranches); with it, passes.

Live soak — sepolia archive, go build -race, references_in_commitment_branches = false on a v1.1-referenced datadir:

  • Startup + the flag-write (ReloadErigonDBSettings) + caplin CL catch-up + live execution at tip: 0 DATA RACE.
  • Then seg rm-state (drop latest ~2 steps) + integration stage_exec --reset + restart under -race --no-downloader: re-executing the trimmed gap and rebuilding plain v2.1 commitment files. ~5 h in, ~block 11.089M climbing back to tip (~1–2 blk/s, race-instrumented). 0 DATA RACE, 0 crit / root-mismatch / panic.

Intermediate — the re-exec is still in progress under -race; will update once it reaches tip and resumes live-follow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants