txhash: cold-store streamhash index — build + read (#728) #780

Draft
tamirms wants to merge 1 commit into events-eventstore-core from txhash-cold-store-728

@tamirms tamirms commented Jun 8, 2026

Contributor

Part of #728 — implements the buildable slice: the cold (immutable-file) half of the txhash store.

Base: stacked on events-eventstore-core (#756) for github.com/stellar/streamhash. Retarget to feature/full-history once #756 lands.

What this adds

A per-index streamhash MPHF over (txhash, ledgerSeq) and a ColdReader that resolves a tx hash to the ledger it was committed in.

  • cold_format.go — per-index on-disk format: 3-byte ledger-seq payload (offset from a 4-byte MinLedger anchor), 1-byte fingerprint. ColdReader.Lookup(hash) -> ledgerSeq, ErrNotFound on miss, idempotent mmap Close; rejects a payload width ≠ ColdPayloadSize at open.
  • cold_index.go + cold_merge.go — BuildColdIndex merges per-chunk sorted .bin files into one streamhash index via a parallel O_DIRECT fan-in merge (ported from streamhash cmd/bench), fed single-pass into the sorted-mode builder so I/O, merge CPU, and the MPHF build overlap. Header/size + payload-budget guards, ctx cancellation, first-error-wins pipeline.
  • odirect_{linux,other}.go — O_DIRECT page-cache bypass on Linux (cached-open fallback on tmpfs/EINVAL; no-op elsewhere).
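To make the payload layout concrete, here is a minimal sketch of how a ledger sequence could be packed as a 3-byte offset from the MinLedger anchor and unpacked again. The names `encodeLedger`, `decodeLedger`, and `coldPayloadSize` are hypothetical, and the little-endian byte order is an assumption; the actual on-disk layout in cold_format.go may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// coldPayloadSize is assumed to mirror ColdPayloadSize (3 bytes).
const coldPayloadSize = 3

var errOutOfRange = errors.New("ledger outside the 3-byte offset range")

// encodeLedger packs ledgerSeq as a 3-byte offset from minLedger.
// Little-endian order is an assumption made for this sketch.
func encodeLedger(minLedger, ledgerSeq uint32) ([coldPayloadSize]byte, error) {
	var out [coldPayloadSize]byte
	if ledgerSeq < minLedger {
		return out, errOutOfRange
	}
	off := ledgerSeq - minLedger
	if off >= 1<<24 { // the offset must fit in 24 bits
		return out, errOutOfRange
	}
	out[0] = byte(off)
	out[1] = byte(off >> 8)
	out[2] = byte(off >> 16)
	return out, nil
}

// decodeLedger reverses encodeLedger using the same anchor.
func decodeLedger(minLedger uint32, p [coldPayloadSize]byte) uint32 {
	off := uint32(p[0]) | uint32(p[1])<<8 | uint32(p[2])<<16
	return minLedger + off
}

func main() {
	p, err := encodeLedger(1000, 1005)
	if err != nil {
		panic(err)
	}
	fmt.Println(p, decodeLedger(1000, p))
}
```

With a 3-byte payload, one index can span at most 2^24 ledgers above its anchor, which is the kind of limit a payload-budget guard would have to enforce at build time.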

Tuning (benchmark-driven: warm macOS + cold Linux NVMe over 382M real keys)

  • streamhash block-build workers default to NumCPU/2 (the end-to-end gate; ~2.7× over serial).
  • merge leaves capped at NumCPU/2 (not a fixed 32): NumCPU/2 leaves + NumCPU/2 builders fill the cores without oversubscription — ~+18% e2e on cold NVMe vs NumCPU, neutral on warm.
  • mergeBatchSize 16384 (~+5–7% e2e).
  • k1 merge tiebreak kept: k0-only was ~8% faster but the index is built rarely/offline, and k1 keeps byte-reproducibility resting only on streamhash being deterministic-given-input.
  • Sorted k-way merge kept over NewUnsortedBuilder (measured 1.7–6× faster for the pre-sorted inputs); loser-tree / 4-ary heap / wider fan-in rejected with cold data.
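The k0/k1 trade-off can be illustrated with two comparators. This is a sketch only: it assumes k0 and k1 are the first two 8-byte big-endian words of the tx hash, and `mergeKey`, `keyOf`, `lessK0Only`, and `lessK0K1` are hypothetical names, not the PR's identifiers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// mergeKey holds the two words the merge compares on (assumed layout).
type mergeKey struct {
	k0, k1 uint64
}

// keyOf derives the merge key from a 32-byte tx hash.
func keyOf(hash [32]byte) mergeKey {
	return mergeKey{
		k0: binary.BigEndian.Uint64(hash[0:8]),
		k1: binary.BigEndian.Uint64(hash[8:16]),
	}
}

// lessK0Only orders by prefix only; keys sharing k0 compare as equal,
// so their relative order depends on which input delivered them first.
func lessK0Only(a, b mergeKey) bool { return a.k0 < b.k0 }

// lessK0K1 breaks k0 ties on k1, giving same-prefix keys a fixed order.
func lessK0K1(a, b mergeKey) bool {
	if a.k0 != b.k0 {
		return a.k0 < b.k0
	}
	return a.k1 < b.k1
}

func main() {
	var h1, h2 [32]byte
	h1[15] = 2 // same k0 (zero), k1 = 2
	h2[15] = 1 // same k0 (zero), k1 = 1
	a, b := keyOf(h1), keyOf(h2)
	fmt.Println(lessK0Only(a, b), lessK0Only(b, a)) // false false
	fmt.Println(lessK0K1(a, b), lessK0K1(b, a))     // false true
}
```

Under the k0-only comparator the two keys are unordered relative to each other, so the merged stream's order for same-prefix keys is not pinned down by the comparator; the k1 tiebreak removes that freedom at the cost of an extra comparison.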

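Out of scope (blocked on unbuilt deps)

  • production build wiring to the cold txhash ingester (#765)
  • the getTransaction read assembly over the tx-details-by-hash view (#764) + cold ledger reader (#725)

The .bin input format is the documented seam for #765.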

Tests / benchmarks

Build/query round-trip, miss, concurrent reads, fan-in tree, large-file refill, and error/format guards. Benchmarks (warm + skip-guarded real-data) document and reproduce the tuning choices. go test -race + golangci-lint clean.

🤖 Generated with Claude Code

Implements the buildable slice of #728: the cold (immutable-file) half of
the txhash store — a per-index streamhash MPHF over (txhash, ledgerSeq) and
a ColdReader that resolves a tx hash to the ledger it was committed in.

- cold_format.go: per-index format — 3-byte ledger-seq payload (offset from
  a 4-byte MinLedger anchor), 1-byte fingerprint; ColdReader.Lookup(hash) ->
  ledgerSeq, ErrNotFound on miss, idempotent mmap Close. Rejects a payload
  width other than ColdPayloadSize at open.
- cold_index.go + cold_merge.go: BuildColdIndex merges per-chunk sorted .bin
  files into one streamhash index via a parallel O_DIRECT fan-in merge
  (ported from streamhash cmd/bench), fed single-pass into the sorted-mode
  builder so I/O, merge CPU, and the MPHF build overlap. Header/size and
  payload-budget guards; ctx cancellation; first-error-wins pipeline.
- odirect_{linux,other}.go: O_DIRECT page-cache bypass on Linux, cached-open
  fallback on tmpfs/EINVAL, no-op elsewhere.
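For readers unfamiliar with the fan-in shape, the following is a simplified sketch of a parallel fan-in merge: each pre-sorted input gets a leaf goroutine, and pairs of streams are joined by two-way merge goroutines until one sorted stream remains. It is not the PR's code. It merges in-memory `uint64` slices instead of O_DIRECT file reads, sends single keys rather than batches, and omits ctx cancellation and error propagation. All function names are hypothetical.

```go
package main

import "fmt"

// leaf streams one pre-sorted input over a channel.
func leaf(keys []uint64) <-chan uint64 {
	out := make(chan uint64, 64)
	go func() {
		defer close(out)
		for _, k := range keys {
			out <- k
		}
	}()
	return out
}

// merge2 merges two sorted streams into one sorted stream.
func merge2(a, b <-chan uint64) <-chan uint64 {
	out := make(chan uint64, 64)
	go func() {
		defer close(out)
		x, okA := <-a
		y, okB := <-b
		for okA && okB {
			if x <= y {
				out <- x
				x, okA = <-a
			} else {
				out <- y
				y, okB = <-b
			}
		}
		for okA { // drain whatever is left of a
			out <- x
			x, okA = <-a
		}
		for okB { // drain whatever is left of b
			out <- y
			y, okB = <-b
		}
	}()
	return out
}

// fanIn pairs streams level by level until a single stream remains.
func fanIn(streams []<-chan uint64) <-chan uint64 {
	if len(streams) == 0 {
		c := make(chan uint64)
		close(c)
		return c
	}
	for len(streams) > 1 {
		var next []<-chan uint64
		for i := 0; i+1 < len(streams); i += 2 {
			next = append(next, merge2(streams[i], streams[i+1]))
		}
		if len(streams)%2 == 1 { // odd stream passes through to the next level
			next = append(next, streams[len(streams)-1])
		}
		streams = next
	}
	return streams[0]
}

// mergeSorted collects the merged stream; a real build would feed the
// stream straight into the index builder instead of buffering it.
func mergeSorted(inputs [][]uint64) []uint64 {
	streams := make([]<-chan uint64, 0, len(inputs))
	for _, in := range inputs {
		streams = append(streams, leaf(in))
	}
	var out []uint64
	for k := range fanIn(streams) {
		out = append(out, k)
	}
	return out
}

func main() {
	fmt.Println(mergeSorted([][]uint64{{1, 4, 9}, {2, 3, 10}, {5}}))
}
```

Because every leaf and every merge node runs in its own goroutine, reading, merging, and consuming can proceed concurrently, which is the overlap the single-pass design relies on.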

Benchmark-driven tuning (warm macOS + cold Linux NVMe over 382M real keys):
- streamhash block-build workers default to NumCPU/2 (the e2e gate; ~2.7x
  over serial, saturates at NumCPU/2).
- merge leaves capped at NumCPU/2 (not a fixed 32): NumCPU/2 leaves +
  NumCPU/2 builders fill the cores without oversubscription; ~+18% e2e cold
  vs NumCPU, neutral warm.
- mergeBatchSize 16384 (~+5-7% e2e, fewer channel hand-offs).
- k1 merge tiebreak kept: keying on k0 only was ~8% faster but the index is
  built rarely/offline, and the tiebreak makes byte-reproducibility rest on
  streamhash being deterministic-given-input rather than insensitive to the
  within-block order of same-prefix keys.
- Sorted k-way merge kept over NewUnsortedBuilder (measured 1.7-6x faster for
  the pre-sorted inputs); loser-tree / 4-ary heap / wider fan-in rejected
  with cold data.
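The effect of mergeBatchSize on channel traffic can be seen in a small sketch: sending keys in slices of up to `batchSize` cuts the number of channel hand-offs by roughly that factor. `sendBatched` and `handOffs` are hypothetical helpers for illustration, not functions from the PR.

```go
package main

import "fmt"

// sendBatched streams keys in slices of up to batchSize, closes the
// channel, and returns the number of channel sends it performed.
func sendBatched(keys []uint64, batchSize int, out chan<- []uint64) int {
	if batchSize < 1 {
		batchSize = 1
	}
	sends := 0
	for start := 0; start < len(keys); start += batchSize {
		end := start + batchSize
		if end > len(keys) {
			end = len(keys)
		}
		out <- keys[start:end]
		sends++
	}
	close(out)
	return sends
}

// handOffs pushes n keys through a channel and reports how many sends
// were needed and how many keys the consumer received.
func handOffs(n, batchSize int) (sends, received int) {
	keys := make([]uint64, n)
	out := make(chan []uint64, 4)
	done := make(chan int)
	go func() {
		total := 0
		for batch := range out {
			total += len(batch)
		}
		done <- total
	}()
	sends = sendBatched(keys, batchSize, out)
	received = <-done
	return sends, received
}

func main() {
	fmt.Println(handOffs(100000, 1))     // 100000 100000
	fmt.Println(handOffs(100000, 16384)) // 7 100000
}
```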

Out of scope (blocked on unbuilt deps): production build wiring to the cold
txhash ingester (#765) and the getTransaction read assembly over the
tx-details-by-hash view (#764) + cold ledger reader (#725). The .bin input
format is the documented seam for #765.

Tests cover build/query round-trip, miss, concurrent reads, fan-in tree,
large-file refill, and error/format guards. Benchmarks (warm + skip-guarded
real-data) document and reproduce the tuning choices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>