Epic: Tiered backtest storage (rewrite the storage layer)
Replace the single-file .iafbt storage primitive with a tiered backend (relational index + columnar Parquet bulk + content-addressed chunks). Demote .iafbt to a deterministic export format. Keep the OSS path local-first and self-contained.
Full design: docs/design/tiered-backtest-storage.md
Companion specs: docs/design/bundle-format-v2.md, docs/design/ohlcv-dedup-protocol.md
Why now
Empirical measurements on a real production-shape archive (12,500 bundles ≈ 64 GB, 10 May 2026) showed that per-file format work has hit its ceiling:
| Configuration | Per bundle | Total | Notes |
|---|---|---|---|
| v1, zstd 7 (pre-v8.9) | 569 KB | 64.0 GB | baseline |
| v2 + zstd 19 (v8.9, shipped) | 489 KB | ~55 GB | per-file ceiling |
| Tiered store + content-addressed dedup | n/a (decomposed) | < 20 GB projected | this epic |
Two structural problems remain after v8.9:
- Per-file compression has hit its ceiling. zstd at level 22 saturates at the same size as level 19, so within a single .iafbt there is no headroom left.
- A single .iafbt file is the wrong primitive for two of the three real workloads: listing/ranking and cross-run analytics. Decoding 12,500 zstd payloads to read 50 scalar metrics each is a multi-minute loop instead of a 50 ms SQL query.
Cross-bundle redundancy (strategy params, symbol metadata, OHLCV slices, recurring trade patterns) is the dominant unexploited source of size, and zstd cannot see across files. The fix lives one layer up.
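To make the cross-bundle argument concrete, here is a small illustration. It is a sketch under stated assumptions, not a measurement. zlib stands in for zstd so that only the standard library is needed. The "bundles" are synthetic random bytes. Each bundle is assumed to be already decomposed into an OHLCV component shared by every run and a small run-specific component, which mirrors the Tier 3 layout rather than chunking at arbitrary byte offsets.

```python
import hashlib
import random
import zlib

# Synthetic stand-ins (hypothetical): 100 runs over the same 64 KiB OHLCV slice,
# each with 2 KiB of run-specific data. Random bytes are incompressible, so any
# savings below come from deduplication, not from zlib itself.
rng = random.Random(0)
shared_ohlcv = rng.randbytes(64 * 1024)
bundles = [(shared_ohlcv, rng.randbytes(2 * 1024)) for _ in range(100)]

# Per-file compression: each bundle is compressed on its own, so the shared
# slice is paid for once per file. zlib stands in for zstd here.
per_file_total = sum(len(zlib.compress(ohlcv + run_data, 9)) for ohlcv, run_data in bundles)

# Content-addressed storage: key each component by SHA-256 and keep one copy per key.
chunks: dict[str, bytes] = {}
for components in bundles:
    for part in components:
        key = hashlib.sha256(part).hexdigest()
        if key not in chunks:
            chunks[key] = zlib.compress(part, 9)
dedup_total = sum(len(blob) for blob in chunks.values())

print(f"per-file: {per_file_total:,} bytes; content-addressed: {dedup_total:,} bytes")
```

The shared slice is stored once instead of 100 times, so the content-addressed total is a small fraction of the per-file total. Real savings depend on how much content actually repeats across runs. The < 20 GB figure in the table is the design's projection, not something this toy reproduces.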
Architecture (target)
┌─────────────────────────────────────────────────────────────────┐
│ Tier 1 — Index (SQLite locally, Postgres remotely) │
│ One row per backtest run. Scalar metrics + provenance + refs. │
│ Indexed for ranking, filtering, sweep navigation. │
├─────────────────────────────────────────────────────────────────┤
│ Tier 2 — Columnar bulk (Parquet on local disk or object store)│
│ Per-project Parquet datasets, partitioned by run_id: │
│ portfolio_snapshots/ trades/ orders/ metric_series/ │
├─────────────────────────────────────────────────────────────────┤
│ Tier 3 — Content-addressed chunks │
│ SHA-256-keyed, immutable: │
│ ohlcv/ code/ params/ symbols/ metric-series-blobs/ │
└─────────────────────────────────────────────────────────────────┘
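Below is a minimal sketch of the Tier 3 contract. It assumes an in-memory dict in place of the on-disk chunks/ directory. Keys are SHA-256 hex digests of the content, writes are idempotent, and reads re-hash the bytes to enforce immutability. The class name ChunkStore and the row fields are hypothetical. They are chosen only to show a Tier 1 row that points to chunks.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory Tier 3 store: immutable blobs keyed by their SHA-256."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Idempotent: identical content always maps to the same key and is stored once.
        if key not in self._blobs:
            self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Content addressing makes corruption detectable: the bytes must hash back to the key.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"chunk {key} failed its integrity check")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


chunk_store = ChunkStore()
ohlcv_key = chunk_store.put(b"illustrative OHLCV slice")
# A second run over the same data produces the same key, so nothing new is stored.
assert chunk_store.put(b"illustrative OHLCV slice") == ohlcv_key

# A Tier 1 row holds scalars plus references; the bulk lives in the lower tiers.
row = {"run_id": "run-0001", "sharpe": 1.42, "ohlcv_ref": ohlcv_key}
```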
.iafbt v2 stays — but as backtest.export("x.iafbt") / Backtest.import_("x.iafbt"), deterministic and round-trippable through any store. It is no longer the storage primitive.
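The property that matters for export is determinism. The same logical backtest must yield the same bytes regardless of which store it came from or the order its parts were written. The sketch below shows that property with a hypothetical JSON container. It is not the .iafbt v2 layout, which is specified in docs/design/bundle-format-v2.md. The names export_bundle and import_bundle are illustrative, not the framework's API.

```python
import base64
import hashlib
import json


def export_bundle(row: dict, chunks: dict[str, bytes]) -> bytes:
    """Serialize a decomposed backtest into one canonical byte string (hypothetical container)."""
    doc = {
        "row": row,
        "chunks": {key: base64.b64encode(data).decode("ascii") for key, data in chunks.items()},
    }
    # sort_keys orders both row fields and chunk keys, and fixed separators remove
    # whitespace variation, so insertion order never leaks into the output.
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def import_bundle(blob: bytes) -> tuple[dict, dict[str, bytes]]:
    """Decompose an exported blob, verifying every chunk against its SHA-256 key."""
    doc = json.loads(blob)
    chunks = {key: base64.b64decode(text) for key, text in doc["chunks"].items()}
    for key, data in chunks.items():
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"chunk {key} does not match its content hash")
    return doc["row"], chunks


payload = b"illustrative OHLCV slice"
key = hashlib.sha256(payload).hexdigest()
first = export_bundle({"run_id": "run-0001", "sharpe": 1.42}, {key: payload})
second = export_bundle({"sharpe": 1.42, "run_id": "run-0001"}, {key: payload})
assert first == second  # field order does not change the bytes
row, chunks = import_bundle(first)
assert export_bundle(row, chunks) == first  # export -> import -> export round-trips
```

A real writer also records a writer timestamp, which is why byte-identity is only expected modulo that field. This toy container has none.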
Phases
This epic ships in three phases, each independently mergeable. Each phase is purely additive until phase 3, which deprecates (does not remove) the directory-of-.iafbt default.
Phase 1 — BacktestSummary DTO + scalar-summary read path
Goal: make "list / rank by Sharpe over many bundles" cheap on the existing on-disk shape, without writing anything new.
Risk: low. Pure additive read path. No write changes.
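Here is a minimal sketch of the Phase 1 DTO, assuming only a handful of columns. The field names below are hypothetical stand-ins for the ~50 scalar metrics, provenance and config columns that design doc §3.1 defines. The sketch shows the two properties the phase relies on: a frozen (immutable) schema and a deterministic mapping to a flat SQL row. In the framework an instance would come from Backtest.scalar_summary(); here it is built by hand.

```python
from dataclasses import FrozenInstanceError, astuple, dataclass, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class BacktestSummary:
    """Illustrative subset of a Tier 1 row. Field names are hypothetical; the
    authoritative ~50 columns are defined in design doc §3.1."""

    bundle_id: str
    created_at: datetime  # stored typed, never as a (value, ISO-string) pair
    sharpe: float
    sortino: float
    max_drawdown: float
    tag: Optional[str] = None

    @classmethod
    def columns(cls) -> tuple[str, ...]:
        # Column order for the SQL row is the dataclass field order.
        return tuple(f.name for f in fields(cls))

    def to_row(self) -> tuple:
        # Flatten to SQL parameters; datetimes become ISO-8601 text only at this boundary.
        return tuple(v.isoformat() if isinstance(v, datetime) else v for v in astuple(self))


summary = BacktestSummary(
    bundle_id="run-0001",
    created_at=datetime(2026, 5, 10, tzinfo=timezone.utc),
    sharpe=1.42,
    sortino=1.9,
    max_drawdown=-0.12,
    tag="sweep_a",
)
try:
    summary.sharpe = 2.0
except FrozenInstanceError:
    print("BacktestSummary is immutable")
```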
Phase 2 — iaf index CLI + SQLite index
Goal: turn a folder of .iafbt files into a queryable archive without changing how they're written.
Risk: low. SQLite is a derived view — delete and rebuild is always safe.
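The derived-view idea can be sketched with Python's built-in sqlite3 module. This is not the iaf CLI's implementation. The table name runs, its four columns and the rank() helper are hypothetical, and they cover far fewer columns than the real BacktestSummary schema. The two queries mirror iaf list --sort sharpe --limit 20 and iaf rank --by sortino --filter 'tag LIKE "sweep_%"'.

```python
import sqlite3
from typing import Iterable, Optional

# Hypothetical minimal schema; the real column set is the BacktestSummary contract.
SCHEMA = """
CREATE TABLE runs (
    bundle_id TEXT PRIMARY KEY,
    tag       TEXT,
    sharpe    REAL,
    sortino   REAL
);
CREATE INDEX runs_by_sharpe ON runs (sharpe);
"""
RANKABLE = {"sharpe", "sortino"}  # whitelist: column names cannot be bound as parameters


def rebuild_index(conn: sqlite3.Connection, rows: Iterable[tuple]) -> None:
    # The index is a derived view, so dropping and recreating it is always safe.
    conn.executescript("DROP TABLE IF EXISTS runs;" + SCHEMA)
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def rank(conn: sqlite3.Connection, metric: str, limit: int = 20,
         tag_like: Optional[str] = None) -> list[tuple]:
    if metric not in RANKABLE:
        raise ValueError(f"cannot rank by {metric!r}")
    sql, params = f"SELECT bundle_id, {metric} FROM runs", []
    if tag_like is not None:
        sql += " WHERE tag LIKE ?"
        params.append(tag_like)
    sql += f" ORDER BY {metric} DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
rebuild_index(conn, [
    ("run-0001", "sweep_a", 1.1, 1.5),
    ("run-0002", "baseline", 2.0, 2.5),
    ("run-0003", "sweep_b", 0.7, 1.9),
])
print(rank(conn, "sharpe", limit=2))
print(rank(conn, "sortino", tag_like="sweep_%"))
```

The tag filter is bound as a parameter rather than spliced into the SQL string. Only the ranking column is interpolated, and it is checked against a whitelist first.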
Phase 3 — BacktestStore abstraction + LocalTieredStore
Goal: introduce the store interface so the framework can write into multiple backends; ship the local tiered backend; demote .iafbt to export format.
Risk: medium. Touches every backtest service constructor. Mitigated by keeping LocalDirStore the default for at least one minor release with a deprecation warning when the old kwargs are used.
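Below is a hedged sketch of the phase 3 shape. Only the method names (put, get, list, delete, export, import_) and the store= kwarg come from the epic. Every signature, the InMemoryStore backend, the toy export container, migrate() and BacktestService are hypothetical. InMemoryStore stands in for the default that the epic assigns to LocalDirStore. The migrator shows one possible approach, round-tripping through the export format, not how iaf migrate-store is actually built.

```python
from __future__ import annotations

import warnings
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class BacktestStore(Protocol):
    """Hypothetical signatures; only the method names come from the epic."""

    def put(self, bundle_id: str, payload: bytes) -> None: ...
    def get(self, bundle_id: str) -> bytes: ...
    def list(self) -> list[str]: ...
    def delete(self, bundle_id: str) -> None: ...
    def export(self, bundle_id: str) -> bytes: ...
    def import_(self, blob: bytes) -> str: ...


class InMemoryStore:
    """Stand-in backend (hypothetical); LocalDirStore / LocalTieredStore would persist to disk."""

    def __init__(self) -> None:
        self._bundles: dict[str, bytes] = {}

    def put(self, bundle_id: str, payload: bytes) -> None:
        self._bundles[bundle_id] = payload

    def get(self, bundle_id: str) -> bytes:
        return self._bundles[bundle_id]

    def list(self) -> list[str]:
        return sorted(self._bundles)

    def delete(self, bundle_id: str) -> None:
        self._bundles.pop(bundle_id, None)

    def export(self, bundle_id: str) -> bytes:
        # Toy container "<bundle_id>\n<payload>"; the real export format is .iafbt v2.
        return bundle_id.encode("utf-8") + b"\n" + self._bundles[bundle_id]

    def import_(self, blob: bytes) -> str:
        raw_id, _, payload = blob.partition(b"\n")
        bundle_id = raw_id.decode("utf-8")
        # Idempotent on bundle_id: importing the same blob twice leaves one entry.
        self.put(bundle_id, payload)
        return bundle_id


def migrate(src: BacktestStore, dst: BacktestStore) -> int:
    """One possible one-shot migrator: round-trip every bundle through the export format."""
    ids = src.list()
    for bundle_id in ids:
        dst.import_(src.export(bundle_id))
    return len(ids)


class BacktestService:
    """Hypothetical service constructor showing the store= kwarg and the deprecation path."""

    def __init__(self, store: Optional[BacktestStore] = None, **legacy_kwargs: object) -> None:
        if legacy_kwargs:
            warnings.warn(
                f"{sorted(legacy_kwargs)} are deprecated; pass store= instead",
                DeprecationWarning,
                stacklevel=2,
            )
        # The epic keeps LocalDirStore as the default; InMemoryStore stands in for it here.
        self.store = store if store is not None else InMemoryStore()
```

Routing the migrator through export and import_ means it relies only on the shared contract. Any pair of backends that implements the Protocol can then be migrated in either direction.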
Wire format note
The .iafbt v2 produced by phase 0 (already shipped in v8.9) is the only wire format the framework reads or writes for the archive-as-file use case. v1 is readable indefinitely; the v8.9 writer never emits v1.
A remote store implementation (Finterion-side, closed-source, not part of this epic) will negotiate-then-upload chunks via the protocol in docs/design/ohlcv-dedup-protocol.md generalized to all chunk types. The BacktestStore interface from phase 3 is the OSS-side contract that makes that possible.
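Whatever its wire details, a negotiate-then-upload exchange rests on one idea: the client never sends bytes the server already holds. The sketch below shows that two-step exchange in process. The names FakeRemote, missing, upload and push are hypothetical. Message shapes, batching and authentication are whatever docs/design/ohlcv-dedup-protocol.md specifies, and the real remote side is closed-source.

```python
import hashlib


class FakeRemote:
    """Hypothetical stand-in for the remote side; the real server is closed-source."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def missing(self, keys: list[str]) -> list[str]:
        # Negotiate: the client sends only keys; the server answers with the ones it lacks.
        return sorted(k for k in set(keys) if k not in self._blobs)

    def upload(self, key: str, data: bytes) -> None:
        # Never trust a client-supplied key: recompute it from the content.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"content does not hash to {key}")
        self._blobs[key] = data


def push(remote: FakeRemote, blobs: list[bytes]) -> int:
    """Upload only the chunks the remote lacks; returns how many were sent."""
    by_key = {hashlib.sha256(b).hexdigest(): b for b in blobs}
    todo = remote.missing(list(by_key))
    for key in todo:
        remote.upload(key, by_key[key])
    return len(todo)


remote = FakeRemote()
print(push(remote, [b"ohlcv slice", b"strategy params"]))  # first run: both chunks sent
print(push(remote, [b"ohlcv slice", b"trades for run 2"]))  # second run: only the new chunk
```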
Non-goals
- Per-bundle Parquet for everything (measured net-negative on real data — see design doc §9)
- Custom binary column format (Parquet is solved; leverage it)
- Lossy snapshot/trade compression (user's data, hands off)
- Schema-on-read JSON anywhere (the original (value, ISO-string) mistake)
- Removing LocalDirStore (it stays the default for at least one minor cycle after phase 3)
- Cross-tenant or cross-project dedup at the OSS layer (a remote store's concern)
- Remote / cloud store implementation (closed-source, separate)
Headline
Today's design treats a backtest as a file. The future design treats a backtest as a row that points to chunks, with the file as one of several possible views.
That single shift turns the 64 GB problem into the 20 GB problem, makes "list 12,500 backtests sorted by Sharpe" a 50 ms query instead of a 30-minute decode loop, and unlocks DuckDB/Polars analytics over the entire archive without writing any new code.
Phase deliverables
Phase 1 (BacktestSummary DTO + scalar-summary read path):
- BacktestSummary dataclass: frozen schema for the ~50 scalar metrics + provenance + config columns from design doc §3.1.
- Backtest.scalar_summary() -> BacktestSummary: readable from a v2 bundle without decoding Parquet metric blobs (already supported by summary_only=True; this wraps it in a typed return).
- BacktestSummary → SQL row payload.
- BacktestSummary schema as the authoritative Tier 1 row contract.
Phase 2 (iaf index CLI + SQLite index):
- iaf index <dir> walks .iafbt files (recursively) and populates <dir>/index.sqlite with one row per run from scalar_summary().
- … bundle_id + mtime).
- iaf list <index.sqlite> --sort sharpe --limit 20
- iaf rank <index.sqlite> --by sortino --filter 'tag LIKE "sweep_%"'
- iaf index builds a SQLite index from examples/batch_one/ in < 5 s; iaf list --sort sharpe --limit 20 returns in < 100 ms over a 12,500-row index.
Phase 3 (BacktestStore abstraction + LocalTieredStore):
- BacktestStore Protocol with: put, get, list, delete, export, import_.
- LocalDirStore: wraps the current save_bundle / open_bundle. Default for backwards compatibility.
- LocalTieredStore:
  - index.sqlite (schema = BacktestSummary from phase 1)
  - chunks/ directory per design doc §3.3
- store= kwarg, default LocalDirStore.
- iaf migrate-store --from local-dir --to local-tiered <src> <dst> one-shot migrator.
- backtest.export("x.iafbt") reassembles a v2 bundle deterministically from any store.
- Backtest.import_("x.iafbt") decomposes into the configured store (idempotent on bundle_id).
- LocalTieredStore.list(sort='sharpe', limit=20) returns in < 100 ms; the round trip LocalDirStore → export → LocalTieredStore.import_ → export is byte-identical (modulo writer timestamp).
Tracking
Replaces (closed as superseded):
- #538 v8.10: read-side index + scalar-summary access (tiered storage phase 1)
- #539 v8.11: BacktestStore abstraction + LocalTieredStore (tiered storage phase 2)
Why one epic, not three issues: the three phases share schemas, naming, and tests, and shipping them as a single coherent epic makes the deprecation path for .iafbt-as-storage one decision rather than three. Each phase is still independently mergeable as separate PRs against this epic.