perf(store): shrink the per-key object - box ValueRepr variants (mem rounds 1-2) by ELares · Pull Request #262 · ELares/IronCache

ELares · 2026-06-16T07:54:41Z

First two rounds of the campaign to beat redis 8.8.0 on memory without losing speed. Both are pure layout shrinks of the per-key object with zero behavior change.

Round 1: box the 4 collection ValueRepr variants (List/Hash/Set/ZSet -> Box<...>). ValueRepr 72->48, KvObj 112->88, table slot 128->104 (bytes). Embstr SSO preserved.
Round 2: box the embstr inline buffer (Inline(InlineBuf) -> Inline(Box<[u8]>)). ValueRepr 48->24, KvObj 88->64, table slot 104->80 (bytes). Allocation parity with redis, which also heap-allocates the object.
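A minimal sketch of why boxing the large variants shrinks the enum. The types below are hypothetical stand-ins, not IronCache's real definitions: `InlineBuf` is assumed to be 1 length byte + 44 data bytes (45 B total) and `ZSetVal` is just a 64 B struct. On a 64-bit target with current rustc, `size_of` gives 72, 48 and 24 bytes for the three shapes, mirroring the ValueRepr numbers above; Rust does not guarantee enum layout, so treat the exact figures as what rustc produces today rather than a language rule.

```rust
use std::mem::size_of;

// Hypothetical 45 B in-object SSO buffer (assumed split: 1 len byte + 44 data bytes).
#[allow(dead_code)]
struct InlineBuf {
    len: u8,
    data: [u8; 44],
}

// Hypothetical stand-in for a large collection value: 64 bytes, align 8.
#[allow(dead_code)]
struct ZSetVal {
    words: [u64; 8],
}

// Before: everything inline, so the enum is sized by ZSetVal plus the tag (72 B).
#[allow(dead_code)]
enum ReprBefore {
    Int(i64),
    Inline(InlineBuf),
    Raw(Box<[u8]>),
    ZSet(ZSetVal),
}

// Round 1: the collection is boxed (8 B pointer); the 45 B inline buffer
// becomes the largest variant, so tag + 45 rounds up to 48 B.
#[allow(dead_code)]
enum ReprRound1 {
    Int(i64),
    Inline(InlineBuf),
    Raw(Box<[u8]>),
    ZSet(Box<ZSetVal>),
}

// Round 2: the inline buffer moves to the heap as a 16 B fat pointer,
// so the largest variant is 16 B and the enum is tag + padding + 16 = 24 B.
#[allow(dead_code)]
enum ReprRound2 {
    Int(i64),
    Inline(Box<[u8]>),
    Raw(Box<[u8]>),
    ZSet(Box<ZSetVal>),
}

/// Returns the sizes of the three hypothetical layouts, in bytes.
#[allow(dead_code)]
fn repr_sizes() -> [usize; 3] {
    [
        size_of::<ReprBefore>(),
        size_of::<ReprRound1>(),
        size_of::<ReprRound2>(),
    ]
}
```

The PR's figures show each 24 B cut carrying over one-for-one to KvObj (112->88->64) and to the table slot (128->104->80), which is consistent with ValueRepr being embedded by value in both.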

Measured vs redis 8.8.0 (head-to-head, 300k keys, 128B values): bytes-per-key 526.7 -> 386.85 (gap 2.41x -> 1.77x); qps 71.4k -> 77.9k after round 1 (+9%, because the smaller slot gives better table cache density) and steady at ~77.6k after round 2; tail latency still a big win. Whole-workspace tests green; #![forbid(unsafe_code)] intact.
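The headline ratios can be cross-checked with plain division. In this small sketch (the helper names `gap`, `implied_redis` and `pct_change` are illustrative), dividing each IronCache bytes-per-key figure by its reported gap gives the redis figure it implies. That implied value stays at about 218.5 to 218.6 B/key for the baseline, round 1 and round 2, so the three reported gaps are mutually consistent. The redis number is derived here; the PR does not report it directly.

```rust
/// Memory gap: IronCache bytes-per-key divided by redis bytes-per-key.
fn gap(ours: f64, redis: f64) -> f64 {
    ours / redis
}

/// Redis bytes-per-key implied by an IronCache figure and its reported gap.
fn implied_redis(ours: f64, reported_gap: f64) -> f64 {
    ours / reported_gap
}

/// Relative change from `before` to `after`, in percent.
fn pct_change(before: f64, after: f64) -> f64 {
    (after / before - 1.0) * 100.0
}

/// (stage, IronCache bytes/key, reported gap) at 300k keys, 128B values.
const STAGES: [(&str, f64, f64); 3] = [
    ("baseline", 526.7, 2.41),
    ("round 1", 421.86, 1.93),
    ("round 2", 386.85, 1.77),
];
```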

The perf-gate (A5) runs on this PR and should show the improvement (bytes-per-key fell). docs/bench/OPTIMIZATION_LOG.md is the running tally and records the validated next lever: a single-allocation blob entry in a key-dedup hashbrown::HashTable. The small-value gap (2.88x at 32B values) is structural and needs that rewrite.

🤖 Generated with Claude Code

…ey slot (round 1)

Memory optimization toward beating redis 8.8.0. ValueRepr was 72 bytes, sized for its largest variants (InlineBuf at 45 B and ZSetVal at 64 B), so every string/int key reserved ~56 B it never used. Box the four collection variants (List/Hash/Set/ZSet -> Box<...>), keeping Int/Inline/Raw unboxed so the embstr SSO and the string/int hot path are untouched.

Measured (sizeof): ValueRepr 72->48, KvObj 112->88, table slot 128->104.
Measured (head-to-head vs redis 8.8.0, 300k keys, 128B values): bytes-per-key 526.7 -> 421.86 (-20%; gap 2.41x -> 1.93x), and qps 71.4k -> 77.9k (+9%, the smaller slot improves table cache density).

Zero behavior change; whole-workspace tests green. See docs/bench/OPTIMIZATION_LOG.md round 1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zeke <ezequiel.lares@outlook.com>

…ound 2)

Round 2 toward beating redis 8.8.0. Inline(InlineBuf) (a 45 B in-object SSO buffer) -> Inline(Box<[u8]>), dropping ValueRepr 48->24, KvObj 88->64, table slot 104->80. Allocation parity with redis (which also heap-allocates the object).

Measured (head-to-head vs redis 8.8.0, 300k keys): at 128B values, bytes-per-key 421.86 -> 386.85 (gap 1.93x -> 1.77x), qps steady at ~77.6k. Table slack per key 146.8 -> 125.8. Zero behavior change; whole-workspace tests green. InlineBuf removed.

Logged the key structural finding: the small-value gap (32B values: 291 vs 101 bytes-per-key = 2.88x) is dominated by IronCache's ~3 allocations per key plus key duplication, which safe field shrinks cannot close. The next lever is a single-allocation blob entry in a key-dedup table (see docs/bench/OPTIMIZATION_LOG.md). Round 2 banks the safe wins while that larger rewrite is scoped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zeke <ezequiel.lares@outlook.com>

… (round 3 plan)

Research (redis 8.2 kvobj, valkey 8.0/8.1, Dragonfly Dashtable, hashbrown HashTable, SwissTable/Dash/MemC3/F14) confirms the lever for the small-value gap and a SAFE Rust path: hashbrown::HashTable<Entry> with key-from-blob hash/eq closures (no key duplication) plus a thin-pointer, single-allocation entry ([header|key|value]). Logged in docs/bench/OPTIMIZATION_LOG.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zeke <ezequiel.lares@outlook.com>
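A std-only sketch of the key-dedup, single-allocation idea, assuming a hypothetical `[u32 LE key length | key | value]` blob layout. The plan above uses `hashbrown::HashTable` with key-from-blob hash/eq closures; this sketch swaps in `std::collections::HashSet` plus a `Borrow<[u8]>` impl, which gives the same property (the table hashes and compares the key bytes stored inside the entry, so the key exists only once) without a dependency. It also uses a 16 B `Box<[u8]>` fat pointer instead of the planned thin pointer. `Entry` and `Store` are illustrative names, not IronCache types.

```rust
use std::borrow::Borrow;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// One heap allocation per key: [key_len: u32 LE | key bytes | value bytes].
/// Hypothetical layout; assumes keys shorter than 4 GiB.
struct Entry(Box<[u8]>);

impl Entry {
    fn new(key: &[u8], value: &[u8]) -> Self {
        let mut buf = Vec::with_capacity(4 + key.len() + value.len());
        buf.extend_from_slice(&(key.len() as u32).to_le_bytes());
        buf.extend_from_slice(key);
        buf.extend_from_slice(value);
        Entry(buf.into_boxed_slice())
    }

    fn key_len(&self) -> usize {
        u32::from_le_bytes(self.0[..4].try_into().unwrap()) as usize
    }

    fn key(&self) -> &[u8] {
        &self.0[4..4 + self.key_len()]
    }

    fn value(&self) -> &[u8] {
        &self.0[4 + self.key_len()..]
    }
}

// Hash and Eq look only at the key slice inside the blob, so the table
// never stores a second copy of the key.
impl Hash for Entry {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Must hash exactly like <[u8] as Hash> so Borrow-based lookups agree.
        self.key().hash(state);
    }
}

impl PartialEq for Entry {
    fn eq(&self, other: &Self) -> bool {
        self.key() == other.key()
    }
}

impl Eq for Entry {}

// Lets HashSet::get accept a plain &[u8] key.
impl Borrow<[u8]> for Entry {
    fn borrow(&self) -> &[u8] {
        self.key()
    }
}

struct Store {
    table: HashSet<Entry>,
}

impl Store {
    fn new() -> Self {
        Store { table: HashSet::new() }
    }

    fn set(&mut self, key: &[u8], value: &[u8]) {
        // replace() swaps in the new entry when the key already exists.
        self.table.replace(Entry::new(key, value));
    }

    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.table.get(key).map(Entry::value)
    }
}
```

The `Hash` impl has to delegate to the key slice exactly as written; if it hashed the whole blob, `get(&[u8])` would hash differently from the stored entries and every lookup would miss.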

github-actions · 2026-06-16T07:58:26Z

perf-gate (A5)

Same-runner ratchet of HEAD against the merge-base (both rebuilt and measured in this job).
PASS = within the noise band, WARN = a real move inside budget (does not fail), FAIL = past budget in the bad direction.
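One reading of this legend as a decision rule. This is an assumption about how the gate behaves, not its actual code: a move in the good direction is always PASS (consistent with the large bytes_per_key drops in the results table being PASS), a bad-direction move inside the noise band is PASS, a move past the band but within budget is WARN, and a move past budget is FAIL. Deterministic metrics use a band of 0. `classify`, `Better` and `Verdict` are hypothetical names.

```rust
/// Verdicts from the perf-gate legend.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Pass,
    Warn,
    Fail,
}

/// Which direction counts as an improvement for a metric.
#[derive(Clone, Copy)]
enum Better {
    Higher, // e.g. qps
    Lower,  // e.g. bytes_per_key
}

/// Classify a head-vs-base change; all arguments are percentages.
/// `band_pct` is the noise band (0.0 for deterministic metrics) and
/// `budget_pct` is the largest allowed move in the bad direction (inclusive).
fn classify(delta_pct: f64, band_pct: f64, budget_pct: f64, better: Better) -> Verdict {
    // Positive means "moved in the bad direction", negative means improved.
    let bad_move = match better {
        Better::Higher => -delta_pct,
        Better::Lower => delta_pct,
    };
    if bad_move > budget_pct {
        Verdict::Fail
    } else if bad_move > band_pct {
        Verdict::Warn
    } else {
        Verdict::Pass
    }
}
```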

| metric | base | head | delta % | noise band | budget | verdict |
|---|---|---|---|---|---|---|
| qps_median (peak) | 71103.54 | 70732.66 | -0.52% | +/-5.09% | drop <= 15% | PASS |
| bytes_per_key int | 239.99 | 156.11 | -34.95% | deterministic | rise <= 5% | PASS |
| bytes_per_key embstr | 240.09 | 172.21 | -28.27% | deterministic | rise <= 5% | PASS |
| bytes_per_key raw | 496.10 | 412.22 | -16.91% | deterministic | rise <= 5% | PASS |

Overall: PASS

qps: noisy on shared CI, so the band comes from the spread of the base repetitions (floored at 5%); a drop is only a regression past the 15% budget.
bytes_per_key: deterministic (allocator-true memmodel), so a tight 5% rise budget; any rise beyond it FAILs.
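A sketch of the qps band rule: the relative spread of the base repetitions, floored at 5%. The gate's exact spread statistic is not stated here, so this sketch assumes (max - min) / median; `noise_band_pct` is a hypothetical name. The deterministic bytes_per_key metrics skip this step and use a zero band.

```rust
/// Noise band in percent: relative spread of the base repetitions,
/// floored at `floor_pct`. Spread is ASSUMED to be (max - min) / median.
fn noise_band_pct(base_reps: &[f64], floor_pct: f64) -> f64 {
    assert!(!base_reps.is_empty(), "need at least one base repetition");
    let mut sorted = base_reps.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("samples must not be NaN"));
    let n = sorted.len();
    // Median of the base repetitions (mean of the middle two for even n).
    let median = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    };
    let spread_pct = (sorted[n - 1] - sorted[0]) / median * 100.0;
    spread_pct.max(floor_pct)
}
```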
Open-loop tails / criterion micro-benches are reported-not-failed (tail noise is high) and are not part of this ratchet.
An intentional perf trade is landed by raising the relevant budget in this PR with a documented reason (CI never auto-commits a baseline).

ELares and others added 3 commits June 16, 2026 00:28

ELares merged commit 0ea8f9b into main Jun 16, 2026
12 checks passed

ELares deleted the perf/mem-round-1 branch June 16, 2026 07:59
