perf(store): shrink the per-key object - box ValueRepr variants (mem rounds 1-2)#262
Merged
Conversation
…ey slot (round 1) Memory optimization toward beating redis 8.8.0. ValueRepr was 72 bytes, sized for its largest variants (InlineBuf 45 and ZSetVal 64), so every string/int key reserved ~56 B it never used. Box the four collection variants (List/Hash/Set/ZSet -> Box<...>), keeping Int/Inline/Raw unboxed so the embstr SSO and the string/int hot path are untouched. Measured (sizeof): ValueRepr 72->48, KvObj 112->88, table slot 128->104. Measured (head-to-head vs redis 8.8.0, 300k keys, 128B values): bytes-per-key 526.7 -> 421.86 (-20%; gap 2.41x -> 1.93x), and qps 71.4k -> 77.9k (+9%, the smaller slot improves table cache density). Zero behavior change; whole-workspace tests green. See docs/bench/OPTIMIZATION_LOG.md round 1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zeke <ezequiel.lares@outlook.com>
…ound 2) Round 2 toward beating redis 8.8.0. Inline(InlineBuf) (a 45 B in-object SSO buffer) -> Inline(Box<[u8]>), dropping ValueRepr 48->24, KvObj 88->64, table slot 104->80. Allocation-parity with redis (which also heap-allocates the object). Measured (head-to-head vs redis 8.8.0, 300k keys): 128B values bytes-per-key 421.86 -> 386.85 (gap 1.93x -> 1.77x), qps steady ~77.6k. Table slack per key 146.8 -> 125.8. Zero behavior change; whole-workspace tests green. InlineBuf removed. Logged the key structural finding: the small-value gap (32B: 291 vs 101 = 2.88x) is dominated by IronCache's ~3 allocations per key + key duplication, which safe field-shrinks cannot close. The next lever is a single-allocation blob entry in a key-dedup table (see docs/bench/OPTIMIZATION_LOG.md). Round 2 keeps the safe wins banked while that larger rewrite is scoped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zeke <ezequiel.lares@outlook.com>
… (round 3 plan) Research (redis 8.2 kvobj, valkey 8.0/8.1, Dragonfly Dashtable, hashbrown HashTable, SwissTable/Dash/MemC3/F14) confirms the lever for the small-value gap and a SAFE Rust path: hashbrown::HashTable<Entry> with key-from-blob hash/eq closures (no key duplication) + a thin-pointer single-allocation entry ([header|key|value]). Logged in docs/bench/OPTIMIZATION_LOG.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zeke <ezequiel.lares@outlook.com>
perf-gate (A5)Same-runner ratchet of HEAD against the merge-base (both rebuilt and measured in this job).
Overall: PASS
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
First two rounds of the campaign to beat redis 8.8.0 on memory (and not lose speed). Pure layout shrinks of the per-key object, zero behavior change.
ValueReprvariants (List/Hash/Set/ZSet -> Box). ValueRepr 72->48, KvObj 112->88, slot 128->104. SSO preserved.Inline(InlineBuf)->Inline(Box<[u8]>)). ValueRepr 48->24, KvObj 88->64, slot 104->80. Allocation-parity with redis.Measured vs redis 8.8.0 (head-to-head, 300k keys, 128B values): bytes-per-key 526.7 -> 386.85 (gap 2.41x -> 1.77x), qps 71.4k -> 77.9k (+9%, smaller slot = better table cache density), tail latency still a big win. Whole-workspace tests green;
#![forbid(unsafe_code)]intact.The perf-gate (A5) runs on this PR and should show the improvement (bytes-per-key fell).
docs/bench/OPTIMIZATION_LOG.mdis the running tally and records the validated next lever: a single-allocation blob entry in a key-deduphashbrown::HashTable(the small-value gap, 2.88x at 32B, is structural and needs that).🤖 Generated with Claude Code