perf(store): 8-byte tagged-pointer Entry slot - first CLEAR memory win vs redis 8.8.0 (round 5) by ELares · Pull Request #265 · ELares/IronCache

ELares · 2026-06-16T12:59:35Z

What

Shrink the per-shard hashbrown::HashTable<Entry> slot from 16 bytes to 8 by turning Entry from a 16-byte enum (Str(Box<[u8]>) fat pointer | Coll(Box<CollEntry>)) into a single 8-byte NonNull<u8> tagged pointer:

low bit 0 = a manually-allocated Str thin blob [u32 total_len][round-3 blob] (the length moves into the allocation, leaving a one-word pointer), align 8;
low bit 1 = Box::into_raw(Box<CollEntry>).

Both allocations are >= 2-aligned, so the low tag bit is always free. This keeps the Round-3 single-allocation-per-key + no-key-duplication win and adds the slot shrink, roughly halving table_bytes_per_key.
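A minimal sketch of the low-bit tagging idea, not the code in kvobj.rs. `Tagged`, `Coll`, `from_word` and the `Box<u64>` stand-in for the Str blob are hypothetical names chosen for illustration; only the tag layout (bit 0 clear = string side, bit 0 set = boxed collection) and the use of `map_addr` come from the description above.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

/// Low bit set = collection variant (mirrors "low bit 1" in the PR text).
const TAG_COLL: usize = 1;

/// Hypothetical stand-in for the PR's `CollEntry`.
#[derive(Debug, PartialEq)]
pub struct Coll(pub Vec<u64>);

// Both pointees must be at least 2-aligned so bit 0 of the address is free.
const _: () = assert!(align_of::<u64>() >= 2);
const _: () = assert!(align_of::<Coll>() >= 2);

/// One-word slot: an untagged `Box<u64>` (stand-in for the Str blob)
/// or a `Box<Coll>` with bit 0 set.
pub struct Tagged(NonNull<u8>);

impl Tagged {
    pub fn from_word(w: Box<u64>) -> Self {
        // SAFETY: `Box::into_raw` never returns null.
        Tagged(unsafe { NonNull::new_unchecked(Box::into_raw(w).cast::<u8>()) })
    }

    pub fn from_coll(c: Box<Coll>) -> Self {
        // `map_addr` changes the address but keeps the pointer's provenance.
        let tagged = Box::into_raw(c).cast::<u8>().map_addr(|a| a | TAG_COLL);
        // SAFETY: the Box pointer is non-null and setting a bit cannot yield zero.
        Tagged(unsafe { NonNull::new_unchecked(tagged) })
    }

    pub fn is_coll(&self) -> bool {
        (self.0.as_ptr().addr() & TAG_COLL) != 0
    }

    /// The original allocation address with the tag bit cleared.
    fn untagged(&self) -> *mut u8 {
        self.0.as_ptr().map_addr(|a| a & !TAG_COLL)
    }

    pub fn as_word(&self) -> Option<&u64> {
        if self.is_coll() {
            return None;
        }
        // SAFETY: tag 0 means the pointer came from a `Box<u64>` still owned by `self`.
        Some(unsafe { &*self.untagged().cast::<u64>() })
    }

    pub fn as_coll(&self) -> Option<&Coll> {
        if !self.is_coll() {
            return None;
        }
        // SAFETY: tag 1 means the pointer came from a `Box<Coll>` still owned by `self`.
        Some(unsafe { &*self.untagged().cast::<Coll>() })
    }
}

impl Drop for Tagged {
    fn drop(&mut self) {
        let p = self.untagged();
        // SAFETY: the tag records which Box type produced the pointer, and each
        // slot is dropped once, so the matching Box is rebuilt exactly once.
        unsafe {
            if self.is_coll() {
                drop(Box::from_raw(p.cast::<Coll>()));
            } else {
                drop(Box::from_raw(p.cast::<u64>()));
            }
        }
    }
}
```

The slot stays one machine word regardless of which variant it holds, which is where the halved table footprint comes from.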

This lifts #![forbid(unsafe_code)] on ironcache-store (authorized) in favor of #![deny(unsafe_op_in_unsafe_fn)]. All unsafe is confined to one heavily documented Entry impl in kvobj.rs (manual alloc/dealloc, strict-provenance tag set/clear via map_addr, the access reconstructions, Drop, Clone); every block carries a // SAFETY: justification. The blob content is still parsed with safe bounds-checked slicing. The Store waist is unchanged; only the rmw type-dispatch in lib.rs swaps the old enum match for obj.as_coll_val_mut() now that Entry is opaque.
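To illustrate the "manual alloc/dealloc with safe parsing" split, here is a rough sketch of a length-prefixed thin blob. `ThinBlob`, `HEADER` and `BLOB_ALIGN` are assumed names, and the payload here is an opaque byte string rather than the round-3 blob format; the point is only that `Drop` rebuilds the `Layout` from the stored prefix and that reads go through bounds-checked slicing.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::convert::TryFrom;
use std::ptr::{copy_nonoverlapping, NonNull};
use std::slice::from_raw_parts;

const HEADER: usize = 4; // bytes used by the u32 total-length prefix
const BLOB_ALIGN: usize = 8; // matches the "align 8" in the description

/// One-word handle to `[u32 total_len][payload]` (hypothetical type).
pub struct ThinBlob(NonNull<u8>);

impl ThinBlob {
    pub fn new(payload: &[u8]) -> Self {
        let total = HEADER + payload.len();
        // Controlled panic instead of a silently truncated prefix.
        let prefix = u32::try_from(total).expect("blob exceeds the u32 length prefix");
        let layout = Layout::from_size_align(total, BLOB_ALIGN).expect("valid layout");
        // SAFETY: `layout` has non-zero size because HEADER > 0.
        let raw = unsafe { alloc(layout) };
        let ptr = match NonNull::new(raw) {
            Some(p) => p,
            None => handle_alloc_error(layout),
        };
        // SAFETY: the allocation is `total` bytes and 8-aligned, so the u32 write
        // at offset 0 and the payload copy at offset HEADER are both in bounds.
        unsafe {
            ptr.as_ptr().cast::<u32>().write(prefix);
            copy_nonoverlapping(payload.as_ptr(), ptr.as_ptr().add(HEADER), payload.len());
        }
        ThinBlob(ptr)
    }

    fn total_len(&self) -> usize {
        // SAFETY: every blob starts with an initialized, aligned u32 prefix.
        unsafe { self.0.as_ptr().cast::<u32>().read() as usize }
    }

    pub fn payload(&self) -> &[u8] {
        // SAFETY: the allocation holds `total_len()` initialized bytes and
        // outlives the returned borrow of `self`.
        let all = unsafe { from_raw_parts(self.0.as_ptr(), self.total_len()) };
        &all[HEADER..] // safe, bounds-checked slicing from here on
    }
}

impl Drop for ThinBlob {
    fn drop(&mut self) {
        let layout = Layout::from_size_align(self.total_len(), BLOB_ALIGN)
            .expect("layout was valid at allocation time");
        // SAFETY: same size and alignment that `new` passed to `alloc`.
        unsafe { dealloc(self.0.as_ptr(), layout) };
    }
}
```

Because the length lives inside the allocation, a wrong prefix would make `Drop` free with the wrong `Layout`, which is why the prefix conversion is checked rather than cast.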

Result (the optimization-campaign goal: beat redis 8.8.0 on memory)

metric	before (round 3)	after (round 5)	redis 8.8.0
bytes/key (h2h, 128B, 300k, macOS)	221.5 (parity)	199.69	218.61
ratio vs redis	1.01x	0.91x (CLEAR WIN)	-
memmodel `table_bytes_per_key`	26.2	13.11	-
`size_of::<Entry>()`	16	8	-

This is the first clear memory win over redis 8.8.0 (round 3 was at parity). Memory bytes-per-key is the reliable metric on any box; the macOS throughput number remains contention-bound and non-authoritative (a clean speed verdict needs pinned Linux).
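The `size_of::<Entry>()` row can be sanity-checked in isolation. The types below are hypothetical stand-ins (`FatEntry` for the old enum, `ThinEntry` for the new slot); the sketch only asserts what the language guarantees, namely that a boxed slice is two words and a `NonNull<u8>` is one, so the exact 16 and 8 apply to 64-bit targets.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical stand-in for the collection payload.
pub struct CollStandIn(pub Vec<u64>);

/// Shape of the old slot: a fat boxed slice or a thin box.
pub enum FatEntry {
    Str(Box<[u8]>),
    Coll(Box<CollStandIn>),
}

/// Shape of the new slot: one (tagged) pointer.
pub struct ThinEntry(pub NonNull<u8>);

/// Returns (old slot size, new slot size) in bytes for the current target.
pub fn slot_sizes() -> (usize, usize) {
    (size_of::<FatEntry>(), size_of::<ThinEntry>())
}
```

On a 64-bit target the fat pointer alone accounts for the 16 bytes, so removing the length from the slot is what makes the 8-byte entry possible.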

Soundness (3-lens adversarial review: UB / aliasing / behavioral parity)

Aliasing = SOUND, parity = PRESERVED. The UB lens found two issues that miri's executed paths could not catch; this PR fixes both:

CRITICAL (fixed): the new u32 total-length prefix would truncate for a single value > 4 GiB, so Drop would dealloc with a wrong Layout (UB). Reachable because APPEND grew a value unbounded (no proto-max-bulk-len check). This was a regression the prefix introduced (round 3's Box<[u8]> carried a usize length). Fix: (1) a hard expect in alloc_str_blob so the truncation branch is gone (a controlled panic backstop, not UB); (2) cap APPEND at 512 MB returning the exact Redis error (new ErrorReply::string_exceeds_max), matching Redis checkStringLength and keeping every value < 4 GiB so the backstop is unreachable in practice.
HIGH (fixed): the tag scheme needs the Str blob and Box<CollEntry> >= 2-aligned, guarded only by a release-stripped debug_assert. Added two const assertions (STR_ALIGN >= 2, align_of::<CollEntry>() >= 2) so a future alignment-breaking edit fails the build.
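The two fixes can be sketched together as follows. This is an illustration under assumptions, not the PR's code: `check_append_len`, `length_prefix`, `StringExceedsMax` and the `CollEntry` stand-in are hypothetical, and the 512 MB cap is taken to be 512 * 1024 * 1024 bytes.

```rust
use std::convert::TryFrom;
use std::mem::align_of;

/// Assumed value of the 512 MB cap (512 * 1024 * 1024 bytes).
pub const MAX_STRING_LEN: usize = 512 * 1024 * 1024;

/// Hypothetical error type standing in for `ErrorReply::string_exceeds_max`.
#[derive(Debug, PartialEq)]
pub struct StringExceedsMax;

/// Rejects an APPEND whose result would exceed the cap (or overflow usize).
pub fn check_append_len(current: usize, extra: usize) -> Result<usize, StringExceedsMax> {
    match current.checked_add(extra) {
        Some(total) if total <= MAX_STRING_LEN => Ok(total),
        _ => Err(StringExceedsMax),
    }
}

/// Backstop: converts a blob length to the u32 prefix, panicking rather than
/// truncating. With the cap above enforced, this should never fire.
pub fn length_prefix(total: usize) -> u32 {
    u32::try_from(total).expect("blob length must fit the u32 prefix")
}

/// Assumed alignment of the Str blob allocation.
pub const STR_ALIGN: usize = 8;

/// Hypothetical stand-in for the real `CollEntry`.
pub struct CollEntry(pub u64);

// Build-time guards: an edit that drops either alignment below 2 stops compiling,
// unlike a debug_assert, which is stripped from release builds.
const _: () = assert!(STR_ALIGN >= 2, "Str blob must leave the low tag bit free");
const _: () = assert!(align_of::<CollEntry>() >= 2, "CollEntry must leave the low tag bit free");
```

The cap and the backstop are layered on purpose: the cap gives clients a protocol-level error, while the `expect` guarantees that any path that bypasses the cap ends in a panic rather than a mismatched `Layout`.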

Gates

849 tests green; clippy -D warnings, fmt, invariant-lint clean.
miri under -Zmiri-strict-provenance clean across the store lib (incl. 8 dedicated Entry unsafe-path tests) AND every integration test (primitives, keyspace, eviction, the four collection in-place suites, watch). The jemalloc accounting test is miri-ignored (FFI not miri-executable; documented, non-UB).
The A5 per-PR perf-gate (bytes-per-key + qps ratchet) runs on this PR.

🤖 Generated with Claude Code

…n vs redis 8.8.0 (round 5)

Shrink the per-shard `hashbrown::HashTable<Entry>` slot from 16 bytes to 8 by turning `Entry` from a 16-byte enum (`Str(Box<[u8]>)` fat pointer | `Coll(Box<CollEntry>)`) into a single 8-byte `NonNull<u8>` TAGGED POINTER:

- low bit 0 = a manually-allocated Str THIN blob `[u32 total_len][round-3 blob]` (the length moves INTO the allocation, so the pointer is one word, not a fat ptr+len), align 8;
- low bit 1 = `Box::into_raw(Box<CollEntry>)`.

Both allocations are >= 2-aligned, so the low tag bit is always free. This keeps the Round-3 single-allocation-per-key + no-key-duplication win and adds the slot shrink, roughly halving `table_bytes_per_key`.

This LIFTS `#![forbid(unsafe_code)]` on ironcache-store (authorized) in favor of `#![deny(unsafe_op_in_unsafe_fn)]`. All unsafe is CONFINED to one heavily documented `Entry` impl in kvobj.rs (manual alloc/dealloc, strict-provenance tag set/clear via `map_addr`, the access reconstructions, Drop, Clone); every unsafe block carries a `// SAFETY:` justification. The blob CONTENT is still parsed with SAFE bounds-checked slicing. The Store waist (ValueRef/RmwEntry/side-traits) is UNCHANGED; only the rmw type-dispatch in lib.rs swaps the old enum match for `obj.as_coll_val_mut()` now that `Entry` is opaque.

Result (the optimization-campaign goal: beat redis 8.8.0 on memory):

- head-to-head (128B, 300k keys, macOS): bytes/key 221.5 -> 199.69 vs redis 218.61 = 0.91x, CLEARLY BELOW redis (Round 3 was parity at 221.5).
- memmodel (allocator-true): table_bytes_per_key 26.2 -> 13.11; size_of::<Entry> 16 -> 8.

Soundness hardening from a 3-lens adversarial review (UB / aliasing / behavioral parity). Aliasing = SOUND, parity = PRESERVED; the UB lens found and this fixes two issues miri's executed paths could not catch:

- CRITICAL: the new u32 total-length prefix would TRUNCATE for a single value > 4 GiB, so Drop would dealloc with a wrong Layout (UB). It was reachable because APPEND grew a value unbounded (no proto-max-bulk-len check). This was a regression the prefix introduced (round 3's Box<[u8]> carried a usize length). Fix: (1) a hard `expect` in alloc_str_blob so the truncation branch is gone (a controlled panic backstop, not UB); (2) cap APPEND at 512 MB returning the exact Redis error (new ErrorReply::string_exceeds_max), matching Redis checkStringLength AND keeping every value < 4 GiB so the backstop is unreachable in practice.
- HIGH: the tag scheme needs the Str blob and Box<CollEntry> >= 2-aligned, guarded only by a release-stripped debug_assert. Added two `const` assertions (STR_ALIGN >= 2, align_of::<CollEntry>() >= 2) so a future alignment-breaking edit fails the BUILD instead of silently corrupting the tag.

Gates: 849 tests green; clippy -D warnings, fmt, invariant-lint clean; miri under -Zmiri-strict-provenance clean across the store lib (incl. 8 dedicated Entry unsafe-path tests) AND every integration test. The jemalloc accounting test is miri-ignored (FFI not miri-executable; documented, non-UB).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zeke <ezequiel.lares@outlook.com>

github-actions · 2026-06-16T13:03:22Z

perf-gate (A5)

Same-runner ratchet of HEAD against the merge-base (both rebuilt and measured in this job).
PASS = within the noise band, WARN = a real move inside budget (does not fail), FAIL = past budget in the bad direction.

metric	base	head	delta%	band	budget	verdict
qps_median (peak)	70452.15	70895.86	0.63%	+/-5.29%	drop <= 15%	PASS
bytes_per_key int	58.11	45.02	-22.53%	det	rise <= 5%	PASS
bytes_per_key embstr	58.17	61.07	4.99%	det	rise <= 5%	WARN
bytes_per_key raw	346.28	333.15	-3.79%	det	rise <= 5%	PASS

Overall: WARN

qps: noisy on shared CI, so the band comes from the base reps spread (floored at 5%); a drop is only a regression past the 15% budget.
bytes_per_key: deterministic (allocator-true memmodel), so a tight 5% rise budget; any rise beyond it FAILs.
Open-loop tails / criterion micro-benches are reported-not-failed (tail noise is high) and are not part of this ratchet.
An intentional perf trade is landed by raising the relevant budget in this PR with a documented reason (CI never auto-commits a baseline).
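The verdict rules can be read as a small classifier. The sketch below is an interpretation, not the gate's actual script: `classify`, `BadWhen` and `overall` are hypothetical names, a deterministic ("det") band is modeled as 0%, and moves in the good direction are treated as PASS, which is inferred from the `bytes_per_key int` row (-22.53%, PASS).

```rust
/// Ordered so that the worst verdict is the maximum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

/// Which direction counts as a regression for a metric.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BadWhen {
    Drops, // e.g. qps
    Rises, // e.g. bytes_per_key
}

/// Percentage change of `head` relative to `base`.
pub fn delta_pct(base: f64, head: f64) -> f64 {
    (head - base) / base * 100.0
}

/// PASS inside the noise band, WARN past the band but inside the budget,
/// FAIL past the budget. Only movement in the bad direction counts.
pub fn classify(base: f64, head: f64, band_pct: f64, budget_pct: f64, bad: BadWhen) -> Verdict {
    let delta = delta_pct(base, head);
    let bad_move = match bad {
        BadWhen::Drops => (-delta).max(0.0),
        BadWhen::Rises => delta.max(0.0),
    };
    if bad_move > budget_pct {
        Verdict::Fail
    } else if bad_move > band_pct {
        Verdict::Warn
    } else {
        Verdict::Pass
    }
}

/// The overall result is the worst individual verdict (PASS if there are none).
pub fn overall(verdicts: &[Verdict]) -> Verdict {
    verdicts.iter().copied().max().unwrap_or(Verdict::Pass)
}
```

Fed the four rows of the table above, this reading reproduces PASS, PASS, WARN, PASS and an overall WARN; the embstr row lands on WARN because its 4.99% rise is real (deterministic metric) but still inside the 5% budget.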

ELares merged commit 600403d into main Jun 16, 2026
12 checks passed

ELares deleted the perf/tagged-slot branch June 16, 2026 13:04
