[EPIC]: IronCache vision and master index
The most efficient Redis-wire-compatible cache in the world, shipped as one Rust static binary.
Tenets, ranked and non-negotiable in this order: Compatible > Efficient > Simple > Scalable > AI-Driven.
This issue is the index of the whole project. Every other issue hangs off the map at the bottom. If an issue is not in the map, it is either an orphan to be re-parented or a duplicate to be closed.
Why this matters
Every incumbent in this space is wrong in a specific, fixable way. Each wrong is mapped to the tenet it violates.
Redis 8, single-threaded data core (violates Efficient). The keyspace is owned by one thread; the rest of the box is spent on I/O threads and housekeeping. Throughput-per-core is capped by design, and the standard answer is "run more shards / more processes," which is operational tax, not efficiency. IronCache treats per-core throughput as the headline number.
Valkey, inherited the same ceiling (violates Efficient). Valkey is the credible community fork and our compatibility oracle, but it carries Redis's single-threaded keyspace architecture. It improves around the edges; it does not remove the core constraint. We respect it as the bar to beat on compatibility while beating it on efficiency.
KeyDB, multi-threaded but a locked shared keyspace, and dormant (violates Efficient and Simple). KeyDB went multi-threaded by sharing one keyspace behind locks, trading lock contention for extra cores, and the project is effectively dormant. Shared-nothing thread-per-core removes the lock instead of optimizing it.
DragonflyDB, vertical-only and C++ (violates Scalable and Simple). Dragonfly proved shared-nothing throughput is real, but it scales vertically (scale up the box) rather than out, and it is a C++ codebase. IronCache is shared-nothing AND has a single-node-first, slot-ready path to horizontal distribution, in memory-safe Rust.
Memcached, no compatibility contract (violates Compatible). Fast and simple, but it offers no rich data types and no Redis wire contract, so it is not a drop-in. We commit to a published RESP compatibility contract instead of "mostly works."
Garnet, managed runtime (violates Simple and Efficient). Garnet shows the design space (RESP + tiered storage + great numbers) but runs on a managed .NET runtime with a GC and a runtime dependency. IronCache ships as one static native binary with no runtime, no GC pauses, and predictable tail latency.
The gap nobody fills: a memory-safe, single-static-binary, RESP-compatible cache whose primary axis of competition is efficiency per core and memory per item, that scales out, and that is honest about what it does not do.
The five tenets, with measurable intent
Compatible (highest). Redis-wire compatibility is a published contract, not a vibe. We define compatibility tiers (Tier 0-4), pin a Valkey/Redis differential oracle, and refuse to ship a behavior claim without a differential or conformance test. Intent: an unmodified mainstream client and the common command set work against IronCache unchanged.
Efficient. The headline metrics are throughput-per-core and memory-at-a-fixed-hit-ratio, not aggregate ops/sec on a big box. Intent: beat Valkey on per-core throughput and beat Redis on bytes-per-stored-item at equal hit ratio, both proven on a reproducible harness.
Simple. One static binary, one config file, eviction and a memory ceiling ON by default, no sidecars or mandatory proxy, no managed runtime. Intent: install-to-first-GET measured in seconds, operable by reading one INFO output.
Scalable. Single-node-first, but the storage layout is slot-ready from day one so horizontal distribution is an unlock, not a rewrite. Intent: a Redis-Cluster-compatible client contract and online slot migration without a write freeze.
AI-Driven (lowest, and strictly off the data path). A background advisor that selects experts and autotunes bounded knobs against an efficiency objective. Intent: measurable headroom over a tuned W-TinyLFU + SIEVE baseline, with hard guardrails, hysteresis, rollback, and a kill-switch. Never per-request inference on the hot path.
The ranking is a tie-breaker rule: when two designs conflict, the higher tenet wins. Compatibility beats a clever efficiency trick; efficiency beats a scaling convenience; and AI never wins against any of the other four.
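The tie-breaker rule can be sketched as an ordered enum. This is a minimal illustration, not project code: the `Design` struct and the `resolve` function are hypothetical names introduced here, and the sketch assumes each competing design can be tagged with the single tenet it primarily serves.

```rust
/// The five tenets in rank order. Deriving `Ord` on an enum orders variants
/// by declaration, so an earlier variant compares as smaller; this sketch
/// treats "smaller" as "higher-ranked".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tenet {
    Compatible,
    Efficient,
    Simple,
    Scalable,
    AiDriven,
}

/// Hypothetical: a design option tagged with the tenet it primarily serves.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Design {
    name: &'static str,
    serves: Tenet,
}

/// When two designs conflict, the one serving the higher-ranked tenet wins.
/// Returns `None` when both serve the same tenet, because the ranking alone
/// does not decide that case.
fn resolve(a: Design, b: Design) -> Option<Design> {
    use std::cmp::Ordering::{Equal, Greater, Less};
    match a.serves.cmp(&b.serves) {
        Less => Some(a),
        Greater => Some(b),
        Equal => None,
    }
}
```

Under this encoding, a design tagged `Tenet::AiDriven` can never win a conflict against a design tagged with any of the other four, which matches the last clause of the rule.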
Prior art
We do not get to assert "fastest" or "most memory-efficient" without receipts. Prior-art foundations, the pinned competitor landscape, and every quantitative claim we make about an incumbent live in docs/PRIOR_ART.md, with each claim recorded, sourced, and falsifiable in docs/prior-art/claims.yaml. Claims without a verifying test or a written correction are non-goals by policy. See #6 for the verification process and #9 for the measured single-core bar.
What IronCache IS and is NOT
IS: a Rust, single static binary, RESP/Redis-wire-compatible, shared-nothing thread-per-core in-memory cache, with eviction and a memory ceiling on by default, transparent value compression, opt-in forkless snapshotting, a single-node-first but slot-ready distribution path, and an off-path AI advisor.
IS NOT (committed non-goals):
No embedded scripting VM / Lua on the hot path; native atomic ops cover the common cases instead. See [NON-GOAL]: Committed non-goals register (scripting, Memcached protocol, RDMA, managed runtime) #10, [RESEARCH]: Native atomic-op set covering common scripting use cases without a Lua VM #23.
No Memcached protocol, no RDMA transport, no managed runtime. See [NON-GOAL]: Committed non-goals register (scripting, Memcached protocol, RDMA, managed runtime) #10.
No fork()+copy-on-write snapshotting, no mandatory proxy, and no reliance on host THP/overcommit tuning. See [NON-GOAL]: fork()+copy-on-write snapshotting, mandatory proxy, and host THP/overcommit tuning #11.
No strong consistency / zero write loss in the default async replication mode; strong consistency is an opt-in tier. See [NON-GOAL]: Strong consistency / zero write loss in the default async mode #12.
No per-request neural/ML inference on the data path. See [NON-GOAL]: per-request neural/ML inference on the data path #13.
No consistency or efficiency claim without its test or a written correction. See [NON-GOAL]: No consistency or efficiency claim without its test/correction #14.
Open decisions (cross-cutting)
These cut across multiple pillars and gate the architecture:
Acceptance criteria
The project has earned its name when, on the reproducible harness (#8) against the pinned oracle (#96), all of the following hold and each is backed by a committed test:
Throughput-per-core: sustained single-core GET/SET throughput strictly exceeds Valkey 9.x single-core on identical hardware and payload mix, with the multiple reported (target: >= 1.5x per core on the standard mixed workload).
Memory-per-item: resident bytes-per-stored-item at a fixed 95 percent hit ratio is below Redis 8 on the value-size survey corpus (target: <= 0.7x Redis bytes/item at equal hit ratio), measured with compression in its default posture.
Tail latency: p99.9 GET latency at the target per-core throughput stays under a fixed bound with no GC-class pauses (target: p99.9 <= 1 ms at the documented load), demonstrating the no-managed-runtime claim.
Install-to-first-GET: from downloading the single binary to a successful GET against a running default instance in under 60 seconds, with zero required config edits (eviction and memory ceiling already on).
Redis-conformance bar: 100 percent pass on the declared Tier 0/1 command surface in the differential suite against pinned redis-server/valkey-server, with documented and tested behavior for every Tier 2+ deviation. See [DESIGN]: Conformance, differential, fuzz, property, and DST testing stack #95, [DESIGN]: Differential testing against pinned redis-server/valkey-server #97, [DECISION]: Define and publish the IronCache compatibility tiering (Tier 0-4) #16.
No headline claim ships without the corresponding row in docs/prior-art/claims.yaml and a green test, per #14.
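The numeric targets above can be read as a single gate over one harness run. The following is a sketch under stated assumptions: the `Measured` struct, its field names, and `failed_gates` are hypothetical and are not the harness API from #8; only the thresholds (1.5x per core, 0.7x bytes per item, 1 ms p99.9, 60 seconds, 100 percent Tier 0/1 pass) come from the criteria.

```rust
/// Hypothetical results of one harness run; all field names are illustrative.
#[derive(Debug, Clone, Copy)]
struct Measured {
    ironcache_ops_per_core: f64,
    valkey_ops_per_core: f64,
    ironcache_bytes_per_item: f64, // at the fixed 95 percent hit ratio
    redis_bytes_per_item: f64,     // same corpus, same hit ratio
    p999_get_latency_ms: f64,
    install_to_first_get_secs: f64,
    tier01_tests_passed: u32,
    tier01_tests_total: u32,
}

/// Returns the names of the acceptance criteria that this run fails.
/// An empty result means every numeric target is met.
fn failed_gates(m: &Measured) -> Vec<&'static str> {
    let mut failed = Vec::new();
    // Target: >= 1.5x Valkey per core on the standard mixed workload.
    if m.ironcache_ops_per_core / m.valkey_ops_per_core < 1.5 {
        failed.push("throughput-per-core");
    }
    // Target: <= 0.7x Redis bytes per item at equal hit ratio.
    if m.ironcache_bytes_per_item / m.redis_bytes_per_item > 0.7 {
        failed.push("memory-per-item");
    }
    // Target: p99.9 GET latency <= 1 ms at the documented load.
    if m.p999_get_latency_ms > 1.0 {
        failed.push("tail-latency");
    }
    // Target: download to first successful GET in under 60 seconds.
    if m.install_to_first_get_secs >= 60.0 {
        failed.push("install-to-first-get");
    }
    // Target: 100 percent pass on the declared Tier 0/1 surface.
    if m.tier01_tests_total == 0 || m.tier01_tests_passed != m.tier01_tests_total {
        failed.push("redis-conformance");
    }
    failed
}
```

A gate like this covers only the numbers. The requirement that each headline claim also has a row in docs/prior-art/claims.yaml is a separate check and is not modeled here.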
Issue map
Every planned issue is listed here, grouped by milestone. Nothing is orphaned.
M0, Charter, claims, and decisions to make before building
[META]: Scope, ranked tenets, and the five-pillar charter #2 : scope, ranked tenets, five-pillar charter (the governing META).
[META]: Glossary and the load-bearing system invariants #3 : glossary and load-bearing system invariants.
[META]: ADR index, decision register, and record format #4 : ADR index, decision register, and record format.
[META]: Issue-tree coherence and deduplication audit #5 : issue-tree coherence and deduplication audit.
[RESEARCH]: Prior-art foundations, claim verification, and the pinned competitor landscape #6 : prior-art foundations, claim verification, pinned competitor landscape.
[DECISION]: Headline metrics are throughput-per-core and memory-at-fixed-hit-ratio #7 : decide headline metrics = throughput-per-core and memory-at-fixed-hit-ratio.
[RESEARCH]: Single-core throughput bar vs Redis 8 / Valkey / Dragonfly / Garnet #9 : measure the single-core bar vs Redis 8 / Valkey / Dragonfly / Garnet.
[NON-GOAL]: Committed non-goals register (scripting, Memcached protocol, RDMA, managed runtime) #10 : committed non-goals register (scripting, Memcached protocol, RDMA, managed runtime).
[NON-GOAL]: fork()+copy-on-write snapshotting, mandatory proxy, and host THP/overcommit tuning #11 : non-goal, fork()+COW snapshotting, mandatory proxy, host THP/overcommit tuning.
[NON-GOAL]: Strong consistency / zero write loss in the default async mode #12 : non-goal, strong consistency / zero write loss in default async mode.
[NON-GOAL]: per-request neural/ML inference on the data path #13 : non-goal, per-request neural/ML inference on the data path.
[NON-GOAL]: No consistency or efficiency claim without its test/correction #14 : non-goal, no consistency or efficiency claim without its test/correction.
[DECISION]: Define and publish the IronCache compatibility tiering (Tier 0-4) #16 : decide and publish the compatibility tiering (Tier 0-4).
[RESEARCH]: Runtime bake-off, monoio vs glommio vs tokio+epoll on GET/SET #26 : runtime bake-off, monoio vs glommio vs tokio+epoll on GET/SET.
[RESEARCH]: Hot-shard mitigation and memory-reclamation strategy under shard-per-core #32 : hot-shard mitigation and memory-reclamation strategy under shard-per-core.
[RESEARCH]: Adaptive vs fixed encoding-conversion thresholds #37 : adaptive vs fixed encoding-conversion thresholds.
[RESEARCH]: Benchmark jemalloc/mimalloc/snmalloc under a cache workload #42 : benchmark jemalloc/mimalloc/snmalloc under a cache workload.
[DECISION]: Ship a memory ceiling and eviction ON by default #45 : decide to ship a memory ceiling and eviction ON by default.
[RESEARCH]: Benchmark SIEVE/S3-FIFO/W-TinyLFU/ARC/LIRS on cachemon corpus plus KV traces #47 : benchmark SIEVE/S3-FIFO/W-TinyLFU/ARC/LIRS on cachemon corpus plus KV traces.
[DECISION]: Default codec = zstd low-level; LZ4 and none as policy options #53 : decide default codec = zstd low-level; LZ4 and none as policy options.
[RESEARCH]: Cache value-size and compressibility distribution survey #57 : cache value-size and compressibility distribution survey.
[DECISION]: Durability stance (ephemeral default, opt-in snapshot, warm-restart, later tiers) #59 : decide durability stance (ephemeral default, opt-in snapshot, warm-restart, later tiers).
[RESEARCH]: Bound and enforce snapshot memory overhead; fast parallel restart #61 : bound and enforce snapshot memory overhead; fast parallel restart.
[DECISION]: Single-node-first roadmap with slot-ready storage layout #69 : decide single-node-first roadmap with slot-ready storage layout.
[RESEARCH]: Per-shard Raft for an opt-in strongly-consistent tier #78 : research per-shard Raft for an opt-in strongly-consistent tier.
[RESEARCH]: Post-ketama consistent hashing for internal placement #80 : post-ketama consistent hashing for internal placement.
[RESEARCH]: Define the advisor objective metric (throughput-per-core/memory, not raw hit ratio) #89 : define the advisor objective metric (efficiency, not raw hit ratio).
[RESEARCH]: Quantify advisor headroom over a tuned W-TinyLFU + SIEVE baseline #90 : quantify advisor headroom over a tuned W-TinyLFU + SIEVE baseline.
[TASK]: Valkey 9.x as RESP differential-test oracle and head-to-head baseline #96 : adopt Valkey 9.x as RESP differential oracle and head-to-head baseline.
M1, Core architecture, decisions locked, foundational designs
[DESIGN]: Reproducible benchmark and memory-model harness #8 : reproducible benchmark and memory-model harness.
[DESIGN]: RESP protocol surface, parser, and compatibility tiers #15 : RESP protocol surface, parser, and compatibility tiers (protocol EPIC).
[DECISION]: RESP3 reply-shaping policy and error-string fidelity #17 : RESP3 reply-shaping policy and error-string fidelity.
[DESIGN]: Redis-compatible error-string catalog #18 : Redis-compatible error-string catalog.
[DESIGN]: Security surface (AUTH, requirepass, ACL, embedded TLS) #22 : security surface (AUTH, requirepass, ACL, embedded TLS).
[DECISION]: Shared-nothing thread-per-core as the core concurrency model #24 : decide shared-nothing thread-per-core as the core concurrency model.
[DESIGN]: Shared-nothing core runtime and the async/io stack #25 : shared-nothing core runtime and the async/io stack (runtime EPIC).
[DECISION]: Transaction and scripting surface scope #30 : decide transaction and scripting surface scope.
[DECISION]: Design the runtime for determinism to enable DST #31 : decide to design the runtime for determinism to enable DST.
[DECISION]: Epoch-based reclamation (crossbeam-epoch) vs custom drain-list framework #33 : decide epoch-based reclamation (crossbeam-epoch) vs custom drain-list.
[DESIGN]: Narrow-waist storage API (Read/Upsert/Delete/RMW) under the RESP layer #34 : narrow-waist storage API (Read/Upsert/Delete/RMW) under the RESP layer.
[DESIGN]: Hash table, data-structure encodings, and per-key object layout #35 : hash table, data-structure encodings, per-key object layout (datastructures EPIC).
[DECISION]: Per-shard single-thread HashMap vs shared concurrent map fallback #36 : decide per-shard single-thread HashMap vs shared concurrent map fallback.
[DECISION]: Default global allocator and memory-accounting strategy #41 : decide default global allocator and memory-accounting strategy.
[DESIGN]: Online defragmentation strategy #43 : online defragmentation strategy.
[DECISION]: THP and snapshot stance (MADV_NOHUGEPAGE heap, non-fork serialization) #44 : decide THP and snapshot stance (MADV_NOHUGEPAGE heap, non-fork serialization).
[DECISION]: Default eviction policy (SIEVE vs S3-FIFO vs W-TinyLFU-fronted FIFO) #46 : decide the default eviction policy.
[DESIGN]: Pluggable EvictionPolicy trait and ghost queue #48 : pluggable EvictionPolicy trait and ghost queue.
[DESIGN]: W-TinyLFU frequency admission filter (CM-sketch + doorkeeper + aging) #49 : W-TinyLFU frequency admission filter (CM-sketch + doorkeeper + aging).
[DESIGN]: Map Redis maxmemory-policy names onto IronCache's internal engine #50 : map Redis maxmemory-policy names onto IronCache's internal engine.
[DESIGN]: Transparent value compression strategy #52 : transparent value compression strategy (compression EPIC).
[DECISION]: C-bound zstd vs pure-Rust zstd for the static binary #54 : decide C-bound zstd vs pure-Rust zstd for the static binary.
[DESIGN]: ZDICT per-prefix dictionary training, versioning, and tagging #55 : ZDICT per-prefix dictionary training, versioning, and tagging.
[DESIGN]: Persistence, forkless snapshot, and storage-engine architecture #58 : persistence, forkless snapshot, storage-engine architecture (persistence EPIC).
[DECISION]: Reject RocksDB/LSM as the core cold engine; choose hybrid-log vs F2 #65 : decide to reject RocksDB/LSM as the core cold engine; hybrid-log vs F2.
[DESIGN]: Single-node to multi-node distribution (partitioning, routing, replication, membership) #68 : single-node to multi-node distribution (clustering EPIC).
[DESIGN]: Redis-Cluster-compatible client contract (16384 slots, CRC16, hash tags, MOVED/ASK) #70 : Redis-Cluster-compatible client contract (16384 slots, CRC16, hash tags, MOVED/ASK).
[DECISION]: Internal shard map and partition count decoupled from the 16384 compatibility slots #71 : decide internal shard map decoupled from the 16384 compatibility slots.
[DECISION]: Keyspace partition count as dual-purpose shard/migration unit #72 : decide keyspace partition count as dual-purpose shard/migration unit.
[DESIGN]: Raft-managed authoritative slot map and in-binary HA control plane #73 : Raft-managed authoritative slot map and in-binary HA control plane.
[DESIGN]: SWIM + Lifeguard data-plane membership and failure detection #74 : SWIM + Lifeguard data-plane membership and failure detection.
[DECISION]: Default replication and consistency model (async primary/replica + WAIT) #76 : decide default replication and consistency model (async + WAIT).
[DESIGN]: Single static binary, CLI, and single-binary operations #81 : single static binary, CLI, and single-binary operations (binary EPIC).
[DECISION]: clap subcommands vs argv[0] symlink mode-switching and artifact signing #82 : decide clap subcommands vs argv[0] symlink mode-switching and artifact signing.
[DESIGN]: Observability (Prometheus /metrics, INFO/SLOWLOG/LATENCY parity) #86 : observability (Prometheus /metrics, INFO/SLOWLOG/LATENCY parity).
[DESIGN]: AI-driven background advisor (expert selection + bounded knob autotuning) #88 : AI-driven background advisor (expert selection + bounded knob autotuning) (AI EPIC).
[DESIGN]: Advisor safety guardrails and mechanism detail (bounded knobs, hysteresis, rollback, kill-switch) #91 : advisor safety guardrails (bounded knobs, hysteresis, rollback, kill-switch).
[DESIGN]: AI-assisted development pipeline with adversarial claim verification #94 : AI-assisted development pipeline with adversarial claim verification.
[DESIGN]: Conformance, differential, fuzz, property, and DST testing stack #95 : conformance, differential, fuzz, property, and DST testing stack (testing EPIC).
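For orientation, the key-to-slot mapping behind the Redis-Cluster client contract named in #70 (16384 slots, CRC16, hash tags) can be sketched as below. This follows the publicly documented Redis Cluster rule, not any IronCache code: the function names are hypothetical, and the bit-by-bit CRC is chosen for readability rather than speed.

```rust
/// Number of compatibility slots in the Redis Cluster client contract.
const SLOTS: u16 = 16384;

/// CRC16 in the XMODEM variant used by Redis Cluster:
/// polynomial 0x1021, initial value 0, no bit reflection.
fn crc16(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 {
                (crc << 1) ^ 0x1021
            } else {
                crc << 1
            };
        }
    }
    crc
}

/// Returns the part of the key that is hashed. If the key contains a `{`
/// followed later by a `}` with at least one byte between them, only that
/// inner span (the hash tag) is hashed; otherwise the whole key is.
fn hash_tag(key: &[u8]) -> &[u8] {
    if let Some(open) = key.iter().position(|&b| b == b'{') {
        if let Some(len) = key[open + 1..].iter().position(|&b| b == b'}') {
            if len > 0 {
                return &key[open + 1..open + 1 + len];
            }
        }
    }
    key
}

/// Maps a key to one of the 16384 compatibility slots.
fn key_slot(key: &[u8]) -> u16 {
    crc16(hash_tag(key)) % SLOTS
}
```

Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot, which is what lets a client keep multi-key operations on one node.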
M2, Advanced engine, distribution, and the harder tests
[DESIGN]: MULTI/EXEC/DISCARD/WATCH with optimistic locking and no rollback #19 : MULTI/EXEC/DISCARD/WATCH with optimistic locking and no rollback.
[DESIGN]: Unified server-push channel (Pub/Sub, sharded Pub/Sub, keyspace notifications, CSC) #20 : unified server-push channel (Pub/Sub, sharded Pub/Sub, keyspace notifications, CSC).
[DESIGN]: CLIENT TRACKING (BCAST + RESP3 push default, per-client table, RESP2 REDIRECT) #21 : CLIENT TRACKING (BCAST + RESP3 push default, per-client table, RESP2 REDIRECT).
[RESEARCH]: Native atomic-op set covering common scripting use cases without a Lua VM #23 : native atomic-op set covering common scripting use cases without a Lua VM.
[DESIGN]: Runtime/IO abstraction layer keeping monoio/glommio/tokio swappable #27 : runtime/IO abstraction layer keeping monoio/glommio/tokio swappable.
[DESIGN]: io_uring fast path with registered buffers and multishot ops #28 : io_uring fast path with registered buffers and multishot ops.
[DESIGN]: Cross-shard coordinator and transaction/scripting surface #29 : cross-shard coordinator and transaction/scripting surface.
[DESIGN]: Segmented extendible-hash index (Dash-style) with SIMD fingerprint probing #38 : segmented extendible-hash index (Dash-style) with SIMD fingerprint probing.
[TASK]: intset and HyperLogLog sparse/dense encodings for wire compatibility #39 : intset and HyperLogLog sparse/dense encodings for wire compatibility.
[DESIGN]: OBJECT ENCODING / DEBUG OBJECT compatibility mapping #40 : OBJECT ENCODING / DEBUG OBJECT compatibility mapping.
[DESIGN]: Compression interaction with mutating commands and hot-key cost #56 : compression interaction with mutating commands and hot-key cost.
[DESIGN]: Forkless versioned point-in-time snapshot and diskless full-sync #60 : forkless versioned point-in-time snapshot and diskless full-sync.
[DESIGN]: mmap warm-restart (graceful shutdown + state file + pointer fixup) #62 : mmap warm-restart (graceful shutdown + state file + pointer fixup).
[DESIGN]: Segment + atomic manifest durable log with corruption recovery #63 : segment + atomic manifest durable log with corruption recovery.
[DESIGN]: HybridLog storage engine with in-place hot-set updates #64 : HybridLog storage engine with in-place hot-set updates.
[DESIGN]: Tiered RAM->SSD value store (extstore-inspired) #66 : tiered RAM->SSD value store (extstore-inspired).
[DESIGN]: io_uring snapshot/tiering write path with SQPOLL and fallback #67 : io_uring snapshot/tiering write path with SQPOLL and fallback.
[DESIGN]: TTL expiration via per-shard timing wheel with lazy backstop and background reclamation #51 : TTL expiration via per-shard timing wheel with lazy backstop and background reclamation.
[DESIGN]: Atomic, snapshot-streamed online slot migration without write freeze #75 : atomic, snapshot-streamed online slot migration without write freeze.
[DESIGN]: Offset-based async replication with adaptive, disk-spillable backlog #77 : offset-based async replication with adaptive, disk-spillable backlog.
[RESEARCH]: Opt-in active-active CRDT mode (reject blanket LWW; principled CRDT/HLC) #79 : opt-in active-active CRDT mode (reject blanket LWW; principled CRDT/HLC).
[DESIGN]: ironcache upgrade with verified rollback #83 : ironcache upgrade with verified rollback.
[TASK]: Packaging, cross-build matrix, reproducible builds, SBOM, and musl penalty research #84 : packaging, cross-build matrix, reproducible builds, SBOM, musl penalty research.
[DESIGN]: TOML config with CONFIG GET/SET/REWRITE parity and live reload #85 : TOML config with CONFIG GET/SET/REWRITE parity and live reload.
[RESEARCH]: Continuously-reported online Belady-MIN gap metric #87 : continuously-reported online Belady-MIN gap metric.
[DESIGN]: Off-path per-value compression decision model #92 : off-path per-value compression decision model.
[TASK]: Offline Belady-MIN and learned-Belady oracle in the benchmark harness #93 : offline Belady-MIN and learned-Belady oracle in the benchmark harness.
[DESIGN]: Differential testing against pinned redis-server/valkey-server #97 : differential testing against pinned redis-server/valkey-server.
[DESIGN]: Property-based and model-based tests for every data type #98 : property-based and model-based tests for every data type.
[DESIGN]: Jepsen + Elle test plan for clustering/replication #99 : Jepsen + Elle test plan for clustering/replication.
[TASK]: Seeded fault-injection and corruption scenarios #100 : seeded fault-injection and corruption scenarios.
References
docs/PRIOR_ART.md, competitor landscape, the specific incumbent claims above, and their sources.
docs/prior-art/claims.yaml, machine-readable claim register; every quantitative claim has a row and a verifying test or correction.
docs/research/, research-issue outputs (benchmark bake-offs, eviction corpus results, value-size survey, runtime bake-off, allocator benchmarks).
Post-audit additions (2026-06-13)
The pre-implementation audit (see docs/AUDIT.md) filed these issues. Decompositions of too-large issues:
from [NON-GOAL]: fork()+copy-on-write snapshotting, mandatory proxy, and host THP/overcommit tuning #11 : [NON-GOAL]: no fork()+copy-on-write point-in-time snapshotting (BGSAVE model) #101 , [NON-GOAL]: do not require host THP disable or vm.overcommit_memory=1 as a precondition #102 , [NON-GOAL]: no mandatory proxy on the hot path for routing #103
from [DESIGN]: Security surface (AUTH, requirepass, ACL, embedded TLS) #22 : [DESIGN]: AUTH handshake and credential model (HELLO AUTH, AUTH, requirepass, SHA-256 default user) #104 , [DESIGN]: Embedded rustls TLS listener (cert/key config, TLS-only mode, no C TLS lib) #105 , [DESIGN]: Full ACL engine and aclfile persistence (deferred from M1) #106
from [DESIGN]: Cross-shard coordinator and transaction/scripting surface #29 : [DESIGN]: Cross-shard coordinator: topology, txid ordering, MGET/MSET atomicity, and back-pressure #107 , [DESIGN]: Pub/Sub fan-out topology under shared-nothing shards #108 , [DOC/ADR]: Apply #30's transaction-and-scripting surface decision to the coordinator #109
from [DESIGN]: Hash table, data-structure encodings, and per-key object layout #35 : [DESIGN]: Per-shard bucket table geometry and incremental rehash policy #110 , [DESIGN]: One-allocation per-key object layout (embedded key + inline small value + metadata bits) #111 , [DESIGN]: Compact scalar value encodings (SSO, tagged small int/float, variable-width string header) #112 , [DESIGN]: Universal collection container and intset analog (cascade-update designed out) #113
from [TASK]: intset and HyperLogLog sparse/dense encodings for wire compatibility #39 : [TASK]: intset encoding (sorted packed int16/32/64, width upgrade, 512-entry cap, promotion to #35 set path) #114 , [TASK]: HyperLogLog encoding (P=14 dense + sparse ZERO/XZERO/VAL, PFADD/PFCOUNT/PFMERGE) #115 , [TASK]: SIMD register-histogram / merge kernels for PFCOUNT, PFMERGE, BITCOUNT (benchmark-gated) #116
from [DECISION]: Default global allocator and memory-accounting strategy #41 : [DECISION]: Default global allocator (jemalloc vs mimalloc vs snmalloc) #117 , [DECISION]: Memory accounting and size-class scheme for honest maxmemory #118
from [DECISION]: clap subcommands vs argv[0] symlink mode-switching and artifact signing #82 : [DECISION]: CLI mode dispatch - clap subcommands vs argv[0] symlink branching #119 , [DECISION]: Release-artifact and self-update signing scheme - minisign vs cosign/sigstore #120
from [TASK]: Packaging, cross-build matrix, reproducible builds, SBOM, and musl penalty research #84 : [TASK]: Reproducible cross-build matrix on cargo-zigbuild (musl x86_64/aarch64 + glibc-pinned gnu fallback) #121 , [TASK]: Release distribution and install paths (curl|sh installer, Homebrew formula, distroless image, hardened systemd unit) #122 , [TASK]: Supply-chain SBOM and artifact attestation (cargo-auditable embedded SBOM + per-release CycloneDX) #123 , [SPIKE]: musl allocator contention benchmark (musl malloc vs static mimalloc vs static jemalloc) on the IronCache cache workload #124 , [NON-GOAL]: Record native Windows server binary as a deferred M2 non-goal (Docker/WSL) #125
from [DESIGN]: AI-driven background advisor (expert selection + bounded knob autotuning) #88 : [DESIGN]: AI-driven background advisor - expert selection + bounded knob autotuning (runtime) #126 , [TASK]: Stand up the LLM/agent prior-art mining + adversarial claim-verification pipeline #127
Coverage-gap issues:
M0: [NON-GOAL]: Redis Streams (XADD/XREAD/XRANGE, consumer groups, rax index) out of scope for M0-M2 #132 , [DECISION]: Headline scale-out targets (max nodes, slots-per-node working range, rebalance-time and failover-time budgets) #146 , [DECISION]: Advisor default posture (ship shadow/off by default; engine fully correct and fast with the advisor disabled) #155 , [NON-GOAL]: The advisor adds no runtime network, model-service, or GPU dependency; AI is build-time only except the in-binary O(1) controller #156 , [DECISION]: Per-tenet acceptance targets and release gates for Compatible/Efficient/Simple/Scalable/AI-Driven #157 , [RESEARCH]: Pin the second-tier KV/cache landscape (Aerospike, Tarantool, Kvrocks, Hazelcast/Ignite/Coherence, Skytable, Redka) #162
M1: [DESIGN]: Per-data-type command semantics for strings, lists, hashes, sets, and sorted sets #128 , [DESIGN]: Generic keyspace commands and SCAN cursor-stability contract (SCAN/HSCAN/SSCAN/ZSCAN, KEYS, TYPE, RANDOMKEY, RENAME, COPY, TOUCH, DUMP/RESTORE) #129 , [DECISION]: Geo command family (GEOADD/GEOSEARCH/GEODIST) scope vs non-goal #133 , [RESEARCH]: Large-collection structure bake-off (zset ordered index: skiplist vs B-tree/ART; list deque structure) on throughput-per-core and bytes-per-element #136 , [DESIGN]: Connection admission, max-clients, output-buffer limits, and the OOM-write/DoS contract under sustained pressure #137 , [DESIGN]: RESP request-size and adversarial-input hardening (proto-max-bulk-len tunable, multibulk count cap, accumulated-frame bound, RESP3 nesting depth, inline-length cap, parser-work budget) #138 , [DESIGN]: Idle-connection timeout, TCP keepalive, and dead-peer reaping #140 , [DECISION]: Per-command operational latency budget and slow-operation guard under defrag/eviction/snapshot/expire #141 , [DESIGN]: Written threat model (assets, trust boundaries, attacker capabilities, STRIDE per subsystem) #142 , [TASK]: Continuous dependency-vulnerability and license auditing as a merge/release gate (cargo-audit/RUSTSEC + cargo-deny) #144 , [DESIGN]: Secrets handling: log/MONITOR redaction, in-memory zeroization, and core-dump/swap exposure #145 , [DESIGN]: Replica-read contract (READONLY/READWRITE, replica routing, bounded staleness surfaced to clients) #147 , [DESIGN]: Cluster bootstrap and node-lifecycle (seed/MEET join, learner-to-voter-to-slot-owner promotion, add/remove-node surface) #149 , [DESIGN]: Admin/introspection command family (CLIENT LIST/INFO/KILL/PAUSE/NO-EVICT/NO-TOUCH, COMMAND DOCS/INFO/COUNT/GETKEYS) #150 , [DESIGN]: Metric/label registry, native INFO field catalog, and per-command cardinality bounds #152 , [DESIGN]: AI advisor observability, explainability, and decision/audit trail (knob from->to, trigger, objective delta, snapshot version, rollback/kill-switch events; surfaced via INFO/metrics and queryable, emitted even in shadow mode) #153 , [DESIGN]: Advisor evaluation and promotion gate (offline replay + shadow A/B proving a change beats the static baseline before it acts) #154 , [DESIGN]: Continuous performance-regression CI gate (per-commit throughput-per-core and bytes-per-key ratchet) #159
M2: [DESIGN]: Blocking command semantics (BLPOP/BRPOP/BLMOVE/BLMPOP/BZPOPMIN/BZMPOP, WAIT, XREAD BLOCK) under the shared-nothing model #130 , [DESIGN]: Bitmap and BITFIELD semantics over the string type (SETBIT/GETBIT/BITCOUNT/BITPOS/BITOP/BITFIELD: addressing, growth, signed/unsigned overflow) #131 , [DESIGN]: Sorted-set (zset) large representation: ordered index plus parallel member->score map #134 , [DESIGN]: List (quicklist-equivalent) representation: linked listpack chunks with O(1) head/tail and node sizing #135 , [DESIGN]: Graceful shutdown contract (SHUTDOWN [NOSAVE|SAVE], SIGTERM/SIGINT, connection drain, optional save-on-exit, orchestrator exit/grace contract) #139 , [DESIGN]: At-rest encryption of snapshots, warm-restart state, and tiered-store SSD files #143 , [DESIGN]: Rebalancing policy and orchestration (when/which partitions move, hot-slot trigger, node drain and decommission) #148 , [DESIGN]: Troubleshooting introspection commands (MEMORY USAGE/STATS/DOCTOR, LATENCY DOCTOR) #151 , [DESIGN]: Real client-driver compatibility matrix (run lettuce/redis-py/go-redis/ioredis/node-redis/jedis/StackExchange.Redis own test suites against IronCache) #158 , [TASK]: Determinism replay-contract verification as a CI gate (same seed yields byte-identical execution; Env-seam lint against direct nondeterminism) #160 , [TASK]: Long-horizon soak and memory-stability correctness gate (no leak, bounded fragmentation/RSS drift, no fd/timer/tracked-key growth) #161 , [RESEARCH]: Foundational CRDT literature and OR-Set tombstone GC for the full Redis type surface #163
Implementation readiness
Sequencing of the whole tree into a critical path to first code lives in #164 and docs/ROADMAP.md. The Implementation Readiness milestone holds the 42-issue gate set; wave:0..3 labels carry the order; critical-path marks the thin first slice.