[EPIC] IronCache: vision, tenets, scope, and the research-to-design plan #1

@ELares

[EPIC]: IronCache vision and master index

The most efficient Redis-wire-compatible cache in the world, shipped as one Rust static binary.
Tenets, ranked and non-negotiable in this order: Compatible > Efficient > Simple > Scalable > AI-Driven.

This issue is the index of the whole project. Every other issue hangs off the map at the bottom. If an issue is not in the map, it is either an orphan to be re-parented or a duplicate to be closed.

Why this matters

Every incumbent in this space is wrong in a specific, fixable way. Each wrong is mapped to the tenet it violates.

  • Redis 8, single-threaded data core (violates Efficient). The keyspace is owned by one thread; the rest of the box is spent on I/O threads and housekeeping. Throughput-per-core is capped by design, and the standard answer is "run more shards / more processes," which is operational tax, not efficiency. IronCache treats per-core throughput as the headline number.
  • Valkey, inherited the same ceiling (violates Efficient). Valkey is the credible community fork and our compatibility oracle, but it carries Redis's single-threaded keyspace architecture. It improves around the edges; it does not remove the core constraint. We respect it as the bar to beat on compatibility while beating it on efficiency.
  • KeyDB, multi-threaded but a locked shared keyspace, and dormant (violates Efficient and Simple). KeyDB became multi-threaded by sharing one keyspace behind locks, so the extra cores are paid for with lock contention, and the project is effectively dormant. Shared-nothing thread-per-core removes the lock instead of optimizing it.
  • DragonflyDB, vertical-only and C++ (violates Scalable and Simple). Dragonfly proved shared-nothing throughput is real, but it scales vertically (scaling up one box) rather than out, and it is a C++ codebase. IronCache is shared-nothing AND has a single-node-first, slot-ready path to horizontal distribution, in memory-safe Rust.
  • Memcached, no compatibility contract (violates Compatible). Fast and simple, but it offers no rich data types and no Redis wire contract, so it is not a drop-in. We commit to a published RESP compatibility contract instead of "mostly works."
  • Garnet, managed runtime (violates Simple and Efficient). Garnet shows the design space (RESP + tiered storage + great numbers) but runs on a managed .NET runtime with a GC and a runtime dependency. IronCache ships as one static native binary with no runtime, no GC pauses, and predictable tail latency.

The gap nobody fills: a memory-safe, single-static-binary, RESP-compatible cache whose primary axis of competition is efficiency per core and memory per item, that scales out, and that is honest about what it does not do.

The five tenets, with measurable intent

  1. Compatible (highest). Redis-wire compatibility is a published contract, not a vibe. We define compatibility tiers (Tier 0-4), pin a Valkey/Redis differential oracle, and refuse to ship a behavior claim without a differential or conformance test. Intent: an unmodified mainstream client and the common command set work against IronCache unchanged.
  2. Efficient. The headline metrics are throughput-per-core and memory-at-a-fixed-hit-ratio, not aggregate ops/sec on a big box. Intent: beat Valkey on per-core throughput and beat Redis on bytes-per-stored-item at equal hit ratio, both proven on a reproducible harness.
  3. Simple. One static binary, one config file, eviction and a memory ceiling ON by default, no sidecars or mandatory proxy, no managed runtime. Intent: install-to-first-GET measured in seconds, operable by reading one INFO output.
  4. Scalable. Single-node-first, but the storage layout is slot-ready from day one so horizontal distribution is an unlock, not a rewrite. Intent: a Redis-Cluster-compatible client contract and online slot migration without a write freeze.
  5. AI-Driven (lowest, and strictly off the data path). A background advisor that selects experts and autotunes bounded knobs against an efficiency objective. Intent: measurable headroom over a tuned W-TinyLFU + SIEVE baseline, with hard guardrails, hysteresis, rollback, and a kill-switch. Never per-request inference on the hot path.
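To make "slot-ready" in tenet 4 concrete: the Redis Cluster specification maps every key to one of 16384 slots using CRC16 (XMODEM variant) of the key, or of the `{hash tag}` substring when one is present. The following is a minimal sketch of that mapping, not IronCache's actual storage code; the function names `crc16_xmodem`, `hash_tag`, and `key_slot` are illustrative assumptions.

```rust
/// Number of hash slots defined by the Redis Cluster specification.
const SLOT_COUNT: u16 = 16384;

/// CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection.
/// Bitwise form for clarity; a real implementation would likely use a table.
fn crc16_xmodem(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 {
                (crc << 1) ^ 0x1021
            } else {
                crc << 1
            };
        }
    }
    crc
}

/// Returns the bytes that are actually hashed. If the key contains `{`
/// followed later by `}` with at least one byte in between, only the bytes
/// between the first `{` and the first following `}` are hashed.
fn hash_tag(key: &[u8]) -> &[u8] {
    if let Some(open) = key.iter().position(|&b| b == b'{') {
        if let Some(len) = key[open + 1..].iter().position(|&b| b == b'}') {
            if len > 0 {
                return &key[open + 1..open + 1 + len];
            }
        }
    }
    key
}

/// Maps a key to its cluster slot in the range 0..16384.
fn key_slot(key: &[u8]) -> u16 {
    crc16_xmodem(hash_tag(key)) % SLOT_COUNT
}
```

Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot. If the single-node layout is already partitioned by this function, moving a slot to another node later does not require rehashing keys.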

The ranking is a tie-breaker rule: when two designs conflict, the higher tenet wins. Compatibility beats a clever efficiency trick; efficiency beats a scaling convenience; and AI never wins against any of the other four.
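One way to picture the tie-breaker rule is as an ordered enum. This is only an illustrative sketch: the `Design` struct and the `resolve` function are hypothetical names, and real design conflicts are settled in decision issues, not in code.

```rust
use std::cmp::Ordering;

/// The five tenets in rank order. With derived `Ord`, an earlier variant
/// compares as smaller, which this sketch treats as "higher rank".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tenet {
    Compatible,
    Efficient,
    Simple,
    Scalable,
    AiDriven,
}

/// A hypothetical design option tagged with the tenet it primarily serves.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Design {
    name: &'static str,
    serves: Tenet,
}

/// When two designs conflict, the one serving the higher tenet wins.
/// Returns `None` when both serve the same tenet, because the ranking
/// alone cannot decide that case.
fn resolve(a: Design, b: Design) -> Option<Design> {
    match a.serves.cmp(&b.serves) {
        Ordering::Less => Some(a),
        Ordering::Greater => Some(b),
        Ordering::Equal => None,
    }
}
```

For example, resolving a design that serves `Tenet::Compatible` against one that serves `Tenet::Efficient` returns the compatible design regardless of argument order, and a design serving `Tenet::AiDriven` loses to a design serving any of the other four.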

Prior art

We do not get to assert "fastest" or "most memory-efficient" without receipts. Prior-art foundations, the pinned competitor landscape, and every quantitative claim we make about an incumbent live in docs/PRIOR_ART.md, with each claim recorded, sourced, and falsifiable in docs/prior-art/claims.yaml. Claims without a verifying test or a written correction are non-goals by policy. See #6 for the verification process and #9 for the measured single-core bar.
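The policy above can be sketched as a small data model. The schema of docs/prior-art/claims.yaml is not shown in this issue, so every field and type name below (`Claim`, `Backing`, `apply_policy`) is a hypothetical stand-in meant only to show how "no test and no correction means non-goal" could be enforced mechanically.

```rust
/// How a claim is backed. Hypothetical shape, not the real claims.yaml schema.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Backing {
    /// Path or name of a test that verifies the claim.
    VerifyingTest(String),
    /// A written correction that replaces a claim found to be wrong.
    WrittenCorrection(String),
    /// Neither a test nor a correction exists yet.
    Unbacked,
}

/// One row of the claim register (illustrative fields only).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Claim {
    id: String,
    statement: String,
    source: String,
    backing: Backing,
}

/// Splits claims into `(kept, non_goals)`: a claim with a verifying test or
/// a written correction is kept, anything else is a non-goal by policy.
fn apply_policy(claims: Vec<Claim>) -> (Vec<Claim>, Vec<Claim>) {
    claims
        .into_iter()
        .partition(|claim| claim.backing != Backing::Unbacked)
}
```

A check like this could run in CI over the parsed register so that an unbacked claim fails the build instead of reaching a README.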

What IronCache IS and is NOT

IS: a Rust, single static binary, RESP/Redis-wire-compatible, shared-nothing thread-per-core in-memory cache, with eviction and a memory ceiling on by default, transparent value compression, opt-in forkless snapshotting, a single-node-first but slot-ready distribution path, and an off-path AI advisor.

IS NOT (committed non-goals):

Open decisions (cross-cutting)

These cut across multiple pillars and gate the architecture:

Acceptance criteria

The project has earned its name when, on the reproducible harness (#8) against the pinned oracle (#96), all of the following hold and each is backed by a committed test:

  • Throughput-per-core: sustained single-core GET/SET throughput strictly exceeds Valkey 9.x single-core on identical hardware and payload mix, with the multiple reported (target: >= 1.5x per core on the standard mixed workload).
  • Memory-per-item: resident bytes-per-stored-item at a fixed 95 percent hit ratio is below Redis 8 on the value-size survey corpus (target: <= 0.7x Redis bytes/item at equal hit ratio), measured with compression in its default posture.
  • Tail latency: p99.9 GET latency at the target per-core throughput stays under a fixed bound with no GC-class pauses (target: p99.9 <= 1 ms at the documented load), demonstrating the no-managed-runtime claim.
  • Install-to-first-GET: from downloading the single binary to a successful GET against a running default instance in under 60 seconds, with zero required config edits (eviction and memory ceiling already on).
  • Redis-conformance bar: 100 percent pass on the declared Tier 0/1 command surface in the differential suite against pinned redis-server/valkey-server, with documented and tested behavior for every Tier 2+ deviation. See [DESIGN]: Conformance, differential, fuzz, property, and DST testing stack #95, [DESIGN]: Differential testing against pinned redis-server/valkey-server #97, [DECISION]: Define and publish the IronCache compatibility tiering (Tier 0-4) #16.

No headline claim ships without the corresponding row in docs/prior-art/claims.yaml and a green test, per #14.
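The numeric targets above can be read as a single pass/fail gate. The sketch below is an assumption about how such a gate might look, not the harness from #8: the `HarnessRun` struct, its field names, and `failed_criteria` are hypothetical, and the run is assumed to have been measured at the fixed 95 percent hit ratio on the documented load.

```rust
/// Measurements from one harness run. Field names are illustrative.
#[derive(Debug, Clone, Copy)]
struct HarnessRun {
    ironcache_ops_per_core: f64,
    valkey_ops_per_core: f64,
    ironcache_bytes_per_item: f64,
    redis_bytes_per_item: f64,
    p999_get_latency_ms: f64,
    install_to_first_get_secs: f64,
    tier01_tests_passed: u32,
    tier01_tests_total: u32,
}

/// Returns the names of the acceptance criteria that the run fails.
/// An empty vector means every target is met.
fn failed_criteria(run: &HarnessRun) -> Vec<&'static str> {
    let mut failed = Vec::new();
    // Target: at least 1.5x Valkey per core.
    if run.ironcache_ops_per_core < 1.5 * run.valkey_ops_per_core {
        failed.push("throughput-per-core");
    }
    // Target: at most 0.7x Redis bytes per item at equal hit ratio.
    if run.ironcache_bytes_per_item > 0.7 * run.redis_bytes_per_item {
        failed.push("memory-per-item");
    }
    // Target: p99.9 GET latency of at most 1 ms.
    if run.p999_get_latency_ms > 1.0 {
        failed.push("tail-latency");
    }
    // Target: strictly under 60 seconds.
    if run.install_to_first_get_secs >= 60.0 {
        failed.push("install-to-first-get");
    }
    // Target: 100 percent pass on a non-empty Tier 0/1 suite.
    if run.tier01_tests_total == 0 || run.tier01_tests_passed != run.tier01_tests_total {
        failed.push("conformance");
    }
    failed
}
```

Returning the list of failed criteria, instead of a single boolean, keeps the report specific about which headline claim is not yet earned.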

Issue map

Every planned issue is listed here, grouped by milestone. Nothing is orphaned.

M0, Charter, claims, and decisions to make before building

M1, Core architecture, decisions locked, foundational designs

M2, Advanced engine, distribution, and the harder tests

References

  • docs/PRIOR_ART.md, competitor landscape, the specific incumbent claims above, and their sources.
  • docs/prior-art/claims.yaml, machine-readable claim register; every quantitative claim has a row and a verifying test or correction.
  • docs/research/, research-issue outputs (benchmark bake-offs, eviction corpus results, value-size survey, runtime bake-off, allocator benchmarks).

Post-audit additions (2026-06-13)

The pre-implementation audit (see docs/AUDIT.md) filed these issues.

Decompositions of too-large issues:

Coverage-gap issues:

Implementation readiness

Sequencing of the whole tree into a critical path to first code lives in #164 and docs/ROADMAP.md. The Implementation Readiness milestone holds the 42-issue gate set; wave:0..3 labels carry the order; critical-path marks the thin first slice.
