manishklach/sram-interface-demo

SRAM Interface Demo: A Memory-Control-Plane Prototype

A minimal systems prototype showing how software can explicitly bind hot data to SRAM-like regions instead of relying on implicit cache behavior.

Includes:

  • mock SRAM backend (portable)
  • Linux /dev/mem MMIO backend
  • prototype mem_hint API

🌐 Live microsite: https://manishklach.github.io/sram-interface-demo/


Low-level memory experiments

The repository includes architecture-specific helpers for exploring the interaction between software and the memory hierarchy:

  • Barriers: Strict ordering for MMIO/SRAM apertures.
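  • Timing: Fine-grained timestamps using rdtsc (x86-64) or the cntvct_el0 virtual counter (AArch64). On modern CPUs both tick at a fixed rate, so they measure elapsed time rather than exact core cycles.
  • Cache Hints: Manual control via clflush and prefetch instructions.
  • Virtual Memory: Educational demos of TLB and page-table concepts.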
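As a rough illustration of the timing and cache-hint helpers listed above, here is a minimal sketch, assuming GCC/Clang inline assembly. The names `read_timestamp` and `measure_load` are hypothetical and not taken from the repository; results should be read as relative comparisons (flushed vs. warm), not absolute latencies.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__)
#include <immintrin.h>  /* _mm_clflush, _mm_mfence */
#endif

/* Hypothetical helper: read a fixed-rate timestamp counter.
 * x86-64:  LFENCE waits for earlier instructions to complete, then RDTSC.
 * AArch64: ISB, then read the generic-timer virtual count CNTVCT_EL0.
 * Other:   fall back to CLOCK_MONOTONIC in nanoseconds. */
static inline uint64_t read_timestamp(void) {
#if defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ volatile("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) :: "memory");
    return ((uint64_t)hi << 32) | lo;
#elif defined(__aarch64__)
    uint64_t v;
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v) :: "memory");
    return v;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical helper: time one load, optionally evicting the line first
 * (clflush is x86-only in this sketch). Returns elapsed counter ticks. */
static uint64_t measure_load(const volatile uint8_t *p, int flush_first) {
    assert(p != NULL);
#if defined(__x86_64__)
    if (flush_first) {
        _mm_clflush((const void *)p);
        _mm_mfence();  /* make sure the flush completes before timing */
    }
#else
    (void)flush_first;  /* no portable user-space flush here */
#endif
    uint64_t t0 = read_timestamp();
    (void)*p;  /* the volatile load being timed */
    uint64_t t1 = read_timestamp();
    return t1 - t0;
}
```

Comparing `measure_load(p, 1)` against `measure_load(p, 0)` over many runs gives a feel for the cold/warm gap; a single sample is dominated by noise.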

Warning: This repository does not directly control page tables or TLBs from user-space. These experiments are for educational and architectural demonstration only. Detailed notes are available in docs/cache_tlb_notes.md.

Architecture

Memory Control Plane

This prototype exposes a thin software-visible control plane over SRAM-style memory, instead of relying purely on implicit cache behavior.
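One way to picture such a control plane is a small table that maps named regions to offsets inside a fixed-size SRAM aperture. The sketch below assumes a 4 KiB aperture and a simple bump allocator with power-of-two alignment; `control_plane`, `cp_bind`, and `cp_lookup` are hypothetical names, not the repository's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRAM_SIZE   4096u  /* assumed aperture size for this sketch */
#define MAX_REGIONS 16

struct sram_region {
    char   name[32];
    size_t offset;  /* byte offset inside the SRAM aperture */
    size_t size;
};

struct control_plane {
    struct sram_region regions[MAX_REGIONS];
    size_t count;
    size_t next_free;  /* bump pointer: bytes below this are bound */
};

/* Bind `size` bytes, aligned to `align` (power of two), under `name`.
 * Returns the chosen offset, or (size_t)-1 if the region does not fit. */
static size_t cp_bind(struct control_plane *cp, const char *name,
                      size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t off = (cp->next_free + align - 1) & ~(align - 1);
    if (cp->count == MAX_REGIONS || off > SRAM_SIZE || size > SRAM_SIZE - off)
        return (size_t)-1;
    struct sram_region *r = &cp->regions[cp->count++];
    strncpy(r->name, name, sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0';
    r->offset = off;
    r->size = size;
    cp->next_free = off + size;
    return off;
}

/* Find a previously bound region by name; NULL if absent. */
static const struct sram_region *cp_lookup(const struct control_plane *cp,
                                           const char *name) {
    for (size_t i = 0; i < cp->count; i++)
        if (strcmp(cp->regions[i].name, name) == 0)
            return &cp->regions[i];
    return NULL;
}
```

A bump allocator keeps the sketch short; a real control plane would also need to release and reuse regions.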


Why this exists

Standard CPUs hide memory placement behind multiple layers of abstraction (caches, coherence protocols, and speculative execution). While efficient for general-purpose workloads, this model breaks down for:

  • KV-cache-heavy inference: Where deterministic residency helps avoid tail-latency spikes.
  • Tensor tiling: Where explicit staging overlaps computation with data movement.
  • Multi-tier memory systems: Where placement between SRAM, HBM, and CXL must be software-directed.

This repository explores a different idea: explicit software-directed residency in fast memory.


What this is / is not

What this is

  • A small prototype of a memory control plane.
  • An educational systems demo for hardware/software co-design.
  • A demonstration of explicit residency control.

What this is not

  • Not a production-ready memory driver.
  • Not a kernel-level manager (though it explores the interface).
  • Not a replacement for standard CPU caches or coherence.

Quickstart

Build and run the demo in any environment (defaults to Mock mode):

make
./sram_demo

Sample Output

[mem_hint] reserve "kv_tile_0"
[mem_hint] bind → SRAM offset 0x100
[mem_hint] write → 39 bytes
[mem_hint] readback → verified ✔
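The write and readback steps in the sample output can be approximated with a plain buffer standing in for SRAM. This is a minimal sketch under that assumption; `mock_sram` and `write_and_verify` are hypothetical names, and the real demo's logging and binding logic may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t mock_sram[4096];  /* stand-in for the SRAM aperture */

/* Store `len` bytes at `offset` through a volatile pointer, then read them
 * back and compare. Returns 1 if verified, 0 on bounds error or mismatch. */
static int write_and_verify(size_t offset, const void *src, size_t len) {
    assert(src != NULL || len == 0);
    if (offset > sizeof mock_sram || len > sizeof mock_sram - offset)
        return 0;
    volatile uint8_t *dst = mock_sram + offset;
    const uint8_t *s = src;
    for (size_t i = 0; i < len; i++)
        dst[i] = s[i];  /* byte-wise stores; memcpy is not defined on volatile */
    for (size_t i = 0; i < len; i++)
        if (dst[i] != s[i])  /* readback */
            return 0;
    return 1;
}
```

Using `volatile` keeps the compiler from eliding the readback, which matters once the buffer is replaced by a real mapped aperture.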

Ordered MMIO access (assembly layer)

The most important part of interacting with SRAM/MMIO is not the store itself but the ordering around it.

static inline void sram_barrier(void) {
#if defined(__x86_64__)
    __asm__ volatile("mfence" ::: "memory");
#elif defined(__aarch64__)
    __asm__ volatile("dmb sy" ::: "memory");
#else
    __sync_synchronize();
#endif
}

For MMIO/SRAM apertures, we follow an ordered pattern: barrier → store/load → barrier.

Without these architecture-specific instructions, CPU reordering can break correctness by allowing the hardware to observe a "ready" bit before the data payload has actually reached the aperture.


Mock-mode benchmark

The repository includes a latency benchmark to measure the overhead of the memory-control-plane logic in mock mode. This is useful for validating the control path but does not reflect real SRAM hardware latency.

make latency_bench
./latency_bench
./latency_bench 5000000

Example Output

[bench] backend: mock SRAM
[bench] iterations: 1000000
[bench] write32 avg: 4.52 ns/op
[bench] read32 avg: 3.10 ns/op
[bench] verify: OK ✔
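The kind of measurement reported above can be sketched with `clock_gettime(CLOCK_MONOTONIC)` over a volatile buffer. This is an assumption about the general approach, not the repository's `latency_bench` source; `bench` and `mock_sram32` are hypothetical names, and the numbers it produces reflect ordinary cached memory.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define WORDS 1024u
static volatile uint32_t mock_sram32[WORDS];  /* mock aperture, not real SRAM */

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Time `iters` 32-bit writes, then `iters` reads; store the averages in
 * *w_ns and *r_ns. Returns 1 if the last written slot holds the expected value. */
static int bench(uint64_t iters, double *w_ns, double *r_ns) {
    assert(iters > 0);
    double t0 = now_ns();
    for (uint64_t i = 0; i < iters; i++)
        mock_sram32[i % WORDS] = (uint32_t)i;  /* volatile: not optimized away */
    double t1 = now_ns();
    for (uint64_t i = 0; i < iters; i++)
        (void)mock_sram32[i % WORDS];
    double t2 = now_ns();
    *w_ns = (t1 - t0) / (double)iters;
    *r_ns = (t2 - t1) / (double)iters;
    return mock_sram32[(iters - 1) % WORDS] == (uint32_t)(iters - 1);
}
```

Averaging over many iterations amortizes the cost of the clock calls themselves, which is why the benchmark accepts a larger iteration count on the command line.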

Backends

| Backend   | Platform | Purpose                 |
|-----------|----------|-------------------------|
| Mock SRAM | Any OS   | Development / testing   |
| /dev/mem  | Linux    | Hardware MMIO prototype |
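Keeping both backends behind one small interface lets the same control-plane code run in mock mode or against mapped hardware. The sketch below assumes a hypothetical function-pointer table (`struct sram_backend`) and implements only the mock side; a /dev/mem backend would supply the same two accessors over an mmap'd pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend interface: 32-bit accessors over an aperture. */
struct sram_backend {
    const char *name;
    size_t size;  /* aperture size in bytes */
    void     (*write32)(void *ctx, size_t off, uint32_t v);
    uint32_t (*read32)(void *ctx, size_t off);
    void *ctx;    /* backend-private state, e.g. a base pointer */
};

/* Mock backend: a static buffer, portable to any OS. */
#define MOCK_WORDS 256
static uint32_t mock_words[MOCK_WORDS];

static void mock_write32(void *ctx, size_t off, uint32_t v) {
    uint32_t *base = ctx;
    assert(off % 4 == 0 && off / 4 < MOCK_WORDS);  /* aligned, in bounds */
    base[off / 4] = v;
}

static uint32_t mock_read32(void *ctx, size_t off) {
    const uint32_t *base = ctx;
    assert(off % 4 == 0 && off / 4 < MOCK_WORDS);
    return base[off / 4];
}

static const struct sram_backend mock_backend = {
    "mock SRAM", sizeof mock_words, mock_write32, mock_read32, mock_words
};
```

Callers then go through `backend->write32(backend->ctx, off, v)` without knowing which backend is active.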

Use cases

  • KV-cache tile residency: Binding active attention blocks to on-chip SRAM.
  • Tensor tile staging: Manual orchestration of matrix multiplication tiles.
  • MoE expert placement: Promoting active experts to fast residency.
  • FPGA scratchpad memory: Software management of non-coherent BRAM.
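Tensor tile staging is often done with double buffering: while one SRAM slot is being consumed, the next tile is staged into the other. The sketch below runs sequentially and uses `memcpy` as a stand-in for what would be an asynchronous copy (for example DMA) on real hardware; tile sizes and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE   64  /* floats per tile (assumed) */
#define NTILES 8

static float sram_slot[2][TILE];  /* two SRAM-resident slots (ping-pong) */

/* Stage one tile from slow memory into an SRAM slot. */
static void stage(const float *src, int slot) {
    memcpy(sram_slot[slot], src, sizeof sram_slot[slot]);
}

/* Placeholder compute step: sum the tile held in `slot`. */
static double compute(int slot) {
    double acc = 0.0;
    for (int k = 0; k < TILE; k++)
        acc += sram_slot[slot][k];
    return acc;
}

/* Process NTILES tiles from `dram`, staging tile i+1 into the other slot
 * before consuming tile i, so slot i % 2 is never overwritten while in use. */
static double process_tiles(const float *dram) {
    assert(dram != NULL);
    double total = 0.0;
    stage(dram, 0);  /* prologue: load tile 0 */
    for (int i = 0; i < NTILES; i++) {
        if (i + 1 < NTILES)
            stage(dram + (size_t)(i + 1) * TILE, (i + 1) % 2);
        total += compute(i % 2);
    }
    return total;
}
```

With a real asynchronous copy, the loop would also need to wait for the staging of slot `i % 2` to finish before computing on it.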

Roadmap

  • Mock backend implementation
  • MMIO/devmem backend implementation
  • Assembly-backed ordering layer
  • UIO/VFIO secure mapping backend
  • /dev/mem_hint conceptual kernel interface
  • Compiler/runtime intent integration

License

MIT