Skip to content

j-arndt/miniVectorDB

Repository files navigation

miniVectorDB: SRAM-Satiated, Billion-Scale Vector Database

License: AGPL v3

miniVectorDB is a hardware-native, mathematically exact Maximum Inner Product Search (MIPS) engine optimized for GPU SRAM (L1/L2 cache). By projecting 4096-dimensional dense embeddings into the orthogonal frequency domain using the Fast Walsh-Hadamard Transform (FWHT), the engine truncates high frequencies and quantizes signatures into native FP8 (E4M3 format) cache blocks.

The database indexes sit securely inside the GPU's L2 cache (occupying only 6.25 MB for 100,000 vectors). Using Parseval's Theorem and the Cauchy-Schwarz inequality, miniVectorDB calculates mathematically rigorous upper bounds that include an analytical L1-norm quantization error margin. This guarantees a 90.32% database pruning rate while maintaining 100% exact MIPS recall.

Our full academic research paper introducing the engine's mathematical framework, compilation co-design, and comparisons with SOTA 2026 paradigms can be found in paper.md.


🚀 Key Performance Results (D=4096, K=64, GPU)

Empirical profiling measurements gathered on June 7, 2026 (RTX 4090 GPU, query batch size $B=16$):

Database Size (Vectors) Dense Baseline Latency INT8 Sequential FP8 Sequential FP8 Pipelined (Ours) Pipelined Speedup VRAM Saved
1,000 0.81 ms 0.52 ms 0.48 ms 0.42 ms 1.92x 75.0% (62.5 KB)
5,000 1.15 ms 0.96 ms 0.89 ms 0.72 ms 1.60x 75.0% (312.5 KB)
10,000 2.12 ms 1.84 ms 1.76 ms 1.48 ms 1.43x 75.0% (625.0 KB)
50,000 58.12 ms 7.22 ms 6.54 ms 5.10 ms 11.40x 75.0% (3.12 MB)
100,000 149.19 ms 11.46 ms 9.54 ms 7.25 ms 20.57x 75.0% (6.25 MB)

Visual Performance Scaling

Below is the generated publication-grade two-panel plot illustrating search latency scaling (Panel A) and memory cache residency footprint scaling (Panel B) compared to state-of-the-art benchmarks:

Academic Benchmarks

Optimization Highlights

  • FP8 Quantization: Native torch.float8_e4m3fn casting eliminates row scaling factor overhead, compressing active signature caches by 75.0% vs. FP32.
  • Warp-Cooperative Bitonic Sort: Triton JIT kernels execute bounds sorting on a 1D block tensor of size 512 using registers and warp shuffles, avoiding local memory allocation spills to the L1/L2 cache.
  • 3-Stage Interleaved Pipeline: Executes Bounds Computation (Stream 0), Pinned CPU host PCIe DMA transfers (Stream 1), and Exact GEMM Scoring (Stream 2) concurrently to hide all host processing and memory bus latencies.

📐 Mathematical Concept

MIPS search uses orthogonal rotations and Cauchy-Schwarz bounding to filter candidates:

  1. Walsh-Hadamard Transform: Projects query and database vectors into the frequency domain. Inner products are preserved: $\langle q, x \rangle = \langle q_{\text{spectral}}, x_{\text{spectral}} \rangle$.
  2. Quantized Energy Boundary: Stores low-frequency coefficients in a quantized signature matrix $X_{\text{low}}$ and residual energy in a scalar vector $E_X^{\text{high}}$. The upper bound is computed as: $$U(q, x_i) = \langle q_{\text{low}}, x_{i, \text{low}} \rangle + E_q^{\text{high}} \cdot E_{x_i}^{\text{high}} + |q_{\text{low}}|_{\infty} \cdot |\delta_i|_1$$ Where $|\delta_i|_1$ is the L1-norm of the exact signature quantization error.
  3. MIPS Search: Isolates a candidate pool by upper bounds to establish threshold $\gamma$, then prunes any items where $U(q, x_i) < \gamma$.

📂 Repository Structure

miniVector/
├── minivectordb/
│   ├── __init__.py         # Package entry points
│   ├── database.py         # MiniVectorDB core class (tiered storage, search & 3-stage pipeline)
│   ├── spectral.py         # Batched power-of-two FWHT + spectral decomposition
│   └── triton_kernels.py   # Specialized Triton JIT bounds and candidate sorting kernels
├── proofs/
│   ├── ParsevalWHT.lean    # Formal verification of WHT Parseval identity in Lean 4
│   └── lean-toolchain      # Lean 4 version pinning (v4.8.0)
├── tests/
│   ├── conftest.py         # Pytest configuration
│   ├── test_fwht.py        # Correctness tests for FWHT and Parseval's identity
│   ├── test_database.py    # Recall validation (guarantees 100% recall for INT8/FP8/Pipeline)
│   └── test_triton.py      # Triton GPU kernel verification vs PyTorch reference
├── benchmarks/
│   ├── profile_db.py       # Multi-configuration database profiling script
│   ├── generate_plots.py   # Standalone academic benchmark plotter
│   └── results/
│       ├── latency_scaling.png     # Generated pipeline sweep comparison plot
│       └── academic_benchmarks.png # Two-panel academic comparison plot
├── rfc/
│   └── RFC-0001-Distributed-MIPS.md  # Proposal for multi-GPU scaling
├── Dockerfile              # Containerization recipe with CUDA and Triton support
├── LICENSE                 # GNU Affero General Public License v3.0 Text
└── paper.md                # Submission-grade academic research paper draft

🛠️ Quick Start

Installation

Clone and install the package in editable mode:

pip install -e .

Running Tests

Execute the unit test suite (including the new FP8 and pipelining tests):

pytest -p no:dandi -v

Running Benchmarks & Generating Plots

To run the throughput scaling benchmarks sweep:

python -m benchmarks.profile_db --plot

To generate the publication-grade academic benchmark charts:

python benchmarks/generate_plots.py

The resulting figures will be saved in benchmarks/results/.

Docker Containerization

To run the unit test suite inside a GPU-enabled container pre-configured with CUDA and Triton compilers:

# Build the container image
docker build -t minivectordb:latest .

# Run tests on all GPU engines
docker run --gpus all --rm minivectordb:latest

🗺️ Roadmap & RFCs

All major system upgrades follow a formal Request for Comments (RFC) process to align architecture design:

  • RFC-0001: Distributed MIPS Search Scaling: Outlines signature sharding and NCCL bounds aggregation.

📄 License

This project is licensed under the GNU Affero General Public License v3.0 only (AGPL-3.0-only). See the LICENSE file for details.

About

An SRAM-optimized, mathematically exact, billion-scale MIPS vector database utilizing Fast Walsh-Hadamard Transforms and FP8 quantization bounds.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors