vishwakneelamegam/text-diffusion-quantization

Text Diffusion Quantization

Squeezing text-diffusion models onto your laptop. ⚡

An open-source effort to make diffusion-based language and vision-language models run efficiently on consumer hardware through quantization, optimization, and memory-efficient inference — one model at a time.

Why

Diffusion-based (V)LMs are built for high-end GPUs and servers. This project asks a simpler question: how small can we make them before they stop being useful? Every model we tackle gets the same treatment — measure its real footprint, quantize it, prove it still works, and document exactly what fits where.

Research Areas

  • 4-bit and 8-bit quantization
  • Memory-efficient diffusion inference
  • Vision encoder compression
  • ONNX, TensorRT, and OpenVINO optimization
  • CPU and integrated GPU acceleration
  • Apple Silicon support
  • Low-RAM deployment techniques
  • Benchmarking quality vs. performance tradeoffs
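As a rough illustration of the first item, here is a minimal sketch of symmetric absmax quantization at 8 and 4 bits, in plain Python. It is not the method used in this repository, which the README does not specify; the function names `quantize_absmax` and `dequantize` and the sample weights are made up for the example.

```python
def quantize_absmax(values, bits):
    """Symmetric per-tensor quantization: floats -> signed integer codes.

    Hypothetical helper for illustration only; real toolchains usually
    quantize per block or per channel rather than per tensor.
    """
    qmax = 2 ** (bits - 1) - 1            # 127 for 8-bit, 7 for 4-bit
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Made-up weights, not taken from any real model.
weights = [0.12, -0.53, 0.98, -1.40, 0.07, 0.66]

for bits in (8, 4):
    codes, scale = quantize_absmax(weights, bits)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit: scale={scale:.4f} max_error={worst:.4f}")
```

The worst-case rounding error is half the scale, so dropping from 8 to 4 bits widens the error by roughly the ratio of the two ranges (127 to 7). This is one reason 4-bit results need to be verified against the full-precision model rather than assumed.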

Models

Model | Status | Result
Nemotron-Labs-Diffusion-VLM-8B | ✅ 4-bit proven | 5.6 GiB checkpoint; runs in 8.3 GiB (fits a 16 GB laptop); 0-point accuracy drop on MMLU + ScienceQA
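For context on these numbers, the sketch below does the back-of-the-envelope arithmetic for weight storage at different bit widths. It assumes a nominal 8 billion parameters (read off the "8B" in the model name, not a measured count) and treats "16 GB" as decimal gigabytes; the helper names are hypothetical.

```python
GIB = 2 ** 30  # bytes per GiB


def weight_footprint_gib(num_params, bits_per_weight):
    """Storage for the weights alone, ignoring scales, zero points,
    activations, and any tensors kept at higher precision."""
    return num_params * bits_per_weight / 8 / GIB


def fits_in_ram(required_gib, ram_gb, headroom_gib=0.0):
    """Check a GiB requirement against RAM quoted in decimal GB."""
    ram_gib = ram_gb * 1e9 / GIB
    return required_gib + headroom_gib <= ram_gib


NOMINAL_PARAMS = 8e9  # assumption: taken from the "8B" in the model name

for bits in (16, 8, 4):
    size = weight_footprint_gib(NOMINAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:.2f} GiB")

# 8.3 GiB is the runtime figure reported in the table above.
print("8.3 GiB fits in 16 GB:", fits_in_ram(8.3, 16))
```

Under these assumptions, a pure 4-bit estimate comes to about 3.7 GiB, which is below the reported 5.6 GiB checkpoint. That gap would be consistent with quantization metadata or some components stored at higher precision, though the table does not say which.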

More models coming. Each one follows the same workflow: footprint → quantize → verify → benchmark.
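The "verify" step can be pictured as comparing the quantized model's benchmark accuracy against the original's and reporting the difference in percentage points. The sketch below is only an illustration of that comparison, not the project's evaluation harness; the function names and the toy answer lists are invented.

```python
def accuracy(predictions, answers):
    """Percentage of predictions that match the reference answers."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("predictions and answers must be non-empty and equal length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def accuracy_drop_points(baseline_preds, quantized_preds, answers):
    """Baseline accuracy minus quantized accuracy, in percentage points.

    Positive means the quantized model got worse; zero means no change.
    """
    return accuracy(baseline_preds, answers) - accuracy(quantized_preds, answers)


# Toy multiple-choice data, invented for the example.
answers = ["A", "C", "B", "D", "A"]
baseline = ["A", "C", "B", "B", "A"]
quantized = ["A", "C", "B", "B", "A"]

drop = accuracy_drop_points(baseline, quantized, answers)
print(f"accuracy drop: {drop:.1f} points")
```

A drop measured on a small sample is noisy, so a zero-point result is more convincing when it holds across full benchmark sets.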

Mission

Push the boundaries of local AI by bringing state-of-the-art diffusion models to everyday laptops — and documenting every breakthrough along the way.

Status

🚧 Experimental Research Project

Contributions, benchmarks, optimization ideas, and reproducible results are welcome.

Development