DeepSeek-R1 7B INT4 at 69.3 tok/s on a $300 RTX 3060. Faster than llama.cpp, vLLM, and NVIDIA TensorRT-LLM. Is one developer + AI really better than the entire industry?
Updated May 19, 2026 · Python
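The 69.3 tok/s figure above is a throughput number: the count of generated tokens divided by wall-clock seconds. Below is a minimal Python sketch of that arithmetic. It is not the repository's benchmark harness. `tokens_per_second`, `measure_throughput` and the `generate_fn` interface are hypothetical names introduced here for illustration.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Throughput in tokens/s: generated tokens divided by wall-clock seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_tokens / seconds


def measure_throughput(
    generate_fn: Callable[[int], Sequence[int]], max_new_tokens: int
) -> float:
    """Time one call of a hypothetical generate_fn and return tokens/s.

    generate_fn is an assumed interface: it receives a token budget and
    returns the token ids it produced. The timer wraps the whole call,
    so prompt processing (prefill) is counted along with decoding.
    """
    start = time.perf_counter()
    tokens = generate_fn(max_new_tokens)
    elapsed = time.perf_counter() - start
    return tokens_per_second(len(tokens), elapsed)


if __name__ == "__main__":
    # Stand-in generator for illustration only: sleeps briefly, returns dummy ids.
    def fake_generate(n: int) -> list[int]:
        time.sleep(0.05)
        return list(range(n))

    print(f"{measure_throughput(fake_generate, 128):.1f} tok/s")
```

Because the timer wraps the entire call, this sketch measures end-to-end throughput. Published numbers often report decode speed separately from prefill, so results from different tools are only comparable when they time the same phase.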
🚀 A high-performance, GPU-accelerated (NVIDIA CUDA 13.2) Dev Container for Machine Learning and Deep Learning. Features Python 3.12 (uv), Node 22 (fnm), JupyterLab, and Docker-in-Docker, keeping your host system 100% clean.