📊 Local LLM Test Results Repository

This repository is a collection of reproducible benchmark results for a wide range of locally‑run large language models (LLMs). It was created to give developers, researchers, and hobbyists a clear picture of how different models behave on the same hardware and prompts, without relying on cloud services.

What you’ll find here

Section	Contents	Highlights
Model performance tables	Detailed timing, token‑rate, prompt‑eval, and eval statistics for each model (e.g., DeepSeek‑Coder‑V2, Granite 3.2, LLaVA, Qwen 3‑VL, Gemma 3, Llama 3.2‑vision).	Shows the GPU/CPU split, model size, and raw benchmark numbers
Coding benchmark suite	“Write a JavaScript function to remove a specific JSON element” test run on models from ~8 GB up to ~120 GB.	Includes success/failure flags and sample code snippets for each model
Vision‑LLM OCR tests	OCR output from image‑aware models on a map of Ilam Park.	Demonstrates the text‑extraction capabilities of Gemma 3, LLaVA, Qwen 3‑VL, Llama 3.2‑vision, and DeepSeek‑OCR
System‑level comparisons	Example of running the same prompt on Windows 11 vs. WSL (Ubuntu 24.04) with DeepSeek‑R1.	Provides raw timing data to illustrate environment impact
Setup & reproducibility	Exact command‑line invocations (e.g., `ollama run deepseek-r1:32b --verbose`), hardware specs, and a note that every test was performed on a fresh model load with no prior context.	Keeps the numbers comparable across runs
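The timing and token‑rate figures in the tables come from the statistics that `ollama run … --verbose` prints after each response. As a rough sketch of how that block could be turned into data, the Python below parses `label: value` lines into a dictionary. The sample text is an assumption about the output layout, its numbers are made up for illustration, and `parse_verbose_stats` and `leading_number` are hypothetical helper names, not part of Ollama.

```python
import re

# Assumed layout of the statistics block printed by `ollama run <model> --verbose`.
# The values are invented for illustration and are not results from this repository.
SAMPLE_STATS = """\
total duration:       12.5s
load duration:        1.2s
prompt eval count:    26 token(s)
prompt eval duration: 300ms
prompt eval rate:     86.67 tokens/s
eval count:           400 token(s)
eval duration:        10s
eval rate:            40.00 tokens/s
"""


def parse_verbose_stats(text):
    """Map each 'label: value' line to its raw value string."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([a-z ]+?):\s+(.+?)\s*$", line)
        if match:
            label, value = match.groups()
            stats[label] = value
    return stats


def leading_number(value):
    """Return the first whitespace-separated token as a float, e.g. '40.00 tokens/s' -> 40.0."""
    return float(value.split()[0])


stats = parse_verbose_stats(SAMPLE_STATS)
print(stats["eval rate"])                    # raw string as printed
print(leading_number(stats["eval count"]))   # numeric token count
```

Durations are kept as raw strings on purpose: values such as `300ms` or `1m2.5s` would need their own unit handling before they can be compared numerically.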

Why this matters

Transparency – All raw numbers, prompts, and model versions are stored in plain markdown tables, so you can verify or extend the data yourself.
Local‑first – No API keys or remote inference; everything runs on your own machine (or VM).
Model‑agnostic – The suite works with any Ollama‑compatible model, from 8 GB quantised builds up to 120 GB instruction‑tuned giants.
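Because the results are kept in plain markdown tables, they can be loaded with a few lines of code for checking or extending. The following is a minimal sketch that assumes pipe‑delimited tables with a header row and a dashed separator row. The column names and the two rows in `SAMPLE_TABLE` are hypothetical placeholders, not data from this repository.

```python
# Hypothetical table in the pipe-delimited markdown style; names and numbers are placeholders.
SAMPLE_TABLE = """\
| Model           | Size (GB) | Eval rate (tokens/s) |
|-----------------|-----------|----------------------|
| example-model-a | 8         | 40.0                 |
| example-model-b | 25        | 12.5                 |
"""


def parse_markdown_table(markdown):
    """Return one dict per data row, keyed by the header cells."""
    header = None
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table row is the header
        elif all(set(cell) <= set("-: ") for cell in cells):
            continue  # separator row such as |---|:--:|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_markdown_table(SAMPLE_TABLE)
fastest = max(rows, key=lambda row: float(row["Eval rate (tokens/s)"]))
print(fastest["Model"])
```

All cell values stay as strings, so numeric columns need an explicit `float(...)` conversion, as in the `max` call above.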

Discussions

Local LLM tests

All tests were performed on a clean model load with the default Ollama settings, ensuring a fair baseline for comparison.
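The tests in this repository were run with the `ollama run` command line. For anyone who prefers to script a comparable measurement, the sketch below uses Ollama's local REST endpoint `/api/generate` instead. That is an assumption on our part, not the method used here. Setting `keep_alive` to `0` asks Ollama to unload the model after the request, which is one possible way to approximate a clean load on the next run. `run_once` and `eval_rate` are hypothetical helper names.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed to be unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def run_once(model, prompt):
    """Send one non-streaming request and return the parsed JSON response."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,   # return a single JSON object with timing fields
        "keep_alive": 0,   # unload the model afterwards so the next run reloads it
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def eval_rate(result):
    """Tokens generated per second; eval_duration is reported in nanoseconds."""
    return result["eval_count"] * 1e9 / result["eval_duration"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    result = run_once("deepseek-r1:32b", "Why is the sky blue?")
    print(f"eval rate: {eval_rate(result):.2f} tokens/s")
```

Figures gathered this way are not guaranteed to match the `--verbose` output exactly, so it is safer to compare like with like.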

Happy benchmarking!

Folders and files

Resources
DeepSeek-Coder-V2 performance comparisons.md
DeepSeek-R1 performance and reasoning comparison.md
GPU-CPU off-loading.md
Granite3.2 vs Llava vs Qwen3-VL vs Gemma3 vs Llama3.2 - vision focused LLMs.md
LLMs of less than 12GB coding comparison.md
LLMs of ~15GB coding comparison.md
LLMs of ~25GB coding comparison.md
LLMs of ~40GB coding comparison.md
LLMs of ~50GB coding comparison.md
LLMs of ~70GB and over coding comparison.md
OCR efforts of vision capable LLMs.md
README.md
WSL (Ubuntu) vs Windows 11.md
