Community benchmark database for running LLMs on Apple Silicon Macs
Updated Apr 22, 2026 - Shell
Hands-on CPU vs GPU benchmarks for Apple Silicon (M-series): PyTorch MPS, TensorFlow-Metal, MLX, and llama.cpp to measure TFLOP/s & tokens/sec and learn why GPUs accelerate training.
Pi extension that measures and displays model TPS in the status bar
Local LLM inference benchmarker. Measures TPS, TTFT, and VRAM pressure across context sizes — from 2K to 256K tokens. Works with llama.cpp, Ollama, LM Studio, and any OpenAI-compatible server.
Lightweight shell script to benchmark token generation speed (tok/s) across Ollama models running in Docker. Auto-discovers all installed models or accepts a custom list via CLI. Uses Ollama's internal eval_duration timing for accurate results — no dependencies beyond curl and awk.
Speedtest for AI. Test latency to every major AI provider from your terminal.
Simple tool for measuring inference engine performance under multi-user load
Test AI provider latency (TTFB, TTFT, TPS) in your CI/CD pipeline. Benchmark OpenAI, Anthropic, Google, and more.
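Several of the tools listed above report the same two numbers: time to first token (TTFT) and tokens per second (TPS). A minimal sketch of how these might be computed is shown here; it is not taken from any of the listed projects. It assumes the `eval_count` and `eval_duration` fields (the latter in nanoseconds) that Ollama returns in its final generate response, and the function names `ollama_tokens_per_second` and `stream_metrics` are hypothetical helpers introduced only for illustration.

```python
import math


def ollama_tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama-style final response.

    Assumes the response carries `eval_count` (tokens generated) and
    `eval_duration` (generation time in nanoseconds).
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds before dividing.
    return eval_count / (eval_duration_ns / 1e9)


def stream_metrics(request_start: float, token_times: list[float]) -> dict:
    """TTFT and decode TPS from client-side timestamps (in seconds).

    `request_start` is when the request was sent; `token_times` holds the
    arrival time of each streamed token. Both are hypothetical inputs that
    a benchmark client would record itself.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    decode_time = token_times[-1] - token_times[0]
    # The first token marks the start of decoding, so it is excluded
    # from the count used for the decode rate.
    if decode_time > 0:
        tps = (len(token_times) - 1) / decode_time
    else:
        tps = math.nan
    return {"ttft_s": ttft, "tps": tps}
```

Server-reported timings such as `eval_duration` exclude network and queueing delay, while client-side timestamps include them, so the two approaches can give different TPS figures for the same run.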