michaeldtimpe/llmtop

llmtop

A realtime memory, swap, and model-residency monitor for MLX and other local LLM workloads on Apple Silicon.

top doesn't tell you whether your 70B model is actually pinned in unified memory or quietly being paged to swap. llmtop does.

What it shows

  • System memory — wired / active / inactive / compressed / free, plus total RAM.
  • Swap activity — used vs. total, instantaneous in/out rate, and cumulative counters. Swap-out rate is highlighted red because for an LLM workload it usually means you're about to evict weights.
  • Memory pressure — kernel pressure level (Normal / Warning / Critical from kern.memorystatus_vm_pressure_level), compressor and decompressor throughput in MB/s, file-backed pageout rate (red when non-zero — catches mmap'd weight eviction before swap moves), and purgeable headroom.
  • iogpu.wired_limit_mb — the GPU wired-memory cap (auto or whatever you've sysctl'd).
  • Active models — unified view of resident models, with size on disk vs. resident memory and a residency %. Pulled from:
    • the ollama local API (/api/ps on 127.0.0.1:11434),
    • the omlx local API (/v1/models/status, using the API key in ~/.omlx/settings.json),
    • the LM Studio local API (/api/v0/models on 127.0.0.1:1234, override with LMSTUDIO_PORT),
    • per-process open file descriptors for .safetensors / .gguf / .mlx weights (authoritative — works for mlx_lm, vllm, llama.cpp, LM Studio's helper process, etc.),
    • in-process MLX / PyTorch / llama.cpp / vLLM loads detected from mapped engine libraries (libmlx.dylib, libtorch*, libllama, …), then attributed to the most recently accessed HuggingFace cache snapshot (this covers the common case where Python reads safetensors into the Metal heap and immediately closes the fd),
    • process command-line as a fallback.
  • Matching processes — top-N by RSS, filtered by name/cmdline patterns.
  • Footer — short git rev of the running llmtop checkout (-dirty if there are local edits).
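
The pressure level and the GPU wired cap in the list above both come from sysctl keys. A minimal sketch of reading and labelling them follows. It assumes the common macOS convention that kern.memorystatus_vm_pressure_level reports 1, 2, or 4 for Normal, Warning, and Critical, and that an iogpu.wired_limit_mb of 0 means the automatic default. The helper names are hypothetical and are not llmtop's own.

```python
import subprocess

# Assumed mapping for kern.memorystatus_vm_pressure_level (not taken from llmtop).
PRESSURE_LEVELS = {1: "Normal", 2: "Warning", 4: "Critical"}


def read_sysctl_int(name: str) -> int:
    """Return an integer sysctl value via `sysctl -n` (macOS only)."""
    out = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(out.stdout.strip())


def describe_pressure(level: int) -> str:
    """Map the raw kernel level to a label; unknown values are shown as-is."""
    return PRESSURE_LEVELS.get(level, f"Unknown({level})")


def describe_wired_limit(limit_mb: int) -> str:
    """Assumption: 0 means the kernel picks the cap automatically."""
    return "auto" if limit_mb == 0 else f"{limit_mb} MB"
```

On a Mac, describe_pressure(read_sysctl_int("kern.memorystatus_vm_pressure_level")) would give the label for the current pressure level.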

Install

llmtop is a single-file PEP 723 script. With uv installed, it bootstraps its own dependencies on first run — no venv to manage.
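
For readers unfamiliar with PEP 723, the sketch below shows the general shape of such a script: a comment block of inline metadata that uv reads before running the file. The dependency list is left empty on purpose because llmtop's real dependencies are not listed here, and python_ok is a hypothetical helper added only for illustration.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = []  # hypothetical placeholder; the real list lives in llmtop itself
# ///
import sys


def python_ok(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the >= 3.11 requirement."""
    return tuple(version_info[:2]) >= (3, 11)
```

With a header like this, uv resolves the declared dependencies into a cached environment on first run, which is why no venv needs to be managed by hand.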

chmod +x llmtop
./llmtop

Optional — drop it on your PATH:

install -m 755 llmtop /usr/local/bin/llmtop
llmtop

Requires Python ≥ 3.11 and macOS (relies on vm_stat and sysctl). Built and tested on Apple Silicon.
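
To illustrate the vm_stat dependency, here is a minimal sketch that converts the page counts printed by vm_stat into megabytes. The function name and the 16384-byte fallback page size are assumptions for this example, not llmtop's actual code.

```python
import re


def parse_vm_stat(text: str) -> dict[str, float]:
    """Convert `vm_stat` page counts to MB, keyed by the label on each line."""
    header = re.search(r"page size of (\d+) bytes", text)
    page_size = int(header.group(1)) if header else 16384  # assumed fallback
    stats: dict[str, float] = {}
    for line in text.splitlines():
        # Only "Pages ..." lines are page counts; counters such as Swapouts are skipped.
        m = re.match(r"^(Pages [^:]+):\s+(\d+)\.?$", line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2)) * page_size / (1024 * 1024)
    return stats
```

On macOS the input would come from something like subprocess.run(["vm_stat"], capture_output=True, text=True).stdout.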

Usage

./llmtop                            # live TUI, default filters, 1s refresh
./llmtop -i 0.5                     # 0.5s refresh
./llmtop -m ollama -m llama         # custom process filters (repeatable)
./llmtop --log run.csv              # also append CSV alongside the TUI
./llmtop --jsonl run.jsonl          # also append JSONL
./llmtop --no-tui --log run.csv     # headless logger (great for benchmarks)
./llmtop --pane procs               # show only the matching-processes pane
./llmtop --pane system --pane models  # show only those two panes

Flags

| Flag | Description |
| --- | --- |
| -i, --interval | Sample interval in seconds (default 1.0). |
| -m, --match | Substring to match against process name + cmdline. Repeatable. Defaults: python, mlx, omlx, ollama, llama, lm-studio, lmstudio, lm studio, vllm, text-generation-inference, tgi, candle. |
| --log PATH | Append a CSV row per sample (one column per metric). |
| --jsonl PATH | Append a JSON object per sample (full process + model breakdown). |
| --no-tui | Skip the TUI; print a one-line summary per tick. Use with --log/--jsonl for unattended runs. |
| --pane NAME | Show only this pane. Repeatable. Choices: system, pressure, models, procs. Defaults to all four. |
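
The flags above can be sketched with argparse as follows. This is an illustration of the documented interface, not llmtop's real parser. In particular, it assumes that passing -m replaces the default match list rather than extending it.

```python
import argparse

DEFAULT_MATCHES = [
    "python", "mlx", "omlx", "ollama", "llama", "lm-studio", "lmstudio",
    "lm studio", "vllm", "text-generation-inference", "tgi", "candle",
]
PANES = ["system", "pressure", "models", "procs"]


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="llmtop")
    p.add_argument("-i", "--interval", type=float, default=1.0)
    p.add_argument("-m", "--match", action="append", default=None)
    p.add_argument("--log", metavar="PATH")
    p.add_argument("--jsonl", metavar="PATH")
    p.add_argument("--no-tui", action="store_true")
    p.add_argument("--pane", action="append", choices=PANES, default=None)
    return p


def parse(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Assumption: explicit -m / --pane values replace the defaults.
    args.match = args.match or list(DEFAULT_MATCHES)
    args.pane = args.pane or list(PANES)
    return args
```

Leaving the append defaults as None and filling them in afterwards avoids argparse's behaviour of appending user values onto a non-empty default list.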

Ctrl-C exits cleanly and flushes the log files.

Output schemas

CSV (--log)

ts, wired_mb, active_mb, compressed_mb, free_mb, swap_used_mb, swap_total_mb, swapins_total, swapouts_total, swapin_rate, swapout_rate, top_pid, top_name, top_rss_mb
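
As one way to use this log after a benchmark, the sketch below finds the sample with the highest swap-out rate. It assumes the file starts with a header row containing the column names listed above; peak_swapout is a hypothetical helper name.

```python
import csv
from typing import Iterable, Optional


def peak_swapout(lines: Iterable[str]) -> Optional[tuple[str, float]]:
    """Return (ts, swapout_rate) for the sample with the highest swap-out rate."""
    peak: Optional[tuple[str, float]] = None
    # skipinitialspace tolerates "a, b, c" style separators as well as "a,b,c".
    for row in csv.DictReader(lines, skipinitialspace=True):
        rate = float(row["swapout_rate"])
        if peak is None or rate > peak[1]:
            peak = (row["ts"], rate)
    return peak
```

Typical use would be `with open("run.csv", newline="") as f: print(peak_swapout(f))`.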

JSONL (--jsonl)

One object per sample, including the full top-8 process list and every detected model with size + resident bytes. Suitable for piping into jq, DuckDB, or a notebook.
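
For notebook use, a small reader like the one below may be enough. It makes no assumption about field names inside each object. It only assumes one JSON object per line and skips a line that fails to parse, which can happen to the last line while llmtop is still appending.

```python
import json
from typing import Iterable, Iterator


def iter_samples(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per well-formed JSONL line, skipping blanks and bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Likely a partially written line from a logger that is still running.
            continue
```

Typical use would be `with open("run.jsonl") as f: samples = list(iter_samples(f))`.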

How model detection works

For each matched process, llmtop tries three strategies in order, most specific first:

  1. Open weight files via lsof. Groups .safetensors / .gguf / .mlx files (≥ 50 MB) by a derived model id:
    • HuggingFace cache layout (models--org--repo/snapshots/<hash>/<file>) → org/repo.
    • Single-file ggufs → filename.
    • Otherwise → containing directory name.
  2. Engine library signature + HuggingFace cache recency. When step 1 finds nothing (typical for MLX / mlx-vlm / transformers workloads, which read safetensors into the Metal heap and then close the file descriptor), llmtop scans the process's mapped libraries for libmlx.dylib / mlx.metallib / libtorch* / libllama / libggml / site-packages/vllm/. If one is found, the loaded model is guessed to be the ~/.cache/huggingface/hub/models--* directory whose atime was updated most recently after the process started.
  3. Cmdline parsing. Falls back to --model / --hf-repo / --model-path flags.
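
The naming rules in step 1 and the fallback in step 3 can be sketched as below. This is an illustration under stated assumptions, not llmtop's implementation: it treats a .gguf as single-file when its name lacks the -00001-of-00003 style split suffix, returns the filename with its extension, and accepts both "--flag value" and "--flag=value" forms. All function names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumed pattern for split GGUF shards, e.g. model-00001-of-00003.gguf
SPLIT_GGUF = re.compile(r"-\d{5}-of-\d{5}\.gguf$")
MODEL_FLAGS = ("--model", "--hf-repo", "--model-path")


def derive_model_id(path: str) -> str:
    """Derive a model id from the path of an open weight file."""
    p = PurePosixPath(path)
    parts = p.parts
    # HuggingFace cache layout: models--org--repo/snapshots/<hash>/<file>
    for i, part in enumerate(parts[:-1]):
        if part.startswith("models--") and parts[i + 1] == "snapshots":
            return part[len("models--"):].replace("--", "/", 1)
    # Single-file GGUF: use the filename itself.
    if p.suffix == ".gguf" and not SPLIT_GGUF.search(p.name):
        return p.name
    # Otherwise: the containing directory name.
    return p.parent.name


def model_from_cmdline(argv: list[str]) -> Optional[str]:
    """Fallback: pull the model from --model / --hf-repo / --model-path."""
    for i, arg in enumerate(argv):
        for flag in MODEL_FLAGS:
            if arg == flag and i + 1 < len(argv):
                return argv[i + 1]
            if arg.startswith(flag + "="):
                return arg[len(flag) + 1:]
    return None
```

Grouping open files by the id returned from derive_model_id lets the shards of one sharded checkpoint be summed into a single size-on-disk figure.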

Resident bytes are the smaller of the process RSS and, when known, the model's on-disk size; size on disk is the sum of the weight files. A residency close to 100% means the model is fully paged in. Anything lower, especially alongside active swap-outs, means parts of the model are being evicted.
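
The residency arithmetic described above can be written out as a short sketch. The function name is hypothetical, and returning None for the percentage when the on-disk size is unknown is an assumption made for this example.

```python
from typing import Optional


def residency(rss_bytes: int, size_on_disk: Optional[int]) -> tuple[int, Optional[float]]:
    """Return (resident_bytes, residency_percent) for one model."""
    if not size_on_disk:
        # Size unknown: fall back to RSS and report no percentage.
        return rss_bytes, None
    # Cap at the on-disk size so unrelated process memory is not counted.
    resident = min(rss_bytes, size_on_disk)
    return resident, 100.0 * resident / size_on_disk
```

For example, a process with 10 GB of RSS holding a 40 GB model works out to 25% residency, while a process whose RSS exceeds the model size is capped at 100%.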

When ollama, omlx, or LM Studio is running, its local API supplies authoritative numbers for the models it hosts.

Why not top/htop/asitop?

  • top and htop show RSS, but conflate the model with everything else the process is doing.
  • asitop is great for power and frequency, but doesn't track which model weights are resident.
  • llmtop is specifically about: is my model in unified memory, and is it staying there?

Security

See SECURITY.md for how to report vulnerabilities.

License

Apache License 2.0 — see LICENSE and NOTICE.
