llmtop

A realtime memory, swap, and model-residency monitor for MLX and other local LLM workloads on Apple Silicon.

top doesn't tell you whether your 70B model is actually pinned in unified memory or quietly being paged to swap. llmtop does.

What it shows

System memory — wired / active / inactive / compressed / free, plus total RAM.
Swap activity — used vs. total, instantaneous in/out rate, and cumulative counters. Swap-out rate is highlighted red because for an LLM workload it usually means you're about to evict weights.
Memory pressure — kernel pressure level (Normal / Warning / Critical from kern.memorystatus_vm_pressure_level), compressor and decompressor throughput in MB/s, file-backed pageout rate (red when non-zero — catches mmap'd weight eviction before swap moves), and purgeable headroom.
iogpu.wired_limit_mb — the GPU wired-memory cap (auto or whatever you've sysctl'd).
Active models — unified view of resident models, with size on disk vs. resident memory and a residency %. Pulled from:
- the ollama local API (/api/ps on 127.0.0.1:11434),
- the omlx local API (/v1/models/status, using the API key in ~/.omlx/settings.json),
- the LM Studio local API (/api/v0/models on 127.0.0.1:1234, override with LMSTUDIO_PORT),
- per-process open file descriptors for .safetensors / .gguf / .mlx weights (authoritative — works for mlx_lm, vllm, llama.cpp, LM Studio's helper process, etc.),
- in-process MLX / PyTorch / llama.cpp / vLLM loads detected from mapped engine libraries (libmlx.dylib, libtorch*, libllama, …), then attributed to the most recently accessed HuggingFace cache snapshot. Covers the common case where Python reads safetensors into the Metal heap and immediately closes the fd.
- process command-line as a fallback.
Matching processes — top-N by RSS, filtered by name/cmdline patterns.
Footer — short git rev of the running llmtop checkout (-dirty if there are local edits).
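
As a rough illustration of where figures like the ones above can come from, here is a minimal sketch that parses `vm_stat`-style output into megabytes and derives a per-second rate from two cumulative counter readings. It is not llmtop's actual code: the function names (`parse_vm_stat`, `pages_to_mb`, `rate_per_second`) and the sample numbers are assumptions made for the example.

```python
from __future__ import annotations

import re

# Illustrative text in the shape that `vm_stat` prints; the numbers are made up.
SAMPLE_VM_STAT = """\
Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                               12345.
Pages active:                            400000.
Pages wired down:                        150000.
Pages occupied by compressor:             90000.
Swapins:                                   1200.
Swapouts:                                  3400.
"""


def parse_vm_stat(text: str) -> dict[str, int]:
    """Return {counter name: raw value}, plus 'page_size' in bytes."""
    stats: dict[str, int] = {}
    header = re.search(r"page size of (\d+) bytes", text)
    # Fall back to 4096 if the header is missing (an assumption for this sketch).
    stats["page_size"] = int(header.group(1)) if header else 4096
    for line in text.splitlines():
        m = re.match(r'^"?([^":]+)"?:\s+(\d+)\.?\s*$', line)
        if m:
            stats[m.group(1).strip()] = int(m.group(2))
    return stats


def pages_to_mb(stats: dict[str, int], key: str) -> float:
    """Convert a page counter to MB using the reported page size."""
    return stats.get(key, 0) * stats["page_size"] / (1024 * 1024)


def rate_per_second(prev: int, cur: int, interval_s: float) -> float:
    """Rate between two readings of a cumulative counter (e.g. Swapouts)."""
    return max(cur - prev, 0) / interval_s


stats = parse_vm_stat(SAMPLE_VM_STAT)
print(f"wired: {pages_to_mb(stats, 'Pages wired down'):.2f} MB")
```

On a Mac, the same parser could be fed the output of `subprocess.run(["vm_stat"], capture_output=True, text=True).stdout` once per sample interval.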

Install

llmtop is a single-file PEP 723 script. With uv installed, it bootstraps its own dependencies on first run — no venv to manage.

chmod +x llmtop
./llmtop

Optional — drop it on your PATH:

install -m 755 llmtop /usr/local/bin/llmtop
llmtop

Requires Python ≥ 3.11 and macOS (relies on vm_stat and sysctl). Built and tested on Apple Silicon.

Usage

./llmtop                            # live TUI, default filters, 1s refresh
./llmtop -i 0.5                     # 0.5s refresh
./llmtop -m ollama -m llama         # custom process filters (repeatable)
./llmtop --log run.csv              # also append CSV alongside the TUI
./llmtop --jsonl run.jsonl          # also append JSONL
./llmtop --no-tui --log run.csv     # headless logger (great for benchmarks)
./llmtop --pane procs               # show only the matching-processes pane
./llmtop --pane system --pane models  # show only those two panes

Flags

| Flag | Description |
| --- | --- |
| `-i`, `--interval` | Sample interval in seconds (default `1.0`). |
| `-m`, `--match` | Substring to match against process name + cmdline. Repeatable. Defaults: `python, mlx, omlx, ollama, llama, lm-studio, lmstudio, lm studio, vllm, text-generation-inference, tgi, candle`. |
| `--log PATH` | Append a CSV row per sample (one column per metric). |
| `--jsonl PATH` | Append a JSON object per sample (full process + model breakdown). |
| `--no-tui` | Skip the TUI; print a one-line summary per tick. Use with `--log`/`--jsonl` for unattended runs. |
| `--pane NAME` | Show only this pane. Repeatable. Choices: `system`, `pressure`, `models`, `procs`. Defaults to all four. |

Ctrl-C exits cleanly and flushes the log files.

Output schemas

CSV (`--log`)

ts, wired_mb, active_mb, compressed_mb, free_mb, swap_used_mb, swap_total_mb, swapins_total, swapouts_total, swapin_rate, swapout_rate, top_pid, top_name, top_rss_mb
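
For post-run analysis, a log in this shape can be read with the standard `csv` module. The sketch below assumes the column order listed above; whether the file starts with a header row is not stated here, so the reader skips one if present. The helper names and the sample rows are hypothetical.

```python
from __future__ import annotations

import csv
import io

# Column order as documented for --log.
COLUMNS = [
    "ts", "wired_mb", "active_mb", "compressed_mb", "free_mb",
    "swap_used_mb", "swap_total_mb", "swapins_total", "swapouts_total",
    "swapin_rate", "swapout_rate", "top_pid", "top_name", "top_rss_mb",
]

# Made-up rows for illustration only; the real ts format may differ.
SAMPLE_LOG = """\
ts,wired_mb,active_mb,compressed_mb,free_mb,swap_used_mb,swap_total_mb,swapins_total,swapouts_total,swapin_rate,swapout_rate,top_pid,top_name,top_rss_mb
1,1000,2000,300,400,0,0,0,0,0,0,123,python,5000
2,1000,2000,300,400,100,1024,0,50,0,50.0,123,python,5000
"""


def read_samples(fh):
    """Yield one dict per sample, skipping blank lines and an optional header."""
    for row in csv.reader(fh):
        if not row or row[0].strip() == "ts":
            continue
        yield dict(zip(COLUMNS, (cell.strip() for cell in row)))


def swapout_events(samples):
    """Return the samples during which any swap-out activity was recorded."""
    return [s for s in samples if float(s["swapout_rate"]) > 0]


events = swapout_events(read_samples(io.StringIO(SAMPLE_LOG)))
for s in events:
    print(s["ts"], s["swapout_rate"], s["top_name"])
```

For a real run, replace the `io.StringIO(...)` wrapper with `open("run.csv", newline="")`.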

JSONL (`--jsonl`)

One object per sample, including the full top-8 process list and every detected model with size + resident bytes. Suitable for piping into jq, DuckDB, or a notebook.

How model detection works

For each matched process, llmtop tries three strategies in order, most specific first:

1. Open weight files via lsof. Groups .safetensors / .gguf / .mlx files (≥ 50 MB) by a derived model id:
   - HuggingFace cache layout (models--org--repo/snapshots/<hash>/<file>) → org/repo.
   - Single-file ggufs → filename.
   - Otherwise → containing directory name.
2. Engine library signature + HuggingFace cache recency. When step 1 finds nothing (typical for MLX / mlx-vlm / transformers workloads, which read safetensors into the Metal heap and then close the file descriptor), the mapped libraries are scanned for libmlx.dylib / mlx.metallib / libtorch* / libllama / libggml / site-packages/vllm/. The process's loaded model is then guessed as the ~/.cache/huggingface/hub/models--* directory whose atime was most recently updated after the process started.
3. Cmdline parsing. Falls back to --model / --hf-repo / --model-path flags.
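
The model-id grouping rules in the first strategy can be sketched as a small pure function. This is an illustration of the documented rules, not llmtop's implementation: `derive_model_id` is a hypothetical name, the sketch treats any `.gguf` outside the HuggingFace cache as a single-file model, and it leaves out the 50 MB size filter.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def derive_model_id(path: str) -> str:
    """Map a weight-file path to a model id using the three documented rules."""
    p = PurePosixPath(path)
    parts = p.parts
    # Rule 1: HuggingFace cache layout -> org/repo.
    for i, part in enumerate(parts):
        if part.startswith("models--") and "snapshots" in parts[i + 1:]:
            # "models--org--repo" -> ["org", "repo"]; repos without an org
            # (e.g. "models--gpt2") yield a single element.
            return "/".join(part[len("models--"):].split("--", 1))
    # Rule 2: single-file gguf -> filename (extension kept; an assumption).
    if p.suffix == ".gguf":
        return p.name
    # Rule 3: otherwise -> containing directory name.
    return p.parent.name


print(derive_model_id(
    "/Users/me/.cache/huggingface/hub/models--mlx-community--Llama-3-8B"
    "/snapshots/abc123/model.safetensors"
))
```

Grouping open files by this id lets several shards of one checkpoint be summed into a single size-on-disk figure.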

Resident bytes are the process RSS, capped at the model's on-disk size when that size is known; size on disk is the sum of the weight files. A residency percentage close to 100% means the model is fully paged in; anything lower (especially with active swap-outs) means parts are being evicted.
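
The residency arithmetic described here is simple enough to show directly. The sketch below is a hedged illustration: `residency` is a hypothetical helper name, and returning `None` for the percentage when the on-disk size is unknown is an assumption of the example.

```python
from __future__ import annotations


def residency(rss_bytes: int, disk_bytes: int | None) -> tuple[int, float | None]:
    """Return (resident bytes, residency percent or None if size is unknown)."""
    if not disk_bytes:
        # On-disk size unknown: report RSS as-is, with no percentage.
        return rss_bytes, None
    # RSS also covers non-model memory, so cap it at the model's size.
    resident = min(rss_bytes, disk_bytes)
    return resident, 100.0 * resident / disk_bytes


GIB = 2**30
print(residency(40 * GIB, 35 * GIB))  # RSS exceeds the model size: capped
print(residency(20 * GIB, 40 * GIB))  # only half of the weights resident
```

Because of the cap, a process whose RSS exceeds the model size reports 100% rather than a figure above it.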

When ollama, omlx, or LM Studio is running, its local API supplies authoritative numbers for the models it hosts.

Why not `top`/`htop`/`asitop`?

top and htop show RSS, but conflate the model with everything else the process is doing.
asitop is great for power and frequency, but doesn't track which model weights are resident.
llmtop is specifically about: is my model in unified memory, and is it staying there?

Security

See SECURITY.md for how to report vulnerabilities.

License

Apache License 2.0 — see LICENSE and NOTICE.
