TokenJam reads your agent's telemetry and tells you when to downsize, when to trim prompts, what to cache, what to script, and which plans you've already paid to figure out, then shows it all in a local browser dashboard. It runs entirely on your machine.
pipx install tokenjam
Don't have pipx? Run `brew install pipx` on macOS or `apt install pipx` on Debian/Ubuntu, or see docs/installation.md. `pip install tokenjam` also works in a clean venv.
No cloud · No signup · No vendor lock-in
TokenJam reads telemetry from every major agent runtime, framework, provider, and observability tool and surfaces savings across five areas, then brings them together in a local browser dashboard.
- Downsize: Flags sessions where a cheaper model in the same family is worth a look. Never claims quality equivalence; surfaces examples so you can spot-check.
- Cache: Shows your current caching ratio per (provider, model) and suggests Anthropic prompt-cache breakpoints from stable prefixes in your real usage.
- Script: Finds clusters of deterministic
- Trim: Predicts which regions of your prompts the model gives little weight to. Surfaces what's safe to cut.
- Reuse: Detects clusters of sessions where your agent re-plans the same work and exports reviewable skeleton templates you can drop into a slash command or script.
- Lens: A local browser dashboard that brings every analyzer's findings, your real spend, and your alerts together in one place. No cloud, no signup, fully offline.
Run all five analyzers with `tj optimize`. Run several with `tj optimize downsize cache reuse`.
For Claude Code users (zero code, auto-backfills your last 30 days):
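
To make the per-(provider, model) caching ratio described above concrete, here is a minimal sketch of one way such a ratio could be computed. It is not TokenJam's implementation: the record fields (`input_tokens`, `cache_read_tokens`) and the definition of the ratio (cached prompt tokens divided by all prompt tokens) are assumptions made for illustration.

```python
from collections import defaultdict


def cache_ratio_by_model(calls):
    """Return {(provider, model): share of prompt tokens served from cache}.

    Field names are hypothetical; adapt them to your own telemetry records.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [cached tokens, all prompt tokens]
    for call in calls:
        key = (call["provider"], call["model"])
        cached = call.get("cache_read_tokens", 0)
        totals[key][0] += cached
        totals[key][1] += cached + call.get("input_tokens", 0)
    return {
        key: (cached / total if total else 0.0)
        for key, (cached, total) in totals.items()
    }


# Made-up sample records, not real usage data.
calls = [
    {"provider": "anthropic", "model": "model-a", "input_tokens": 200, "cache_read_tokens": 800},
    {"provider": "anthropic", "model": "model-a", "input_tokens": 1000, "cache_read_tokens": 0},
    {"provider": "openai", "model": "model-b", "input_tokens": 500, "cache_read_tokens": 0},
]
ratios = cache_ratio_by_model(calls)
for key, ratio in sorted(ratios.items()):
    print(key, f"{ratio:.0%}")
```

A low ratio on a model that sees long, stable prompt prefixes is the kind of signal that would make a cache breakpoint worth considering.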
pipx install tokenjam
tj onboard --claude-code
tj optimize # cost-saving candidates from your actual usage
tj serve # open the dashboard at http://127.0.0.1:7391/
To upgrade later: `pipx upgrade tokenjam` (then `tj stop && tj serve &` to reload the daemon, and `tj --version` to verify). See docs/installation.md.
For any Python agent:
from tokenjam.sdk import watch
from tokenjam.sdk.integrations.anthropic import patch_anthropic

patch_anthropic()

@watch(agent_id="my-agent")
def run(task: str) -> str:
    ...

→ Python SDK · TypeScript SDK · Codex · OTel-compatible agents
`tj serve` runs Lens at http://127.0.0.1:7391/: an Overview triage screen with spend, recoverable waste, and health at a glance; an Optimize tab showing every analyzer's findings side by side; and the standard Status, Traces, Cost, Alerts, Drift, and Budget screens. Plan-tier-aware, fully offline, no signup.
→ tokenjam.dev/products/lens for the visual walkthrough.
TokenJam is also a full observability stack. The five analyzers and Lens ride on top.
- Real-time cost tracking: every LLM call priced as it happens
- Safety alerts: 13 alert types, 6 channels (ntfy, Discord, Telegram, webhook, file, stdout)
- Behavioral drift detection: Z-score baselines, no LLM required
- Schema validation: declare or infer JSON Schema for tool outputs
- OTel-native: point any OTLP exporter at `tj serve` and you're done
- MCP server: lets Claude Code query its own telemetry mid-session
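
The Z-score approach behind drift detection can be sketched in a few lines. This illustrates the general statistical technique only, not TokenJam's detector: the metric (tool calls per session), the sample values, and the threshold of 3.0 are all assumptions.

```python
import math
from statistics import mean, stdev


def z_score(baseline, observed):
    """Number of sample standard deviations `observed` lies from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least 2 points
    if sigma == 0:
        # A flat baseline: any deviation is infinitely surprising.
        return 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    return (observed - mu) / sigma


# Hypothetical metric: tool calls per session over recent "normal" runs.
baseline_tool_calls = [4, 5, 6, 5, 5]
z = z_score(baseline_tool_calls, 9)

DRIFT_THRESHOLD = 3.0  # assumed cutoff; tune for your own workload
drifted = abs(z) > DRIFT_THRESHOLD
print(f"z = {z:.2f}, drifted = {drifted}")
```

Because the baseline is plain arithmetic over past sessions, a check like this needs no LLM call, which is consistent with the "no LLM required" point above.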
tj optimize # all five cost-optimization analyzers
tj optimize downsize # one analyzer (positional args)
tj tokenmaxx # shareable spend-tier callout
tj status # current cost, tokens, active alerts
tj cost --since 7d # spend by agent / model / day / tool
tj alerts # everything that fired while you were away
tj drift # behavioral drift Z-scores
tj report --reuse # HTML + Markdown skeleton export for the Reuse analyzer
tj backfill claude-code # ingest historical ~/.claude/projects/ sessions
tj serve # start Lens + REST API

| Topic | Where |
|---|---|
| Downsize / Cache / Script / Trim deep-dives | docs/optimize/ |
| Reuse analyzer deep-dive | docs/optimize/reuse.md |
| Claude Code & Codex integration | docs/claude-code-integration.md |
| Python SDK reference | docs/python-sdk.md |
| TypeScript SDK reference | docs/typescript-sdk.md |
| Framework support (LangChain / CrewAI / etc.) | docs/framework-support.md |
| Alert channels & rule reference | docs/alerts.md |
| Backfill from Langfuse / Helicone / OTLP | docs/backfill/ |
| Configuration | docs/configuration.md |
| Architecture deep-dive | docs/architecture.md |
| Installation extras (Trim, framework patches) | docs/installation.md |
| Export to Grafana / Datadog / NDJSON | docs/export.md |
| NemoClaw sandbox observer | docs/nemoclaw-integration.md |
| Release notes | GitHub Releases |
Shipped in 0.3.x: Downsize · Cache · Script · Trim · Claude Code + Codex onboarding · MCP server · Web UI · Backfill adapters (Langfuse, Helicone, OTLP) · Period comparison · Routing-config export · Read-only policy preview
Shipped in 0.4.x:
- TokenJam Lens: local dashboard rebrand with an Overview triage front door, Optimize detail tab, real spend-over-time charts, and cross-screen drill-through
- Reuse analyzer: the fifth analyzer; detects clusters of sessions with repeated planning and exports reviewable skeleton templates you can convert into slash commands or scripts
- Daemon DB concurrency: per-thread DuckDB cursors so the Overview's fan-out doesn't block on a single shared connection (v0.4.1)
- Cache cost transparency: `cache_read` + `cache_write` token columns surfaced in CLI + UI + API (the previously hidden ~91% cost driver on cache-heavy workloads)
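
To see why cache tokens can dominate spend once they are counted, consider this rough arithmetic sketch. The per-million-token prices and the session token counts are invented for illustration, and the function is hypothetical rather than part of TokenJam.

```python
# Hypothetical USD prices per million tokens; substitute your provider's real rates.
PRICES = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}


def cost_breakdown(tokens, prices=PRICES):
    """Return (cost per token category in USD, share of total cost from cache tokens)."""
    costs = {kind: tokens.get(kind, 0) * prices[kind] / 1_000_000 for kind in prices}
    total = sum(costs.values())
    cache_cost = costs["cache_read"] + costs["cache_write"]
    return costs, (cache_cost / total if total else 0.0)


# Made-up cache-heavy session: few fresh tokens, many cached ones.
session = {"input": 20_000, "output": 10_000, "cache_read": 2_000_000, "cache_write": 100_000}
costs, cache_share = cost_breakdown(session)
print(costs)
print(f"cache share of cost: {cache_share:.0%}")
```

Even at a steep per-token discount, cache reads add up when their volume is orders of magnitude larger than fresh input, so a report that omits those columns can understate where the money goes.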
Up next:
- `tj policy add | edit | apply`: unified rule surface
- `tj replay`: replay captured sessions against new model versions
- TypeScript framework patches (LangChain JS, OpenAI Agents SDK)
- Vercel AI SDK & Mastra integrations
- Docker image
- GitHub Actions for CI drift/cost checks
tokenjam.dev · PyPI · npm · Issues
MIT License · Built by Metabuilder Labs
TokenJam was created by Anil Murty. Reach him at anil@metabldr.com.



