A Vellum skill that analyzes and reduces LLM spend by mapping call-site overrides to managed model profiles.
Vellum assistants run dozens of different LLM calls per conversation — main agent, memory ops, summarization, UI copy, and more. By default, many of these run on your highest-capability (most expensive) model. This skill walks you through pinning each call site to the right model tier so you're not using a sledgehammer to crack a nut.
Typical result: 60-65% reduction in weekly LLM spend with no meaningful quality loss.
Step 1 — Install the skill
Run this in your terminal (or ask your Vellum assistant to run it):
`assistant skills add vellum-ai/llm-cost-optimizer@llm-cost-optimizer`
Step 2 — Restart the app
Restart your Vellum assistant app so it picks up the newly installed skill.
Step 3 — Ask your assistant to run it
Open a conversation with your Vellum assistant and say:
"Load the llm-cost-optimizer skill and run it"
Your assistant will load SKILL.md and walk you through these steps:
- Pull your weekly spend breakdown by call site
- Read your current overrides
- Apply the recommended profile assignment
- Fix common config gotchas
- Apply the complete turnkey blob (covers all ~40 known call sites)
- Escalate to Opus on-demand with `/model` when you need it
- Tune individual call sites if quality degrades
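The first step in the list, the spend breakdown by call site, can be pictured as a simple aggregation. The following is a minimal sketch only: the record layout, call-site names, and dollar amounts are hypothetical placeholders, and the skill's actual data source and output format may differ.

```python
from collections import defaultdict

# Hypothetical usage records: (call_site, model, cost_usd).
# Names and amounts are illustrative placeholders, not real Vellum data.
records = [
    ("main_agent", "opus", 12.00),
    ("memory_ops", "opus", 6.00),
    ("summarization", "opus", 4.50),
    ("ui_copy", "opus", 1.50),
    ("main_agent", "opus", 6.00),
]


def spend_by_call_site(rows):
    """Sum cost per call site; return (site, total, share) sorted by total, descending."""
    totals = defaultdict(float)
    for call_site, _model, cost in rows:
        totals[call_site] += cost
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (site, total, total / grand_total if grand_total else 0.0)
        for site, total in ranked
    ]


breakdown = spend_by_call_site(records)
for site, total, share in breakdown:
    # One row per call site: name, total cost, share of overall spend.
    print(f"{site:<15} ${total:>6.2f}  {share:6.1%}")
```

Sorting by total puts the most expensive call sites first, which is where moving to a cheaper tier is likely to matter most.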
| Profile | Model | Use for |
|---|---|---|
| `balanced` | Claude Sonnet | Main agent, reasoning, memory consolidation |
| `cost-optimized` | Claude Haiku | Memory ops, summarization, UI copy, background tasks |
| `quality-optimized` | Claude Opus | On-demand only; never pinned |
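As a rough illustration of the table above, a profile assignment can be thought of as a mapping from call site to profile, plus a check that nothing is pinned to `quality-optimized`. This is a sketch under stated assumptions: the profile names come from the table, but the call-site identifiers, the `balanced` fallback, and the helper functions are hypothetical and do not reflect Vellum's actual config schema.

```python
# Profile names come from the table above. The call-site identifiers are
# hypothetical placeholders, not Vellum's actual call-site names.
PINNABLE_PROFILES = {"balanced", "cost-optimized"}  # quality-optimized is on-demand only

CALL_SITE_PROFILES = {
    "main_agent": "balanced",
    "reasoning": "balanced",
    "memory_consolidation": "balanced",
    "memory_ops": "cost-optimized",
    "summarization": "cost-optimized",
    "ui_copy": "cost-optimized",
    "background_task": "cost-optimized",
}


def invalid_pins(assignment):
    """Return (call_site, profile) pairs pinned to a profile that should not be pinned."""
    return [
        (site, profile)
        for site, profile in assignment.items()
        if profile not in PINNABLE_PROFILES
    ]


def profile_for(call_site, assignment=CALL_SITE_PROFILES, default="balanced"):
    """Look up a call site's profile; the 'balanced' fallback is an assumption."""
    return assignment.get(call_site, default)


print(profile_for("ui_copy"))
print(invalid_pins({"main_agent": "quality-optimized"}))
```

Keeping the "never pinned" rule as an explicit check makes it harder to leave a call site on the most expensive tier by accident after a temporary escalation.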
- Vellum assistant (cloud or local)
- `assistant` CLI