Top picks across coding, writing, search, and reasoning — so you know exactly what to plug in and why.
Data from SWE-bench Pro, GPQA Diamond, Chatbot Arena, and BenchLM. Updated May 2, 2026. Source →
Great at everything. If you only pick one, pick from here. These handle coding, writing, research, and reasoning with minimal trade-offs.
Ranked on SWE-bench Pro (1,865 real GitHub issues, multi-language, standardised scaffold — the current clean benchmark) and SWE-bench Verified. These fix bugs and ship features, not just autocomplete.
Creative writing, copywriting, long-form docs, email drafts. Ranked on EQ-Bench Creative Writing Elo (sycophancy-resistant, community-verified) and Chatbot Arena creative writing scores.
Models with real-time web access, retrieval, and research capabilities. For Hermes cron jobs that check news, summarise feeds, monitor prices, or answer time-sensitive questions.
Hard math, PhD-level science, complex multi-step logic, knowledge work, and second-brain tasks. Ranked on GPQA Diamond (PhD expert baseline: 65%), Humanity's Last Exam, and ARC-AGI-2.
Open-weight models you can run on hardware you own — no API key, no monthly bill, no data leaving your machine. Ranked by practical capability on consumer GPU and Apple Silicon hardware.
llama.cpp or Unsloth Studio — Ollama GGUFs don't yet pair the vision projector. Supersedes Qwen3.6 35B released just a week prior.ollama run deepseek-r1:32b.ollama run phi4-reasoning:14b-plus. Short context (32K, tested to 64K) is the main limit — not suitable for large documents. For math, structured analysis, and code review on laptop hardware, nothing at this weight class comes close.Each model needs an API key and a provider setting. Here's the quick version for each major provider.
Get your key at console.anthropic.com, then in Hermes settings set provider: anthropic and ANTHROPIC_API_KEY in your environment. Model names: claude-opus-4-7, claude-opus-4-6, claude-sonnet-4-6.
Get your key at platform.openai.com, then set provider: openai and OPENAI_API_KEY. Current models: gpt-5-5 (flagship, $5/$30), gpt-5-4 (affordable, $2.50/$15), gpt-5-4-mini (small), gpt-5-4-nano (nano).
Get your key at aistudio.google.com, then set provider: google and GOOGLE_API_KEY. Model names: gemini-3-1-pro-preview, gemini-3-pro, gemini-3-flash.
The easiest way to try multiple models without multiple accounts. Get a key at openrouter.ai, set provider: openrouter and OPENROUTER_API_KEY. Access Claude, GPT, Gemini, DeepSeek, Llama and more with one key.
Run models locally with llama.cpp or Ollama. DeepSeek V4-Flash (MIT, $0.14/$0.28 per 1M — released Apr 24, 2026) and Qwen 3.6-Plus are the top open-weight picks for coding. Gemma 4 26B MoE (Apache 2.0, 82.3% GPQA Diamond with only 3.8B active parameters) is the best edge/self-hosted reasoning option. Point Hermes at your local server: set provider: openai with a base_url like http://localhost:11434/v1. Your API key can be any string.
Self-host Hermes in under five minutes and bring your own API key.