docs(site): add Opus 4.7, update star count to 105k+, fix stale 2025→2026 meta description, refresh date to April 20

- index.html: star badge 100k+ → 105k+ (NousResearch/hermes-agent at 105k)
- models/index.html: bump meta description year 2025 → 2026
- models/index.html: update date to April 20, 2026
- models/index.html: Opus 4.6 → Opus 4.7 across all 4 sections (Overall, Coding, Writing, Research)
  - New stats: 70% CursorBench, 98.5% XBOW visual-acuity, 3.75MP image resolution
  - Picker card 'Complex coding' updated to Opus 4.7
  - Anthropic setup box: add claude-opus-4-7 as first model name; drop claude-sonnet-4-5 from the list
- compare/claude-code.html: update Opus 4.6 reference to 4.7
- compare/perplexity-computer.html: update Opus 4.6 reference to 4.7
nesquena-hermes
2026-04-20 19:45:00 +00:00
parent 87315afb19
commit cfb48eaab4
4 changed files with 32 additions and 32 deletions
+1 -1
@@ -318,7 +318,7 @@
<h3>The provider lock-in question</h3>
<p>Claude Code supports <strong>Anthropic's API, AWS Bedrock, Google Vertex AI, and Anthropic Foundry</strong>. These are meaningful options for enterprise deployment, but they share one constraint: every inference uses a Claude model. If Claude pricing changes, if a competitor releases a model significantly better for a specific task, or if you simply want to use a local open-source model for cost or privacy reasons, you cannot do that within Claude Code. The tool is by design Claude-native.</p>
<p>Hermes is provider-agnostic. You configure which provider and model to use, and can change it at any time — or route different tasks to different providers. GPT-5.4 for one thing, Claude Opus 4.6 for another, a local Ollama model for private data. This flexibility is especially valuable when the model landscape is moving as fast as it currently is: you're not locked into today's best option when something better ships.</p>
<p>Hermes is provider-agnostic. You configure which provider and model to use, and can change it at any time — or route different tasks to different providers. GPT-5.4 for one thing, Claude Opus 4.7 for another, a local Ollama model for private data. This flexibility is especially valuable when the model landscape is moving as fast as it currently is: you're not locked into today's best option when something better ships.</p>
<p>For most users evaluating these two tools for coding work, provider flexibility is a secondary concern — Claude is genuinely excellent at coding tasks, and Claude Code's tight integration is an advantage. But for users with strong privacy requirements, cost sensitivity, or a preference to hedge model risk, Hermes's open provider model is a concrete benefit.</p>
<h3>Using them together</h3>
+1 -1
@@ -299,7 +299,7 @@
Perplexity Computer — launched February 25, 2026 for Max subscribers and expanded to all Pro subscribers in March 2026 — is a cloud-based agentic workflow engine. Its tagline is "Chat answers. Agents do tasks. <strong>Computer works.</strong>" You describe a goal in natural language; the system decomposes it into subtasks, dispatches parallel sub-agents, and runs everything in an isolated cloud sandbox (2 vCPUs, 8 GB RAM, Python + Node.js pre-installed) with real browser access and a real filesystem.
</p>
<p>
The core differentiator is <strong>multi-model orchestration at scale</strong>. Perplexity Computer routes across 19+ frontier models — Claude Opus 4.6, GPT-5.4, Gemini, Grok, and others — automatically selecting the best model for each subtask. No single model handles the whole pipeline; orchestration is the product. Tasks can run for hours, check in only when genuinely blocked, and notify you on completion via email or push notification.
The core differentiator is <strong>multi-model orchestration at scale</strong>. Perplexity Computer routes across 19+ frontier models — Claude Opus 4.7, GPT-5.4, Gemini, Grok, and others — automatically selecting the best model for each subtask. No single model handles the whole pipeline; orchestration is the product. Tasks can run for hours, check in only when genuinely blocked, and notify you on completion via email or push notification.
</p>
<p>
It also ships with <strong>400+ prebuilt OAuth connectors</strong> (Google Drive, Gmail, Notion, Jira, GitHub, Slack, Salesforce, Snowflake, and more) and MCP server support added in March 2026. The connector ecosystem is broad, though independent reviewers found several connectors (Vercel, Ahrefs, GitHub OAuth) unreliable in practice — the GitHub personal access token path worked better than the official connector.
+1 -1
@@ -1448,7 +1448,7 @@
</a>
</div>
<div class="hero-stats">
<span class="stat-badge"><span class="dot"></span>100k+ GitHub stars</span>
<span class="stat-badge"><span class="dot"></span>105k+ GitHub stars</span>
<span class="stat-badge"><span class="dot"></span>Multi-surface access</span>
<span class="stat-badge"><span class="dot"></span>47 built-in tools</span>
<span class="stat-badge"><span class="dot"></span>MIT licensed</span>
+29 -29
@@ -14,7 +14,7 @@
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Best AI Models for Hermes — 2026 Guide</title>
<meta name="description" content="The best AI models to use with Hermes in 2025 — top picks for coding, writing, web search, and all-around use.">
<meta name="description" content="The best AI models to use with Hermes in 2026 — top picks for coding, writing, web search, and all-around use.">
<style>
:root {
--bg-primary: #0d1117;
@@ -617,7 +617,7 @@
<div class="hero-badge">2026 Model Guide</div>
<h1>Best AI models for <em>Hermes</em></h1>
<p class="hero-sub">Top picks across coding, writing, search, and reasoning — so you know exactly what to plug in and why.</p>
<p class="hero-note">Data from SWE-bench Pro, GPQA Diamond, Chatbot Arena, and BenchLM. Updated April 13, 2026. <a href="https://lmarena.ai/leaderboard" target="_blank">Source →</a></p>
<p class="hero-note">Data from SWE-bench Pro, GPQA Diamond, Chatbot Arena, and BenchLM. Updated April 20, 2026. <a href="https://lmarena.ai/leaderboard" target="_blank">Source →</a></p>
</section>
<!-- =========================================
@@ -638,18 +638,18 @@
<div class="model-card" style="--model-color: #f0a500;">
<div class="rank-badge gold">🥇</div>
<div class="model-info">
<h3>Claude Opus 4.6</h3>
<div class="model-meta">Anthropic &middot; 1M context &middot; $5 / $25 per 1M tokens</div>
<div class="model-why">#2 on Chatbot Arena (1,503 Elo) — the highest score of any publicly available model for complex reasoning. Leads instruction-following, long-form work, and agentic tasks. The thinking variant pushes even further. Slower than Sonnet but makes fewer mistakes on genuinely hard problems. Use it for your most important work.</div>
<h3>Claude Opus 4.7</h3>
<div class="model-meta">Anthropic &middot; 1M context &middot; $5 / $25 per 1M tokens &middot; Released Apr 16, 2026</div>
<div class="model-why">Anthropic's most capable generally available model as of April 2026. 70% on CursorBench (vs 58% for Opus 4.6), 98.5% XBOW visual-acuity (vs 54.5%), 3x more resolved production tasks at Rakuten. Catches its own logical faults mid-planning. 3x higher image resolution — 3.75MP vs 1.15MP. Substantially better at multi-session memory and long agentic work. Use it for your most demanding tasks where quality matters more than speed.</div>
<div class="model-pills">
<span class="pill gold">Arena #2 (1503 Elo)</span>
<span class="pill gold">New Anthropic flagship</span>
<span class="pill green">Best all-rounder</span>
<span class="pill blue">1M context</span>
</div>
</div>
<div class="model-score">
<span class="score-val">1504</span>
<span class="score-label">Arena Elo</span>
<span class="score-val">70%</span>
<span class="score-label">CursorBench</span>
</div>
</div>
@@ -756,18 +756,18 @@
<div class="model-card" style="--model-color: #f0a500;">
<div class="rank-badge gold">🥇</div>
<div class="model-info">
<h3>Claude Opus 4.6</h3>
<h3>Claude Opus 4.7</h3>
<div class="model-meta">Anthropic &middot; 1M context &middot; $5 / $25 per 1M tokens</div>
<div class="model-why">The engine powering Claude Code and Cursor — the two most-used AI coding tools. 80.8% on SWE-bench Verified and 65.4% on Terminal-Bench (DevOps/CLI tasks). Where Opus shines specifically is deep multi-file reasoning: architecture decisions, debugging subtle cross-module issues, reviewing large PRs. Its extended 1M context fits entire codebases.</div>
<div class="model-why">Powers Claude Code and Cursor — the two most-used AI coding tools. 70% on CursorBench (vs 58% for Opus 4.6), 90.9% on BigLaw Bench at high effort, 10-15% task success lift at Factory, 10%+ recall improvement on complex PRs at CodeRabbit. Where Opus 4.7 shines specifically is deep multi-file reasoning with self-correction: it catches its own logical faults during planning before reporting back. The 1M context window fits entire codebases.</div>
<div class="model-pills">
<span class="pill gold">Powers Claude Code</span>
<span class="pill green">Deep reasoning</span>
<span class="pill green">Self-correcting</span>
<span class="pill blue">1M context</span>
</div>
</div>
<div class="model-score">
<span class="score-val">80.8%</span>
<span class="score-label">SWE-bench Verified</span>
<span class="score-val">70%</span>
<span class="score-label">CursorBench</span>
</div>
</div>
@@ -883,18 +883,18 @@
<div class="model-card" style="--model-color: #58a6ff;">
<div class="rank-badge silver">🥈</div>
<div class="model-info">
<h3>Claude Opus 4.6</h3>
<h3>Claude Opus 4.7</h3>
<div class="model-meta">Anthropic &middot; 1M context &middot; $5 / $25 per 1M tokens</div>
<div class="model-why">Leads the Mazur creative writing benchmark (8.53) and instruction-following Arena (1,500 Elo — highest of any model tested). The thinking variant (8.56 Mazur) pushes further for complex literary work. #1 on Chatbot Arena overall at 1,504 Elo. Best for projects demanding deep voice and stylistic range — where spending more per token is justified by the work's importance.</div>
<div class="model-why">The upgraded heir to Opus 4.6 on literary and instruction-following benchmarks. More direct and opinionated tone than 4.6 — fewer hedges, more conviction. Described as "best model in the world for building dashboards and data-rich interfaces" by Val Town. Raises the bar on professional output quality — interfaces, slides, long-form docs. Best for projects demanding precision and creative depth where spending more per token is worth it.</div>
<div class="model-pills">
<span class="pill blue">Mazur #1 (8.53)</span>
<span class="pill gold">IF Elo #1</span>
<span class="pill blue">Professional output</span>
<span class="pill gold">More opinionated</span>
<span class="pill green">Literary depth</span>
</div>
</div>
<div class="model-score">
<span class="score-val">8.53</span>
<span class="score-label">Mazur score</span>
<span class="score-val">4.7</span>
<span class="score-label">Latest Anthropic</span>
</div>
</div>
@@ -1119,18 +1119,18 @@
<div class="model-card" style="--model-color: #3fb950;">
<div class="rank-badge bronze">🥉</div>
<div class="model-info">
<h3>Claude Opus 4.6</h3>
<div class="model-meta">Anthropic &middot; 1M context &middot; 89.2% GPQA Diamond</div>
<div class="model-why">89.2% GPQA Diamond and #1 on Chatbot Arena overall (1,504 Elo). Best for long-context reasoning tasks requiring both analytical depth and 1M-token coherence — feeding in entire research corpora, reviewing large codebases, or synthesising book-length material. Claude Sonnet 4.6 leads GDPval-AA structured knowledge retrieval (1,633 Elo #1) if throughput and cost matter.</div>
<h3>Claude Opus 4.7</h3>
<div class="model-meta">Anthropic &middot; 1M context &middot; 128K max output</div>
<div class="model-why">21% fewer errors on OfficeQA Pro document reasoning (Databricks), 13% resolution lift on 93-task coding benchmark (Morph), and 90.9% on BigLaw Bench (Harvey). Now accepts images up to 3.75MP — 3x more than Opus 4.6 — making it the strongest pick for visual research, dense diagrams, and chart analysis. Best for research requiring both depth and 1M-token coherence. Claude Sonnet 4.6 leads GDPval-AA retrieval (1,633 Elo) if throughput matters.</div>
<div class="model-pills">
<span class="pill gold">Arena #1 overall</span>
<span class="pill gold">Best Anthropic model</span>
<span class="pill blue">1M context</span>
<span class="pill green">Long-context depth</span>
<span class="pill green">3.75MP vision</span>
</div>
</div>
<div class="model-score">
<span class="score-val">89.2%</span>
<span class="score-label">GPQA Diamond</span>
<span class="score-val">90.9%</span>
<span class="score-label">BigLaw Bench</span>
</div>
</div>
@@ -1312,7 +1312,7 @@
<div style="display:grid;gap:16px;">
<div class="setup-box">
<h4>Anthropic (Claude models)</h4>
<p>Get your key at <a href="https://console.anthropic.com" target="_blank" style="color:var(--blue)">console.anthropic.com</a>, then in Hermes settings set <code>provider: anthropic</code> and <code>ANTHROPIC_API_KEY</code> in your environment. Model names: <code>claude-opus-4-6</code>, <code>claude-sonnet-4-6</code>, <code>claude-sonnet-4-5</code>.</p>
<p>Get your key at <a href="https://console.anthropic.com" target="_blank" style="color:var(--blue)">console.anthropic.com</a>, then in Hermes settings set <code>provider: anthropic</code> and <code>ANTHROPIC_API_KEY</code> in your environment. Model names: <code>claude-opus-4-7</code>, <code>claude-opus-4-6</code>, <code>claude-sonnet-4-6</code>.</p>
</div>
<div class="setup-box">
<h4>OpenAI (GPT models)</h4>
@@ -1347,8 +1347,8 @@
</div>
<div class="picker-card">
<div class="use-case"><span class="uc-icon">💻</span> Complex coding</div>
<div class="recommendation">Claude Opus 4.6</div>
<div class="why-short">Powers Claude Code &amp; Cursor. Deep multi-file reasoning.</div>
<div class="recommendation">Claude Opus 4.7</div>
<div class="why-short">Newest Anthropic flagship. Powers Claude Code. Self-correcting reasoning.</div>
</div>
<div class="picker-card">
<div class="use-case"><span class="uc-icon">✍️</span> Creative writing</div>