mirror of
https://github.com/nesquena/hermes-webui.git
synced 2026-05-27 04:00:37 +00:00
docs: update models page — Kimi K2.6 replaces K2.5 (58.6% SWE-Pro #1 coding, writing section updated)
+25
-23
@@ -617,7 +617,7 @@
<div class="hero-badge">2026 Model Guide</div>
<h1>Best AI models for <em>Hermes</em></h1>
<p class="hero-sub">Top picks across coding, writing, search, and reasoning — so you know exactly what to plug in and why.</p>
<p class="hero-note">Data from SWE-bench Pro, GPQA Diamond, Chatbot Arena, and BenchLM. Updated April 20, 2026. <a href="https://lmarena.ai/leaderboard" target="_blank">Source →</a></p>
<p class="hero-note">Data from SWE-bench Pro, GPQA Diamond, Chatbot Arena, and BenchLM. Updated April 21, 2026. <a href="https://lmarena.ai/leaderboard" target="_blank">Source →</a></p>
</section>

<!-- =========================================
@@ -771,8 +771,26 @@
</div>
</div>

<div class="model-card" style="--model-color: #58a6ff;">
<div class="model-card" style="--model-color: #ff9040;">
<div class="rank-badge silver">🥈</div>
<div class="model-info">
<h3>Kimi K2.6</h3>
<div class="model-meta">Moonshot AI · 262K context · $0.60 / $3.00 per 1M tokens · Open weights</div>
<div class="model-why">Leads SWE-bench Pro at 58.6% — beating GPT-5.4 (57.7%) and every other closed model. The only major open coding model with native image and video input (MoonViT-3D encoder). Supports agent swarms up to 300 parallel sub-agents with 4,000 coordinated tool calls and 12+ hours of sustained autonomous execution. Demonstrated real-world gains: 185% throughput improvement on a production financial matching engine, and 15% task success lift reported by Factory.ai. Self-hostable under a modified MIT license.</div>
<div class="model-pills">
<span class="pill orange">SWE-Pro #1 (58.6%)</span>
<span class="pill blue">300-agent swarm</span>
<span class="pill green">Open weights</span>
</div>
</div>
<div class="model-score">
<span class="score-val">58.6%</span>
<span class="score-label">SWE-bench Pro</span>
</div>
</div>

<div class="model-card" style="--model-color: #58a6ff;">
<div class="rank-badge bronze">🥉</div>
<div class="model-info">
<h3>GPT-5.4</h3>
<div class="model-meta">OpenAI · 1.1M context · $2.50 / $15 per 1M tokens</div>
@@ -790,7 +808,7 @@
</div>

<div class="model-card" style="--model-color: #3fb950;">
<div class="rank-badge bronze">🥉</div>
<div class="rank-badge">4</div>
<div class="model-info">
<h3>Claude Sonnet 4.6</h3>
<div class="model-meta">Anthropic · 200K context · $3 / $15 per 1M tokens</div>
@@ -808,7 +826,7 @@
</div>

<div class="model-card" style="--model-color: #8b949e;">
<div class="rank-badge">4</div>
<div class="rank-badge">5</div>
<div class="model-info">
<h3>Gemini 3.1 Pro</h3>
<div class="model-meta">Google DeepMind · 1–2M context · $2 / $12 per 1M tokens</div>
@@ -825,22 +843,6 @@
</div>
</div>

<div class="model-card" style="--model-color: #ff9040;">
<div class="rank-badge">5</div>
<div class="model-info">
<h3>Qwen 3.6-Plus</h3>
<div class="model-meta">Alibaba · 1M context · Emerging agentic pick · OpenRouter available</div>
<div class="model-why">Leads Terminal-Bench at 61.6% — ahead of both GPT-5.4 and Gemini 3.1 Pro on CLI and DevOps automation. 88.2% on GPQA Diamond. The 1M token context fits large codebases cleanly. An emerging dark-horse for agentic coding pipelines with strong independent eval scores. Available now via Alibaba Cloud and OpenRouter.</div>
<div class="model-pills">
<span class="pill orange">Terminal-Bench #1 (61.6%)</span>
<span class="pill blue">1M context</span>
<span class="pill green">Agentic emerging</span>
</div>
</div>
<div class="model-score">
<span class="score-val">61.6%</span>
<span class="score-label">Terminal-Bench</span>
</div>
</div>
</div>
</div>
@@ -937,9 +939,9 @@
<div class="model-card" style="--model-color: #ff9040;">
<div class="rank-badge">5</div>
<div class="model-info">
<h3>Kimi K2.5</h3>
<div class="model-meta">Moonshot AI · 128K context · $0.60 / $2.50 per 1M tokens</div>
<div class="model-why">~1,700 EQ-Bench Creative Writing Elo — roughly 87% of Sonnet's literary quality at 80% lower cost. The budget pick for high-volume content: product descriptions, social copy, blog drafts, content pipelines where you need coherent writing at scale without paying frontier prices on every call. API is live and available now.</div>
<h3>Kimi K2.6</h3>
<div class="model-meta">Moonshot AI · 262K context · $0.60 / $3.00 per 1M tokens · Open weights</div>
<div class="model-why">~1,700 EQ-Bench Creative Writing Elo — roughly 87% of Sonnet's literary quality at 80% lower cost. The budget pick for high-volume content: product descriptions, social copy, blog drafts, and content pipelines where you need coherent writing at scale without paying frontier prices on every call. Upgraded from K2.5 with a larger 262K context window and native image input. API live via Moonshot platform.</div>
<div class="model-pills">
<span class="pill orange">Budget CW pick</span>
<span class="pill green">~1700 EQ-Bench CW</span>