<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Final Multi-Model Benchmark Report (2026-03-31)</title>
<style>
:root {
--bg: #0b1020; --panel: #121933; --panel2: #172043; --border: #27335a;
--text: #e8eefc; --muted: #9fb0d0; --link: #7cc4ff;
}
body { font-family: system-ui, -apple-system, sans-serif; margin: 2rem; background: var(--bg); color: var(--text); }
a { color: var(--link); }
code { background:#11182c; padding:0.15rem 0.35rem; border-radius:6px; }
.hero { display:grid; grid-template-columns: repeat(auto-fit, minmax(220px,1fr)); gap:1rem; margin:1rem 0 1.5rem; }
.card { background: var(--panel); border: 1px solid var(--border); border-radius: 14px; padding: 1rem; }
.label { color: var(--muted); font-size: 0.85rem; margin-bottom: 0.35rem; }
.value { font-size: 1.25rem; font-weight: 700; }
h1, h2, h3 { margin-top: 1.5rem; }
img { width: 100%; max-width: 1100px; display:block; margin: 0.75rem 0 1.25rem; border:1px solid var(--border); border-radius: 12px; background: #0f1530; }
table { width:100%; border-collapse: collapse; margin: 1rem 0 1.5rem; }
th, td { padding: 0.7rem; border-bottom: 1px solid var(--border); text-align:left; vertical-align: top; }
th { color: var(--muted); font-weight: 600; }
ul { line-height: 1.6; }
</style>
</head>
<body>
<h1>Final Multi-Model Benchmark Report (2026-03-31)</h1>
<p>This report presents the completed single-A10G benchmark matrix comparing <strong>vLLM</strong> and <strong>SGLang</strong> across multiple open models.</p>
<div class="hero">
<div class="card"><div class="label">Fastest TTFT p95</div><div class="value">Gemma 2 2B</div><div>vLLM • 22.5 ms</div></div>
<div class="card"><div class="label">Best tok/s</div><div class="value">Gemma 2 2B</div><div>vLLM • 264.7 tok/s</div></div>
<div class="card"><div class="label">Best req/s</div><div class="value">Gemma 2 2B</div><div>vLLM • 1.14 req/s</div></div>
</div>
<div class="card">
<h2>Environment</h2>
<ul>
<li>AWS <strong>g5.2xlarge</strong></li>
<li>NVIDIA <strong>A10G 24 GB</strong></li>
<li>Sequential engine execution on a single GPU</li>
<li>Models included: <strong>Gemma 2 2B, SmolLM3 3B, Llama 3.2 3B, Phi-3 mini, Phi-4 mini, Gemma 3 4B, Qwen 2.5 7B, DS-R1 Qwen 7B, Mistral 7B, Llama 3.1 8B, Qwen3 8B, Granite 8B, DS-R1 Llama 8B, Gemma 2 9B, Gemma 3 12B, Qwen3 30B-A3B</strong></li>
</ul>
</div>
<div class="card">
<h2>Important notes</h2>
<ul>
<li>vLLM required tuned launch settings on the single A10G: <code>max_model_len=2048</code>, <code>gpu_memory_utilization=0.95</code>, <code>--disable-frontend-multiprocessing</code>, and <code>--enforce-eager</code>.</li>
</ul>
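<p>As a rough illustration of the launch settings above, the following Python sketch assembles an equivalent <code>vllm serve</code> command line. The helper name <code>build_vllm_command</code> and the model ID are hypothetical, and the CLI spellings <code>--max-model-len</code> and <code>--gpu-memory-utilization</code> are assumed to correspond to the <code>max_model_len</code> and <code>gpu_memory_utilization</code> settings. The exact invocation used for this benchmark is not recorded in this report.</p>
<pre><code># Hypothetical helper: builds a `vllm serve` command line from the
# settings listed in the note above. The model ID is a placeholder.
import shlex


def build_vllm_command(model_id):
    """Return the argv list for launching vLLM with the tuned settings."""
    return [
        "vllm", "serve", model_id,
        "--max-model-len", "2048",            # cap context length to fit in 24 GB
        "--gpu-memory-utilization", "0.95",   # fraction of GPU memory vLLM may use
        "--disable-frontend-multiprocessing",
        "--enforce-eager",                    # skip CUDA graph capture
    ]


if __name__ == "__main__":
    # Print a shell-quoted command that could be pasted into a terminal.
    print(shlex.join(build_vllm_command("your-org/your-model")))
</code></pre>
<p>Returning an argument list instead of a single string lets the same values be passed directly to <code>subprocess.run</code> without shell quoting concerns.</p>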
</div>
<h2>Visual summary</h2>
<h3>Single-request latency (TTFT p95)</h3>
<img src="figures/single_request_ttft_p95.svg" alt="Single request TTFT p95" />
<h3>Throughput tokens/sec</h3>
<img src="figures/throughput_tokens_per_sec.svg" alt="Throughput tokens per second" />
<h3>Throughput requests/sec</h3>
<img src="figures/throughput_requests_per_sec.svg" alt="Throughput requests per second" />
<h3>Throughput latency p95</h3>
<img src="figures/throughput_latency_p95.svg" alt="Throughput latency p95" />
<h3>Throughput tradeoff map</h3>
<img src="figures/throughput_tradeoff.svg" alt="Throughput tradeoff map" />
<h2>Single-request latency results</h2>
<table><thead><tr><th>Model</th><th>Scenario</th><th>Engine</th><th>TTFT p50</th><th>TTFT p95</th><th>Latency p95</th><th>Tok/s</th><th>Req/s</th><th>Success</th></tr></thead><tbody><tr><td>Gemma 2 2B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>19.7 ms</td><td>22.5 ms</td><td>1655.6 ms</td><td>77.6</td><td>0.75</td><td>100.0%</td></tr><tr><td>Gemma 2 2B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>30.4 ms</td><td>31.5 ms</td><td>1656.1 ms</td><td>78.2</td><td>0.75</td><td>100.0%</td></tr><tr><td>Llama 3.2 3B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>22.6 ms</td><td>23.3 ms</td><td>1928.7 ms</td><td>66.3</td><td>0.52</td><td>100.0%</td></tr><tr><td>SmolLM3 3B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>23.6 ms</td><td>24.1 ms</td><td>1882.5 ms</td><td>69.2</td><td>0.54</td><td>100.0%</td></tr><tr><td>Llama 3.2 3B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>32.1 ms</td><td>33.2 ms</td><td>1903.4 ms</td><td>67.7</td><td>0.53</td><td>100.0%</td></tr><tr><td>SmolLM3 3B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>57.1 ms</td><td>60.0 ms</td><td>2052.1 ms</td><td>63.4</td><td>0.50</td><td>100.0%</td></tr><tr><td>Gemma 3 4B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>87.4 ms</td><td>90.0 ms</td><td>5332.0 ms</td><td>23.8</td><td>0.21</td><td>100.0%</td></tr><tr><td>Phi-3 mini</td><td><code>single_request_latency</code></td><td>vLLM</td><td>25.0 ms</td><td>25.3 ms</td><td>2233.8 ms</td><td>57.8</td><td>0.45</td><td>100.0%</td></tr><tr><td>Phi-4 mini</td><td><code>single_request_latency</code></td><td>vLLM</td><td>32.7 ms</td><td>33.4 ms</td><td>2269.0 ms</td><td>56.8</td><td>0.55</td><td>100.0%</td></tr><tr><td>Gemma 3 4B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>78.0 ms</td><td>80.5 ms</td><td>2837.6 ms</td><td>45.0</td><td>0.40</td><td>100.0%</td></tr><tr><td>Phi-3 
mini</td><td><code>single_request_latency</code></td><td>SGLang</td><td>43.2 ms</td><td>50.5 ms</td><td>2319.8 ms</td><td>55.7</td><td>0.44</td><td>100.0%</td></tr><tr><td>Phi-4 mini</td><td><code>single_request_latency</code></td><td>SGLang</td><td>39.6 ms</td><td>39.9 ms</td><td>2411.7 ms</td><td>52.7</td><td>0.53</td><td>100.0%</td></tr><tr><td>DS-R1 Qwen 7B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>40.1 ms</td><td>62.8 ms</td><td>4215.8 ms</td><td>30.5</td><td>0.28</td><td>100.0%</td></tr><tr><td>Mistral 7B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>41.0 ms</td><td>62.5 ms</td><td>4064.0 ms</td><td>31.8</td><td>0.26</td><td>100.0%</td></tr><tr><td>Qwen 2.5 7B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>40.9 ms</td><td>62.7 ms</td><td>4220.3 ms</td><td>30.6</td><td>0.29</td><td>100.0%</td></tr><tr><td>DS-R1 Qwen 7B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>65.7 ms</td><td>66.3 ms</td><td>4177.2 ms</td><td>30.9</td><td>0.24</td><td>100.0%</td></tr><tr><td>Mistral 7B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>62.4 ms</td><td>62.7 ms</td><td>4047.0 ms</td><td>31.8</td><td>0.26</td><td>100.0%</td></tr><tr><td>Qwen 2.5 7B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>65.6 ms</td><td>65.7 ms</td><td>4170.2 ms</td><td>30.9</td><td>0.27</td><td>100.0%</td></tr><tr><td>DS-R1 Llama 8B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>42.4 ms</td><td>43.1 ms</td><td>4242.3 ms</td><td>30.3</td><td>0.24</td><td>100.0%</td></tr><tr><td>Granite 8B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>46.3 ms</td><td>47.0 ms</td><td>4629.6 ms</td><td>27.7</td><td>0.22</td><td>100.0%</td></tr><tr><td>Llama 3.1 8B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>42.7 ms</td><td>43.8 ms</td><td>4247.3 ms</td><td>30.3</td><td>0.24</td><td>100.0%</td></tr><tr><td>Qwen3 
8B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>44.0 ms</td><td>44.6 ms</td><td>4381.3 ms</td><td>29.2</td><td>0.23</td><td>100.0%</td></tr><tr><td>DS-R1 Llama 8B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>68.6 ms</td><td>68.9 ms</td><td>4253.0 ms</td><td>30.3</td><td>0.24</td><td>100.0%</td></tr><tr><td>Granite 8B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>76.0 ms</td><td>76.5 ms</td><td>4645.8 ms</td><td>27.6</td><td>0.22</td><td>100.0%</td></tr><tr><td>Llama 3.1 8B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>69.1 ms</td><td>69.5 ms</td><td>4256.0 ms</td><td>30.3</td><td>0.24</td><td>100.0%</td></tr><tr><td>Qwen3 8B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>71.6 ms</td><td>72.1 ms</td><td>4387.8 ms</td><td>29.4</td><td>0.23</td><td>100.0%</td></tr><tr><td>Gemma 2 9B</td><td><code>single_request_latency</code></td><td>vLLM</td><td>74.0 ms</td><td>105.8 ms</td><td>5360.3 ms</td><td>24.0</td><td>0.21</td><td>100.0%</td></tr><tr><td>Gemma 2 9B</td><td><code>single_request_latency</code></td><td>SGLang</td><td>82.9 ms</td><td>83.4 ms</td><td>5312.4 ms</td><td>24.1</td><td>0.21</td><td>100.0%</td></tr></tbody></table>
<h2>Throughput-ramp results</h2>
<table><thead><tr><th>Model</th><th>Scenario</th><th>Engine</th><th>TTFT p50</th><th>TTFT p95</th><th>Latency p95</th><th>Tok/s</th><th>Req/s</th><th>Success</th></tr></thead><tbody><tr><td>Gemma 2 2B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>41.1 ms</td><td>135.5 ms</td><td>4588.2 ms</td><td>264.7</td><td>1.14</td><td>100.0%</td></tr><tr><td>Gemma 2 2B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>46.9 ms</td><td>156.1 ms</td><td>4844.5 ms</td><td>258.0</td><td>1.05</td><td>100.0%</td></tr><tr><td>Llama 3.2 3B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>49.5 ms</td><td>174.9 ms</td><td>5415.8 ms</td><td>223.5</td><td>0.87</td><td>100.0%</td></tr><tr><td>SmolLM3 3B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>48.5 ms</td><td>143.3 ms</td><td>5521.5 ms</td><td>229.5</td><td>0.90</td><td>100.0%</td></tr><tr><td>Llama 3.2 3B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>46.3 ms</td><td>159.5 ms</td><td>5544.7 ms</td><td>226.0</td><td>0.88</td><td>100.0%</td></tr><tr><td>SmolLM3 3B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>95.1 ms</td><td>371.9 ms</td><td>10724.5 ms</td><td>204.6</td><td>0.80</td><td>100.0%</td></tr><tr><td>Gemma 3 4B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>126.8 ms</td><td>218.1 ms</td><td>11538.7 ms</td><td>83.9</td><td>0.33</td><td>100.0%</td></tr><tr><td>Phi-3 mini</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>53.7 ms</td><td>228.9 ms</td><td>8063.6 ms</td><td>190.9</td><td>0.75</td><td>100.0%</td></tr><tr><td>Phi-4 mini</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>55.5 ms</td><td>153.1 ms</td><td>7511.6 ms</td><td>188.6</td><td>0.74</td><td>100.0%</td></tr><tr><td>Gemma 3 4B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>125.6 ms</td><td>302.8 ms</td><td>14731.5 ms</td><td>149.2</td><td>0.58</td><td>100.0%</td></tr><tr><td>Phi-3 mini</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>56.6 
ms</td><td>167.3 ms</td><td>7538.7 ms</td><td>187.2</td><td>0.73</td><td>100.0%</td></tr><tr><td>Phi-4 mini</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>56.8 ms</td><td>160.9 ms</td><td>8410.8 ms</td><td>175.9</td><td>0.79</td><td>100.0%</td></tr><tr><td>DS-R1 Qwen 7B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>92.6 ms</td><td>184.7 ms</td><td>9322.1 ms</td><td>105.6</td><td>0.41</td><td>100.0%</td></tr><tr><td>Mistral 7B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>92.5 ms</td><td>209.8 ms</td><td>10139.1 ms</td><td>106.6</td><td>0.42</td><td>100.0%</td></tr><tr><td>Qwen 2.5 7B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>92.5 ms</td><td>196.5 ms</td><td>9772.7 ms</td><td>105.3</td><td>0.41</td><td>100.0%</td></tr><tr><td>DS-R1 Qwen 7B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>69.7 ms</td><td>313.3 ms</td><td>9603.0 ms</td><td>106.1</td><td>0.41</td><td>100.0%</td></tr><tr><td>Mistral 7B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>72.7 ms</td><td>179.3 ms</td><td>10122.3 ms</td><td>106.8</td><td>0.42</td><td>99.9%</td></tr><tr><td>Qwen 2.5 7B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>65.9 ms</td><td>175.1 ms</td><td>9542.9 ms</td><td>106.3</td><td>0.42</td><td>100.0%</td></tr><tr><td>DS-R1 Llama 8B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>98.3 ms</td><td>244.6 ms</td><td>10547.8 ms</td><td>101.8</td><td>0.40</td><td>100.0%</td></tr><tr><td>Granite 8B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>107.4 ms</td><td>233.9 ms</td><td>12299.3 ms</td><td>92.8</td><td>0.36</td><td>100.0%</td></tr><tr><td>Llama 3.1 8B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>97.6 ms</td><td>195.6 ms</td><td>10523.5 ms</td><td>101.7</td><td>0.40</td><td>100.0%</td></tr><tr><td>Qwen3 8B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>101.2 ms</td><td>220.8 ms</td><td>11646.3 
ms</td><td>98.3</td><td>0.38</td><td>100.0%</td></tr><tr><td>DS-R1 Llama 8B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>94.1 ms</td><td>177.8 ms</td><td>10611.2 ms</td><td>102.0</td><td>0.40</td><td>100.0%</td></tr><tr><td>Granite 8B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>77.1 ms</td><td>194.6 ms</td><td>12521.8 ms</td><td>92.8</td><td>0.36</td><td>100.0%</td></tr><tr><td>Llama 3.1 8B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>69.5 ms</td><td>194.0 ms</td><td>10584.6 ms</td><td>102.1</td><td>0.40</td><td>100.0%</td></tr><tr><td>Qwen3 8B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>72.5 ms</td><td>183.1 ms</td><td>11692.4 ms</td><td>98.5</td><td>0.38</td><td>100.0%</td></tr><tr><td>Gemma 2 9B</td><td><code>throughput_ramp</code></td><td>vLLM</td><td>126.9 ms</td><td>280.3 ms</td><td>14399.0 ms</td><td>79.8</td><td>0.33</td><td>100.0%</td></tr><tr><td>Gemma 2 9B</td><td><code>throughput_ramp</code></td><td>SGLang</td><td>124.6 ms</td><td>25982.1 ms</td><td>46027.4 ms</td><td>77.5</td><td>0.30</td><td>99.7%</td></tr></tbody></table>
</body>
</html>