DigitalOcean Inference Benchmark

A web app for benchmarking any model on the DigitalOcean Serverless Inference engine. Pick models by ID, set your token mix and cache-hit ratio, and compare latency, throughput, and behavior under concurrency side by side. Built with Next.js so it deploys to App Platform in a few clicks.

What it measures

Speed (single request, run sequentially):

Time to first token (TTFT), p50 and p95
Output throughput (tokens/sec)
End-to-end latency, p50 and p95
Actual prompt / completion / cached token counts from the API
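Here is a minimal TypeScript sketch of how these speed metrics can be derived from per-request timestamps. Several choices are assumptions for illustration, not necessarily what the app does: the `RequestTiming` shape, the nearest-rank percentile, and measuring throughput over the decode phase only (first token to last).

```typescript
// Hypothetical record of one timed, streamed request (field names are illustrative).
interface RequestTiming {
  startMs: number;          // request sent
  firstTokenMs: number;     // first streamed content chunk received
  endMs: number;            // stream finished
  completionTokens: number; // from the API's reported usage
}

// Nearest-rank percentile, p in [0, 100]; throws on an empty sample.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty sample");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1]!;
}

function summarizeSpeed(runs: RequestTiming[]) {
  const ttft = runs.map((r) => r.firstTokenMs - r.startMs);
  const e2e = runs.map((r) => r.endMs - r.startMs);
  // Assumption: throughput counts only the decode phase (first token to last token).
  const tps = runs.map((r) => {
    const decodeSec = (r.endMs - r.firstTokenMs) / 1000;
    return decodeSec > 0 ? r.completionTokens / decodeSec : 0;
  });
  return {
    ttftP50: percentile(ttft, 50),
    ttftP95: percentile(ttft, 95),
    e2eP50: percentile(e2e, 50),
    e2eP95: percentile(e2e, 95),
    meanOutputTokensPerSec: tps.reduce((a, b) => a + b, 0) / tps.length,
  };
}
```

If throughput is instead divided by end-to-end time, it also absorbs TTFT. That conflates prefill speed with generation speed, which is why this sketch keeps the two separate.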

Concurrency sweep (load test):

p50 / p95 / p99 end-to-end latency at each concurrency level
Completed requests per second
Aggregate output tokens/sec across all in-flight requests
Error rate (shows the concurrency level at which rate limits start to bite)
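A concurrency sweep can be sketched as a fixed pool of workers per level. In this hypothetical sketch, `task` stands in for one chat-completion call. `runLevel` keeps at most `concurrency` requests in flight, and `summarizeLevel` turns the results into the metrics listed above. Two choices are assumptions: only successful requests count toward throughput, and rates are computed over the wall-clock time of the level.

```typescript
interface LoadResult {
  ok: boolean;
  latencyMs: number;
  completionTokens: number; // 0 for failed requests
}

// Send `total` requests via `task`, keeping at most `concurrency` in flight.
async function runLevel(
  task: () => Promise<{ completionTokens: number }>,
  concurrency: number,
  total: number,
): Promise<{ results: LoadResult[]; wallMs: number }> {
  const results: LoadResult[] = [];
  let issued = 0;
  const start = Date.now();
  // Each worker claims the next slot synchronously, so no two workers share one.
  const worker = async (): Promise<void> => {
    while (issued < total) {
      issued++;
      const t0 = Date.now();
      try {
        const { completionTokens } = await task();
        results.push({ ok: true, latencyMs: Date.now() - t0, completionTokens });
      } catch {
        // Any thrown error (e.g. if `task` throws on HTTP 429) counts toward the error rate.
        results.push({ ok: false, latencyMs: Date.now() - t0, completionTokens: 0 });
      }
    }
  };
  const workers = Array.from({ length: Math.max(1, Math.min(concurrency, total)) }, () => worker());
  await Promise.all(workers);
  return { results, wallMs: Date.now() - start };
}

// Nearest-rank percentile over an already sorted sample; NaN if empty.
function pct(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  return sorted[Math.max(1, Math.ceil((p / 100) * sorted.length)) - 1]!;
}

function summarizeLevel(results: LoadResult[], wallMs: number) {
  const ok = results.filter((r) => r.ok);
  const lat = ok.map((r) => r.latencyMs).sort((a, b) => a - b);
  const wallSec = Math.max(wallMs, 1) / 1000;
  const tokens = ok.reduce((sum, r) => sum + r.completionTokens, 0);
  return {
    p50: pct(lat, 50),
    p95: pct(lat, 95),
    p99: pct(lat, 99),
    requestsPerSec: ok.length / wallSec,
    aggregateTokensPerSec: tokens / wallSec,
    errorRate: results.length ? (results.length - ok.length) / results.length : 0,
  };
}
```

A worker pool, rather than firing all requests at once, keeps the in-flight count pinned at the configured level for the whole run. That makes each level's numbers comparable.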

Results render as live charts and a comparison table, and export to CSV or JSON.
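For CSV export, the main pitfall is quoting, because model IDs or error messages can contain commas or quotes. The following is a minimal sketch of RFC 4180-style quoting. It assumes results are first flattened into hypothetical `Row` objects; `toCsv` and `toJson` are illustrative names.

```typescript
// Hypothetical flat row shape for the comparison table.
type Cell = string | number | boolean | null;
type Row = Record<string, Cell>;

// Quote a field if it contains a comma, quote, or newline; double any inner quotes.
function csvField(v: Cell): string {
  const s = v === null ? "" : String(v);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows: Row[]): string {
  if (rows.length === 0) return "";
  // Union of keys in first-seen order, so rows with missing fields still line up.
  const headers: string[] = [];
  for (const row of rows) {
    for (const k of Object.keys(row)) if (!headers.includes(k)) headers.push(k);
  }
  const lines = [headers.map(csvField).join(",")];
  for (const row of rows) lines.push(headers.map((h) => csvField(row[h] ?? null)).join(","));
  return lines.join("\r\n");
}

function toJson(rows: Row[]): string {
  return JSON.stringify(rows, null, 2);
}
```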

Inputs

Models: any model ID from the DigitalOcean model list. Pick from the catalog or paste an ID; up to 6 models at once.
Input tokens / Output tokens: target prompt size and maximum number of generated tokens.
Cache hit ratio: the fraction of the input sent as a reused, cacheable prefix. On cache-enabled models a higher ratio lowers TTFT and shows up as cached tokens in the usage data. See "How cache modeling works" below.
Concurrency levels and requests per level for the load test.
Streaming on/off, temperature, and mock mode.
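These inputs map naturally onto a single config object. The `BenchmarkConfig` field names and the `validateConfig` checks below are hypothetical. Only the 6-model limit and the 0 to 1 range of the cache hit ratio (it is a fraction) come from this README.

```typescript
// Hypothetical config shape mirroring the inputs listed above.
interface BenchmarkConfig {
  models: string[];
  inputTokens: number;
  outputTokens: number;
  cacheHitRatio: number; // fraction of the input sent as a reusable prefix, 0..1
  concurrencyLevels: number[];
  requestsPerLevel: number;
  stream: boolean;
  temperature: number;
  mock: boolean;
}

const MAX_MODELS = 6;

// Return a list of human-readable problems; an empty list means the config is usable.
function validateConfig(c: BenchmarkConfig): string[] {
  const errors: string[] = [];
  if (c.models.length === 0) errors.push("pick at least one model");
  if (c.models.length > MAX_MODELS) errors.push(`at most ${MAX_MODELS} models`);
  if (new Set(c.models).size !== c.models.length) errors.push("duplicate model IDs");
  if (!(c.cacheHitRatio >= 0 && c.cacheHitRatio <= 1)) errors.push("cacheHitRatio must be in [0, 1]");
  if (c.inputTokens <= 0 || c.outputTokens <= 0) errors.push("token counts must be positive");
  if (c.concurrencyLevels.some((n) => !Number.isInteger(n) || n < 1)) {
    errors.push("concurrency levels must be positive integers");
  }
  if (c.requestsPerLevel < 1) errors.push("requestsPerLevel must be at least 1");
  return errors;
}
```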

How cache modeling works

The app builds each prompt from two parts:

A system message that is identical across every request in a run. This is the cacheable prefix. Its size is inputTokens * cacheHitRatio.
A user message that carries a unique nonce so it never caches. Its size is the remainder.
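Here is a minimal sketch of the two-part prompt. It assumes roughly 4 characters per token, a common rule of thumb for English text rather than the app's actual sizing logic. `buildPrompt` and `filler` are hypothetical names. For a given size and ratio the system message is byte-identical, while the nonce at the start of the user message makes the prompt diverge immediately after the shared prefix.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Rough heuristic (assumption): ~4 characters per token.
const CHARS_PER_TOKEN = 4;

// Deterministic filler text of approximately `tokens` tokens.
function filler(tokens: number, seed: string): string {
  const chars = Math.max(0, Math.round(tokens * CHARS_PER_TOKEN));
  const unit = `${seed} lorem ipsum dolor sit amet. `;
  return unit.repeat(Math.ceil(chars / unit.length)).slice(0, chars);
}

function buildPrompt(inputTokens: number, cacheHitRatio: number, nonce: string): ChatMessage[] {
  const prefixTokens = Math.round(inputTokens * cacheHitRatio);
  const suffixTokens = inputTokens - prefixTokens;
  const messages: ChatMessage[] = [];
  // Cacheable prefix: identical for every request in the run (omitted when the ratio is 0).
  if (prefixTokens > 0) messages.push({ role: "system", content: filler(prefixTokens, "context") });
  // Unique part: the nonce comes first so nothing after the prefix can match a cached entry.
  messages.push({ role: "user", content: `[${nonce}] ` + filler(suffixTokens, "question") });
  return messages;
}
```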

Because the prefix is identical across requests, any prompt caching the endpoint supports can take effect, so the TTFT and cached-token numbers you see reflect real caching behavior rather than a formula. Token sizing of the synthetic prompt is approximate, but every token count shown comes from the model's reported usage, not from our estimate.
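On OpenAI-compatible endpoints, cached tokens are typically reported under `usage.prompt_tokens_details.cached_tokens`. This sketch assumes, rather than confirms, that DigitalOcean's endpoint populates that field. The hypothetical `readUsage` helper therefore treats the field as optional and falls back to 0.

```typescript
// OpenAI-style usage object; prompt_tokens_details may be absent on some models.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

interface TokenCounts {
  prompt: number;
  completion: number;
  cached: number;
  observedHitRatio: number; // cached / prompt, as reported by the API
}

function readUsage(u: Usage): TokenCounts {
  const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    prompt: u.prompt_tokens,
    completion: u.completion_tokens,
    cached,
    observedHitRatio: u.prompt_tokens > 0 ? cached / u.prompt_tokens : 0,
  };
}
```

Comparing `observedHitRatio` with the configured cache hit ratio shows how much of the intended prefix the endpoint actually served from cache.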

Run locally

```bash
npm install
cp .env.example .env.local   # add your Model Access Key
npm run dev                  # http://localhost:3000
```

Mock mode is on by default, so you can explore the UI with simulated timings and no API key. Uncheck "mock mode" to hit the live endpoint.
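Mock mode only needs plausible numbers that respond to the inputs. A hypothetical generator might look like the one below. Every constant in it is an arbitrary placeholder, not a measurement, and the only property it tries to preserve is that a higher cache hit ratio lowers TTFT.

```typescript
// Hypothetical mock timings: every constant below is a made-up placeholder.
interface MockTiming {
  ttftMs: number;
  e2eMs: number;
  cachedTokens: number;
}

function mockRequest(
  inputTokens: number,
  outputTokens: number,
  cacheHitRatio: number,
  rand: () => number = Math.random, // injectable for deterministic tests
): MockTiming {
  const cachedTokens = Math.round(inputTokens * cacheHitRatio);
  const uncachedTokens = inputTokens - cachedTokens;
  // Prefill: uncached tokens cost 10x more than cached ones, so a higher ratio lowers TTFT.
  const prefillMs = uncachedTokens * 0.05 + cachedTokens * 0.005;
  const jitter = 0.8 + 0.4 * rand(); // uniform in [0.8, 1.2)
  const ttftMs = (150 + prefillMs) * jitter;
  const decodeMs = outputTokens * 20 * jitter; // ~50 tokens/sec
  return { ttftMs, e2eMs: ttftMs + decodeMs, cachedTokens };
}
```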

Environment variables

| Variable | Required | Default | Notes |
| --- | --- | --- | --- |
| `DO_MODEL_ACCESS_KEY` | For live runs | None | Create one in the DO Control Panel. You can also paste a key in the UI instead. |
| `DO_INFERENCE_BASE_URL` | No | `https://inference.do-ai.run/v1` | OpenAI-compatible base URL. |
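Here is a sketch of how these two variables might be used to build a request to the OpenAI-compatible `/chat/completions` route. `buildChatRequest` is hypothetical. The `Bearer` header and the `stream_options.include_usage` flag follow OpenAI conventions, but whether the DigitalOcean endpoint honors `include_usage` is an assumption to verify. Letting a UI-pasted key take precedence over the environment variable is also an assumption.

```typescript
const DEFAULT_BASE_URL = "https://inference.do-ai.run/v1";

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  env: Record<string, string | undefined>,
  model: string,
  messages: { role: string; content: string }[],
  opts: { maxTokens: number; temperature: number; stream: boolean },
  uiKey?: string, // hypothetical: a key pasted in the UI, forwarded to the server
): ChatRequest {
  const key = uiKey || env.DO_MODEL_ACCESS_KEY;
  if (!key) throw new Error("No Model Access Key: set DO_MODEL_ACCESS_KEY or paste one in the UI");
  // Trim trailing slashes so "…/v1/" and "…/v1" both work.
  const base = (env.DO_INFERENCE_BASE_URL || DEFAULT_BASE_URL).replace(/\/+$/, "");
  const body: Record<string, unknown> = {
    model,
    messages,
    max_tokens: opts.maxTokens,
    temperature: opts.temperature,
    stream: opts.stream,
  };
  // Assumption: ask for a final usage chunk when streaming (OpenAI convention).
  if (opts.stream) body.stream_options = { include_usage: true };
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${key}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

On the server, the result would be passed straight to `fetch(req.url, req.init)`.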

Deploy to App Platform

Option A: dashboard. Push this folder to a Git repo, then in App Platform choose Create App, import the repo, and App Platform auto-detects Next.js. Add DO_MODEL_ACCESS_KEY as an encrypted environment variable.

Option B: spec file. Edit the github block in .do/app.yaml, then:

```bash
doctl apps create --spec .do/app.yaml
```

Set the DO_MODEL_ACCESS_KEY secret in the dashboard, or add it to the spec and apply it with `doctl apps update <app-id> --spec .do/app.yaml`.

Notes

The app calls the inference endpoint server-side. A key pasted in the UI is sent only to this app's own backend, which makes the inference call, so the browser never talks to the inference API directly.
Long load tests run as a streamed response. Keep request counts reasonable on small instance sizes.
Cost modeling and a small quality-eval mode are natural next additions; the result types already carry the token counts needed for cost math.
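One way to stream load-test results back as they complete is newline-delimited JSON (NDJSON). This hypothetical `ndjsonResponse` helper wraps an async iterable in a standard `Response`, for example a generator that yields one summary per concurrency level. Because it uses `pull`, results are produced only as fast as the client reads them, and `cancel` asks the source to finish early if the client disconnects.

```typescript
// Stream an async sequence of results as NDJSON: one JSON object per line.
function ndjsonResponse<T>(results: AsyncIterable<T>): Response {
  const encoder = new TextEncoder();
  const it = results[Symbol.asyncIterator]();
  const body = new ReadableStream<Uint8Array>({
    // pull() runs only when the consumer wants more data, so results are produced on demand.
    async pull(controller) {
      const { value, done } = await it.next();
      if (done) controller.close();
      else controller.enqueue(encoder.encode(JSON.stringify(value) + "\n"));
    },
    // If the client disconnects, ask the source iterator to finish early.
    async cancel() {
      await it.return?.();
    },
  });
  return new Response(body, { headers: { "Content-Type": "application/x-ndjson" } });
}
```

In a Next.js route handler this would be returned directly, e.g. `return ndjsonResponse(runSweep(config))`, where `runSweep` is a hypothetical async generator over concurrency levels.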
