Single-GPU Operation Guide

This repo supports both vLLM and SGLang, but on a single GPU host (for example AWS g5.2xlarge with one A10G) the recommended operating mode is:

Why this matters

Running both engines at the same time on a single A10G can cause:

docker compose up -d dashboard vllm
curl http://localhost:8000/health

docker compose up -d dashboard sglang
curl http://localhost:8001/health

docker compose stop vllm sglang
sleep 300   # optional cooldown between engines for long benchmark sessions

The dashboard is intentionally decoupled from engine startup so it can run while:

That makes it safer for sequential benchmark workflows.