Thank you for your interest in contributing to the vLLM vs SGLang Inference Engine Benchmark System.
- New models: add benchmark results for models not yet covered
- New scenarios: implement additional benchmark scenarios in `benchmarks/scenarios/`
- New engines: add client adapters in `engines/` for other inference engines
- Hardware results: reproduce runs on different GPUs (A100, H100, RTX 4090, etc.)
- Bug fixes: fix issues in the runner, analysis, or dashboard
- Documentation: improve setup guides, scenario descriptions, or result explanations

```bash
# Fork and clone
git clone https://github.com/<your-username>/inference-engine-benchmark-system.git
cd inference-engine-benchmark-system

# Set up environment
conda create -n benchmark python=3.11
conda activate benchmark
pip install -r requirements.txt

# Copy env template
cp .env.example .env
# Add your HuggingFace token for gated models
```

- Add the model to the matrix in `scripts/run_all_benchmarks.sh`
- Note any engine-specific flags needed (e.g. `--enforce-eager` for Gemma 3)
- Run the full benchmark suite and commit the result JSONs under `results/<model-slug>/`
- Update the tables in `README.md` with the new numbers
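
Before committing result files, it can help to confirm that every JSON file under `results/<model-slug>/` at least parses. The following is a minimal sketch under that assumption: the helper name `find_invalid_result_files` is hypothetical and not part of this repository, and it checks JSON syntax only, since the result schema is not described here.

```python
import json
from pathlib import Path


def find_invalid_result_files(results_dir):
    """Return (path, error message) pairs for JSON files that fail to parse.

    Hypothetical helper: it validates syntax only, not the benchmark schema.
    """
    invalid = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            invalid.append((path, str(exc)))
    return invalid
```

Calling `find_invalid_result_files("results/<model-slug>")` with your real directory returns an empty list when every file parses.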
Scenarios live in `benchmarks/scenarios/`. Each scenario is a Python class that:
- Inherits from `BenchmarkScenario`
- Implements `generate_requests()` → list of prompt/config dicts
- Implements `compute_metrics()` → dict with standardized metric keys

See `benchmarks/scenarios/single_request_latency.py` for a reference implementation.
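
The shape of a scenario described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `BenchmarkScenario` class here is a local stand-in so the example runs on its own, and `ShortPromptLatency`, the `results` argument, and the dictionary keys (`prompt`, `max_tokens`, `latency_s`, and the metric names) are hypothetical. Check the reference implementation for the real base-class interface and the standardized metric keys.

```python
from abc import ABC, abstractmethod
from statistics import mean


class BenchmarkScenario(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def generate_requests(self):
        ...

    @abstractmethod
    def compute_metrics(self, results):
        ...


class ShortPromptLatency(BenchmarkScenario):
    """Hypothetical scenario: a fixed number of short prompts."""

    def __init__(self, num_requests=4, max_tokens=32):
        self.num_requests = num_requests
        self.max_tokens = max_tokens

    def generate_requests(self):
        # One prompt/config dict per request; the key names are assumptions.
        return [
            {"prompt": f"Count to {i + 1}.", "max_tokens": self.max_tokens}
            for i in range(self.num_requests)
        ]

    def compute_metrics(self, results):
        # `results` is assumed to be a list of dicts with a "latency_s" field.
        latencies = [r["latency_s"] for r in results]
        return {
            "num_requests": len(latencies),
            "mean_latency_s": mean(latencies),
            "max_latency_s": max(latencies),
        }
```

Keeping request generation and metric computation in separate methods means the same scenario can be replayed against each engine while the metric logic stays identical.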
- One logical change per PR
- Include raw result JSON files when submitting new benchmark runs
- Update `README.md` tables if your PR adds new numbers
- Keep commit messages concise and descriptive
Open a GitHub Issue with:
- GPU model and VRAM
- Docker image versions (vLLM and SGLang)
- The exact command that failed
- Relevant log output (last 50 lines of container logs)
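
If the container logs have already been saved to a file, the last 50 lines can be extracted with a few lines of Python. This is only a sketch: the helper name `tail_lines` and any log file path you pass to it are placeholders, not part of this repository.

```python
from collections import deque


def tail_lines(log_path, count=50):
    """Return the last `count` lines of a text file as a single string."""
    # A bounded deque keeps only the final `count` lines while streaming the file.
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        return "".join(deque(fh, maxlen=count))
```

The returned string can be pasted directly into the issue body.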
- Python: `ruff` for linting, `black` for formatting
- Shell scripts: `shellcheck` clean
- No new dependencies without discussion
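
One way to run these checks locally before opening a PR is a small wrapper like the one below. It is a sketch, not a script shipped with this repository: the function names are hypothetical, it assumes `ruff`, `black`, and `shellcheck` are installed and on your `PATH`, and it picks up every `*.sh` file under the given directory, which may include scripts you did not write.

```python
import subprocess
from pathlib import Path


def build_check_commands(repo_root="."):
    """Build the lint and format check commands for a checkout."""
    root = Path(repo_root)
    commands = [
        ["ruff", "check", str(root)],
        ["black", "--check", str(root)],
    ]
    shell_scripts = sorted(str(p) for p in root.rglob("*.sh"))
    if shell_scripts:
        # shellcheck takes the script paths as positional arguments.
        commands.append(["shellcheck", *shell_scripts])
    return commands


def run_checks(repo_root="."):
    """Run each check and return the commands that exited non-zero.

    Raises FileNotFoundError if one of the tools is not installed.
    """
    failed = []
    for cmd in build_check_commands(repo_root):
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed
```

An empty list from `run_checks()` means all three tools passed.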
By contributing, you agree that your contributions will be licensed under the MIT License.