4 changes: 4 additions & 0 deletions README.md
@@ -201,6 +201,10 @@ guidellm run \
- `--data-loader type=pytorch,samples=1000`: Limit how many rows are loaded (`-1` for all)
- `--tokenizer huggingface_auto "model=gpt2"`: Tokenizer for synthetic data or local token counting

### Synthetic Visual Data

GuideLLM can synthesize images and short videos on the fly so you can benchmark Vision-Language Model (VLM) serving configurations without bringing your own dataset. Two `--data` kinds — `synthetic_image` and `synthetic_video` — compose with the existing text token controls. See [Synthetic Visual Data](docs/guides/multimodal/synthetic_vision.md) for example commands and the full list of configuration options.

### Request Types and API Targets

You can benchmark chat completions, text completions, or other supported request types. This example configures the benchmark to test the chat completions API using a custom dataset file, with GuideLLM automatically formatting requests to match the chat completions schema.
2 changes: 2 additions & 0 deletions docs/guides/datasets.md
@@ -85,6 +85,8 @@ GuideLLM supports several types of datasets, each with its own advantages and us

Synthetic datasets allow you to generate data on the fly with customizable parameters. This is useful for controlled experiments, stress testing, and simulating specific scenarios. For example, you might want to evaluate how a model handles long prompts or generates outputs with specific characteristics.

GuideLLM supports both synthetic *text* — described below — and synthetic *visual* data (images and short videos) for benchmarking Vision-Language Models. See [Synthetic Visual Data](multimodal/synthetic_vision.md) for the `synthetic_image` and `synthetic_video` `--data` kinds, which compose with all of the text token controls listed here.

#### Example Commands

```bash
8 changes: 8 additions & 0 deletions docs/guides/multimodal/index.md
@@ -49,4 +49,12 @@ Ensure you have a running inference server and model compatible with the OpenAI

[:octicons-arrow-right-24: Audio Guide](audio.md)

- :material-image-multiple-outline:{ .lg .middle } Synthetic Vision

______________________________________________________________________

Generate images and short videos on the fly to benchmark Vision-Language Model (VLM) serving configurations without bringing your own dataset. Covers the `synthetic_image` and `synthetic_video` `--data` kinds.

[:octicons-arrow-right-24: Synthetic Vision Guide](synthetic_vision.md)

</div>
93 changes: 93 additions & 0 deletions docs/guides/multimodal/synthetic_vision.md
@@ -0,0 +1,93 @@
---
weight: 40
---

# Synthetic Visual Data

GuideLLM can synthesize images and short videos on the fly so you can benchmark Vision-Language Model (VLM) serving configurations without bringing your own dataset. Two `--data` kinds — `synthetic_image` and `synthetic_video` — compose with the existing synthetic text token controls (`text_tokens`, `output_tokens`, and their `stdev`/`min`/`max` companions) so a single command produces a fully-shaped multimodal request.

Synthetic visual data is useful when you want to control payload shape precisely (image dimensions, frame count, frames-per-second) or stress-test serving paths that the preprocessor cache would otherwise hide. Defaults are tuned so every generated payload is byte-different from the next, which defeats vLLM's multimodal preprocessor cache while still compressing like real media on the wire.
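
The `stdev`/`min`/`max` companions mentioned above can be pictured as a clamped random draw around the average. The following is a minimal sketch, assuming a Gaussian draw that is rounded and then clamped; the function name `sample_token_count` and the choice of distribution are illustrative assumptions, not GuideLLM's confirmed implementation.

```python
import random


def sample_token_count(rng, mean, stdev=0.0, minimum=1, maximum=None):
    """Draw one token count around `mean`, clamped to [minimum, maximum].

    Hypothetical helper: the Gaussian draw is an assumption for illustration.
    """
    # With no spread configured, every request gets exactly `mean` tokens.
    value = round(rng.gauss(mean, stdev)) if stdev > 0 else mean
    if maximum is not None:
        value = min(value, maximum)
    return max(value, minimum)


rng = random.Random(42)  # fixed seed so repeated runs draw the same counts
counts = [
    sample_token_count(rng, mean=200, stdev=50, minimum=100, maximum=300)
    for _ in range(1000)
]
print(min(counts), max(counts))  # both values fall inside [100, 300]
```

Clamping after the draw keeps every request inside the configured bounds, at the cost of piling a little extra probability mass on the `min` and `max` values themselves.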

## Prerequisites

Install GuideLLM with the `vision` extra to enable image and video synthesis:

```bash
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "guidellm[vision]"
```

## Synthetic image

Use `--data "kind=synthetic_image"` to attach generated images to each request alongside any text prompt. One image is generated per request unless `images_per_request` says otherwise.

### Example Commands

A single 720p image alongside 200 text tokens and 64 output tokens:

```bash
guidellm run \
--backend "kind=openai_http,target=http://localhost:8000" \
--data "kind=synthetic_image,resolution=720p,text_tokens=200,output_tokens=64"
```

Two 1280×720 JPEG images per request:

```bash
guidellm run \
--backend "kind=openai_http,target=http://localhost:8000" \
--data "kind=synthetic_image,width=1280,height=720,format=jpeg,images_per_request=2,text_tokens=200,output_tokens=64"
```

### Configuration Options

- `width`: Width of the generated image in pixels.
- `height`: Height of the generated image in pixels.
- `resolution`: Shorthand that sets `height` to a named value (`480p`, `720p`, `1080p`, …); pairs with `aspect_ratio` to derive `width`.
- `aspect_ratio`: Shorthand such as `16:9` or `4:3` that derives the missing dimension when only one of `width`/`height`/`resolution` is given.
- `format`: Encoded image format, `jpeg` (default) or `png`.
- `jpeg_quality`: JPEG quality factor (1–100) when `format=jpeg`. Defaults to 85.
- `content`: Per-row image content. `gradient` (default) emits a per-row seeded gradient that compresses like real photography; `noise` emits uniform random pixels for worst-case wire size; `solid` and `checkerboard` are useful for preprocessor-cache sensitivity sweeps.
- `images_per_request`: Number of images to attach to each request. Defaults to 1.
- `text_tokens`: Average number of tokens in the accompanying text prompt. Accepts the same `stdev` / `min` / `max` suffixes as the synthetic text mode. `prompt_tokens` is accepted as an alias.
- `output_tokens`: Average number of tokens the model should generate. Same `stdev` / `min` / `max` suffixes apply.
- `seed`: Random seed for reproducible generation across runs.
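
The interplay between `resolution` and `aspect_ratio` can be sketched as follows. This is an illustration under stated assumptions: the helper name `resolve_dimensions`, the `16:9` fallback, and the rounding of the derived width to the nearest even integer are all hypothetical, chosen because they reproduce the `854×480` and `1280×720` sizes used in the examples in this guide. GuideLLM's actual rounding rule is not documented here.

```python
from fractions import Fraction


def resolve_dimensions(resolution: str, aspect_ratio: str = "16:9") -> tuple[int, int]:
    """Turn a named resolution such as "720p" into (width, height).

    Hypothetical helper; the even-width rounding is an assumption.
    """
    height = int(resolution.rstrip("p"))  # "720p" -> 720
    num, den = (int(part) for part in aspect_ratio.split(":"))
    exact_width = Fraction(height * num, den)  # exact, no float error
    width = 2 * round(exact_width / 2)  # nearest even integer
    return width, height


print(resolve_dimensions("720p"))         # (1280, 720)
print(resolve_dimensions("480p"))         # (854, 480)
print(resolve_dimensions("480p", "4:3"))  # (640, 480)
```

Even dimensions matter mostly for video, since yuv420p subsamples chroma in 2×2 blocks, but using the same rule for images keeps the two kinds consistent.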

## Synthetic video

Use `--data "kind=synthetic_video"` to generate a short clip per request alongside any text prompt. Output is `mp4` (h264, yuv420p).

### Example Commands

A six-frame 480p clip at 1 fps with modest prompt and output budgets:

```bash
guidellm run \
--backend "kind=openai_http,target=http://localhost:8000" \
--data "kind=synthetic_video,width=854,height=480,frames=6,fps=1,text_tokens=64,output_tokens=128"
```

A twelve-frame 720p clip at 3 fps with an explicit h264 target bitrate:

```bash
guidellm run \
--backend "kind=openai_http,target=http://localhost:8000" \
--data "kind=synthetic_video,width=1280,height=720,frames=12,fps=3,video_bitrate=2M,text_tokens=64,output_tokens=128"
```

### Configuration Options

- `width`: Width of the generated video in pixels.
- `height`: Height of the generated video in pixels. The same `resolution` / `aspect_ratio` shorthands as for synthetic image apply.
- `frames`: Number of frames in the clip.
- `fps`: Frames per second. Together with `frames`, this sets the clip duration (`frames / fps` seconds).
- `video_bitrate`: Optional h264 target bitrate (e.g. `1M`, `500k`). Useful when you want to hold wire size roughly constant across runs.
- `content`: Per-row clip content. `gradient` (default) emits a seeded gradient with a coordinate warp so each clip compresses similarly to real video; `noise` emits uniform random pixels for worst-case wire size.
- `text_tokens`: Average number of tokens in the accompanying text prompt; same `stdev` / `min` / `max` suffixes as synthetic image. `prompt_tokens` is accepted as an alias.
- `output_tokens`: Average number of tokens the model should generate; same `stdev` / `min` / `max` suffixes apply.
- `seed`: Random seed for reproducible generation across runs.
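
As a rough planning aid, clip duration and an approximate wire size follow from `frames`, `fps`, and `video_bitrate` by simple arithmetic. The sketch below assumes SI suffixes (`k` = 1,000 and `M` = 1,000,000 bits per second) and treats the bitrate as an average the encoder actually hits. Real h264 output for clips this short can deviate noticeably, and container overhead is ignored. The helper names are hypothetical.

```python
_SI_SUFFIXES = {"k": 1_000, "M": 1_000_000}  # assumed SI multipliers


def parse_bitrate(value: str) -> int:
    """Convert a string such as "2M" or "500k" into bits per second."""
    suffix = value[-1]
    if suffix in _SI_SUFFIXES:
        return int(float(value[:-1]) * _SI_SUFFIXES[suffix])
    return int(value)


def clip_duration_s(frames: int, fps: float) -> float:
    """Clip duration in seconds."""
    return frames / fps


def estimated_bytes(frames: int, fps: float, video_bitrate: str) -> int:
    """Approximate payload size: bits per second * seconds / 8."""
    return int(parse_bitrate(video_bitrate) * clip_duration_s(frames, fps) / 8)


print(clip_duration_s(6, 1))          # 6.0
print(clip_duration_s(12, 3))         # 4.0
print(estimated_bytes(12, 3, "2M"))   # 1000000
```

The second and third lines correspond to the twelve-frame 720p example: four seconds of video at a 2M target comes to roughly 1 MB before base64 or transport overhead.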

## Notes

- A tokenizer is required for the text portion of the request. By default, GuideLLM uses the tokenizer of the model passed in or retrieved from the server; to use a different one, specify it with `--tokenizer`.
- Per-row seeded gradients produce byte-different payloads on every request, which bypasses vLLM's multimodal preprocessor cache. If you want to deliberately hit the cache, use fixed payload settings such as `content=solid` for images, or a fixed `seed` with a fixed `--data-loader "kind=pytorch,samples=..."` value.
- The exact mp4 bytes produced for a given seed depend on the installed `ffmpeg` and Pillow (`PIL`) versions. Output token counts and request shape stay stable across versions, but if you are comparing byte-level outputs or wire-size measurements across machines, expect small variation.
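
The cache and compression behavior described in these notes can be illustrated with a small standard-library sketch. It makes several assumptions: raw RGB bytes and `zlib` stand in for the real Pillow and ffmpeg encoders, and a SHA-256 digest of the payload stands in for a content-keyed cache lookup. None of this reflects vLLM's or GuideLLM's actual code, and the function names are hypothetical.

```python
import hashlib
import random
import zlib

WIDTH, HEIGHT = 64, 48  # small RGB frame to keep the demo fast


def gradient_pixels(seed: int) -> bytes:
    """Horizontal RGB gradient whose end colors are chosen by the seed."""
    rng = random.Random(seed)
    start = [rng.randrange(256) for _ in range(3)]
    end = [rng.randrange(256) for _ in range(3)]
    row = bytearray()
    for x in range(WIDTH):
        for c in range(3):
            # Linear interpolation from start[c] (x = 0) to end[c] (last column)
            row.append(start[c] + (end[c] - start[c]) * x // (WIDTH - 1))
    return bytes(row) * HEIGHT


def noise_pixels(seed: int) -> bytes:
    """Uniform random pixels: the worst case for compression."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT * 3))


def cache_key(payload: bytes) -> str:
    """Stand-in for a content-keyed cache lookup (an assumption)."""
    return hashlib.sha256(payload).hexdigest()


# Same seed, same bytes: a content-keyed cache would hit.
print(cache_key(gradient_pixels(1)) == cache_key(gradient_pixels(1)))
# New seed, new bytes: the same cache would miss.
print(cache_key(gradient_pixels(1)) == cache_key(gradient_pixels(2)))
# Gradients compress far better than noise.
print(len(zlib.compress(gradient_pixels(1))), len(zlib.compress(noise_pixels(1))))
```

The same reasoning explains why `content=solid` or a fixed seed with a fixed sample count is the way to deliberately exercise the cache: identical bytes give an identical key.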
1 change: 1 addition & 0 deletions pyproject.toml
@@ -93,6 +93,7 @@ audio = [
vision = [
"datasets[vision]",
"pillow",
"imageio[ffmpeg]",
]
# Dev Tooling
dev = [
16 changes: 16 additions & 0 deletions src/guidellm/data/deserializers/__init__.py
@@ -28,6 +28,16 @@
SyntheticTextDataset,
SyntheticTextDatasetDeserializer,
)
from .synthetic_image import (
SyntheticImageDataArgs,
SyntheticImageDataset,
SyntheticImageDatasetDeserializer,
)
from .synthetic_video import (
SyntheticVideoDataArgs,
SyntheticVideoDataset,
SyntheticVideoDatasetDeserializer,
)
from .trace_mooncake import TraceMooncakeDataArgs, TraceMooncakeDatasetDeserializer
from .trace_synthetic import TraceSyntheticDataArgs, TraceSyntheticDatasetDeserializer

@@ -50,9 +60,15 @@
"InMemoryItemListDatasetDeserializer",
"JSONFileDatasetDeserializer",
"ParquetFileDatasetDeserializer",
"SyntheticImageDataArgs",
"SyntheticImageDataset",
"SyntheticImageDatasetDeserializer",
"SyntheticTextDataArgs",
"SyntheticTextDataset",
"SyntheticTextDatasetDeserializer",
"SyntheticVideoDataArgs",
"SyntheticVideoDataset",
"SyntheticVideoDatasetDeserializer",
"TarFileDatasetDeserializer",
"TextFileDatasetDeserializer",
"TraceMooncakeDataArgs",