From ddaf4a6680daffc0ba1c14251d633c112afbf2cb Mon Sep 17 00:00:00 2001
From: David Butenhof <dbutenho@redhat.com>
Date: Thu, 25 Jun 2026 13:16:28 -0400
Subject: [PATCH 1/8] migration guide

Signed-off-by: David Butenhof <dbutenho@redhat.com>
---
 docs/guides/backends.md               |   2 +-
 docs/guides/datasets.md               |   2 +-
 docs/guides/embeddings.md             |   2 +-
 docs/guides/v0.7.0_migration_guide.md | 108 ++++++++++++++++++++++++++
 4 files changed, 111 insertions(+), 3 deletions(-)
 create mode 100644 docs/guides/v0.7.0_migration_guide.md

diff --git a/docs/guides/backends.md b/docs/guides/backends.md
index 69bfe15e5..e998dce92 100644
--- a/docs/guides/backends.md
+++ b/docs/guides/backends.md
@@ -147,4 +147,4 @@ guidellm run \
 
 ## Expanding Backend Support
 
-GuideLLM is an open platform, and we encourage contributions to extend its backend support. Whether it's adding new server implementations, integrating with Python-based backends, or enhancing existing capabilities, your contributions are welcome. For more details on how to contribute, see the [CONTRIBUTING.md](../../CONTRIBUTING.md) file.
+GuideLLM is an open platform, and we encourage contributions to extend its backend support. Whether it's adding new server implementations, integrating with Python-based backends, or enhancing existing capabilities, your contributions are welcome. For more details on how to contribute, see the [CONTRIBUTING.md](https://github.com/vllm-project/guidellm/blob/main/CONTRIBUTING.md) file.
diff --git a/docs/guides/datasets.md b/docs/guides/datasets.md
index c7f0c8157..9bb525783 100644
--- a/docs/guides/datasets.md
+++ b/docs/guides/datasets.md
@@ -333,7 +333,7 @@ guidellm preprocess dataset \
 
 | Argument      | Description                                                                                                                                   |
 | ------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
-| `DATA`        | Identify the dataset to process. Supports all dataset formats documented in the [Dataset Configurations](../datasets.md).                     |
+| `DATA`        | Identify the dataset to process. Supports all dataset formats documented in the [Dataset Configurations](#datasets).                     |
 | `OUTPUT_PATH` | Path to save the processed dataset, including file suffix (e.g., `processed_dataset.jsonl`, `output.csv`).                                    |
 | `--processor` | **Required.** Processor or tokenizer name/path for calculating token counts. Can be a Hugging Face model ID or local path.                    |
 | `--config`    | **Required.** Configuration specifying target token sizes. Can be a JSON string, key=value pairs, or file path (.json, .yaml, .yml, .config). |
diff --git a/docs/guides/embeddings.md b/docs/guides/embeddings.md
index d37cc80bb..2558c435d 100644
--- a/docs/guides/embeddings.md
+++ b/docs/guides/embeddings.md
@@ -37,6 +37,6 @@ guidellm run \
 
 ## See Also
 
-- [Benchmark Profiles](benchmark-profiles.md) - Detailed explanation of all profile types
+- [Benchmark Profiles](../getting-started/benchmark.md#benchmark-profiles---profile) - Detailed explanation of all profile types
 - [Datasets Guide](datasets.md) - Creating and using custom datasets
 - [Metrics Guide](metrics.md) - Understanding performance metrics
diff --git a/docs/guides/v0.7.0_migration_guide.md b/docs/guides/v0.7.0_migration_guide.md
new file mode 100644
index 000000000..343a8468b
--- /dev/null
+++ b/docs/guides/v0.7.0_migration_guide.md
@@ -0,0 +1,108 @@
+# CLI Migration Guide
+
+## `guidellm benchmark [run]`
+
+Run a benchmark against a generative model.
+
+This command is now `guidellm run`
+
+| v0.6.0 option | v0.7.0 equivalent |
+| :---- | :---- |
+| \--backend-kwargs JSON string of arguments to pass to the backend. E.g., '{"api\_key": "apikey-\*", "verify": false}' | Options passed to `--backend`, like `--backend “kind=openai_http,api_key=sk…”` |
+| \--backend Backend type. Options: vllm\_python, openai\_http. | Merged with `--backend-kwargs` `--backend ‘{“kind”: “openai_http”, “extras”: {“body”: {“temperature”: 0.6}}}’` |
+| \--cooldown Cooldown specification: int, float, or dict as string (json or key=value). Controls time or requests after measurement ends. Numeric in (0, 1): percent of duration or request count. Numeric \>=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema. | Specify with profile, e.g., `--profile kind=synchronous,cooldown=2` for a two second cooldown or `--profile ‘{“kind”:”concurrent”,”cooldown”:{“mode”:”duration”,”value”:2}}` |
+| \--data-args JSON string of arguments to pass to dataset creation. | Specified with “load\_kwargs” as part of data, e.g., `--data ‘{“kind”:”huggingface”,”load_kwargs”:{“split”:”train”}}` |
+| \--data-column-mapper JSON string of column mappings to apply to the dataset. E.g., '{"text\_column": "article", "output\_tokens\_count\_column" :"output\_tokens"}' | Data column mappers have a “kind”: `--data-column-mapper ‘{“kind”:”generative_column_mapper”,”column_mappings”:{“text_column”:”instruction”}}` |
+| \--data-finalizer JSON string of finalizer to convert dataset rows to requests. E.g., 'generative' or '{"type": "generative"}' | Use `--data-finalizer kind=generative` |
+| \--data-num-workers Number of worker processes for data loading. | Specified as part of Data Loader configuration with `--data-loader kind=pytorch,num_workers=3` |
+| \--data-preprocessors-kwargs JSON string of arguments to pass to all preprocessors. | `--data-preprocessor ‘{“kind”:”encode_media”,”audio_kwargs”:{“format”:”mp3”}}` |
+| \--data-preprocessors List of preprocessors to apply to the dataset. E.g., 'encode\_media,my\_custom\_preprocessor' | `--data-preprocessor kind=encode_media` … can be repeated to configure multiple preprocessors. |
+| \--data-sampler Data sampler type. | Shuffle function is under `--data-loader kind=pytorch,shuffle=true` |
+| \--data-samples Number of samples from dataset. \-1 (default) uses all samples and dynamically generates more. | Specify as part of Data Loader configuration, as `--data-loader kind=pytorch,samples=10` |
+| \--data HuggingFace dataset ID, path to dataset, path to data file (csv/json/jsonl/txt), or synthetic data config (json/key=value). | `--data kind=huggingface,source=<id>` `--data kind=csv_file,path=<file.csv>` `--data kind=synthetic_text,prompt_tokens=128,output_tokens=64` |
+| \--dataloader-kwargs JSON string of arguments to pass to the dataloader constructor. | Passed directly to Data Loader, as `--data_loader kind=pytorch,shuffle=true,samples=100` |
+| \--detect-saturation Enable over-saturation detection with default settings. | Enable oversaturation constraint, for example `--constraint kind=over_saturation` |
+| \--disable-console-interactive Disable interactive console progress updates. | Unchanged: `--disable-console-interactive` or `--disable-progress` |
+| \--disable-console Disable all outputs to the console (updates, interactive progress, results). | Unchanged: `--disable-console` or `--disable-console-outputs` |
+| \--max-error-rate Maximum error rate before stopping the benchmark. | Enable maximum error rate constraint, for example `--constraint kind=max_error_rate,rate=10` |
+| \--max-errors Maximum errors before stopping the benchmark. | Enable maximum error count constraint, for example `--constraint kind=max_errors,count=10` |
+| \--max-global-error-rate Maximum global error rate across all benchmarks. | Enable maximum global error rate constraint, for example `--constraint kind=max_global_error_rate,rate=10,minimum=100` |
+| \--max-requests Maximum requests per benchmark. If None, runs until max\_seconds or data exhaustion. | Enable maximum requests constraint, for example `--constraint kind=max_requests,count=1000` |
+| \--max-seconds Maximum seconds per benchmark. If None, runs until max\_requests or data exhaustion. | Enable maximum duration constraint, for example `--constraint kind=max_duration,seconds=60` |
+| \--model Model ID to benchmark. If not provided, uses first available model. | Specify a model name as part of the backend configuration, for example `--backend kind=openai_http,model=gpt4` |
+| \--output-dir or –output-path: The directory path to save file output types in | Specify paths as part of the individual output configurations, for example `--output kind=json,path=/tmp/reports/benchmark.json` |
+| \--outputs The filename.ext for each of the outputs to create or the alises (json, csv, html) for the output files to create with their default file names (benchmark.\[EXT\]) | Specify multiple output formats by repeating the `--output` option, for example `--output kind=json,path=benchmark.json –output kind=csv,path=benchmark.csv` |
+| \--over-saturation Enable over-saturation detection. Pass a JSON dict with configuration (e.g., '{"enabled": true, "min\_seconds": 30}'). Defaults to None (disabled). | Enable oversaturation constraint, for example `--constraint kind=over_saturation,mode=enforce,min_seconds=30` |
+| \--processor-args JSON string of arguments to pass to the processor constructor. | Specify options directly to the tokenizer, for example `--tokenizer ‘{“kind”:”huggingface_auto”,”load_kwargs”:{“fast”:true}}’` |
+| \--processor Processor or tokenizer for token count calculations. If not provided, loads from model. | Defaults to the default tokenizer for the first model supported by the backend target. To override, `--tokenizer kind=huggingface_auto,model=gpt4` |
+| \--profile Benchmark profile type. Options: sweep, async, poisson, synchronous, throughput, concurrent, constant. | Specify the benchmark profile to use, for example, `--profile kind=sweep,sweep_size=10,warmup=1,cooldown=1` |
+| \--rampup The time, in seconds, to ramp up the request rate over. Applicable for Throughput, Concurrent, and Constant strategies | Specify as part of profile, for example `--profile kind=constant,rate=10,rampup_duration=2` |
+| \--random-seed Random seed for reproducibility. | Specify the random seed configuration like `--seed kind=static,value=42` |
+| \--rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second. | “Rate” was overloaded to specify the primary configuration for each profile type. Specify with `--profile` or `--override profile.<name>`: async/constant/poisson → `rate`, concurrent → `streams`, sweep → `sweep_size`, throughput → `max_concurrency`. |
+| \--request-format Format to use for requests. Options depend on backend. For vLLM backend: plain (no chat template, text appending only), default-template (use tokenizer default), or a file path / single-line template per vLLM docs. Default: default-templateFor openai backend: http endpoint path (/v1/chat/completions, /v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or alias (e.g. chat\_completions); default /v1/chat/completions. | Specify as part of backend configuration, like `--backend kind=openai_http,request_format=/v1/responses` |
+| \--sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20\. | TBD |
+| \--scenario Builtin scenario name or path to config file. CLI options override scenario settings. | The preferred name is now `--config`, although both `--scenario` and `-c` are aliases, for example `--config chat` or `--config my-scenario.yaml`. |
+| \--target Target backend URL (e.g., [http://localhost:8000](http://localhost:8000)). | Specify as part of backend configuration, for example `--backend kind=openai_http,target=http://localhost:8000` |
+| \--warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric \>=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema. | Specify with profile, e.g., `--profile kind=synchronous,warmup=2` for a two second warmup or `--profile ‘{“kind”:”concurrent”,”warmup”:{“mode”:”duration”,”value”:2}}` |
+
+## `Guidellm benchmark from-file`
+
+Load a saved benchmark report and optionally re-export data
+
+| Option | v0.7.0 equivalent |
+| :---- | :---- |
+| PATH | Unchanged |
+| Path to the saved benchmark report file (default: ./benchmarks.json). | Unchanged |
+| \--output-path | Unchanged |
+| Directory or file path to save re-exported benchmark results. If a directory, all output formats will be saved there. If a file, the matching format will be saved to that file. | Unchanged |
+| \--output-formats | Unchanged |
+| Output formats for benchmark results (e.g., console, json, html, csv). | Unchanged |
+
+## `guidellm config`
+
+Show configuration settings
+
+Changed from `guidellm config` to `guidellm env` to clarify that it displays environment variables affecting GuideLLM operation.
+
+`guidellm config` will be used later for a different purpose, to generate YAML config files from `run` options.
+
+## `guidellm mock-server`
+
+Start a mock OpenAI/vLLM-compatible server for testing. **\[NO CHANGE\]**
+
+| v0.6.0 option | v0.7.0 equivalent |
+| :---- | :---- |
+| \--host TEXT Host address to bind the server to. | Unchanged |
+| \--port INTEGER Port number to bind the server to. | Unchanged |
+| \--workers INTEGER Number of worker processes. | Unchanged |
+| \--model TEXT Name of the model to mock. | Unchanged |
+| \--processor TEXT Processor or tokenizer to use for requests. | Unchanged |
+| \--request-latency FLOAT Request latency in seconds for non-streaming requests. | Unchanged |
+| \--request-latency-std FLOAT Request latency standard deviation in seconds (normal distribution). | Unchanged |
+| \--ttft-ms FLOAT Time to first token in milliseconds for streaming requests. | Unchanged |
+| \--ttft-ms-std FLOAT Time to first token standard deviation in milliseconds. | Unchanged |
+| \--itl-ms FLOAT Inter-token latency in milliseconds for streaming requests. | Unchanged |
+| \--itl-ms-std FLOAT Inter-token latency standard deviation in milliseconds. | Unchanged |
+| \--output-tokens INTEGER Number of output tokens for streaming requests. | Unchanged |
+| \--output-tokens-std FLOAT Output tokens standard deviation (normal distribution). | Unchanged |
+
+## `guidellm preprocess dataset`
+
+Tools for preprocessing datasets for use in benchmarks.
+
+| v0.6.0 option | v0.7.0 equivalent |
+| :---- | :---- |
+| data (positional parameter) | Use dataset descriptor, for example `kind=huggingface,source=<id>` |
+| output_path (positional parameter) | Results file path, for example `file.json` |
+| \--processor TEXT Processor or tokenizer name for calculating token counts. | Unchanged |
+| \--config TEXT PreprocessDatasetConfig as JSON string,  key=value pairs, or file path (.json, .yaml, .yml, .config). Example: `prompt_tokens=100,output_tokens=50,prefix_tokens_max=10` or  `{"prompt_tokens": 100, "output_tokens": 50, "prefix_tokens_max": 10}` \[Mandatory\]  | Unchanged |
+| \--processor-args TEXT JSON string of arguments to pass to the processor constructor. | Unchanged |
+| \--data-args TEXT JSON string of arguments to pass to dataset creation | Unchanged |
+| \--data-column-mapper JSON string of column mappings to apply to the dataset | Specify a data column mapper object, for example `--data-column-mapper ‘{“kind”:”generative_column_mapper”,”column_mappings”:{“text_column”:”instruction”}}` |
+| \--short-prompt-strategy \[ignore|concatenate|pad|error\] Strategy for handling prompts shorter than target length. \[default: ignore\] | Unchanged |
+| \--pad-char TEXT Character to pad short prompts with when using “pad” strategy (used with ‘concatenate’ strategy). | Unchanged |
+| \--concat-delimiter TEXT Delimiter for concatenating short prompts (used with ‘concatenate’ strategy). | Unchanged |
+| \--include-prefix-in-token-count Include prefix tokens in prompt token count calculation. | Unchanged |
+| \--push-to-hub Push the processed dataset to Hugging Face Hub. | Unchanged |
+| \--hub-dataset-id TEXT Hugging Face Hub dataset ID for upload (required if `--push-to-hub` is set). | Unchanged |
+| \--random-seed INTEGER Random seed for reproducible token sampling. \[default: 42\] | Unchanged |

From a7b29f600a069255b656c859002e9ea52cce36dd Mon Sep 17 00:00:00 2001
From: David Butenhof <dbutenho@redhat.com>
Date: Thu, 25 Jun 2026 15:41:43 -0400
Subject: [PATCH 2/8] Update

Signed-off-by: David Butenhof <dbutenho@redhat.com>
---
 docs/guides/v0.7.0_migration_guide.md | 163 +++++++++++++-------------
 1 file changed, 84 insertions(+), 79 deletions(-)

diff --git a/docs/guides/v0.7.0_migration_guide.md b/docs/guides/v0.7.0_migration_guide.md
index 343a8468b..50d6abae8 100644
--- a/docs/guides/v0.7.0_migration_guide.md
+++ b/docs/guides/v0.7.0_migration_guide.md
@@ -6,57 +6,62 @@ Run a benchmark against a generative model.
 
 This command is now `guidellm run`
 
-| v0.6.0 option | v0.7.0 equivalent |
-| :---- | :---- |
-| \--backend-kwargs JSON string of arguments to pass to the backend. E.g., '{"api\_key": "apikey-\*", "verify": false}' | Options passed to `--backend`, like `--backend “kind=openai_http,api_key=sk…”` |
-| \--backend Backend type. Options: vllm\_python, openai\_http. | Merged with `--backend-kwargs` `--backend ‘{“kind”: “openai_http”, “extras”: {“body”: {“temperature”: 0.6}}}’` |
-| \--cooldown Cooldown specification: int, float, or dict as string (json or key=value). Controls time or requests after measurement ends. Numeric in (0, 1): percent of duration or request count. Numeric \>=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema. | Specify with profile, e.g., `--profile kind=synchronous,cooldown=2` for a two second cooldown or `--profile ‘{“kind”:”concurrent”,”cooldown”:{“mode”:”duration”,”value”:2}}` |
-| \--data-args JSON string of arguments to pass to dataset creation. | Specified with “load\_kwargs” as part of data, e.g., `--data ‘{“kind”:”huggingface”,”load_kwargs”:{“split”:”train”}}` |
-| \--data-column-mapper JSON string of column mappings to apply to the dataset. E.g., '{"text\_column": "article", "output\_tokens\_count\_column" :"output\_tokens"}' | Data column mappers have a “kind”: `--data-column-mapper ‘{“kind”:”generative_column_mapper”,”column_mappings”:{“text_column”:”instruction”}}` |
-| \--data-finalizer JSON string of finalizer to convert dataset rows to requests. E.g., 'generative' or '{"type": "generative"}' | Use `--data-finalizer kind=generative` |
-| \--data-num-workers Number of worker processes for data loading. | Specified as part of Data Loader configuration with `--data-loader kind=pytorch,num_workers=3` |
-| \--data-preprocessors-kwargs JSON string of arguments to pass to all preprocessors. | `--data-preprocessor ‘{“kind”:”encode_media”,”audio_kwargs”:{“format”:”mp3”}}` |
-| \--data-preprocessors List of preprocessors to apply to the dataset. E.g., 'encode\_media,my\_custom\_preprocessor' | `--data-preprocessor kind=encode_media` … can be repeated to configure multiple preprocessors. |
-| \--data-sampler Data sampler type. | Shuffle function is under `--data-loader kind=pytorch,shuffle=true` |
-| \--data-samples Number of samples from dataset. \-1 (default) uses all samples and dynamically generates more. | Specify as part of Data Loader configuration, as `--data-loader kind=pytorch,samples=10` |
-| \--data HuggingFace dataset ID, path to dataset, path to data file (csv/json/jsonl/txt), or synthetic data config (json/key=value). | `--data kind=huggingface,source=<id>` `--data kind=csv_file,path=<file.csv>` `--data kind=synthetic_text,prompt_tokens=128,output_tokens=64` |
-| \--dataloader-kwargs JSON string of arguments to pass to the dataloader constructor. | Passed directly to Data Loader, as `--data_loader kind=pytorch,shuffle=true,samples=100` |
-| \--detect-saturation Enable over-saturation detection with default settings. | Enable oversaturation constraint, for example `--constraint kind=over_saturation` |
-| \--disable-console-interactive Disable interactive console progress updates. | Unchanged: `--disable-console-interactive` or `--disable-progress` |
-| \--disable-console Disable all outputs to the console (updates, interactive progress, results). | Unchanged: `--disable-console` or `--disable-console-outputs` |
-| \--max-error-rate Maximum error rate before stopping the benchmark. | Enable maximum error rate constraint, for example `--constraint kind=max_error_rate,rate=10` |
-| \--max-errors Maximum errors before stopping the benchmark. | Enable maximum error count constraint, for example `--constraint kind=max_errors,count=10` |
-| \--max-global-error-rate Maximum global error rate across all benchmarks. | Enable maximum global error rate constraint, for example `--constraint kind=max_global_error_rate,rate=10,minimum=100` |
-| \--max-requests Maximum requests per benchmark. If None, runs until max\_seconds or data exhaustion. | Enable maximum requests constraint, for example `--constraint kind=max_requests,count=1000` |
-| \--max-seconds Maximum seconds per benchmark. If None, runs until max\_requests or data exhaustion. | Enable maximum duration constraint, for example `--constraint kind=max_duration,seconds=60` |
-| \--model Model ID to benchmark. If not provided, uses first available model. | Specify a model name as part of the backend configuration, for example `--backend kind=openai_http,model=gpt4` |
-| \--output-dir or –output-path: The directory path to save file output types in | Specify paths as part of the individual output configurations, for example `--output kind=json,path=/tmp/reports/benchmark.json` |
-| \--outputs The filename.ext for each of the outputs to create or the alises (json, csv, html) for the output files to create with their default file names (benchmark.\[EXT\]) | Specify multiple output formats by repeating the `--output` option, for example `--output kind=json,path=benchmark.json –output kind=csv,path=benchmark.csv` |
-| \--over-saturation Enable over-saturation detection. Pass a JSON dict with configuration (e.g., '{"enabled": true, "min\_seconds": 30}'). Defaults to None (disabled). | Enable oversaturation constraint, for example `--constraint kind=over_saturation,mode=enforce,min_seconds=30` |
-| \--processor-args JSON string of arguments to pass to the processor constructor. | Specify options directly to the tokenizer, for example `--tokenizer ‘{“kind”:”huggingface_auto”,”load_kwargs”:{“fast”:true}}’` |
-| \--processor Processor or tokenizer for token count calculations. If not provided, loads from model. | Defaults to the default tokenizer for the first model supported by the backend target. To override, `--tokenizer kind=huggingface_auto,model=gpt4` |
-| \--profile Benchmark profile type. Options: sweep, async, poisson, synchronous, throughput, concurrent, constant. | Specify the benchmark profile to use, for example, `--profile kind=sweep,sweep_size=10,warmup=1,cooldown=1` |
-| \--rampup The time, in seconds, to ramp up the request rate over. Applicable for Throughput, Concurrent, and Constant strategies | Specify as part of profile, for example `--profile kind=constant,rate=10,rampup_duration=2` |
-| \--random-seed Random seed for reproducibility. | Specify the random seed configuration like `--seed kind=static,value=42` |
-| \--rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second. | “Rate” was overloaded to specify the primary configuration for each profile type. Specify with `--profile` or `--override profile.<name>`: async/constant/poisson → `rate`, concurrent → `streams`, sweep → `sweep_size`, throughput → `max_concurrency`. |
-| \--request-format Format to use for requests. Options depend on backend. For vLLM backend: plain (no chat template, text appending only), default-template (use tokenizer default), or a file path / single-line template per vLLM docs. Default: default-templateFor openai backend: http endpoint path (/v1/chat/completions, /v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or alias (e.g. chat\_completions); default /v1/chat/completions. | Specify as part of backend configuration, like `--backend kind=openai_http,request_format=/v1/responses` |
-| \--sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20\. | TBD |
-| \--scenario Builtin scenario name or path to config file. CLI options override scenario settings. | The preferred name is now `--config`, although both `--scenario` and `-c` are aliases, for example `--config chat` or `--config my-scenario.yaml`. |
-| \--target Target backend URL (e.g., [http://localhost:8000](http://localhost:8000)). | Specify as part of backend configuration, for example `--backend kind=openai_http,target=http://localhost:8000` |
-| \--warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric \>=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema. | Specify with profile, e.g., `--profile kind=synchronous,warmup=2` for a two second warmup or `--profile ‘{“kind”:”concurrent”,”warmup”:{“mode”:”duration”,”value”:2}}` |
+| v0.6.0 option                                                                                                                                                                                                                                                                                                                                                                                                                                                     | v0.7.0 equivalent                                                                                                                                                                                                                                         |
+| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| --backend-kwargs JSON string of arguments to pass to the backend. E.g., '{"api_key": "apikey-\*", "verify": false}'                                                                                                                                                                                                                                                                                                                                               | Options passed to `--backend`, like `--backend “kind=openai_http,api_key=sk…”`                                                                                                                                                                            |
+| --backend Backend type. Options: vllm_python, openai_http.                                                                                                                                                                                                                                                                                                                                                                                                        | Merged with `--backend-kwargs` `--backend ‘{“kind”: “openai_http”, “extras”: {“body”: {“temperature”: 0.6}}}’`                                                                                                                                            |
+| --cooldown Cooldown specification: int, float, or dict as string (json or key=value). Controls time or requests after measurement ends. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                             | Specify with profile, e.g., `--profile kind=synchronous,cooldown=2` for a two second cooldown or `--profile ‘{“kind”:”concurrent”,”cooldown”:{“mode”:”duration”,”value”:2}}`                                                                              |
+| --data-args JSON string of arguments to pass to dataset creation.                                                                                                                                                                                                                                                                                                                                                                                                 | Specified with “load_kwargs” as part of data, e.g., `--data ‘{“kind”:”huggingface”,”load_kwargs”:{“split”:”train”}}`                                                                                                                                      |
+| --data-column-mapper JSON string of column mappings to apply to the dataset. E.g., '{"text_column": "article", "output_tokens_count_column" :"output_tokens"}'                                                                                                                                                                                                                                                                                                    | Data column mappers have a “kind”: `--data-column-mapper ‘{“kind”:”generative_column_mapper”,”column_mappings”:{“text_column”:”instruction”}}`                                                                                                            |
+| --data-finalizer JSON string of finalizer to convert dataset rows to requests. E.g., 'generative' or '{"type": "generative"}'                                                                                                                                                                                                                                                                                                                                     | Use `--data-finalizer kind=generative`                                                                                                                                                                                                                    |
+| --data-num-workers Number of worker processes for data loading.                                                                                                                                                                                                                                                                                                                                                                                                   | Specified as part of Data Loader configuration with `--data-loader kind=pytorch,num_workers=3`                                                                                                                                                            |
+| --data-preprocessors-kwargs JSON string of arguments to pass to all preprocessors.                                                                                                                                                                                                                                                                                                                                                                                | `--data-preprocessor ‘{“kind”:”encode_media”,”audio_kwargs”:{“format”:”mp3”}}`                                                                                                                                                                            |
+| --data-preprocessors List of preprocessors to apply to the dataset. E.g., 'encode_media,my_custom_preprocessor'                                                                                                                                                                                                                                                                                                                                                   | `--data-preprocessor kind=encode_media` … can be repeated to configure multiple preprocessors.                                                                                                                                                            |
+| --data-sampler Data sampler type.                                                                                                                                                                                                                                                                                                                                                                                                                                 | Shuffle function is under `--data-loader kind=pytorch,shuffle=true`                                                                                                                                                                                       |
+| --data-samples Number of samples from dataset. -1 (default) uses all samples and dynamically generates more.                                                                                                                                                                                                                                                                                                                                                      | Specify as part of Data Loader configuration, as `--data-loader kind=pytorch,samples=10`                                                                                                                                                                  |
+| --data HuggingFace dataset ID, path to dataset, path to data file (csv/json/jsonl/txt), or synthetic data config (json/key=value).                                                                                                                                                                                                                                                                                                                                | `--data kind=huggingface,source=<id>` `--data kind=csv_file,path=<file.csv>` `--data kind=synthetic_text,prompt_tokens=128,output_tokens=64`                                                                                                              |
+| --dataloader-kwargs JSON string of arguments to pass to the dataloader constructor.                                                                                                                                                                                                                                                                                                                                                                               | Passed directly to Data Loader, as `--data_loader kind=pytorch,shuffle=true,samples=100`                                                                                                                                                                  |
+| --detect-saturation Enable over-saturation detection with default settings.                                                                                                                                                                                                                                                                                                                                                                                       | Enable oversaturation constraint, for example `--constraint kind=over_saturation`                                                                                                                                                                         |
+| --disable-console-interactive Disable interactive console progress updates.                                                                                                                                                                                                                                                                                                                                                                                       | Unchanged: `--disable-console-interactive` or `--disable-progress`                                                                                                                                                                                        |
+| --disable-console Disable all outputs to the console (updates, interactive progress, results).                                                                                                                                                                                                                                                                                                                                                                    | Unchanged: `--disable-console` or `--disable-console-outputs`                                                                                                                                                                                             |
+| --max-error-rate Maximum error rate before stopping the benchmark.                                                                                                                                                                                                                                                                                                                                                                                                | Enable maximum error rate constraint, for example `--constraint kind=max_error_rate,rate=10`                                                                                                                                                              |
+| --max-errors Maximum errors before stopping the benchmark.                                                                                                                                                                                                                                                                                                                                                                                                        | Enable maximum error count constraint, for example `--constraint kind=max_errors,count=10`                                                                                                                                                                |
+| --max-global-error-rate Maximum global error rate across all benchmarks.                                                                                                                                                                                                                                                                                                                                                                                          | Enable maximum global error rate constraint, for example `--constraint kind=max_global_error_rate,rate=10,minimum=100`                                                                                                                                    |
+| --max-requests Maximum requests per benchmark. If None, runs until max_seconds or data exhaustion.                                                                                                                                                                                                                                                                                                                                                                | Enable maximum requests constraint, for example `--constraint kind=max_requests,count=1000`                                                                                                                                                               |
+| --max-seconds Maximum seconds per benchmark. If None, runs until max_requests or data exhaustion.                                                                                                                                                                                                                                                                                                                                                                 | Enable maximum duration constraint, for example `--constraint kind=max_duration,seconds=60`                                                                                                                                                               |
+| --model Model ID to benchmark. If not provided, uses first available model.                                                                                                                                                                                                                                                                                                                                                                                       | Specify a model name as part of the backend configuration, for example `--backend kind=openai_http,model=gpt4`                                                                                                                                            |
+| --output-dir or –output-path: The directory path to save file output types in                                                                                                                                                                                                                                                                                                                                                                                     | Specify paths as part of the individual output configurations, for example `--output kind=json,path=/tmp/reports/benchmark.json`                                                                                                                          |
+| --outputs The filename.ext for each of the outputs to create or the alises (json, csv, html) for the output files to create with their default file names (benchmark.[EXT])                                                                                                                                                                                                                                                                                       | Specify multiple output formats by repeating the `--output` option, for example `--output kind=json,path=benchmark.json –output kind=csv,path=benchmark.csv`                                                                                              |
+| --over-saturation Enable over-saturation detection. Pass a JSON dict with configuration (e.g., '{"enabled": true, "min_seconds": 30}'). Defaults to None (disabled).                                                                                                                                                                                                                                                                                              | Enable oversaturation constraint, for example `--constraint kind=over_saturation,mode=enforce,min_seconds=30`                                                                                                                                             |
+| --processor-args JSON string of arguments to pass to the processor constructor.                                                                                                                                                                                                                                                                                                                                                                                   | Specify options directly to the tokenizer, for example `--tokenizer ‘{“kind”:”huggingface_auto”,”load_kwargs”:{“fast”:true}}’`                                                                                                                            |
+| --processor Processor or tokenizer for token count calculations. If not provided, loads from model.                                                                                                                                                                                                                                                                                                                                                               | Defaults to the default tokenizer for the first model supported by the backend target. To override, `--tokenizer kind=huggingface_auto,model=gpt4`                                                                                                        |
+| --profile Benchmark profile type. Options: sweep, async, poisson, synchronous, throughput, concurrent, constant.                                                                                                                                                                                                                                                                                                                                                  | Specify the benchmark profile to use, for example, `--profile kind=sweep,sweep_size=10,warmup=1,cooldown=1`                                                                                                                                               |
+| --rampup The time, in seconds, to ramp up the request rate over. Applicable for Throughput, Concurrent, and Constant strategies                                                                                                                                                                                                                                                                                                                                   | Specify as part of profile, for example `--profile kind=constant,rate=10,rampup_duration=2`                                                                                                                                                               |
+| --random-seed Random seed for reproducibility.                                                                                                                                                                                                                                                                                                                                                                                                                    | Specify the random seed configuration like `--seed kind=static,value=42`                                                                                                                                                                                  |
+| --rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second.                                                                                                                                                                                                                                                                                             | “Rate” was overloaded to specify the primary configuration for each profile type. Specify with `--profile` or `--override profile.<name>`: async/constant/poisson → `rate`, concurrent → `streams`, sweep → `sweep_size`, throughput → `max_concurrency`. |
+| --request-format Format to use for requests. Options depend on backend. For vLLM backend: plain (no chat template, text appending only), default-template (use tokenizer default), or a file path / single-line template per vLLM docs. Default: default-template. For openai backend: http endpoint path (/v1/chat/completions, /v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or alias (e.g. chat_completions); default /v1/chat/completions. | Specify as part of backend configuration, like `--backend kind=openai_http,request_format=/v1/responses`                                                                                                                                                  |
+| --sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20.                                                                                                                                                                                                                                                                                                                                                        | Specify as part of the profile configuration, for example `--profile kind=concurrent,sample_size=20`                                                                                                                                                      |
+| --scenario Builtin scenario name or path to config file. CLI options override scenario settings.                                                                                                                                                                                                                                                                                                                                                                  | The preferred name is now `--config`, although both `--scenario` and `-c` are aliases, for example `--config chat` or `--config my-scenario.yaml`.                                                                                                        |
+| --target Target backend URL (e.g., [http://localhost:8000](http://localhost:8000)).                                                                                                                                                                                                                                                                                                                                                                               | Specify as part of backend configuration, for example `--backend kind=openai_http,target=http://localhost:8000`                                                                                                                                           |
+| --warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                              | Specify with profile, e.g., `--profile kind=synchronous,warmup=2` for a two-second warmup or `--profile '{"kind":"concurrent","warmup":{"mode":"duration","value":2}}'`                                                                                   |
+
+| **NEW OPTIONS**                                                          | **v0.7.0 new options**                                                                                                                                            |
+| :----------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Add metadata to output reports                                           | Specify key-value pairs of metadata labels which will be written to the output reports, for example `--label gpu=NVIDIA_Z500 --label creator=Intrepid_Adventurer` |
+| Override concurrent profile stream count and async/constant/poisson rate | Specify profile settings to override the default profile settings, for example `--override profile.rate 10,20,30` or `--override profile.streams 10,20,30`        |
 
 ## `Guidellm benchmark from-file`
 
 Load a saved benchmark report and optionally re-export data
 
-| Option | v0.7.0 equivalent |
-| :---- | :---- |
-| PATH | Unchanged |
-| Path to the saved benchmark report file (default: ./benchmarks.json). | Unchanged |
-| \--output-path | Unchanged |
-| Directory or file path to save re-exported benchmark results. If a directory, all output formats will be saved there. If a file, the matching format will be saved to that file. | Unchanged |
-| \--output-formats | Unchanged |
-| Output formats for benchmark results (e.g., console, json, html, csv). | Unchanged |
+| Option                                                                                                                                                                           | v0.7.0 equivalent |
+| :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------- |
+| PATH                                                                                                                                                                             | Unchanged         |
+| Path to the saved benchmark report file (default: ./benchmarks.json).                                                                                                            | Unchanged         |
+| --output-path                                                                                                                                                                    | Unchanged         |
+| Directory or file path to save re-exported benchmark results. If a directory, all output formats will be saved there. If a file, the matching format will be saved to that file. | Unchanged         |
+| --output-formats                                                                                                                                                                 | Unchanged         |
+| Output formats for benchmark results (e.g., console, json, html, csv).                                                                                                           | Unchanged         |
 
 ## `guidellm config`
 
@@ -68,41 +73,41 @@ Changed from `guidellm config` to `guidellm env` to clarify that it displays env
 
 ## `guidellm mock-server`
 
-Start a mock OpenAI/vLLM-compatible server for testing. **\[NO CHANGE\]**
-
-| v0.6.0 option | v0.7.0 equivalent |
-| :---- | :---- |
-| \--host TEXT Host address to bind the server to. | Unchanged |
-| \--port INTEGER Port number to bind the server to. | Unchanged |
-| \--workers INTEGER Number of worker processes. | Unchanged |
-| \--model TEXT Name of the model to mock. | Unchanged |
-| \--processor TEXT Processor or tokenizer to use for requests. | Unchanged |
-| \--request-latency FLOAT Request latency in seconds for non-streaming requests. | Unchanged |
-| \--request-latency-std FLOAT Request latency standard deviation in seconds (normal distribution). | Unchanged |
-| \--ttft-ms FLOAT Time to first token in milliseconds for streaming requests. | Unchanged |
-| \--ttft-ms-std FLOAT Time to first token standard deviation in milliseconds. | Unchanged |
-| \--itl-ms FLOAT Inter-token latency in milliseconds for streaming requests. | Unchanged |
-| \--itl-ms-std FLOAT Inter-token latency standard deviation in milliseconds. | Unchanged |
-| \--output-tokens INTEGER Number of output tokens for streaming requests. | Unchanged |
-| \--output-tokens-std FLOAT Output tokens standard deviation (normal distribution). | Unchanged |
+Start a mock OpenAI/vLLM-compatible server for testing. **[NO CHANGE]**
+
+| v0.6.0 option                                                                                    | v0.7.0 equivalent |
+| :----------------------------------------------------------------------------------------------- | :---------------- |
+| --host TEXT Host address to bind the server to.                                                  | Unchanged         |
+| --port INTEGER Port number to bind the server to.                                                | Unchanged         |
+| --workers INTEGER Number of worker processes.                                                    | Unchanged         |
+| --model TEXT Name of the model to mock.                                                          | Unchanged         |
+| --processor TEXT Processor or tokenizer to use for requests.                                     | Unchanged         |
+| --request-latency FLOAT Request latency in seconds for non-streaming requests.                   | Unchanged         |
+| --request-latency-std FLOAT Request latency standard deviation in seconds (normal distribution). | Unchanged         |
+| --ttft-ms FLOAT Time to first token in milliseconds for streaming requests.                      | Unchanged         |
+| --ttft-ms-std FLOAT Time to first token standard deviation in milliseconds.                      | Unchanged         |
+| --itl-ms FLOAT Inter-token latency in milliseconds for streaming requests.                       | Unchanged         |
+| --itl-ms-std FLOAT Inter-token latency standard deviation in milliseconds.                       | Unchanged         |
+| --output-tokens INTEGER Number of output tokens for streaming requests.                          | Unchanged         |
+| --output-tokens-std FLOAT Output tokens standard deviation (normal distribution).                | Unchanged         |
 
 ## `guidellm preprocess dataset`
 
 Tools for preprocessing datasets for use in benchmarks.
 
-| v0.6.0 option | v0.7.0 equivalent |
-| :---- | :---- |
-| data (positional parameter) | Use dataset descriptor, for example `kind=huggingface,source=<id>` |
-| output_path (positional parameter) | Results file path, for example `file.json` |
-| \--processor TEXT Processor or tokenizer name for calculating token counts. | Unchanged |
-| \--config TEXT PreprocessDatasetConfig as JSON string,  key=value pairs, or file path (.json, .yaml, .yml, .config). Example: `prompt_tokens=100,output_tokens=50,prefix_tokens_max=10` or  `{"prompt_tokens": 100, "output_tokens": 50, "prefix_tokens_max": 10}` \[Mandatory\]  | Unchanged |
-| \--processor-args TEXT JSON string of arguments to pass to the processor constructor. | Unchanged |
-| \--data-args TEXT JSON string of arguments to pass to dataset creation | Unchanged |
-| \--data-column-mapper JSON string of column mappings to apply to the dataset | Specify a data column mapper object, for example `--data-column-mapper ‘{“kind”:”generative_column_mapper”,”column_mappings”:{“text_column”:”instruction”}}` |
-| \--short-prompt-strategy \[ignore|concatenate|pad|error\] Strategy for handling prompts shorter than target length. \[default: ignore\] | Unchanged |
-| \--pad-char TEXT Character to pad short prompts with when using “pad” strategy (used with ‘concatenate’ strategy). | Unchanged |
-| \--concat-delimiter TEXT Delimiter for concatenating short prompts (used with ‘concatenate’ strategy). | Unchanged |
-| \--include-prefix-in-token-count Include prefix tokens in prompt token count calculation. | Unchanged |
-| \--push-to-hub Push the processed dataset to Hugging Face Hub. | Unchanged |
-| \--hub-dataset-id TEXT Hugging Face Hub dataset ID for upload (required if `--push-to-hub` is set). | Unchanged |
-| \--random-seed INTEGER Random seed for reproducible token sampling. \[default: 42\] | Unchanged |
+| v0.6.0 option                                                                                                                                                                                                                                                               | v0.7.0 equivalent                                                                                                                                            |
+| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| data (positional parameter)                                                                                                                                                                                                                                                 | Use dataset descriptor, for example `kind=huggingface,source=<id>`                                                                                           |
+| output_path (positional parameter)                                                                                                                                                                                                                                          | Results file path, for example `file.json`                                                                                                                   |
+| --processor TEXT Processor or tokenizer name for calculating token counts.                                                                                                                                                                                                  | Unchanged                                                                                                                                                    |
+| --config TEXT PreprocessDatasetConfig as JSON string, key=value pairs, or file path (.json, .yaml, .yml, .config). Example: `prompt_tokens=100,output_tokens=50,prefix_tokens_max=10` or `{"prompt_tokens": 100, "output_tokens": 50, "prefix_tokens_max": 10}` [Mandatory] | Unchanged                                                                                                                                                    |
+| --processor-args TEXT JSON string of arguments to pass to the processor constructor.                                                                                                                                                                                        | Unchanged                                                                                                                                                    |
+| --data-args TEXT JSON string of arguments to pass to dataset creation                                                                                                                                                                                                       | Unchanged                                                                                                                                                    |
+| --data-column-mapper JSON string of column mappings to apply to the dataset                                                                                                                                                                                                 | Specify a data column mapper object, for example `--data-column-mapper '{"kind":"generative_column_mapper","column_mappings":{"text_column":"instruction"}}'` |
+| --short-prompt-strategy [ignore\|concatenate\|pad\|error] Strategy for handling prompts shorter than target length. [default: ignore]                                                                                                                                       | Unchanged                                                                                                                                                    |
+| --pad-char TEXT Character to pad short prompts with when using “pad” strategy (used with ‘concatenate’ strategy).                                                                                                                                                           | Unchanged                                                                                                                                                    |
+| --concat-delimiter TEXT Delimiter for concatenating short prompts (used with ‘concatenate’ strategy).                                                                                                                                                                       | Unchanged                                                                                                                                                    |
+| --include-prefix-in-token-count Include prefix tokens in prompt token count calculation.                                                                                                                                                                                    | Unchanged                                                                                                                                                    |
+| --push-to-hub Push the processed dataset to Hugging Face Hub.                                                                                                                                                                                                               | Unchanged                                                                                                                                                    |
+| --hub-dataset-id TEXT Hugging Face Hub dataset ID for upload (required if `--push-to-hub` is set).                                                                                                                                                                          | Unchanged                                                                                                                                                    |
+| --random-seed INTEGER Random seed for reproducible token sampling. [default: 42]                                                                                                                                                                                            | Unchanged                                                                                                                                                    |

From 20136f20c48a6b8a652c793ad8f277436b4b1020 Mon Sep 17 00:00:00 2001
From: David Butenhof <dbutenho@redhat.com>
Date: Thu, 25 Jun 2026 16:21:01 -0400
Subject: [PATCH 3/8] Something got mangled ...

Signed-off-by: David Butenhof <dbutenho@redhat.com>
---
 docs/guides/v0.7.0_migration_guide.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/docs/guides/v0.7.0_migration_guide.md b/docs/guides/v0.7.0_migration_guide.md
index 50d6abae8..02acf59c2 100644
--- a/docs/guides/v0.7.0_migration_guide.md
+++ b/docs/guides/v0.7.0_migration_guide.md
@@ -54,14 +54,11 @@ This command is now `guidellm run`
 
 Load a saved benchmark report and optionally re-export data
 
-| Option                                                                                                                                                                           | v0.7.0 equivalent |
-| :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------- |
-| PATH                                                                                                                                                                             | Unchanged         |
-| Path to the saved benchmark report file (default: ./benchmarks.json).                                                                                                            | Unchanged         |
-| --output-path                                                                                                                                                                    | Unchanged         |
-| Directory or file path to save re-exported benchmark results. If a directory, all output formats will be saved there. If a file, the matching format will be saved to that file. | Unchanged         |
-| --output-formats                                                                                                                                                                 | Unchanged         |
-| Output formats for benchmark results (e.g., console, json, html, csv).                                                                                                           | Unchanged         |
+| Option                                                                                                                                                                                         | v0.7.0 equivalent |
+| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------- |
+| PATH Path to the saved benchmark report file (default: ./benchmarks.json).                                                                                                                     | Unchanged         |
+| --output-path Directory or file path to save re-exported benchmark results. If a directory, all output formats will be saved there. If a file, the matching format will be saved to that file. | Unchanged         |
+| --output-formats Output formats for benchmark results (e.g., console, json, html, csv).                                                                                                        | Unchanged         |
 
 ## `guidellm config`
 

From c39d7fbe300042bcd405aed6fd0cd1075462c257 Mon Sep 17 00:00:00 2001
From: David Butenhof <dbutenho@redhat.com>
Date: Thu, 25 Jun 2026 16:29:05 -0400
Subject: [PATCH 4/8] Another cleanup on aisle preprocess

Signed-off-by: David Butenhof <dbutenho@redhat.com>
---
 docs/guides/v0.7.0_migration_guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/v0.7.0_migration_guide.md b/docs/guides/v0.7.0_migration_guide.md
index 02acf59c2..56bab2707 100644
--- a/docs/guides/v0.7.0_migration_guide.md
+++ b/docs/guides/v0.7.0_migration_guide.md
@@ -101,7 +101,7 @@ Tools for preprocessing datasets for use in benchmarks.
 | --processor-args TEXT JSON string of arguments to pass to the processor constructor.                                                                                                                                                                                        | Unchanged                                                                                                                                                    |
 | --data-args TEXT JSON string of arguments to pass to dataset creation                                                                                                                                                                                                       | Unchanged                                                                                                                                                    |
 | --data-column-mapper JSON string of column mappings to apply to the dataset                                                                                                                                                                                                 | Specify a data column mapper object, for example `--data-column-mapper ‘{“kind”:”generative_column_mapper”,”column_mappings”:{“text_column”:”instruction”}}` |
-| --short-prompt-strategy \[ignore                                                                                                                                                                                                                                            | concatenate                                                                                                                                                  |
+| --short-prompt-strategy [ignore, concatenate, pad, error] Strategy for handling prompts shorter than target length. [default: ignore]                                                                                                                                       | Unchanged                                                                                                                                                    |
 | --pad-char TEXT Character to pad short prompts with when using “pad” strategy (used with ‘concatenate’ strategy).                                                                                                                                                           | Unchanged                                                                                                                                                    |
 | --concat-delimiter TEXT Delimiter for concatenating short prompts (used with ‘concatenate’ strategy).                                                                                                                                                                       | Unchanged                                                                                                                                                    |
 | --include-prefix-in-token-count Include prefix tokens in prompt token count calculation.                                                                                                                                                                                    | Unchanged                                                                                                                                                    |

From 431745d4844087ef7ba05e57f6e065dc56a87d52 Mon Sep 17 00:00:00 2001
From: David Butenhof <dbutenho@redhat.com>
Date: Thu, 25 Jun 2026 16:36:45 -0400
Subject: [PATCH 5/8] formatting?

Signed-off-by: David Butenhof <dbutenho@redhat.com>
---
 docs/guides/datasets.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/datasets.md b/docs/guides/datasets.md
index 9bb525783..23be265c7 100644
--- a/docs/guides/datasets.md
+++ b/docs/guides/datasets.md
@@ -333,7 +333,7 @@ guidellm preprocess dataset \
 
 | Argument      | Description                                                                                                                                   |
 | ------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
-| `DATA`        | Identify the dataset to process. Supports all dataset formats documented in the [Dataset Configurations](#datasets).                     |
+| `DATA`        | Identify the dataset to process. Supports all dataset formats documented in the [Dataset Configurations](#datasets).                          |
 | `OUTPUT_PATH` | Path to save the processed dataset, including file suffix (e.g., `processed_dataset.jsonl`, `output.csv`).                                    |
 | `--processor` | **Required.** Processor or tokenizer name/path for calculating token counts. Can be a Hugging Face model ID or local path.                    |
 | `--config`    | **Required.** Configuration specifying target token sizes. Can be a JSON string, key=value pairs, or file path (.json, .yaml, .yml, .config). |

From cf491f2948b8ebcac9d27209c4e88a3a460020f3 Mon Sep 17 00:00:00 2001
From: David Butenhof <dbutenho@redhat.com>
Date: Thu, 25 Jun 2026 20:15:22 -0400
Subject: [PATCH 6/8] Update sample size

Signed-off-by: David Butenhof <dbutenho@redhat.com>
---
 docs/guides/v0.7.0_migration_guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guides/v0.7.0_migration_guide.md b/docs/guides/v0.7.0_migration_guide.md
index 56bab2707..c47e018f6 100644
--- a/docs/guides/v0.7.0_migration_guide.md
+++ b/docs/guides/v0.7.0_migration_guide.md
@@ -40,7 +40,7 @@ This command is now `guidellm run`
 | --random-seed Random seed for reproducibility.                                                                                                                                                                                                                                                                                                                                                                                                                    | Specify the random seed configuration like `--seed kind=static,value=42`                                                                                                                                                                                  |
 | --rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second.                                                                                                                                                                                                                                                                                             | “Rate” was overloaded to specify the primary configuration for each profile type. Specify with `--profile` or `--override profile.<name>`: async/constant/poisson → `rate`, concurrent → `streams`, sweep → `sweep_size`, throughput → `max_concurrency`. |
 | --request-format Format to use for requests. Options depend on backend. For vLLM backend: plain (no chat template, text appending only), default-template (use tokenizer default), or a file path / single-line template per vLLM docs. Default: default-templateFor openai backend: http endpoint path (/v1/chat/completions, /v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or alias (e.g. chat_completions); default /v1/chat/completions. | Specify as part of backend configuration, like `--backend kind=openai_http,request_format=/v1/responses`                                                                                                                                                  |
-| --sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20.                                                                                                                                                                                                                                                                                                                                                        | Specify as part of the profile configuration, for example `--profile kind=concurrent,sample_size=20`                                                                                                                                                      |
+| --sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20.                                                                                                                                                                                                                                                                                                                                                        | Specify as part of the metrics configuration, for example `--metrics kind=generative,sample_size=20`                                                                                                                                                      |
 | --scenario Builtin scenario name or path to config file. CLI options override scenario settings.                                                                                                                                                                                                                                                                                                                                                                  | The preferred name is now `--config`, although both `--scenario` and `-c` are aliases, for example `--config chat` or `--config my-scenario.yaml`.                                                                                                        |
 | --target Target backend URL (e.g., [http://localhost:8000](http://localhost:8000)).                                                                                                                                                                                                                                                                                                                                                                               | Specify as part of backend configuration, for example `--backend kind=openai_http,target=http://localhost:8000`                                                                                                                                           |
 | --warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                              | Specify with profile, e.g., `--profile kind=synchronous,warmup=2` for a two second warmup or `--profile ‘{“kind”:”concurrent”,”warmup”:{“mode”:”duration”,”value”:2}}`                                                                                    |

From c55bd32924a08cb956bba1585330edee8e2d3b58 Mon Sep 17 00:00:00 2001
From: David Butenhof <dbutenho@redhat.com>
Date: Fri, 26 Jun 2026 07:02:29 -0400
Subject: [PATCH 7/8] stuff

Signed-off-by: David Butenhof <dbutenho@redhat.com>
---
 docs/guides/v0.7.0_migration_guide.md | 56 +++++++++++++--------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/docs/guides/v0.7.0_migration_guide.md b/docs/guides/v0.7.0_migration_guide.md
index c47e018f6..6822c2634 100644
--- a/docs/guides/v0.7.0_migration_guide.md
+++ b/docs/guides/v0.7.0_migration_guide.md
@@ -8,19 +8,19 @@ This command is now `guidellm run`
 
 | v0.6.0 option                                                                                                                                                                                                                                                                                                                                                                                                                                                     | v0.7.0 equivalent                                                                                                                                                                                                                                         |
 | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| --backend-kwargs JSON string of arguments to pass to the backend. E.g., '{"api_key": "apikey-\*", "verify": false}'                                                                                                                                                                                                                                                                                                                                               | Options passed to `--backend`, like `--backend “kind=openai_http,api_key=sk…”`                                                                                                                                                                            |
-| --backend Backend type. Options: vllm_python, openai_http.                                                                                                                                                                                                                                                                                                                                                                                                        | Merged with `--backend-kwargs` `--backend ‘{“kind”: “openai_http”, “extras”: {“body”: {“temperature”: 0.6}}}’`                                                                                                                                            |
-| --cooldown Cooldown specification: int, float, or dict as string (json or key=value). Controls time or requests after measurement ends. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                             | Specify with profile, e.g., `--profile kind=synchronous,cooldown=2` for a two second cooldown or `--profile ‘{“kind”:”concurrent”,”cooldown”:{“mode”:”duration”,”value”:2}}`                                                                              |
-| --data-args JSON string of arguments to pass to dataset creation.                                                                                                                                                                                                                                                                                                                                                                                                 | Specified with “load_kwargs” as part of data, e.g., `--data ‘{“kind”:”huggingface”,”load_kwargs”:{“split”:”train”}}`                                                                                                                                      |
-| --data-column-mapper JSON string of column mappings to apply to the dataset. E.g., '{"text_column": "article", "output_tokens_count_column" :"output_tokens"}'                                                                                                                                                                                                                                                                                                    | Data column mappers have a “kind”: `--data-column-mapper ‘{“kind”:”generative_column_mapper”,”column_mappings”:{“text_column”:”instruction”}}`                                                                                                            |
-| --data-finalizer JSON string of finalizer to convert dataset rows to requests. E.g., 'generative' or '{"type": "generative"}'                                                                                                                                                                                                                                                                                                                                     | Use `--data-finalizer kind=generative`                                                                                                                                                                                                                    |
+| --backend-kwargs JSON string of arguments to pass to the backend. E.g., '{"api_key": "apikey-\*", "verify": false}'                                                                                                                                                                                                                                                                                                                                               | Options passed to `--backend`, like `--backend "kind=openai_http,api_key=sk…"`                                                                                                                                                                            |
+| --backend Backend type. Options: vllm_python, openai_http.                                                                                                                                                                                                                                                                                                                                                                                                        | Merged with `--backend-kwargs` `--backend '{"kind": "openai_http", "extras": {"body": {"temperature": 0.6}}}'`                                                                                                                                            |
+| --cooldown Cooldown specification: int, float, or dict as string (json or key=value). Controls time or requests after measurement ends. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                             | Specify with profile, e.g., `--profile kind=synchronous,cooldown=2` for a two second cooldown or `--profile '{"kind":"concurrent","cooldown":{"mode":"duration","value":2}}`                                                                              |
+| --data-args JSON string of arguments to pass to dataset creation.                                                                                                                                                                                                                                                                                                                                                                                                 | Specified with "load_kwargs" as part of data, e.g., `--data '{"kind":"huggingface","load_kwargs":{"split":"train"}}'`                                                                                                                                     |
+| --data-column-mapper JSON string of column mappings to apply to the dataset. E.g., '{"text_column": "article", "output_tokens_count_column" :"output_tokens"}'\`                                                                                                                                                                                                                                                                                                  | Data column mappers have a "kind": `--data-column-mapper '{"kind":"generative_column_mapper","column_mappings":{"text_column":"instruction"}}`                                                                                                            |
+| --data-finalizer JSON string of finalizer to convert dataset rows to requests. E.g., 'generative' or '{"type": "generative"}'\`                                                                                                                                                                                                                                                                                                                                   | Use `--data-finalizer kind=generative`                                                                                                                                                                                                                    |
 | --data-num-workers Number of worker processes for data loading.                                                                                                                                                                                                                                                                                                                                                                                                   | Specified as part of Data Loader configuration with `--data-loader kind=pytorch,num_workers=3`                                                                                                                                                            |
-| --data-preprocessors-kwargs JSON string of arguments to pass to all preprocessors.                                                                                                                                                                                                                                                                                                                                                                                | `--data-preprocessor ‘{“kind”:”encode_media”,”audio_kwargs”:{“format”:”mp3”}}`                                                                                                                                                                            |
+| --data-preprocessors-kwargs JSON string of arguments to pass to all preprocessors.                                                                                                                                                                                                                                                                                                                                                                                | `--data-preprocessor '{"kind":"encode_media","audio_kwargs":{"format":"mp3"}}'`                                                                                                                                                                           |
 | --data-preprocessors List of preprocessors to apply to the dataset. E.g., 'encode_media,my_custom_preprocessor'                                                                                                                                                                                                                                                                                                                                                   | `--data-preprocessor kind=encode_media` … can be repeated to configure multiple preprocessors.                                                                                                                                                            |
 | --data-sampler Data sampler type.                                                                                                                                                                                                                                                                                                                                                                                                                                 | Shuffle function is under `--data-loader kind=pytorch,shuffle=true`                                                                                                                                                                                       |
 | --data-samples Number of samples from dataset. -1 (default) uses all samples and dynamically generates more.                                                                                                                                                                                                                                                                                                                                                      | Specify as part of Data Loader configuration, as `--data-loader kind=pytorch,samples=10`                                                                                                                                                                  |
 | --data HuggingFace dataset ID, path to dataset, path to data file (csv/json/jsonl/txt), or synthetic data config (json/key=value).                                                                                                                                                                                                                                                                                                                                | `--data kind=huggingface,source=<id>` `--data kind=csv_file,path=<file.csv>` `--data kind=synthetic_text,prompt_tokens=128,output_tokens=64`                                                                                                              |
-| --dataloader-kwargs JSON string of arguments to pass to the dataloader constructor.                                                                                                                                                                                                                                                                                                                                                                               | Passed directly to Data Loader, as `--data_loader kind=pytorch,shuffle=true,samples=100`                                                                                                                                                                  |
+| --dataloader-kwargs JSON string of arguments to pass to the dataloader constructor.                                                                                                                                                                                                                                                                                                                                                                               | Passed directly to Data Loader, as `--data-loader kind=pytorch,shuffle=true,samples=100`                                                                                                                                                                  |
 | --detect-saturation Enable over-saturation detection with default settings.                                                                                                                                                                                                                                                                                                                                                                                       | Enable oversaturation constraint, for example `--constraint kind=over_saturation`                                                                                                                                                                         |
 | --disable-console-interactive Disable interactive console progress updates.                                                                                                                                                                                                                                                                                                                                                                                       | Unchanged: `--disable-console-interactive` or `--disable-progress`                                                                                                                                                                                        |
 | --disable-console Disable all outputs to the console (updates, interactive progress, results).                                                                                                                                                                                                                                                                                                                                                                    | Unchanged: `--disable-console` or `--disable-console-outputs`                                                                                                                                                                                             |
@@ -30,20 +30,20 @@ This command is now `guidellm run`
 | --max-requests Maximum requests per benchmark. If None, runs until max_seconds or data exhaustion.                                                                                                                                                                                                                                                                                                                                                                | Enable maximum requests constraint, for example `--constraint kind=max_requests,count=1000`                                                                                                                                                               |
 | --max-seconds Maximum seconds per benchmark. If None, runs until max_requests or data exhaustion.                                                                                                                                                                                                                                                                                                                                                                 | Enable maximum duration constraint, for example `--constraint kind=max_duration,seconds=60`                                                                                                                                                               |
 | --model Model ID to benchmark. If not provided, uses first available model.                                                                                                                                                                                                                                                                                                                                                                                       | Specify a model name as part of the backend configuration, for example `--backend kind=openai_http,model=gpt4`                                                                                                                                            |
-| --output-dir or –output-path: The directory path to save file output types in                                                                                                                                                                                                                                                                                                                                                                                     | Specify paths as part of the individual output configurations, for example `--output kind=json,path=/tmp/reports/benchmark.json`                                                                                                                          |
+| --output-dir or –-output-path: The directory path to save file output types in                                                                                                                                                                                                                                                                                                                                                                                    | Specify paths as part of the individual output configurations, for example `--output kind=json,path=/tmp/reports/benchmark.json`                                                                                                                          |
 | --outputs The filename.ext for each of the outputs to create or the alises (json, csv, html) for the output files to create with their default file names (benchmark.[EXT])                                                                                                                                                                                                                                                                                       | Specify multiple output formats by repeating the `--output` option, for example `--output kind=json,path=benchmark.json –output kind=csv,path=benchmark.csv`                                                                                              |
 | --over-saturation Enable over-saturation detection. Pass a JSON dict with configuration (e.g., '{"enabled": true, "min_seconds": 30}'). Defaults to None (disabled).                                                                                                                                                                                                                                                                                              | Enable oversaturation constraint, for example `--constraint kind=over_saturation,mode=enforce,min_seconds=30`                                                                                                                                             |
-| --processor-args JSON string of arguments to pass to the processor constructor.                                                                                                                                                                                                                                                                                                                                                                                   | Specify options directly to the tokenizer, for example `--tokenizer ‘{“kind”:”huggingface_auto”,”load_kwargs”:{“fast”:true}}’`                                                                                                                            |
+| --processor-args JSON string of arguments to pass to the processor constructor.                                                                                                                                                                                                                                                                                                                                                                                   | Specify options directly to the tokenizer, for example `--tokenizer '{"kind":"huggingface_auto","load_kwargs":{"fast":true}}'`                                                                                                                            |
 | --processor Processor or tokenizer for token count calculations. If not provided, loads from model.                                                                                                                                                                                                                                                                                                                                                               | Defaults to the default tokenizer for the first model supported by the backend target. To override, `--tokenizer kind=huggingface_auto,model=gpt4`                                                                                                        |
 | --profile Benchmark profile type. Options: sweep, async, poisson, synchronous, throughput, concurrent, constant.                                                                                                                                                                                                                                                                                                                                                  | Specify the benchmark profile to use, for example, `--profile kind=sweep,sweep_size=10,warmup=1,cooldown=1`                                                                                                                                               |
 | --rampup The time, in seconds, to ramp up the request rate over. Applicable for Throughput, Concurrent, and Constant strategies                                                                                                                                                                                                                                                                                                                                   | Specify as part of profile, for example `--profile kind=constant,rate=10,rampup_duration=2`                                                                                                                                                               |
 | --random-seed Random seed for reproducibility.                                                                                                                                                                                                                                                                                                                                                                                                                    | Specify the random seed configuration like `--seed kind=static,value=42`                                                                                                                                                                                  |
-| --rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second.                                                                                                                                                                                                                                                                                             | “Rate” was overloaded to specify the primary configuration for each profile type. Specify with `--profile` or `--override profile.<name>`: async/constant/poisson → `rate`, concurrent → `streams`, sweep → `sweep_size`, throughput → `max_concurrency`. |
+| --rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second.                                                                                                                                                                                                                                                                                             | "Rate" was overloaded to specify the primary configuration for each profile type. Specify with `--profile` or `--override profile.<name>`: async/constant/poisson → `rate`, concurrent → `streams`, sweep → `sweep_size`, throughput → `max_concurrency`. |
 | --request-format Format to use for requests. Options depend on backend. For vLLM backend: plain (no chat template, text appending only), default-template (use tokenizer default), or a file path / single-line template per vLLM docs. Default: default-templateFor openai backend: http endpoint path (/v1/chat/completions, /v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or alias (e.g. chat_completions); default /v1/chat/completions. | Specify as part of backend configuration, like `--backend kind=openai_http,request_format=/v1/responses`                                                                                                                                                  |
 | --sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20.                                                                                                                                                                                                                                                                                                                                                        | Specify as part of the metrics configuration, for example `--metrics kind=generative,sample_size=20`                                                                                                                                                      |
 | --scenario Builtin scenario name or path to config file. CLI options override scenario settings.                                                                                                                                                                                                                                                                                                                                                                  | The preferred name is now `--config`, although both `--scenario` and `-c` are aliases, for example `--config chat` or `--config my-scenario.yaml`.                                                                                                        |
 | --target Target backend URL (e.g., [http://localhost:8000](http://localhost:8000)).                                                                                                                                                                                                                                                                                                                                                                               | Specify as part of backend configuration, for example `--backend kind=openai_http,target=http://localhost:8000`                                                                                                                                           |
-| --warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                              | Specify with profile, e.g., `--profile kind=synchronous,warmup=2` for a two second warmup or `--profile ‘{“kind”:”concurrent”,”warmup”:{“mode”:”duration”,”value”:2}}`                                                                                    |
+| --warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                              | Specify with profile, e.g., `--profile kind=synchronous,warmup=2` for a two-second warmup or `--profile '{"kind":"concurrent","warmup":{"mode":"duration","value":2}}'`                                                                                   |
 
 | **NEW OPTIONS**                                                          | **v0.7.0 new options**                                                                                                                                            |
 | :----------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -92,19 +92,19 @@ Start a mock OpenAI/vLLM-compatible server for testing. **[NO CHANGE]**
 
 Tools for preprocessing datasets for use in benchmarks.
 
-| v0.6.0 option                                                                                                                                                                                                                                                               | v0.7.0 equivalent                                                                                                                                            |
-| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| data (positional parameter)                                                                                                                                                                                                                                                 | Use dataset descriptor, for example `kind=huggingface,source=<id>`                                                                                           |
-| output_path (positional parameter)                                                                                                                                                                                                                                          | Results file path, for example `file.json`                                                                                                                   |
-| --processor TEXT Processor or tokenizer name for calculating token counts.                                                                                                                                                                                                  | Unchanged                                                                                                                                                    |
-| --config TEXT PreprocessDatasetConfig as JSON string, key=value pairs, or file path (.json, .yaml, .yml, .config). Example: `prompt_tokens=100,output_tokens=50,prefix_tokens_max=10` or `{"prompt_tokens": 100, "output_tokens": 50, "prefix_tokens_max": 10}` [Mandatory] | Unchanged                                                                                                                                                    |
-| --processor-args TEXT JSON string of arguments to pass to the processor constructor.                                                                                                                                                                                        | Unchanged                                                                                                                                                    |
-| --data-args TEXT JSON string of arguments to pass to dataset creation                                                                                                                                                                                                       | Unchanged                                                                                                                                                    |
-| --data-column-mapper JSON string of column mappings to apply to the dataset                                                                                                                                                                                                 | Specify a data column mapper object, for example `--data-column-mapper ‘{“kind”:”generative_column_mapper”,”column_mappings”:{“text_column”:”instruction”}}` |
-| --short-prompt-strategy [ignore, concatenate, pad, error] Strategy for handling prompts shorter than target length. [default: ignore]                                                                                                                                       | Unchanged                                                                                                                                                    |
-| --pad-char TEXT Character to pad short prompts with when using “pad” strategy (used with ‘concatenate’ strategy).                                                                                                                                                           | Unchanged                                                                                                                                                    |
-| --concat-delimiter TEXT Delimiter for concatenating short prompts (used with ‘concatenate’ strategy).                                                                                                                                                                       | Unchanged                                                                                                                                                    |
-| --include-prefix-in-token-count Include prefix tokens in prompt token count calculation.                                                                                                                                                                                    | Unchanged                                                                                                                                                    |
-| --push-to-hub Push the processed dataset to Hugging Face Hub.                                                                                                                                                                                                               | Unchanged                                                                                                                                                    |
-| --hub-dataset-id TEXT Hugging Face Hub dataset ID for upload (required if `--push-to-hub` is set).                                                                                                                                                                          | Unchanged                                                                                                                                                    |
-| --random-seed INTEGER Random seed for reproducible token sampling. [default: 42]                                                                                                                                                                                            | Unchanged                                                                                                                                                    |
+| v0.6.0 option                                                                                                                                                                                                                                                               | v0.7.0 equivalent                                                                                                                                             |
+| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| data (positional parameter)                                                                                                                                                                                                                                                 | Use dataset descriptor, for example `kind=huggingface,source=<id>`                                                                                            |
+| output_path (positional parameter)                                                                                                                                                                                                                                          | Results file path, for example `file.json`                                                                                                                    |
+| --processor TEXT Processor or tokenizer name for calculating token counts.                                                                                                                                                                                                  | Unchanged                                                                                                                                                     |
+| --config TEXT PreprocessDatasetConfig as JSON string, key=value pairs, or file path (.json, .yaml, .yml, .config). Example: `prompt_tokens=100,output_tokens=50,prefix_tokens_max=10` or `{"prompt_tokens": 100, "output_tokens": 50, "prefix_tokens_max": 10}` [Mandatory] | Unchanged                                                                                                                                                     |
+| --processor-args TEXT JSON string of arguments to pass to the processor constructor.                                                                                                                                                                                        | Unchanged                                                                                                                                                     |
+| --data-args TEXT JSON string of arguments to pass to dataset creation                                                                                                                                                                                                       | Unchanged                                                                                                                                                     |
+| --data-column-mapper JSON string of column mappings to apply to the dataset                                                                                                                                                                                                 | Specify a data column mapper object, for example `--data-column-mapper '{"kind":"generative_column_mapper","column_mappings":{"text_column":"instruction"}}'` |
+| --short-prompt-strategy [ignore, concatenate, pad, error] Strategy for handling prompts shorter than target length. [default: ignore]                                                                                                                                       | Unchanged                                                                                                                                                     |
+| --pad-char TEXT Character to pad short prompts with when using "pad" strategy (used with 'concatenate' strategy).                                                                                                                                                           | Unchanged                                                                                                                                                     |
+| --concat-delimiter TEXT Delimiter for concatenating short prompts (used with 'concatenate' strategy).                                                                                                                                                                       | Unchanged                                                                                                                                                     |
+| --include-prefix-in-token-count Include prefix tokens in prompt token count calculation.                                                                                                                                                                                    | Unchanged                                                                                                                                                     |
+| --push-to-hub Push the processed dataset to Hugging Face Hub.                                                                                                                                                                                                               | Unchanged                                                                                                                                                     |
+| --hub-dataset-id TEXT Hugging Face Hub dataset ID for upload (required if `--push-to-hub` is set).                                                                                                                                                                          | Unchanged                                                                                                                                                     |
+| --random-seed INTEGER Random seed for reproducible token sampling. [default: 42]                                                                                                                                                                                            | Unchanged                                                                                                                                                     |

From 8c417ac2d3b6081de1cbcb0c0c071d71c2c08182 Mon Sep 17 00:00:00 2001
From: David Butenhof <dbutenho@redhat.com>
Date: Fri, 26 Jun 2026 17:10:13 -0400
Subject: [PATCH 8/8] more

Signed-off-by: David Butenhof <dbutenho@redhat.com>
---
 docs/guides/v0.7.0_migration_guide.md | 104 +++++++++++---------------
 1 file changed, 45 insertions(+), 59 deletions(-)

diff --git a/docs/guides/v0.7.0_migration_guide.md b/docs/guides/v0.7.0_migration_guide.md
index 6822c2634..241cc76d5 100644
--- a/docs/guides/v0.7.0_migration_guide.md
+++ b/docs/guides/v0.7.0_migration_guide.md
@@ -6,44 +6,44 @@ Run a benchmark against a generative model.
 
 This command is now `guidellm run`
 
-| v0.6.0 option                                                                                                                                                                                                                                                                                                                                                                                                                                                     | v0.7.0 equivalent                                                                                                                                                                                                                                         |
-| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| --backend-kwargs JSON string of arguments to pass to the backend. E.g., '{"api_key": "apikey-\*", "verify": false}'                                                                                                                                                                                                                                                                                                                                               | Options passed to `--backend`, like `--backend "kind=openai_http,api_key=sk…"`                                                                                                                                                                            |
-| --backend Backend type. Options: vllm_python, openai_http.                                                                                                                                                                                                                                                                                                                                                                                                        | Merged with `--backend-kwargs` `--backend '{"kind": "openai_http", "extras": {"body": {"temperature": 0.6}}}'`                                                                                                                                            |
-| --cooldown Cooldown specification: int, float, or dict as string (json or key=value). Controls time or requests after measurement ends. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                             | Specify with profile, e.g., `--profile kind=synchronous,cooldown=2` for a two second cooldown or `--profile '{"kind":"concurrent","cooldown":{"mode":"duration","value":2}}`                                                                              |
-| --data-args JSON string of arguments to pass to dataset creation.                                                                                                                                                                                                                                                                                                                                                                                                 | Specified with "load_kwargs" as part of data, e.g., `--data '{"kind":"huggingface","load_kwargs":{"split":"train"}}'`                                                                                                                                     |
-| --data-column-mapper JSON string of column mappings to apply to the dataset. E.g., '{"text_column": "article", "output_tokens_count_column" :"output_tokens"}'\`                                                                                                                                                                                                                                                                                                  | Data column mappers have a "kind": `--data-column-mapper '{"kind":"generative_column_mapper","column_mappings":{"text_column":"instruction"}}`                                                                                                            |
-| --data-finalizer JSON string of finalizer to convert dataset rows to requests. E.g., 'generative' or '{"type": "generative"}'\`                                                                                                                                                                                                                                                                                                                                   | Use `--data-finalizer kind=generative`                                                                                                                                                                                                                    |
-| --data-num-workers Number of worker processes for data loading.                                                                                                                                                                                                                                                                                                                                                                                                   | Specified as part of Data Loader configuration with `--data-loader kind=pytorch,num_workers=3`                                                                                                                                                            |
-| --data-preprocessors-kwargs JSON string of arguments to pass to all preprocessors.                                                                                                                                                                                                                                                                                                                                                                                | `--data-preprocessor '{"kind":"encode_media","audio_kwargs":{"format":"mp3"}}'`                                                                                                                                                                           |
-| --data-preprocessors List of preprocessors to apply to the dataset. E.g., 'encode_media,my_custom_preprocessor'                                                                                                                                                                                                                                                                                                                                                   | `--data-preprocessor kind=encode_media` … can be repeated to configure multiple preprocessors.                                                                                                                                                            |
-| --data-sampler Data sampler type.                                                                                                                                                                                                                                                                                                                                                                                                                                 | Shuffle function is under `--data-loader kind=pytorch,shuffle=true`                                                                                                                                                                                       |
-| --data-samples Number of samples from dataset. -1 (default) uses all samples and dynamically generates more.                                                                                                                                                                                                                                                                                                                                                      | Specify as part of Data Loader configuration, as `--data-loader kind=pytorch,samples=10`                                                                                                                                                                  |
-| --data HuggingFace dataset ID, path to dataset, path to data file (csv/json/jsonl/txt), or synthetic data config (json/key=value).                                                                                                                                                                                                                                                                                                                                | `--data kind=huggingface,source=<id>` `--data kind=csv_file,path=<file.csv>` `--data kind=synthetic_text,prompt_tokens=128,output_tokens=64`                                                                                                              |
-| --dataloader-kwargs JSON string of arguments to pass to the dataloader constructor.                                                                                                                                                                                                                                                                                                                                                                               | Passed directly to Data Loader, as `--data-loader kind=pytorch,shuffle=true,samples=100`                                                                                                                                                                  |
-| --detect-saturation Enable over-saturation detection with default settings.                                                                                                                                                                                                                                                                                                                                                                                       | Enable oversaturation constraint, for example `--constraint kind=over_saturation`                                                                                                                                                                         |
-| --disable-console-interactive Disable interactive console progress updates.                                                                                                                                                                                                                                                                                                                                                                                       | Unchanged: `--disable-console-interactive` or `--disable-progress`                                                                                                                                                                                        |
-| --disable-console Disable all outputs to the console (updates, interactive progress, results).                                                                                                                                                                                                                                                                                                                                                                    | Unchanged: `--disable-console` or `--disable-console-outputs`                                                                                                                                                                                             |
-| --max-error-rate Maximum error rate before stopping the benchmark.                                                                                                                                                                                                                                                                                                                                                                                                | Enable maximum error rate constraint, for example `--constraint kind=max_error_rate,rate=10`                                                                                                                                                              |
-| --max-errors Maximum errors before stopping the benchmark.                                                                                                                                                                                                                                                                                                                                                                                                        | Enable maximum error count constraint, for example `--constraint kind=max_errors,count=10`                                                                                                                                                                |
-| --max-global-error-rate Maximum global error rate across all benchmarks.                                                                                                                                                                                                                                                                                                                                                                                          | Enable maximum global error rate constraint, for example `--constraint kind=max_global_error_rate,rate=10,minimum=100`                                                                                                                                    |
-| --max-requests Maximum requests per benchmark. If None, runs until max_seconds or data exhaustion.                                                                                                                                                                                                                                                                                                                                                                | Enable maximum requests constraint, for example `--constraint kind=max_requests,count=1000`                                                                                                                                                               |
-| --max-seconds Maximum seconds per benchmark. If None, runs until max_requests or data exhaustion.                                                                                                                                                                                                                                                                                                                                                                 | Enable maximum duration constraint, for example `--constraint kind=max_duration,seconds=60`                                                                                                                                                               |
-| --model Model ID to benchmark. If not provided, uses first available model.                                                                                                                                                                                                                                                                                                                                                                                       | Specify a model name as part of the backend configuration, for example `--backend kind=openai_http,model=gpt4`                                                                                                                                            |
-| --output-dir or –-output-path: The directory path to save file output types in                                                                                                                                                                                                                                                                                                                                                                                    | Specify paths as part of the individual output configurations, for example `--output kind=json,path=/tmp/reports/benchmark.json`                                                                                                                          |
-| --outputs The filename.ext for each of the outputs to create or the alises (json, csv, html) for the output files to create with their default file names (benchmark.[EXT])                                                                                                                                                                                                                                                                                       | Specify multiple output formats by repeating the `--output` option, for example `--output kind=json,path=benchmark.json –output kind=csv,path=benchmark.csv`                                                                                              |
-| --over-saturation Enable over-saturation detection. Pass a JSON dict with configuration (e.g., '{"enabled": true, "min_seconds": 30}'). Defaults to None (disabled).                                                                                                                                                                                                                                                                                              | Enable oversaturation constraint, for example `--constraint kind=over_saturation,mode=enforce,min_seconds=30`                                                                                                                                             |
-| --processor-args JSON string of arguments to pass to the processor constructor.                                                                                                                                                                                                                                                                                                                                                                                   | Specify options directly to the tokenizer, for example `--tokenizer '{"kind":"huggingface_auto","load_kwargs":{"fast":true}}'`                                                                                                                            |
-| --processor Processor or tokenizer for token count calculations. If not provided, loads from model.                                                                                                                                                                                                                                                                                                                                                               | Defaults to the default tokenizer for the first model supported by the backend target. To override, `--tokenizer kind=huggingface_auto,model=gpt4`                                                                                                        |
-| --profile Benchmark profile type. Options: sweep, async, poisson, synchronous, throughput, concurrent, constant.                                                                                                                                                                                                                                                                                                                                                  | Specify the benchmark profile to use, for example, `--profile kind=sweep,sweep_size=10,warmup=1,cooldown=1`                                                                                                                                               |
-| --rampup The time, in seconds, to ramp up the request rate over. Applicable for Throughput, Concurrent, and Constant strategies                                                                                                                                                                                                                                                                                                                                   | Specify as part of profile, for example `--profile kind=constant,rate=10,rampup_duration=2`                                                                                                                                                               |
-| --random-seed Random seed for reproducibility.                                                                                                                                                                                                                                                                                                                                                                                                                    | Specify the random seed configuration like `--seed kind=static,value=42`                                                                                                                                                                                  |
-| --rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second.                                                                                                                                                                                                                                                                                             | "Rate" was overloaded to specify the primary configuration for each profile type. Specify with `--profile` or `--override profile.<name>`: async/constant/poisson → `rate`, concurrent → `streams`, sweep → `sweep_size`, throughput → `max_concurrency`. |
-| --request-format Format to use for requests. Options depend on backend. For vLLM backend: plain (no chat template, text appending only), default-template (use tokenizer default), or a file path / single-line template per vLLM docs. Default: default-templateFor openai backend: http endpoint path (/v1/chat/completions, /v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or alias (e.g. chat_completions); default /v1/chat/completions. | Specify as part of backend configuration, like `--backend kind=openai_http,request_format=/v1/responses`                                                                                                                                                  |
-| --sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20.                                                                                                                                                                                                                                                                                                                                                        | Specify as part of the metrics configuration, for example `--metrics kind=generative,sample_size=20`                                                                                                                                                      |
-| --scenario Builtin scenario name or path to config file. CLI options override scenario settings.                                                                                                                                                                                                                                                                                                                                                                  | The preferred name is now `--config`, although both `--scenario` and `-c` are aliases, for example `--config chat` or `--config my-scenario.yaml`.                                                                                                        |
-| --target Target backend URL (e.g., [http://localhost:8000](http://localhost:8000)).                                                                                                                                                                                                                                                                                                                                                                               | Specify as part of backend configuration, for example `--backend kind=openai_http,target=http://localhost:8000`                                                                                                                                           |
-| --warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                              | Specify with profile, e.g., `--profile kind=synchronous,warmup=2` for a two second warmup or `--profile '{"kind":"concurrent","warmup":{"mode":"duration","value":2}}'`                                                                                   |
+| v0.6.0 option                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | v0.7.0 equivalent                                                                                                                                                                                                                                                                   |
+| :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| --backend-kwargs JSON string of arguments to pass to the backend. E.g., '{"api_key": "apikey-\*", "verify": false}'                                                                                                                                                                                                                                                                                                                                                                  | Options passed to `--backend` constructor, for example `--backend "kind=openai_http,api_key=sk…"`                                                                                                                                                                                   |
+| --backend Backend type. Options: vllm_python, openai_http.                                                                                                                                                                                                                                                                                                                                                                                                                           | The "kind" of the backend specification, for example `--backend '{"kind": "openai_http", "extras": {"body": {"temperature": 0.6}}}'`                                                                                                                                                |
+| --cooldown Cooldown specification: int, float, or dict as string (json or key=value). Controls time or requests after measurement ends. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                                                | Specify with the `cooldown` profile attribute, for example `--profile kind=synchronous,cooldown=2` for a two-second cooldown or `--profile '{"kind":"concurrent","cooldown":{"mode":"duration","value":2}}'`                                                                        |
+| --data-args JSON string of arguments to pass to dataset creation.                                                                                                                                                                                                                                                                                                                                                                                                                    | Specified with "load_kwargs" data attribute, e.g., `--data '{"kind":"huggingface","load_kwargs":{"split":"train"}}'`                                                                                                                                                                |
+| --data-column-mapper JSON string of column mappings to apply to the dataset. E.g., '{"text_column": "article", "output_tokens_count_column" :"output_tokens"}'\`                                                                                                                                                                                                                                                                                                                     | Specify the kind and attributes of a data column mapper, for example `--data-column-mapper '{"kind":"generative_column_mapper","column_mappings":{"text_column":"instruction"}}`                                                                                                    |
+| --data-finalizer JSON string of finalizer to convert dataset rows to requests. E.g., 'generative' or '{"type": "generative"}'\`                                                                                                                                                                                                                                                                                                                                                      | Specify the kind of finalizer, for example `--data-finalizer kind=generative`                                                                                                                                                                                                       |
+| --data-num-workers Number of worker processes for data loading.                                                                                                                                                                                                                                                                                                                                                                                                                      | Specified with the Data Loader `num_workers` attribute, for example `--data-loader kind=pytorch,num_workers=3`                                                                                                                                                                      |
+| --data-preprocessors-kwargs JSON string of arguments to pass to all preprocessors.                                                                                                                                                                                                                                                                                                                                                                                                   | Add parameters to the data preprocessor constructor, for example `--data-preprocessor '{"kind":"encode_media","audio_kwargs":{"format":"mp3"}}'`                                                                                                                                    |
+| --data-preprocessors List of preprocessors to apply to the dataset. E.g., 'encode_media,my_custom_preprocessor'                                                                                                                                                                                                                                                                                                                                                                      | Specify the preprocessor kind and attributes, for example `--data-preprocessor kind=encode_media` … can be repeated to configure multiple preprocessors.                                                                                                                            |
+| --data-sampler Data sampler type.                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Shuffle function is a data loader attribute, for example `--data-loader kind=pytorch,shuffle=true`                                                                                                                                                                                  |
+| --data-samples Number of samples from dataset. -1 (default) uses all samples and dynamically generates more.                                                                                                                                                                                                                                                                                                                                                                         | Specify as part of Data Loader configuration, for example `--data-loader kind=pytorch,samples=10`                                                                                                                                                                                   |
+| --data HuggingFace dataset ID, path to dataset, path to data file (csv/json/jsonl/txt), or synthetic data config (json/key=value).                                                                                                                                                                                                                                                                                                                                                   | Specify the kind of dataset together with attributes, for example `--data kind=huggingface,source=<id>` `--data kind=csv_file,path=<file.csv>` `--data kind=synthetic_text,prompt_tokens=128,output_tokens=64`                                                                      |
+| --dataloader-kwargs JSON string of arguments to pass to the dataloader constructor.                                                                                                                                                                                                                                                                                                                                                                                                  | Passed directly to Data Loader, for example `--data-loader kind=pytorch,shuffle=true,samples=100`                                                                                                                                                                                   |
+| --detect-saturation Enable over-saturation detection with default settings.                                                                                                                                                                                                                                                                                                                                                                                                          | Specify oversaturation constraint kind and attributes, for example `--constraint kind=over_saturation`                                                                                                                                                                              |
+| --disable-console-interactive Disable interactive console progress updates.                                                                                                                                                                                                                                                                                                                                                                                                          | Unchanged: `--disable-console-interactive` or `--disable-progress`                                                                                                                                                                                                                  |
+| --disable-console Disable all outputs to the console (updates, interactive progress, results).                                                                                                                                                                                                                                                                                                                                                                                       | Unchanged: `--disable-console` or `--disable-console-outputs`                                                                                                                                                                                                                       |
+| --max-error-rate Maximum error rate before stopping the benchmark.                                                                                                                                                                                                                                                                                                                                                                                                                   | Specify maximum error rate constraint kind and attributes, for example `--constraint kind=max_error_rate,rate=10`                                                                                                                                                                   |
+| --max-errors Maximum errors before stopping the benchmark.                                                                                                                                                                                                                                                                                                                                                                                                                           | Specify maximum error count constraint kind and attributes, for example `--constraint kind=max_errors,count=10`                                                                                                                                                                     |
+| --max-global-error-rate Maximum global error rate across all benchmarks.                                                                                                                                                                                                                                                                                                                                                                                                             | Specify maximum global error rate constraint kind and attributes, for example `--constraint kind=max_global_error_rate,rate=10,minimum=100`                                                                                                                                         |
+| --max-requests Maximum requests per benchmark. If None, runs until max_seconds or data exhaustion.                                                                                                                                                                                                                                                                                                                                                                                   | Specify maximum requests constraint kind and attributes, for example `--constraint kind=max_requests,count=1000`                                                                                                                                                                    |
+| --max-seconds Maximum seconds per benchmark. If None, runs until max_requests or data exhaustion.                                                                                                                                                                                                                                                                                                                                                                                    | Specify maximum duration constraint kind and attributes, for example `--constraint kind=max_duration,seconds=60`                                                                                                                                                                    |
+| --model Model ID to benchmark. If not provided, uses first available model.                                                                                                                                                                                                                                                                                                                                                                                                          | Specify with the `model` attribute of the backend configuration, for example `--backend kind=openai_http,model=gpt4`                                                                                                                                                                |
+| --output-dir or –-output-path: The directory path to save file output types in                                                                                                                                                                                                                                                                                                                                                                                                       | Specify paths as part of the individual output configurations, for example `--output kind=json,path=/tmp/reports/benchmark.json`                                                                                                                                                    |
+| --outputs The filename.ext for each of the outputs to create or the alises (json, csv, html) for the output files to create with their default file names (benchmark.[EXT])                                                                                                                                                                                                                                                                                                          | Specify multiple output formats by repeating the `--output` option, for example `--output kind=json,path=benchmark.json –output kind=csv,path=benchmark.csv`                                                                                                                        |
+| --over-saturation Enable over-saturation detection. Pass a JSON dict with configuration (e.g., '{"enabled": true, "min_seconds": 30}'). Defaults to None (disabled).                                                                                                                                                                                                                                                                                                                 | Specify oversaturation constraint kind and attributes, for example `--constraint kind=over_saturation,mode=enforce,min_seconds=30`                                                                                                                                                  |
+| --processor-args JSON string of arguments to pass to the processor constructor.                                                                                                                                                                                                                                                                                                                                                                                                      | Specify options directly along with the tokenizer kind, for example `--tokenizer '{"kind":"huggingface_auto","load_kwargs":{"fast":true}}'`                                                                                                                                         |
+| --processor Processor or tokenizer for token count calculations. If not provided, loads from model.                                                                                                                                                                                                                                                                                                                                                                                  | Defaults to the default tokenizer for the first model supported by the backend target. To override, specify the tokener kind and attributes, for example `--tokenizer kind=huggingface_auto,model=gpt4`                                                                             |
+| --profile Benchmark profile type. Options: sweep, async, poisson, synchronous, throughput, concurrent, constant.                                                                                                                                                                                                                                                                                                                                                                     | Specify the benchmark profile kind and attributes to use, for example, `--profile kind=sweep,sweep_size=10,warmup=1,cooldown=1`                                                                                                                                                     |
+| --rampup The time, in seconds, to ramp up the request rate over. Applicable for Throughput, Concurrent, and Constant strategies                                                                                                                                                                                                                                                                                                                                                      | Specify with the `rampup_duration` profile attribute, for example `--profile kind=constant,rate=10,rampup_duration=2`                                                                                                                                                               |
+| --random-seed Random seed for reproducibility.                                                                                                                                                                                                                                                                                                                                                                                                                                       | Specify the random seed configuration kind and attributes, for example `--seed kind=static,value=42`                                                                                                                                                                                |
+| --rate Benchmark rate(s) to test. Meaning depends on profile: sweep=number of benchmarks, concurrent=concurrent requests, async/constant/poisson=requests per second.                                                                                                                                                                                                                                                                                                                | "Rate" was overloaded to specify the primary configuration for each profile type. Specify the appropriate attribute with `--profile` or `--override profile.<name>`: async/constant/poisson → `rate`, concurrent → `streams`, sweep → `sweep_size`, throughput → `max_concurrency`. |
+| --request-format Format to use for requests. Options depend on backend.<br/><br/>For vLLM backend: plain (no chat template, text appending only), default-template (use tokenizer default), or a file path / single-line template per vLLM docs. Default: default-template<br/><br/>For openai backend: http endpoint path (/v1/chat/completions, /v1/completions, /v1/audio/transcriptions, /v1/audio/translations) or alias (e.g. chat_completions); default /v1/chat/completions. | Specify as part of backend configuration, like `--backend kind=openai_http,request_format=/v1/responses`                                                                                                                                                                            |
+| --sample-requests Number of sample requests per status to save. None (default) saves all, recommended: 20.                                                                                                                                                                                                                                                                                                                                                                           | Specify with the `sample_size` attribute of the metrics configuration, for example `--metrics kind=generative,sample_size=20`. The default if unspecified is to save all samples.                                                                                                   |
+| --scenario Builtin scenario name or path to config file. CLI options override scenario settings.                                                                                                                                                                                                                                                                                                                                                                                     | The preferred name is now `--config`, although both `--scenario` and `-c` are aliases, for example `--config chat` or `--config my-scenario.yaml`.                                                                                                                                  |
+| --target Target backend URL (e.g., [http://localhost:8000](http://localhost:8000)).                                                                                                                                                                                                                                                                                                                                                                                                  | Specify with the `target` attribute of the backend configuration, for example `--backend kind=openai_http,target=http://localhost:8000`                                                                                                                                             |
+| --warmup Warmup specification: int, float, or dict as string (json or key=value). Controls time or requests before measurement starts. Numeric in (0, 1): percent of duration or request count. Numeric >=1: duration in seconds or request count. Advanced config: see TransientPhaseConfig schema.                                                                                                                                                                                 | Specify with the `warmup` profile attribute, for example `--profile kind=synchronous,warmup=2` for a two-second warmup or `--profile '{"kind":"concurrent","warmup":{"mode":"duration","value":2}}'`                                                                                |
 
 | **NEW OPTIONS**                                                          | **v0.7.0 new options**                                                                                                                                            |
 | :----------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -52,7 +52,10 @@ This command is now `guidellm run`
 
 ## `Guidellm benchmark from-file`
 
-Load a saved benchmark report and optionally re-export data
+Load a saved benchmark report and optionally re-export data.
+
+> [!WARNING]\
+> This command may be changed to be more consistent with the `run` command in the future.
 
 | Option                                                                                                                                                                                         | v0.7.0 equivalent |
 | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------- |
@@ -68,30 +71,13 @@ Changed from `guidellm config` to `guidellm env` to clarify that it displays env
 
 `guidellm config` will be used later for a different purpose, to generate YAML config files from `run` options.
 
-## `guidellm mock-server`
-
-Start a mock OpenAI/vLLM-compatible server for testing. **[NO CHANGE]**
-
-| v0.6.0 option                                                                                    | v0.7.0 equivalent |
-| :----------------------------------------------------------------------------------------------- | :---------------- |
-| --host TEXT Host address to bind the server to.                                                  | Unchanged         |
-| --port INTEGER Port number to bind the server to.                                                | Unchanged         |
-| --workers INTEGER Number of worker processes.                                                    | Unchanged         |
-| --model TEXT Name of the model to mock.                                                          | Unchanged         |
-| --processor TEXT Processor or tokenizer to use for requests.                                     | Unchanged         |
-| --request-latency FLOAT Request latency in seconds for non-streaming requests.                   | Unchanged         |
-| --request-latency-std FLOAT Request latency standard deviation in seconds (normal distribution). | Unchanged         |
-| --ttft-ms FLOAT Time to first token in milliseconds for streaming requests.                      | Unchanged         |
-| --ttft-ms-std FLOAT Time to first token standard deviation in milliseconds.                      | Unchanged         |
-| --itl-ms FLOAT Inter-token latency in milliseconds for streaming requests.                       | Unchanged         |
-| --itl-ms-std FLOAT Inter-token latency standard deviation in milliseconds.                       | Unchanged         |
-| --output-tokens INTEGER Number of output tokens for streaming requests.                          | Unchanged         |
-| --output-tokens-std FLOAT Output tokens standard deviation (normal distribution).                | Unchanged         |
-
 ## `guidellm preprocess dataset`
 
 Tools for preprocessing datasets for use in benchmarks.
 
+> [!WARNING]\
+> This command may be changed to be more consistent with the `run` command in the future.
+
 | v0.6.0 option                                                                                                                                                                                                                                                               | v0.7.0 equivalent                                                                                                                                             |
 | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | data (positional parameter)                                                                                                                                                                                                                                                 | Use dataset descriptor, for example `kind=huggingface,source=<id>`                                                                                            |