Audio transcription metrics (RTF & RTFx) #833

Description

@AlonKellner-RedHat

Problem Statement

When benchmarking audio transcription, one of the most commonly used metrics is the Real-Time Factor (RTF), or its inverse, RTFx.

Currently, GuideLLM doesn't report either metric when benchmarking audio, even though both are simple to calculate: RTF is the E2E latency divided by the audio duration, and RTFx is the audio duration divided by the E2E latency.
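
To make the two definitions concrete, here is a minimal sketch of the per-request calculation. The function names and the assumption that both values are in seconds are hypothetical and only for illustration; this is not GuideLLM's existing API.

```python
def real_time_factor(e2e_latency_s: float, audio_duration_s: float) -> float:
    """RTF = E2E latency / audio duration. Values below 1.0 are faster than real time."""
    if audio_duration_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    return e2e_latency_s / audio_duration_s


def inverse_real_time_factor(e2e_latency_s: float, audio_duration_s: float) -> float:
    """RTFx = audio duration / E2E latency. Higher means faster transcription."""
    if e2e_latency_s <= 0:
        raise ValueError("e2e_latency_s must be positive")
    return audio_duration_s / e2e_latency_s


# Hypothetical request: a 30 s clip transcribed end to end in 1.5 s.
rtf = real_time_factor(e2e_latency_s=1.5, audio_duration_s=30.0)
rtfx = inverse_real_time_factor(e2e_latency_s=1.5, audio_duration_s=30.0)
print(f"RTF={rtf:.3f} RTFx={rtfx:.1f}")
```

For this hypothetical request, the audio is transcribed 20 times faster than real time. How the per-request values would be aggregated across a benchmark run (mean, percentiles, or total audio duration over total latency) is left open here.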

Proposed Solution

When using the audio_transcription or the realtime_transcription request types, automatically report RTF and RTFx.

Alternatives Considered

We could accept that RTF and RTFx won't be available when benchmarking audio transcription.

Usage Examples

python3 -m vllm.entrypoints.openai.api_server \
  --model mistralai/Voxtral-Mini-4B-Realtime-2602 \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral \
  --trust-remote-code \
  --compilation-config '{"cudagraph_mode":"PIECEWISE"}' \
  --tensor-parallel-size 1 \
  --max-model-len 45000 \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 16 \
  --gpu-memory-utilization 0.90 \
  --host 0.0.0.0 --port 8000


guidellm benchmark \
  --target http://localhost:8000/v1 \
  --request-type realtime_transcription \
  --backend openai_realtime_ws \
  --data /workspace/custom-audio-dataset/hf_dataset \
  --profile synchronous \
  --max-requests 10 \
  --output-dir /workspace/repo/runs/2026-04-23T13-41-41 \
  --outputs json,html,csv

# RTF & RTFx are always reported when `--request-type` is `realtime_transcription` or `audio_transcription`

Additional Context

#713
https://openvoice-tech.net/index.php/Real-time-factor

I am interested in creating a PR for this 😄
