
Add WebSocket round-trip metrics (Time To Last Round Trip and Avg RTT)#850

Open
dreamer-89 wants to merge 5 commits into vllm-project:main from dreamer-89:feat/ws-roundtrip-metrics

dreamer-89 commented Jun 24, 2026

Summary

Adds two round-trip metrics to the realtime WebSocket backend (openai_websocket): Time To Last Round Trip and an approximate Avg RTT. In streaming transcription, input packets and output tokens flow as separate streams, so the existing token metrics (TTFT, ITL, TPOT) do not capture the send-to-receive lag a user actually experiences. These two estimate that lag from the send and receive timestamps and are reported automatically whenever the WebSocket backend is used. Per the issue discussion, Time To First Round Trip is omitted because it overlaps the existing TTFT, and the names avoid TTFT/TTLT to prevent collision with the token metrics.
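In symbols, for a request that sends packets at times $s_1 \le \dots \le s_n$ and receives content tokens at times $r_1 \le \dots \le r_m$ (notation introduced here for illustration, not taken from the code), the two metrics are, before conversion to milliseconds:

```math
\text{Time To Last Round Trip} = r_m - s_n
\qquad
\text{Avg RTT} = \frac{1}{m}\sum_{j=1}^{m} r_j \;-\; \frac{1}{n}\sum_{i=1}^{n} s_i
```

When every packet yields exactly one token after a constant lag $d$ (so $m = n$ and $r_i = s_i + d$), Avg RTT reduces to exactly $d$; otherwise it is only an estimate of the send-to-receive lag.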

Details

  • schemas/info.py: add scalar timing fields to RequestTimings (last_request_sent, request_sent_sum/request_sent_count, token_received_sum/token_received_count). These are scalars rather than lists so they stay correct under RequestInfo.model_copy, which shallow-copies timings: a shallow copy would share any list field between copies, so appends on one copy would leak into the other.
  • backends/openai/websocket.py: record a timestamp after each outbound frame (session.update, audio appends, both commits) via a small _record_request_sent helper, and accumulate received content-token timestamps in _record_content_tokens.
  • schemas/request_stats.py: add computed properties time_to_last_round_trip_ms (timestamp of the last received token minus that of the last sent packet) and avg_round_trip_time_ms (mean received-token timestamp minus mean sent-packet timestamp). Both return None when no sends were recorded, so non-WebSocket backends are unaffected.
  • benchmark/schemas/metrics.py and accumulator.py: add the two StatusDistributionSummary fields, compile them in GenerativeMetrics.compile(), and accumulate them for live progress.
  • benchmark/outputs/console.py and csv.py: show both in the request-latency table and the CSV export. JSON/YAML already include them via the schema. The HTML report is left for a follow-up.
  • docs/guides/metrics.md: document both metrics, noting they are WebSocket-only and that Avg RTT is approximate (it assumes sent packets and received tokens line up evenly in time).
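A minimal sketch of the scalar bookkeeping described above, assuming wall-clock timestamps in seconds. RoundTripTimings, last_token_received, record_request_sent, and record_content_tokens are hypothetical names for illustration; only the sum/count and last_request_sent field names come from this PR, and the real fields live on the Pydantic RequestTimings model rather than a dataclass. Weighting each delta by its token count is also an assumption.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class RoundTripTimings:
    """Hypothetical stand-in for the round-trip fields on RequestTimings.

    Only scalars are stored (running sums and counts, not timestamp lists),
    so a shallow copy never shares mutable state with the original.
    """

    last_request_sent: float | None = None
    request_sent_sum: float = 0.0
    request_sent_count: int = 0
    # Assumption: timestamp of the last received content token (not named in the PR).
    last_token_received: float | None = None
    token_received_sum: float = 0.0
    token_received_count: int = 0

    def record_request_sent(self, ts: float | None = None) -> None:
        """Call right after each outbound frame (session.update, appends, commits)."""
        ts = time.time() if ts is None else ts
        self.last_request_sent = ts
        self.request_sent_sum += ts
        self.request_sent_count += 1

    def record_content_tokens(self, count: int, ts: float | None = None) -> None:
        """Call when a delta with `count` content tokens arrives (all stamped `ts`)."""
        ts = time.time() if ts is None else ts
        self.last_token_received = ts
        self.token_received_sum += ts * count
        self.token_received_count += count

    @property
    def time_to_last_round_trip_ms(self) -> float | None:
        # None-gated: backends that never record a send report nothing.
        if self.last_request_sent is None or self.last_token_received is None:
            return None
        return (self.last_token_received - self.last_request_sent) * 1000.0

    @property
    def avg_round_trip_time_ms(self) -> float | None:
        if self.request_sent_count == 0 or self.token_received_count == 0:
            return None
        mean_received = self.token_received_sum / self.token_received_count
        mean_sent = self.request_sent_sum / self.request_sent_count
        return (mean_received - mean_sent) * 1000.0
```

Keeping a running sum and count instead of a list is what makes a shallow copy safe here: `+=` on a float or int rebinds the attribute on the copy only, whereas `.append` on a shared list would mutate both copies.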

Test Plan

  • tox -e test-unit passes, including new tests:
    • tests/unit/backends/openai/test_realtime_ws.py: the backend records send/receive timestamps (counts, sums, last-sent) over an in-process WebSocket stub.
    • tests/unit/schemas/test_request_stats.py: the two properties compute correctly and return None without send timings.
    • tests/unit/benchmark/schemas/test_metrics.py: the metrics expose schema fields and compile into distributions.
  • tox -e lint-check and tox -e type-check pass.
  • Manual check on CPU (no GPU): drove OpenAIWebSocketBackend.resolve() against a fake realtime server with ~100 ms simulated per-token lag. Time To Last Round Trip came out around 300 ms and Avg RTT around 200 ms (matching the lag), and both were None for a non-WebSocket request. For a live run, point a benchmark at a vLLM realtime server (e.g. Voxtral) and confirm both metrics appear in the console latency table and benchmarks.json.
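Why Avg RTT is labelled approximate can be seen with synthetic timestamps. This sketch is illustrative only: the packet spacing, lag, and token pattern are made up and are not the numbers from the manual check. It applies the "mean received minus mean sent" definition to two cases where every token arrives exactly 100 ms after the last packet it depends on.

```python
from __future__ import annotations

from statistics import fmean


def avg_rtt_ms(sent: list[float], received: list[float]) -> float:
    """Mean received timestamp minus mean sent timestamp, in milliseconds."""
    return (fmean(received) - fmean(sent)) * 1000.0


LAG = 0.1  # seconds between a packet and the token it produces
sent = [i * 0.1 for i in range(10)]  # 10 audio packets, 100 ms apart

# Case 1: one token per packet, each arriving LAG after its packet.
paired = [s + LAG for s in sent]

# Case 2: one token per two packets, each arriving LAG after the
# second packet of its pair (5 tokens for 10 packets).
sparse = [sent[i] + LAG for i in range(1, 10, 2)]

print(f"paired: {avg_rtt_ms(sent, paired):.1f} ms")
print(f"sparse: {avg_rtt_ms(sent, sparse):.1f} ms")
```

The paired case gives 100 ms, equal to the true lag, because with one token per packet the difference of means equals the mean of per-packet differences. The sparse case gives 150 ms even though each token still arrives 100 ms after its last input packet, since the sent mean includes packets that have no token of their own.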

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes code generated or substantially modified by an AI agent
  • Includes tests generated or substantially modified by an AI agent

NOTE: the Generated-by or Assisted-by trailers should be used in git commit messages when code or tests were generated or substantially modified by an AI agent, as described in the project's DEVELOPING.md file.


git log

commit 814859c
Author: Suraj Singh <surajrider@gmail.com>
Date: Wed Jun 24 15:00:52 2026 -0700

test(openai): add failing websocket round-trip metric tests (#832)

Add RED unit tests for Time To Last Round Trip and Avg RTT on the openai_websocket backend (send/receive timestamp recording, computed properties, None-gating, and metric compile). They fail until the implementation lands.

Assisted-by: Claude Code
Signed-off-by: Suraj Singh <surajrider@gmail.com>

commit c5e2599
Author: Suraj Singh <surajrider@gmail.com>
Date: Wed Jun 24 15:00:52 2026 -0700

feat(openai): record websocket send/receive timestamps (#832)

Add scalar round-trip timing fields to RequestTimings and record per-frame sent timestamps and received content-token timestamps in the openai_websocket backend. These feed the Time To Last Round Trip and Avg RTT metrics; HTTP backends leave them unset.

Assisted-by: Claude Code
Signed-off-by: Suraj Singh <surajrider@gmail.com>

commit e85cc9b
Author: Suraj Singh <surajrider@gmail.com>
Date: Wed Jun 24 15:00:52 2026 -0700

feat(benchmark): compute and aggregate websocket round-trip metrics (#832)

Add time_to_last_round_trip_ms and avg_round_trip_time_ms as computed request stats (None unless send timings exist), aggregate them in GenerativeMetrics.compile(), and accumulate them for live progress. Avg RTT is an approximation of the mean send-to-receive lag.

Assisted-by: Claude Code
Signed-off-by: Suraj Singh <surajrider@gmail.com>

commit 2bb4b29
Author: Suraj Singh <surajrider@gmail.com>
Date: Wed Jun 24 15:00:52 2026 -0700

feat(benchmark): show websocket round-trip metrics in console and CSV (#832)

Surface Time To Last Round Trip and Avg RTT in the request latency console table and the CSV export. JSON/YAML already include them via the schema; the HTML report is left for a follow-up.

Assisted-by: Claude Code
Signed-off-by: Suraj Singh <surajrider@gmail.com>

commit fc9858a
Author: Suraj Singh <surajrider@gmail.com>
Date: Wed Jun 24 15:00:53 2026 -0700

docs(metrics): document websocket round-trip metrics (#832)

Describe Time To Last Round Trip and Avg RTT in the metrics guide, noting they are websocket-only and that Avg RTT is an approximation.

Assisted-by: Claude Code
Signed-off-by: Suraj Singh <surajrider@gmail.com>

dreamer-89 marked this pull request as ready for review on June 24, 2026 at 22:32.
sjmonson added this to the v0.8.0 milestone on Jun 25, 2026.


Development

Successfully merging this pull request may close these issues.

Websocket backend metrics
