Add WebSocket round-trip metrics (Time To Last Round Trip and Avg RTT) #850
Open
dreamer-89 wants to merge 5 commits into
…ject#832) Add RED unit tests for Time To Last Round Trip and Avg RTT on the openai_websocket backend (send/receive timestamp recording, computed properties, None-gating, and metric compile). They fail until the implementation lands. Assisted-by: Claude Code Signed-off-by: Suraj Singh <surajrider@gmail.com>
) Add scalar round-trip timing fields to RequestTimings and record per-frame sent timestamps and received content-token timestamps in the openai_websocket backend. These feed the Time To Last Round Trip and Avg RTT metrics; HTTP backends leave them unset. Assisted-by: Claude Code Signed-off-by: Suraj Singh <surajrider@gmail.com>
…llm-project#832) Add time_to_last_round_trip_ms and avg_round_trip_time_ms as computed request stats (None unless send timings exist), aggregate them in GenerativeMetrics.compile(), and accumulate them for live progress. Avg RTT is an approximation of the mean send-to-receive lag. Assisted-by: Claude Code Signed-off-by: Suraj Singh <surajrider@gmail.com>
…vllm-project#832) Surface Time To Last Round Trip and Avg RTT in the request latency console table and the CSV export. JSON/YAML already include them via the schema; the HTML report is left for a follow-up. Assisted-by: Claude Code Signed-off-by: Suraj Singh <surajrider@gmail.com>
Describe Time To Last Round Trip and Avg RTT in the metrics guide, noting they are websocket-only and that Avg RTT is an approximation. Assisted-by: Claude Code Signed-off-by: Suraj Singh <surajrider@gmail.com>
Summary
Adds two round-trip metrics to the realtime WebSocket backend (openai_websocket): Time To Last Round Trip and an approximate Avg RTT. In streaming transcription, input packets and output tokens flow as separate streams, so the existing token metrics (TTFT, ITL, TPOT) do not capture the send-to-receive lag a user actually experiences. These two metrics estimate that lag from the send and receive timestamps and are reported automatically whenever the WebSocket backend is used. Per the issue discussion, Time To First Round Trip is omitted because it overlaps the existing TTFT, and the names avoid TTFT/TTLT to prevent collision with the token metrics.
Details
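As a rough illustration of how the two metrics in the summary could be derived from scalar send and receive timestamps, here is a minimal sketch. It is not the PR's code: TimingsSketch, record_sent, record_received, and the last_token_received field are hypothetical names chosen for this example, and timestamps are assumed to be in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimingsSketch:
    """Hypothetical stand-in for the scalar timing fields (not the real RequestTimings)."""

    last_request_sent: Optional[float] = None
    request_sent_sum: float = 0.0
    request_sent_count: int = 0
    token_received_sum: float = 0.0
    token_received_count: int = 0
    # Assumed name: timestamp of the last received content token.
    last_token_received: Optional[float] = None

    def record_sent(self, ts: float) -> None:
        # Keep running scalars instead of a list of timestamps.
        self.last_request_sent = ts
        self.request_sent_sum += ts
        self.request_sent_count += 1

    def record_received(self, ts: float) -> None:
        self.last_token_received = ts
        self.token_received_sum += ts
        self.token_received_count += 1


def time_to_last_round_trip_ms(t: TimingsSketch) -> Optional[float]:
    """Last received token minus last sent packet, in milliseconds."""
    if t.last_request_sent is None or t.last_token_received is None:
        return None
    return (t.last_token_received - t.last_request_sent) * 1000.0


def avg_round_trip_time_ms(t: TimingsSketch) -> Optional[float]:
    """Mean receive time minus mean send time, in milliseconds (an approximation)."""
    if t.request_sent_count == 0 or t.token_received_count == 0:
        return None
    mean_sent = t.request_sent_sum / t.request_sent_count
    mean_received = t.token_received_sum / t.token_received_count
    return (mean_received - mean_sent) * 1000.0
```

Both functions return None when nothing was recorded, which mirrors the None-gating described for non-WebSocket backends. Keeping sums and counts rather than lists means a shallow copy of the object cannot end up sharing a mutable list with the original.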
- schemas/info.py: add scalar timing fields to RequestTimings (last_request_sent, request_sent_sum/request_sent_count, token_received_sum/token_received_count). Scalars rather than lists, so they stay correct under RequestInfo.model_copy (which shallow-copies timings).
- backends/openai/websocket.py: record a timestamp after each outbound frame (session.update, audio appends, both commits) via a small _record_request_sent helper, and accumulate received content-token timestamps in _record_content_tokens.
- schemas/request_stats.py: add computed properties time_to_last_round_trip_ms (last received token minus last sent packet) and avg_round_trip_time_ms (mean received minus mean sent). Both return None when no sends were recorded, so non-WebSocket backends are unaffected.
- benchmark/schemas/metrics.py and accumulator.py: add the two StatusDistributionSummary fields, compile them in GenerativeMetrics.compile(), and accumulate them for live progress.
- benchmark/outputs/console.py and csv.py: show both in the request-latency table and the CSV export. JSON/YAML already include them via the schema. The HTML report is left for a follow-up.
- docs/guides/metrics.md: document both metrics, noting they are WebSocket-only and that Avg RTT is approximate (it assumes sent packets and received tokens line up evenly in time).
Test Plan
- tox -e test-unit passes, including new tests:
  - tests/unit/backends/openai/test_realtime_ws.py: the backend records send/receive timestamps (counts, sums, last-sent) over an in-process WebSocket stub.
  - tests/unit/schemas/test_request_stats.py: the two properties compute correctly and return None without send timings.
  - tests/unit/benchmark/schemas/test_metrics.py: the metrics expose schema fields and compile into distributions.
- tox -e lint-check and tox -e type-check pass.
- Ran OpenAIWebSocketBackend.resolve() against a fake realtime server with ~100 ms simulated per-token lag. Time To Last Round Trip came out around 300 ms and Avg RTT around 200 ms (matching the lag), and both were None for a non-WebSocket request. For a live run, point a benchmark at a vLLM realtime server (e.g. Voxtral) and confirm both metrics appear in the console latency table and benchmarks.json.
Related Issues
Use of AI
git log
commit 814859c
Author: Suraj Singh surajrider@gmail.com
Date: Wed Jun 24 15:00:52 2026 -0700
commit c5e2599
Author: Suraj Singh surajrider@gmail.com
Date: Wed Jun 24 15:00:52 2026 -0700
commit e85cc9b
Author: Suraj Singh surajrider@gmail.com
Date: Wed Jun 24 15:00:52 2026 -0700
commit 2bb4b29
Author: Suraj Singh surajrider@gmail.com
Date: Wed Jun 24 15:00:52 2026 -0700
commit fc9858a
Author: Suraj Singh surajrider@gmail.com
Date: Wed Jun 24 15:00:53 2026 -0700
Assisted-by: Claude Code
Signed-off-by: Suraj Singh surajrider@gmail.com