✨ Support for voxtral realtime stt #1277
Open
cameledev wants to merge 11 commits into
5d4ec36 to ec7eb87
ec7eb87 to de37643
cameledev commented May 28, 2026
Comment on lines +80 to +88
    const existingIndex = prevSegments.findIndex(
      (s: TranscriptionSegmentWithParticipant) => s.id === segment.id
    )
    if (existingIndex === -1) {
      return [...prevSegments, { participant, ...segment }]
    }
    const next = prevSegments.slice()
    next[existingIndex] = { ...next[existingIndex], ...segment }
    return next
Collaborator
Author
allow update of previously received segments
cameledev commented May 29, 2026
             "default_country": settings.ROOM_TELEPHONY_DEFAULT_COUNTRY,
         },
    -    "subtitle": {"enabled": settings.ROOM_SUBTITLE_ENABLED},
    +    "subtitle": {"enabled": True},  # settings.ROOM_SUBTITLE_ENABLED},
Collaborator
Author
REVERT BEFORE MERGE
Comment on lines 589 to 590
Collaborator
Author
REVERT BEFORE MERGE (2/2)
    send_t.cancel()
    try:
        await send_t
    except (asyncio.CancelledError, websockets.WebSocketException):
             authentication_classes=[LiveKitTokenAuthentication],
    -    )
    -    @FeatureFlag.require("subtitle")
    +    )  # @FeatureFlag.require("subtitle")
Collaborator
revert before merge
    build:
      context: ./src/agents
      target: development
    command: ["python", "multi_user_transcriber.py", "dev"]
Collaborator
It's not necessary, but I understand if you want to keep more control over the compose service
Purpose
Add support for Voxtral realtime STT (live transcription).
Why we can't use the mistralai plugin
The official `livekit-plugins-mistralai` plugin is hardwired to Mistral's hosted cloud API, in both the URL path and the wire protocol. We self-host Voxtral on vLLM, which speaks a different protocol on a different path, so the plugin can't be pointed at it. We hit four distinct issues, in the order they occurred:
1. Hardcoded WebSocket path
The Mistral SDK's `RealtimeTranscription._build_url` hardcodes the path `/v1/audio/transcriptions/realtime`. `server_url` only controls scheme + host; there is no kwarg to change the suffix, and the LiveKit plugin calls `connect()` without overriding it either. vLLM serves Voxtral realtime at `/v1/realtime`. The hardcoded route doesn't exist on vLLM, so every WebSocket connection attempt returned HTTP 500. A monkeypatch on `_build_url` (`/v1/audio/transcriptions/realtime` → `/v1/realtime`) fixed this issue.
2. Incompatible wire protocol (the real showstopper)
Even with the path fixed, the two sides speak different languages:

| | Mistral SDK | vLLM |
|---|---|---|
| Server → client events | `RealtimeTranscriptionSessionCreated`, `TranscriptionStreamTextDelta`, `TranscriptionStreamDone` | `session.created`, `transcription.delta`, `transcription.done` |
| Client → server audio | `send_audio` / `flush_audio` / `end_audio` | `input_audio_buffer.append` / `.commit` |

The Mistral SDK cannot parse vLLM's messages. Monkeypatching this would mean reimplementing the SDK's entire receive path.
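To make the mismatch concrete, here is a minimal sketch of a client that handles the vLLM-side event names listed above directly as JSON, without the Mistral SDK. Only the `type` strings come from this description. The payload field names (`audio`, `delta`, `text`) and the helper names are assumptions for illustration and would need checking against the actual vLLM realtime schema.

```python
import base64
import json
from typing import List, Optional


def append_message(pcm_chunk: bytes) -> str:
    """Build an `input_audio_buffer.append` message for one audio chunk."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        # ASSUMPTION: audio travels base64-encoded in an "audio" field.
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })


def commit_message() -> str:
    """Build an `input_audio_buffer.commit` message."""
    return json.dumps({"type": "input_audio_buffer.commit"})


class TranscriptAccumulator:
    """Folds server events into a running transcript (hypothetical helper)."""

    def __init__(self) -> None:
        self.session_ready = False
        self.parts: List[str] = []
        self.final: Optional[str] = None

    def handle(self, raw: str) -> None:
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "session.created":
            self.session_ready = True
        elif kind == "transcription.delta":
            # ASSUMPTION: incremental text arrives in a "delta" field.
            self.parts.append(event.get("delta", ""))
        elif kind == "transcription.done":
            # ASSUMPTION: final text arrives in "text"; otherwise join the deltas.
            self.final = event.get("text") or "".join(self.parts)
        # Any other event type is ignored in this sketch.
```

Dispatching on the `type` string keeps the receive path independent of the SDK's typed models, which is the part that cannot be patched cheaply.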
3. The handshake failures (how the mismatch actually surfaced)
`rt.connect()` does two sequential handshakes, and the plugin broke at both:
- First, `websockets.exceptions.InvalidStatus: server rejected WebSocket connection: HTTP 500` was thrown out of `ConnectionPool.prewarm`.
- Then, with the `_build_url` monkeypatch, the upgrade succeeded (`[accepted]` / `connection open` in vLLM logs), but `_recv_handshake` then blocked on `await websocket.recv()` waiting for Mistral's `session.created` event. vLLM never sends that event (its OpenAI-style handshake has a different shape), so the 10s LiveKit connect timeout fired → `CancelledError` → `TimeoutError` → `APIConnectionError: Connection error.`, retried, then the session closed as unrecoverable.
4. Dependency resolution failure (packaging)
Independently, `livekit-plugins-mistralai==1.5.4` pulls in `mistralai[realtime]>=2.0.0`. Under `uv`'s universal resolution across `requires-python = ">=3.12"` (which now includes 3.14), that extra is unresolvable → `requirements are unsatisfiable`. (It "worked before" only because pip resolved against the single Docker Python 3.13.)
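Separately from the packaging problem, the stall described under issue 3 can be reproduced in miniature with plain `asyncio`. This sketch is not the plugin's or the SDK's code: `recv_handshake`, `connect`, and the queue standing in for the WebSocket are hypothetical names. It only shows how a receive that never completes turns into `CancelledError` inside the waiting coroutine and `TimeoutError` at the caller once it is wrapped in `asyncio.wait_for`.

```python
import asyncio
from typing import List, Tuple

cancelled_receives: List[str] = []  # names of receives that were cancelled


async def recv_handshake(name: str, incoming: asyncio.Queue) -> dict:
    """Block until the peer sends its first event (stand-in for recv())."""
    try:
        return await incoming.get()
    except asyncio.CancelledError:
        # wait_for() cancels this coroutine once the deadline passes.
        cancelled_receives.append(name)
        raise


async def connect(name: str, incoming: asyncio.Queue, timeout: float) -> str:
    """Return the first event type, or "timeout" if none arrives in time."""
    try:
        event = await asyncio.wait_for(recv_handshake(name, incoming), timeout)
    except asyncio.TimeoutError:
        return "timeout"
    return event["type"]


async def demo() -> Tuple[str, str]:
    silent: asyncio.Queue = asyncio.Queue()  # peer never sends the expected event
    ready: asyncio.Queue = asyncio.Queue()
    ready.put_nowait({"type": "session.created"})
    return (
        await connect("silent", silent, timeout=0.05),
        await connect("ready", ready, timeout=0.05),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A caller that retries on the timeout, as described above, would hit the same stall every time, because the expected first event never arrives.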