Bug: Twilio + OpenAI Realtime g711_ulaw produces garbled/static audio on all models #154

@JasonB-IBCP

Problem

When using Patter with Twilio + OpenAIRealtime() engine, callers hear persistent static/audio artifacts when the agent speaks. This also breaks STT — the model transcribes noise as random text (e.g., URLs) because the audio is corrupted.

Root Cause

Twilio Media Streams deliver g711_ulaw @ 8 kHz audio to Patter. For OpenAI Realtime, Patter requests output_audio_format: g711_ulaw in the session config. However, OpenAI Realtime models (both gpt-realtime-mini and gpt-realtime) ignore this request and always return PCM16 @ 24 kHz.

Patter 0.6.3 assumes the output format matches what was requested (sets _input_is_mulaw = True for openai_realtime provider), so it passes the PCM16 bytes directly to Twilio as mulaw.

Result: Twilio receives 24 kHz PCM16 bytes interpreted as 8 kHz mulaw → garbled/static audio + broken STT.
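
To make the size of the mismatch concrete, here is a small arithmetic sketch. It only uses the two formats named above (PCM16 @ 24 kHz, mulaw @ 8 kHz); the function name `misread_duration_ms` is made up for illustration and is not part of Patter.

```python
# Byte-rate arithmetic for the two formats described in this issue.
PCM16_RATE_HZ = 24_000
PCM16_BYTES_PER_SAMPLE = 2   # 16-bit linear PCM
MULAW_RATE_HZ = 8_000
MULAW_BYTES_PER_SAMPLE = 1   # 8-bit G.711 mu-law


def misread_duration_ms(real_ms: int) -> float:
    """Playback length when real_ms of 24 kHz PCM16 is treated as 8 kHz mulaw."""
    n_bytes = PCM16_RATE_HZ * PCM16_BYTES_PER_SAMPLE * real_ms // 1000
    return n_bytes * 1000 / (MULAW_RATE_HZ * MULAW_BYTES_PER_SAMPLE)


# 20 ms of PCM16 @ 24 kHz is 960 bytes, which reads as 120 ms of mulaw @ 8 kHz.
print(misread_duration_ms(20))  # 120.0
```

So every byte is decoded with the wrong codec, and the stream carries 6x more bytes per second than the 8 kHz mulaw format would, which is consistent with hearing static rather than just distorted speech.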

Evidence from call transcript

Caller said static — model transcribed noise as:

Requested fix

Patter should either:

  1. Detect the actual output format returned by OpenAI and transcode accordingly
  2. Default to pcm16 for Twilio and handle the full 24 kHz → 8 kHz transcoding chain (like OpenAIRealtime2 does)
  3. At minimum, document this limitation
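
For option 2, a rough sketch of what the transcoding step could look like, assuming the model output is little-endian PCM16 mono @ 24 kHz. This is not Patter's or OpenAIRealtime2's actual code; `linear_to_mulaw` and `pcm16_24k_to_mulaw_8k` are hypothetical names. It is pure Python (no `audioop`, which was removed in Python 3.13), and it averages each group of 3 samples as a crude low-pass before decimating. A real implementation would probably want a proper resampling filter.

```python
import struct

BIAS = 0x84    # standard G.711 mu-law bias (132)
CLIP = 32635   # largest magnitude that still fits after adding the bias


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # magnitude is in [132, 32767], so bit_length() is 8..15 -> exponent 0..7
    exponent = magnitude.bit_length() - 8
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_24k_to_mulaw_8k(pcm: bytes) -> bytes:
    """Convert little-endian PCM16 @ 24 kHz to mu-law @ 8 kHz (3:1 decimation)."""
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    out = bytearray()
    # Average each full group of 3 samples, then encode the result.
    for i in range(0, n - n % 3, 3):
        avg = (samples[i] + samples[i + 1] + samples[i + 2]) // 3
        out.append(linear_to_mulaw(avg))
    return bytes(out)


# 20 ms of silence: 960 bytes of PCM16 in, 160 mu-law bytes out.
frame = pcm16_24k_to_mulaw_8k(b"\x00" * 960)
print(len(frame))  # 160
```

Any trailing samples that do not fill a group of 3 are dropped here; a streaming version would need to carry them over to the next chunk.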

Environment

  • getpatter: 0.6.3
  • Model: gpt-realtime-mini (default)
  • Carrier: Twilio
  • Both phones affected: Zoom phone + physical cell phone
