Add Chatterbox text-to-speech backend by Geramy · Pull Request #2256 · lemonade-sdk/lemonade

Geramy · 2026-06-15T20:39:49Z

Integrates Resemble AI's Chatterbox as a text-to-speech backend, served through the existing OpenAI-compatible /v1/audio/speech endpoint.

Highlights

Multi-device: CUDA, ROCm, Metal, and CPU — auto-selects GPU when available, falls back to CPU.
Byte-level streaming: raw PCM16 @ 24 kHz via Chatterbox's generate_stream when available, with a single-chunk fallback otherwise.
Three variants: English, Multilingual, and Turbo (registered in server_models.json).
Selective downloads: only fetches the weight files each variant actually loads (~3 GB) instead of the full ~14 GB repo.
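
The streaming highlight above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it assumes the model object returns plain sequences of floats in [-1.0, 1.0] (the real library returns tensors), and the exact signatures of `generate` and `generate_stream` are assumptions. `float_to_pcm16` and `pcm16_stream` are hypothetical helper names.

```python
import struct
from typing import Iterable, Iterator

SAMPLE_RATE = 24_000  # Hz, as stated in the PR description


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian signed 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_stream(model, text: str) -> Iterator[bytes]:
    """Yield raw PCM16 chunks, streaming only when the model supports it."""
    stream_fn = getattr(model, "generate_stream", None)  # assumed method name
    if callable(stream_fn):
        for chunk in stream_fn(text):
            yield float_to_pcm16(chunk)
    else:
        # Fallback: synthesize the whole utterance, then emit it as one chunk.
        yield float_to_pcm16(model.generate(text))
```

Because the fallback yields exactly one chunk, the HTTP layer can treat both cases as the same iterator of bytes.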

Pieces

tools/chatterbox-server/main.py — thin OpenAI-compatible HTTP wrapper around chatterbox-tts.
ChatterboxServer C++ backend (WrappedServer + ITextToSpeechServer) with per-device install params.
Recipe wiring: router, backend_utils, system_info (preference order), model_types, recipe_options, runtime_config, model_manager.
backend_versions.json pins + docs page + minor UI display name/order.

Self-contained bundles are built by the companion lemonade-sdk/chatterbox-rocm distribution repo (tracks chatterbox-tts PyPI releases).

Integrate Resemble AI's Chatterbox TTS as a new backend supporting CUDA, ROCm, Metal, and CPU, defaulting to GPU acceleration with CPU fallback. Exposes the existing OpenAI-compatible /v1/audio/speech endpoint with byte-level PCM streaming. Registers English, Multilingual, and Turbo variants, with variant-aware selective downloads to avoid pulling the full multi-gigabyte repo. Bundles are built by the lemonade-sdk/chatterbox-rocm distribution repo.
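
One way to picture the variant-aware selective download is a per-variant pattern list applied to the repo's file listing. The sketch below is only an illustration under stated assumptions: the file names and patterns are invented placeholders, and `VARIANT_PATTERNS` and `select_files` are hypothetical names, not the ones used in the PR.

```python
from fnmatch import fnmatch

# Hypothetical patterns. The real per-variant file lists are defined by the PR.
SHARED_PATTERNS = ["shared_*"]
VARIANT_PATTERNS = {
    "english": ["weights_en.*"],
    "multilingual": ["weights_multilingual.*"],
    "turbo": ["weights_turbo.*"],
}


def select_files(repo_files: list[str], variant: str) -> list[str]:
    """Return only the files a given variant needs, preserving listing order."""
    if variant not in VARIANT_PATTERNS:
        raise KeyError(f"unknown variant: {variant!r}")
    patterns = SHARED_PATTERNS + VARIANT_PATTERNS[variant]
    return [f for f in repo_files if any(fnmatch(f, p) for p in patterns)]
```

If the weights were fetched with `huggingface_hub`, the same patterns could be passed as `allow_patterns` to `snapshot_download`; whether the PR does this is not stated in the description.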

jeremyfowers · 2026-06-15T20:58:06Z

Scheduled for next release - please do not merge before 10.8 releases.

+
+    prompt = body.get("audio_prompt_path")
+    voice = body.get("voice")
+    if not prompt and isinstance(voice, str) and os.path.isfile(voice):


GitHub release assets are capped at 2 GiB; frozen torch+CUDA/ROCm bundles exceed that. Enable supports_split_archive and switch the Windows asset to .tar.gz (extracted via native tar) so the split-archive installer path serves all platforms.
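
To illustrate why a split archive is needed, the sketch below plans byte ranges for parts that each stay under the asset cap. It is an assumption-laden example: `plan_parts` and the default part size are hypothetical, and it does not describe how the `supports_split_archive` installer path names or reassembles its parts.

```python
GITHUB_ASSET_CAP = 2 * 1024**3       # 2 GiB cap mentioned in the commit message
DEFAULT_PART_SIZE = 1900 * 1024**2   # hypothetical part size, kept below the cap


def plan_parts(total_size: int, part_size: int = DEFAULT_PART_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover total_size bytes in order."""
    if total_size < 0:
        raise ValueError("total_size must be non-negative")
    if not 0 < part_size < GITHUB_ASSET_CAP:
        raise ValueError("part_size must be positive and below the asset cap")
    return [
        (offset, min(part_size, total_size - offset))
        for offset in range(0, total_size, part_size)
    ]
```

Concatenating the parts in offset order reproduces the original archive, which can then be extracted with tar.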

fl0rianr

Thanks for the integration — the overall shape looks good, and the latest push addressing split archives is useful. I still think this should not merge yet.

Main blockers:

CodeQL is flagging an uncontrolled path expression in tools/chatterbox-server/main.py, and I think this is a real issue. The HTTP request can provide audio_prompt_path, and voice is also interpreted as a local host path when os.path.isfile(voice) succeeds. Please avoid accepting arbitrary host paths from API requests. Prefer registered voice IDs, uploaded/temp files, or a strict allowlisted directory with canonical-path validation.
The advertised platform matrix does not match the generated asset names. system_info.cpp marks Chatterbox CPU as supported on Windows/Linux/macOS for both x86_64 and arm64, but get_install_params() always emits windows-x64, linux-x64, and macos-arm64. Please either constrain the support matrix or generate arch-correct asset names.
Chatterbox-Multilingual needs explicit handling for missing language_id. Upstream requires it, while the wrapper only forwards it when present. This should return a clean 400 or use a documented default language.
Please reconsider the production from_local() -> from_pretrained() fallback. It can hide selective-download bugs and unexpectedly perform network downloads during model load. Lemonade should fail fast if the pulled snapshot is incomplete.
Please also make sure the backend includes the new watchdog logic, as in #2252.
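
Two of the blockers above (host paths taken from the request, and the missing language_id check) could be handled along these lines. This is a sketch under assumptions, not the fix adopted in the PR: `BadRequest`, `resolve_voice`, `require_language`, the `.wav` suffix, and the `"multilingual"` variant key are all hypothetical, and the wrapper is assumed to map `BadRequest` to an HTTP 400.

```python
from pathlib import Path
from typing import Optional


class BadRequest(ValueError):
    """Raised for client errors; assumed to map to HTTP 400 in the wrapper."""


def resolve_voice(voice_id: str, voices_dir: Path) -> Path:
    """Map a voice ID to a file inside voices_dir, rejecting anything outside it."""
    try:
        root = voices_dir.resolve()
        # resolve() collapses ".." and symlinks before the containment check.
        candidate = (root / f"{voice_id}.wav").resolve()
    except (ValueError, OSError) as exc:
        raise BadRequest("invalid voice") from exc
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise BadRequest("voice must name a registered voice")
    if not candidate.is_file():
        raise BadRequest(f"unknown voice: {voice_id!r}")
    return candidate


def require_language(variant: str, body: dict) -> Optional[str]:
    """Return language_id, failing cleanly when the multilingual variant lacks one."""
    language_id = body.get("language_id")
    if variant == "multilingual" and not language_id:
        raise BadRequest("language_id is required for the multilingual variant")
    return language_id
```

Resolving the path before the containment check is the important design choice: checking the raw string would let `..` segments or an absolute path escape the allowlisted directory.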

Geramy requested review from fl0rianr and jeremyfowers June 15, 2026 20:40

jeremyfowers added this to the Lemonade v10.9 milestone Jun 15, 2026

github-actions bot added the engine::kokoro (Kokoro TTS backend), audio, and enhancement (New feature or request) labels Jun 15, 2026

github-advanced-security AI found potential problems Jun 15, 2026

Comment thread tools/chatterbox-server/main.py

fl0rianr requested changes Jun 15, 2026

Geramy wants to merge 2 commits into main from geramy/chatterbox-implementation

4 participants
