Ultra-TTS

Ultra-TTS is an open-source local browser GUI and CLI workspace for running Japanese and multilingual text-to-speech models.

It is designed for developers, creators, educators, and accessibility-focused users who want to run TTS workflows locally instead of depending entirely on cloud APIs.

The project is still early, but it already provides a practical workspace for local TTS experimentation, backend comparison, long-form text splitting, and multi-speaker generation.

Why Ultra-TTS

Local TTS workflows are useful when users need more control over privacy, cost, latency, model choice, or offline experimentation than a cloud-only workflow can provide.

Ultra-TTS focuses on making local Japanese and multilingual TTS easier to try from one workspace.

The project combines a browser-based GUI, CLI entry points, backend setup notes, local model storage conventions, and lightweight tests. Because the lightweight tests need no model downloads, maintainers can keep improving the workflow without fetching models for every development task.

Demo

A Web UI screenshot is tracked as a maintainer task and will be added in a future release. See docs/demo.md.

Audio samples will be added only when model and voice licenses allow redistribution.

Features

Local browser-based TTS GUI served from web_app.py
CLI workspace for local TTS experiments and backend scripts
Japanese and multilingual TTS workflows
Single-text generation
Multi-speaker script generation with speaker labels
Long-form generation for articles, lessons, and pasted text
Long-form text splitting before backend calls
Manifest metadata next to generated long-form audio
Local model, cache, output, and log directory handling
Windows PowerShell and macOS/Linux shell launch scripts

Supported backends

Ultra-TTS integrates several local TTS workflows.

Backend availability depends on the models and dependencies installed in your local environment.

LM Studio / Orpheus: uses LM Studio's local OpenAI-compatible API for Orpheus-style speech token generation.
Chatterbox Multilingual: local multilingual TTS workflow used for Japanese and multilingual generation.
Kokoro worker: lightweight worker-based Kokoro workflow, primarily for English voices.
Piper: local-process backend using downloaded ONNX voice files in models/piper/.
Dia: experimental English dialogue backend for speaker-tagged dialogue.
MLX-Audio: Apple Silicon workflow for Chatterbox, Qwen3-TTS, Kokoro, and Dia MLX models.

Quick start

Clone the repository and enter the project directory:

git clone https://github.com/Firton/Ultra-TTS.git
cd Ultra-TTS

Create a Python environment appropriate for your platform and backend. For lightweight development and tests, no model downloads are required.

Run the browser UI on macOS/Linux:

./launch-web.sh

Run the browser UI on Windows:

powershell.exe -NoProfile -ExecutionPolicy Bypass -File .\launch-web.ps1

Or run the app directly:

python web_app.py --host 127.0.0.1 --port 8765 --open

Then open:

http://127.0.0.1:8765

For a desktop shortcut, create a shortcut that runs launch-web.ps1 on Windows or launch-web.sh on macOS/Linux.

Run lightweight tests:

python -m unittest discover -s tests

Backend setup notes

LM Studio / Orpheus

Install LM Studio.
Download an Orpheus GGUF model, for example orpheus-3b-0.1-ft-q4_k_m.gguf.
Load the model in LM Studio.
Start the local server in LM Studio at http://127.0.0.1:1234.
Select the Orpheus backend in the Ultra-TTS web UI.
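
Before selecting the backend, it can help to confirm that the LM Studio server is actually listening. The sketch below is not part of Ultra-TTS: the helper names are hypothetical, and it assumes the server from the steps above exposes the OpenAI-compatible /v1/models route on http://127.0.0.1:1234.

```python
import json
import urllib.error
import urllib.request

# Address taken from the setup steps above; adjust if your server uses another port.
LM_STUDIO_URL = "http://127.0.0.1:1234"


def build_models_request(base_url: str = LM_STUDIO_URL) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible model list (hypothetical helper)."""
    return urllib.request.Request(base_url.rstrip("/") + "/v1/models", method="GET")


def listed_model_ids(base_url: str = LM_STUDIO_URL, timeout: float = 3.0) -> list[str]:
    """Return the model IDs the server reports, or [] if it cannot be reached."""
    try:
        with urllib.request.urlopen(build_models_request(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return []
    return [item["id"] for item in payload.get("data", [])]
```

Calling listed_model_ids() returns an empty list when the server is not running, which is usually enough to tell a setup problem apart from a generation problem.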

CLI example:

python gguf_orpheus.py --text "Hello, this is a test" --voice tara

Orpheus voices:

tara, leah, jess, leo, dan, mia, zac, zoe

Chatterbox Multilingual

Chatterbox is used for local multilingual generation, including Japanese workflows. It requires its Python dependencies and model files to be available locally.

Kokoro

Kokoro runs through a separate lightweight worker environment.

In this repository, Kokoro-specific dependencies are expected to live outside the main application environment when needed.

Piper

Piper is a lightweight local-process backend. Download voice files into models/piper/:

python scripts/download_models.py --piper-basic

Dia

Dia is dialogue-focused and expects English speaker-tagged dialogue. It runs in a separate worker process because the model is heavy and backend failures should not take down the web app.
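
The isolation idea can be illustrated with a generic sketch. This is not the actual dia_worker.py interface, which is not documented here; run_worker is a hypothetical helper that shows how a parent process can survive a crashing or hanging child.

```python
import subprocess
import sys


def run_worker(args: list[str], timeout: float = 600.0) -> tuple[bool, str]:
    """Run a worker command in a child process.

    Returns (True, stdout) on success, or (False, reason) when the worker
    fails, times out, or cannot be started. The parent never raises.
    """
    try:
        proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False, "worker timed out"
    except OSError as exc:
        # For example, the executable does not exist.
        return False, str(exc)
    if proc.returncode != 0:
        return False, proc.stderr.strip() or f"worker exited with code {proc.returncode}"
    return True, proc.stdout.strip()
```

For example, run_worker([sys.executable, "-c", "import sys; sys.exit(3)"]) returns a failure tuple instead of raising, so a web request handler could report the error and keep serving.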

Download the Dia files only if you intend to test Dia:

python scripts/download_models.py --current-hf dia dia-dac

MLX-Audio

MLX-Audio is recommended for Apple Silicon environments. It uses a separate .venv-mlx so its MLX and transformer dependencies do not disturb the PyTorch, Piper, or LM Studio backends.

python -m venv .venv-mlx
.venv-mlx/bin/python -m pip install -r requirements-mlx.txt
.venv-mlx/bin/python -m unidic download
python scripts/download_models.py --mlx-basic

The configured MLX model IDs are:

mlx-chatterbox
mlx-qwen3-tts
mlx-qwen3-custom
mlx-qwen3-voice-design
mlx-kokoro
mlx-dia

Long-form generation

Use the long-form tab for articles, lessons, and pasted long-form text.

Ultra-TTS splits text before calling a backend because local engines have different practical limits. Current limits include:

Orpheus: 600 characters per segment
Chatterbox: backend-defined limit
Kokoro: backend-defined limit
Piper: backend-defined limit
MLX models: model-defined limit

The generated WAV is written under outputs/web/. A sibling *.manifest.json records the backend, voice, language, segment boundaries, and text used for each generated segment.
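
As a rough illustration of splitting and manifest metadata, here is a minimal sketch. It is not the code in text_pipeline.py: the function names, the sentence-boundary rule, and the manifest field names are assumptions, and only the 600-character Orpheus limit comes from the list above.

```python
import re

# One "piece" is a run of text ending at sentence punctuation or end of input.
# ASCII punctuation must be followed by whitespace (so "3.14" stays intact);
# Japanese punctuation may be followed directly by the next sentence.
_SENTENCE = re.compile(r".+?(?:[.!?](?=\s|$)\s*|[。！？]\s*|$)", re.S)


def split_text(text: str, max_chars: int = 600) -> list[str]:
    """Split text into segments of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    segments: list[str] = []
    current = ""
    for piece in _SENTENCE.findall(text.strip()):
        # Flush the running segment if adding this sentence would overflow it.
        if current and len((current + piece).rstrip()) > max_chars:
            segments.append(current.strip())
            current = ""
        # A single sentence longer than the limit is hard-wrapped.
        while len(piece.rstrip()) > max_chars:
            segments.append(piece[:max_chars].strip())
            piece = piece[max_chars:]
        current += piece
    if current.strip():
        segments.append(current.strip())
    return segments


def build_manifest(backend: str, voice: str, language: str, segments: list[str]) -> dict:
    """Assemble manifest metadata for one long-form run (field names are assumed)."""
    return {
        "backend": backend,
        "voice": voice,
        "language": language,
        "segments": [
            {"index": i, "chars": len(s), "text": s} for i, s in enumerate(segments)
        ],
    }
```

A caller could serialize the result with json.dumps(manifest, ensure_ascii=False, indent=2) and write it next to the WAV file, so Japanese text stays readable in the manifest.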

Local files and ignored artifacts

Ultra-TTS keeps local artifacts inside the project when possible:

models/piper/ for Piper ONNX voice files
models/huggingface/ for direct Hugging Face repo snapshots downloaded by scripts/download_models.py
.cache/huggingface/ and .cache/torch/ for library-managed caches
outputs/ for generated audio
logs/ for local runtime logs
.venv/, .venv-kokoro/, and .venv-mlx/ for local Python environments

The app sets HF_HOME, HF_HUB_CACHE, HF_ASSETS_CACHE, HF_XET_CACHE, TRANSFORMERS_CACHE, TORCH_HOME, and XDG_CACHE_HOME at startup so Chatterbox, Kokoro, Dia, SNAC, and MLX-Audio cache under this repository by default.
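
A minimal sketch of how such startup configuration might look follows. It is not the code in local_paths.py: the function names and the subdirectory layout under .cache/ are assumptions. Only the variable names and the .cache/huggingface/ and .cache/torch/ locations come from this README.

```python
import os
from pathlib import Path


def cache_environment(repo_root: Path) -> dict[str, str]:
    """Map cache-related environment variables to paths inside the repository."""
    cache_root = repo_root / ".cache"
    hf_home = cache_root / "huggingface"
    return {
        "HF_HOME": str(hf_home),
        # The hub/assets/xet subdirectory names are assumed for illustration.
        "HF_HUB_CACHE": str(hf_home / "hub"),
        "HF_ASSETS_CACHE": str(hf_home / "assets"),
        "HF_XET_CACHE": str(hf_home / "xet"),
        "TRANSFORMERS_CACHE": str(hf_home / "hub"),
        "TORCH_HOME": str(cache_root / "torch"),
        "XDG_CACHE_HOME": str(cache_root),
    }


def apply_cache_environment(repo_root: Path, override: bool = False) -> None:
    """Set the variables, keeping any value the user already exported."""
    for name, value in cache_environment(repo_root).items():
        if override or name not in os.environ:
            os.environ[name] = value
```

Variables like these generally need to be set before the model libraries are imported, because several of them read the environment at import time. Leaving existing values untouched matches the "by default" behavior described above.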

Generated audio, model files, logs, virtual environments, local caches, and large model artifacts are intentionally ignored by Git.

Model licenses

Ultra-TTS itself is licensed under Apache-2.0.

This repository does not grant additional rights to third-party TTS models, model weights, voice files, datasets, or generated voices.

Users are responsible for checking and complying with the license terms of each model and backend they download or use, including LM Studio models, Hugging Face models, Piper voices, Kokoro, Chatterbox, Dia, and MLX-Audio models.

For a backend-by-backend responsibility summary, see THIRD_PARTY_MODELS.md.

For project status, maintainer scope, and current public evidence, see docs/maintainer-notes.md.

Security and privacy

Ultra-TTS is designed to run local TTS workflows. Generated audio, logs, downloaded models, caches, and virtual environments are intentionally excluded from Git.

Do not commit API keys, private model files, generated audio containing personal data, or unreleased vulnerability details.

Local file paths used for reference audio should be treated as private environment details unless they are intentionally shared.

Contributing

Contributions are welcome. See CONTRIBUTING.md for setup expectations, pull request guidance, and suggested first contributions.

Before opening a pull request, run:

python -m unittest discover -s tests

Roadmap

See ROADMAP.md for planned improvements around documentation, model setup clarity, packaging, CI, and regression testing.

License

Apache-2.0. See LICENSE.
