qw3n-face

A local web UI for the Qwen3-TTS model family, built with NiceGUI.

Features

Tab	Description
Custom Voice	Generate speech using one of 9 built-in speaker personas with optional style instructions
Voice Design	Describe a voice in plain text and synthesise speech with it
Voice Clone	Upload a short reference clip and clone that voice onto new text
Batch	Queue multiple Custom Voice items and generate them sequentially with per-item progress
Personas	Save and manage named voice presets (speaker + language + instruction) for quick reuse
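
The Personas row describes a preset as a named combination of speaker, language, and instruction. The following is a minimal sketch of how such a preset could be represented and round-tripped through JSON. The `Persona` class, its field names, and the storage format are hypothetical and are not taken from the app's code.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical preset holding the three fields named in the Personas row."""
    name: str
    speaker: str
    language: str
    instruction: str = ""


def dump_personas(personas: list[Persona]) -> str:
    """Serialise presets to a JSON string, keyed by persona name."""
    return json.dumps({p.name: asdict(p) for p in personas}, indent=2)


def load_personas(text: str) -> list[Persona]:
    """Rebuild presets from the JSON produced by dump_personas."""
    return [Persona(**fields) for fields in json.loads(text).values()]
```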

Models are loaded on demand and can be unloaded individually to free memory. Custom Voice and Voice Clone support both 0.6B and 1.7B checkpoints; Voice Design currently uses the 1.7B checkpoint only.
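
As an illustration of loading on demand and unloading individually, here is a minimal sketch of a lazy model cache. `ModelRegistry` and the `loader` callable are assumptions made for this example; the app's real loading code may be organised differently.

```python
from typing import Any, Callable


class ModelRegistry:
    """Hypothetical lazy cache: load a model on first use, drop it on request."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader  # called with a model key the first time it is needed
        self._loaded: dict[str, Any] = {}

    def get(self, key: str) -> Any:
        """Return the model for key, loading it only if it is not cached yet."""
        if key not in self._loaded:
            self._loaded[key] = self._loader(key)
        return self._loaded[key]

    def unload(self, key: str) -> bool:
        """Drop one model so its memory can be reclaimed; True if it was loaded."""
        if key in self._loaded:
            del self._loaded[key]
            return True
        return False

    def loaded_keys(self) -> list[str]:
        """List the keys of the models that are currently held in memory."""
        return sorted(self._loaded)
```

Keeping one entry per key means unloading one tab's model leaves the others untouched.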

Additional runtime behavior:

Choose the model size before loading when multiple checkpoints are available
Choose the backend device before loading (cuda:0, mps, or cpu, depending on your machine)
Loaded tabs show the active runtime as device / dtype
On Apple Silicon, the app retries once in safer MPS float32 mode if generation fails with a probability-tensor stability error
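
The single retry on Apple Silicon can be sketched as follows. This is an assumption-laden illustration, not the app's code: `generate_with_mps_retry` and the `generate` callable are hypothetical, dtypes are passed as plain strings, and the stability error is assumed to surface as a `RuntimeError` whose message mentions "probability tensor".

```python
from typing import Callable


def generate_with_mps_retry(
    generate: Callable[[str], bytes],
    device: str,
    dtype: str = "float16",
) -> tuple[bytes, str]:
    """Run generate(dtype); on MPS, retry once in float32 after a stability error.

    Returns the generated audio together with the dtype that produced it.
    """
    try:
        return generate(dtype), dtype
    except RuntimeError as err:
        # Assumption: the stability failure is identifiable from its message.
        is_stability_error = "probability tensor" in str(err)
        if device != "mps" or dtype == "float32" or not is_stability_error:
            raise
        # Exactly one retry; a second failure propagates to the caller.
        return generate("float32"), "float32"
```

Any other error, or the same error on a non-MPS device, is re-raised unchanged.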

Requirements

Python 3.11+
uv
A machine with MPS (Apple Silicon), CUDA, or enough RAM for CPU inference

Installation

git clone https://github.com/AlapinEnjoyer/qw3n-face.git
cd qw3n-face
uv sync

Running

uv run python main.py

The app should then open automatically at http://localhost:8080.

Models

Models are downloaded automatically from Hugging Face the first time they are requested in the app:

Key	Checkpoint
Custom Voice	`Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice` or `Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice`
Voice Design	`Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign`
Voice Clone	`Qwen/Qwen3-TTS-12Hz-1.7B-Base` or `Qwen/Qwen3-TTS-12Hz-0.6B-Base`

Checkpoint sizes vary by model variant; the 0.6B models are substantially smaller than the 1.7B models. Hugging Face caches downloads locally after the first load.
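
The checkpoint table can be expressed as a small lookup, shown here as a sketch. The repository IDs are the ones listed in the table; the `CHECKPOINTS` mapping and the helper names are hypothetical and only illustrate how a tab and a model size resolve to a repository ID.

```python
# (tab, size) -> Hugging Face repository ID, copied from the table above.
CHECKPOINTS = {
    ("Custom Voice", "1.7B"): "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    ("Custom Voice", "0.6B"): "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice",
    ("Voice Design", "1.7B"): "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    ("Voice Clone", "1.7B"): "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    ("Voice Clone", "0.6B"): "Qwen/Qwen3-TTS-12Hz-0.6B-Base",
}


def available_sizes(tab: str) -> list[str]:
    """Return the model sizes offered for a tab, smallest first."""
    return sorted(size for (name, size) in CHECKPOINTS if name == tab)


def checkpoint_for(tab: str, size: str) -> str:
    """Resolve a tab and size to a repository ID, rejecting unsupported pairs."""
    try:
        return CHECKPOINTS[(tab, size)]
    except KeyError:
        raise ValueError(f"no {size} checkpoint for {tab!r}") from None
```

For example, `available_sizes("Voice Design")` contains only `"1.7B"`, which matches the note that Voice Design uses the 1.7B checkpoint only.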

Runtime Notes

CUDA uses bfloat16
MPS prefers float16, but the app can retry a failing model in float32 for stability
CPU prefers bfloat16 and falls back to float32 if needed during model load
If Apple Silicon generation still fails on MPS, switch the backend device to cpu before loading the model
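
The dtype preferences above amount to an ordered list of candidates per backend. The sketch below encodes that order and shows a load-time fallback loop. Dtypes are plain strings, and `DTYPE_ORDER`, `dtype_candidates`, `load_with_fallback`, and the `load` callable are hypothetical. The notes only describe a load-time fallback for CPU, so running the same loop for other devices is an assumption of this example.

```python
from typing import Any, Callable

# Preferred dtype first, fallback (if any) second, per backend.
DTYPE_ORDER = {
    "cuda": ("bfloat16",),
    "mps": ("float16", "float32"),
    "cpu": ("bfloat16", "float32"),
}


def dtype_candidates(device: str) -> tuple[str, ...]:
    """Map a device string such as 'cuda:0' to its ordered dtype candidates."""
    backend = device.split(":", 1)[0]
    if backend not in DTYPE_ORDER:
        raise ValueError(f"unknown device {device!r}")
    return DTYPE_ORDER[backend]


def load_with_fallback(load: Callable[[str], Any], device: str) -> tuple[Any, str]:
    """Try each candidate dtype in order and return (model, dtype used)."""
    errors = []
    for dtype in dtype_candidates(device):
        try:
            return load(dtype), dtype
        except RuntimeError as err:
            errors.append(f"{dtype}: {err}")
    raise RuntimeError("all dtypes failed: " + "; ".join(errors))
```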

TODOs

Add automatic transcription of uploaded audio
Add audio visualisation (waveform, spectrogram?)

Roadmap

Add support for fine-tuning
