AlapinEnjoyer/qw3n-face

qw3n-face

A local web UI for the Qwen3-TTS model family, built with NiceGUI.

Features

| Tab | Description |
| --- | --- |
| Custom Voice | Generate speech using one of 9 built-in speaker personas with optional style instructions |
| Voice Design | Describe a voice in plain text and synthesise speech with it |
| Voice Clone | Upload a short reference clip and clone that voice onto new text |
| Batch | Queue multiple Custom Voice items and generate them sequentially with per-item progress |
| Personas | Save and manage named voice presets (speaker + language + instruction) for quick reuse |
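
The Personas and Batch tabs can be pictured with a small data model. The sketch below is an illustration only, not the app's actual code: the `Persona` class, the JSON file layout, and the `synthesise` callback are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable, Iterator


@dataclass
class Persona:
    """Hypothetical voice preset: speaker + language + instruction."""
    name: str
    speaker: str
    language: str
    instruction: str = ""


def save_personas(personas: list[Persona], path: str) -> None:
    # Store presets as a JSON list of plain dictionaries.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([asdict(p) for p in personas], fh, indent=2)


def load_personas(path: str) -> list[Persona]:
    with open(path, encoding="utf-8") as fh:
        return [Persona(**item) for item in json.load(fh)]


def run_batch(
    texts: list[str],
    persona: Persona,
    synthesise: Callable[[str, Persona], Any],
) -> Iterator[tuple[int, int, Any]]:
    """Generate items one at a time, yielding (done, total, result)."""
    total = len(texts)
    for done, text in enumerate(texts, start=1):
        # `synthesise` stands in for the real TTS call (an assumption).
        yield done, total, synthesise(text, persona)
```

Because `run_batch` is a generator, a UI could update a progress indicator after each yielded item instead of waiting for the whole queue to finish.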

Models are loaded on demand and can be unloaded individually to free memory. Custom Voice and Voice Clone support both 0.6B and 1.7B checkpoints; Voice Design currently uses the 1.7B checkpoint only.
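
On-demand loading with per-model unloading can be sketched as a small registry. This is a minimal illustration under assumptions: `ModelRegistry` and the loader callables are hypothetical names, and the real app may manage models differently.

```python
from typing import Any, Callable


class ModelRegistry:
    """Load models on first use and allow unloading them one by one."""

    def __init__(self, loaders: dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders          # key -> function that builds the model
        self._loaded: dict[str, Any] = {}

    def get(self, key: str) -> Any:
        # Build the model only the first time it is requested.
        if key not in self._loaded:
            self._loaded[key] = self._loaders[key]()
        return self._loaded[key]

    def unload(self, key: str) -> bool:
        # Drop our reference so the memory can be reclaimed.
        # A real app would likely also release accelerator memory here.
        if key in self._loaded:
            del self._loaded[key]
            return True
        return False

    def loaded_keys(self) -> list[str]:
        return sorted(self._loaded)
```

Keeping one entry per tab means unloading Voice Design, for example, leaves a loaded Custom Voice model untouched.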

Additional runtime behavior:

  • Choose the model size before loading when multiple checkpoints are available
  • Choose the backend device before loading (cuda:0, mps, or cpu, depending on your machine)
  • Loaded tabs show the active runtime as device / dtype
  • On Apple Silicon, the app retries once in safer MPS float32 mode if generation fails with a probability-tensor stability error
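
As a rough sketch of how the device choices and the `device / dtype` label above could be derived: the function names below are assumptions for illustration, and `probe_devices` relies on PyTorch's `torch.cuda.is_available()` and `torch.backends.mps.is_available()` only if PyTorch is installed.

```python
def available_devices(has_cuda: bool, has_mps: bool) -> list[str]:
    """Return selectable backends; cpu is always offered as a fallback."""
    devices = []
    if has_cuda:
        devices.append("cuda:0")
    if has_mps:
        devices.append("mps")
    devices.append("cpu")
    return devices


def probe_devices() -> list[str]:
    """Ask PyTorch which accelerators exist on this machine."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe.
        return available_devices(False, False)
    return available_devices(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )


def runtime_label(device: str, dtype: str) -> str:
    """Format the active runtime the way loaded tabs display it."""
    return f"{device} / {dtype}"
```

For example, `runtime_label("mps", "float16")` gives `mps / float16`.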

Requirements

  • Python 3.11+
  • uv
  • A machine with MPS (Apple Silicon), CUDA, or enough RAM for CPU inference

Installation

git clone https://github.com/AlapinEnjoyer/qw3n-face.git
cd qw3n-face
uv sync

Running

uv run python main.py

The app should then open automatically at http://localhost:8080.

Models

Models are downloaded automatically from Hugging Face the first time they are requested in the app:

| Key | Checkpoint |
| --- | --- |
| Custom Voice | Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice or Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice |
| Voice Design | Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign |
| Voice Clone | Qwen/Qwen3-TTS-12Hz-1.7B-Base or Qwen/Qwen3-TTS-12Hz-0.6B-Base |

Checkpoint size depends on the variant: the 0.6B models are substantially smaller than the 1.7B models. Downloads are cached locally by Hugging Face after the first load.
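
The checkpoint table above maps naturally onto a lookup keyed by tab and model size. The sketch below is illustrative only: the key names (`custom_voice`, `voice_design`, `voice_clone`) and helper functions are assumptions, while the repository IDs are the ones listed in the table.

```python
CHECKPOINTS: dict[str, dict[str, str]] = {
    "custom_voice": {
        "1.7B": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
        "0.6B": "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice",
    },
    "voice_design": {
        "1.7B": "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    },
    "voice_clone": {
        "1.7B": "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
        "0.6B": "Qwen/Qwen3-TTS-12Hz-0.6B-Base",
    },
}


def available_sizes(key: str) -> list[str]:
    """List the model sizes offered for a tab, smallest first."""
    return sorted(CHECKPOINTS[key])


def resolve_checkpoint(key: str, size: str = "1.7B") -> str:
    """Return the Hugging Face repository ID for a tab and size."""
    sizes = CHECKPOINTS[key]
    if size not in sizes:
        raise ValueError(f"{key} has no {size} checkpoint; choose from {sorted(sizes)}")
    return sizes[size]
```

To fetch a checkpoint ahead of time, the resolved ID could be passed to `huggingface_hub.snapshot_download(repo_id=...)`, which stores files in the same local Hugging Face cache.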

Runtime Notes

  • CUDA uses bfloat16
  • MPS prefers float16, but the app can retry a failing model in float32 for stability
  • CPU prefers bfloat16 and falls back to float32 if needed during model load
  • If Apple Silicon generation still fails on MPS, switch the backend device to cpu before loading the model
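
The dtype preferences above amount to an ordered list of candidates per device, tried until one works. The following is a simplified sketch, not the app's implementation: dtypes are plain strings, `attempt` is a hypothetical callback standing in for model loading or generation, and any `RuntimeError` triggers the fallback, whereas the app is described as retrying only for specific failures.

```python
from typing import Any, Callable

# Preferred dtype first, safer fallback (if any) second.
DTYPE_PREFERENCES: dict[str, list[str]] = {
    "cuda": ["bfloat16"],
    "mps": ["float16", "float32"],
    "cpu": ["bfloat16", "float32"],
}


def dtype_candidates(device: str) -> list[str]:
    """Map a device string such as 'cuda:0' to its dtype candidates."""
    family = device.split(":", 1)[0]
    return DTYPE_PREFERENCES[family]


def first_working(device: str, attempt: Callable[[str], Any]) -> tuple[str, Any]:
    """Call attempt(dtype) per candidate; return the first success."""
    last_error = None
    for dtype in dtype_candidates(device):
        try:
            return dtype, attempt(dtype)
        except RuntimeError as error:
            last_error = error  # remember the failure and try the next dtype
    raise RuntimeError(f"all dtypes failed on {device}") from last_error
```

If every candidate fails, the final error is raised to the caller, which corresponds to the point where switching the backend device to cpu becomes the remaining option.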

TODOs

  • Add automatic transcription of uploaded audio
  • Add audio visualisation (waveform, spectrogram?)

Roadmap

  • Add support for fine-tuning
