OpenAI-compatible Text-to-Speech API powered by CosyVoice 3 (Alibaba FunAudioLLM).
No extra services — just the model served via FastAPI with an OpenAI-compatible endpoint. Supports built-in voices, instruction-based control, and zero-shot voice cloning.
The official CosyVoice repo requires manual setup and has no Docker entrypoint for production use. This project adds:
- Auto-start API server on container launch
- OpenAI-compatible `/v1/audio/speech` endpoint
- Zero-shot voice cloning endpoint `/v1/audio/speech/clone`
- Environment variables for GPU/memory control
- Unraid Community Applications template

Start the container with GPU access:

    docker run -d --gpus all --shm-size=2g \
      -p 8080:8080 \
      -v /path/to/models:/root/.cache/modelscope/hub \
      ghcr.io/hsiang-han/cosyvoice3-api:latest

The first start downloads the model (~2 GB).

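Because the first start has to download the model, the API may not respond right away. The following is a minimal Python sketch that polls the `/health` endpoint until it answers. It assumes the endpoint returns HTTP 200 once the server is up (the response body is not inspected); `is_healthy`, `wait_until_ready`, and the retry count and delay are illustrative names and values, not part of this project.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_healthy(base_url, timeout=5.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(base_url, attempts=60, delay=5.0,
                     probe=is_healthy, sleep=time.sleep):
    """Call probe(base_url) up to `attempts` times, sleeping between tries.

    Returns True as soon as the probe succeeds, False if it never does.
    `probe` and `sleep` are injectable so the loop can be tested offline.
    """
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_ready("http://localhost:8080")` would retry for roughly five minutes with the default values before giving up.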

Basic synthesis with a built-in voice:

    curl -X POST http://localhost:8080/v1/audio/speech \
      -F "input=你好,世界" \
      -F "voice=中文女" \
      --output speech.wav

Instruction-based control:

    curl -X POST http://localhost:8080/v1/audio/speech \
      -F "input=今天天气真好" \
      -F "voice=中文女" \
      -F "instruct_text=用开心的语气说" \
      --output happy.wav

Zero-shot voice cloning from a reference recording:

    curl -X POST http://localhost:8080/v1/audio/speech/clone \
      -F "input=这是克隆的声音" \
      -F "prompt_text=这是参考音频中说的话" \
      -F "prompt_wav=@reference.wav" \
      --output cloned.wav

List the available voices:

    curl http://localhost:8080/v1/voices

| Variable | Default | Description |
|---|---|---|
| `MODEL_DIR` | `FunAudioLLM/Fun-CosyVoice3-0.5B-2512` | ModelScope model ID |
| `FP16` | `true` | Half-precision inference. Reduces VRAM ~50%. |
| `PORT` | `8080` | API server port |

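To make the defaults concrete, here is a hypothetical sketch of how a server could read these three variables. It is not the project's actual startup code: `Settings` and `load_settings` are invented names, and accepting `1`, `yes`, and `on` as true values for `FP16` is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_dir: str
    fp16: bool
    port: int


def load_settings(env=None):
    """Build Settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: any of these spellings counts as "true"; everything else is false.
    fp16_raw = env.get("FP16", "true").strip().lower()
    return Settings(
        model_dir=env.get("MODEL_DIR", "FunAudioLLM/Fun-CosyVoice3-0.5B-2512"),
        fp16=fp16_raw in {"1", "true", "yes", "on"},
        port=int(env.get("PORT", "8080")),
    )
```

Note that if `PORT` is changed, the `-p` mapping passed to `docker run` has to target the new container port as well.
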
| Config | Estimated VRAM |
|---|---|
| FP16=true (default) | ~3-4GB |
| FP16=false | ~6-8GB |

Unraid setup:

- Add the template repo: https://github.com/hsiang-han/unraid_templates
- Find "CosyVoice3-API" in Community Applications
- Configure the device and FP16 settings
- Start the container. The first launch downloads the model; subsequent starts are fast.

| Endpoint | Method | Description |
|---|---|---|
| `/v1/audio/speech` | POST | Text-to-speech (OpenAI-compatible) |
| `/v1/audio/speech/clone` | POST | Zero-shot voice cloning |
| `/v1/voices` | GET | List available voices |
| `/v1/models` | GET | List models |
| `/health` | GET | Health check |
| `/docs` | GET | Swagger documentation |

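The two POST endpoints can also be called from Python. The sketch below uses only the standard library and sends the same multipart form fields as the `curl -F` examples (`input`, `voice`, `instruct_text`, `prompt_text`, `prompt_wav`). The helper names (`encode_multipart`, `build_request`, `synthesize`, `clone_voice`) are illustrative, and it assumes the response body is the audio file itself, as the `--output` usage with curl suggests.

```python
import os
import urllib.request
import uuid


def encode_multipart(fields, files=None):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict mapping field name -> str
    files:  dict mapping field name -> (filename, bytes)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(value.encode("utf-8") + b"\r\n")
    for name, (filename, data) in (files or {}).items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
        parts.append(data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(url, fields, files=None):
    """Create a POST request carrying the given form fields and files."""
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


def _save_response(req, out_path, timeout=300):
    # Assumption: the response body is the generated audio.
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
    return out_path


def synthesize(base_url, text, voice, instruct_text=None, out_path="speech.wav"):
    """POST to /v1/audio/speech and write the returned audio to out_path."""
    fields = {"input": text, "voice": voice}
    if instruct_text:
        fields["instruct_text"] = instruct_text
    return _save_response(build_request(f"{base_url}/v1/audio/speech", fields), out_path)


def clone_voice(base_url, text, prompt_text, prompt_wav_path, out_path="cloned.wav"):
    """POST to /v1/audio/speech/clone with a reference recording."""
    with open(prompt_wav_path, "rb") as f:
        wav_bytes = f.read()
    req = build_request(
        f"{base_url}/v1/audio/speech/clone",
        {"input": text, "prompt_text": prompt_text},
        {"prompt_wav": (os.path.basename(prompt_wav_path), wav_bytes)},
    )
    return _save_response(req, out_path)
```

With the container running, `synthesize("http://localhost:8080", "你好,世界", "中文女")` would mirror the first curl example, and `clone_voice("http://localhost:8080", "这是克隆的声音", "这是参考音频中说的话", "reference.wav")` the cloning one.
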
License: Apache-2.0 (same as upstream CosyVoice)