Qwen3-ASR-API

Pure OpenAI-compatible Speech-to-Text API powered by Qwen3-ASR.

No extra services, no NGINX, no voiceprint database — just the model served via vLLM with an OpenAI-compatible endpoint.

What this adds

The official qwenllm/qwen3-asr Docker image defines no entrypoint, so a container started from it drops into an interactive shell. That makes it unusable on platforms such as Unraid that launch containers unattended. This project adds an entrypoint so the image works out of the box in any Docker environment:

Auto-start qwen-asr-serve on container launch
Environment variable for model switching (no rebuild needed)
GPU memory control via env var
Unraid Community Applications template
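
As a rough illustration of what such an entrypoint has to do, the sketch below turns the environment variables into a serve command. It is not the project's actual entrypoint.sh. The helper name `build_serve_command` is hypothetical, and the flag names are an assumption that `qwen-asr-serve` accepts vLLM-style options; the defaults are the ones listed under Environment Variables.

```python
# Defaults mirror the Environment Variables section of this README.
DEFAULTS = {
    "MODEL_ID": "Qwen/Qwen3-ASR-0.6B",
    "GPU_MEMORY_UTILIZATION": "0.8",
    "MAX_MODEL_LEN": "8192",
    "HOST": "0.0.0.0",
    "PORT": "80",
}


def build_serve_command(env):
    """Return an argv list for qwen-asr-serve from a mapping of env vars.

    Assumption: qwen-asr-serve accepts vLLM-style flags. Check the real
    entrypoint.sh for the exact invocation.
    """
    # Unset or empty variables fall back to the documented defaults.
    cfg = {key: env.get(key) or default for key, default in DEFAULTS.items()}
    return [
        "qwen-asr-serve", cfg["MODEL_ID"],
        "--gpu-memory-utilization", cfg["GPU_MEMORY_UTILIZATION"],
        "--max-model-len", cfg["MAX_MODEL_LEN"],
        "--host", cfg["HOST"],
        "--port", cfg["PORT"],
    ]


# Example: only MODEL_ID is overridden, everything else uses the defaults.
print(" ".join(build_serve_command({"MODEL_ID": "Qwen/Qwen3-ASR-1.7B"})))
```

A real entrypoint would then replace its own process with this command (for example via `exec` in a shell script) so the server receives container stop signals directly.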

Quick Start

docker run -d --gpus all --shm-size=4g \
  -p 8000:80 \
  -v /path/to/models:/root/.cache/huggingface \
  -e MODEL_ID=Qwen/Qwen3-ASR-0.6B \
  ghcr.io/hsiang-han/qwen3-asr-api:latest

The first start downloads the model (about 1-3 GB, depending on the variant), so the API is not reachable until the download and model load have finished.
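
Because the first start can take a while, a client may want to wait for the server before sending audio. The sketch below polls a URL until it answers with HTTP 200. The helper names are hypothetical, and the choice of `/v1/models` as a readiness URL is an assumption based on vLLM's OpenAI-compatible server, not something this project documents.

```python
import time
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET to url answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
        return False


def wait_until_ready(url, attempts=60, delay=5.0, probe=http_ok, sleep=time.sleep):
    """Poll url until probe(url) succeeds.

    Returns True on success, or False once all attempts are used up.
    """
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the Quick Start port mapping, a call might look like `wait_until_ready("http://localhost:8000/v1/models")`. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.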

Usage (OpenAI-compatible)

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  -F "model=qwen3-asr"

Or with the OpenAI Python SDK:

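from openai import OpenAI

# api_key is a placeholder; the SDK requires a non-empty value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# Open the file in a context manager so the handle is closed after the upload.
with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="qwen3-asr",
        file=audio_file,
    )
print(result.text)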
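
For more than one file, the same SDK call can be wrapped in a small loop. This is a minimal sketch: `transcribe_files` is a hypothetical helper, and it only assumes that `client` exposes the `audio.transcriptions.create(model=..., file=...)` method used above.

```python
from pathlib import Path


def transcribe_files(client, paths, model="qwen3-asr"):
    """Transcribe each audio file in paths and return {path: text}.

    client is any object exposing the OpenAI SDK's
    audio.transcriptions.create(model=..., file=...) method.
    """
    results = {}
    for path in map(Path, paths):
        # Open each file in binary mode and close it once the upload finishes.
        with path.open("rb") as audio_file:
            response = client.audio.transcriptions.create(model=model, file=audio_file)
        results[str(path)] = response.text
    return results
```

With the `client` from the SDK example, `transcribe_files(client, ["first.wav", "second.wav"])` would return a dictionary keyed by file path (the file names here are placeholders). Requests are sent one after another; nothing in this sketch runs them concurrently.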

Model Options

| Model | VRAM | Speed (RTFx) | Best for |
|---|---|---|---|
| `Qwen/Qwen3-ASR-0.6B` | ~2-3 GB | 166 | Low latency, shared GPU |
| `Qwen/Qwen3-ASR-1.7B` | ~4-6 GB | 148 | Best accuracy |

Switch models by changing the MODEL_ID environment variable and restarting the container.
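
To put the speed figures in perspective: RTFx is commonly defined as the inverse real-time factor, that is, seconds of audio processed per second of compute. Assuming the figures above use that definition, the sketch below estimates how long a recording takes to transcribe. Actual throughput depends on the GPU, batching, and audio content, so treat the result as a ballpark only.

```python
# RTFx values taken from the Model Options table.
RTFX = {
    "Qwen/Qwen3-ASR-0.6B": 166.0,
    "Qwen/Qwen3-ASR-1.7B": 148.0,
}


def estimated_processing_seconds(audio_seconds, model_id):
    """Estimate compute time as audio duration divided by RTFx."""
    return audio_seconds / RTFX[model_id]


for model_id in RTFX:
    seconds = estimated_processing_seconds(600, model_id)
    print(f"{model_id}: 10 min of audio in roughly {seconds:.1f} s")
```

Under these assumptions, 10 minutes of audio works out to roughly 3.6 s on the 0.6B model and 4.1 s on the 1.7B model.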

Unraid Install

Add the template repo: https://github.com/hsiang-han/unraid_templates
Find "Qwen3-ASR-API" in Community Applications.
Configure MODEL_ID and the GPU settings.
Start the container. The first launch downloads the model; subsequent starts are fast.

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `MODEL_ID` | `Qwen/Qwen3-ASR-0.6B` | Model to serve |
| `GPU_MEMORY_UTILIZATION` | `0.8` | Fraction of GPU memory the server may use (0.0-1.0) |
| `MAX_MODEL_LEN` | `8192` | Maximum sequence length for the KV cache. The default supports about 10 minutes of audio. Lower it to save VRAM; raise it for longer audio. |
| `HOST` | `0.0.0.0` | Bind address |
| `PORT` | `80` | Port the server listens on inside the container |
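
Since PORT is the port inside the container, the `-p` mapping passed to `docker run` has to point at the same number. The sketch below builds a `docker run` argument list that keeps the two in sync and rejects an out-of-range memory fraction. `docker_run_args` is a hypothetical helper; the Docker flags are the ones from Quick Start, and the rejection of 0.0 is an assumption of this sketch rather than documented behaviour.

```python
def docker_run_args(image, host_port=8000, env=None):
    """Build a `docker run` argv list, keeping -p in sync with PORT."""
    env = dict(env or {})
    utilization = float(env.get("GPU_MEMORY_UTILIZATION", "0.8"))
    if not 0.0 < utilization <= 1.0:
        raise ValueError("GPU_MEMORY_UTILIZATION must be in (0.0, 1.0]")
    # The container side of the mapping follows PORT (default 80).
    container_port = int(env.get("PORT", "80"))
    args = [
        "docker", "run", "-d", "--gpus", "all", "--shm-size=4g",
        "-p", f"{host_port}:{container_port}",
    ]
    # Pass every override through as -e KEY=VALUE, in a stable order.
    for key, value in sorted(env.items()):
        args += ["-e", f"{key}={value}"]
    return args + [image]


print(" ".join(docker_run_args(
    "ghcr.io/hsiang-han/qwen3-asr-api:latest",
    env={"MODEL_ID": "Qwen/Qwen3-ASR-1.7B", "PORT": "9000"},
)))
```

The returned list can be handed to `subprocess.run` as is. The model cache volume from Quick Start is left out here for brevity.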

License

Apache-2.0 (same as upstream Qwen3-ASR)
