PyStreamASR is a FastAPI-based real-time ASR service for streaming speech-to-text workloads. It accepts audio over WebSocket, converts incoming G.711 or PCM audio into 16 kHz PCM for Sherpa-onnx streaming inference, returns partial and final transcription events, keeps partial state in memory, and persists finalized segments to MySQL.

- Health endpoint: `GET /health`
- Process metrics endpoint: `GET /metrics`
- Streaming endpoint: `WebSocket /ws/transcribe/{session_id}`
- Input audio: `alaw`, `ulaw`, `pcm16le`
- Internal inference audio: mono 16 kHz float32 PCM
- Partial results: in-memory cache per session
- Final results: MySQL `segments` table
- Runtime model: Sherpa-onnx Paraformer Streaming

- Clone the repository.

      git clone https://github.com/AirSodaz/PyStreamASR.git
      cd PyStreamASR

- Create and activate a Python 3.12 virtual environment.

      py -3.12 -m venv venv
      .\venv\Scripts\activate

- Install dependencies.

      pip install -r requirements.txt

- Create your local environment file.

      Copy-Item .env.example .env

- Update `.env` with at least:
  - `MYSQL_DATABASE_URL`
  - `MODEL_PATH`

- Set `MODEL_PATH` to the Sherpa-onnx Paraformer Streaming model directory. Relative paths are resolved from the project root, for example:

      models/sherpa-onnx-streaming-paraformer-bilingual-zh-en/

  The model loader reads this configured directory and expects files such as `encoder.int8.onnx`, `decoder.int8.onnx`, and `tokens.txt` inside it.

- Start the development server.

      uvicorn main:app --reload

For local development, these are the most useful uvicorn options:

| Parameter | Example | What it does |
|---|---|---|
| `--reload` | `uvicorn main:app --reload` | Restarts the server automatically when Python source files change. Use this only in development. |
| `--host` | `--host 0.0.0.0` | Controls the bind address. Use `127.0.0.1` for local-only access or `0.0.0.0` when other devices need to connect. |
| `--port` | `--port 8000` | Controls which port exposes `/health`, `/metrics`, and `/ws/transcribe/{session_id}`. |
| `--workers` | `--workers 4` | Starts multiple worker processes. This is usually unnecessary during development and should not be combined with `--reload`. |

Recommended development command:

    uvicorn main:app --reload --host 127.0.0.1 --port 8000

If you prefer to use values from `.env`, `python main.py` starts Uvicorn with `APP_HOST` and `APP_PORT`, and enables reload mode.

Open:

    http://localhost:8000/health

Expected response shape:

    {
      "status": "ok",
      "config": "loaded",
      "project_name": "PyStreamASR",
      "model_status": "loaded"
    }

Open:

    http://localhost:8000/metrics

The response includes current-process model status, inference executor counters such as inflight, completed, rejected_overloaded, and timed_out, plus runtime groups for connections, websocket, audio, transcription, and storage. These runtime metrics are process-local aggregates and do not expose per-session or per-connection data.
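
As a rough illustration, a readiness probe could fetch `/health` and inspect the documented fields. The sketch below assumes the service counts as ready only when `status` is `ok` and `model_status` is `loaded`; other possible values are not documented here. `fetch_health` and `is_ready` are hypothetical helper names, not part of the project.

```python
import json
from urllib.request import urlopen


def is_ready(payload: dict) -> bool:
    # Assumption: "ok" and "loaded" are the only healthy values.
    return payload.get("status") == "ok" and payload.get("model_status") == "loaded"


def fetch_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    # Performs a plain GET against the documented /health endpoint.
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)
```

With the development server running, `is_ready(fetch_health())` would be expected to return `True` once the model has loaded.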
With the virtual environment activated, run:

    python scripts/simulate_stream.py --file .\path\to\your_audio.wav --host ws://localhost:8000/ws/transcribe/demo-session

Replace `.\path\to\your_audio.wav` with an actual audio file. You should then see streamed `partial` and `final` JSON messages in the client output. The exact transcript depends on the input audio and model.
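
The simulator's internals are not shown here. As a sketch under stated assumptions, a custom client could slice raw audio into fixed-duration chunks and send each one as a binary WebSocket message. `iter_audio_chunks` is a hypothetical helper, and the 20 ms chunk duration is an illustrative choice rather than a documented requirement.

```python
from typing import Iterator


def iter_audio_chunks(
    data: bytes,
    sample_rate: int = 8000,
    bytes_per_sample: int = 1,  # 1 for G.711 (alaw/ulaw), 2 for pcm16le
    chunk_ms: int = 20,
) -> Iterator[bytes]:
    # Number of bytes that represent chunk_ms of audio at the given rate.
    chunk_size = sample_rate * bytes_per_sample * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk parameters must yield at least one byte per chunk")
    for offset in range(0, len(data), chunk_size):
        # The final chunk may be shorter than chunk_size.
        yield data[offset : offset + chunk_size]
```

For 8000 Hz G.711 input this yields 160-byte chunks; for 16000 Hz `pcm16le` it yields 640-byte chunks.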
PyStreamASR processes each connection as a non-blocking streaming pipeline:

- The client sends binary audio chunks to `WebSocket /ws/transcribe/{session_id}`.
- `AudioProcessor` decodes `alaw`, `ulaw`, or `pcm16le`, normalizes samples, and resamples to 16 kHz when needed.
- Audio processing and model inference run through `loop.run_in_executor` so the event loop stays available for WebSocket and database I/O.
- Interim transcription is tracked as a session-scoped in-memory partial result.
- Finalized segments are stored in MySQL and sent back to the client as `final` events.
- Reconnecting with the same `session_id` continues sequence numbering for that session.
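
The executor step above can be illustrated with a small standalone sketch. It is not the project's code: `blocking_inference` is a hypothetical stand-in for decoding plus inference, and the heartbeat coroutine exists only to show that the event loop keeps running while the blocking call is in progress.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_inference(chunk: bytes) -> str:
    # Hypothetical stand-in for CPU-bound decoding and model inference.
    time.sleep(0.1)
    return f"processed {len(chunk)} bytes"


async def demo() -> tuple:
    loop = asyncio.get_running_loop()
    ticks = 0

    async def heartbeat() -> None:
        # Represents other event-loop work such as WebSocket or database I/O.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    monitor = asyncio.create_task(heartbeat())
    with ThreadPoolExecutor(max_workers=1) as executor:
        # The blocking call runs in a worker thread; the loop stays free.
        text = await loop.run_in_executor(executor, blocking_inference, b"\x00" * 320)
    monitor.cancel()
    return text, ticks
```

Running `asyncio.run(demo())` returns the inference result together with a tick count greater than one, which indicates the loop was not blocked during the 100 ms call.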

| Format | Expected Source Rate | Notes |
|---|---|---|
| `alaw` | Usually 8000 Hz | G.711 A-law input. Recommended for telephony-style streams. |
| `ulaw` | Usually 8000 Hz | G.711 mu-law input. Recommended for telephony-style streams. |
| `pcm16le` | 8000 or 16000 Hz | Raw little-endian 16-bit PCM. 8000 Hz input is resampled server-side. |

All inputs are normalized and converted to mono 16 kHz PCM before inference. For G.711 streams, keep the client format and sample rate aligned with AUDIO_INPUT_FORMAT and AUDIO_SOURCE_RATE.
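
To make the conversion concrete, here is a minimal sketch of decoding G.711 mu-law bytes to normalized floats and doubling the sample rate from 8 kHz to 16 kHz. It is an illustration only: the actual `AudioProcessor` implementation is not shown in this README, the function names are hypothetical, and linear interpolation is a simplification (production resamplers normally apply a proper low-pass filter).

```python
def ulaw_byte_to_pcm16(value: int) -> int:
    # Standard G.711 mu-law expansion: invert bits, then rebuild the magnitude
    # from the 3-bit exponent and 4-bit mantissa using the 0x84 bias.
    value = ~value & 0xFF
    magnitude = (((value & 0x0F) << 3) + 0x84) << ((value >> 4) & 0x07)
    magnitude -= 0x84
    return -magnitude if value & 0x80 else magnitude


def decode_ulaw(data: bytes) -> list:
    # Normalize 16-bit sample values into the [-1.0, 1.0) float range.
    return [ulaw_byte_to_pcm16(b) / 32768.0 for b in data]


def upsample_2x_linear(samples: list) -> list:
    # Insert the midpoint between neighbouring samples (8 kHz -> 16 kHz).
    out = []
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + nxt) / 2.0)
    return out
```

A 20 ms mu-law chunk of 160 bytes becomes 320 float samples after `upsample_2x_linear(decode_ulaw(chunk))`.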
Create a `.env` file in the project root. A typical setup looks like this:

    PROJECT_NAME=PyStreamASR
    MYSQL_DATABASE_URL=mysql+aiomysql://root:password@localhost/pystreamasr
    MODEL_PATH=models/sherpa-onnx-streaming-paraformer-bilingual-zh-en
    LOG_LEVEL=INFO
    LOG_DIR=logs
    RETURN_TRANSCRIPTION=true
    AUDIO_INPUT_FORMAT=alaw
    AUDIO_SOURCE_RATE=8000
    APP_HOST=0.0.0.0
    APP_PORT=8000
    APP_WORKERS=1
    ASR_INFERENCE_WORKERS=2
    ASR_INFERENCE_QUEUE_SIZE=8
    ASR_INFERENCE_QUEUE_TIMEOUT_SECONDS=20.0

| Variable | Required | Default | Notes |
|---|---|---|---|
| `MYSQL_DATABASE_URL` | Yes | None | Async SQLAlchemy DSN. Example: `mysql+aiomysql://user:password@host/dbname`. |
| `MODEL_PATH` | Yes | None | Model directory used by the runtime loader. Relative paths are resolved from the project root. |
| `PROJECT_NAME` | No | `PyStreamASR` | Used in the FastAPI app title and `/health` response. |
| `LOG_LEVEL` | No | `INFO` | Set to `DEBUG` to capture processed audio into WAV files for troubleshooting. |
| `LOG_DIR` | No | `logs` | Base directory for runtime logs and debug artifacts. |
| `RETURN_TRANSCRIPTION` | No | `true` | When `false`, audio is still processed and stored, but no transcription messages are sent over WebSocket. |
| `AUDIO_INPUT_FORMAT` | No | `alaw` | One of `alaw`, `ulaw`, `pcm16le`. Must match the client stream. |
| `AUDIO_SOURCE_RATE` | No | `8000` | One of `8000`, `16000`. Must match the client stream. |
| `APP_HOST` | No | `0.0.0.0` | Bind host for local runs and service wrappers. |
| `APP_PORT` | No | `8000` | Bind port for local runs and service wrappers. |
| `APP_WORKERS` | No | `1` | Worker count used by the service wrapper. On Windows this is used with Uvicorn; on Linux/macOS it is used with Gunicorn. |
| `ASR_INFERENCE_WORKERS` | No | `max(1, cpu_count / 2)` | Per-process ASR inference thread pool size. |
| `ASR_INFERENCE_QUEUE_SIZE` | No | `ASR_INFERENCE_WORKERS * 4` | Additional inference calls allowed to wait before overload rejection. |
| `ASR_INFERENCE_QUEUE_TIMEOUT_SECONDS` | No | `20.0` | Maximum time an inference call may wait for a worker before the connection is closed as overloaded. |

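
The two derived defaults in the table can be worked through with a short sketch. `resolve_inference_limits` is a hypothetical helper, not the project's settings loader, and it assumes `cpu_count / 2` means integer division.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_inference_limits(
    env: Mapping[str, str], cpu_count: Optional[int] = None
) -> Tuple[int, int]:
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Default worker count: half the CPUs, but never fewer than one (assumed floor division).
    workers = int(env.get("ASR_INFERENCE_WORKERS") or max(1, cpus // 2))
    # Default queue size: four waiting slots per worker.
    queue_size = int(env.get("ASR_INFERENCE_QUEUE_SIZE") or workers * 4)
    return workers, queue_size
```

On an 8-core host with neither variable set, this gives 4 workers and 16 queue slots. With the sample `.env` above (`ASR_INFERENCE_WORKERS=2`, `ASR_INFERENCE_QUEUE_SIZE=8`) it gives 2 and 8.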
Runtime logs include both the session_id and a per-connection connection_id, so reconnects for the same session can be distinguished while following one connection through audio processing, inference, and storage.
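
One common way to attach both identifiers to every log line is a `logging.Filter` that reads them from context variables. The sketch below assumes that approach; the project's actual logging helpers in `core/` are not shown here, and the variable names, logger name, and format string are all illustrative.

```python
import contextvars
import logging

# Hypothetical context variables, set once per WebSocket connection.
session_id_var = contextvars.ContextVar("session_id", default="-")
connection_id_var = contextvars.ContextVar("connection_id", default="-")


class ConnectionContextFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Copy the current identifiers onto the record so the formatter can use them.
        record.session_id = session_id_var.get()
        record.connection_id = connection_id_var.get()
        return True


def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s session=%(session_id)s conn=%(connection_id)s %(message)s")
    )
    handler.addFilter(ConnectionContextFilter())
    logger = logging.getLogger("pystreamasr.sketch")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Because context variables are scoped per asyncio task, two connections that share a `session_id` would still log different `connection_id` values.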
When LOG_LEVEL=DEBUG, each WebSocket session writes a 16 kHz mono WAV file under logs/debug_audio/ so you can inspect decoded and resampled audio.
ASR inference uses a dedicated bounded thread pool. If all inference workers and queue slots are busy, the server sends an error event with code=inference_overloaded and closes the WebSocket with close code 1013.
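
The admission rule described above can be sketched as a counter in front of a thread pool. This is a simplified illustration, not the project's executor: the class and exception names are hypothetical, the queue timeout is omitted, and `inflight` here counts both running and waiting calls, which may differ from how the `/metrics` counter is defined.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class InferenceOverloaded(RuntimeError):
    """Raised when every worker and queue slot is already taken."""


class BoundedInferencePool:
    def __init__(self, workers: int, queue_size: int) -> None:
        self._executor = ThreadPoolExecutor(max_workers=workers)
        self._capacity = workers + queue_size
        self.inflight = 0
        self.rejected_overloaded = 0

    async def run(self, fn: Callable[..., Any], *args: Any) -> Any:
        # Counters are only touched from the event loop thread, so no lock is needed.
        if self.inflight >= self._capacity:
            self.rejected_overloaded += 1
            raise InferenceOverloaded("inference_overloaded")
        self.inflight += 1
        try:
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(self._executor, fn, *args)
        finally:
            self.inflight -= 1

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

In a WebSocket handler, catching `InferenceOverloaded` would be the point at which to send the `inference_overloaded` error event and close the connection with code 1013.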
Use one of the following depending on your environment:

| Scenario | Command | Notes |
|---|---|---|
| Cross-platform direct run | `uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4` | Simple production-style launch. |
| Docker Compose local stack | `docker compose up --build` | Starts PyStreamASR and MySQL together. Models and logs stay on the host. |
| Windows background service | `powershell.exe -ExecutionPolicy Bypass -File .\install.ps1` | Registers the `PyStreamASR` scheduled task and installs the `pystreamasr` command. |
| Linux persistent service | `sudo ./install.sh` | Installs a `pystreamasr.service` systemd unit and the `pystreamasr` command. |
| Linux/macOS Gunicorn | `gunicorn main:app -c gunicorn.conf.py` | Gunicorn is not supported on Windows. |

The Compose stack is an optional local deployment path. It starts one PyStreamASR container and one MySQL 8.4 container, stores MySQL data in the mysql-data Docker volume, mounts ./models into the app container as read-only, and writes runtime logs to ./logs.
If the Sherpa-onnx model is not already present under `models/sherpa-onnx-streaming-paraformer-bilingual-zh-en/`, download it through the helper container:

    docker compose run --rm model-downloader

Start the full local stack:

    docker compose up --build

Verify the app from the host:

    http://localhost:8000/health

Stop and remove the containers when finished:

    docker compose down

The existing `.env` file remains for non-container local runs. The Compose app container receives its runtime environment from `docker-compose.yml` and does not require local secrets from `.env`. MySQL is internal to the Compose network by default; inspect it with `docker compose exec mysql mysql -upystreamasr -ppystreamasr pystreamasr` if needed.
After the Windows or Linux installer completes, use:

    pystreamasr

On Linux, use `sudo pystreamasr` if service control requires elevated privileges.
pystreamasr is the console entrypoint defined in pyproject.toml. It launches a layered terminal menu instead of starting the ASR server directly.
The menu provides:

- Main-menu navigation only (no direct info rendering on the main page): service operations, status viewer, configuration manager, log viewer, and diagnostics
- Service controls: `Start/Stop/Restart`
- Runtime setting updates: `APP_HOST`, `APP_PORT`, and `APP_WORKERS` in `.env`
- Status viewer with explicit refresh option
- Log viewer with source selection and configurable tail line count
- Diagnostics output with pass/warn/fail summaries and remediation details
- Screen is cleared and redrawn automatically when entering submenus and when returning to the main menu
- Exit behavior: `0` exits from the main menu, while `0` in submenus returns to the main menu

Behavior depends on how the service was installed:

- On Windows, it manages the `PyStreamASR` scheduled task and expects a Uvicorn-based install.
- On Linux, it manages the `pystreamasr.service` systemd unit and expects a Gunicorn-based install.

Typical usage flow:

- Run `install.ps1` or `install.sh`.
- Launch `pystreamasr`.
- Use submenu navigation for status checks, runtime setting updates, and service controls.
- Use the Logs and Diagnostics submenus during incidents, then restart if configuration changes require it.

If the service has not been installed yet, pystreamasr can still open, but service actions will report that the managed service is not installed.
The current repository includes more than the minimal runtime tree. The main areas are:

    PyStreamASR/
    ├── api/                 # WebSocket routes and connection lifecycle
    ├── core/                # Settings, logging, and request context helpers
    ├── docs/                # API reference in English and Chinese
    ├── models/              # Sherpa-onnx model assets
    ├── scripts/             # Stream simulators, installers, and service manager
    ├── services/            # Audio, inference, storage, and database schema logic
    ├── Dockerfile           # Container image for the ASR app
    ├── docker-compose.yml   # Local app + MySQL Compose stack
    ├── main.py              # FastAPI app entrypoint and lifespan setup
    ├── install.ps1          # Windows installer / scheduled-task setup
    ├── install.sh           # Linux installer / systemd setup
    ├── pyproject.toml       # Package metadata and console entrypoint
    └── requirements.txt     # Python dependencies

- Full API reference: docs/API.md
- Chinese API reference: docs/API_zh.md
Use the README for setup and validation. Use the API docs for message shapes, examples, and interface details.

- `FileNotFoundError` during startup usually means the Sherpa-onnx model files are not present under `models/sherpa-onnx-streaming-paraformer-bilingual-zh-en/`.
- Database connection failures usually mean `MYSQL_DATABASE_URL` is invalid, MySQL is unavailable, or the target database does not exist.
- If WebSocket messages never arrive, verify `RETURN_TRANSCRIPTION=true`.
- If transcription quality is poor or errors are logged during processing, check that `AUDIO_INPUT_FORMAT` and `AUDIO_SOURCE_RATE` match what the client is actually sending.
- If you need to inspect decoded audio, set `LOG_LEVEL=DEBUG` and review the WAV artifacts under `logs/debug_audio/`.
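
For the model-file case, a small preflight check can narrow down what is missing before the server starts. This is a sketch only: `missing_model_files` is a hypothetical helper, and the file list covers just the three names this README mentions, which may not be everything the loader needs.

```python
from pathlib import Path

# Only the files named in this README; the real loader may expect more.
EXPECTED_MODEL_FILES = ("encoder.int8.onnx", "decoder.int8.onnx", "tokens.txt")


def missing_model_files(model_path: str, project_root: str = ".") -> list:
    root = Path(model_path)
    if not root.is_absolute():
        # Relative MODEL_PATH values are resolved from the project root.
        root = Path(project_root) / root
    return [name for name in EXPECTED_MODEL_FILES if not (root / name).is_file()]
```

An empty list means all three files were found; otherwise the returned names indicate which files to download or which `MODEL_PATH` value to correct.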