🎙️ Generative AI Storyteller - Powered by Higgs Audio v3 TTS

Higgs Audio v3 TTS: This Ultra-Realistic Voice AI Silenced Everyone - Run Locally

📺 Watch the full tutorial on YouTube

An elegant, interactive AI storytelling app powered by Boson AI's Higgs Audio v3 - the state-of-the-art open-weights TTS model with expressive emotions, sound effects, voice cloning, and 100+ language support.

Demo · Setup · Usage · Use Cases · Roadmap

✨ Key Features

🎭 Expressive Emotions - Inject <|emotion:fear|>, <|emotion:awe|>, <|emotion:enthusiasm|> and 19 more inline
🔊 Sound Effects - Native <|sfx:laughter|>, <|sfx:sigh|>, <|sfx:sneeze|> and more
🎙️ Style Control - Switch between <|style:whispering|>, <|style:shouting|>, <|style:singing|>
⚡ Prosody Control - Speed, pitch, pauses: <|prosody:speed_slow|>, <|prosody:pitch_high|>
🌍 100+ Languages - Single-digit WER/CER across 85+ production-quality languages
🔁 Zero-Shot Voice Cloning - Supply a reference WAV to replicate any voice

🧱 Tech Stack

Component	Technology
Language	Python 3.10+
UI Framework	Streamlit ≥ 1.35
Model Server	SGLang-Omni (Docker)
HTTP Client	`requests`
Audio I/O	`soundfile`, `numpy`
Model	`Reza2kn/Higgs-Audio-v3-TTS-4bit-AWQ` (4-bit AWQ, ~2.5 GB)

📁 File Map

Higgs Audio/
├── app.py               ← Core Streamlit application (single file, ~120 lines)
├── requirements.txt     ← Python dependencies (pip installable)
├── README.md            ← This file
└── output.wav           ← Auto-generated on each synthesis run (gitignored)

`app.py`

The single-file Streamlit frontend. Handles:

Dark Obsidian-themed UI with custom CSS
3 premade story templates with Higgs v3 inline tags
Manual text area for custom scripts
Generation parameter controls (temperature, top-K, max tokens)
HTTP POST to SGLang-Omni server at http://localhost:8000/v1/audio/speech
In-browser WAV playback + one-click download
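
The request step above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the JSON field names (`model`, `input`, `response_format`, `temperature`, `top_k`, `max_tokens`) and the default values are assumptions modelled on OpenAI-style speech endpoints, and the helper names `build_speech_payload` and `synthesize` are hypothetical. The sketch uses the standard library's `urllib` so it runs without extra packages; the app itself uses `requests`.

```python
import json
import urllib.request

# Endpoint named in this README; everything else below is an assumption.
SPEECH_URL = "http://localhost:8000/v1/audio/speech"


def build_speech_payload(text, temperature=0.7, top_k=50, max_tokens=2048):
    """Assemble the request body. Field names and defaults are assumed, not confirmed."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "Reza2kn/Higgs-Audio-v3-TTS-4bit-AWQ",
        "input": text,  # story script, including inline tags
        "response_format": "wav",
        "temperature": temperature,
        "top_k": top_k,
        "max_tokens": max_tokens,
    }


def synthesize(text, out_path="output.wav", url=SPEECH_URL, timeout=300, **params):
    """POST the script to the server and write the returned bytes to out_path."""
    body = json.dumps(build_speech_payload(text, **params)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on a non-2xx response.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    return out_path
```

With the server running, `synthesize("<|emotion:awe|>The gates opened.")` would write `output.wav` to the current folder, provided the server accepts the assumed fields.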

`requirements.txt`

Minimal Python dependencies. The heavy model computation runs inside Docker (SGLang-Omni), so the Python environment needs no ML frameworks. The NVIDIA GPU driver is still required on the host so that Docker can access the GPU.

🚀 Setup

Prerequisites

Windows 10/11 with PowerShell 5.1+
Python 3.10+ (download)
Docker Desktop for Windows (download) with GPU support enabled
NVIDIA GPU with ≥ 4 GB VRAM + NVIDIA Container Toolkit

Step 1 - Install Python Dependencies

pip install -r requirements.txt

Step 2 - Download the 4-bit AWQ Model

Install the Hugging Face CLI and download the quantized model directly (no account or token required):

pip install huggingface_hub
huggingface-cli download Reza2kn/Higgs-Audio-v3-TTS-4bit-AWQ --local-dir .\model

The model is ~2.5 GB. It only needs to be downloaded once and is stored in the model\ folder.

Step 3 - Run the Streamlit App

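The app expects the SGLang-Omni server to be listening at http://localhost:8000. With the server running, open a new PowerShell terminal: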

streamlit run app.py

The app opens automatically at http://localhost:8501

🎮 Usage

Select a template from the dropdown (Fantasy, Thriller, or Kids Story)
Edit or write your own story using Higgs v3 inline tags
Adjust temperature, top-K, and max tokens as desired
Click 🎬 Generate Story Audio
Listen in the browser player or download the WAV file

Higgs v3 Tag Quick Reference

Emotions:    <|emotion:enthusiasm|>  <|emotion:fear|>  <|emotion:awe|>  <|emotion:sadness|>
Style:       <|style:whispering|>    <|style:shouting|>  <|style:singing|>
Sound FX:    <|sfx:laughter|>Haha    <|sfx:sigh|>Ugh    <|sfx:sneeze|>Achoo
Prosody:     <|prosody:speed_slow|>  <|prosody:pitch_high|>  <|prosody:long_pause|>

💡 Tip: Place emotion/style tokens at the start of the text. Place <|sfx:...|> and pause tokens inline, at the exact point in the sentence where the sound or pause should occur.
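
The placement rule in the tip can be illustrated with a small helper. This is a sketch under assumptions: the function names `tag` and `build_tagged_text` are hypothetical, and joining tags to the text without a separator follows the `<|sfx:laughter|>Haha` pattern in the quick reference rather than any documented requirement.

```python
def tag(kind, name):
    """Format one inline tag, e.g. tag("sfx", "sigh") -> "<|sfx:sigh|>"."""
    return f"<|{kind}:{name}|>"


def build_tagged_text(text, emotion=None, style=None):
    """Prefix the text with emotion/style tags; inline tags stay where the caller put them."""
    prefix = ""
    if emotion:
        prefix += tag("emotion", emotion)
    if style:
        prefix += tag("style", style)
    return prefix + text


# Emotion and style go first; the pause and sound effect sit inline where they should occur.
line = build_tagged_text(
    "The door creaked open. "
    + tag("prosody", "long_pause")
    + "Nothing. "
    + tag("sfx", "sigh")
    + "Ugh, just the wind.",
    emotion="fear",
    style="whispering",
)
print(line)
```

Building scripts through one helper keeps tag spelling consistent, which matters because a mistyped tag would presumably be read aloud as plain text.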

🎯 Use Cases

#	Use Case	Description
1	🎮 Expressive Video Game NPC Dialogues	Generate dynamic, emotionally reactive NPC speech at runtime. Each dialogue line adapts tone based on in-game state - fear during combat, joy at victory, confusion when lost.
2	📚 Immersive Audiobooks	Convert long-form written chapters into multi-voice narrated audio with automatic emotion pacing, dramatic pauses, and sound effects that match the text mood.
3	🧘 AI Meditation Guides	Produce calm, slow-paced meditation scripts with whispering style, low pitch, and long pauses for breathing cues - fully customizable per session.
4	📞 Real-Time Interactive Voice Response (IVR)	Power voice menus and AI call center responses with warm, natural-sounding speech rather than robotic TTS - switchable per brand persona.
5	🧒 Dynamic Kids Storyboards	Create interactive bedtime stories where children hear characters laugh, gasp, sneeze, and sing - with the narration adapting to chosen story paths in real time.
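
As a rough illustration of the NPC use case, in-game state could be mapped to an emotion tag before the line is sent for synthesis. The state names, the mapping, and the function `npc_line` are all hypothetical. Only tags listed in this README are used, so `enthusiasm` stands in for "joy at victory".

```python
# Hypothetical mapping from game state to an emotion tag name listed in this README.
STATE_TO_EMOTION = {
    "combat": "fear",
    "victory": "enthusiasm",
    "defeat": "sadness",
}


def npc_line(text, game_state):
    """Prefix an NPC line with the emotion tag for the current state, if one is mapped."""
    emotion = STATE_TO_EMOTION.get(game_state)
    if emotion is None:
        return text  # unmapped states are spoken without an emotion tag
    return f"<|emotion:{emotion}|>{text}"


print(npc_line("They're everywhere, fall back!", "combat"))
print(npc_line("Nice weather today.", "idle"))
```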

🔭 Future Expansion Ideas

#	Idea	Description
1	🎵 Automated Ambient Background Music Mixdown	Analyse story mood and auto-select/mix royalty-free background music from a library, fading in/out with the narrative arc using `pydub`.
2	📝 Visual Subtitle Tracking	Sync generated audio with word-level timestamps (via Whisper forced-alignment) and render scrolling karaoke-style subtitles in the browser.
3	👥 Live Multi-Character Speaker Diarization	Assign distinct Higgs voice clones to named characters. Parse `Character: "dialogue"` format and stitch per-character audio into a single multi-voice scene render.
4	⚡ Local Model Quantization with AWQ	Integrate the `Reza2kn/Higgs-Audio-v3-TTS-4bit-AWQ` 4-bit model for direct local inference without Docker - targeting consumer GPUs with ≥ 4 GB VRAM via AutoAWQ.
5	🧠 Sentence-Level Emotion Auto-Generation via LLM Layer	Pass story text through a small LLM (e.g., `Qwen2.5-3B`) to automatically predict and inject the optimal Higgs emotion/style/sfx tags before TTS synthesis - zero manual tagging required.
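
For idea 3, the parsing half could start from something like the sketch below. It is not part of the current app: the regular expression, the function name `parse_script`, and the choice to skip lines that do not match are assumptions about how the `Character: "dialogue"` format might be handled.

```python
import re

# Matches lines such as:  Dragon: "Who goes there?"
LINE_PATTERN = re.compile(r'^\s*(?P<name>[^:"]+?)\s*:\s*"(?P<dialogue>.*)"\s*$')


def parse_script(script):
    """Return (character, dialogue) pairs in script order, skipping non-matching lines."""
    segments = []
    for raw_line in script.splitlines():
        match = LINE_PATTERN.match(raw_line)
        if match:
            segments.append((match.group("name"), match.group("dialogue")))
    return segments


script = '''Narrator: "Once upon a time."

Dragon: "<|style:shouting|>Who goes there?"
(stage direction, ignored)
'''
for character, dialogue in parse_script(script):
    print(character, "->", dialogue)
```

Each pair could then be synthesized with that character's reference voice and the resulting clips concatenated in order. That stitching step is left out here because it depends on how the server exposes voice cloning.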

📜 License

The underlying model (bosonai/higgs-audio-v3-tts-4b) is released under the Boson Higgs Audio v3 Research and Non-Commercial License.
This application code is MIT licensed.
Commercial deployment of the model requires a separate license from Boson AI.

Built with ❤️ using Higgs Audio v3 by Boson AI
