
47thtechcorner/RayCodes_HiggsAudio


🎙️ Generative AI Storyteller - Powered by Higgs Audio v3 TTS


An elegant, interactive AI storytelling app powered by Boson AI's Higgs Audio v3 - the state-of-the-art open-weights TTS model with expressive emotions, sound effects, voice cloning, and 100+ language support.



✨ Key Features

  • 🎭 Expressive Emotions - Inject <|emotion:fear|>, <|emotion:awe|>, <|emotion:enthusiasm|> and 19 more inline
  • 🔊 Sound Effects - Native <|sfx:laughter|>, <|sfx:sigh|>, <|sfx:sneeze|> and more
  • 🎙️ Style Control - Switch between <|style:whispering|>, <|style:shouting|>, <|style:singing|>
  • Prosody Control - Speed, pitch, pauses: <|prosody:speed_slow|>, <|prosody:pitch_high|>
  • 🌍 100+ Languages - Single-digit WER/CER across 85+ production-quality languages
  • 🔁 Zero-Shot Voice Cloning - Supply a reference WAV to replicate any voice
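As a minimal sketch of how the inline tags compose, the hypothetical helper below formats `<|category:name|>` tokens and concatenates them with spoken text. The four categories come from the examples in this README; the lowercase-with-underscores naming rule is an assumption inferred from those examples, not a documented grammar.

```python
import re

# Tag categories that appear in this README's examples.
KNOWN_CATEGORIES = {"emotion", "sfx", "style", "prosody"}

# ASSUMPTION: tag names are lowercase words joined by underscores,
# as in <|prosody:speed_slow|>; inferred from the examples only.
_NAME_RE = re.compile(r"^[a-z_]+$")


def tag(category: str, name: str) -> str:
    """Format one Higgs v3 inline tag, e.g. tag("emotion", "awe")."""
    if category not in KNOWN_CATEGORIES:
        raise ValueError(f"unknown tag category: {category!r}")
    if not _NAME_RE.match(name):
        raise ValueError(f"malformed tag name: {name!r}")
    return f"<|{category}:{name}|>"


def compose(*parts: str) -> str:
    """Concatenate tags and text fragments into a single synthesis script."""
    return "".join(parts)


if __name__ == "__main__":
    script = compose(
        tag("emotion", "awe"),
        "The dragon unfolded its wings. ",
        tag("sfx", "sigh"),
        "It was beautiful.",
    )
    print(script)
```

Validating names up front catches typos such as a missing underscore before a request is ever sent to the server.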

🧱 Tech Stack

| Component | Technology |
|-----------|------------|
| Language | Python 3.10+ |
| UI Framework | Streamlit ≥ 1.35 |
| Model Server | SGLang-Omni (Docker) |
| HTTP Client | requests |
| Audio I/O | soundfile, numpy |
| Model | Reza2kn/Higgs-Audio-v3-TTS-4bit-AWQ (4-bit AWQ, ~2.5 GB) |

📁 File Map

Higgs Audio/
├── app.py               ← Core Streamlit application (single file, ~120 lines)
├── requirements.txt     ← Python dependencies (pip installable)
├── README.md            ← This file
└── output.wav           ← Auto-generated on each synthesis run (gitignored)

app.py

The single-file Streamlit frontend. Handles:

  • Dark Obsidian-themed UI with custom CSS
  • 3 premade story templates with Higgs v3 inline tags
  • Manual text area for custom scripts
  • Generation parameter controls (temperature, top-K, max tokens)
  • HTTP POST to SGLang-Omni server at http://localhost:8000/v1/audio/speech
  • In-browser WAV playback + one-click download
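A minimal sketch of the kind of request app.py sends, written with the standard library's `urllib` instead of `requests` so it runs without extra installs. The endpoint is the one listed above; the JSON field names (`model`, `input`, `response_format`, `temperature`, `top_k`, `max_tokens`), the model id value, and the default sampling values are assumptions modeled on OpenAI-style speech APIs, so check them against SGLang-Omni's documentation and app.py before relying on them.

```python
import json
import urllib.request

# Endpoint from this README; the SGLang-Omni container must already be running.
SPEECH_URL = "http://localhost:8000/v1/audio/speech"


def build_payload(text: str, temperature: float = 0.7, top_k: int = 50,
                  max_tokens: int = 2048) -> dict:
    """Assemble the JSON body for one synthesis request.

    ASSUMPTION: field names follow OpenAI-style speech APIs plus sampling
    fields; defaults are illustrative, not the app's actual values.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if top_k < 1 or max_tokens < 1:
        raise ValueError("top_k and max_tokens must be positive")
    return {
        "model": "Reza2kn/Higgs-Audio-v3-TTS-4bit-AWQ",  # assumed model id
        "input": text,
        "response_format": "wav",
        "temperature": temperature,
        "top_k": top_k,
        "max_tokens": max_tokens,
    }


def synthesize(text: str, out_path: str = "output.wav",
               timeout: float = 300.0, **sampling) -> str:
    """POST the script to the server and write the returned WAV bytes to disk."""
    body = json.dumps(build_payload(text, **sampling)).encode("utf-8")
    request = urllib.request.Request(
        SPEECH_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server up, `synthesize("<|emotion:awe|>The stars went out one by one.", temperature=0.8)` would write output.wav, matching the file name in the File Map.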

requirements.txt

Minimal Python dependencies. The heavy model computation runs inside Docker (SGLang-Omni), so no GPU drivers or ML frameworks are needed in the Python environment.


🚀 Setup

Prerequisites

  • Windows 10/11 with PowerShell 5.1+
  • Python 3.10+
  • Docker Desktop for Windows with GPU support enabled
  • NVIDIA GPU with ≥ 4 GB VRAM + NVIDIA Container Toolkit
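A quick preflight check for the prerequisites that can be detected from Python. This is a hypothetical helper, not part of the repo; it only looks for the `docker` and `nvidia-smi` executables on PATH, so it cannot confirm VRAM size or that Docker actually has GPU access.

```python
import shutil
import sys


def preflight(min_python=(3, 10)) -> dict:
    """Return a name -> bool map of the prerequisites this script can detect."""
    return {
        "python": sys.version_info[:2] >= min_python,
        # The Docker Desktop installer typically puts the docker CLI on PATH.
        "docker": shutil.which("docker") is not None,
        # nvidia-smi ships with the NVIDIA driver; finding it only hints
        # that a GPU driver is installed.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'OK  ' if ok else 'MISS'} {name}")
```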

Step 1 - Install Python Dependencies

pip install -r requirements.txt
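To confirm the install worked, a small hypothetical check can look up each package from the Tech Stack table without importing it. The list assumes requirements.txt contains exactly these packages; adjust it if the file differs.

```python
import importlib.util

# Import names for the packages in the Tech Stack table (assumed to match requirements.txt).
REQUIRED = ("streamlit", "requests", "soundfile", "numpy")


def missing_packages(names=REQUIRED) -> list:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```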

Step 2 - Download the 4-bit AWQ Model

Install the Hugging Face CLI (it ships with the huggingface_hub package) and download the quantized model directly (no account or token required):

pip install huggingface_hub
huggingface-cli download Reza2kn/Higgs-Audio-v3-TTS-4bit-AWQ --local-dir .\model

The model is ~2.5 GB. It is downloaded once into the model\ folder, and later runs reuse the local copy.
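After the download, a rough sanity check is to total the files under model\. The helper below is hypothetical; it reports decimal gigabytes (10^9 bytes), and the README does not say whether "~2.5 GB" is decimal or binary, so expect only a ballpark match.

```python
from pathlib import Path


def folder_size_gb(folder="model") -> float:
    """Sum the sizes of all files under `folder`, in decimal gigabytes."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; run the download step first")
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total_bytes / 1e9


if __name__ == "__main__":
    print(f"model folder: {folder_size_gb():.2f} GB")
```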

Step 3 - Run the Streamlit App

Make sure the SGLang-Omni server (Docker) is running and serving http://localhost:8000, then open a new PowerShell terminal:

streamlit run app.py

The app opens automatically at http://localhost:8501.
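Because the app sends every request to the SGLang-Omni server, it can help to confirm that something is listening on port 8000 before generating audio. A minimal sketch using a plain TCP connect; the helper name is hypothetical.

```python
import socket


def server_is_up(host="localhost", port=8000, timeout=2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This only proves a listener exists; it does not confirm that the listener
    is SGLang-Omni or that the model has finished loading.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


if __name__ == "__main__":
    print("server reachable" if server_is_up() else "nothing listening on :8000")
```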


🎮 Usage

  1. Select a template from the dropdown (Fantasy, Thriller, or Kids Story)
  2. Edit or write your own story using Higgs v3 inline tags
  3. Adjust temperature, top-K, and max tokens as desired
  4. Click 🎬 Generate Story Audio
  5. Listen in the browser player or download the WAV file
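Once the WAV is downloaded, its basic properties can be inspected with the standard library. This sketch assumes the server returns uncompressed PCM WAV; the `wave` module cannot read floating-point WAV files, in which case `soundfile.info(path)` (soundfile is already a dependency) reports sample rate, channels and duration instead.

```python
import wave


def wav_info(path="output.wav") -> dict:
    """Read channel count, sample rate, sample width and duration of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    print(wav_info())
```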

Higgs v3 Tag Quick Reference

Emotions:    <|emotion:enthusiasm|>  <|emotion:fear|>  <|emotion:awe|>  <|emotion:sadness|>
Style:       <|style:whispering|>    <|style:shouting|>  <|style:singing|>
Sound FX:    <|sfx:laughter|>Haha    <|sfx:sigh|>Ugh    <|sfx:sneeze|>Achoo
Prosody:     <|prosody:speed_slow|>  <|prosody:pitch_high|>  <|prosody:long_pause|>

💡 Tip: Place emotion/style tokens at the start of the text. Place <|sfx:...|> and pause tokens inline exactly where they fire.
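The tip above can be turned into a simple linter. This is a hypothetical helper: it warns when an emotion or style tag appears after spoken text, and when a tag uses a category outside the four in the quick reference. It is advisory only, since the README does not say how the model treats misplaced tags.

```python
import re

# Matches tags such as <|emotion:fear|> or <|prosody:speed_slow|>.
TAG_RE = re.compile(r"<\|([a-z_]+):([a-z_]+)\|>")
KNOWN = {"emotion", "style", "sfx", "prosody"}
LEADING_ONLY = {"emotion", "style"}  # the tip says these belong at the start


def lint_tags(text: str) -> list:
    """Return warnings for misplaced emotion/style tags and unknown categories."""
    warnings = []
    seen_text = False  # becomes True once any non-whitespace text has appeared
    pos = 0
    for m in TAG_RE.finditer(text):
        if text[pos:m.start()].strip():
            seen_text = True
        category = m.group(1)
        if category not in KNOWN:
            warnings.append(f"unknown category in {m.group(0)}")
        elif category in LEADING_ONLY and seen_text:
            warnings.append(f"{m.group(0)} appears after spoken text; move it to the start")
        pos = m.end()
    return warnings
```

Sound-effect and prosody tags are deliberately allowed anywhere, since the tip places them inline where they fire.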


🎯 Use Cases

| # | Use Case | Description |
|---|----------|-------------|
| 1 | 🎮 Expressive Video Game NPC Dialogues | Generate dynamic, emotionally reactive NPC speech at runtime. Each dialogue line adapts tone based on in-game state - fear during combat, joy at victory, confusion when lost. |
| 2 | 📚 Immersive Audiobooks | Convert long-form written chapters into multi-voice narrated audio with automatic emotion pacing, dramatic pauses, and sound effects that match the text mood. |
| 3 | 🧘 AI Meditation Guides | Produce calm, slow-paced meditation scripts with whispering style, low pitch, and long pauses for breathing cues - fully customizable per session. |
| 4 | 📞 Real-Time Interactive Voice Response (IVR) | Power voice menus and AI call center responses with warm, natural-sounding speech rather than robotic TTS - switchable per brand persona. |
| 5 | 🧒 Dynamic Kids Storyboards | Create interactive bedtime stories where children hear characters laugh, gasp, sneeze, and sing - with the narration adapting to chosen story paths in real time. |
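For the meditation use case, a script can be assembled from tags in the quick reference: whispering style and slow speed up front, and a long pause between lines as a breathing cue. A minimal sketch; the function name and the `breath_pauses` parameter are hypothetical, and low pitch is left out because only `pitch_high` appears in this README's tag list.

```python
HEADER = "<|style:whispering|><|prosody:speed_slow|>"  # leading style/speed tags
PAUSE = "<|prosody:long_pause|>"                       # inline breathing cue


def meditation_script(steps, breath_pauses: int = 1) -> str:
    """Join guided-meditation lines into one tagged script with pauses between them."""
    if breath_pauses < 0:
        raise ValueError("breath_pauses must be >= 0")
    pause = PAUSE * breath_pauses
    separator = f" {pause} " if pause else " "
    return HEADER + separator.join(step.strip() for step in steps)


if __name__ == "__main__":
    print(meditation_script(["Breathe in slowly.", "Hold.", "Breathe out."], breath_pauses=2))
```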

🔭 Future Expansion Ideas

| # | Idea | Description |
|---|------|-------------|
| 1 | 🎵 Automated Ambient Background Music Mixdown | Analyse story mood and auto-select/mix royalty-free background music from a library, fading in/out with the narrative arc using pydub. |
| 2 | 📝 Visual Subtitle Tracking | Sync generated audio with word-level timestamps (via Whisper forced-alignment) and render scrolling karaoke-style subtitles in the browser. |
| 3 | 👥 Live Multi-Character Speaker Diarization | Assign distinct Higgs voice clones to named characters. Parse Character: "dialogue" format and stitch per-character audio into a single multi-voice scene render. |
| 4 | ⚡ Local Model Quantization with AWQ | Integrate the Reza2kn/Higgs-Audio-v3-TTS-4bit-AWQ 4-bit model for direct local inference without Docker - targeting ≥ 4GB VRAM consumer GPUs via AutoAWQ. |
| 5 | 🧠 Sentence-Level Emotion Auto-Generation via LLM Layer | Pass story text through a small LLM (e.g., Qwen2.5-3B) to automatically predict and inject the optimal Higgs emotion/style/sfx tags before TTS synthesis - zero manual tagging required. |
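Idea 3 starts with parsing the `Character: "dialogue"` format. A minimal sketch of that step follows; the function names and the character-to-reference-WAV mapping are hypothetical, and the per-character synthesis and audio stitching are not shown.

```python
import re

# One utterance per line, e.g.:  Knight: "<|emotion:fear|>Who goes there?"
LINE_RE = re.compile(r'^\s*([^:"]+?)\s*:\s*"(.*)"\s*$')


def parse_scene(script: str) -> list:
    """Split a scene into (character, dialogue) pairs, skipping blank lines."""
    turns = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f'line {lineno} is not in Character: "dialogue" form: {line!r}')
        turns.append((m.group(1), m.group(2)))
    return turns


def voice_plan(turns, voices: dict) -> list:
    """Pair each turn's text with the reference WAV assigned to its character.

    `voices` maps character name -> reference WAV path (hypothetical files).
    """
    missing = sorted({character for character, _ in turns} - set(voices))
    if missing:
        raise KeyError(f"no reference voice for: {', '.join(missing)}")
    return [(voices[character], text) for character, text in turns]
```

Because the character name stops at the first colon, inline tags such as `<|emotion:fear|>` inside the quoted dialogue are passed through untouched.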

📜 License

The underlying model (bosonai/higgs-audio-v3-tts-4b) is released under the Boson Higgs Audio v3 Research and Non-Commercial License.
This application code is MIT licensed.
Commercial deployment of the model requires a separate license from Boson AI.


Built with ❤️ using Higgs Audio v3 by Boson AI

About

Run Higgs Audio v3 TTS locally — expressive emotions, voice cloning & 100+ languages via Streamlit + SGLang
