Feature Request
Duix-Avatar is an impressive open-source digital human toolkit. I'd like to suggest adding SenseVoice (via FunASR) as a built-in ASR engine for voice interaction.
Why SenseVoice for digital humans?
- Non-autoregressive: latency is low and roughly fixed (~100ms) rather than growing with output length, which is critical for natural conversation with avatars
- More than 5x faster than Whisper-Small (per the SenseVoice project's reported figures): reduces total conversation latency significantly
- 234M params: lightweight, leaves GPU memory for avatar rendering
- Emotion detection: tags the speaker's emotion, enabling responsive avatar expressions
- 50+ languages: a single model for multilingual avatar interaction
- VAD through FunASR: pairs with the fsmn-vad model for endpoint detection and natural turn-taking
Digital human integration benefits
- Lower latency = more natural conversation flow
- Emotion awareness = avatar can react to user's emotional state
- Audio event detection = avatar responds to laughter, applause
- Streaming = real-time partial results for immediate avatar response
Quick integration
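from funasr import AutoModel

# Load SenseVoiceSmall together with the FSMN VAD model, which segments longer audio
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
# audio_chunk is a placeholder: a path to an audio file or a waveform array
result = model.generate(input=audio_chunk, language="auto")
# result[0]["text"] holds the transcript with inline tags (language, emotion, audio event)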
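To make the emotion and audio-event points concrete, here is a minimal sketch of how the tagged output could be turned into an avatar reaction. It assumes the raw text carries inline tags of the form `<|HAPPY|>` or `<|Laughter|>`. The tag sets below are my understanding of SenseVoice's labels and should be checked against the model card. `parse_sensevoice_text`, `pick_reaction`, and the reaction names are hypothetical and not part of FunASR or Duix-Avatar.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed tag vocabularies; verify against the SenseVoice model card.
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL", "DISGUSTED", "SURPRISED"}
EVENTS = {"BGM", "Applause", "Laughter", "Cry", "Sneeze", "Breath", "Cough"}

# Matches inline tags such as <|en|> or <|HAPPY|>.
TAG_RE = re.compile(r"<\|([^|]+)\|>")


class ParsedUtterance(NamedTuple):
    text: str               # transcript with all tags removed
    emotion: Optional[str]  # first emotion tag found, if any
    events: List[str]       # audio-event tags found, in order


def parse_sensevoice_text(raw: str) -> ParsedUtterance:
    """Split a tagged transcript into plain text, emotion, and audio events."""
    tags = TAG_RE.findall(raw)
    emotion = next((t for t in tags if t in EMOTIONS), None)
    events = [t for t in tags if t in EVENTS]
    text = TAG_RE.sub("", raw).strip()
    return ParsedUtterance(text, emotion, events)


# Hypothetical mapping from detected emotion to an avatar expression name.
REACTIONS = {"HAPPY": "smile", "SAD": "concerned", "ANGRY": "calm"}


def pick_reaction(parsed: ParsedUtterance) -> str:
    """Choose an avatar reaction; laughter takes priority over emotion."""
    if "Laughter" in parsed.events:
        return "laugh"
    return REACTIONS.get(parsed.emotion, "neutral")


parsed = parse_sensevoice_text("<|en|><|HAPPY|><|Speech|><|withitn|>Nice to meet you.")
print(parsed.text, parsed.emotion, pick_reaction(parsed))
```

In a real integration the input string would come from `result[0]["text"]`, and the reaction name would be handed to whatever expression or animation control the avatar exposes.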