
Feature Request: Add SenseVoice/FunASR as built-in ASR engine #607

@LauraGPT

Feature Request

Duix-Avatar is an impressive open-source digital human toolkit. I suggest adding SenseVoice / FunASR as a built-in ASR engine for voice interaction.

Why SenseVoice for digital humans?

  • Non-autoregressive: fixed low latency (~100ms), critical for natural conversation with avatars
  • 5x faster than Whisper: reduces total conversation latency significantly
  • 234M params: lightweight, leaves GPU memory for avatar rendering
  • Emotion detection: detects speaker emotions, enables responsive avatar expressions
  • 50+ languages: single model for multilingual avatar interaction
  • VAD support: FunASR pairs the model with a VAD model (fsmn-vad in the snippet under Quick integration) for endpoint detection and natural turn-taking

Digital human integration benefits

  1. Lower latency = more natural conversation flow
  2. Emotion awareness = avatar can react to user's emotional state
  3. Audio event detection = avatar responds to laughter, applause
  4. Streaming = real-time partial results for immediate avatar response

Quick integration

from funasr import AutoModel

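# Load SenseVoiceSmall together with the FSMN VAD model for speech segmentation
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
# audio_chunk is a placeholder for your input, for example a path to an audio file
result = model.generate(input=audio_chunk)
# result[0]["text"] holds the transcript with inline language, emotion, and event tags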
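
To act on the emotion and audio-event tags, the avatar side needs them separated from the transcript. The following is a minimal sketch, assuming the raw text carries inline tags of the form `<|en|><|HAPPY|><|Laughter|><|woitn|>...`. The label sets, the `ParsedUtterance` type, and the `parse_sensevoice_text` helper are illustrative names, not part of FunASR or Duix-Avatar, and the labels should be checked against the model card.

```python
import re
from dataclasses import dataclass, field

# Assumed label sets; verify against the SenseVoice model card before use.
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL", "DISGUSTED", "SURPRISED"}
EVENTS = {"BGM", "Speech", "Applause", "Laughter", "Cry", "Sneeze", "Breath", "Cough"}

# Matches inline tags such as <|HAPPY|> and captures the label between the pipes.
TAG = re.compile(r"<\|([^|]*)\|>")


@dataclass
class ParsedUtterance:
    """Hypothetical container for one recognized utterance."""
    text: str
    emotions: list = field(default_factory=list)
    events: list = field(default_factory=list)
    other_tags: list = field(default_factory=list)  # language, ITN markers, etc.


def parse_sensevoice_text(raw: str) -> ParsedUtterance:
    """Split a tagged transcript into plain text plus categorized tags."""
    parsed = ParsedUtterance(text=TAG.sub("", raw).strip())
    for tag in TAG.findall(raw):
        if tag in EMOTIONS:
            parsed.emotions.append(tag)
        elif tag in EVENTS:
            parsed.events.append(tag)
        else:
            parsed.other_tags.append(tag)
    return parsed


if __name__ == "__main__":
    sample = "<|en|><|HAPPY|><|Laughter|><|woitn|>that is great"
    print(parse_sensevoice_text(sample))
```

With this split, the avatar layer could drive expressions from `emotions` and reactions from `events` while the dialogue layer receives only `text`. FunASR also ships its own post-processing helper, `rich_transcription_postprocess` in `funasr.utils.postprocess_utils`, which may be preferable when only clean display text is needed.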
