Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Updated Jun 8, 2026 - Python
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.
Local voice-to-text for macOS and iOS. Multilingual (EN/ZH/JP) with Traditional Chinese output. Runs Qwen3-ASR on Apple Silicon via MLX. No cloud, no subscription.
Production pipeline around OpenAI gpt-4o-transcribe-diarize for long-form 2-speaker interviews. Cross-chunk speaker consistency · diarization hallucination fix · async GPT-5.5 domain-term correction. WER 6.05% / DER 4.28% on 2h26m benchmark. Beats raw OpenAI API by +11.5 Q.
Private, local-first meeting recorder + transcription, diarization, AI notes, voice dictation & read-aloud for Windows — runs on your own GPU.
Live speech-to-text streaming on Apple Silicon — Qwen3-ASR + Silero VAD + MLX
Free, private, on-device dictation for macOS (Apple Silicon). Push-to-talk speech-to-text with on-device LLM cleanup — an offline, local alternative to cloud & Whisper dictation. Parakeet TDT v3 ASR + Qwen 2.5 1.5B cleanup + macOS Accessibility injection. Pure Rust, ~300–400 ms per utterance, nothing leaves your Mac.
OpenAI-compatible speech-to-text server for nvidia/nemotron-3.5-asr-streaming-0.6b (NeMo). Runs on the DGX Spark / GB10.
Local FastAPI transcription studio: AssemblyAI Universal-2 (99 languages), FFmpeg, yt-dlp, Word/PDF/ZIP export
Claude Code skill that transcribes audio and video to structured Markdown with timestamps and diarization, using Google Gemini. A free replacement for ElevenLabs Scribe / Whisper for anyone already paying for Gemini.
Voice-to-text desktop app that captures speech, refines transcripts with AI, and auto-pastes at your cursor
Build a local, real-time voice assistant for Apple Silicon using MLX for speech recognition, vision-language model responses, and text-to-speech synthesis.