AirPlay and AirPlay 2 audio player
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
PyTorch implementation of convolutional neural network-based text-to-speech synthesis models
100M parameter lightweight conversational text-to-speech model with breaths, laughter, multi-speaker dialogue, voice cloning, and streaming. Llama-based, on-device.
Chinese (Mandarin) text-to-speech synthesis, built on FastSpeech 2, implemented in PyTorch, using WaveGlow as the vocoder and trained on the Biaobei and AISHELL-3 datasets
VoxNovel: generate audiobooks in which each character gets a different voice actor.
A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA Transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.
Two-talker speech separation with LSTM/BLSTM using the permutation invariant training (PIT) method.
A non-autoregressive end-to-end text-to-speech system (text-to-wav), supporting a family of SOTA unsupervised duration modeling approaches. This project grows with the research community, aiming to achieve the ultimate E2E-TTS.
This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture.
Adaptive and focusing neural layers for the multi-speaker separation problem
Draft to Take beta: local-first AI audio production studio powered by IndexTTS2, Docker, Qwen, OmniVoice, SFX, ambience, and music sidecars.
PyTorch implementation of Google's "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions." This implementation supports both single- and multi-speaker TTS, along with several techniques to improve the robustness and efficiency of the model.
Multi-speaker FastSpeech 2 for Korean, with detailed descriptions of training and synthesis.
🎵 Complete offline audio transcription system with speaker diarization using OpenAI Whisper and PyAnnote. Features automatic audio cleaning, precise timestamps, multiple output formats (JSON/TXT/Markdown), and support for 20+ audio formats. No external APIs required - works entirely offline.
An Algorithm for Speaker Recognition in a Multi-Speaker Environment
Urdu speech recognition using Kaldi ASR, trained with triphone acoustic GMMs on the PRUS dataset.
A professional CLI for Google Gemini 2.5's native TTS. Generate multi-speaker podcasts ("Deep Dive"), audio summaries, and expressive speech from text or files.
AI voice cloning panel that generates multi-speaker discussions between famous personalities on any topic, powered by Qwen3-TTS
Grid-style audio router