Get up and running with the CosyVoice2 API in minutes!
# 1. Clone repository
git clone https://github.com/sin-tag/CosyVoice2-API.git
cd CosyVoice2-API
# 2. Run automated setup
chmod +x scripts/setup_conda.sh
./scripts/setup_conda.sh
# 3. Activate environment
conda activate cosyvoice2-api
# 4. Download model
python scripts/download_model.py
# 5. Verify installation (optional)
python scripts/verify_installation.py
# 6. Start server (choose one method)
python start.py # Auto-installs missing deps (recommended)
# OR
python run_uvicorn.py # Uvicorn with proper path setup
# OR
python main.py # Direct start
# OR
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1 # Now works!
🎉 Done! API is now running at http://localhost:8000
✅ All startup methods now work:
- python start.py - Auto-installs dependencies
- python run_uvicorn.py - Uvicorn wrapper
- uvicorn main:app ... - Direct uvicorn (fixed!)
- Auto-install missing: pydantic-settings, whisper, WeTextProcessing
- Fixed: ModuleNotFoundError: No module named 'app.models'
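Whichever startup method you pick, the server may need a moment before it answers requests. The sketch below polls the health endpoint until it responds. It assumes that `/health` returns HTTP 200 once the server is ready; the response body is not inspected, and the helper names `is_healthy` and `wait_until_healthy` are illustrative, not part of this repository.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET <base_url>/health answers with HTTP 200 (assumed ready signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_healthy(probe=is_healthy, attempts=30, delay=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, pausing `delay` seconds between tries.

    Returns the number of tries it took, or None if the probe never succeeded.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

Calling `wait_until_healthy()` with no arguments polls http://localhost:8000/health for roughly 30 seconds. The `probe` and `sleep` parameters exist so the loop can be exercised without a running server.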
- Conda/Miniconda installed (download from https://docs.conda.io/en/latest/miniconda.html)
- NVIDIA GPU with CUDA support (recommended)
- 8GB+ RAM and 10GB+ disk space
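A quick way to check most of these prerequisites before running the setup script is sketched below. It only looks for `conda` and `nvidia-smi` on the PATH and measures free disk space in the current directory; it does not check RAM or verify that CUDA actually works. The function name `check_prerequisites` is illustrative and not part of this repository.

```python
import shutil


def check_prerequisites(path=".", min_free_gb=10):
    """Report whether conda and nvidia-smi are on PATH and how much disk is free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "conda": shutil.which("conda") is not None,
        # A GPU is recommended, not required, so a False here is only a warning.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
        "free_disk_gb": round(free_gb, 1),
        "enough_disk": free_gb >= min_free_gb,
    }


report = check_prerequisites()
for name, value in report.items():
    print(f"{name}: {value}")
```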
git clone https://github.com/sin-tag/CosyVoice2-API.git
cd CosyVoice2-API
./scripts/setup_conda.sh
conda activate cosyvoice2-api
python scripts/download_model.py
python main.py
# 1. Clone and enter directory
git clone https://github.com/sin-tag/CosyVoice2-API.git
cd CosyVoice2-API
# 2. Create conda environment
conda create -n cosyvoice2-api python=3.9 -y
conda activate cosyvoice2-api
# 3. Install PyTorch with CUDA
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia -y
# 4. Install audio libraries
conda install -c conda-forge librosa soundfile -y
# 5. Install other dependencies
pip install -r requirements-conda.txt
# 6. Create directories
mkdir -p models voice_cache outputs logs
# 7. Setup configuration
cp .env.example .env
# 8. Download model
python scripts/download_model.py
# 9. Start server
python main.py
# Clone repository
git clone https://github.com/sin-tag/CosyVoice2-API.git
cd CosyVoice2-API
# Create environment from file
conda env create -f environment.yml
conda activate cosyvoice2-api
# Download model and start
python scripts/download_model.py
python main.py
curl http://localhost:8000/health
curl http://localhost:8000/api/v1/voices/pretrained/list
curl -X POST "http://localhost:8000/api/v1/voices/" \
-F "voice_id=test_voice" \
-F "name=Test Voice" \
-F "voice_type=zero_shot" \
-F "prompt_text=Hello world" \
-F "audio_file=@your_voice_sample.wav"
curl -X POST "http://localhost:8000/api/v1/synthesize/sft" \
-H "Content-Type: application/json" \
-d '{"text": "Hello, this is a test!", "voice_id": "pretrained_voice_id"}'
- API Server: http://localhost:8000
- Interactive Docs: http://localhost:8000/docs
- Alternative Docs: http://localhost:8000/redoc
- Health Check: http://localhost:8000/health
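The synthesis request shown with curl can also be issued from Python using only the standard library. The sketch below mirrors the `/api/v1/synthesize/sft` call with the same JSON fields (`text`, `voice_id`). The response format is not described here, so `send` simply returns the status code and raw bytes; both helper names are illustrative rather than part of this repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_sft_request(text, voice_id, base_url=BASE_URL):
    """Build (but do not send) the POST request for /api/v1/synthesize/sft."""
    payload = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/synthesize/sft",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return (status code, raw response body)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read()


# "pretrained_voice_id" is the placeholder from the curl example; replace it
# with an ID returned by /api/v1/voices/pretrained/list.
req = build_sft_request("Hello, this is a test!", "pretrained_voice_id")
print(req.get_method(), req.full_url)
```

With the server running, `status, body = send(req)` performs the call. Building and sending are kept separate so the request can be inspected or logged before it goes out.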
Solution: Install Miniconda from https://docs.conda.io/en/latest/miniconda.html
Solution:
# Check NVIDIA driver
nvidia-smi
# Reinstall PyTorch with CUDA
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia --force-reinstall
Solution:
# Ubuntu/Debian
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
# CentOS/RHEL
sudo yum install ffmpeg
Solution:
# Try a manual download with a different model
python scripts/download_model.py --model-id iic/CosyVoice-300M
# Or check internet connection and disk space
Solution:
# Run the automatic fix script
python scripts/fix_dependencies.py
# Or make sure you're in the right environment
conda activate cosyvoice2-api
# Reinstall dependencies
pip install -r requirements-conda.txt --force-reinstall
- Read the API Documentation: Visit http://localhost:8000/docs
- Try the Examples: Check API_EXAMPLES.md
- Production Setup: See DEPLOYMENT.md
- Detailed Setup: Read CONDA_SETUP.md
- Quick Fixes: Run python scripts/fix_dependencies.py
- Detailed Setup: CONDA_SETUP.md
- Troubleshooting: TROUBLESHOOTING.md
- API Examples: docs/API_EXAMPLES.md
- Deployment Guide: DEPLOYMENT.md
Once your API is running, you can:
- Add custom voices for zero-shot cloning
- Synthesize speech in multiple languages
- Use cross-lingual voice cloning
- Control synthesis with natural language instructions
- Integrate with your applications via REST API
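As a starting point for REST integration, the sketch below builds the same multipart form that the curl voice-registration example sends to `/api/v1/voices/`, using only the standard library. The field names (`voice_id`, `name`, `voice_type`, `prompt_text`, `audio_file`) come from that curl command. The `audio/wav` part content type and the helper name `build_voice_upload` are assumptions, not something this repository specifies.

```python
import urllib.request
import uuid


def build_voice_upload(voice_id, name, prompt_text, audio_bytes,
                       filename="your_voice_sample.wav",
                       base_url="http://localhost:8000"):
    """Build (but do not send) the multipart POST for /api/v1/voices/."""
    boundary = uuid.uuid4().hex
    fields = {
        "voice_id": voice_id,
        "name": name,
        "voice_type": "zero_shot",
        "prompt_text": prompt_text,
    }
    parts = []
    # Plain text form fields, one multipart section each.
    for key, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # File section; the audio/wav content type is an assumption.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{base_url}/api/v1/voices/",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Placeholder bytes stand in for a real recording; read your WAV file instead.
upload = build_voice_upload("test_voice", "Test Voice", "Hello world", b"RIFF placeholder")
print(upload.get_method(), upload.full_url, len(upload.data), "bytes")
```

Passing the returned request to `urllib.request.urlopen` sends it to a running server. If the `requests` package is available in your environment, its `files=` argument produces the same form with less code.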
Happy voice cloning! 🎤✨