A system for generating dual-host Cantonese radio shows from PDF/text content, with planned real-time Q&A interruption capabilities.
- backend/ - FastAPI backend (Python)
- userstory.md - User stories and requirements
- system_architecture.md - System design and flow
- spec.md - Technical specifications
- Action_plan.md - Development phases
- prompt_design.md - LLM prompt engineering
Phase 1: The Radio - Basic broadcast functionality
- ✅ PDF/TXT upload and text extraction
- ✅ Script generation with dual-host dialogue (Agent A)
- ✅ Audio generation using Azure TTS
- ⏳ Real-time Q&A interruptions (Phase 2)
- ⏳ Polish and optimizations (Phase 3)
See backend/README.md for detailed setup instructions.
With Docker:
- Install Docker Desktop for Windows
- Clone this repository
- Navigate to the backend/ directory
- Copy env_template.txt to .env and configure API keys
- Run: docker-compose up --build
Without Docker (local Python environment):
- Install Python 3.11 or 3.12
- Navigate to the backend/ directory
- Create a virtual environment: python -m venv venv
- Activate it: venv\Scripts\activate (Windows) or source venv/bin/activate (Mac/Linux)
- Install dependencies: pip install -r requirements.txt
- Copy env_template.txt to .env and configure API keys
- Run: uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
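The step of copying env_template.txt to .env can also be scripted so that an existing .env is never overwritten. The following is a minimal sketch, not part of the repository; the helper name `ensure_env_file` is hypothetical, and it assumes both files live directly inside the backend/ directory.

```python
import shutil
from pathlib import Path


def ensure_env_file(backend_dir: str) -> bool:
    """Copy env_template.txt to .env inside backend_dir if .env is missing.

    Returns True if a new .env was created, False if one already existed.
    """
    backend = Path(backend_dir)
    template = backend / "env_template.txt"
    env_file = backend / ".env"

    if env_file.exists():
        # Never overwrite an existing .env; it may already hold real keys.
        return False
    if not template.exists():
        raise FileNotFoundError(f"Template not found: {template}")

    shutil.copyfile(template, env_file)
    return True
```

Calling `ensure_env_file("backend")` from the repository root would create the file on first run; the API keys still have to be filled in by hand afterwards.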
Once running, visit:
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
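To confirm from a script that the server is up, the health check URL above can be polled with the standard library. This is a rough sketch under assumptions: the function names are hypothetical, and it only checks for an HTTP 200 status because the response body of /health is not documented here.

```python
import urllib.request


def endpoint_url(base_url: str, path: str) -> str:
    """Join the server base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Return True if GET /health answers with HTTP 200, else False."""
    try:
        url = endpoint_url(base_url, "/health")
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError both subclass OSError).
        return False
```

`is_healthy()` returns False while the container or uvicorn process is still starting, so it can be called in a retry loop before sending real requests.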
- Backend: FastAPI (Python)
- LLM: Azure OpenAI (GPT-4o) / OpenAI / Anthropic Claude
- TTS: Azure Cognitive Services Speech (Cantonese voices)
- Containerization: Docker
- Windows: Native x86_64 support - optimal for Azure Speech SDK
- Mac M1/M2/M3: May require running Docker with x86_64 emulation, or using OpenAI TTS as a workaround
- Phase 1: Single-pass generation, no interruptions/Q&A yet
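For the Windows and Mac notes above, a setup script could detect an Apple Silicon host before deciding whether emulation or the TTS workaround is needed. This is an illustrative sketch only; `is_apple_silicon` is a hypothetical helper, and it inspects the host Python interpreter, not the inside of a Docker container.

```python
import platform


def is_apple_silicon(system: str = "", machine: str = "") -> bool:
    """Return True when running natively on an Apple Silicon Mac.

    Both arguments default to the values reported by the current
    interpreter; they can be passed explicitly for testing.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    # Native Python on M1/M2/M3 reports Darwin / arm64.
    return system == "Darwin" and machine == "arm64"
```

A Python build running under Rosetta reports x86_64, so this check reflects the interpreter's architecture and not necessarily the hardware.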
[Your License Here]