An AI-powered customer service assistant for Reliance Jio, featuring voice interaction, RAG-based knowledge retrieval, and real-time chat.
- Real-time text chat with AI assistant
- Streaming responses (token-by-token)
- Session management and chat history
- WebSocket support for instant messaging
| Metric | Result |
|---|---|
| End-to-end latency | < 2 seconds |
| Query resolution rate | 95%+ |
| Human escalation reduction | −35% |
- Speech-to-Text (STT): Faster-Whisper (Whisper base.en model)
- Text-to-Speech (TTS): Kokoro-82M (lightweight, high-quality)
- Voice Activity Detection (VAD): Silero-VAD (neural network-based)
- Real-time voice conversations
- Hybrid Search: Vector (BGE embeddings) + BM25 keyword search
- CRAG: Corrective RAG with relevance grading
- Knowledge Base: Comprehensive Jio plans, services, and FAQs
- ChromaDB: Vector database for efficient retrieval
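The hybrid search above merges a vector ranking and a BM25 ranking, and this README names RRF (reciprocal rank fusion) as the fusion step. The following is a minimal sketch of RRF, not the code in `hybrid_retriever.py`: the function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical document IDs, for illustration only.
vector_hits = ["plan_299", "plan_349", "faq_roaming"]
bm25_hits = ["plan_349", "faq_recharge", "plan_299"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are kept but ranked lower.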
- ✅ Prepaid Mobile Plans
- ✅ Postpaid Mobile Plans
- ✅ JioFiber Broadband
- ✅ JioAirFiber (5G Wireless Broadband)
- ✅ International Roaming
- ✅ ISD Calling Rates
- ✅ Digital Services (JioTV, JioCinema, etc.)
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (React) │
│ telecom-voice-companion-main │
├─────────────────────────────────────────────────────────────────┤
│ Backend (FastAPI) │
├──────────────┬──────────────┬──────────────┬────────────────────┤
│ Chat API │ Voice API │ RAG API │ WebSockets │
├──────────────┼──────────────┼──────────────┼────────────────────┤
│ │ │ │ │
│ Ollama LLM │ Whisper STT │ ChromaDB │ Silero VAD │
│ (llama3.1:8b)│ (base.en) │ (Vector DB) │ │
│ │ │ │ │
│ │ Kokoro TTS │ BM25 Index │ │
│ │ (82M model) │ (Keyword) │ │
└──────────────┴──────────────┴──────────────┴────────────────────┘
| Component | Technology | Description |
|---|---|---|
| Framework | FastAPI 0.109 | Async Python web framework |
| LLM | Ollama + llama3.1:8b | Local LLM inference |
| STT | Faster-Whisper (base.en) | Speech-to-Text using Whisper |
| TTS | Kokoro-82M | Lightweight 82M param TTS model |
| VAD | Silero-VAD | Neural network voice activity detection |
| Embeddings | BGE-base-en-v1.5 | State-of-the-art embedding model |
| Vector DB | ChromaDB | Vector database for RAG |
| Search | BM25 + Vector | Hybrid search (RRF fusion) |
| Database | PostgreSQL | Session and chat history storage |
| Cache | Redis | Response caching |
| Component | Technology | Description |
|---|---|---|
| Framework | React + Vite | Modern frontend tooling |
| Styling | TailwindCSS | Utility-first CSS |
| UI | shadcn/ui | Beautiful component library |
| Voice | Web Audio API | Browser audio recording |
- Python 3.12+ (required for Kokoro TTS compatibility)
- Node.js 18+ (for frontend)
- Ollama (for local LLM)
- PostgreSQL (optional, for persistence)
- Redis (optional, for caching)
⚡ CUDA is highly recommended for Kokoro TTS: it provides 5-10x faster audio generation.
# Check if CUDA is available
python -c "import torch; print('CUDA:', torch.cuda.is_available())"
CUDA Setup:
- Install NVIDIA CUDA Toolkit 12.x
- Install cuDNN
- Reinstall PyTorch with CUDA:
pip uninstall torch torchaudio
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
git clone https://github.com/Vijaykrishna2334/Telecom-ai-assistant.git
cd Telecom-ai-assistant
# Install from https://ollama.ai
ollama pull llama3.1:8b
cd backend
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (Linux/Mac)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# For CUDA/GPU support (recommended for Kokoro TTS):
pip uninstall torch torchaudio -y
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# Copy example env
copy .env.example .env # Windows
cp .env.example .env # Linux/Mac
# Edit .env with your settings
Key .env settings:
# LLM
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
# Voice (Both STT and TTS auto-detect GPU/CUDA)
STT_MODEL=base.en
# Device is auto-detected - uses CUDA if available
# Kokoro TTS (auto-detects GPU)
KOKORO_VOICE=af_heart
KOKORO_LANG_CODE=a
cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8080
Backend runs at: http://localhost:8080
cd telecom-voice-companion-main
# Install dependencies
npm install
# Start development server
npm run dev
Frontend runs at: http://localhost:5173
Telecom-ai-assistant/
├── backend/ # FastAPI backend
│ ├── app/
│ │ ├── api/ # API routes
│ │ │ ├── routes/ # REST endpoints
│ │ │ └── websockets/ # WebSocket handlers
│ │ ├── core/ # Config, logging, security
│ │ ├── models/ # Database models
│ │ └── services/
│ │ ├── llm/ # LLM (Ollama) integration
│ │ ├── rag/ # RAG pipeline (CRAG)
│ │ │ ├── crag_chain.py # CRAG orchestrator
│ │ │ ├── hybrid_retriever.py # Vector + BM25
│ │ │ └── ingestion.py # Document chunking
│ │ └── voice/ # Voice processing
│ │ ├── stt.py # Faster-Whisper STT
│ │ ├── kokoro_tts.py # Kokoro TTS
│ │ └── vad.py # Silero VAD
│ ├── requirements.txt
│ └── Dockerfile
│
├── knowledge/ # RAG knowledge base
│ ├── plans/ # Jio plan documents
│ ├── faqs/ # FAQ documents
│ ├── services/ # Service information
│ └── policies/ # Terms, conditions
│
├── telecom-voice-companion-main/ # React frontend
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── pages/ # Page components
│ │ └── hooks/ # Custom hooks
│ └── package.json
│
├── docker-compose.yml # Docker orchestration
└── README.md
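The tree above lists `ingestion.py` as the document chunking step of the RAG pipeline. As a rough illustration of what chunking means here, the sketch below splits text into overlapping word windows. It is not the project's implementation: the function name, the word-based splitting, and the default sizes are assumptions.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of the chunks.
    Both defaults are illustrative assumptions.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_text(sample, chunk_size=4, overlap=1))
```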
- Model: base.en (English optimized)
- Device: Auto-detects CUDA GPU (falls back to CPU)
- Compute Type: float16 (GPU) or int8 (CPU)
- Features: VAD filtering, beam search, Jio vocabulary hints
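The device and compute-type rule in the list above can be sketched as a small helper. This is an assumption about how `stt.py` might structure the choice, not a copy of it; `select_stt_device` and `detect_cuda` are hypothetical names.

```python
def select_stt_device(cuda_available: bool) -> tuple[str, str]:
    """Return (device, compute_type) following the rule described above."""
    if cuda_available:
        return "cuda", "float16"  # GPU path: half precision
    return "cpu", "int8"          # CPU path: int8 quantization


def detect_cuda() -> bool:
    """Report whether PyTorch sees a CUDA device; False if torch is missing."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    device, compute_type = select_stt_device(detect_cuda())
    print(device, compute_type)
    # The pair could then be passed to Faster-Whisper, for example:
    #   from faster_whisper import WhisperModel
    #   model = WhisperModel("base.en", device=device, compute_type=compute_type)
```

Keeping the decision in a pure function makes it easy to test without a GPU.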
# Auto-detection in stt.py
# Uses CUDA GPU if available: RTX 4090 → ~3x faster transcription
# Falls back to CPU with int8 quantization
- Model: Kokoro-82M (82 million parameters)
- Voice: af_heart (American female)
- Sample Rate: 24 kHz
- Device: Auto-detects GPU (CUDA) for faster synthesis
# Configuration in config.py
kokoro_voice: str = "af_heart"
kokoro_lang_code: str = "a"  # 'a' = American English
Available Voices:
| Voice Code | Description |
|---|---|
| af_heart | American Female (warm) |
| af_bella | American Female (professional) |
| am_adam | American Male |
| bf_emma | British Female |
| bm_george | British Male |
- Model: Silero-VAD (neural network)
- Threshold: 0.85 (adjustable)
- Fallback: Energy-based detection
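The fallback described above can be illustrated with a simple RMS energy check. This is a sketch under assumptions, not the code in `vad.py`: the function names and the energy threshold of 0.02 are hypothetical, and only the 0.85 probability threshold comes from this README.

```python
import math
from typing import Optional, Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(
    samples: Sequence[float],
    speech_prob: Optional[float] = None,
    prob_threshold: float = 0.85,    # Silero-VAD threshold from this README
    energy_threshold: float = 0.02,  # assumed value for the energy fallback
) -> bool:
    """Use the neural VAD probability when available, else fall back to energy."""
    if speech_prob is not None:
        return speech_prob >= prob_threshold
    return frame_rms(samples) >= energy_threshold


silence = [0.0] * 512
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(512)]
print(is_speech(silence), is_speech(tone))
```

The energy check is cheap but cannot tell speech from other loud sounds, which is why it suits a fallback rather than the primary detector.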
| Variable | Default | Description |
|---|---|---|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama API URL |
| OLLAMA_MODEL | llama3.1:8b | LLM model name |
| STT_MODEL | base.en | Whisper model size |
| STT_DEVICE | cpu | STT device (cpu/cuda) |
| KOKORO_VOICE | af_heart | TTS voice |
| VAD_THRESHOLD | 0.85 | VAD sensitivity |
| CRAG_TOP_K | 10 | Documents to retrieve |
| DATABASE_URL | postgresql://... | PostgreSQL connection |
| REDIS_URL | redis://localhost:6379 | Redis connection |
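The variables in the table above could be read with their defaults roughly as follows. This is a standard-library sketch, not the project's `config.py`, which may use a different mechanism; the `Settings` class and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_base_url: str
    ollama_model: str
    stt_model: str
    stt_device: str
    kokoro_voice: str
    vad_threshold: float
    crag_top_k: int
    database_url: Optional[str]
    redis_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, using the table's defaults."""
    env = os.environ if env is None else env
    return Settings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3.1:8b"),
        stt_model=env.get("STT_MODEL", "base.en"),
        stt_device=env.get("STT_DEVICE", "cpu"),
        kokoro_voice=env.get("KOKORO_VOICE", "af_heart"),
        vad_threshold=float(env.get("VAD_THRESHOLD", "0.85")),
        crag_top_k=int(env.get("CRAG_TOP_K", "10")),
        # The table elides the full default, so no default is assumed here.
        database_url=env.get("DATABASE_URL"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
    )


defaults = load_settings({})
print(defaults.ollama_model, defaults.vad_threshold, defaults.crag_top_k)
```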
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f backend
# Stop services
docker-compose down
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/chat | Send message, get response |
| POST | /api/v1/chat/stream | Streaming response (SSE) |
| GET | /api/v1/chat/history/{session_id} | Get chat history |
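Because `/api/v1/chat/stream` is described as an SSE endpoint, a client has to split the response into events. The sketch below extracts the `data:` payloads from an iterable of response lines using the standard SSE framing. The function name is hypothetical, and the content of each payload (plain token, JSON, or something else) is not specified in this README, so it is returned as raw text.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event.

    A blank line ends an event; lines starting with ':' are comments
    (often used as keep-alives); multiple data lines in one event are
    joined with newlines.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # The SSE spec discards an unterminated trailing event; this sketch
    # flushes it instead, for convenience.
    if buffer:
        yield "\n".join(buffer)


sample_stream = ["data: Hel\n", "\n", ": keep-alive\n", "data: lo\n", "\n"]
print(list(iter_sse_data(sample_stream)))
```

The same function works on lines read from any HTTP client that exposes the response body line by line.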
| Method | Endpoint | Description |
|---|---|---|
| WS | /api/v1/ws/voice | Real-time voice WebSocket |
| POST | /api/v1/voice/transcribe | Transcribe audio |
| POST | /api/v1/voice/synthesize | Generate speech |
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/knowledge/search | Search knowledge base |
| POST | /api/v1/knowledge/ingest | Ingest documents |
cd backend
# Run all tests
pytest
# Run with coverage
pytest --cov=app
# Run specific test
pytest tests/test_rag.py -v
Error: Cannot connect to Ollama
Solution: Ensure Ollama is running:
ollama serve
# In another terminal:
ollama pull llama3.1:8b
CUDA available: False
Solution: Install PyTorch with CUDA support:
pip uninstall torch torchaudio -y
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
ModuleNotFoundError: No module named 'kokoro'
Solution:
pip install 'kokoro>=0.9.2' soundfile
Error: Collection not found
Solution: Knowledge base auto-ingests on startup. If issues persist:
cd backend
python reingest_knowledge.py
Error: Address already in use :8080
Solution: Change port in .env:
PORT=8081
Failed to load FFmpeg extension
This is a non-critical warning from torchaudio. TTS will still work.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama - Local LLM inference
- Faster-Whisper - Fast Whisper implementation
- Kokoro TTS - Lightweight TTS model
- Silero-VAD - Voice activity detection
- ChromaDB - Vector database
- FastAPI - Modern Python web framework
For issues or questions:
- Create a GitHub Issue
- Email: vijaykrishna2334@gmail.com
Built with ❤️ for better customer service