Conversational Professor Clone - Digital Twin

Welcome to the Conversational Professor Clone project! This repository contains the complete pipeline for generating a digital twin of a professor, enabling real-time voice-to-voice bidirectional conversations. The project covers automated data preprocessing, model fine-tuning, and a low-latency asynchronous deployment architecture over WebSockets.

Demo Video

Pipeline Architecture Overview

Preprocessing

In the preprocessing phase, we scrape a lecture from each professor and process it through an automated pipeline to create 10-30 second clips with minimal human intervention.

Transcription & Alignment: The full lecture .mp3 is passed to our ASR model (Qwen3 1.7B ASR) coupled with a forced aligner, which yields word-level start and end timestamps.
Smart Partitioning: Our partitioning algorithm cuts the audio only at the ends of words and phrases. This yields clean clips with no words clipped at the beginning or end, while preserving context and consistency.
Evaluation: We use a small internal tool for rapid human evaluation and correction.
Dataset: The processed audio segments can be found at the ASU Professors Voice Cloning Dataset.
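
The partitioning step can be sketched as follows. This is a minimal illustration, not the repository's algorithm: the `Word` type, the `partition` function, and the `pause` threshold are hypothetical names and values, and the sketch assumes the aligner returns one start/end pair per word, in seconds.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def partition(words, min_len=10.0, max_len=30.0, pause=0.4):
    """Group aligned words into clips that always end on a word boundary.

    A clip is closed at a pause once it is at least min_len long, and is
    force-closed before any word that would push it past max_len.
    """
    clips, current = [], []
    for i, word in enumerate(words):
        # Force a cut before this word if adding it would exceed max_len.
        if current and word.end - current[0].start > max_len:
            clips.append(current)
            current = []
        current.append(word)
        duration = current[-1].end - current[0].start
        nxt = words[i + 1] if i + 1 < len(words) else None
        gap = (nxt.start - word.end) if nxt else float("inf")
        # Prefer to cut at a natural pause once the clip is long enough.
        if duration >= min_len and gap >= pause:
            clips.append(current)
            current = []
    if current:
        clips.append(current)  # trailing remainder, may be shorter than min_len
    return [(c[0].start, c[-1].end, " ".join(w.text for w in c)) for c in clips]
```

Each returned tuple is (clip start, clip end, transcript). Because cuts only happen between words, no clip begins or ends in the middle of a word.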

Training

The training step fine-tunes the VoxCPM 2 model with LoRA to capture the nuance, tone, and speaking style of the professor. We train lightweight LoRA adapters on the clean segments from the preprocessing pipeline, using an A100 GPU instance for efficiency and fast iteration.
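
To show why LoRA adapters are lightweight, the arithmetic below compares the trainable parameters of one adapter with those of the full weight matrix it adapts. The layer size and rank are hypothetical values chosen for illustration; they are not taken from the VoxCPM 2 configuration used in this project.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter on a d_out x d_in weight."""
    # A is (rank x d_in) and B is (d_out x rank); the base weight stays frozen.
    return rank * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_in * d_out


d_in = d_out = 1024  # hypothetical layer size
rank = 8             # hypothetical LoRA rank
adapter = lora_param_count(d_in, d_out, rank)
full = full_param_count(d_in, d_out)
print(adapter, full, f"{adapter / full:.2%}")  # 16384 1048576 1.56%
```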

Deployment

The deployment setup is designed for low-latency, full-duplex communication over WebSockets. We modified the nano-vllm repository to support low-latency WebSocket streaming for both incoming audio and generated speech responses. Both the ASR and TTS backend servers are hosted on a single RTX A6000 on Thunder Compute.

Qwen3 1.7B ASR is served with vLLM for optimized transcription.
VoxCPM 2 is deployed via a self-modified version of nano-vllm that supports WebSocket integration. It hosts the base model and the dynamically loaded trained LoRA weights.
Client-side VAD detects when the user interrupts, enabling full-duplex interaction.
Frontend deployment is powered by Cloudflare Pages.
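
The interrupt detection idea can be illustrated with a simple energy-based detector. This is a sketch under stated assumptions rather than the project's client code: the README does not say which VAD the frontend uses, the `InterruptDetector` class is hypothetical, and the threshold and frame count are placeholder values. It assumes frames of 16-bit signed PCM in native byte order.

```python
import math
from array import array


def frame_rms(pcm_bytes):
    """Root-mean-square amplitude of one frame of 16-bit signed PCM samples."""
    samples = array("h", pcm_bytes)  # byte length must be a multiple of 2
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class InterruptDetector:
    """Flags an interruption after `min_frames` consecutive loud frames."""

    def __init__(self, threshold=500.0, min_frames=3):
        self.threshold = threshold    # hypothetical energy threshold
        self.min_frames = min_frames  # hypothetical debounce length
        self._loud = 0

    def feed(self, pcm_bytes):
        # Count consecutive frames above the threshold; reset on a quiet frame.
        if frame_rms(pcm_bytes) >= self.threshold:
            self._loud += 1
        else:
            self._loud = 0
        return self._loud >= self.min_frames
```

When `feed` returns True while the twin is speaking, a client could stop playback and start streaming the user's audio. Requiring several consecutive loud frames avoids triggering on a single click or pop.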

Asynchronous Data Flow

The pipeline is fully asynchronous. Audio chunks stream from the client into ASR inference, the transcript is fed to the LLM, and VoxCPM synthesizes the LLM output immediately, returning audio chunks to the client.
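
The data flow described above can be sketched as a chain of asyncio stages connected by queues. The `fake_asr`, `fake_llm`, and `fake_tts` functions are hypothetical stand-ins that call no model, and the queue-per-stage layout is an assumption about one reasonable way to structure such a pipeline, not a description of the deployed server.

```python
import asyncio


async def stage(fn, inbox, outbox):
    """Apply `fn` to every item from inbox and forward it; None ends the stream."""
    while True:
        item = await inbox.get()
        if item is None:
            await outbox.put(None)
            return
        await outbox.put(await fn(item))


# Stand-ins for the real services (hypothetical; no model is called here).
async def fake_asr(audio):
    await asyncio.sleep(0)
    return f"text({audio})"


async def fake_llm(text):
    await asyncio.sleep(0)
    return f"reply({text})"


async def fake_tts(reply):
    await asyncio.sleep(0)
    return f"speech({reply})"


async def run(chunks):
    q_audio, q_text, q_reply, q_out = (asyncio.Queue() for _ in range(4))
    tasks = [
        asyncio.create_task(stage(fake_asr, q_audio, q_text)),
        asyncio.create_task(stage(fake_llm, q_text, q_reply)),
        asyncio.create_task(stage(fake_tts, q_reply, q_out)),
    ]
    for chunk in chunks:
        await q_audio.put(chunk)
    await q_audio.put(None)  # signal end of input
    out = []
    while (item := await q_out.get()) is not None:
        out.append(item)
    await asyncio.gather(*tasks)
    return out


print(asyncio.run(run(["a0", "a1"])))
# ['speech(reply(text(a0)))', 'speech(reply(text(a1)))']
```

Because each stage runs as its own task, a later chunk can be transcribed while an earlier one is still being synthesized.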

API Endpoints

WS /stream: Primary full-duplex endpoint for client-server bidirectional communication. Handles incoming audio bytes and returns synthesized audio bytes and metadata.
POST /loras: Dynamically load or swap LoRA adapters into the TTS pipeline without restarting the server.
GET /health: Basic health check for container orchestration and load balancers.

Monitoring

Server metrics are gathered from endpoint instrumentation using Prometheus and visualized in Grafana. This gives visibility into system usage, latency percentiles, WebSocket connection health, and GPU metrics for the RTX A6000.

Quickstart & Usage Guide

Below is the setup and startup sequence tailored for an Ubuntu VM on Thunder Compute.

1. Install Dependencies

Using uv, install the required libraries in both the root directory and the deployment directory:

# In the root directory
uv sync

# In the deployment directory
cd deployment
uv sync

2. Start the vLLM ASR Server

Run the ASR module from the root virtual environment:

vllm serve --config qwen3_asr_vllm.yaml

3. Start the Main FastAPI Server

Export the environment variables from deployment/.env and run the deployment app:

set -a
source deployment/.env
set +a

uv run fastapi run deployment/app/main.py --host 0.0.0.0 --port 8001

4. Load the LoRA Adapter Dynamically

Once the server is running, load your trained professor twin weights via curl:

curl -X POST http://localhost:8001/loras \
  -H "Content-Type: application/json" \
  -d '{"name":"my_lora","path":"/home/ubuntu/latest"}'
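
The same request can be made from Python using only the standard library. This sketch mirrors the curl command above; the helper names `build_lora_request` and `load_lora` are hypothetical, and the shape of the server's response is not documented here, so the raw body is returned as-is.

```python
import json
import urllib.request


def build_lora_request(base_url, name, path):
    """Build the POST /loras request; the payload fields mirror the curl example."""
    body = json.dumps({"name": name, "path": path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/loras",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_lora(base_url, name, path, timeout=30):
    """Send the request and return (HTTP status, raw response body)."""
    request = build_lora_request(base_url, name, path)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Usage, with the server from step 3 running:
# status, body = load_lora("http://localhost:8001", "my_lora", "/home/ubuntu/latest")
```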

5. Run Stress Tests

To benchmark the deployment and test concurrency under load, use the stress testing script provided in the root directory:

python stress_test.py
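
The contents of stress_test.py are not reproduced here. As a rough illustration of what a concurrency benchmark of this kind measures, the sketch below runs a fixed number of calls under a concurrency limit and summarizes the latencies. All names are hypothetical, and `fake_request` is a placeholder that sleeps instead of contacting the server.

```python
import asyncio
import math
import time


async def timed_call(call, sem):
    """Run one request under the concurrency limit and return its latency in seconds."""
    async with sem:
        start = time.perf_counter()
        await call()
        return time.perf_counter() - start


async def run_load(call, total=50, concurrency=10):
    """Issue `total` calls with at most `concurrency` in flight; return sorted latencies."""
    sem = asyncio.Semaphore(concurrency)
    latencies = await asyncio.gather(*(timed_call(call, sem) for _ in range(total)))
    return sorted(latencies)


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, with p in (0, 100]."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


async def fake_request():
    await asyncio.sleep(0.01)  # placeholder for a real call to the server


latencies = asyncio.run(run_load(fake_request, total=20, concurrency=5))
print(f"p50={percentile(latencies, 50):.4f}s p95={percentile(latencies, 95):.4f}s")
```

Replacing `fake_request` with a coroutine that opens a WS /stream session would turn this into a real load test against the deployment.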
