"How do we know our voice agent won't fail in production?"
PersonaForge answers that question before your first customer call.
PersonaForge is a synthetic customer generation and reliability testing platform for conversational voice agents. It acts as the "GitHub Actions for Voice Agents," allowing developers to autonomously validate behavior, reliability, and compliance through thousands of simulated customer interactions.
Traditional testing for voice agents is broken. It's manual, slow, and ignores voice-native failures like interruptions and latency.
The PersonaForge Way:
Build Agent -> 1,000 Synthetic Customers -> Failure Detection -> Deploy Safely
- Forge (Persona Engine): Goal-driven, emotionally consistent synthetic customers. They don't just generate text; they maintain memory, pursue subgoals, and react to agent behavior.
- Runner (Execution Engine): High-concurrency voice-native conversation runner. Supports ElevenLabs Conversational AI with real-time audio streaming.
- Judge (Evaluation Engine): Multi-stage LLM evaluation that detects:
  - Hallucinations: Agent inventing policies or facts.
  - Escalation Failures: Agent failing to hand off to a human when required.
  - Compliance: Violations of safety or business rules.
  - Voice Metrics: Interruption recovery and response latency.
- CI/CD Integration: Built-in quality gates for your deployment pipeline.
- Studio (Dashboard): Visualize regressions, explore failure clusters, and replay conversations turn-by-turn.
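To make the Judge categories and the quality-gate idea concrete, here is a minimal sketch of how per-conversation verdicts could be tallied into a pass rate and turned into a gate decision. The names `FailureType`, `Verdict`, `summarize`, `gate`, and the `min_pass_rate` threshold are hypothetical and chosen for illustration; they are not PersonaForge's actual API or schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureType(Enum):
    # Categories mirror the Judge bullet list; the enum itself is hypothetical.
    HALLUCINATION = "hallucination"
    ESCALATION_FAILURE = "escalation_failure"
    COMPLIANCE = "compliance"
    VOICE_METRICS = "voice_metrics"


@dataclass(frozen=True)
class Verdict:
    """Hypothetical per-conversation judge result (not PersonaForge's schema)."""
    conversation_id: str
    failure: Optional[FailureType] = None  # None means the conversation passed


def summarize(verdicts):
    """Return (pass_rate, failure counts keyed by category name)."""
    if not verdicts:
        raise ValueError("no verdicts to summarize")
    failures = Counter(v.failure.value for v in verdicts if v.failure is not None)
    passed = sum(1 for v in verdicts if v.failure is None)
    return passed / len(verdicts), dict(failures)


def gate(verdicts, min_pass_rate=0.95):
    """Return a process-style exit code: 0 if the gate passes, 1 otherwise."""
    pass_rate, _ = summarize(verdicts)
    return 0 if pass_rate >= min_pass_rate else 1


if __name__ == "__main__":
    sample = [
        Verdict("c1"),
        Verdict("c2", FailureType.HALLUCINATION),
        Verdict("c3"),
        Verdict("c4", FailureType.ESCALATION_FAILURE),
    ]
    rate, counts = summarize(sample)
    print(f"pass rate: {rate:.0%}", counts)
    print("gate exit code:", gate(sample, min_pass_rate=0.5))
```

Returning an exit code rather than raising keeps the gate easy to wire into a pipeline step, where a non-zero code fails the job.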
Comprehensive user guides are available in the docs/ directory:
- getting-started.md: Setup, prerequisites, and first test run guide.
- configuration-guide.md: Explanations of the structural parameters in YAML/Markdown DSL configurations.
- cli-commands.md: Complete CLI command options, syntax, and behaviors.
- dashboard-studio.md: Startup commands for server and web services, metrics walkthrough, and database details.
- ci-cd-integration.md: Steps to add secrets and design pipeline checks.
- troubleshooting.md: Solutions for common database lock issues, WebSocket error codes, and rate limits.
You can install the CLI tool directly from PyPI:
pip install personaforge
Or install from source for development and web studio access:
git clone https://github.com/arjun-vegeta/personaforge.git
cd personaforge
pip install -r requirements.txt
Create a .env file in your project directory:
ELEVENLABS_API_KEY=your_key_here
GOOGLE_API_KEY=your_gemini_key_here
personaforge init
personaforge run scenarios/telecom_refund.yaml
personaforge ci --scenario scenarios/telecom_refund.yaml
The PersonaForge Studio provides a deep dive into your agent's health.
# Start the backend
uvicorn personaforge.backend.app.main:app --reload
# Start the frontend
cd personaforge/web
npm install
npm run dev
Visit http://localhost:3000 to view pass rates, failure clusters, and conversation replays.
You can run the entire PersonaForge stack (PostgreSQL, Redis, Backend, Frontend, and Worker) using Docker Compose:
# Create a .env file with your API keys
cp .env.example .env
# Start the services
docker-compose up --build
PersonaForge is designed to be part of your development workflow. The repository includes a GitHub Action template in .github/workflows/ci.yml that:
- Runs unit tests.
- Initializes the PersonaForge environment.
- Executes a CI quality gate check against your scenarios.
To use this, add GOOGLE_API_KEY and ELEVENLABS_API_KEY to your GitHub repository secrets.
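A pipeline can fail fast when a secret is missing instead of discovering it mid-run. The sketch below is one possible preflight check, assuming the two keys are exposed to the job as environment variables; the script and its `missing_keys` helper are illustrative and not part of the PersonaForge CLI.

```python
import os

# Key names come from the .env example above.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing secrets: " + ", ".join(missing))
    else:
        print("All required secrets are set.")
```

Accepting an optional mapping keeps the check testable without touching the real environment.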
PersonaForge is built with a modular, provider-first architecture:
- FastAPI / SQLModel: High-performance backend with PostgreSQL.
- Gemini 3.1 Flash Lite: Ultra-low latency LLM reasoning for customer actions and judging.
- ElevenLabs ConvAI: Direct WebSocket integration for voice interaction.
- Redis / RQ: Asynchronous task processing for large-scale test suites.
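The idea of running many conversations at once while capping load can be illustrated with the standard library alone. The sketch below uses `asyncio` with a semaphore as a stand-in for the Redis / RQ workers; it is not how PersonaForge schedules work, and `run_conversation`, `run_suite`, and `max_concurrency` are hypothetical names. The short sleep stands in for a real voice session.

```python
import asyncio


async def run_conversation(persona_id, limiter, stats):
    """Simulate one conversation; the sleep stands in for real audio streaming."""
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)
        stats["active"] -= 1
    return {"persona_id": persona_id, "status": "completed"}


async def run_suite(num_personas, max_concurrency):
    """Run all simulated conversations, never exceeding max_concurrency at once."""
    limiter = asyncio.Semaphore(max_concurrency)
    stats = {"active": 0, "peak": 0}
    tasks = [run_conversation(i, limiter, stats) for i in range(num_personas)]
    # gather returns results in the same order the tasks were passed in.
    results = await asyncio.gather(*tasks)
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_suite(num_personas=20, max_concurrency=5))
    print(len(results), "conversations, peak concurrency", peak)
```

A queue-backed worker pool trades this in-process simplicity for persistence and horizontal scaling, which matters once a suite grows to thousands of conversations.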
This project is licensed under the MIT License - see the LICENSE file for details.
Built for the future of Conversational AI.