Skip to content

AgoraIO-Conversational-AI/agent-samples

Repository files navigation

Agora Agora Conversational AI

A guide to understanding and implementing Agora voice and video AI agents. Spin up the sample backend and one of the sample clients or ask AI to do it for you.

AI Coding Assistant Guide

AI repo entry point (progressive disclosure)AGENTS.md

Long-form implementation guide for the sample stackAGENT.md

Example prompt for Claude Code (Voice):

Clone https://github.com/AgoraIO-Conversational-AI/agent-samples and then I want to run the
React Voice AI Agent here on my laptop. Read AGENTS.md first, then use AGENT.md
for the longer implementation walkthrough.

Example prompt for Claude Code (Avatar):

Clone https://github.com/AgoraIO-Conversational-AI/agent-samples and then I want to run the
Video AI Agent with Avatar Sample here on my laptop. Read AGENTS.md first, then use AGENT.md
for the longer implementation walkthrough.

System Architecture

System Architecture Diagram

Your backend serves the client app, generates tokens and credentials, then calls the Agora Agent REST API to start the AI agent. Both client and agent join the same channel via SD-RTN where audio, video, and transcription data flow bidirectionally in real-time.

Architecture Overview

Voice AI Client

Your front-end application (web, mobile, or desktop) that captures user inputs and plays out the AI agent's responses. Built with the Agora RTC SDK and optionally components from the Agora Conversational AI agent-ui-kit used in the samples.

Your Backend Services

Your server-side application that authenticates users, generates Agora tokens, and orchestrates the AI agent. It serves the client app and calls the Agora REST API to start/stop agent instances.

Agora SD-RTN

Agora's Software-Defined Real-Time Network. A global low-latency network that routes audio, video, and data streams between participants in real-time.

AI Agent Instance

A managed AI agent that joins the channel as a participant. It listens to user audio, processes it through STT → LLM → TTS, and streams the response back.

Backend Sample

To run the server sample that your voice client will connect to, you will need:

Agora Credentials (always required):

APP_ID=                  # Required: Agora Console
APP_CERTIFICATE=         # Required: Agora Console (enable in Project Security)

Option A: Pipeline Mode (Agent Builder) — Simplest

Use a pre-configured pipeline from Agora Agent Builder. No LLM or TTS API keys needed — the pipeline owns all provider config.

APP_ID=                  # Required: Agora Console
APP_CERTIFICATE=         # Required: Agora Console
PIPELINE_ID=             # Required: 32-char hex ID from Agent Builder

Option B: Inline Config — Full Control

Configure LLM, TTS, and ASR providers directly in .env. This gives you full control over every parameter.

APP_ID=                  # Required: Agora Console
APP_CERTIFICATE=         # Required: Agora Console
LLM_API_KEY=             # Required: OpenAI or compatible API key
TTS_VENDOR=              # Required: rime, elevenlabs, openai, or cartesia
TTS_KEY=                 # Required: API key for your TTS vendor
TTS_VOICE_ID=            # Required: Voice/speaker ID for your chosen vendor

Sample

Simple Backend Python backend for creating AI agents and generating RTC credentials. Supports local development, cloud instances, and AWS Lambda deployment.

Client Samples

Core Packages

  • agent-client-toolkit - Core client toolkit published on npm as agora-agent-client-toolkit — RTC/RTM connection management, transcript handling, and React hooks
  • agent-ui-kit - React UI components for voice, chat, and video

Voice Agent Sample

Recommended and complete React JS voice client sample which looks great on any device.

React Voice Client Responsive React/Next.js voice client built with SDK packages and UI Kit. Features TypeScript, real-time transcription display, voice controls, and integrated text chat.

Voice Client Screenshot

Video Agent Sample

React Video Client with Avatar React/Next.js client with video avatar and local camera support. Includes responsive layouts and multi-stream video rendering.

Avatar Client Screenshot

Companion Servers

Optional standalone servers that extend your agent with advanced capabilities.

server-custom-llm — Custom LLM proxy for RAG, tool calling, conversation memory, and response formatting. Available in Python, Node.js, and Go.

server-mcp — MCP memory server that gives agents persistent per-user memory via tool calling. Stores and retrieves conversation context across sessions.

Basic Samples

Simple Voice AI Client (No Backend) Standalone HTML/JavaScript client for testing voice agents. Maintains persistent RTC connection allowing agents to join and leave without client reconnection.

Simple Voice AI Client (With Backend) Full-featured vanilla JavaScript client demonstrating end-to-end integration with backend for agent initialization and voice interaction.

About

A guide to creating Agora voice and video AI agents

Resources

License

Stars

Watchers

Forks

Contributors