A native Android (Kotlin + Jetpack Compose) voice-agent quickstart. It talks to a bundled,
key-less Python backend and renders the live transcript using Agora's official
ConversationalAIAPI Kotlin toolkit (vendored). The agent greets you on join — no human speech is
required to see it working.
The app owns the Agora RTC + RTM lifecycle; the vendored toolkit only parses the agent's messages into transcript and agent-state callbacks.
- server/: the reused key-less Python/FastAPI backend (GET /get_config, POST /startAgent, POST /stopAgent). A managed cascade means you only need AGORA_APP_ID / AGORA_APP_CERTIFICATE.
- android/: the Gradle / Kotlin / Compose app, with the two Agora Maven SDKs and the vendored convoaiApi/ toolkit.
No Agora secret ships in the app. The backend mints the RTC/RTM token and starts the managed agent;
the only secrets it needs are AGORA_APP_ID and AGORA_APP_CERTIFICATE in server/.env.local.
Start the backend:

    cd server
    uv venv venv && . venv/bin/activate
    uv pip install -r requirements.txt -r requirements-dev.txt
    python src/server.py   # serves on 0.0.0.0:8000

To run the server tests instead, use pytest -q.
Then build and install the app:

    cd android
    ./gradlew installDebug   # to a running emulator or a connected device

Alternatively, open android/ in Android Studio and Run.
The app reads the backend URL from the AGENT_BACKEND_URL build-config field, default
http://10.0.2.2:8000 — 10.0.2.2 is the Android emulator's alias for the host machine, so the
emulator reaches the server running on your laptop. A physical device must instead point at the
host's LAN IP (edit AGENT_BACKEND_URL in android/app/build.gradle.kts).
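For a physical device, the edit might look like the fragment below. This is a sketch only: the placement inside defaultConfig is an assumption about how the project declares the field, and 192.168.1.50 is a placeholder for your host's LAN address.

```kotlin
// android/app/build.gradle.kts (fragment; assumed layout, not copied from the project)
android {
    defaultConfig {
        // Emulator default: 10.0.2.2 is the emulator's alias for the host machine.
        // buildConfigField("String", "AGENT_BACKEND_URL", "\"http://10.0.2.2:8000\"")

        // Physical device: point at the host's LAN IP instead (placeholder address).
        buildConfigField("String", "AGENT_BACKEND_URL", "\"http://192.168.1.50:8000\"")
    }
}
```

The inner escaped quotes matter: the third argument is pasted into the generated BuildConfig class as a Java expression, so a String value has to carry its own quotation marks. App code then reads the value as BuildConfig.AGENT_BACKEND_URL.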
The app requests the RECORD_AUDIO and INTERNET permissions; cleartext (HTTP) traffic is allowed so
that it can reach the local development backend.
- GET /get_config → {app_id, token, uid, channel_name, agent_uid}.
- Create the RTC engine; loadAudioSettings(); join with ChannelMediaOptions: publishMicrophoneTrack = true, autoSubscribeAudio = true, clientRoleType = BROADCASTER.
- Create + log in the RTM client.
- ConversationalAIAPIConfig(rtcEngine, rtmClient, renderMode = Text) → ConversationalAIAPIImpl; addHandler(...); subscribeMessage(channelName) BEFORE POST /startAgent.
- POST /startAgent with rtcUid = agent_uid, userUid = uid.
- onTranscriptUpdated rows are upserted by (turnId, type) (a turn has separate user + agent rows); the Compose LazyColumn is keyed by the composite "$turnId-$type".
- End: unsubscribeMessage → POST /stopAgent → leaveChannel → RTM logout → destroy.
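The upsert step in the flow above can be sketched in plain Kotlin as follows. TranscriptRow, RowType, and upsert are hypothetical names used for illustration, not the toolkit's own types, and the sketch assumes each update carries the complete text for its row, so replacing the row is enough.

```kotlin
// Hypothetical stand-ins for the toolkit's transcript data; names and fields are assumptions.
enum class RowType { USER, AGENT }

data class TranscriptRow(val turnId: Long, val type: RowType, val text: String) {
    // Composite key in the same "$turnId-$type" shape used to key the list items.
    val key: String get() = "$turnId-$type"
}

// Returns a new list in which `update` replaces the row with the same (turnId, type),
// or is appended at the end when no such row exists yet.
fun upsert(rows: List<TranscriptRow>, update: TranscriptRow): List<TranscriptRow> {
    val index = rows.indexOfFirst { it.turnId == update.turnId && it.type == update.type }
    return if (index >= 0) {
        rows.toMutableList().also { it[index] = update }
    } else {
        rows + update
    }
}

fun main() {
    var rows = emptyList<TranscriptRow>()
    // The agent greets first, the user replies, then the agent row for turn 1 is updated.
    rows = upsert(rows, TranscriptRow(1, RowType.AGENT, "Hello"))
    rows = upsert(rows, TranscriptRow(1, RowType.USER, "Hi"))
    rows = upsert(rows, TranscriptRow(1, RowType.AGENT, "Hello, how can I help?"))
    rows.forEach { println("${it.key}: ${it.text}") }
}
```

Because the user and agent rows of one turn share a turnId, the type has to be part of the key; keying by turnId alone would make one row overwrite the other. In Compose the same key could be passed along the lines of items(rows, key = { it.key }), so that an updated row keeps its identity across recompositions.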
The toolkit is configured with TranscriptRenderMode.Text (full-text rendering) for this pilot.
Resolved from Maven Central (+ jitpack):
- io.agora.rtc:full-sdk:4.5.1
- io.agora:agora-rtm:2.2.3
android/app/src/main/java/io/agora/scene/convoai/convoaiApi/ is copied verbatim from
AgoraIO-Community/Conversational-AI-Demo
(MIT). Its license and a project NOTICE are preserved. A small CovLogger shim routes the toolkit's
logging to android.util.Log.
MIT — see LICENSE.