A Python-based conversational voice AI agent that listens to user speech, converts it into text, processes the query using OpenAI language models, and responds back using AI-generated speech.
This project demonstrates a complete voice interaction pipeline:
- User Voice → Speech-to-Text → LLM Reasoning → Tool Calling → Text-to-Speech → Voice Response
-
🎤 Voice Input
- Captures user speech through the microphone.
- Uses
SpeechRecognitionfor speech-to-text conversion.
-
🧠 Conversational AI
- Uses OpenAI chat models to generate intelligent responses.
- Maintains message history for contextual conversations.
-
🔊 Text-to-Speech Output
- Converts AI responses into natural-sounding speech.
- Uses OpenAI TTS with streaming audio playback.
-
🛠️ Tool Calling Support
-
Can call custom tools based on user intent.
-
Includes examples like:
- Weather lookup
- Running system commands
-
-
🌦️ Weather Tool
- Fetches current weather using
wttr.in.
- Fetches current weather using
-
⚙️ Command Execution Tool
- Supports running local system commands through the AI agent.
The project contains two main implementations:
A simple voice conversational agent.
Flow:
User speaks
↓
Speech is converted to text
↓
Text is sent to OpenAI chat model
↓
AI generates response
↓
Response is converted to speech
↓
Audio is played back to user
An advanced voice agent with reasoning and tool-calling support.
It follows a structured reasoning flow:
START → PLAN → TOOL → OBSERVE → OUTPUT
This allows the agent to decide whether it needs to answer directly or use an external tool first.
| Component | Technology |
|---|---|
| Language | Python |
| Speech-to-Text | SpeechRecognition |
| LLM | OpenAI GPT models |
| Text-to-Speech | OpenAI TTS |
| Tool Calling | Custom Python functions |
| Environment Variables | python-dotenv |
| Weather API | wttr.in |
| Validation | Pydantic |
Voice-AI-agent/
└── Agent/
└── VoiceAgent/
├── main.py
├── cursor.py
└── requirements.txt
git clone https://github.com/pranjalisr/Voice-AI-agent.git
cd Voice-AI-agent/Agent/VoiceAgentpython -m venv venvActivate it:
# Windows
venv\Scripts\activate# macOS/Linux
source venv/bin/activatepip install -r requirements.txtCreate a .env file inside:
Agent/VoiceAgent/
Add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_herepython main.pypython cursor.pyThen speak into your microphone when prompted:
Speak Something...
Example queries:
What is the weather in Delhi?
Tell me a joke.
Open calculator.
get_weather(city: str)Fetches current weather for a city.
Example:
What is the weather in Mumbai?
run_command(cmd: str)Runs a system command on the local machine.
Example:
Create a folder named demo
⚠️ Be careful with command execution. Do not run unsafe or destructive commands.
The agent uses OpenAI’s text-to-speech model with the coral voice.
Current TTS model:
gpt-4o-mini-ttsThe response is streamed and played locally using:
LocalAudioPlayerThe advanced agent in cursor.py uses a structured JSON-based reasoning format:
{
"step": "PLAN",
"content": "The user is asking for the weather, so I should use the weather tool."
}Tool call format:
{
"step": "TOOL",
"tool": "get_weather",
"input": "Delhi"
}Final output format:
{
"step": "OUTPUT",
"content": "The weather in Delhi is partly cloudy with a temperature of 30°C."
}User: What is the weather in Delhi?
Agent Plan:
The user wants weather information.
Tool Call:
get_weather("Delhi")
Observation:
Weather in Delhi is Partly cloudy +30°C
Final Voice Response:
The current weather in Delhi is partly cloudy and around 30°C.
- A working microphone is required.
- Internet connection is required for OpenAI and Google speech recognition.
- Keep your
.envfile private. - Never commit API keys to GitHub.
This project is open-source and available under the MIT License.