Demo video: https://drive.google.com/drive/folders/1rtzM5f_DhSS6CN9Krr4fgQyCOE-T1xGP?usp=sharing
This is my submission for the ServiceHive ML Internship assignment. It is a conversational AI agent for a fictional SaaS product called AutoStream (automated video editing for content creators). The agent classifies user intent, answers product questions using a local knowledge base (RAG), and captures high-intent leads by calling a mock backend tool.
Built with LangGraph, Gemini 2.5 Flash-Lite, and FAISS.
Note on the LLM choice: The assignment spec listed Gemini 1.5 Flash, but Google has retired the entire 1.5 model family, and requests to those models now return a 404 from the API. This project uses Gemini 2.5 Flash-Lite instead, Google's current free-tier model in the same Flash line: cheap, fast, and more than capable for intent classification and lightweight RAG.
- Intent classification into casual, inquiry, and high_intent
- RAG pipeline over a local markdown knowledge base (pricing, features, policies)
- Lead capture tool that only fires after collecting name, email, and platform
- Memory across multiple conversation turns using LangGraph's MemorySaver
You need Python 3.9+ and a free Gemini API key from Google AI Studio.
1. Clone the repo and enter the folder
git clone https://github.com/Rushi9234/autostream-agent.git
cd autostream-agent
2. Create a virtual environment and install dependencies
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
3. Add your Gemini API key
cp .env.example .env  # On Windows: copy .env.example .env
Open the .env file and paste your key after GOOGLE_API_KEY=.
4. Run the chat
python main.py
You can try the example conversation from the assignment:
You: Hi, tell me about your pricing.
You: I want to try the Pro plan for my YouTube channel.
You: My name is Alex, email is alex@example.com
Once the agent has all three fields (name, email, platform), it will call the mock tool and you should see:
Lead captured successfully: Alex, alex@example.com, YouTube
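The guarded tool call can be sketched in plain Python. This is a minimal illustration, not the contents of tools.py: the printed message matches the output shown above, while the try_capture_lead helper and the dict shape of lead_info are assumptions made for the example.

```python
REQUIRED_FIELDS = ("name", "email", "platform")


def mock_lead_capture(name: str, email: str, platform: str) -> None:
    """Mock backend: stands in for a real lead-capture API call."""
    print(f"Lead captured successfully: {name}, {email}, {platform}")


def try_capture_lead(lead_info: dict[str, str]) -> list[str]:
    """Hypothetical guard: fire the tool only when every field is present.

    Returns the fields that are still missing; an empty list means the
    lead was captured.
    """
    missing = [field for field in REQUIRED_FIELDS if not lead_info.get(field)]
    if not missing:
        mock_lead_capture(lead_info["name"], lead_info["email"], lead_info["platform"])
    return missing


if __name__ == "__main__":
    print(try_capture_lead({"name": "Alex", "platform": "YouTube"}))  # ['email']
    try_capture_lead({"name": "Alex", "email": "alex@example.com", "platform": "YouTube"})
```

Returning the missing fields (rather than a boolean) gives the agent exactly what it needs to phrase its follow-up question.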
There is also a demo.py script that runs the assignment's example
conversation automatically (useful for recording the demo video):
python demo.py
autostream-agent/
├── main.py # CLI chat loop
├── demo.py # runs the spec example conversation
├── agent.py # LangGraph: state, nodes, routing
├── rag.py # FAISS index and retrieval
├── tools.py # mock_lead_capture function
├── prompts.py # prompt templates
├── knowledge_base/
│ └── autostream_kb.md # pricing, features, policies
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md
The agent is built as a LangGraph state machine. Every turn enters a
single route_turn node which classifies the user's latest message into
one of three intents (casual, inquiry, high_intent). Based on that
intent, a conditional edge routes the turn to exactly one handler node:
- casual_node: replies briefly to greetings and small talk.
- rag_node: runs a FAISS similarity search over the markdown knowledge base and produces a grounded answer.
- lead_node: extracts any name, email, or platform values from the user's message. If all three have now been collected, it calls mock_lead_capture(); otherwise it asks for whatever is still missing.
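To make the lead_node extraction step concrete, here is a regex-based stand-in. It is only a sketch: the agent itself presumably asks the LLM to extract these values, and the patterns, the PLATFORMS list, and the function names below are all hypothetical.

```python
import re

# Hypothetical patterns standing in for LLM-based extraction.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
NAME_RE = re.compile(r"\bmy name is\s+([A-Za-z][A-Za-z'-]*)", re.IGNORECASE)
PLATFORMS = ("YouTube", "Instagram", "TikTok", "Twitch")  # assumed list


def extract_lead_fields(message: str) -> dict[str, str]:
    """Return whichever of name/email/platform appear in this one message."""
    found: dict[str, str] = {}
    name = NAME_RE.search(message)
    if name:
        found["name"] = name.group(1)
    email = EMAIL_RE.search(message)
    if email:
        found["email"] = email.group(0)
    lowered = message.lower()
    for platform in PLATFORMS:
        if platform.lower() in lowered:
            found["platform"] = platform
            break
    return found


def update_lead_info(lead_info: dict[str, str], message: str) -> dict[str, str]:
    """Merge this turn's values into those from earlier turns (newer values win)."""
    merged = dict(lead_info)
    merged.update(extract_lead_fields(message))
    return merged
```

Run over the example conversation, the platform arrives on one turn and the name and email on the next, so the merged dict only becomes complete on the third message. A bare reply such as "Alex" slips past these patterns, which is one reason an LLM extractor is the better fit for this step.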
One important detail: if we're already midway through collecting lead details, the router skips classification and stays on the lead path. Without this, a short reply like "Alex" (just a name) gets misclassified as casual and the signup flow breaks.
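The routing rule, including the mid-collection shortcut, can be illustrated with a small framework-free sketch. In the agent, the classification is done by the LLM and the routing is a LangGraph conditional edge; here a crude keyword heuristic stands in for the LLM call, and choose_handler, classify_intent, and TurnState are hypothetical names.

```python
from typing import TypedDict


class TurnState(TypedDict):
    intent: str          # intent assigned on the previous turn ("" at start)
    lead_captured: bool  # True once mock_lead_capture() has fired


# Which handler node each intent is routed to.
HANDLERS = {"casual": "casual_node", "inquiry": "rag_node", "high_intent": "lead_node"}


def classify_intent(message: str) -> str:
    """Crude keyword stand-in for the LLM intent classifier."""
    text = message.lower()
    if any(k in text for k in ("sign up", "want to try", "subscribe")):
        return "high_intent"
    if any(k in text for k in ("pricing", "price", "plan", "feature", "refund", "policy")):
        return "inquiry"
    return "casual"


def choose_handler(state: TurnState, message: str) -> str:
    """Pick the handler node for this turn, updating state["intent"]."""
    if state["intent"] == "high_intent" and not state["lead_captured"]:
        # Midway through lead collection: skip classification so a bare
        # reply such as "Alex" stays on the lead path instead of "casual".
        return HANDLERS["high_intent"]
    state["intent"] = classify_intent(message)
    return HANDLERS[state["intent"]]
```

In LangGraph, a function like this would typically be passed to add_conditional_edges, returning the name of the next node.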
Why LangGraph over AutoGen: AutoGen is designed for multi-agent conversations where multiple agents coordinate with each other. This assignment is a single-agent workflow with conditional branching and one tool trigger, so LangGraph's explicit state schema and graph structure fit the problem more naturally.
State management: The state is a TypedDict with four fields:
messages, intent, lead_info, and lead_captured. Conversation memory
is handled by LangGraph's MemorySaver checkpointer, keyed by a unique
thread_id for each chat session. Because the checkpointer keeps the full
message history for each thread, the 5-6 turn memory the assignment
requires comes without extra work.
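A sketch of the four-field state and a per-session run config. The field types are assumptions (messages is a plain list of dicts here to keep the example dependency-free), and new_session_config and initial_state are hypothetical helpers; the {"configurable": {"thread_id": ...}} shape is the config LangGraph checkpointers read the thread ID from.

```python
import uuid
from typing import TypedDict


class AgentState(TypedDict):
    messages: list       # conversation history
    intent: str          # "casual" | "inquiry" | "high_intent"
    lead_info: dict      # name/email/platform values collected so far
    lead_captured: bool  # True once mock_lead_capture() has fired


def new_session_config(thread_id: str = "") -> dict:
    """Run config for one chat session; a fresh UUID if no ID is given."""
    return {"configurable": {"thread_id": thread_id or str(uuid.uuid4())}}


def initial_state() -> AgentState:
    """Empty state for a brand-new conversation."""
    return AgentState(messages=[], intent="", lead_info={}, lead_captured=False)
```

With a graph compiled as builder.compile(checkpointer=MemorySaver()), passing the same config to every graph.invoke(...) call in a session is what lets each turn see the earlier ones.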
To integrate this agent with WhatsApp, I would use the WhatsApp Cloud API, which is webhook-based. The plan:
- Wrap the agent in a web server (e.g. FastAPI) with two endpoints:
  - GET /webhook for Meta's verification handshake. Meta sends hub.verify_token and hub.challenge query parameters; if the token matches the one we configured in the Meta app dashboard, we echo back the challenge.
  - POST /webhook to receive every incoming user message. The payload is JSON containing the sender's phone number and the message text.
- Use the phone number as the thread_id. This is the key part. main.py already passes a thread_id in the LangGraph config, so if we use the sender's phone number as that ID, every user automatically gets their own persistent conversation state without any change to the agent code.
- Send the reply by calling the WhatsApp Cloud API's messages endpoint (https://graph.facebook.com/v20.0/<phone_number_id>/messages) with the access token as a Bearer token in the Authorization header.
- For production, I would also replace MemorySaver with a persistent checkpointer (such as SqliteSaver) so conversation state survives server restarts, and acknowledge each webhook with a 200 response immediately while processing the message in a background task, since Meta retries webhook deliveries that are not acknowledged quickly.
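The webhook steps above can be sketched as plain functions that a FastAPI route could call. This is a hedged, standard-library-only sketch: VERIFY_TOKEN, ACCESS_TOKEN, and PHONE_NUMBER_ID are placeholders, the function names are hypothetical, and the payload path (entry → changes → value → messages) follows the Cloud API's webhook format for text messages. The outgoing request is only built, not sent.

```python
import json
import urllib.request
from typing import Optional

VERIFY_TOKEN = "my-verify-token"        # placeholder: set in the Meta app dashboard
ACCESS_TOKEN = "<access-token>"         # placeholder
PHONE_NUMBER_ID = "<phone_number_id>"   # placeholder
GRAPH_URL = "https://graph.facebook.com/v20.0/{}/messages"


def verify_webhook(params: dict) -> Optional[str]:
    """GET /webhook: return hub.challenge only if the verify token matches."""
    if params.get("hub.mode") == "subscribe" and params.get("hub.verify_token") == VERIFY_TOKEN:
        return params.get("hub.challenge")
    return None  # the route should answer 403 in this case


def extract_text_message(payload: dict) -> Optional[tuple[str, str]]:
    """POST /webhook: pull (sender phone, text) out of a notification."""
    try:
        message = payload["entry"][0]["changes"][0]["value"]["messages"][0]
    except (KeyError, IndexError):
        return None  # e.g. delivery/read status updates carry no "messages"
    if message.get("type") != "text":
        return None
    return message["from"], message["text"]["body"]


def thread_config(phone: str) -> dict:
    """The sender's phone number doubles as the LangGraph thread_id."""
    return {"configurable": {"thread_id": phone}}


def build_reply_request(to_phone: str, reply: str) -> urllib.request.Request:
    """Build (but do not send) the POST to the messages endpoint."""
    body = {
        "messaging_product": "whatsapp",
        "to": to_phone,
        "type": "text",
        "text": {"body": reply},
    }
    return urllib.request.Request(
        GRAPH_URL.format(PHONE_NUMBER_ID),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In a FastAPI app, the GET route would return the challenge as plain text (or a 403), and the POST route would return 200 right away and hand the agent call, made with thread_config(phone), plus urllib.request.urlopen(build_reply_request(...)) to a background task such as FastAPI's BackgroundTasks.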
- The knowledge base is stored as a simple markdown file in knowledge_base/autostream_kb.md. It is split on markdown headers (H1/H2/H3) so pricing, features, and policies become separate chunks.
- The lead-capture tool is guarded: it will only fire when all three fields (name, email, platform) have been collected. Until then, the agent keeps asking for the missing ones.
- Temperature is set to 0.2 to keep replies consistent.
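Header-based chunking can be illustrated with the standard library alone. This is a sketch of the idea, not the code in rag.py (which may rely on a library splitter); split_on_headers and the sample text are hypothetical. Each chunk keeps its header trail so the FAISS index can later report which section an answer came from.

```python
import re

# H1-H3 lines only; "####" and deeper are treated as body text.
HEADER_RE = re.compile(r"^(#{1,3})\s+(.*\S)\s*$")


def split_on_headers(markdown: str) -> list[dict]:
    """Split markdown at H1/H2/H3 lines, keeping the header path as metadata."""
    chunks: list[dict] = []
    path: list[str] = []  # current header trail, e.g. ["AutoStream", "Pricing"]
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headers": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headers at this level or deeper
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


sample = "# AutoStream\n## Pricing\nPlan details here.\n## Policies\nPolicy details here.\n"
```

On the sample, this yields two chunks: one under ["AutoStream", "Pricing"] and one under ["AutoStream", "Policies"]. Each chunk's text would then be embedded and added to the FAISS index.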