A self-correcting, self-reflective RAG agent built on the 🦜🔗 LangChain & 🦜🕸️ LangGraph ecosystem.
Route → Retrieve → Grade → Generate → Self-Reflect
Note
Powered end-to-end by the 🦜 LangChain stack: LangChain for composable LCEL chains, LangGraph for the stateful agent graph, and LangSmith for full-trace observability.
This project implements an Adaptive RAG pipeline that goes beyond naive "retrieve-then-answer" systems by combining three research ideas into a single LangGraph state machine:
| Technique | What it does |
|---|---|
| Adaptive RAG | Routes each question to the right source: the local vector store or live web search. |
| Corrective RAG (CRAG) | Grades every retrieved document for relevance and falls back to web search when knowledge is missing. |
| Self-RAG | Reflects on its own answer to detect hallucinations and verify that the question is actually addressed, retrying when it isn't. |
The result is an agent that knows what it knows, fetches what it doesn't, and checks its own answers for grounding before returning them.
The agent is modeled as a graph of decision nodes. Each question flows through routing, grading, generation, and self-reflection loops until a grounded and useful answer is produced.
- Route Question: An LLM router classifies the question. Topics about agents, prompt engineering, or adversarial attacks go to the vector store; everything else goes to web search.
- Retrieve: Pulls the most semantically relevant chunks from the ChromaDB vector store.
- Grade Documents: Each retrieved document is scored yes/no for relevance. Irrelevant docs are dropped, and a web_search flag is raised if relevant knowledge is missing.
- Web Search (conditional): If knowledge gaps are detected, Tavily fetches fresh results from the web and appends them to the context.
- Generate: Gemini produces an answer grounded in the collected context.
- Self-Reflection: The answer is double-checked:
  - Hallucination grader: Is the answer grounded in the documents? If not, regenerate.
  - Answer grader: Does the answer actually resolve the question? If not, fall back to web search.
  - If both checks pass, return the answer.
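The decision loop above can be sketched in plain Python. This is a minimal, framework-free illustration, not the project's LangGraph code: the router, retriever, graders, web search, and generator are hypothetical callables passed in, the rule for raising web_search is an assumption, and max_steps is an added safety cap (LangGraph itself bounds runs with a recursion limit).

```python
from typing import Callable, List, TypedDict


class State(TypedDict):
    """Hypothetical stand-in for the project's GraphState."""

    question: str
    documents: List[str]
    generation: str
    web_search: bool


def run_adaptive_rag(
    question: str,
    route: Callable[[str], str],  # returns "vectorstore" or "websearch"
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],  # retrieval grader
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[str, List[str]], bool],  # hallucination grader
    answers_question: Callable[[str, str], bool],  # answer grader
    max_steps: int = 5,  # assumed cap on generate/reflect rounds
) -> State:
    state: State = {"question": question, "documents": [], "generation": "", "web_search": False}

    # Route: local vector store, or straight to web search.
    if route(question) == "vectorstore":
        docs = retrieve(question)
        kept = [d for d in docs if is_relevant(question, d)]
        state["documents"] = kept
        # Assumed rule: search the web if any doc was dropped or nothing was found.
        state["web_search"] = not kept or len(kept) < len(docs)
    else:
        state["web_search"] = True

    for _ in range(max_steps):
        if state["web_search"]:
            # Append web results to whatever relevant context we already have.
            state["documents"] = state["documents"] + web_search(question)
            state["web_search"] = False
        state["generation"] = generate(question, state["documents"])
        if not is_grounded(state["generation"], state["documents"]):
            continue  # hallucination grader failed: regenerate
        if answers_question(question, state["generation"]):
            return state  # both graders passed
        state["web_search"] = True  # grounded but unhelpful: fetch more context
    raise RuntimeError("no grounded, useful answer within max_steps")


# Usage with toy stubs standing in for the LLM chains and Tavily.
result = run_adaptive_rag(
    "What is agent memory?",
    route=lambda q: "vectorstore",
    retrieve=lambda q: ["Agent memory stores past steps.", "Unrelated text."],
    is_relevant=lambda q, d: "memory" in d,
    web_search=lambda q: ["Web: memory in LLM agents."],
    generate=lambda q, docs: " ".join(docs),
    is_grounded=lambda g, docs: True,
    answers_question=lambda q, g: "memory" in g,
)
print(result["generation"])
```

Passing the chains in as plain callables keeps the control flow testable without API keys; in the real project each callable would be an LLM chain or a Tavily call.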
Built on the 🦜 LangChain ecosystem, with the following models and infrastructure around it:
| Component | Role |
|---|---|
| 🦜🕸️ LangGraph | Orchestration: the stateful, cyclic agent graph |
| 🦜🔗 LangChain | Composable LCEL chains (routing, grading, generation) |
| 🦜🛠️ LangSmith | Full-trace observability & debugging |
| Google Gemini 2.5 Flash | LLM, via langchain-google-genai |
| gemini-embedding-001 | Document & query embeddings |
| ChromaDB | Locally persisted vector store |
| Tavily | Real-time web search fallback |
| uv · pytest · black · isort | Tooling, testing & formatting |
langgraph-course/
├── main.py                         # Entry point: invokes the compiled graph
├── ingestion.py                    # Loads & chunks docs, builds the Chroma retriever
├── graph/
│   ├── graph.py                    # The LangGraph state machine (nodes + edges)
│   ├── state.py                    # GraphState (TypedDict) shared across nodes
│   ├── consts.py                   # Node name constants
│   ├── nodes/                      # Graph nodes
│   │   ├── retrieve.py             # fetch documents from the vector store
│   │   ├── grade_documents.py      # score document relevance (CRAG)
│   │   ├── generate.py             # produce the grounded answer
│   │   └── web_search.py           # Tavily fallback search
│   └── chains/                     # Reusable LCEL chains
│       ├── router.py               # route question → vectorstore | websearch
│       ├── generation.py           # RAG answer-generation chain
│       ├── retrieval_grader.py     # "is this doc relevant?"
│       ├── hallucination_grader.py # "is the answer grounded?"
│       ├── answer_grader.py        # "does the answer resolve the question?"
│       └── tests/
│           └── test_chains.py      # Unit tests for every chain
└── pyproject.toml                  # Dependencies & project metadata
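The layout separates shared state (state.py), nodes, and chains. As a hedged illustration of that split, the sketch below models the state as a TypedDict and builds a node that only reads state, calls a chain, and returns the keys it changes. The field names, the keyword-matching stand-in for an LLM grader, and the apply_update helper are assumptions; in the real graph LangGraph merges each node's returned dict into the state.

```python
from typing import Callable, Dict, List, TypedDict


class GraphState(TypedDict):
    """Assumed shape of the shared state; the real fields may differ."""

    question: str
    generation: str
    web_search: bool
    documents: List[str]


# A "chain" holds the logic only: (question, document) -> is it relevant?
RelevanceChain = Callable[[str, str], bool]


def make_grade_documents_node(is_relevant: RelevanceChain) -> Callable[[GraphState], Dict]:
    """Wrap a chain in a node that handles state I/O and returns only changed keys."""

    def grade_documents(state: GraphState) -> Dict:
        question = state["question"]
        kept = [doc for doc in state["documents"] if is_relevant(question, doc)]
        # Flag a web search when at least one document was judged irrelevant.
        return {"documents": kept, "web_search": len(kept) < len(state["documents"])}

    return grade_documents


def apply_update(state: GraphState, update: Dict) -> GraphState:
    """Overwrite the keys a node returned, like a state key without a custom reducer."""
    merged = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]


def keyword_chain(question: str, document: str) -> bool:
    """Toy stand-in for an LLM relevance grader: match the question's last word."""
    keyword = question.split()[-1].rstrip("?").lower()
    return keyword in document.lower()


node = make_grade_documents_node(keyword_chain)
state: GraphState = {
    "question": "What is agent memory?",
    "generation": "",
    "web_search": False,
    "documents": ["Memory lets agents recall past steps.", "A recipe for bread."],
}
state = apply_update(state, node(state))
print(state["documents"], state["web_search"])
```

Because the node never contains grading logic itself, swapping keyword_chain for a real LLM chain leaves the node and the graph wiring unchanged.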
- Python 3.11+
- uv (recommended): pip install uv
- API keys for Google Gemini and Tavily (and optionally LangSmith)
git clone https://github.com/Mohamedkhattab02/Agentic-RAG-with-LangGraph
cd langgraph-course
# Install dependencies into a virtual environment with uv
uv sync
Prefer pip? Run pip install -e . inside a virtual environment instead.
Create a .env file in the project root:
# --- Required ---
GOOGLE_API_KEY=your_google_gemini_api_key
TAVILY_API_KEY=your_tavily_api_key
# --- Optional: LangSmith tracing & observability ---
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY=your_langsmith_api_key
LANGSMITH_PROJECT=advanced-rag
# --- Make local imports resolve ---
PYTHONPATH=.
Get your keys: Google AI Studio · Tavily · LangSmith
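Before running anything, it can help to confirm that the required keys are actually set. The sketch below is a hypothetical stdlib-only check, not part of the project: parse_env_text is a deliberately tiny KEY=VALUE reader (no quoting or escapes, unlike python-dotenv), and missing_keys reports which of the two required keys are absent or empty.

```python
import os
from typing import Dict, List, Mapping, Optional

# Keys the README marks as required; the LangSmith keys are optional.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "TAVILY_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Tiny KEY=VALUE reader: skips blank lines and # comments, no quoting/escapes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required keys that are absent or empty (defaults to os.environ)."""
    source = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not source.get(key, "").strip()]


sample = """
# --- Required ---
GOOGLE_API_KEY=your_google_gemini_api_key
TAVILY_API_KEY=
LANGSMITH_TRACING=true
"""
print(missing_keys(parse_env_text(sample)))  # ['TAVILY_API_KEY']
```

After load_dotenv() has run, calling missing_keys() with no argument checks the real process environment instead of the sample text.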
The vector store is built from a set of Lilian Weng's essays on agents, prompt engineering, and adversarial attacks.
Open ingestion.py and uncomment the Chroma.from_documents(...) block, then run it once to populate ./.chroma:
uv run python ingestion.py
After the first run, re-comment that block so subsequent runs simply load the persisted store.
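ingestion.py loads and chunks the essays before embedding them. The exact splitter and chunk sizes the project uses aren't shown here, so the sketch below is only a simplified, stdlib-only stand-in: it cuts fixed-size character windows that overlap, so neighboring chunks share some context. The default chunk_size and overlap values are illustrative assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 250, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk would then be embedded and stored as its own document, which is why smaller chunks tend to give the relevance grader more focused text to judge.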
uv run python main.py
You'll see the agent reason through the graph live in your terminal:
---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
Customize the question in main.py:
from dotenv import load_dotenv
# Load .env first: importing the graph builds the LLM clients and retriever, which may need the API keys.
load_dotenv()
from graph.graph import app  # noqa: E402
if __name__ == "__main__":
    result = app.invoke(input={"question": "What is agent memory?"})
    print(result["generation"])
Every chain is covered by unit tests, including routing, relevance grading, generation, and hallucination detection:
uv run pytest -s -v
⚠️ Tests make live LLM calls, so a valid GOOGLE_API_KEY and a populated vector store are required.
- Binary grading with structured output: Each grader uses Pydantic models and with_structured_output() for reliable, parseable decisions.
- Conditional edges: The graph branches dynamically based on grader verdicts, enabling self-correction loops.
- Separation of concerns: Pure chains (logic) are decoupled from nodes (state I/O), keeping the graph readable and testable.
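The first two ideas can be illustrated without an LLM. The sketch below uses a stdlib dataclass as a stand-in for a Pydantic grader schema: it accepts only "yes" or "no", so a malformed verdict fails loudly instead of silently steering the graph. A small routing function then maps the two self-reflection verdicts to the next step, the way a conditional edge would. The class name, the binary_score field, and the route labels are hypothetical, not the project's actual names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryScore:
    """Stand-in for a Pydantic grader schema: the verdict must be 'yes' or 'no'."""

    binary_score: str

    def __post_init__(self) -> None:
        if self.binary_score not in ("yes", "no"):
            raise ValueError(f"binary_score must be 'yes' or 'no', got {self.binary_score!r}")


def parse_grade(raw_json: str) -> BinaryScore:
    """Parse a reply such as '{"binary_score": "Yes"}' into a validated verdict."""
    data = json.loads(raw_json)
    return BinaryScore(binary_score=str(data["binary_score"]).strip().lower())


def route_after_generation(grounded: BinaryScore, answers: BinaryScore) -> str:
    """Conditional-edge style router: pick the next step from the two grader verdicts."""
    if grounded.binary_score == "no":
        return "regenerate"  # hallucination detected
    if answers.binary_score == "no":
        return "websearch"  # grounded, but does not resolve the question
    return "end"  # both checks passed


hallucination = parse_grade('{"binary_score": "yes"}')
answer = parse_grade('{"binary_score": "no"}')
print(route_after_generation(hallucination, answer))  # websearch
```

Constraining the verdict to two values is what makes the branching deterministic: the router only ever has to compare against "yes" and "no".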
Released under the MIT License: free to use, modify, and learn from.

