A full-stack PDF Q&A chatbot with conversational memory, built with FastAPI, LangChain, ChromaDB, and React.
Backend
- FastAPI — REST API framework
- LangChain — agent orchestration and memory
- ChromaDB — persistent vector store
- HuggingFace Embeddings — sentence-transformers/all-MiniLM-L6-v2
- Groq (llama-3.3-70b-versatile) — LLM inference
Frontend
- React.js — UI framework
- Axios — HTTP client
pdf-chatbot/
├── backend/
│   ├── main.py            # FastAPI app and routes
│   ├── ingest.py          # PDF → chunks → ChromaDB pipeline
│   ├── agent.py           # LangChain agent with memory
│   ├── schemas.py         # Pydantic request models
│   ├── config.py          # Environment config
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── App.jsx
│   │   ├── App.css
│   │   ├── api.js
│   │   └── components/
│   │       ├── UploadFile.jsx
│   │       └── QueryFile.jsx
│   └── package.json
├── chroma_db/             # Auto-generated, gitignored
├── .env.example
└── README.md
User uploads PDF
↓
PyPDFLoader → RecursiveCharacterTextSplitter → HuggingFace Embeddings → ChromaDB
↓
User asks a question
↓
LangChain Agent → search_chroma tool → ChromaDB similarity search
↓
Retrieved chunks + chat history → Groq LLM → Answer
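The flow above can be illustrated with a small, dependency-free sketch. It is not the project's code: the fixed-size splitter is a simplified stand-in for RecursiveCharacterTextSplitter, the bag-of-words `embed` function replaces the real all-MiniLM-L6-v2 embeddings, an in-memory list replaces ChromaDB, and the names `split_text`, `search_chunks`, and `build_prompt` are hypothetical. It only shows the shape of the idea: chunk with overlap, rank chunks by cosine similarity, then combine the top chunks with the chat history into one prompt.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter, which
    additionally prefers to break on separators such as newlines.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts (assumption, not MiniLM)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_chunks(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (hypothetical helper)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(chunks: list[str], history: list[tuple[str, str]], question: str) -> str:
    """Combine retrieved chunks and prior turns into one LLM prompt."""
    context = "\n---\n".join(chunks)
    turns = "\n".join(f"{role}: {message}" for role, message in history)
    return f"Context:\n{context}\n\nChat history:\n{turns}\n\nQuestion: {question}"


document = (
    "ChromaDB stores the embedded chunks on disk. "
    "The Groq API serves the llama model for answers. "
    "React renders the upload and chat components."
)
index = [(chunk, embed(chunk)) for chunk in split_text(document, chunk_size=60, overlap=10)]
question = "Where are chunks stored?"
top = search_chunks(index, question, k=1)
prompt = build_prompt(top, [("user", "Hi"), ("assistant", "Hello!")], question)
print(prompt)
```

In the real pipeline the prompt would go to the Groq LLM, and the new question and answer would be appended to the history before the next turn.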
- Python 3.10+
- Node.js 18+
- Groq API key → https://console.groq.com
# Clone the repo
git clone https://github.com/your-username/pdf-oracle.git
cd pdf-oracle
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r backend/requirements.txt
# Set up environment variables
cp .env.example .env
# Add your GROQ_API_KEY to .env
# Run the server
uvicorn backend.main:app --reload
Backend runs at http://localhost:8000
API docs at http://localhost:8000/docs
cd frontend
# Install dependencies
npm install
# Set up environment variables
echo "REACT_APP_API_URL=http://localhost:8000" > .env
# Start the dev server
npm start
Frontend runs at http://localhost:3000
Create a .env file in the project root:
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL_NAME=llama-3.3-70b-versatile
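As a rough illustration of how these two variables might be read at startup, here is a standard-library sketch. It is an assumption, not the contents of `config.py`: the project lists python-dotenv and pydantic-settings, which would normally do this work, and the names `Settings`, `parse_env_file`, and `load_settings`, as well as the fallback to the default model name, are hypothetical.

```python
from dataclasses import dataclass

# Assumed fallback, taken from the model name shown in this README.
DEFAULT_MODEL = "llama-3.3-70b-versatile"


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model_name: str = DEFAULT_MODEL


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env: dict[str, str]) -> Settings:
    """Build Settings from a mapping, failing early if the API key is missing."""
    api_key = env.get("GROQ_API_KEY", "")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env")
    return Settings(
        groq_api_key=api_key,
        groq_model_name=env.get("GROQ_MODEL_NAME", DEFAULT_MODEL),
    )


example = "GROQ_API_KEY=your_groq_api_key_here\nGROQ_MODEL_NAME=llama-3.3-70b-versatile\n"
settings = load_settings(parse_env_file(example))
print(settings.groq_model_name)
```

Failing at startup when the key is missing tends to be easier to debug than a rejected request at the first `/chat` call.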
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Welcome message |
| GET | /health | Health check |
| POST | /upload | Upload and ingest a PDF file |
| POST | /chat | Ask a question about the PDF |
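For a quick check of the two POST endpoints without the React frontend, the requests can be built with the Python standard library alone. This is a sketch under assumptions: the helper names are hypothetical, the server is assumed to run at http://localhost:8000 as described in this README, the upload is assumed to use a form field named `file`, and `/chat` is assumed to take a JSON body with a `question` field. The response format is not documented here, so the example does not assume one.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # backend address from this README


def build_chat_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /chat request with a JSON body containing the question."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_upload_request(
    filename: str, pdf_bytes: bytes, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST /upload request with a hand-encoded multipart/form-data body."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/upload",
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


chat_request = build_chat_request("What are the main findings of this paper?")
print(chat_request.get_method(), chat_request.full_url)

# Sending requires the backend to be running, so it is left commented out:
# with urllib.request.urlopen(chat_request) as response:
#     print(response.read().decode("utf-8"))
```

Upload a PDF before calling `/chat`, since the agent can only retrieve chunks that have already been ingested.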
POST /upload
Content-Type: multipart/form-data
Body: file (PDF)
POST /chat
{
  "question": "What are the main findings of this paper?"
}
backend/requirements.txt:
fastapi
uvicorn
python-multipart
langchain
langchain-community
langchain-groq
chromadb
sentence-transformers
pypdf
python-dotenv
pydantic-settings
venv/
chroma_db/
.env
__pycache__/
*.pyc
node_modules/
frontend/build/
uploads/
"I built a RAG pipeline using LangChain agents, ChromaDB as the vector store, HuggingFace embeddings, and Groq's llama-3.3-70b for inference — all wired to a React frontend. The agent uses a custom
search_chroma tool to retrieve relevant PDF chunks and maintains conversational memory across turns."
Built by Roshan — GitHub