ROSHAN-KHANDAGALE/PDF-ChatBot


PDF Chatbot 🔮

A full-stack PDF Q&A chatbot with conversational memory, built with FastAPI, LangChain, ChromaDB, and React.


Tech Stack

Backend

  • FastAPI — REST API framework
  • LangChain — agent orchestration and memory
  • ChromaDB — persistent vector store
  • HuggingFace Embeddings — sentence-transformers/all-MiniLM-L6-v2
  • Groq (llama-3.3-70b-versatile) — LLM inference

Frontend

  • React.js — UI framework
  • Axios — HTTP client

Project Structure

pdf-chatbot/
├── backend/
│   ├── main.py            # FastAPI app and routes
│   ├── ingest.py          # PDF → chunks → ChromaDB pipeline
│   ├── agent.py           # LangChain agent with memory
│   ├── schemas.py         # Pydantic request models
│   ├── config.py          # Environment config
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── App.jsx
│   │   ├── App.css
│   │   ├── api.js
│   │   └── components/
│   │       ├── UploadFile.jsx
│   │       └── QueryFile.jsx
│   └── package.json
├── chroma_db/             # Auto-generated, gitignored
├── .env.example
└── README.md

How It Works

User uploads PDF
      ↓
PyPDFLoader → RecursiveCharacterTextSplitter → HuggingFace Embeddings → ChromaDB
      ↓
User asks a question
      ↓
LangChain Agent → search_chroma tool → ChromaDB similarity search
      ↓
Retrieved chunks + chat history → Groq LLM → Answer
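The flow above can be tried out without installing any of the real dependencies. The sketch below is a toy stand-in, not the project's code: `split_text` imitates overlapping chunking (the real pipeline uses RecursiveCharacterTextSplitter), word-count vectors replace the HuggingFace embeddings, and a plain Python list replaces ChromaDB. Every function name here is hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Cut text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: lowercase word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(store, question, k=2):
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks, history):
    """Combine retrieved chunks and earlier turns into one prompt string."""
    context = "\n---\n".join(chunks)
    past = "\n".join(f"{role}: {msg}" for role, msg in history)
    return f"Context:\n{context}\n\nHistory:\n{past}\n\nQuestion: {question}"


# "Ingest": store each chunk next to its vector, as a vector store would.
store = [(c, embed(c)) for c in ["cats purr loudly", "dogs bark at night"]]
top = search(store, "why do dogs bark", k=1)
prompt = build_prompt("why do dogs bark", top, [("user", "hello")])
```

In the real app the final prompt would go to the Groq LLM, and the chat history is what lets follow-up questions refer back to earlier turns.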

Getting Started

Prerequisites

  • Python 3 with pip
  • Node.js with npm
  • A Groq API key

Backend Setup

# Clone the repo
git clone https://github.com/ROSHAN-KHANDAGALE/PDF-ChatBot.git
cd PDF-ChatBot

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

# Install dependencies
pip install -r backend/requirements.txt

# Set up environment variables
cp .env.example .env
# Add your GROQ_API_KEY to .env

# Run the server
uvicorn backend.main:app --reload

Backend runs at http://localhost:8000
API docs at http://localhost:8000/docs


Frontend Setup

cd frontend

# Install dependencies
npm install

# Set up environment variables
echo "REACT_APP_API_URL=http://localhost:8000" > .env

# Start the dev server
npm start

Frontend runs at http://localhost:3000


Environment Variables

Create a .env file in the project root:

GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL_NAME=llama-3.3-70b-versatile
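
As a rough illustration of how these two variables might be read at startup, here is a standard-library-only sketch. It is an assumption, not the contents of config.py: the project lists python-dotenv and pydantic-settings in its requirements, which would normally handle loading .env, and the names `Settings` and `load_settings` are hypothetical.

```python
import os
from dataclasses import dataclass

DEFAULT_MODEL = "llama-3.3-70b-versatile"


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model_name: str = DEFAULT_MODEL


def load_settings(env=None):
    """Build Settings from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        # Fail early rather than at the first LLM call.
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env")
    model = env.get("GROQ_MODEL_NAME", "").strip() or DEFAULT_MODEL
    return Settings(groq_api_key=api_key, groq_model_name=model)
```

This sketch does not parse the .env file itself; it assumes the variables have already been exported into the environment.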

API Endpoints

Method  Endpoint  Description
GET     /         Welcome message
GET     /health   Health check
POST    /upload   Upload and ingest a PDF file
POST    /chat     Ask a question about the PDF

POST /upload

Content-Type: multipart/form-data
Body: file (PDF)
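
For reference, a multipart upload can be assembled with only the Python standard library. This is a minimal client-side sketch: the form field name `file` and the default base URL come from this README, while the helper name `build_upload_request` is hypothetical. The response format is not documented here, so the sketch only builds the request.

```python
import uuid
import urllib.request


def build_upload_request(pdf_bytes, filename, base_url="http://localhost:8000"):
    """Build a multipart/form-data POST to /upload with one 'file' part."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/upload",
        data=head + pdf_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With the backend running, passing the returned request to `urllib.request.urlopen` sends it.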

POST /chat

{
  "question": "What are the main findings of this paper?"
}
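
A question can be sent from Python with the standard library alone. The sketch below assumes the server is running at the default address given earlier; the helper names `build_chat_request` and `ask` are hypothetical, and because the response schema is not documented here, `ask` simply returns whatever JSON comes back.

```python
import json
import urllib.request


def build_chat_request(question, base_url="http://localhost:8000"):
    """Build a JSON POST to /chat carrying the 'question' field."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def ask(question, base_url="http://localhost:8000", timeout=30):
    """Send the question and return the decoded JSON response."""
    request = build_chat_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Upload a PDF first; `ask("What are the main findings of this paper?")` then queries the ingested document.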

Backend Requirements

backend/requirements.txt:

fastapi
uvicorn
python-multipart
langchain
langchain-community
langchain-groq
chromadb
sentence-transformers
pypdf
python-dotenv
pydantic-settings

.gitignore

venv/
chroma_db/
.env
__pycache__/
*.pyc
node_modules/
frontend/build/
uploads/

Interview Notes

"I built a RAG pipeline using LangChain agents, ChromaDB as the vector store, HuggingFace embeddings, and Groq's llama-3.3-70b for inference — all wired to a React frontend. The agent uses a custom search_chroma tool to retrieve relevant PDF chunks and maintains conversational memory across turns."


Author

Built by Roshan — GitHub
