A simple Retrieval-Augmented Generation (RAG) application that allows users to ask questions about a PDF document and receive grounded answers using Mistral models.
This project is built as a developer-facing AI tool with a focus on correctness, clarity, and best engineering practices rather than UI complexity.
Large language models cannot reliably answer questions about long documents without external context. This project demonstrates how Retrieval-Augmented Generation (RAG) can:
- Ground model responses in real document content
- Reduce hallucinations
- Make LLM behavior transparent and debuggable
The project is intentionally kept simple to highlight system design decisions and LLM integration best practices.
- A PDF is uploaded via the UI
- The document is split into overlapping text chunks
- Each chunk is converted into an embedding
- Embeddings are stored in a vector store (FAISS)
- When a user asks a question:
  - The question is embedded
  - Relevant chunks are retrieved
  - Retrieved chunks are passed as context to a Mistral model
- The model generates an answer strictly based on the retrieved context
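The chunking step above can be sketched as follows. This is a minimal illustration, not the project's actual `rag.py`: the function name `chunk_text` is hypothetical, and the defaults simply mirror a 1000-character chunk with a 200-character overlap.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters.

    Hypothetical helper for illustration; the real implementation may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    # Small sizes make the overlap easy to see.
    print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.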
This application uses two Mistral AI models, one for embeddings and one for answer generation, configured alongside the RAG parameters:
# Core Mistral models
CHAT_MODEL = "mistral-small-latest" # For answer generation
EMBED_MODEL = "mistral-embed" # For text embeddings
# RAG parameters
CHUNK_SIZE = 1000 # Characters per text chunk
CHUNK_OVERLAP = 200 # Overlap between chunks for context preservation
TOP_K = 2 # Number of chunks to retrieve per question
flowchart TD
A[Upload PDF] --> B[Extract Text]
B --> C[Chunk Text]
C --> D[Create Embeddings]
D --> E[Store in Vector Store]
F[User Question] --> G[Embed Question]
G --> H[Retrieve Relevant Chunks]
H --> I[Send Context + Question to Mistral]
I --> J[Generate Answer]
J --> K[Display Answer + Retrieved Chunks]
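The "Send Context + Question to Mistral" step amounts to assembling one prompt from the retrieved chunks and the question. The sketch below shows one way to do that. The function name `build_prompt` and the instruction wording are assumptions for illustration, not the prompt this project actually sends.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into a single prompt.

    Hypothetical helper: the real prompt wording in this project may differ.
    """
    # Label each chunk so an answer can be traced back to its source.
    context = "\n\n".join(
        f"[Chunk {i}]\n{chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_prompt("What is FAISS used for?", ["FAISS stores the embeddings."]))
```

Telling the model to admit when the context lacks the answer is what keeps responses tied to the retrieved text rather than to the model's prior knowledge.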
pdf-chatbot-mistral/
├── app/ # Core application
│   ├── config.py # Settings & constants
│   ├── mistral_client.py # Mistral API wrapper
│   ├── rag.py # Chunking & retrieval logic
│   └── ui.py # Streamlit UI helpers
├── tests/ # Comprehensive test suite
├── main.py # Application entry point
├── requirements.txt # Production dependencies
├── requirements-dev.txt # Development tools
└── README.md # You're reading it!
git clone https://github.com/imHardik1606/pdf-chatbot-mistral.git
cd pdf-chatbot-mistral
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
Create a .env file in the root directory:
MISTRAL_API_KEY=your_api_key_here
streamlit run main.py
Open http://localhost:8501 and start chatting with PDFs!
Once running:
- Upload a PDF
- Wait for chunking and embedding to complete
- Ask questions about the document
We've included a complete test suite so you can verify everything works:
# Install test tools (once)
pip install -r requirements-dev.txt
# Run all tests
pytest tests/
# Expected: 12 out of 14 tests pass
# The 2 "failing" tests are edge cases we've documented
# See what's tested
pytest tests/ -v
# Check code coverage
pytest tests/ --cov=app --cov-report=term-missing
- Text chunking - Splitting documents intelligently
- FAISS operations - Vector search works correctly
- API client - Mocked Mistral API calls
- PDF processing - Text extraction from PDFs
- Edge cases - Empty docs, small files, etc.
tests/
├── conftest.py # Contains chunks of text
├── diagnostic.py # Diagnose the test suite
├── test_rag.py # Core RAG logic tests
├── test_mistral_client.py # API integration tests
└── test_ui.py # UI/PDF processing tests
This application uses a Retrieval-Augmented Generation (RAG) pipeline that answers questions strictly based on retrieved text from the uploaded document.
As a result, performance depends heavily on the structure of the question.
The system performs best on questions where the answer is explicitly present within a limited portion of the document:
- Factual questions
  - Asking about concrete information stated in the text
- Definition or description questions
  - Asking how an entity, concept, or event is described
- Local context questions
  - Asking about content from a specific section or part of the document
- Single-hop questions
  - Questions that can be answered without reasoning across distant sections
These questions align well with the retrieval step and usually result in grounded, verifiable answers.
The following types of questions may produce incomplete or unreliable answers:
- Global summarization
  - Questions requiring understanding of the entire document
- Multi-hop reasoning
  - Questions that depend on connecting information across many sections
- Abstract or interpretive questions
  - Questions that require inference beyond what is explicitly written
- Timeline-wide or narrative arc questions
  - Questions spanning large portions of long documents
These limitations are expected in a basic RAG system without hierarchical retrieval or long-context reasoning.
To assess whether the system is working correctly:
- Ask a factual or locally scoped question
- Inspect the retrieved chunks displayed in the UI
- Confirm that the answer is derived from the retrieved text
Answers that cannot be traced back to retrieved content should be treated cautiously.
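A rough way to automate this check is to measure how many of the answer's words also appear in the retrieved chunks. The sketch below is a crude lexical heuristic, not part of this project. The name `token_overlap` is hypothetical, and a high score does not prove the answer is correct, only that its vocabulary comes from the retrieved text.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word set of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap(answer: str, chunks: list[str]) -> float:
    """Fraction of distinct answer words that also occur in the retrieved chunks.

    Hypothetical heuristic for illustration; returns 0.0 for an empty answer.
    """
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens: set[str] = set()
    for chunk in chunks:
        context_tokens |= _tokens(chunk)
    return len(answer_tokens & context_tokens) / len(answer_tokens)


if __name__ == "__main__":
    chunks = ["The capital of France is Paris."]
    print(token_overlap("Paris is the capital", chunks))
    print(token_overlap("Berlin", chunks))
```

A low score is a signal to read the retrieved chunks by hand before trusting the answer.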
This system intentionally:
- Uses fixed-size chunking
- Retrieves a limited number of chunks per query
- Avoids document-wide reasoning for transparency
These tradeoffs keep the system simple, debuggable, and aligned with RAG best practices.
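To make the "limited number of chunks" tradeoff concrete, here is a pure-Python stand-in for the vector search. It is a sketch under assumptions: the project uses FAISS rather than this loop, the similarity metric used by its index is not stated here (cosine similarity is assumed), and `retrieve_top_k` is a hypothetical name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    chunks: list[str],
    k: int = 2,
) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query embedding.

    Brute-force stand-in for a FAISS search (assumed metric: cosine similarity).
    """
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


if __name__ == "__main__":
    chunks = ["about cats", "about taxes", "about kittens"]
    vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy 2-D embeddings
    print(retrieve_top_k([1.0, 0.0], vecs, chunks, k=2))
```

With a small `k`, only the chunks closest to the question reach the model, which keeps answers easy to audit but means information spread across many sections is never seen together.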
- The system can only answer questions whose answers are explicitly present in the document
- Narrative or global questions (e.g. "What happens at the end?") may fail on very large PDFs
- No chapter-level or section metadata is used
- The model does not reason beyond retrieved chunks
- Chunk size and overlap are fixed and may not be optimal for all documents
These limitations are expected for a basic RAG pipeline and are documented intentionally.
- Streamlit was chosen for fast prototyping and easy testing of retrieval behavior
- Core logic is separated from UI for clarity and maintainability
- A Mistral SDK wrapper isolates model-specific code
- Emphasis is on transparency and correctness rather than UI complexity
- Add page or chapter-level metadata
- Display citations with answers
- Support multiple documents
- Use hierarchical chunking for large PDFs
- Add evaluation metrics for retrieval quality
- Embeddings: mistral-embed
- Generation: mistral-small-latest (chosen for fast iteration and cost efficiency)
The project focuses on system design and reliability, not model size.
- Ask factual questions grounded in the text
- Inspect retrieved chunks shown in the UI
- Verify answers are derived from retrieved content
This project is intended as a technical demonstration of building AI-powered developer tools using Mistral models. It is not production-ready but follows industry best practices for prototyping and experimentation.