Transform Videos & PDFs into Searchable AI Knowledge Bases using Retrieval-Augmented Generation (RAG), Qdrant, Gemini, Ollama and Whisper.
The AI-Powered Multi-Modal Knowledge Retrieval System is a Retrieval-Augmented Generation (RAG) platform that turns long-form educational videos and PDF documents into searchable semantic knowledge repositories.
The system combines speech recognition, semantic chunking, vector embeddings, semantic retrieval, and Large Language Models to generate grounded, explainable responses linked directly to their original sources.
Unlike conventional chatbots, the system generates responses from retrieved evidence and backs each one with timestamp-level or page-level citations. Grounding answers this way is intended to reduce hallucinations and lets readers verify every claim against its source.
Large Language Models often struggle with long-form content and domain-specific knowledge, frequently generating answers without verifiable evidence.
This project addresses that limitation by integrating semantic retrieval, vector databases, and hybrid LLM inference to provide source-grounded responses with explainable citations.
The result is an AI assistant capable of understanding and querying hours of lecture content and large document collections while maintaining traceability and transparency.
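
To make the idea of source-grounded answers concrete, here is a minimal sketch of how retrieved evidence could be turned into a prompt with timestamp or page citations. The `Evidence` schema, the function names, and the prompt wording are all assumptions made for illustration; they are not taken from this repository's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Evidence:
    """One retrieved chunk plus its origin (hypothetical schema)."""
    text: str
    source: str
    start_seconds: Optional[float] = None  # set for video chunks
    page: Optional[int] = None             # set for PDF chunks


def format_citation(ev: Evidence) -> str:
    """Render a timestamp-level or page-level citation label."""
    if ev.start_seconds is not None:
        hours, rem = divmod(int(ev.start_seconds), 3600)
        minutes, seconds = divmod(rem, 60)
        return f"{ev.source} @ {hours:02d}:{minutes:02d}:{seconds:02d}"
    if ev.page is not None:
        return f"{ev.source}, p. {ev.page}"
    return ev.source


def build_grounded_prompt(question: str, evidence: List[Evidence]) -> str:
    """Assemble a prompt that restricts the model to the numbered evidence."""
    lines = [
        "Answer using only the numbered evidence below.",
        "Cite evidence as [n]. If the evidence is insufficient, say so.",
        "",
    ]
    for i, ev in enumerate(evidence, start=1):
        lines.append(f"[{i}] ({format_citation(ev)}) {ev.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

The resulting string could be sent to either the cloud or the local model; because each evidence item carries its own label, a `[n]` marker in the answer maps back to a timestamp or page.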
flowchart TD
A[Video Upload] --> B[Audio Extraction using FFmpeg]
B --> C[Groq Whisper Large-v3]
D[PDF Upload] --> E[PDF Parsing]
C --> F[Semantic Chunking]
E --> F
F --> G[BGE-M3 Embeddings]
G --> H[Qdrant Cloud Vector Database]
H --> I[Semantic Retrieval]
I --> J[Context Expansion]
J --> K[Gemini 2.5 Flash]
J --> L[Ollama Local LLM]
K --> M[Source-Grounded Response]
L --> M
M --> N[Timestamp / Page Citations]
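
The first step in the flowchart, audio extraction with FFmpeg, might look roughly like the sketch below. The function names are hypothetical, and the choice of 16 kHz mono WAV output is an assumption (a common format for speech recognition), not a setting confirmed by this project.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str,
                         sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that extracts a mono audio track."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one audio channel
        "-ar", str(sample_rate),  # resample (16 kHz is an assumed default)
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> Path:
    """Run FFmpeg; raises CalledProcessError if extraction fails."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
    return Path(audio_path)
```

Keeping command construction separate from execution makes the command easy to inspect or log before running it on a long recording.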
- Unified interface for uploading and querying both video lectures and PDF documents.
- Automatic transformation of unstructured multimedia content into searchable knowledge.
- Support for educational content, technical documentation, and research materials.
- Semantic chunking with contextual overlap.
- Dense vector embeddings using BAAI/bge-m3.
- High-performance vector search powered by Qdrant Cloud.
- Context expansion for improved retrieval quality.
- Gemini 2.5 Flash integration.
- Ollama-based local inference support.
- Flexible cloud/local deployment options.
- Timestamp-level citations for video content.
- Page-level references for PDFs.
- Confidence scoring based on retrieval quality.
- Source-grounded responses designed to minimize hallucinations.
- GPU-accelerated embedding generation.
- Fast speech-to-text transcription using Groq Whisper Large-v3.
- Optimized indexing pipeline for large multimedia datasets.
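
As an illustration of chunking with contextual overlap and timestamp preservation, the sketch below groups transcript segments into chunks under a character budget and repeats the last segment of each chunk at the start of the next. This is a simplified, size-based stand-in for semantic chunking; the `Segment` and `Chunk` types, the `max_chars` budget, and the one-segment overlap are assumptions rather than the project's actual settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcript segment with timestamps in seconds (hypothetical)."""
    text: str
    start: float
    end: float


@dataclass
class Chunk:
    """A group of consecutive segments; keeps the covered time span."""
    text: str
    start: float
    end: float


def chunk_segments(segments: List[Segment], max_chars: int = 1000,
                   overlap: int = 1) -> List[Chunk]:
    """Group segments into chunks that share `overlap` segments."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    chunks: List[Chunk] = []
    i, n = 0, len(segments)
    while i < n:
        j, size = i, 0
        # Always take one segment, then add more while under the budget.
        # The budget counts segment text only, not the joining spaces.
        while j < n and (j == i or size + len(segments[j].text) <= max_chars):
            size += len(segments[j].text)
            j += 1
        window = segments[i:j]
        chunks.append(Chunk(" ".join(s.text for s in window),
                            window[0].start, window[-1].end))
        if j >= n:
            break
        # Step back by `overlap` segments, but always move forward.
        i = max(j - overlap, i + 1)
    return chunks
```

Because every chunk stores the start of its first segment and the end of its last, a retrieved chunk can be cited at timestamp level without consulting the transcript again.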
| Metric | Result |
|---|---|
| Longest Video Processed | 59 Minutes |
| Transcript Segments Generated | 923 |
| Semantic Chunks Created | 90 |
| End-to-End Processing Time | ~83 Seconds |
| Embedding Throughput | 48.7 Chunks/sec |
| Embedding Model | BAAI/bge-m3 |
| GPU | NVIDIA RTX 4050 |
| Vector Database | Qdrant Cloud |
| Transcription Model | Groq Whisper Large-v3 |
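
A few derived figures help put the table in perspective. The short calculation below assumes that the 83-second processing time refers to the 59-minute video and that all 90 chunks were embedded at the stated throughput; the table does not state either point explicitly, and "~83 seconds" is itself approximate.

```python
# Values copied from the benchmark table above.
video_seconds = 59 * 60      # 59-minute video
processing_seconds = 83      # approximate end-to-end time
chunks = 90
segments = 923
throughput = 48.7            # chunks per second

# Derived figures (rough estimates under the stated assumptions).
embedding_seconds = chunks / throughput                   # about 1.85 s
realtime_factor = video_seconds / processing_seconds      # about 42.7x
segments_per_chunk = segments / chunks                    # about 10.3
embedding_share = embedding_seconds / processing_seconds  # about 2%

print(f"Embedding time: {embedding_seconds:.2f} s")
print(f"Speed vs. real time: {realtime_factor:.1f}x")
print(f"Segments per chunk: {segments_per_chunk:.1f}")
print(f"Embedding share of total: {embedding_share:.1%}")
```

Under these assumptions, embedding accounts for only about 2% of the total time, which suggests that most of the 83 seconds is spent in other stages such as audio extraction, transcription, and indexing.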
- Python
- Gemini 2.5 Flash
- Ollama
- Mistral-7B
- Retrieval-Augmented Generation (RAG)
- Semantic Search
- Qdrant Cloud
- BAAI/bge-m3 Embeddings
- Vector Similarity Search
- Groq Whisper Large-v3
- Speech-to-Text Processing
- Semantic Chunking
- Context Expansion
- Streamlit
- CUDA
- FFmpeg
- Git
project/
│
├── app.py
├── run.py
├── process_video.py
├── process_pdf.py
├── query_engine.py
├── transformation.py
├── read_chunks.py
├── requirements.txt
├── index_summary.json
│
├── images/
└── README.md
git clone https://github.com/wahibkhannn/AI-Multimodal-Knowledge-Retrieval-System.git
cd AI-Multimodal-Knowledge-Retrieval-System
pip install -r requirements.txt
Create a .env file:
GROQ_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
QDRANT_API_KEY=your_key_here
QDRANT_URL=your_qdrant_url
streamlit run app.py
- Hybrid Search (BM25 + Vector Search)
- Cross-Encoder Re-Ranking
- Multi-Document Collections
- Chat Memory
- Knowledge Graph Integration
- Agentic Retrieval Pipelines
- Research Paper Mode
- Multi-Language Support
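
Hybrid search (BM25 + vector search) is listed above, but this README does not describe how the two result lists would be combined. One common option is reciprocal rank fusion (RRF), sketched below purely as an illustration. The function name, the chunk IDs, and the constant `k = 60` (a conventional default for RRF) are assumptions and do not reflect any existing code in this project.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists into one list using RRF.

    Each ID scores 1 / (k + rank) per list it appears in, with ranks
    starting at 1. Ties are broken alphabetically for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by a keyword ranker and a vector ranker.
bm25_hits = ["c1", "c2", "c3"]
vector_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF uses only rank positions, so it avoids having to normalize BM25 scores against cosine similarities, which live on different scales.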
Mohammad Wahib Ashraf Khan
B.Tech Computer Science & Engineering (Data Science)
- GitHub: https://github.com/wahibkhannn
- LinkedIn: https://linkedin.com/in/wahibkhannn
⭐ If you found this project interesting, consider starring the repository.
Copyright © 2026 Mohammad Wahib Ashraf Khan
This project is proprietary software. No permission is granted to copy, modify, distribute, or use this code without prior written consent.


