🚀 AI-Powered Multi-Modal Knowledge Retrieval System

Transform Videos & PDFs into Searchable AI Knowledge Bases using Retrieval-Augmented Generation (RAG), Qdrant, Gemini, Ollama and Whisper.

📌 Overview

AI-Powered Multi-Modal Knowledge Retrieval System is an advanced Retrieval-Augmented Generation (RAG) platform that transforms long-form educational videos and PDF documents into searchable semantic knowledge repositories.

The system combines speech recognition, semantic chunking, vector embeddings, semantic retrieval, and Large Language Models to generate grounded, explainable responses linked directly to their original sources.

Unlike conventional chatbots, the system generates responses from retrieved evidence and backs them with timestamp-level or page-level citations, so each answer can be checked against its source. This grounding is intended to reduce hallucinations and improve answer reliability.

🎯 Why This Project?

Large Language Models often struggle with long-form content and domain-specific knowledge, frequently generating answers without verifiable evidence.

This project addresses that limitation by integrating semantic retrieval, vector databases, and hybrid LLM inference to provide source-grounded responses with explainable citations.

The result is an AI assistant capable of understanding and querying hours of lecture content and large document collections while maintaining traceability and transparency.

🏗️ System Architecture

flowchart TD

A[Video Upload] --> B[Audio Extraction using FFmpeg]
B --> C[Groq Whisper Large-v3]

D[PDF Upload] --> E[PDF Parsing]

C --> F[Semantic Chunking]
E --> F

F --> G[BGE-M3 Embeddings]

G --> H[Qdrant Cloud Vector Database]

H --> I[Semantic Retrieval]
I --> J[Context Expansion]

J --> K[Gemini 2.5 Flash]
J --> L[Ollama Local LLM]

K --> M[Source-Grounded Response]
L --> M

M --> N[Timestamp / Page Citations]
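
As an illustration of the first stage in the flowchart, the audio extraction step could be sketched as below. This is not the code in process_video.py; the function names and the choice of 16 kHz mono WAV output are assumptions made for the example. Only standard FFmpeg flags are used (-i, -vn, -ac, -ar, -y).

```python
import subprocess


def build_ffmpeg_command(video_path: str, audio_path: str,
                         sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that strips video and writes mono audio.

    Hypothetical helper: the 16 kHz mono target is an assumption,
    chosen because it is a common input format for speech recognition.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one audio channel
        "-ar", str(sample_rate),  # resample to the target rate
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run FFmpeg and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
```

Calling extract_audio("lecture.mp4", "lecture.wav") requires FFmpeg on the PATH; the command builder can be inspected without it.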

✨ Key Features

Multi-Modal Knowledge Retrieval

Upload and query both video lectures and PDF documents through a unified interface.
Automatically transforms unstructured multimedia content into searchable knowledge.
Supports educational content, technical documentation, and research materials.

Intelligent Retrieval Pipeline

Semantic chunking with contextual overlap.
Dense vector embeddings using BAAI/bge-m3.
High-performance vector search powered by Qdrant Cloud.
Context expansion for improved retrieval quality.
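
The chunking and context expansion steps above can be sketched roughly as follows. This is a simplified stand-in, not the project's implementation: it groups transcript segments into fixed-size windows with overlap rather than splitting on semantic boundaries, and it expands a retrieved hit by returning its neighbouring chunks. The Segment and Chunk types, the function names, and the size, overlap and radius parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


@dataclass
class Chunk:
    index: int
    start: float
    end: float
    text: str


def chunk_segments(segments: list[Segment], size: int = 10,
                   overlap: int = 2) -> list[Chunk]:
    """Group consecutive segments into windows that share `overlap` segments."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: list[Chunk] = []
    for i in range(0, len(segments), step):
        window = segments[i:i + size]
        chunks.append(Chunk(
            index=len(chunks),
            start=window[0].start,   # keeps the timestamp for citations
            end=window[-1].end,
            text=" ".join(s.text for s in window),
        ))
        if i + size >= len(segments):
            break  # the last window already reached the final segment
    return chunks


def expand_context(chunks: list[Chunk], hit_index: int,
                   radius: int = 1) -> list[Chunk]:
    """Return the retrieved chunk plus up to `radius` neighbours on each side."""
    lo = max(0, hit_index - radius)
    hi = min(len(chunks), hit_index + radius + 1)
    return chunks[lo:hi]
```

Because every chunk keeps the start and end time of its first and last segment, a retrieved chunk can still be cited at timestamp level after expansion.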

Hybrid LLM Inference

Gemini 2.5 Flash integration.
Ollama-based local inference support.
Flexible cloud/local deployment options.
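
One way a cloud/local setup like this could be wired is an ordered list of backends that are tried in turn. The sketch below is an assumption about the design, not a description of query_engine.py: the README does not say whether the app falls back automatically or lets the user pick a model. The backends are plain callables, so no Gemini or Ollama client API is assumed; answer_with_fallback and the stub functions are hypothetical names.

```python
from typing import Callable

Backend = tuple[str, Callable[[str], str]]


def answer_with_fallback(prompt: str, backends: list[Backend]) -> tuple[str, str]:
    """Try each backend in order; return (backend_name, answer) from the first success."""
    errors: list[str] = []
    for name, generate in backends:
        try:
            return name, generate(prompt)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stubs standing in for real clients (hypothetical, for illustration only).
def cloud_stub(prompt: str) -> str:
    raise ConnectionError("no network")


def local_stub(prompt: str) -> str:
    return f"local answer to: {prompt}"


used, answer = answer_with_fallback(
    "What is RAG?", [("gemini", cloud_stub), ("ollama", local_stub)]
)
```

In this run the cloud stub fails, so the answer comes from the local stub and used is "ollama".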

Explainable AI Responses

Timestamp-level citations for video content.
Page-level references for PDFs.
Confidence scoring based on retrieval quality.
Source-grounded responses designed to minimize hallucinations.
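
A minimal sketch of how citations and a confidence score could be produced from retrieval results is shown below. The README does not specify the scoring formula, so the mean of clipped similarity scores used here is an assumption, as are the function names and the dictionary keys ("type", "name", "start", "page").

```python
def format_timestamp(seconds: float) -> str:
    """Convert a position in seconds to HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def cite(source: dict) -> str:
    """Render a video hit with its timestamp, or a PDF hit with its page."""
    if source["type"] == "video":
        return f'{source["name"]} @ {format_timestamp(source["start"])}'
    return f'{source["name"]}, p. {source["page"]}'


def confidence(scores: list[float]) -> float:
    """Assumed heuristic: mean similarity of the retrieved chunks, clipped to [0, 1]."""
    if not scores:
        return 0.0
    clipped = [min(1.0, max(0.0, s)) for s in scores]
    return sum(clipped) / len(clipped)
```

For example, cite({"type": "video", "name": "lecture.mp4", "start": 3541}) gives "lecture.mp4 @ 00:59:01", and cite({"type": "pdf", "name": "notes.pdf", "page": 12}) gives "notes.pdf, p. 12".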

Performance Optimizations

GPU-accelerated embedding generation.
Fast speech-to-text transcription using Groq Whisper Large-v3.
Optimized indexing pipeline for large multimedia datasets.

📊 Benchmark Results

Metric	Result
Longest Video Processed	59 minutes
Transcript Segments Generated	923
Semantic Chunks Created	90
End-to-End Processing Time	~83 seconds
Embedding Throughput	48.7 chunks/sec
Embedding Model	BAAI/bge-m3
GPU	NVIDIA RTX 4050
Vector Database	Qdrant Cloud
Transcription Model	Groq Whisper Large-v3
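
A few derived figures help put the table in context. The arithmetic below uses only the numbers reported above; the results are derived, not measured, and they inherit the rounding of the inputs (the processing time is given as approximately 83 seconds). The segments-per-chunk ratio ignores any overlap between chunks.

```python
# Figures copied from the benchmark table.
video_minutes = 59
transcript_segments = 923
semantic_chunks = 90
end_to_end_seconds = 83            # reported as "~83", so approximate
embedding_chunks_per_second = 48.7

video_seconds = video_minutes * 60

# How many seconds of video are processed per second of wall-clock time.
realtime_factor = video_seconds / end_to_end_seconds

# Time spent embedding all chunks at the reported throughput.
embedding_seconds = semantic_chunks / embedding_chunks_per_second

# Average number of transcript segments merged into one chunk.
segments_per_chunk = transcript_segments / semantic_chunks

print(f"Processing speed: {realtime_factor:.1f}x real time")
print(f"Embedding time:   {embedding_seconds:.2f} s")
print(f"Segments/chunk:   {segments_per_chunk:.1f}")
```

This works out to roughly 43x real time, under 2 seconds of embedding for the whole video, and about 10 transcript segments per chunk, which suggests that most of the end-to-end time is spent outside the embedding step.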

🖼️ Application Screenshots

Upload Interface

AI-Generated Responses

Source-Grounded Citations

⚙️ Technology Stack

Programming

Python

Generative AI & LLMs

Gemini 2.5 Flash
Ollama
Mistral-7B

Retrieval & Search

Retrieval-Augmented Generation (RAG)
Semantic Search
Qdrant Cloud
BAAI/bge-m3 Embeddings
Vector Similarity Search

NLP

Groq Whisper Large-v3
Speech-to-Text Processing
Semantic Chunking
Context Expansion

Infrastructure & Tools

Streamlit
CUDA
FFmpeg
Git

📂 Project Structure

project/
│
├── app.py
├── run.py
├── process_video.py
├── process_pdf.py
├── query_engine.py
├── transformation.py
├── read_chunks.py
├── requirements.txt
├── index_summary.json
│
├── images/
└── README.md

🚀 Getting Started

Clone Repository

git clone https://github.com/wahibkhannn/AI-Multimodal-Knowledge-Retrieval-System.git
cd AI-Multimodal-Knowledge-Retrieval-System

Install Dependencies

pip install -r requirements.txt

Configure Environment Variables

Create a .env file:

GROQ_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
QDRANT_API_KEY=your_key_here
QDRANT_URL=your_qdrant_url
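
To catch a missing key before the app starts, the four variables above can be checked at startup. The sketch below uses only the standard library and assumes the values are already present in the process environment; it does not parse the .env file itself (how the app loads that file is not stated here). The load_settings function and the REQUIRED tuple are hypothetical names.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variable names taken from the .env example above.
REQUIRED = ("GROQ_API_KEY", "GEMINI_API_KEY", "QDRANT_API_KEY", "QDRANT_URL")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the required settings, or raise listing every missing variable."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED}
```

Passing a plain dictionary instead of os.environ makes the check easy to exercise in isolation, for example load_settings({"GROQ_API_KEY": "x"}) raises an error that names the three variables still missing.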

Run Application

streamlit run app.py

🔮 Future Enhancements

👨‍💻 Author

Mohammad Wahib Ashraf Khan

B.Tech Computer Science & Engineering (Data Science)

GitHub: https://github.com/wahibkhannn
LinkedIn: https://linkedin.com/in/wahibkhannn

⭐ If you found this project interesting, consider starring the repository.

License

This project is proprietary software. No permission is granted to copy, modify, distribute, or use this code without prior written consent.
