Hadith - حديث

An AI-powered semantic and keyword search engine for Islamic Hadiths, built with FastAPI, Qdrant, and hybrid retrieval techniques.

📖 Description

Hadith is an end-to-end pipeline that scrapes, processes, indexes, and searches Islamic Hadith texts from several major collections (Sahih al-Bukhari, Sahih Muslim, and more). It combines semantic vector search (multilingual embeddings stored in Qdrant) with keyword-based BM25 search (Whoosh), then normalizes and fuses the two score lists to rank results in both Arabic and English.

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Data Pipeline                        │
│                                                             │
│  sunnah.com ──► Scraper ──► Repository (DB / Excel)         │
│                                   │                         │
│                                   ▼                         │
│                    Chunker ──► Embedder ──► Qdrant          │
│                                   │                         │
│                                   ▼                         │
│                              Whoosh Index                   │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                       Search Pipeline                       │
│                                                             │
│  User Query                                                 │
│      │                                                      │
│      ├──► Semantic Search (Qdrant)  ──► [(id, score)]       │
│      │                                        │             │
│      └──► Keyword Search  (Whoosh)  ──► [(id, score)]       │
│                                               │             │
│                                    Normalize + Fuse         │
│                               (0.7 semantic + 0.3 BM25)     │
│                                               │             │
│                                    Fetch from PostgreSQL    │
│                                               │             │
│                                        Ranked Results       │
└─────────────────────────────────────────────────────────────┘
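
The "Normalize + Fuse" step in the search pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes min-max normalization and treats a hadith missing from one result list as scoring 0 there. The function names `min_max_normalize` and `fuse` are hypothetical; only the 0.7 / 0.3 weights and the `[(id, score)]` shape come from the diagram.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[int, float]]  # [(hadith id, raw score)], as in the diagram


def min_max_normalize(results: Scored) -> Dict[int, float]:
    """Rescale raw scores to [0, 1] (assumed normalization scheme)."""
    if not results:
        return {}
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every hit as a full match.
        return {doc_id: 1.0 for doc_id, _ in results}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in results}


def fuse(
    semantic: Scored,
    keyword: Scored,
    w_semantic: float = 0.7,
    w_keyword: float = 0.3,
    top_k: int = 5,
) -> Scored:
    """Weighted sum of normalized scores; ids missing from a list count as 0."""
    sem = min_max_normalize(semantic)
    kw = min_max_normalize(keyword)
    fused = {
        doc_id: w_semantic * sem.get(doc_id, 0.0) + w_keyword * kw.get(doc_id, 0.0)
        for doc_id in set(sem) | set(kw)
    }
    # Highest fused score first, truncated to top_k.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:top_k]
```

Normalizing before fusing matters because cosine similarities and BM25 scores live on different scales; without it, the raw BM25 values would dominate the weighted sum regardless of the weights.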

📁 Project Structure

src/
├── api/
│   ├── routes/          # FastAPI route definitions
│   ├── schemas/         # Request / response Pydantic models
│   ├── dependencies.py  # Dependency injection
│   └── mapper.py        # Domain → response DTO mapping
├── application/
│   ├── interfaces/      # Abstract base classes
│   ├── services/        # SemanticSearch, KeywordSearch, HybridSearch
│   ├── embedder.py      # Embedding model wrapper
│   └── factories/       # VectorDB and DataLoader factories
├── domain/
│   ├── models/          # ScrapedHadith domain model
│   └── enums/           # EmbeddingType, HadithRepositoryType
├── infrastructure/
│   ├── scrapper/        # HTTP client and sunnah.com scraper
│   ├── repositories/    # DbHadithRepository, ExcelHadithRepository
│   ├── vectoreDb/       # QdrantDB wrapper
│   ├── KeywordDataStore/ # WhooshIndex wrapper
│   └── db/              # DB client and models (SQLAlchemy)
├── scripts/
│   ├── scrape.py        # Scraping pipeline
│   ├── vectorize.py     # Chunking + embedding + indexing pipeline
│   └── dbToWhoosh.py    # Populates the keyword store (Whoosh)
├── config.py            # App configuration via pydantic-settings
└── main.py              # FastAPI app entrypoint

⚙️ Installation

1. Clone the repository

git clone https://github.com/alikashlan10/Hadith.git
cd Hadith

2. Create and activate a virtual environment

conda create -n hadith python=3.11
conda activate hadith

3. Install dependencies

pip install -r requirements.txt

4. Set up environment variables

Copy the provided example file and fill in your own values:

cp .env.example .env

Then edit .env with your configuration. See .env.example for all required variables and their descriptions.

🚀 Running the Pipeline

Step 1 — Scrape Hadiths

Scrapes hadith texts from sunnah.com and saves them to your configured repository (db or Excel).

python -m src.scripts.scrape

Step 2 — Vectorize

Loads hadiths from the repository, chunks them, generates embeddings, and indexes them into Qdrant (vector search).

python -m src.scripts.vectorize
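
To illustrate the chunking part of this step, here is a minimal sketch of a word-based sliding-window chunker. It is an assumption, not the script's actual strategy: the function names, the `max_words` and `overlap` parameters, and their defaults are hypothetical. The `passage: ` prefix follows the documented input convention of the E5 model family (queries use `query: `); whether this project applies it is not stated here.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def as_e5_passages(chunks: List[str]) -> List[str]:
    """Prefix each chunk the way E5 models expect indexed documents."""
    return [f"passage: {chunk}" for chunk in chunks]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk; short hadiths simply yield a single chunk.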

Step 3 — Build the Keyword Index

Loads hadiths from the repository and inserts them into the Whoosh index (keyword search).

python -m src.scripts.dbToWhoosh

Step 4 — Start the API Server

uvicorn src.main:app --reload

The API will be available at http://127.0.0.1:8000

🔍 API

`POST /search`

Search for hadiths using hybrid semantic + keyword retrieval.

Request:

{
  "query": "fasting while travelling",
  "top_k": 5
}

Response:

{
  "query": "fasting while travelling",
  "top_k": 5,
  "returned": 5,
  "results": [
    {
      "db_id": 9878,
      "text_ar": "...",
      "text_en": "...",
      "book_name_ar": "صحيح مسلم",
      "book_name_en": "Sahih Muslim",
      "chapter_name_ar": "كتاب الصيام",
      "chapter_name_en": "The Book of Fasting",
      "reference": "Sahih Muslim 1121b",
      "score": 0.7
    }
  ]
}
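
As a usage illustration, the endpoint can be called from Python with only the standard library. This is a sketch that assumes the server from Step 4 is running at the default address and that the request and response match the shapes shown above; the helper names `build_search_request` and `search` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default uvicorn address from Step 4


def build_search_request(
    query: str, top_k: int = 5, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /search request with a JSON body."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(
    query: str, top_k: int = 5, base_url: str = BASE_URL, timeout: float = 10.0
) -> dict:
    """Send the request and decode the JSON response."""
    request = build_search_request(query, top_k, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the API server to be running locally.
    for hit in search("fasting while travelling")["results"]:
        print(hit["reference"], hit["score"])
```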

🛠️ Tech Stack

Layer	Technology
API	FastAPI
Vector Search	Qdrant
Keyword Search	Whoosh (BM25)
Embeddings	multilingual-e5-base
Relational DB	PostgreSQL (via SQLAlchemy)
Scraping	requests + BeautifulSoup
Config	pydantic-settings
