Raouf71/rag-factory

RAG-factory

End-to-end exploration of Retrieval-Augmented Generation pipelines across two real-world domains, comparing frameworks, patterns, and evaluation strategies.

Domains:

  • Mechanical Parts Catalogs — structured, technical, terminology-heavy documents

⚙️ Current features:

  • An annotated Jupyter notebook for exploring and testing the full pipeline step by step:  Open in Colab

  • Learn more about the different steps of the pipeline here

  • A production-ready Evaluation Roadmap for your RAG-Pipeline

  • RAG-Chatbot with UI (coming soon)

  • WIKI page hosting the entire project documentation (coming soon)

Extras:

  • Learn how to turn any Document into a RAG-Pipeline right here

Pipeline


RAG Patterns

| Pattern | Description |
| --- | --- |
| Naive RAG | Baseline: embed, index, retrieve, generate |
| Hybrid RAG | Dense + sparse (BM25) retrieval with re-ranking |
| KG-RAG | Knowledge Graph-augmented retrieval for relational reasoning |
| Agentic RAG | Tool-calling agents with multi-step reasoning and answer validation |
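To make the Hybrid RAG row more concrete, here is a minimal sketch of one common way to merge a dense ranking and a sparse (BM25) ranking before re-ranking: reciprocal rank fusion. This is an illustrative assumption, not necessarily the fusion method used in this repository; the function name, the constant `k=60`, and the document ids are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document gets 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy rankings with made-up ids, standing in for dense and BM25 results.
dense_hits = ["gear_12", "gear_07", "shaft_03"]
sparse_hits = ["gear_07", "bearing_21", "gear_12"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

The fused list would then typically be handed to a re-ranker, which is outside the scope of this sketch.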

Agentic RAG Stack

  • PydanticAI — typed tool-calling agents with structured outputs
  • ReAct — Reasoning + Acting loop for multi-hop queries
  • smolagents — lightweight agents with minimal boilerplate

Key agentic behaviors implemented:

  • Dynamic tool selection (search, filter, compute)
  • Multi-step reasoning before answer generation
  • Self-verification — agent checks answer against retrieved context before responding
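The self-verification behavior can be sketched as a simple grounding check that runs before the agent responds. The version below is a deliberately naive illustration based on token overlap; it does not use PydanticAI or smolagents, and the function names, the `0.8` threshold, and the sample part data are hypothetical rather than taken from this repository.

```python
import re


def _tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_grounded(answer, context_chunks, threshold=0.8):
    """Return True if enough of the answer's tokens occur in the context."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return False
    context_tokens = set()
    for chunk in context_chunks:
        context_tokens |= _tokens(chunk)
    overlap = len(answer_tokens & context_tokens) / len(answer_tokens)
    return overlap >= threshold


def answer_with_verification(draft_answer, context_chunks):
    """Release the draft only if it passes the grounding check."""
    if is_grounded(draft_answer, context_chunks):
        return draft_answer
    return "I could not verify this answer against the retrieved context."


# Made-up catalog snippet used purely for illustration.
context = ["Spur gear SG-20 has module 2 and 20 teeth."]
print(answer_with_verification("SG-20 has 20 teeth", context))
print(answer_with_verification("SG-20 has 45 teeth made of brass", context))
```

A real agent would more likely ask the model itself, or a dedicated judge, to compare the draft against the context; token overlap is used here only because it keeps the example self-contained.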

Project Structure

rag-factory/
├── mechanical_parts_catalogs/
│   ├── config/
│   │   └── settings.py           # all env vars, constants, model settings
│   ├── pipeline/
│   │   ├── extraction.py         # extract_layer1_fields, extract_layer2_fields
│   │   ├── mapping.py            # normalize_*, build_part_id, map_parts_and_rows
│   │   ├── nodes.py              # build_retrieval_nodes, text/metadata builders
│   │   ├── indexing.py           # build_pgvector_store, index_nodes_with_store
│   │   ├── graph.py              # Neo4j KG build/load
│   │   └── run_pipeline.py       # orchestrates all steps end-to-end
│   ├── retrieval/
│   │   ├── intent.py             # QueryIntent, extract_query_intent
│   │   ├── filters.py            # RANGE_FIELD_MAP, build_filtered_retriever
│   │   ├── retriever.py          # build_custom_retriever, reranker toggle
│   │   └── agent.py
│   ├── app/
│   │   └── chatbot.py            # Chatbot UI
│   ├── data/                     # raw documents
│   │   └── gears.pdf
│   └── evaluation/               # RAGAS scoring + comparison notebooks
├── cache/                        # persisted extraction results
├── .env                          # API keys, passwords (never committed)
├── .gitignore                    # includes .env, cache/, *.ipynb outputs
├── requirements.txt
└── README.md
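As a rough illustration of how a central `settings.py` might gather environment variables and constants in one place, consider the sketch below. Every field name, variable name, and default value here is a hypothetical placeholder, not the repository's actual configuration, and the sketch assumes the `.env` values have already been loaded into the process environment (for example by a dotenv loader).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_api_key: str   # hypothetical: secret supplied via .env
    pg_dsn: str        # hypothetical: connection string for the pgvector store
    neo4j_uri: str     # hypothetical: address of the Neo4j instance
    top_k: int         # hypothetical: number of nodes to retrieve per query


def load_settings(env=None):
    """Build a Settings object from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("LLM_API_KEY", "")
    if not api_key:
        # Fail early rather than at the first model call.
        raise RuntimeError("LLM_API_KEY is not set; add it to your .env file")
    return Settings(
        llm_api_key=api_key,
        pg_dsn=env.get("PG_DSN", "postgresql://localhost:5432/rag"),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        top_k=int(env.get("TOP_K", "5")),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"LLM_API_KEY": "dummy-key", "TOP_K": "3"})
print(settings.top_k)
```

Accepting the environment as an argument makes the loader easy to exercise in tests without touching real secrets.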

About

Collection of real-world RAG pipelines built with different frameworks and retrieval strategies, evolving from advanced hybrid RAG to agentic RAG
