End-to-end exploration of Retrieval-Augmented Generation pipelines across two real-world domains, comparing frameworks, patterns, and evaluation strategies.
- Mechanical Parts Catalogs — structured, technical, terminology-heavy documents
- An annotated Jupyter notebook for exploring and testing the full pipeline step by step
- Learn more about the different steps of the pipeline here
- A production-ready Evaluation Roadmap for your RAG pipeline
- RAG chatbot with UI (coming soon)
- Wiki page hosting the entire project documentation (coming soon)
Extras:
- Learn how to turn any document into a RAG pipeline right here
| Pattern | Description |
|---|---|
| Naive RAG | Baseline — embed, index, retrieve, generate |
| Hybrid RAG | Dense + sparse (BM25) retrieval with re-ranking |
| KG-RAG | Knowledge Graph-augmented retrieval for relational reasoning |
| Agentic RAG | Tool-calling agents with multi-step reasoning and answer validation |
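To make the Hybrid RAG row more concrete, here is a minimal sketch of one common way to merge a dense and a sparse result list: Reciprocal Rank Fusion (RRF). This is an illustration only, not necessarily the fusion method used in this repository. The function name, the document IDs, and the constant `k=60` are assumptions, and the re-ranking step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical IDs standing in for real retriever output
dense_hits = ["gear-12", "gear-07", "gear-33"]   # from vector search
sparse_hits = ["gear-07", "gear-41", "gear-12"]  # from BM25

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['gear-07', 'gear-12', 'gear-41', 'gear-33']
```

RRF only needs ranks, not raw scores, which avoids having to normalize BM25 scores against cosine similarities before combining them.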
- PydanticAI — typed tool-calling agents with structured outputs
- ReAct — Reasoning + Acting loop for multi-hop queries
- smolagents — lightweight agents with minimal boilerplate
Key agentic behaviors implemented:
- Dynamic tool selection (search, filter, compute)
- Multi-step reasoning before answer generation
- Self-verification — agent checks answer against retrieved context before responding
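The behaviors above can be sketched as a three-step loop: call a tool, draft an answer, then verify the draft against the retrieved context. The code below is a simplified stand-in under stated assumptions: `search`, `numbers_supported`, `answer_with_check`, and the catalog entries are all hypothetical, the `generate` callable replaces a real LLM call, and the verification is a crude numeric check rather than the validation used in this project.

```python
import re

FALLBACK = "I could not verify this answer against the retrieved context."


def search(query, catalog, top_k=1):
    """Hypothetical search tool: rank entries by word overlap with the query."""
    q_words = set(re.findall(r"\w+", query.lower()))
    scored = [(len(q_words & set(re.findall(r"\w+", e.lower()))), e) for e in catalog]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: -s[0])
    return [entry for _, entry in scored[:top_k]]


def numbers_supported(answer, context):
    """Crude check: every number in the answer must appear in the context.

    Answers without numbers pass as long as some context was retrieved.
    """
    found = set(re.findall(r"\d+(?:\.\d+)?", " ".join(context)))
    claimed = re.findall(r"\d+(?:\.\d+)?", answer)
    return bool(context) and all(n in found for n in claimed)


def answer_with_check(query, catalog, generate):
    context = search(query, catalog)   # step 1: call a tool
    draft = generate(query, context)   # step 2: draft an answer (LLM stand-in)
    # step 3: only return the draft if the context supports it
    return draft if numbers_supported(draft, context) else FALLBACK


catalog = [
    "Spur gear G-100 has 20 teeth and module 2",
    "Helical gear H-200 has 35 teeth and module 3",
]
question = "How many teeth does G-100 have?"

good = answer_with_check(question, catalog, lambda q, ctx: "G-100 has 20 teeth")
bad = answer_with_check(question, catalog, lambda q, ctx: "G-100 has 35 teeth")
print(good)  # G-100 has 20 teeth
print(bad)   # I could not verify this answer against the retrieved context.
```

In a real agent, the verification step would more likely be a second model call or an entailment check; the point here is only the control flow of refusing to answer when the draft is not backed by the context.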
rag-factory/
├── mechanical_parts_catalogs/
│ ├── config/
│ │ └── settings.py # all env vars, constants, model settings
│ ├── pipeline/
│ │ ├── extraction.py # extract_layer1_fields, extract_layer2_fields
│ │ ├── mapping.py # normalize_*, build_part_id, map_parts_and_rows
│ │ ├── nodes.py # build_retrieval_nodes, text/metadata builders
│ │ ├── indexing.py # build_pgvector_store, index_nodes_with_store
│ │ ├── graph.py # Neo4j KG build/load
│ │ └── run_pipeline.py # orchestrates all steps end-to-end
│ ├── retrieval/
│ │ ├── intent.py # QueryIntent, extract_query_intent
│ │ ├── filters.py # RANGE_FIELD_MAP, build_filtered_retriever
│ │ ├── retriever.py # build_custom_retriever, reranker toggle
│ │ └── agent.py
│ ├── app/
│ │ └── chatbot.py # Chatbot UI
│ ├── data/ # Raw documents
│ │ └── gears.pdf
│ └── evaluation/ # RAGAS scoring + comparison notebooks
├── cache/ # persisted extraction results
├── .env # API keys, passwords — never committed
├── .gitignore # includes .env, cache/, *.ipynb outputs
├── requirements.txt
└── README.md
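
As an illustration of how a central `settings.py` can keep secrets out of the code while reading them from the environment, here is a minimal sketch. Every field name, environment variable, and default below is an assumption made for this example; the actual contents of `config/settings.py` may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # All fields are hypothetical examples, not the project's real settings
    neo4j_uri: str
    neo4j_password: str
    embedding_model: str
    use_reranker: bool


def load_settings(env=None):
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        # Required secret: raises KeyError early if it is missing
        neo4j_password=env["NEO4J_PASSWORD"],
        embedding_model=env.get("EMBEDDING_MODEL", "example-embedding-model"),
        use_reranker=env.get("USE_RERANKER", "false").lower() in {"1", "true", "yes"},
    )


# Demo with an explicit dict so the sketch runs without any real env vars
settings = load_settings({"NEO4J_PASSWORD": "secret", "USE_RERANKER": "true"})
print(settings.neo4j_uri)     # bolt://localhost:7687
print(settings.use_reranker)  # True
```

Passing the environment in as a mapping keeps the loader easy to test, and failing fast on a missing secret surfaces configuration problems before the pipeline starts.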
