RAG-factory

End-to-end exploration of Retrieval-Augmented Generation pipelines across two real-world domains, comparing frameworks, patterns, and evaluation strategies.

Domains:

Mechanical Parts Catalogs — structured, technical, terminology-heavy documents

⚙️ Current features:

An annotated Jupyter notebook for exploring and testing the full pipeline step by step:
Learn more about the different steps of the pipeline here
A production-ready Evaluation Roadmap for your RAG-Pipeline
RAG-Chatbot with UI (coming soon)
WIKI page hosting the entire project documentation (coming soon)

Extras:

Learn how to turn any Document into a RAG-Pipeline right here

Pipeline

RAG Patterns

Pattern	Description
Naive RAG	Baseline — embed, index, retrieve, generate
Hybrid RAG	Dense + sparse (BM25) retrieval with re-ranking
KG-RAG	Knowledge Graph-augmented retrieval for relational reasoning
Agentic RAG	Tool-calling agents with multi-step reasoning and answer validation
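To make the Hybrid RAG row concrete, here is a minimal, dependency-free sketch of combining a sparse (BM25) ranking with a dense ranking. It is an illustration under stated assumptions, not the repository's code: the function names `bm25_scores`, `rank`, and `rrf_fuse` are hypothetical, the dense ranking is assumed to come from an embedding retriever, and reciprocal rank fusion is used here as a simple stand-in for the re-ranking step.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores):
    """Return document indices sorted from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all rankings."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common()]


# Toy corpus (made up for illustration).
docs = [
    "spur gear module 2 steel".split(),
    "helical gear module 3 brass".split(),
    "ball bearing steel sealed".split(),
]
sparse_ranking = rank(bm25_scores("steel spur gear".split(), docs))
dense_ranking = [1, 0, 2]  # assumed output of an embedding retriever
hybrid_ranking = rrf_fuse([sparse_ranking, dense_ranking])
```

Rank-based fusion avoids having to normalize BM25 scores and cosine similarities onto a common scale, which is why it is a common first choice before adding a learned re-ranker.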

Agentic RAG Stack

PydanticAI — typed tool-calling agents with structured outputs
ReAct — Reasoning + Acting loop for multi-hop queries
smolagents — lightweight agents with minimal boilerplate
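The ReAct loop listed above can be sketched without any framework. This is a hedged illustration rather than the project's agent: `search`, `compute`, `scripted_policy`, and `react` are hypothetical names, the catalog entry is made up, and a scripted policy stands in for the LLM that would normally produce each thought and action.

```python
# Hypothetical tools; a real agent would call a retriever or a calculator.
def search(query):
    catalog = {"gear G-12": "module 2, 24 teeth"}  # made-up catalog entry
    return catalog.get(query, "no result")


def compute(expression):
    left, right = expression.split("*")
    return str(int(left) * int(right))


TOOLS = {"search": search, "compute": compute}


def scripted_policy(history):
    """Stand-in for the LLM: pick the next step from how many steps have run."""
    script = [
        ("look up the gear", "search", "gear G-12"),
        ("pitch diameter = module * teeth", "compute", "2*24"),
        ("I have the answer", "finish", "48 mm"),
    ]
    return script[len(history)]


def react(policy, tools, max_steps=5):
    """Alternate reasoning (thought) and acting (tool call) until 'finish'."""
    history = []
    for _ in range(max_steps):
        thought, action, arg = policy(history)
        if action == "finish":
            return arg, history
        observation = tools[action](arg)
        # The observation is fed back so the next thought can build on it.
        history.append((thought, action, arg, observation))
    return "no answer within step budget", history


answer, trace = react(scripted_policy, TOOLS)
```

The step budget (`max_steps`) matters in practice: without it, a policy that never emits `finish` would loop indefinitely.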

Key agentic behaviors implemented:

Dynamic tool selection (search, filter, compute)
Multi-step reasoning before answer generation
Self-verification — agent checks answer against retrieved context before responding
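As a rough illustration of the self-verification behavior, the sketch below checks how much of a draft answer is supported by the retrieved context before returning it. Everything here is an assumption made for the example: the names `tokens`, `is_grounded`, and `respond`, the token-overlap heuristic, and the 0.8 threshold. The actual agent may well use an LLM-based check instead.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_grounded(answer, context_chunks, threshold=0.8):
    """Return True when enough answer tokens also appear in the context."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return False
    context_tokens = set().union(*(tokens(c) for c in context_chunks))
    support = len(answer_tokens & context_tokens) / len(answer_tokens)
    return support >= threshold


def respond(answer, context_chunks):
    """Return the answer only if it passes the grounding check."""
    if is_grounded(answer, context_chunks):
        return answer
    return "I could not verify this answer against the retrieved documents."


context = ["Gear G-12 has module 2 and 24 teeth."]  # made-up retrieved chunk
verified = respond("Gear G-12 has 24 teeth", context)
rejected = respond("The shaft is titanium", context)
```

Token overlap is a coarse signal: it catches answers that mention entities absent from the context, but it cannot detect a wrong relation between entities that are present.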

Project Structure

rag-factory/
├── mechanical_parts_catalogs/
│   ├── config/
│   │   └── settings.py           # all env vars, constants, model settings
│   ├── pipeline/
│   │   ├── extraction.py         # extract_layer1_fields, extract_layer2_fields
│   │   ├── mapping.py            # normalize_*, build_part_id, map_parts_and_rows
│   │   ├── nodes.py              # build_retrieval_nodes, text/metadata builders
│   │   ├── indexing.py           # build_pgvector_store, index_nodes_with_store
│   │   ├── graph.py              # Neo4j KG build/load
│   │   └── run_pipeline.py       # orchestrates all steps end-to-end
│   ├── retrieval/
│   │   ├── intent.py             # QueryIntent, extract_query_intent
│   │   ├── filters.py            # RANGE_FIELD_MAP, build_filtered_retriever
│   │   ├── retriever.py          # build_custom_retriever, reranker toggle
│   │   └── agent.py
│   ├── app/
│   │   └── chatbot.py            # Chatbot UI
│   ├── data/                     # raw documents
│   │   └── gears.pdf
│   └── evaluation/               # RAGAS scoring + comparison notebooks
├── cache/                    # persisted extraction results
├── .env                      # API keys, passwords — never committed
├── .gitignore                # includes .env, cache/, *.ipynb outputs
├── requirements.txt
└── README.md
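The retrieval modules in the tree suggest a flow in which a query is first parsed into an intent and numeric ranges are then applied as metadata filters. The sketch below illustrates that idea only. The names `QueryIntent`, `extract_query_intent`, and `RANGE_FIELD_MAP` are taken from the tree comments, but their fields, signatures, and contents here are assumptions, as are `apply_range_filters` and the metadata keys `module_mm` and `teeth_count`. A regular expression stands in for whatever parsing the real `intent.py` performs.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed mapping from query vocabulary to metadata keys;
# the real RANGE_FIELD_MAP may look different.
RANGE_FIELD_MAP = {"module": "module_mm", "teeth": "teeth_count"}

_RANGE = re.compile(r"(\w+) between (\d+(?:\.\d+)?) and (\d+(?:\.\d+)?)")


@dataclass
class QueryIntent:
    text: str
    ranges: Dict[str, Tuple[float, float]] = field(default_factory=dict)


def extract_query_intent(query: str) -> QueryIntent:
    """Parse 'X between A and B' phrases into numeric range constraints."""
    intent = QueryIntent(text=query)
    for name, low, high in _RANGE.findall(query.lower()):
        if name in RANGE_FIELD_MAP:
            intent.ranges[RANGE_FIELD_MAP[name]] = (float(low), float(high))
    return intent


def apply_range_filters(nodes: List[dict], intent: QueryIntent) -> List[dict]:
    """Keep only nodes whose metadata satisfies every range in the intent."""
    def keep(node: dict) -> bool:
        return all(
            key in node and low <= node[key] <= high
            for key, (low, high) in intent.ranges.items()
        )
    return [node for node in nodes if keep(node)]


# Made-up node metadata for illustration.
nodes = [
    {"part_id": "G-12", "module_mm": 2, "teeth_count": 24},
    {"part_id": "G-40", "module_mm": 4, "teeth_count": 24},
]
intent = extract_query_intent(
    "spur gears with module between 2 and 3 and teeth between 20 and 30"
)
matches = apply_range_filters(nodes, intent)
```

Filtering on structured metadata before (or alongside) vector search tends to help with catalog data, where a query such as "module between 2 and 3" is a hard constraint rather than a semantic hint.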
