RAGeATM

RAGeATM stands for "Raging Against the Machine with Retrieval-Augmented Generation." It is a small SEIS 767 Conversational AI final-project MVP that demonstrates the mechanics of a retrieval-augmented question-answering pipeline.

The final project is intentionally scoped as an explainable prototype, not a production assistant.

What This Project Implements

Local corpus ingestion from data/raw/*.txt
Word-window chunking with overlap
Lightweight TF-IDF vectors with cosine-similarity retrieval
A saved local TF-IDF index under data/index/
Top-k retrieval with similarity scores and a minimum relevance threshold
Prompt construction using retrieved context
Offline retrieval-conditioned answer generation
Optional OpenAI generation when OPENAI_API_KEY is available
A small benchmark with in-domain, partially answerable, and out-of-domain questions
A CLI demo suitable for a 5-6 minute final presentation
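The word-window chunking step can be sketched as follows. This is a minimal sketch and not the code in src.chunk. The function name `chunk_words` and the default window of 120 words with a 30-word overlap are hypothetical choices for illustration.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 120, overlap: int = 30) -> List[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words.

    chunk_words and its defaults are hypothetical, not the project's src.chunk API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_words("a b c d e f g h i j", chunk_size=4, overlap=2)` yields `["a b c d", "c d e f", "e f g h", "g h i j"]`, so each window repeats the last two words of the one before it. The overlap reduces the chance that a passage cut at a window boundary loses its surrounding context.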

Use this phrase when describing the technical scope:

a lightweight retrieval-based RAG prototype using TF-IDF vectors and cosine similarity

What This Project Does Not Implement

No Chroma or external vector database
No neural embedding model
No semantic embedding claims beyond lexical TF-IDF retrieval
No agent tools
No conversational memory
No voice interface
No Docker/evolution/self-improvement loop
No production deployment
No large benchmark or production corpus

When OPENAI_API_KEY is not set, answers do not come from an LLM. They come from an offline retrieval-conditioned generator that demonstrates RAG mechanics without paid services.

Architecture

flowchart LR
    A["data/raw/*.txt"] --> B["src.ingest"]
    B --> C["data/processed/documents.json"]
    C --> D["src.chunk"]
    D --> E["data/processed/chunks.json"]
    E --> F["src.embed TF-IDF"]
    F --> G["data/index vectors + metadata + vectorizer"]
    H["User question"] --> I["src.retrieve cosine similarity"]
    G --> I
    I --> J["Top-k retrieved chunks + scores"]
    J --> K["src.generate prompt construction"]
    K --> L["Offline grounded answer or optional OpenAI answer"]
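The first stage of the flowchart, turning data/raw/*.txt into data/processed/documents.json, might look roughly like this. The function name `ingest` and the JSON fields `doc_id`, `source`, and `text` are assumptions; the real src.ingest module may store different metadata.

```python
import json
from pathlib import Path
from typing import Dict, List


def ingest(raw_dir: Path, out_path: Path) -> List[Dict[str, str]]:
    """Read every *.txt file in raw_dir and write the documents as a JSON list.

    Hypothetical sketch: field names doc_id/source/text are illustrative only.
    """
    documents = []
    for path in sorted(raw_dir.glob("*.txt")):  # sorted for a deterministic order
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty or whitespace-only files
            documents.append({"doc_id": path.stem, "source": path.name, "text": text})
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(documents, indent=2), encoding="utf-8")
    return documents
```

In this sketch, calling `ingest(Path("data/raw"), Path("data/processed/documents.json"))` from the project root would produce the file that the chunking stage reads next.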

Repository Structure

data/raw/                    Small educational corpus used by the demo
data/processed/              Generated documents/chunks, ignored by git
data/index/                  Generated TF-IDF index, ignored by git
docs/evaluation_results.md   Generated benchmark report
docs/final_report_notes.md   Final-report companion notes
docs/video_script_5_6_min.md Presentation script
scripts/build_index.py       Rebuilds processed data and TF-IDF index
scripts/run_demo.py          Runs one end-to-end demo question
scripts/run_evaluation.py    Runs benchmark and writes docs/evaluation_results.md
src/                         Pipeline implementation
tests/                       Focused sanity tests

Setup

From the project root:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt

Optional real-LLM mode:

export OPENAI_API_KEY="your_key_here"
export RAGEATM_OPENAI_MODEL="gpt-4o-mini"

Do not commit API keys. The default demo does not need an API key.

Run The Demo

Recommended command for the final video:

python scripts/run_demo.py --query "How does RAG reduce hallucinations?"

Out-of-domain refusal demo:

python scripts/run_demo.py --query "What is the capital of France?"

Show the constructed prompt if useful:

python scripts/run_demo.py --query "How does RAG reduce hallucinations?" --show-prompt

Force offline mode:

python scripts/run_demo.py --mode offline --query "What does chunking do in the RAG pipeline?"

Try OpenAI mode if OPENAI_API_KEY is set:

python scripts/run_demo.py --mode openai --query "How does RAG reduce hallucinations?"

If OpenAI mode fails, the project falls back to the offline retrieval-conditioned generator and prints the failure reason.
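The mode selection and fallback described above could be sketched as follows. This assumes a default `auto` mode that uses OpenAI only when OPENAI_API_KEY is set. `generate_with_fallback`, `offline_fn`, and `openai_fn` are hypothetical names, and `openai_fn` stands in for whatever client call the project actually makes; no OpenAI SDK call is shown here.

```python
import os
from typing import Callable, Optional, Tuple

DEFAULT_MODEL = "gpt-4o-mini"  # mirrors the example RAGEATM_OPENAI_MODEL value above


def generate_with_fallback(
    question: str,
    offline_fn: Callable[[str], str],
    openai_fn: Callable[[str, str], str],  # hypothetical stand-in for the real OpenAI call
    requested: str = "auto",  # assumed default; the CLI exposes offline/openai via --mode
) -> Tuple[str, Optional[str]]:
    """Return (answer, failure_reason); failure_reason is None when no fallback happened."""
    if requested not in ("auto", "offline", "openai"):
        raise ValueError(f"unknown mode: {requested}")
    has_key = bool(os.environ.get("OPENAI_API_KEY"))
    # Offline when explicitly requested, or when auto mode finds no key (not a failure).
    if requested == "offline" or (requested == "auto" and not has_key):
        return offline_fn(question), None
    if not has_key:
        return offline_fn(question), "OPENAI_API_KEY is not set"
    model = os.environ.get("RAGEATM_OPENAI_MODEL", DEFAULT_MODEL)
    try:
        return openai_fn(question, model), None
    except Exception as exc:  # network, auth, or API errors all trigger the fallback
        return offline_fn(question), f"OpenAI call failed: {exc}"
```

A caller would print the second element whenever it is not None, which matches the "prints the failure reason" behavior without hiding the offline answer.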

Run The Evaluation

python scripts/run_evaluation.py

This rebuilds the local index, runs the small benchmark, and writes:

docs/evaluation_results.md

Current benchmark summary:

7 benchmark questions
5 in-domain questions
1 partially answerable limitation question
1 out-of-domain question
Current useful retrieval decisions: 7/7 on the small educational benchmark

That 7/7 result should be described carefully. It means the pipeline passed a small sanity-check benchmark, not that the system is generally accurate.
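One way to compute a "useful retrieval decision" count like the one above is to label each question with whether the system should answer or refuse, then compare that label against the threshold check. The `BenchmarkItem` fields, the category strings, and the function name are assumptions, not the scoring rule in scripts/run_evaluation.py; how a partially answerable question is labeled is a judgment call left to the `should_answer` field.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkItem:
    question: str
    category: str        # hypothetical labels: "in-domain", "partial", "out-of-domain"
    should_answer: bool  # expected decision: answer from context (True) or refuse (False)


def count_useful_decisions(
    items: List[BenchmarkItem],
    best_score: Callable[[str], float],  # returns the top retrieval score for a question
    threshold: float,
) -> int:
    """Count items whose answer/refuse decision matches the expected label."""
    useful = 0
    for item in items:
        answered = best_score(item.question) >= threshold
        if answered == item.should_answer:
            useful += 1
    return useful
```

Under this scheme, a perfect count only says the answer/refuse decision matched the label for every question; it says nothing about whether the generated answer text was correct.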

Run Tests

python -m pytest

The tests cover chunking behavior and basic retrieval threshold behavior.
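A pytest sketch in the spirit of those tests might look like this. `is_answerable`, `overlaps_ok`, and the 0.1 threshold are hypothetical stand-ins so the file runs on its own; the real tests would import the functions under test from src instead.

```python
from typing import List

import pytest


def is_answerable(best_score: float, threshold: float) -> bool:
    """Hypothetical stand-in for the retrieval threshold check."""
    return best_score >= threshold


def overlaps_ok(chunks: List[str], overlap: int) -> bool:
    """True if every chunk begins with the last `overlap` words of the chunk before it."""
    for prev, nxt in zip(chunks, chunks[1:]):
        tail = prev.split()[-overlap:] if overlap else []  # [-0:] would be the whole list
        if nxt.split()[:overlap] != tail:
            return False
    return True


@pytest.mark.parametrize("score, expected", [(0.0, False), (0.09, False), (0.1, True), (0.8, True)])
def test_threshold_boundary(score: float, expected: bool) -> None:
    assert is_answerable(score, threshold=0.1) is expected


def test_overlap_check() -> None:
    assert overlaps_ok(["a b c d", "c d e f"], overlap=2)
    assert not overlaps_ok(["a b c d", "e f g h"], overlap=2)
```

Testing the exact boundary value (a score equal to the threshold) pins down whether the refusal rule uses `>=` or `>`, which is easy to change by accident.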

Corpus

The local corpus is intentionally small and educational. It lives in data/raw/ and includes notes on:

RAG concepts
RAGeATM project design
chunking and retrieval pipeline mechanics
evaluation methodology
data-engineering troubleshooting
conversational-AI limitations

This is not a production dataset or a large curated knowledge base.

Retrieval And Generation

Retrieval uses TfidfVectorizer from scikit-learn and cosine similarity. This is lexical retrieval: it matches on shared terms rather than meaning. It suits an explainable MVP because every vectorization and scoring step can be inspected and explained.
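A minimal retrieval sketch using the two scikit-learn pieces named above follows. `TfidfVectorizer.fit_transform`, `TfidfVectorizer.transform`, and `cosine_similarity` are real scikit-learn APIs; the function names, `stop_words="english"`, and `k=3` are assumptions for illustration and may not match src.embed or src.retrieve.

```python
from typing import List, Tuple

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_index(chunks: List[str]):
    """Fit a TF-IDF vectorizer on the chunk texts; return it with the chunk matrix."""
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")  # assumed settings
    matrix = vectorizer.fit_transform(chunks)  # sparse, shape (n_chunks, n_terms)
    return vectorizer, matrix


def retrieve(query: str, vectorizer, matrix, k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk_index, cosine score) pairs for the k most similar chunks."""
    query_vec = vectorizer.transform([query])  # reuse the fitted vocabulary and IDF weights
    scores = cosine_similarity(query_vec, matrix)[0]  # one score per chunk
    top = scores.argsort()[::-1][:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in top]
```

Because TfidfVectorizer L2-normalizes rows by default, cosine similarity here reduces to a dot product. A query whose terms never appear in the corpus transforms to an all-zero vector and scores 0.0 against every chunk, which is what lets a minimum-relevance threshold trigger a refusal.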

Generation has two modes:

Default: offline retrieval-conditioned answer generator. It selects supporting sentences from relevant retrieved chunks and cites local sources.
Optional: OpenAI generation through environment variables. This is not required for the project to run.
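The offline generator's sentence selection could be approximated like this. It is a sketch under stated assumptions: retrieved chunks are dicts with hypothetical `text` and `source` keys, sentences are split with a simple regex, and each sentence is ranked by how many non-stop-word query terms it shares. The project's real selection rule may differ.

```python
import re
from typing import Dict, List, Set

# Tiny illustrative stop-word list; a fuller list would likely work better.
STOP_WORDS = {"a", "an", "and", "does", "do", "how", "in", "is", "of", "the", "to", "what"}


def _terms(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens minus stop words."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS}


def offline_answer(query: str, chunks: List[Dict[str, str]], max_sentences: int = 2) -> str:
    """Return the sentences sharing the most query terms, each tagged with its source file."""
    query_terms = _terms(query)
    candidates = []
    for chunk in chunks:  # chunks are assumed to arrive in retrieval order
        for sentence in re.split(r"(?<=[.!?])\s+", chunk["text"].strip()):
            shared = len(query_terms & _terms(sentence))
            if shared:
                candidates.append((shared, sentence, chunk["source"]))
    # list.sort is stable even with reverse=True, so ties keep retrieval order
    candidates.sort(key=lambda c: c[0], reverse=True)
    picked = candidates[:max_sentences]
    if not picked:
        return "I could not find supporting sentences in the retrieved context."
    return " ".join(f"{sentence} [{source}]" for _, sentence, source in picked)
```

Every sentence in the output is copied verbatim from a retrieved chunk, which is why this generator can cite sources but cannot paraphrase or combine facts the way an LLM would.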

The system refuses to answer when the best retrieved context is below the similarity threshold.
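The refusal rule and prompt construction can be sketched together. The 0.1 threshold, the `(source, text, score)` tuple shape, the refusal wording, and the prompt template are all hypothetical; they illustrate the pattern rather than the project's exact prompt.

```python
from typing import List, Tuple

REFUSAL = "I don't have enough relevant context in the local corpus to answer that."


def build_prompt(
    question: str,
    retrieved: List[Tuple[str, str, float]],  # hypothetical (source, text, score) tuples
    threshold: float = 0.1,  # assumed minimum relevance score
) -> Tuple[str, bool]:
    """Return (prompt, True) when any chunk clears the threshold, else (REFUSAL, False)."""
    kept = [r for r in retrieved if r[2] >= threshold]
    if not kept:  # equivalent to: best score below the threshold (or nothing retrieved)
        return REFUSAL, False
    context = "\n\n".join(
        f"[{i}] ({source}, score={score:.3f})\n{text}"
        for i, (source, text, score) in enumerate(kept, start=1)
    )
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, True
```

Filtering by score before building the context does two jobs at once: it decides whether to answer at all, and it keeps weakly related chunks out of the prompt when the system does answer.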

Limitations

Small corpus
Lexical retrieval only
No Chroma
No neural embeddings
No persistent chat memory
No agent tools
Offline generator is not a full LLM
Evaluation is a small demo benchmark, not rigorous large-scale measurement

Future Work

Replace TF-IDF with neural embeddings
Add Chroma or another vector store
Expand the corpus with real course/project documents
Add a small Gradio or Streamlit UI
Add conversation history and follow-up question handling
Add stronger evaluation with held-out queries and human ratings

References

Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," 2020, https://arxiv.org/abs/2005.11401
scikit-learn TfidfVectorizer, https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
scikit-learn cosine_similarity, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html

Final Presentation Notes

For the video, be direct:

Say this is a rebuilt, honest MVP.
Say it demonstrates RAG mechanics with local TF-IDF retrieval.
Show one successful in-domain question.
Show one out-of-domain refusal.
Do not claim Chroma, neural embeddings, memory, tools, or production readiness.

Use docs/video_script_5_6_min.md for the recording plan.
