DavidBraun777/RAGeATM

RAGeATM

RAGeATM stands for "Raging Against the Machine with Retrieval-Augmented Generation." It is a small SEIS 767 Conversational AI final-project MVP that demonstrates the mechanics of a retrieval-augmented question-answering pipeline.

The final project is intentionally scoped as an explainable prototype, not a production assistant.

What This Project Implements

  • Local corpus ingestion from data/raw/*.txt
  • Word-window chunking with overlap
  • Lightweight TF-IDF vectors with cosine-similarity retrieval
  • A saved local TF-IDF index under data/index/
  • Top-k retrieval with similarity scores and a minimum relevance threshold
  • Prompt construction using retrieved context
  • Offline retrieval-conditioned answer generation
  • Optional OpenAI generation when OPENAI_API_KEY is available
  • A small benchmark with in-domain, partially answerable, and out-of-domain questions
  • A CLI demo suitable for a 5-6 minute final presentation

Use this phrase when describing the technical scope:

a lightweight retrieval-based RAG prototype using TF-IDF vectors and cosine similarity

What This Project Does Not Implement

  • No Chroma or external vector database
  • No neural embedding model
  • No semantic embedding claims beyond lexical TF-IDF retrieval
  • No agent tools
  • No conversational memory
  • No voice interface
  • No Docker/evolution/self-improvement loop
  • No production deployment
  • No large benchmark or production corpus

If the demo runs without OPENAI_API_KEY, the answer generator is not a real LLM. It is an offline retrieval-conditioned generator used to demonstrate RAG mechanics without paid services.

Architecture

flowchart LR
    A["data/raw/*.txt"] --> B["src.ingest"]
    B --> C["data/processed/documents.json"]
    C --> D["src.chunk"]
    D --> E["data/processed/chunks.json"]
    E --> F["src.embed TF-IDF"]
    F --> G["data/index vectors + metadata + vectorizer"]
    H["User question"] --> I["src.retrieve cosine similarity"]
    G --> I
    I --> J["Top-k retrieved chunks + scores"]
    J --> K["src.generate prompt construction"]
    K --> L["Offline grounded answer or optional OpenAI answer"]
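
The chunking stage in the flowchart can be pictured with a short sketch. This is an illustration of word-window chunking with overlap, not the code in src.chunk; the function name `chunk_words` and the `window` and `overlap` defaults are assumptions made for the example.

```python
def chunk_words(text: str, window: int = 120, overlap: int = 30) -> list[str]:
    """Split text into word windows; consecutive windows share `overlap` words.

    Hypothetical helper: the name and default sizes are illustrative only.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    words = text.split()
    step = window - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
for chunk in chunk_words(sample, window=4, overlap=2):
    print(chunk)
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.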

Repository Structure

data/raw/                     Small educational corpus used by the demo
data/processed/               Generated documents/chunks, ignored by git
data/index/                   Generated TF-IDF index, ignored by git
docs/evaluation_results.md    Generated benchmark report
docs/final_report_notes.md    Final-report companion notes
docs/video_script_5_6_min.md  Presentation script
scripts/build_index.py        Rebuilds processed data and TF-IDF index
scripts/run_demo.py           Runs one end-to-end demo question
scripts/run_evaluation.py     Runs benchmark and writes docs/evaluation_results.md
src/                          Pipeline implementation
tests/                        Focused sanity tests

Setup

From the project root:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt

Optional real-LLM mode:

export OPENAI_API_KEY="your_key_here"
export RAGEATM_OPENAI_MODEL="gpt-4o-mini"

Do not commit API keys. The default demo does not need an API key.

Run The Demo

Recommended command for the final video:

python scripts/run_demo.py --query "How does RAG reduce hallucinations?"

Out-of-domain refusal demo:

python scripts/run_demo.py --query "What is the capital of France?"

Show the constructed prompt if useful:

python scripts/run_demo.py --query "How does RAG reduce hallucinations?" --show-prompt
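
For orientation, a constructed prompt might be assembled along these lines. This is a minimal sketch, not the template in src.generate: the function name `build_prompt`, the instruction wording, and the `source`, `score`, and `text` keys are all assumptions.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt from retrieved chunks (hypothetical layout)."""
    context_lines = []
    for i, chunk in enumerate(chunks, start=1):
        # Each chunk dict is assumed to carry "source", "score", and "text" keys.
        header = f"[{i}] {chunk['source']} (score={chunk['score']:.2f})"
        context_lines.append(f"{header}\n{chunk['text']}")
    context = "\n\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


example = [{"source": "notes.txt", "score": 0.5, "text": "Example chunk text."}]
print(build_prompt("What does the example chunk say?", example))
```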

Force offline mode:

python scripts/run_demo.py --mode offline --query "What does chunking do in the RAG pipeline?"

Try OpenAI mode if OPENAI_API_KEY is set:

python scripts/run_demo.py --mode openai --query "How does RAG reduce hallucinations?"

If OpenAI mode fails, the project falls back to the offline retrieval-conditioned generator and prints the failure reason.
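
The fallback behavior can be sketched as a small wrapper. The names `generate_with_fallback`, `offline_stub`, and `failing_openai_stub` are hypothetical stand-ins, not functions from this repository, and the stub does not call any real API.

```python
from typing import Callable


def generate_with_fallback(
    prompt: str,
    mode: str,
    openai_fn: Callable[[str], str],
    offline_fn: Callable[[str], str],
) -> tuple[str, str]:
    """Return (answer, mode_used); use the offline generator if OpenAI mode fails."""
    if mode == "openai":
        try:
            return openai_fn(prompt), "openai"
        except Exception as exc:
            # Report why the OpenAI path failed before falling back.
            print(f"OpenAI mode failed ({exc}); falling back to offline mode.")
    return offline_fn(prompt), "offline"


def offline_stub(prompt: str) -> str:
    # Hypothetical placeholder for the offline retrieval-conditioned generator.
    return "offline answer"


def failing_openai_stub(prompt: str) -> str:
    # Hypothetical placeholder that simulates a missing key or a network error.
    raise RuntimeError("OPENAI_API_KEY is not set")


print(generate_with_fallback("prompt text", "openai", failing_openai_stub, offline_stub))
```

Passing the two generators in as arguments keeps the fallback logic testable without network access or a paid key.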

Run The Evaluation

python scripts/run_evaluation.py

This rebuilds the local index, runs the small benchmark, and writes:

docs/evaluation_results.md

Current benchmark summary:

  • 7 benchmark questions
  • 5 in-domain questions
  • 1 partially answerable limitation question
  • 1 out-of-domain question
  • Current useful retrieval decisions: 7/7 on the small educational benchmark

That 7/7 result should be described carefully. It means the small sanity-check benchmark works, not that the system is generally accurate.
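
This README does not define "useful retrieval decision" precisely. One plausible reading, sketched below as an assumption, is that the system should retrieve for in-domain and partially answerable questions and refuse for out-of-domain ones. The category labels, threshold, and scores here are invented placeholders, not values from docs/evaluation_results.md.

```python
def is_useful_decision(category: str, top_score: float, threshold: float) -> bool:
    """Assumed rule: retrieve when answerable, refuse when out of domain."""
    retrieved = top_score >= threshold
    if category == "out_of_domain":
        return not retrieved  # a refusal is the useful outcome here
    return retrieved


def summarize(results: list[dict], threshold: float) -> str:
    useful = sum(
        is_useful_decision(r["category"], r["top_score"], threshold) for r in results
    )
    return f"{useful}/{len(results)}"


# Placeholder rows for illustration only; these are not benchmark results.
toy_results = [
    {"category": "in_domain", "top_score": 0.40},
    {"category": "partial", "top_score": 0.20},
    {"category": "out_of_domain", "top_score": 0.02},
]
print(summarize(toy_results, threshold=0.10))
```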

Run Tests

python -m pytest

The tests cover chunking behavior and basic retrieval threshold behavior.

Corpus

The local corpus is intentionally small and educational. It lives in data/raw/ and includes notes on:

  • RAG concepts
  • RAGeATM project design
  • chunking and retrieval pipeline mechanics
  • evaluation methodology
  • data-engineering troubleshooting
  • conversational-AI limitations

This is not a production dataset or a large curated knowledge base.

Retrieval And Generation

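Retrieval uses TfidfVectorizer from scikit-learn and cosine similarity. This is lexical retrieval: chunks are ranked by weighted term overlap with the question, not by meaning. It suits an explainable MVP because the vectorization and scoring steps are simple enough to walk through and defend.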
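
To make the scoring arithmetic concrete, here is a hand-rolled, standard-library sketch of TF-IDF with cosine similarity. It is a stand-in for illustration, not the project's scikit-learn code. It follows the smoothed IDF formula and L2 normalization that scikit-learn documents as TfidfVectorizer defaults, but the function names are hypothetical and the output is not guaranteed to match the library exactly.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_idf(docs: list[str]) -> dict[str, float]:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    # Terms outside the fitted vocabulary are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are L2-normalized, so the dot product is the cosine.
    return sum((w * b.get(t, 0.0) for t, w in a.items()), 0.0)


def retrieve(query: str, docs: list[str], top_k: int = 3) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs, best first."""
    idf = fit_idf(docs)
    q = tfidf_vector(query, idf)
    scored = [(cosine(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, s) for s, i in scored[:top_k]]


docs = [
    "chunking splits documents into overlapping windows",
    "cosine similarity ranks chunks against the query",
    "the capital budget was approved",
]
print(retrieve("how does chunking split documents", docs, top_k=2))
```

Because matching is purely lexical, "split" in the query does not match "splits" in the first document; only "chunking" and "documents" contribute to its score.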

Generation has two modes:

  • Default: offline retrieval-conditioned answer generator. It selects supporting sentences from relevant retrieved chunks and cites local sources.
  • Optional: OpenAI generation through environment variables. This is not required for the project to run.

The system refuses to answer when the best retrieved context is below the similarity threshold.
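
The refusal rule and the offline generator can be sketched together. This is a simplified illustration under stated assumptions: the function name `offline_answer`, the 0.1 threshold, the refusal wording, and the sentence-selection heuristic (count of words shared with the question) are hypothetical, not the logic in src.generate.

```python
import re

REFUSAL = "I cannot answer that from the local corpus."  # assumed wording


def offline_answer(
    question: str,
    retrieved: list[tuple[str, float, str]],
    threshold: float = 0.1,  # assumed value, not the project's setting
    max_sentences: int = 2,
) -> str:
    """retrieved holds (source, score, text) tuples, best score first."""
    relevant = [r for r in retrieved if r[1] >= threshold]
    if not relevant:
        return REFUSAL  # best context is below the similarity threshold
    q_terms = set(re.findall(r"\w+", question.lower()))
    candidates = []
    for source, score, text in relevant:
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            overlap = len(q_terms & set(re.findall(r"\w+", sentence.lower())))
            if overlap:
                candidates.append((overlap, score, sentence, source))
    if not candidates:
        return REFUSAL
    # Prefer sentences sharing more question words, then higher chunk scores.
    candidates.sort(key=lambda c: (-c[0], -c[1]))
    return " ".join(f"{s} [{src}]" for _, _, s, src in candidates[:max_sentences])


chunks = [("notes.txt", 0.42, "RAG grounds answers in retrieved text. Bananas are yellow.")]
print(offline_answer("How does RAG ground answers?", chunks))
print(offline_answer("How does RAG ground answers?", [("notes.txt", 0.01, "RAG text.")]))
```

Selecting whole sentences and tagging each with its source keeps every part of the answer traceable to a local file, which is the property the demo is meant to show.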

Limitations

  • Small corpus
  • Lexical retrieval only
  • No Chroma
  • No neural embeddings
  • No persistent chat memory
  • No agent tools
  • Offline generator is not a full LLM
  • Evaluation is a small demo benchmark, not rigorous large-scale measurement

Future Work

  • Replace TF-IDF with neural embeddings
  • Add Chroma or another vector store
  • Expand the corpus with real course/project documents
  • Add a small Gradio or Streamlit UI
  • Add conversation history and follow-up question handling
  • Add stronger evaluation with held-out queries and human ratings

Final Presentation Notes

For the video, be direct:

  • Say this is a rebuilt, honest MVP.
  • Say it demonstrates RAG mechanics with local TF-IDF retrieval.
  • Show one successful in-domain question.
  • Show one out-of-domain refusal.
  • Do not claim Chroma, neural embeddings, memory, tools, or production readiness.

Use docs/video_script_5_6_min.md for the recording plan.

About

RAGeATM: an educational local Retrieval-Augmented Generation MVP using TF-IDF, cosine similarity, and threshold-based refusal to demonstrate grounded answering.
