Skip to content

Latest commit

 

History

History
168 lines (118 loc) · 6.16 KB

File metadata and controls

168 lines (118 loc) · 6.16 KB

🧠 Cortex

An agentic assistant that thinks over your own documents — retrieve, reason, and calculate.

Cortex is a small but complete agentic RAG application. An LLM sits inside a LangGraph loop and decides, turn by turn, whether to answer directly, search a local knowledge base, or run a calculation — then keeps looping until it has a final answer. A Streamlit front-end shows every tool call live, so you can actually see the agent reason.

Built as a hands-on study of how tool-calling agents and retrieval fit together in one graph.


What it does

  • 📚 Retrieval over your own docs — drop .txt, .pdf, or .docx files into a folder, embed them once, and the agent can semantically search them.
  • 🧮 Calculator toolsadd, multiply, divide (with a divide-by-zero guard).
  • 🔁 Self-directed agent loop — the model picks tools on its own and stops when it's done; a hard cap prevents runaway loops.
  • 👀 Glass-box UI — a sidebar trace lists each tool the agent called, the arguments it passed, and what came back.
  • ⌨️ Headless CLI — prefer the terminal? Run it without the UI, interactively or one-shot.

How it works

        ┌──────────────┐
        │  user input  │
        └──────┬───────┘
               ▼
        ┌──────────────┐  wants a tool?  ┌──────────────┐
   ┌───▶│   llm_call   ├────────────────▶│   tool_node  │
   │    │   (ChatGroq) │                 │ search/math  │
   │    └──────┬───────┘◀────────────────┴──────────────┘
   │           │            tool result
   │  no tool  │
   └───────────┘
               ▼
        ┌──────────────┐
        │ final answer │
        └──────────────┘
  • Reasoning / tool choiceGroq (openai/gpt-oss-20b) for fast inference.
  • Embeddings → Google Gemini (gemini-embedding-001).
  • Vector store → FAISS, persisted to disk so you embed once and query forever.

Tools the agent can reach

Tool What it does
search_docs Semantic search over the FAISS knowledge base (top-3)
add Adds two integers
multiply Multiplies two integers
divide Divides two integers (safe on ÷ 0)

Project layout

.
├── agent.py        # The LangGraph agent: state, tools, and the graph wiring
├── app.py          # Streamlit chat UI with a live tool-trace sidebar
├── ingest.py       # Build the FAISS index from your documents (run once)
├── main.py         # Terminal CLI — interactive chat or single-shot question
├── sample_docs/    # Source documents (.txt / .pdf / .docx)
├── faiss_index/    # Generated vector index (created by ingest.py)
├── pyproject.toml  # Dependencies (managed with uv)
└── .env.example    # Template for your API keys

Getting started

Requirements

  • Python 3.12+
  • uv for dependency management (or plain pip)

1 · Install

git clone https://github.com/Ranu92/Cortex.git
cd Cortex
uv sync

2 · Add your API keys

Copy the template and fill in your own keys:

cp .env.example .env
GOOGLE_API_KEY=...   # from https://aistudio.google.com/apikey
GROQ_API_KEY=...     # from https://console.groq.com/keys

.env is git-ignored — your keys stay local and never get committed.

3 · Build the knowledge base

Put your documents in sample_docs/, then embed them:

uv run python ingest.py

This reads every .txt, .pdf, and .docx in sample_docs/, splits them into ~500-character chunks on paragraph boundaries, embeds them with Gemini, and writes the FAISS index to faiss_index/.

4 · Run it

Web UI:

uv run streamlit run app.py

Then open http://localhost:8501.

Or the terminal:

uv run python main.py                  # interactive chat
uv run python main.py "What is RAG?"   # one-shot question

Try asking

  • "What is RAG?" or "Explain how transformers work" → triggers search_docs
  • "What's 42 multiplied by 7?" → triggers multiply
  • "If an embedding has 768 dimensions, how many numbers is that across 5 documents?" → search and math in one turn

In the web UI, watch the sidebar fill in with each tool call as the agent works.


Managing documents

Supported formats: .txt, .pdf, .docx.

To add, remove, or update knowledge:

  1. Change the files in sample_docs/.
  2. Re-run uv run python ingest.py.

The index is rebuilt from scratch every time, so deleting a file only takes effect after you re-run ingest.py — the old vectors live in faiss_index/ until then.

ℹ️ Scanned/image-only PDFs won't yield text (they'd need OCR). Files with no extractable text are skipped and reported during ingest.


Tech stack

Package Role
langgraph Agent graph & control flow
langchain core LLM and tool abstractions
langchain-groq Groq chat model
langchain-google-genai Gemini embeddings
langchain-community FAISS integration
faiss-cpu Vector similarity search
pypdf / docx2txt PDF and Word text extraction
streamlit Web interface
python-dotenv Loads keys from .env