🧠 Cortex

An agentic assistant that thinks over your own documents — retrieve, reason, and calculate.

Cortex is a small but complete agentic RAG application. An LLM sits inside a LangGraph loop and decides, turn by turn, whether to answer directly, search a local knowledge base, or run a calculation — then keeps looping until it has a final answer. A Streamlit front-end shows every tool call live, so you can actually see the agent reason.

Built as a hands-on study of how tool-calling agents and retrieval fit together in one graph.

What it does

📚 Retrieval over your own docs — drop .txt, .pdf, or .docx files into a folder, embed them once, and the agent can semantically search them.
🧮 Calculator tools — add, multiply, divide (with a divide-by-zero guard).
🔁 Self-directed agent loop — the model picks tools on its own and stops when it's done; a hard cap prevents runaway loops.
👀 Glass-box UI — a sidebar trace lists each tool the agent called, the arguments it passed, and what came back.
⌨️ Headless CLI — prefer the terminal? Run it without the UI, interactively or one-shot.

How it works

        ┌──────────────┐
        │  user input  │
        └──────┬───────┘
               ▼
        ┌──────────────┐  wants a tool?  ┌──────────────┐
   ┌───▶│   llm_call   ├────────────────▶│   tool_node  │
   │    │   (ChatGroq) │                 │ search/math  │
   │    └──────┬───────┘◀────────────────┴──────────────┘
   │           │            tool result
   │  no tool  │
   └───────────┘
               ▼
        ┌──────────────┐
        │ final answer │
        └──────────────┘

Reasoning / tool choice → Groq (openai/gpt-oss-20b) for fast inference.
Embeddings → Google Gemini (gemini-embedding-001).
Vector store → FAISS, persisted to disk so you embed once and query forever.

Tools the agent can reach

Tool	What it does
`search_docs`	Semantic search over the FAISS knowledge base (top-3)
`add`	Adds two integers
`multiply`	Multiplies two integers
`divide`	Divides two integers (safe on `÷ 0`)

Project layout

.
├── agent.py        # The LangGraph agent: state, tools, and the graph wiring
├── app.py          # Streamlit chat UI with a live tool-trace sidebar
├── ingest.py       # Build the FAISS index from your documents (run once)
├── main.py         # Terminal CLI — interactive chat or single-shot question
├── sample_docs/    # Source documents (.txt / .pdf / .docx)
├── faiss_index/    # Generated vector index (created by ingest.py)
├── pyproject.toml  # Dependencies (managed with uv)
└── .env.example    # Template for your API keys

Getting started

Requirements

Python 3.12+
uv for dependency management (or plain pip)

1 · Install

git clone https://github.com/Ranu92/Cortex.git
cd Cortex
uv sync

2 · Add your API keys

Copy the template and fill in your own keys:

cp .env.example .env

GOOGLE_API_KEY=...   # from https://aistudio.google.com/apikey
GROQ_API_KEY=...     # from https://console.groq.com/keys

.env is git-ignored — your keys stay local and never get committed.

3 · Build the knowledge base

Put your documents in sample_docs/, then embed them:

uv run python ingest.py

This reads every .txt, .pdf, and .docx in sample_docs/, splits them into ~500-character chunks on paragraph boundaries, embeds them with Gemini, and writes the FAISS index to faiss_index/.

4 · Run it

Web UI:

uv run streamlit run app.py

Then open http://localhost:8501.

Or the terminal:

uv run python main.py                  # interactive chat
uv run python main.py "What is RAG?"   # one-shot question

Try asking

"What is RAG?" or "Explain how transformers work" → triggers search_docs
"What's 42 multiplied by 7?" → triggers multiply
"If an embedding has 768 dimensions, how many numbers is that across 5 documents?" → search and math in one turn

In the web UI, watch the sidebar fill in with each tool call as the agent works.

Managing documents

Supported formats: .txt, .pdf, .docx.

To add, remove, or update knowledge:

Change the files in sample_docs/.
Re-run uv run python ingest.py.

The index is rebuilt from scratch every time, so deleting a file only takes effect after you re-run ingest.py — the old vectors live in faiss_index/ until then.

ℹ️ Scanned/image-only PDFs won't yield text (they'd need OCR). Files with no extractable text are skipped and reported during ingest.

Tech stack

Package	Role
`langgraph`	Agent graph & control flow
`langchain` core	LLM and tool abstractions
`langchain-groq`	Groq chat model
`langchain-google-genai`	Gemini embeddings
`langchain-community`	FAISS integration
`faiss-cpu`	Vector similarity search
`pypdf` / `docx2txt`	PDF and Word text extraction
`streamlit`	Web interface
`python-dotenv`	Loads keys from `.env`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🧠 Cortex

What it does

How it works

Tools the agent can reach

Project layout

Getting started

Requirements

1 · Install

2 · Add your API keys

3 · Build the knowledge base

4 · Run it

Try asking

Managing documents

Tech stack

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

🧠 Cortex

What it does

How it works

Tools the agent can reach

Project layout

Getting started

Requirements

1 · Install

2 · Add your API keys

3 · Build the knowledge base

4 · Run it

Try asking

Managing documents

Tech stack