📂 Agentic RAG on Google Drive using n8n

An Agentic Retrieval-Augmented Generation (RAG) system built with n8n, Google Drive, Gemini, and Postgres. It automatically ingests documents from a Google Drive folder and answers questions across PDFs, CSVs, Excel files, and text documents.

This system continuously watches a Google Drive folder and keeps the knowledge base in sync whenever files are added or updated.

🚀 What This Project Does

🔍 Watches a Google Drive folder for file creation & updates
📄 Supports multiple document types
- PDFs
- CSV files
- Excel sheets
- Plain text files
🧠 Automatically:
- Extracts content
- Generates embeddings
- Stores structured & unstructured data
🤖 Uses an AI Agent to intelligently decide:
- When to use RAG
- When to run SQL queries on tabular data
💬 Allows querying documents via:
- Webhook API
- n8n Chat interface

🧠 Why Agentic RAG?

Unlike basic RAG pipelines, this project uses an AI Agent that:

Understands the type of question
Chooses between:
- Semantic search (RAG)
- SQL queries for numeric / analytical questions
Reduces hallucination by explicitly stating when the requested data is unavailable
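
The choice described above can be pictured with a small rule-based stand-in. This is only a sketch: in the workflow the Gemini-backed agent makes the decision, not keyword matching. `choose_tool`, `ANALYTIC_CUES`, and the `file_type` key are hypothetical names used for illustration.

```python
import re

# Assumed cue words for analytical questions (illustrative only;
# the real agent relies on the LLM's judgment, not a word list).
ANALYTIC_CUES = {"average", "avg", "sum", "total", "max", "min", "count", "median"}


def choose_tool(question: str, docs: list[dict]) -> str:
    """Return "sql", "rag", or "unavailable" for a question.

    docs is a hypothetical list of document records, each with a
    "file_type" key such as "pdf", "csv", "xlsx", or "txt".
    """
    if not docs:
        # Nothing has been ingested, so say so instead of guessing.
        return "unavailable"
    words = set(re.findall(r"[a-z]+", question.lower()))
    has_tabular = any(d["file_type"] in {"csv", "xlsx"} for d in docs)
    if has_tabular and words & ANALYTIC_CUES:
        return "sql"
    return "rag"
```

For example, `choose_tool("What is the average revenue per region?", [{"file_type": "csv"}])` picks the SQL path, while a question about a PDF policy document falls through to semantic search.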

🏗️ Architecture Overview

📥 Ingestion Pipeline

Google Drive Trigger (File Created / Updated)
File download
File-type detection
Content extraction:
- PDF → text extraction
- CSV / Excel → row-wise ingestion
- Text → direct parsing
Metadata storage in Postgres
Vector embeddings generation (Gemini)
Storage in an in-memory vector store (contents are lost when the n8n instance restarts)
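
The file-type detection and row-wise ingestion steps above can be sketched in a few lines. In the workflow these are n8n nodes, so the function names `detect_kind` and `csv_to_rows` and the exact branching rules are assumptions made for illustration.

```python
import csv
import io
from pathlib import PurePosixPath

# MIME types treated as tabular in this sketch (CSV, legacy .xls, and .xlsx).
TABULAR_MIME = {
    "text/csv",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def detect_kind(filename: str, mime_type: str) -> str:
    """Map a file to an extraction branch: pdf, tabular, text, or unsupported."""
    ext = PurePosixPath(filename).suffix.lower()
    if mime_type == "application/pdf" or ext == ".pdf":
        return "pdf"
    if mime_type in TABULAR_MIME or ext in {".csv", ".xls", ".xlsx"}:
        return "tabular"
    if mime_type == "text/plain" or ext == ".txt":
        return "text"
    return "unsupported"


def csv_to_rows(csv_text: str) -> list[dict]:
    """Row-wise ingestion: one dict per CSV row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

Each dict returned by `csv_to_rows` corresponds to one structured row; values stay as strings until they are cast at query time.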

🗄️ Storage

doc_metadata – Stores file info, type, and schema (for tabular data)
doc_rows – Stores structured rows from CSV / Excel as JSONB
Vector Store – Used for semantic retrieval
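
To make the two-table layout concrete, here is a minimal sketch that uses SQLite as a local stand-in for Postgres. Only the table names `doc_metadata` and `doc_rows` come from this README; the column names (`file_id`, `title`, `file_type`, `column_schema`, `row_data`) and the sample data are assumptions, and the real workflow stores rows as Postgres JSONB rather than JSON text.

```python
import json
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc_metadata (
    file_id       TEXT PRIMARY KEY,
    title         TEXT,
    file_type     TEXT,
    column_schema TEXT      -- JSON list of column names, tabular files only
);
CREATE TABLE doc_rows (
    id       INTEGER PRIMARY KEY,
    file_id  TEXT REFERENCES doc_metadata(file_id),
    row_data TEXT           -- one JSON object per spreadsheet row
);
""")

# Made-up sample rows for illustration.
rows = [
    {"region": "north", "revenue": 120},
    {"region": "south", "revenue": 80},
    {"region": "north", "revenue": 100},
]
conn.execute(
    "INSERT INTO doc_metadata VALUES (?, ?, ?, ?)",
    ("file-1", "sales.csv", "csv", json.dumps(["region", "revenue"])),
)
conn.executemany(
    "INSERT INTO doc_rows (file_id, row_data) VALUES (?, ?)",
    [("file-1", json.dumps(r)) for r in rows],
)

# An aggregation of the kind the agent could issue for an analytical question.
# In Postgres the same fields would be read with row_data->>'region' and
# (row_data->>'revenue')::numeric instead of json_extract.
avg_by_region = conn.execute(
    """
    SELECT json_extract(row_data, '$.region') AS region,
           AVG(json_extract(row_data, '$.revenue')) AS avg_revenue
    FROM doc_rows
    WHERE file_id = ?
    GROUP BY region
    ORDER BY region
    """,
    ("file-1",),
).fetchall()
```

Keeping every row as a JSON object means files with different columns can share one table, while the schema recorded in `doc_metadata` tells the agent which fields it can query.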

💬 Query Flow

User sends a query (chat or webhook)
AI Agent:
- Fetches available documents
- Decides RAG vs SQL
Executes:
- Semantic retrieval for text-based questions
- SQL queries for aggregations (avg, sum, max, group by, etc.)
Returns grounded, explainable answers
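
The semantic retrieval step can be illustrated with a tiny in-memory vector store ranked by cosine similarity. This is a sketch under stated assumptions: the class `InMemoryVectorStore` is hypothetical, the three-dimensional vectors are hand-made toy values, and in the workflow the embeddings come from Gemini and the store is an n8n node.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal store: keeps (vector, text) pairs in a Python list."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((list(vector), text))

    def search(self, query_vector, k=2):
        """Return the k most similar chunks as (score, text), best first."""
        scored = [(cosine(query_vector, v), t) for v, t in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Toy vectors standing in for real embeddings.
store = InMemoryVectorStore()
store.add([1.0, 0.0, 0.0], "refund policy")
store.add([0.0, 1.0, 0.0], "holiday calendar")
store.add([0.9, 0.1, 0.0], "returns process")
hits = store.search([1.0, 0.0, 0.0], k=2)
```

Here `hits` ranks "refund policy" first and "returns process" second. Because everything lives in a Python list, the contents disappear when the process exits.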

🛠️ Tech Stack

n8n – Workflow orchestration
Google Drive API – Document source
Google Gemini – LLM + Embeddings
PostgreSQL – Metadata & structured data storage
LangChain nodes (n8n) – Agent, Memory, Tools
Webhook API – External access

🔥 Key Highlights

Fully automated document sync
Multi-format document support
Hybrid RAG + SQL reasoning
Agent-based decision making
Production-style data modeling
No manual re-indexing when files are added or updated

📌 Use Cases

Internal knowledge assistants
Company document Q&A bots
Analytics over spreadsheets using natural language
AI copilots for operations & reports
No-code / low-code GenAI workflows

📎 Future Improvements

Persistent vector database (Pinecone / pgvector)
Access control per document
Chunk-level citation references
UI frontend for chat

🙌 Credits

Built with ❤️ using n8n and Google Gemini.

