AMalfez/agentic_rag_n8n

📂 Agentic RAG on Google Drive using n8n

An agentic Retrieval-Augmented Generation (RAG) system built with n8n, Google Drive, Gemini, and Postgres. It automatically ingests documents from Google Drive and lets you query PDFs, CSV files, Excel sheets, and plain-text documents in natural language.

This system continuously watches a Google Drive folder and keeps the knowledge base in sync whenever files are added or updated.
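Keeping the knowledge base in sync on updates generally means replacing, not appending, whatever was previously stored for a file. A minimal sketch of that delete-then-insert pattern, assuming each Drive file is keyed by its file ID; `KnowledgeBase`, `upsert_file`, and the field names are hypothetical and are not taken from the workflow.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory stand-in for the stored metadata and text chunks."""
    metadata: dict[str, dict] = field(default_factory=dict)     # file_id -> file info
    chunks: dict[str, list[str]] = field(default_factory=dict)  # file_id -> text chunks

    def delete_file(self, file_id: str) -> None:
        # Missing keys are ignored, so deleting an unknown file is a no-op.
        self.metadata.pop(file_id, None)
        self.chunks.pop(file_id, None)

    def upsert_file(self, file_id: str, name: str, chunks: list[str]) -> None:
        # Remove everything stored for this file first, so an updated file
        # never leaves stale chunks behind, then store the new content.
        self.delete_file(file_id)
        self.metadata[file_id] = {"name": name, "chunk_count": len(chunks)}
        self.chunks[file_id] = list(chunks)


kb = KnowledgeBase()
kb.upsert_file("abc123", "report.txt", ["Q1 revenue grew.", "Q2 was flat."])
kb.upsert_file("abc123", "report.txt", ["Q1 revenue grew 12%."])  # same file, updated
print(kb.chunks["abc123"])  # ['Q1 revenue grew 12%.']
```

Keying on the Drive file ID rather than the file name means a renamed file still replaces its old entries.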


🚀 What This Project Does

  • πŸ” Watches a Google Drive folder for file creation & updates
  • πŸ“„ Supports multiple document types
    • PDFs
    • CSV files
    • Excel sheets
    • Plain text files
  • 🧠 Automatically:
    • Extracts content
    • Generates embeddings
    • Stores structured & unstructured data
  • 🤖 Uses an AI Agent to intelligently decide:
    • When to use RAG
    • When to run SQL queries on tabular data
  • πŸ’¬ Allows querying documents via:
    • Webhook API
    • n8n Chat interface
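For the webhook route, a client only has to POST the question as JSON. A minimal sketch using Python's standard library, assuming a hypothetical webhook path `rag-query` and hypothetical payload keys `query` and `sessionId`; the real URL is shown on the workflow's Webhook node, and the keys must match whatever that node reads.

```python
import json
import urllib.request

# Hypothetical URL. n8n serves production webhooks under /webhook/<path>
# and test webhooks under /webhook-test/<path>.
WEBHOOK_URL = "https://your-n8n-host/webhook/rag-query"


def build_query_request(question: str, session_id: str) -> urllib.request.Request:
    """Build a POST request carrying the user's question as a JSON body."""
    body = json.dumps({"query": question, "sessionId": session_id}).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, session_id: str = "demo") -> str:
    """Send the request and return the raw response body (needs a running n8n instance)."""
    request = build_query_request(question, session_id)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

Calling `ask("What is the average price per region?")` would then return whatever the workflow sends back as the webhook response.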

🧠 Why Agentic RAG?

Unlike basic RAG pipelines, this project uses an AI Agent that:

  • Understands the type of question
  • Chooses between:
    • Semantic search (RAG)
    • SQL queries for numeric / analytical questions
  • Reduces hallucination by explicitly stating when the requested data is unavailable
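In the workflow this choice is made by the Gemini-backed agent, not by fixed rules. Purely to illustrate the shape of the decision, here is a hypothetical keyword heuristic (`route_question` and its cue list are not part of the project): questions that ask for an aggregate and name a known column go to SQL, and everything else goes to semantic search.

```python
import re

# Hypothetical aggregate cues; the real workflow leaves this judgment to the LLM.
AGGREGATE_PATTERN = re.compile(
    r"\b(average|avg|sum|total|max|maximum|min|minimum|count|how many|group by|per)\b"
)


def route_question(question: str, tabular_columns: set[str]) -> str:
    """Return "sql" for aggregate questions that mention a known column, else "rag"."""
    q = question.lower()
    wants_aggregate = AGGREGATE_PATTERN.search(q) is not None
    # Whole-word match so that a column like "price" does not match "priceless".
    mentions_column = any(
        re.search(r"\b" + re.escape(col.lower()) + r"\b", q) for col in tabular_columns
    )
    return "sql" if wants_aggregate and mentions_column else "rag"


print(route_question("What is the average price per region?", {"price", "region"}))  # sql
print(route_question("Summarize the onboarding policy", {"price", "region"}))         # rag
```

Requiring both signals keeps vague questions such as "what is the total?" on the RAG path, where the agent can say the data is unavailable instead of guessing a column.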

πŸ—οΈ Architecture Overview

πŸ“₯ Ingestion Pipeline

  1. Google Drive Trigger (File Created / Updated)
  2. File download
  3. File-type detection
  4. Content extraction:
    • PDF β†’ text extraction
    • CSV / Excel β†’ row-wise ingestion
    • Text β†’ direct parsing
  5. Metadata storage in Postgres
  6. Vector embeddings generation (Gemini)
  7. Storage in an in-memory vector store
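File-type detection and row-wise ingestion can be sketched in plain Python as follows. The MIME types are the standard ones Google Drive reports; the `kind` labels, `detect_kind`, `ingest_csv`, and the metadata field names are hypothetical. Excel parsing needs a spreadsheet library and PDF text extraction a PDF library (in n8n, the Extract From File node can cover these formats), so only the CSV branch is shown.

```python
import csv
import io

# Standard MIME types mapped to a hypothetical extraction branch.
# Native Google Sheets (application/vnd.google-apps.spreadsheet) would first
# have to be exported to CSV or XLSX, so they are not listed here.
MIME_TO_KIND = {
    "application/pdf": "pdf",
    "text/csv": "tabular",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "tabular",
    "text/plain": "text",
}


def detect_kind(mime_type: str) -> str:
    """Return the extraction branch for a MIME type, or raise for unsupported files."""
    kind = MIME_TO_KIND.get(mime_type)
    if kind is None:
        raise ValueError(f"unsupported file type: {mime_type}")
    return kind


def ingest_csv(file_id: str, raw: bytes) -> tuple[dict, list[dict]]:
    """Row-wise ingestion: one metadata record (with schema) plus one dict per row."""
    # utf-8-sig strips the byte-order mark that spreadsheet exports often add.
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))
    rows = [dict(row) for row in reader]
    metadata = {
        "file_id": file_id,
        "type": "tabular",
        "schema": list(reader.fieldnames or []),  # header row = column names
    }
    return metadata, rows


meta, rows = ingest_csv("file-1", b"region,price\nEU,10\nUS,20\n")
print(meta["schema"])  # ['region', 'price']
print(rows[0])         # {'region': 'EU', 'price': '10'}
```

Values stay as strings, exactly as `csv.DictReader` returns them, so any numeric typing has to happen later, at query time.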

πŸ—„οΈ Storage

  • doc_metadata
    Stores file info, type, and schema (for tabular data)

  • doc_rows
    Stores structured rows from CSV / Excel as JSONB

  • Vector Store
    Holds the Gemini embeddings (in memory) used for semantic retrieval
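Conceptually, an in-memory vector store keeps (text, embedding) pairs in process memory and ranks them by similarity to the query embedding. A toy sketch using brute-force cosine similarity; `InMemoryVectorStore` is hypothetical, and the 3-dimensional vectors are made up (real Gemini embeddings have far more dimensions).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy store: keeps (text, embedding) pairs in a list and scans them all per query."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, embedding: list[float]) -> None:
        self._items.append((text, embedding))

    def search(self, query_embedding: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Score every stored item, then keep the k most similar.
        scored = [(text, cosine(query_embedding, emb)) for text, emb in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("refund policy", [1.0, 0.0, 0.0])
store.add("shipping times", [0.0, 1.0, 0.0])
store.add("returns window", [0.9, 0.1, 0.0])
print([text for text, _ in store.search([1.0, 0.0, 0.0], k=2)])  # ['refund policy', 'returns window']
```

Because the vectors live only in process memory, they do not survive a restart; that is the gap a persistent vector database closes.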


πŸ’¬ Query Flow

  1. User sends a query (chat or webhook)
  2. AI Agent:
    • Fetches available documents
    • Decides RAG vs SQL
  3. Executes:
    • Semantic retrieval for text-based questions
    • SQL queries for aggregations (avg, sum, max, group by, etc.)
  4. Returns grounded, explainable answers
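For the SQL branch, one way to let the agent run aggregates over the JSONB rows without handing it free-form SQL is to build the query from a small allow-list. A minimal sketch, assuming hypothetical `doc_rows` columns `dataset_id` and `row_data` (JSONB) and psycopg-style `%s` placeholders; the README does not show how the workflow actually phrases its queries.

```python
from typing import Optional

# Aggregates the agent is allowed to request; anything else is rejected.
ALLOWED_AGGREGATES = {"avg": "AVG", "sum": "SUM", "max": "MAX", "min": "MIN", "count": "COUNT"}


def build_aggregate_query(dataset_id: str, agg: str, column: str,
                          group_by: Optional[str] = None) -> tuple[str, list[str]]:
    """Build a parameterized Postgres query over JSONB rows in doc_rows.

    JSON keys are bound as parameters to ->>, so the only interpolated text
    is the aggregate function name, which comes from a fixed allow-list.
    """
    func = ALLOWED_AGGREGATES.get(agg.lower())
    if func is None:
        raise ValueError(f"unsupported aggregate: {agg}")
    # ->> returns text, so cast to numeric before aggregating.
    value_expr = f"{func}((row_data ->> %s)::numeric)"
    if group_by is None:
        sql = f"SELECT {value_expr} AS value FROM doc_rows WHERE dataset_id = %s"
        return sql, [column, dataset_id]
    sql = (f"SELECT row_data ->> %s AS grp, {value_expr} AS value "
           "FROM doc_rows WHERE dataset_id = %s GROUP BY 1 ORDER BY 1")
    return sql, [group_by, column, dataset_id]


sql, params = build_aggregate_query("sales-2024", "avg", "price", group_by="region")
print(params)  # ['region', 'price', 'sales-2024']
```

With psycopg, `cursor.execute(sql, params)` would run the result; n8n's Postgres node would need `$1`-style placeholders instead. `GROUP BY 1` refers to the first output column, which avoids repeating the parameterized grouping expression.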

πŸ› οΈ Tech Stack

  • n8n – Workflow orchestration
  • Google Drive API – Document source
  • Google Gemini – LLM + Embeddings
  • PostgreSQL – Metadata & structured data storage
  • LangChain nodes (n8n) – Agent, Memory, Tools
  • Webhook API – External access

πŸ”₯ Key Highlights

  • Fully automated document sync
  • Multi-format document support
  • Hybrid RAG + SQL reasoning
  • Agent-based decision making
  • Production-style data modeling
  • Zero manual re-indexing

πŸ“Œ Use Cases

  • Internal knowledge assistants
  • Company document Q&A bots
  • Analytics over spreadsheets using natural language
  • AI copilots for operations & reports
  • No-code / low-code GenAI workflows

πŸ“Ž Future Improvements

  • Persistent vector database (Pinecone / PGVector)
  • Access control per document
  • Chunk-level citation references
  • UI frontend for chat

🙌 Credits

Built with ❤️ using n8n and Google Gemini.
