An Agentic Retrieval-Augmented Generation (RAG) system built with n8n, Google Drive, Gemini, and Postgres. It automatically ingests documents from Google Drive and answers questions across PDFs, CSVs, Excel files, and text documents.
This system continuously watches a Google Drive folder and keeps the knowledge base in sync whenever files are added or updated.
- Watches a Google Drive folder for file creation & updates
- Supports multiple document types
  - PDFs
  - CSV files
  - Excel sheets
  - Plain text files
- Automatically:
  - Extracts content
  - Generates embeddings
  - Stores structured & unstructured data
- Uses an AI Agent to intelligently decide:
  - When to use RAG
  - When to run SQL queries on tabular data
- Allows querying documents via:
  - Webhook API
  - n8n Chat interface
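
As a rough illustration of the webhook route listed above, the sketch below builds a JSON POST request that carries a question. The endpoint URL and the `query` / `sessionId` field names are assumptions made for this example; the real path and payload shape depend on how the Webhook node is configured in the workflow.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL shown on your n8n Webhook node.
WEBHOOK_URL = "https://n8n.example.com/webhook/rag-query"


def build_query_request(question: str, session_id: str = "demo-session") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the webhook."""
    # "query" and "sessionId" are assumed field names; match them to whatever
    # the workflow actually reads from the request body.
    payload = json.dumps({"query": question, "sessionId": session_id}).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 60.0) -> str:
    """Send the question and return the raw response body as text."""
    with urllib.request.urlopen(build_query_request(question), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `ask("What is the average revenue per region?")` would send the request; `build_query_request` is kept separate so the payload can be inspected without touching the network.
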
Unlike basic RAG pipelines, this project uses an AI Agent that:
- Understands the type of question
- Chooses between:
  - Semantic search (RAG)
  - SQL queries for numeric / analytical questions
- Reduces hallucination by explicitly stating when the requested data is unavailable
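
In the workflow this choice is made by the agent's LLM, not by fixed rules. The keyword heuristic below is only a stand-in that makes the decision concrete: it is not the project's logic, and the function name `choose_tool`, the hint list, and the document-type labels are all hypothetical.

```python
import re

# Words that usually signal an aggregation over tabular data (illustrative list).
ANALYTIC_HINTS = {"average", "avg", "sum", "total", "max", "min", "count", "group"}


def choose_tool(question: str, available_doc_types: set[str]) -> str:
    """Return "sql", "rag", or "unavailable" for a user question.

    Keyword stand-in for the LLM agent's decision, not the workflow's logic.
    """
    if not available_doc_types:
        # Nothing has been ingested: say so rather than guess an answer.
        return "unavailable"
    words = set(re.findall(r"[a-z]+", question.lower()))
    has_tabular = bool(available_doc_types & {"csv", "excel"})
    if has_tabular and words & ANALYTIC_HINTS:
        return "sql"
    return "rag"
```

With this sketch, `choose_tool("What is the average salary by department?", {"csv", "pdf"})` selects the SQL path, while a question such as "Summarize the onboarding policy" falls through to semantic search.
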
- Google Drive Trigger (File Created / Updated)
- File download
- File-type detection
- Content extraction:
  - PDF → text extraction
  - CSV / Excel → row-wise ingestion
  - Text → direct parsing
- Metadata storage in Postgres
- Vector embeddings generation (Gemini)
- Storage in an in-memory vector store
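
The file-type detection and row-wise ingestion steps above could look roughly like the following. This is a minimal sketch rather than the workflow's actual nodes: the MIME-type table is an illustrative subset, and `detect_branch`, `csv_to_rows`, and the `file_id` argument are hypothetical names.

```python
import csv
import io
import json

# MIME types mapped to an extraction branch; an illustrative subset.
BRANCH_BY_MIME = {
    "application/pdf": "pdf",
    "text/csv": "tabular",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "tabular",
    "text/plain": "text",
}


def detect_branch(mime_type: str) -> str:
    """Pick the extraction branch for a downloaded file, or "unsupported"."""
    return BRANCH_BY_MIME.get(mime_type, "unsupported")


def csv_to_rows(file_id: str, csv_text: str) -> list[tuple[str, str]]:
    """Turn CSV text into (file_id, row_json) pairs, one per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes a JSON object keyed by the header names.
    return [(file_id, json.dumps(row)) for row in reader]
```

For example, `csv_to_rows("file-123", "name,amount\nalice,10\nbob,20\n")` yields two pairs, each holding a JSON object with `name` and `amount` keys. Values stay strings at this stage, so numeric casting has to happen at query time.
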
- `doc_metadata`: Stores file info, type, and schema (for tabular data)
- `doc_rows`: Stores structured rows from CSV / Excel as JSONB
- Vector Store: Used for semantic retrieval
- User sends a query (chat or webhook)
- AI Agent:
  - Fetches available documents
  - Decides RAG vs SQL
- Executes:
  - Semantic retrieval for text-based questions
  - SQL queries for aggregations (avg, sum, max, group by, etc.)
- Returns grounded, explainable answers
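
For the SQL branch of the flow above, an aggregation over JSONB rows might be assembled as in the sketch below. The column names `row_data` and `file_id` are assumptions (the source only says that `doc_rows` stores rows as JSONB), `build_aggregate_query` is a hypothetical helper, and the `%s` placeholders follow the style used by the psycopg driver.

```python
ALLOWED_AGGREGATES = {"avg", "sum", "max", "min", "count"}


def build_aggregate_query(file_id: str, column: str, aggregate: str) -> tuple[str, tuple[str, str]]:
    """Return (sql, params) for one aggregate over a JSONB-backed column."""
    agg = aggregate.lower()
    if agg not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    # Only the allow-listed function name is interpolated into the SQL text;
    # the JSON key and the file id are passed as bound parameters.
    sql = (
        f"SELECT {agg.upper()}((row_data ->> %s)::numeric) AS result "
        "FROM doc_rows WHERE file_id = %s"
    )
    return sql, (column, file_id)
```

The returned pair can be handed to a database cursor's `execute(sql, params)`. The `::numeric` cast is needed because `->>` extracts JSONB values as text.
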
- n8n: Workflow orchestration
- Google Drive API: Document source
- Google Gemini: LLM + Embeddings
- PostgreSQL: Metadata & structured data storage
- LangChain nodes (n8n): Agent, Memory, Tools
- Webhook API: External access
- Fully automated document sync
- Multi-format document support
- Hybrid RAG + SQL reasoning
- Agent-based decision making
- Production-style data modeling
- Zero manual re-indexing
- Internal knowledge assistants
- Company document Q&A bots
- Analytics over spreadsheets using natural language
- AI copilots for operations & reports
- No-code / low-code GenAI workflows
- Persistent vector database (Pinecone / PGVector)
- Access control per document
- Chunk-level citation references
- UI frontend for chat
Built with ❤️ using n8n and Google Gemini.