Notice: This application is designed with a microservice architecture consisting of three distinct services: Ingestion, Retrieval, and Email. Due to the high cost of cloud-hosting a multi-service stack, a live web demo is not currently available. However, you can easily self-host the application by following the commands below. If you encounter any issues during the setup, please feel free to open an issue.
A high-performance RAG (Retrieval-Augmented Generation) system designed for deep document analysis and persistent context awareness. This project implements a sophisticated architecture that synchronizes unstructured text and image data, utilizing a dual-memory layer for a truly personalized chat experience.
Note: This demo was recorded on a low-specification system. While the ingestion and retrieval layers are highly optimized for multi-threading and asynchronous processing, the hardware limitations may impact perceived speed. On high-performance systems, these operations are significantly faster.
how-quark.mov
Quark distinguishes itself through a multi-stage pipeline:
- Partitioning: Leveraging Unstructured.io for semantic text decomposition and layout analysis.
- Extraction: Utilizing pdfplumber for precise image and table coordinate extraction.
- Sync Layer: A custom orchestration layer that aligns text and visual modalities for comprehensive multimodal embeddings.
- STM (Short-Term Memory): Powered by Redis. Provides sub-millisecond access to rapid session-based context and transient state.
- LTM (Long-Term Memory): Powered by Mem0. Acts as a persistent intelligence layer that retains user history, evolving preferences, and long-form knowledge over time.
- Embedding & Reranking: Powered by Voyage AI, utilizing advanced rerankers and metadata filtering to maximize retrieval precision.
- Vector Infrastructure: Qdrant handles high-dimensional vector storage alongside robust relational metadata.
- Web Framework: ElysiaJS — The high-performance, Bun-native framework for the backend.
- Identity & DB: Supabase — Unified Auth and PostgreSQL backend.
- Frontend: React — A minimalist, streaming-responsive interface optimized for real-time AI interactions.
- Worker: BullMQ + Redis. Persistent background workers take on the heavy I/O and compute work, for scalability by design.
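The STM/LTM split listed above can be illustrated with a small sketch. This is not Quark's actual code: the two classes are in-memory stand-ins for the roles Redis and Mem0 play, and every name here (`ShortTermMemory`, `LongTermMemory`, `buildContext`) is hypothetical. The keyword match is only a placeholder for a real memory search.

```typescript
// In-memory stand-ins for the two memory tiers (assumption: Quark's real
// tiers are Redis and Mem0; nothing below calls either of them).

interface Message {
  role: "user" | "assistant";
  content: string;
}

// Short-term memory: a bounded window of recent messages per session.
class ShortTermMemory {
  private sessions = new Map<string, Message[]>();
  private maxMessages: number;

  constructor(maxMessages: number) {
    this.maxMessages = maxMessages;
  }

  append(sessionId: string, message: Message): void {
    const history = this.sessions.get(sessionId) ?? [];
    history.push(message);
    // Drop the oldest entries until the window fits.
    while (history.length > this.maxMessages) history.shift();
    this.sessions.set(sessionId, history);
  }

  recent(sessionId: string): Message[] {
    return [...(this.sessions.get(sessionId) ?? [])];
  }
}

// Long-term memory: durable facts per user, found by naive keyword overlap.
class LongTermMemory {
  private facts = new Map<string, string[]>();

  remember(userId: string, fact: string): void {
    const existing = this.facts.get(userId) ?? [];
    if (!existing.includes(fact)) existing.push(fact);
    this.facts.set(userId, existing);
  }

  search(userId: string, query: string): string[] {
    const terms = query.toLowerCase().split(/\W+/).filter((t) => t.length > 2);
    return (this.facts.get(userId) ?? []).filter((fact) =>
      terms.some((t) => fact.toLowerCase().includes(t)),
    );
  }
}

// Combine both tiers into one prompt context: long-term facts first,
// then the recent turns, then the new question.
function buildContext(
  stm: ShortTermMemory,
  ltm: LongTermMemory,
  userId: string,
  sessionId: string,
  query: string,
): string {
  const memories = ltm.search(userId, query).map((f) => `[memory] ${f}`);
  const turns = stm.recent(sessionId).map((m) => `[${m.role}] ${m.content}`);
  return [...memories, ...turns, `[user] ${query}`].join("\n");
}
```

The point of the split is that session history can expire or be truncated freely, while facts about the user survive across sessions and are pulled in only when relevant to the current question.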
Follow these steps to initialize and run the Quark-RAG system on your local machine.
We provide a setup script to handle dependency installation and environment checks.
- Grant execution permissions:
chmod +x setup.sh
- Run the initializer:
./setup.sh
Before launching the web app, you must set up your credentials: create a .env file in the root directory and populate it with your API keys.
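A quick pre-flight check can catch missing credentials before the services start. The helper below is a minimal sketch, not part of the Quark codebase. The variable names in the example are placeholders, since this README does not list the actual keys; substitute the names your .env file really uses.

```typescript
// Returns the names of required variables that are unset or blank.
// `env` is any key/value map, for example the process environment.
function missingEnvVars(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Hypothetical key names, for illustration only.
const exampleRequired = ["EXAMPLE_EMBEDDING_API_KEY", "EXAMPLE_DB_URL"];
const exampleEnv = { EXAMPLE_EMBEDDING_API_KEY: "abc123", EXAMPLE_DB_URL: "  " };

const missing = missingEnvVars(exampleRequired, exampleEnv);
if (missing.length > 0) {
  console.log(`Missing or empty variables: ${missing.join(", ")}`);
}
```

In a real setup you would pass the process environment as the second argument once the .env file has been loaded.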
Start the ingestion and chat workers:
npm run worker:ingestion
npm run worker:chat
Once the setup is complete, you can start chatting with your documents immediately via the web interface.
cd frontend
npm install
npm run dev
This implementation follows a Modular RAG pattern. By decoupling the ingestion of images and text and re-syncing them at the metadata level, the system maintains higher contextual integrity than standard "text-only" pipelines. The integration of Redis and Mem0 mimics human cognitive functions by separating immediate recall from historical knowledge.
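To make "re-syncing at the metadata level" concrete, here is a minimal sketch of one way it could work. It assumes text chunks and extracted images each carry a page number and uses that page number as the join key. This is an illustrative assumption, not Quark's actual sync logic, which may align on coordinates or other metadata. All type and function names are hypothetical.

```typescript
// Text chunks and images are produced by separate extraction passes;
// the only thing they share in this sketch is page-level metadata.
interface TextChunk {
  id: string;
  page: number;
  text: string;
}

interface ImageRegion {
  id: string;
  page: number;
}

interface SyncedChunk extends TextChunk {
  imageIds: string[];
}

// Attach to each text chunk the ids of the images found on the same page.
function syncByPage(chunks: TextChunk[], images: ImageRegion[]): SyncedChunk[] {
  // Index image ids by page so each chunk lookup is a single map access.
  const idsByPage = new Map<number, string[]>();
  for (const image of images) {
    const ids = idsByPage.get(image.page) ?? [];
    ids.push(image.id);
    idsByPage.set(image.page, ids);
  }
  // Copy the id list so chunks on the same page do not share one array.
  return chunks.map((chunk) => ({
    ...chunk,
    imageIds: [...(idsByPage.get(chunk.page) ?? [])],
  }));
}
```

Because the join happens on metadata rather than inside either extractor, the text and image passes can run independently and be re-linked afterwards, which is the decoupling the paragraph above describes.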

