Multi-Agent Research Intelligence Platform for processing research papers with AI agents (ingestion, embeddings, clustering, and RAG).
- Frontend: Next.js (TypeScript)
- Backend: Flask API
- Database: PostgreSQL + pgvector
- Cache / Message Broker: Redis
- Background Jobs: Celery
- Containerization: Docker
- Clone and enter the repo

  ```bash
  cd paperMind
  ```

- Create environment file

  ```bash
  cp .env.example .env
  ```

- Run with Docker Compose

  ```bash
  make up
  ```

  Or use `docker compose up --build` directly. Other useful commands: `make down` – stop and remove containers; `make logs` – follow container logs; `make restart` – stop, rebuild, and start.
- Access the app
  - Frontend: http://localhost:3000
  - Backend health: http://localhost:5000/healthz
  - Backend readiness: http://localhost:5000/readyz (checks Postgres & Redis)
```
/
├── frontend/           # Next.js TypeScript app
├── backend/            # Flask API
├── infra/              # Docker & deployment configs
├── docs/               # Architecture notes
├── docker-compose.yml
└── .env.example
```
Backend:

```bash
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
export DATABASE_URL=postgresql://postgres:postgres@localhost:5432/papermind
export REDIS_URL=redis://localhost:6379/0
python run.py
```

Frontend:

```bash
cd frontend
npm install
npm run dev
```

Ensure PostgreSQL and Redis are running (e.g. via `docker compose up postgres redis -d`).
- `GET /healthz` – Returns `{"status": "ok"}` (liveness).
- `GET /readyz` – Returns readiness status; checks PostgreSQL and Redis, and returns 503 if either is unreachable.
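A minimal sketch of what the `/readyz` probe does, assuming `psycopg2` and `redis-py` clients (the actual handler lives in the Flask backend and may differ):

```python
import os

import psycopg2
import redis
from flask import Flask, jsonify

app = Flask(__name__)

@app.get("/readyz")
def readyz():
    checks = {}
    # Postgres: open and immediately close a connection.
    try:
        psycopg2.connect(os.environ["DATABASE_URL"]).close()
        checks["postgres"] = "ok"
    except Exception:
        checks["postgres"] = "unreachable"
    # Redis: a PING round-trip.
    try:
        redis.Redis.from_url(os.environ["REDIS_URL"]).ping()
        checks["redis"] = "ok"
    except Exception:
        checks["redis"] = "unreachable"
    status = 200 if all(v == "ok" for v in checks.values()) else 503
    return jsonify(checks), status
```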
This milestone introduces the core application layer:
- Authentication – Email/password signup & login with bcrypt-hashed passwords and JWT-based auth (`/auth/signup`, `/auth/login`, `/auth/me`).
- Workspaces – Multi-tenant workspaces with membership and roles; endpoints to create and list workspaces and fetch a workspace (`/workspaces`, `/workspaces/:id`).
- Paper upload – Authenticated PDF uploads (max 20MB) into workspaces via `/papers/upload`, saving files under the backend `uploads/` directory and recording metadata in PostgreSQL.
- Library listing – Workspace-specific library listing via `/papers?workspace_id=...`, returning all papers for a workspace. A minimal client flow is sketched below.
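For example, a client might drive these endpoints as follows (JSON field names such as `token` and `id` are assumptions based on the route names above):

```python
import requests

BASE = "http://localhost:5000"
creds = {"email": "researcher@example.com", "password": "s3cret-pass"}

# Sign up, then log in to obtain a JWT.
requests.post(f"{BASE}/auth/signup", json=creds)
token = requests.post(f"{BASE}/auth/login", json=creds).json()["token"]
headers = {"Authorization": f"Bearer {token}"}

# Create a workspace, upload a PDF into it, then list its papers.
ws = requests.post(f"{BASE}/workspaces", json={"name": "My Lab"}, headers=headers).json()
with open("paper.pdf", "rb") as f:
    requests.post(
        f"{BASE}/papers/upload",
        files={"file": f},
        data={"workspace_id": ws["id"]},
        headers=headers,
    )
print(requests.get(f"{BASE}/papers", params={"workspace_id": ws["id"]}, headers=headers).json())
```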
Milestone 2 adds an asynchronous ingestion pipeline and job tracking:
- Jobs table – Tracks background jobs (`jobs` table) including type, status (`queued`, `running`, `completed`, `failed`), progress, and errors.
- Chunks table – Stores extracted text chunks for each paper (`chunks` table) with `chunk_index`, `text`, and `token_count`.
- Celery + Redis – Uses Celery workers with Redis as the broker and result backend for asynchronous processing.
- PDF extraction – Extracts text from uploaded PDFs using `pypdf`, then splits text into chunks (~800–1200 characters) with simple paragraph-based chunking (a chunker sketch follows this list).
- Job-triggered ingestion – After a successful PDF upload, an `ingestion` job is created and a Celery task processes the paper in the background (updating paper status from `uploaded` → `processing` → `ready` or `failed`).
- APIs – New `/jobs` and `/jobs/:id` endpoints expose job metadata, and paper APIs expose processing status for frontend visibility.
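A paragraph-based chunker in this spirit might look like the following sketch (the backend's actual implementation may differ):

```python
def chunk_text(text: str, max_len: int = 1200) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly 800-1200 characters."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if len(current) + len(para) + 2 <= max_len:
            current = f"{current}\n\n{para}" if current else para
        else:
            if current:
                chunks.append(current)
            # Hard-split paragraphs that alone exceed the limit.
            while len(para) > max_len:
                chunks.append(para[:max_len])
                para = para[max_len:]
            current = para
    if current:
        chunks.append(current)
    return chunks
```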
Milestone 3 adds native embedding generation and vector search powered by Postgres pgvector and Sentence Transformers. Fast, private, zero external AI costs.
- Vector Storage – Adds `embedding vector(384)` columns with `ivfflat` indexes for both the `papers` and `chunks` tables.
- Local Embedding Pipeline – Uses local, open-source models (default: `BAAI/bge-small-en-v1.5`) via `sentence-transformers`, removing the need for paid external APIs.
- Chained Jobs – The ingestion pipeline automatically queues an `embedding` job after chunking, chaining the workflow smoothly: `upload -> ingestion -> chunking -> embedding -> semantic search`.
- Search APIs – Exposes two new semantic vector-search endpoints: `/search` (find chunks matching a text query in a workspace) and `/papers/<id>/similar` (find related papers using their cached centroid vector). A standalone search sketch follows below.
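Under the hood, a workspace-scoped chunk search can be as simple as this sketch (the join shape between `chunks` and `papers` is an assumption; the real query lives in the backend services):

```python
import os

import psycopg2
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def search_chunks(query: str, workspace_id: int, limit: int = 5):
    vec = model.encode(query).tolist()
    with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            # <=> is pgvector's cosine-distance operator.
            cur.execute(
                """
                SELECT c.id, c.text
                FROM chunks c
                JOIN papers p ON p.id = c.paper_id
                WHERE p.workspace_id = %s
                ORDER BY c.embedding <=> %s::vector
                LIMIT %s
                """,
                (workspace_id, str(vec), limit),
            )
            return cur.fetchall()
```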
PaperMind keeps embeddings local (Sentence Transformers + pgvector) and uses an external LLM only for text generation (answering questions, summarisation, future RAG features).
- Provider: Gemini (via the `google-generativeai` Python SDK).
- Usage: LLMs are wrapped by `LLMService` in `backend/app/services/llm_service.py`, which exposes a simple `generate_text(prompt)` API and a `generate_answer(question, context_chunks)` helper for RAG-style prompts.
- Model: Default model is `gemini-1.5-flash` (configurable via `GEMINI_MODEL`).
- Configuration:
  - Set `GEMINI_API_KEY` in your `.env` (get a key from Google AI Studio).
  - Optional: override `LLM_PROVIDER` (currently only `"gemini"` is supported) or `GEMINI_MODEL`.
This prepares the system for Milestone 4 RAG features, where retrieved chunks from pgvector search will be passed into Gemini for high-quality answer generation.
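As a sketch of what `LLMService`'s surface might look like, using the `google-generativeai` SDK (prompt wording and class internals here are assumptions, not the repo's actual code):

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

class LLMService:
    """Hypothetical sketch of the LLM wrapper described above."""

    def __init__(self, model_name: str | None = None):
        self.model = genai.GenerativeModel(
            model_name or os.getenv("GEMINI_MODEL", "gemini-1.5-flash")
        )

    def generate_text(self, prompt: str) -> str:
        return self.model.generate_content(prompt).text

    def generate_answer(self, question: str, context_chunks: list[str]) -> str:
        # Ground the answer in retrieved chunks (RAG-style prompt).
        context = "\n\n".join(context_chunks)
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        return self.generate_text(prompt)
```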
Milestone 4 adds a retrieval-augmented chat experience over workspace libraries:
- Conversation storage – New `conversations` and `messages` tables track chat sessions, participants, and message history (including JSONB `citations` on assistant messages).
- Retrieval service – A dedicated retrieval layer uses local embeddings + pgvector (`chunks.embedding`) to fetch the most relevant chunks for a question, scoped to a workspace (and optionally a single paper).
- RAG answers with Gemini – The `LLMService` uses Gemini to generate grounded answers from retrieved chunks via `generate_answer_with_citations`, returning both an answer and structured citation metadata.
- Chat APIs – New `/chat` endpoints:
  - `POST /chat/conversations` – create a workspace-scoped conversation.
  - `GET /chat/conversations?workspace_id=...` – list conversations in a workspace.
  - `GET /chat/conversations/<conversation_id>/messages` – fetch message history.
  - `POST /chat/ask` – ask a question within a conversation, run retrieval, generate an answer with citations, and persist both user and assistant messages (see the client example after this list).
- Workspace & paper scoping – All chat and retrieval operations enforce workspace membership, and `POST /chat/ask` can optionally restrict retrieval to a single paper via `paper_id`.
- Frontend chat UI – A simple chat interface at `/workspace/[id]/chat` shows conversations, message history, and assistant answers with inline citations (paper title, chunk index, and labels like `[1]`, `[2]`).
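For example, a client might drive the chat API like this (request and response field names such as `conversation_id`, `answer`, and `citations` are assumptions based on the endpoints above):

```python
import requests

BASE = "http://localhost:5000"
headers = {"Authorization": "Bearer <jwt>"}

# Start a workspace-scoped conversation.
conv = requests.post(
    f"{BASE}/chat/conversations", json={"workspace_id": 1}, headers=headers
).json()

# Ask a question; retrieval + Gemini run server-side.
resp = requests.post(
    f"{BASE}/chat/ask",
    json={
        "conversation_id": conv["id"],
        "question": "What datasets do these papers evaluate on?",
        # "paper_id": 42,  # optional: restrict retrieval to one paper
    },
    headers=headers,
).json()

print(resp["answer"])
print(resp["citations"])  # e.g. paper title + chunk index per [n] label
```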
The full pipeline is now:
upload → ingestion → chunking → embedding → retrieval → Gemini answer (with citations).
To improve answer quality without adding any external API costs, PaperMind performs local cross-encoder reranking on top of pgvector search:
- Two-stage retrieval:
  - Initial retrieval: pgvector semantic search over `chunks.embedding` fetches a broader candidate set (default `INITIAL_RETRIEVAL_LIMIT = 20`).
  - Local reranking: a Sentence Transformers cross-encoder (`cross-encoder/ms-marco-MiniLM-L-6-v2`) scores each (question, chunk) pair and selects the best `FINAL_CONTEXT_LIMIT` chunks (default `5`). See the sketch after this list.
- Implementation:
  - Config values in `Config`: `RERANKER_MODEL` (default `"cross-encoder/ms-marco-MiniLM-L-6-v2"`), `ENABLE_RERANKING` (default `true`), `INITIAL_RETRIEVAL_LIMIT` (default `20`), `FINAL_CONTEXT_LIMIT` (default `5`).
  - Reranking is implemented in `backend/app/services/reranking_service.py` and integrated into `retrieve_context_for_question` in `backend/app/services/retrieval_service.py`.
- Why this helps:
  - pgvector recall is high, but its ranking is purely embedding-based; the cross-encoder re-scores the full question + chunk text jointly, which tends to surface more semantically precise context for Gemini.
  - Everything runs locally (no extra API calls), preserving privacy and keeping RAG costs low.
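A minimal sketch of the reranking step, using the `sentence-transformers` cross-encoder API (the function shape is an assumption; the real code lives in `reranking_service.py`):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question: str, chunks: list[str], top_k: int = 5) -> list[str]:
    # Score every (question, chunk) pair jointly, then keep the best top_k.
    scores = reranker.predict([(question, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
```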
Milestone 5 adds higher-level intelligence over the paper library, beyond retrieval and chat:
- Paper summaries – After embedding, an `analysis` job runs a Gemini-powered summarisation step via `summarization_service.generate_paper_summary`, storing a 3–5 sentence summary in `papers.summary`.
- Topic extraction – The same analysis job uses `topic_service.extract_paper_topics` to extract 5–8 short topics/keywords per paper, stored in `papers.topics` (as `TEXT[]`).
- Paper clustering – The `clustering_service` groups papers per workspace using KMeans over stored paper embeddings, writing a `cluster_id` back to each paper (sketch below).
- Workspace insights API – `insight_service.get_workspace_insights` powers new `/insights` endpoints that return: `total_papers`, `clusters` (papers grouped by `cluster_id`), `topics` (aggregated topic counts), and `recent_papers` (latest papers with summaries, topics, clusters).
- Insights dashboard – A new UI at `/workspace/[id]/insights` surfaces these insights: total papers, top topics, cluster cards (with per-cluster papers and summaries), and a recent papers list.
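As a rough sketch of the clustering step (the real logic lives in `clustering_service`; the function shape is an assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_papers(embeddings: list[list[float]], n_clusters: int = 5) -> list[int]:
    X = np.array(embeddings)
    # Never ask for more clusters than there are papers.
    k = min(n_clusters, len(X))
    labels = KMeans(n_clusters=k, n_init="auto", random_state=42).fit_predict(X)
    return labels.tolist()  # cluster_id to write back per paper
```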
The full pipeline now looks like:
upload → ingestion → chunking → embedding → analysis (summary/topics/clusters) → retrieval → reranking → Gemini answer.
For new databases, the schema is created via `backend/db/schema.sql` on first container startup.
For existing local databases, you can either:
- Reset the Postgres volume and re-run initialization:

  ```bash
  docker compose down -v
  docker compose up --build
  ```

- Or apply the migration SQL files manually:

  ```bash
  # From the project root, with Postgres running
  docker compose exec postgres psql -U postgres -d papermind -f /docker-entrypoint-initdb.d/schema.sql
  docker compose exec postgres psql -U postgres -d papermind -f /path/to/backend/db/migrations/002_milestone2.sql
  docker compose exec postgres psql -U postgres -d papermind -f /path/to/backend/db/migrations/003_milestone3.sql
  ```
The Celery worker is defined as a separate service in `docker-compose.yml` and is started automatically with:

```bash
docker compose up --build
```

The worker shares the same code and environment as the backend API and mounts the same uploads volume so that uploaded PDFs are available during ingestion.
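For reference, the shared Celery wiring might look roughly like this sketch (the app name, task name, and module layout are assumptions, not the repo's actual code):

```python
import os

from celery import Celery

# Both the API and the worker import the same app, so tasks queued by
# the API are picked up by the worker service.
celery_app = Celery(
    "papermind",
    broker=os.environ["REDIS_URL"],
    backend=os.environ["REDIS_URL"],
)

@celery_app.task(name="ingestion.process_paper")
def process_paper(paper_id: int) -> None:
    # Reads the uploaded PDF from the shared uploads volume,
    # extracts text, chunks it, and queues the embedding job.
    ...
```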
When you run:

```bash
docker compose up --build
```

the PostgreSQL container automatically runs the schema defined in `backend/db/schema.sql`.

This uses Postgres' standard entrypoint mechanism by mounting the file into `/docker-entrypoint-initdb.d/schema.sql`. The script is only executed on first initialization of the `postgres_data` volume; subsequent `docker compose up` runs will not re-apply the schema.
If you want to reset the database state and re-run schema initialization (for example during early development), you can remove the volume:
```bash
docker compose down -v
```

Then start the stack again:

```bash
docker compose up --build
```

Milestone 6 focuses on productionising PaperMind while keeping local development simple:
- Storage abstraction (local + S3-compatible) – `storage_service` now supports both local filesystem storage (default for development) and S3-compatible object storage (AWS S3, Cloudflare R2, etc.), controlled via `STORAGE_PROVIDER` and associated `S3_*` environment variables. Workers access PDFs through the same abstraction, so ingestion continues to work in either mode (see the sketch after this list).
- Production configuration – `Config` is fully environment-driven and grouped by concern (database, Redis, embeddings, LLMs, storage, retrieval). `.env.example` documents the new storage-related variables and remains safe for local use.
- Logging & observability basics – The Flask app logs each request with method, path, status, and duration in a structured, container-friendly format and captures uncaught exceptions without exposing details to clients. Celery continues to emit standard worker logs.
- Rate limiting & security hardening – Sensitive endpoints such as `/auth/signup`, `/auth/login`, `/papers/upload`, and `/chat/ask` are protected by lightweight rate limits using `Flask-Limiter`, in addition to existing auth and validation. Upload handling still enforces PDF-only uploads and a 20MB size cap.
- Automated tests – A minimal pytest-based backend test suite has been added (for example, basic `/healthz` and auth validation checks), providing a foundation to grow coverage over time.
- GitHub Actions CI – Workflows under `.github/workflows/` now:
  - Run backend tests (`backend-ci.yml`) against a real Postgres + Redis stack.
  - Build the Next.js frontend (`frontend-ci.yml`) to catch compile-time issues.
  - Build backend and frontend Docker images (`docker-build.yml`) to ensure container definitions remain valid.
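As a rough illustration of the storage abstraction, a provider switch might look like this sketch (class names and variables such as `S3_ENDPOINT_URL` and `S3_BUCKET` are assumptions; the real interface lives in `storage_service`):

```python
import os
import shutil

import boto3

class LocalStorage:
    """Default development backend: files under the local uploads/ dir."""

    def __init__(self, root: str = "uploads"):
        self.root = root

    def save(self, src_path: str, key: str) -> None:
        dest = os.path.join(self.root, key)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(src_path, dest)

class S3Storage:
    """Works with AWS S3 or any S3-compatible endpoint (e.g. R2)."""

    def __init__(self):
        self.client = boto3.client("s3", endpoint_url=os.getenv("S3_ENDPOINT_URL"))
        self.bucket = os.environ["S3_BUCKET"]

    def save(self, src_path: str, key: str) -> None:
        self.client.upload_file(src_path, self.bucket, key)

def get_storage():
    # Both the API and the Celery worker go through this single entry point.
    provider = os.getenv("STORAGE_PROVIDER", "local")
    return S3Storage() if provider == "s3" else LocalStorage()
```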
In production you will typically run:
- the backend API (Flask app behind a WSGI server such as gunicorn) connected to managed Postgres, Redis, and optional S3-compatible storage;
- one or more Celery workers using the same codebase and environment for ingestion, embedding, and analysis tasks;
- the frontend (Next.js) either as a container or as a static export behind a CDN, configured to talk to the backend API.
Local development remains unchanged: `docker compose up --build` starts Postgres, Redis, backend API, Celery worker, and frontend, all using local filesystem storage by default.
For low-cost or free-tier hosting (e.g. Railway, Render) where a separate Celery worker is not available, the backend supports a single-process, synchronous mode:
- Embeddings: Use the Gemini API for embeddings (`EMBEDDING_PROVIDER=gemini`, default). No local ML stack (torch, sentence-transformers) is required; the image stays small.
- Processing: Set `ASYNC_PROCESSING=false` (default in production-safe config). After upload, the full pipeline (ingest → chunk → embed → analyze) runs inline in the request (see the dispatch sketch below). No Redis or Celery worker is required.
- Reranking: Local cross-encoder reranking is off by default (`ENABLE_RERANKING=false` / `ENABLE_LOCAL_RERANKING=false`). Retrieval uses pgvector results directly.
- Clustering: Set `ENABLE_CLUSTERING=false` (default) to avoid scikit-learn; the insights page still works with summaries, topics, and recent papers. `cluster_id` may be null.
Required env for this mode: `GEMINI_API_KEY`, plus your database and (if needed) storage URLs. The backend binds to `PORT` and listens on `0.0.0.0` for platform detection.
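A minimal sketch of how such a processing switch can work (the helper and module names here are hypothetical; the backend's actual wiring may differ):

```python
import os

def run_pipeline_inline(paper_id: int) -> None:
    """Placeholder for the inline ingest -> chunk -> embed -> analyze steps."""
    ...

def dispatch_processing(paper_id: int) -> None:
    # Hypothetical switch on ASYNC_PROCESSING.
    if os.getenv("ASYNC_PROCESSING", "false").lower() == "true":
        # A Redis broker and Celery worker are available: queue the job.
        from app.tasks import process_paper  # assumed task module
        process_paper.delay(paper_id)
    else:
        # Free-tier mode: run the whole pipeline inside the upload request.
        run_pipeline_inline(paper_id)
```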