Mat-Trix is an AI-powered assistant that answers natural-language questions about the latest materials science papers published on Nature.com. It uses Retrieval-Augmented Generation (RAG) to provide accurate, citation-backed answers directly from full-text PDFs, with no model training required.
- 🔄 Daily Auto-Update: Automatically scrapes and updates the latest research.
- 📄 PDF Ingestion: Full papers are downloaded, embedded, and indexed in a vector database.
- 🤖 LLM-based Q&A: Gemini + LangChain generate accurate, contextual answers.
- 🔗 Citations Included: Each response is backed by real document sources.
- 🌐 Simple Frontend: Lightweight HTML/CSS/JS UI served via Flask.
| Component | Tool/Library |
|---|---|
| Backend | FastAPI, LangChain |
| Middleware | Flask |
| Vector Store | ChromaDB |
| LLM | Gemini (or compatible) |
| Scraping | Playwright, BeautifulSoup |
| PDF Downloads | aria2c |
| Frontend | HTML, CSS, JavaScript |
| Automation | Bash scripts (start.sh, update.sh) |
- `source.py` scrapes new article links from Nature’s Materials Science section.
- `downloader.py` fetches the PDF files for these articles.
- `update.sh` ingests new PDFs into the vector database.
- Users interact via a web UI powered by Flask → FastAPI.
- The backend returns answers with proper citations using RAG.
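The retrieve-then-answer step can be illustrated with a minimal, dependency-free sketch. It is not the project's actual pipeline: bag-of-words cosine similarity stands in for the embedding search that ChromaDB would perform, the call to Gemini is omitted, and the names `retrieve`, `build_prompt`, and `CHUNKS` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return up to k chunks that share words with the query, best match first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c["text"]))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(query, retrieved):
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical chunks; in the real system these would come from ingested PDFs.
CHUNKS = [
    {"source": "paper_a.pdf", "text": "Perovskite solar cells reached higher efficiency with new additives."},
    {"source": "paper_b.pdf", "text": "Graphene membranes filter salt from water."},
    {"source": "paper_c.pdf", "text": "Solid-state batteries use ceramic electrolytes."},
]

question = "How efficient are perovskite solar cells?"
top = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, top)  # this string would be sent to the LLM
```

Carrying the `source` field alongside each chunk is what lets the final answer point back to a specific document.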
graph TD
subgraph User Layer
UI["User via Web Browser (index.html, script.js, style.css)"]
end
subgraph Frontend Server Layer
FS["Flask Application (app.py)"]
end
subgraph Backend RAG Layer
BE_API["FastAPI Application (main.py)"]
RAG["RAG Pipeline (rag_pipeline.py)"]
LLM["LLM - Google Gemini"]
end
subgraph Data Layer
VDB["Vector Database (ChromaDB)"]
PDFs["PDF Document Store"]
Scraper["Web Scraper (source.py)"]
Downloader["PDF Downloader (downloader.py)"]
Ingestor["Ingestion Service (ingestion.py)"]
end
UI -- HTTP Requests (Query/Ingest) --> FS
FS -- Proxied HTTP Requests --> BE_API
BE_API -- Invokes --> RAG
RAG -- Retrieves Context --> VDB
RAG -- Sends Context + Query --> LLM
LLM -- Returns Raw Answer --> RAG
RAG -- Formats Answer --> BE_API
BE_API -- Returns JSON Response --> FS
FS -- Returns JSON Response --> UI
UI -- Renders Response (using marked.js) --> User
Scraper --> MetadataJSON["Article Metadata JSON"]
MetadataJSON --> Downloader
Downloader --> PDFs
PDFs --> Ingestor
Ingestor -- Processes & Embeds --> VDB
BE_API -- Triggers Ingestion --> Ingestor
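The "Processes & Embeds" edge in the diagram usually involves splitting each paper's text into overlapping chunks before embedding. The sketch below shows one common way to do that, assuming the PDF text has already been extracted to a string. The chunk sizes, the `chunk_text` and `make_records` helpers, and the record layout are illustrative assumptions, not taken from `ingestion.py`.

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into windows of `size` characters that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def make_records(source, text, size=200, overlap=50):
    """Attach a stable id and the source file name to every chunk (hypothetical layout)."""
    return [
        {"id": f"{source}#{i}", "source": source, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, size, overlap))
    ]


# Tiny example with a short string so the windows are easy to inspect.
records = make_records("paper_a.pdf", "abcdefghij", size=4, overlap=2)
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of storing some text twice.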
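- Clone the repository: `git clone https://github.com/nandan645/mat-trix.git`, then `cd mat-trix`.
- Install the dependencies: `pip install -r requirements.txt`
- Scrape the latest article links: `python source.py`
- Download the PDFs: `python downloader.py`
- Start the servers: `./start.sh`. This runs both the FastAPI backend and Flask frontend.
- Keep the server running, and in a new terminal run `./update.sh`.
- Add a cron job to run `source.py` and `update.sh` daily for auto-updates.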
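One way to wire up the daily job is a small Python wrapper that cron can call. This is a sketch under assumptions: the helper names are hypothetical, the scripts are assumed to sit in the repository root, and the servers started by `start.sh` are assumed to be running already, since `update.sh` is run against a live server.

```python
import subprocess
import sys
from pathlib import Path


def build_update_commands(repo_dir):
    """Return the commands for one update cycle, in the order they should run."""
    repo = Path(repo_dir)
    return [
        [sys.executable, str(repo / "source.py")],  # scrape new article links
        # Assumption: if update.sh does not download PDFs itself,
        # a downloader.py step would need to be added here.
        [str(repo / "update.sh")],                  # ingest new PDFs
    ]


def run_daily_update(repo_dir, runner=subprocess.run):
    """Run each step from inside the repository and stop at the first failure."""
    for cmd in build_update_commands(repo_dir):
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit code.
        runner(cmd, cwd=str(repo_dir), check=True)
```

A short entry-point script that calls `run_daily_update("/path/to/mat-trix")` could then be scheduled with a crontab line such as `0 3 * * * python3 /path/to/daily_update.py`, where both paths and the 03:00 schedule are placeholders.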
- Nature.com for article metadata
- LangChain, ChromaDB, Gemini
- Playwright, FastAPI, Flask, BeautifulSoup
- Add filtering (date/topic/access type)
- Expand to multiple journals or subjects
- User-uploaded PDF support
- Answer export/download options
Open an issue or contact the maintainer.