🧠 Mat-Trix: Assistant for Materials Science Research

Mat-Trix is an AI-powered assistant that answers natural-language questions about the latest materials science papers published on Nature.com. It uses Retrieval-Augmented Generation (RAG) to answer directly from full-text PDFs, with citations to the source documents, and requires no model training.

🚀 Features

🔄 Daily Auto-Update: Scrapes newly published articles and adds them to the vector database.
📄 PDF Ingestion: Full papers are downloaded, embedded, and stored in a vector database.
🤖 LLM-based Q&A: Gemini and LangChain generate contextual answers from the retrieved text.
🔗 Citations Included: Each response is backed by real document sources.
🌐 Simple Frontend: Lightweight HTML/CSS/JS UI served via Flask.

🧰 Tech Stack

Component	Tool/Library
Backend	FastAPI, LangChain
Middleware	Flask
Vector Store	ChromaDB
LLM	Gemini (or compatible)
Scraping	Playwright, BeautifulSoup
PDF Downloads	aria2c
Frontend	HTML, CSS, JavaScript
Automation	Bash scripts (`start.sh`, `update.sh`)

⚙️ How It Works

1. `source.py` scrapes new article links from Nature’s Materials Science section.
2. `downloader.py` fetches the PDF files for these articles.
3. `update.sh` ingests new PDFs into the vector database.
4. Users interact via a web UI powered by Flask → FastAPI.
5. The backend returns answers with proper citations using RAG.
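
The ingestion step above is described only at a high level. As an illustration of what "ingesting" a PDF's text typically involves before embedding, here is a minimal sketch of overlapping character-based chunking. The function name `chunk_text` and the `size` and `overlap` defaults are assumptions made for this example; the project's own ingestion code may split text differently (for instance, with a LangChain text splitter).

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into windows of `size` characters.

    Each window shares `overlap` characters with the previous one, so a
    sentence cut at a boundary still appears whole in one of the chunks.
    The defaults are illustrative, not values taken from the project.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and stored alongside its source file name, which is what later makes citations possible.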

graph TD
    subgraph User Layer
        UI["User via Web Browser (index.html, script.js, style.css)"]
    end

    subgraph Frontend Server Layer
        FS["Flask Application (app.py)"]
    end

    subgraph Backend RAG Layer
        BE_API["FastAPI Application (main.py)"]
        RAG["RAG Pipeline (rag_pipeline.py)"]
        LLM["LLM - Google Gemini"]
    end

    subgraph Data Layer
        VDB["Vector Database (ChromaDB)"]
        PDFs["PDF Document Store"]
        Scraper["Web Scraper (source.py)"]
        Downloader["PDF Downloader (downloader.py)"]
        Ingestor["Ingestion Service (ingestion.py)"]
    end

    UI -- HTTP Requests (Query/Ingest) --> FS
    FS -- Proxied HTTP Requests --> BE_API
    BE_API -- Invokes --> RAG
    RAG -- Retrieves Context --> VDB
    RAG -- Sends Context + Query --> LLM
    LLM -- Returns Raw Answer --> RAG
    RAG -- Formats Answer --> BE_API
    BE_API -- Returns JSON Response --> FS
    FS -- Returns JSON Response --> UI
    UI -- Renders Response (using marked.js) --> User

    Scraper --> MetadataJSON["Article Metadata JSON"]
    MetadataJSON --> Downloader
    Downloader --> PDFs
    PDFs --> Ingestor
    Ingestor -- Processes & Embeds --> VDB
    BE_API -- Triggers Ingestion --> Ingestor
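
To make the query path in the diagram concrete (retrieve context, then send context plus query to the LLM), here is a self-contained sketch of the retrieval and prompt-building steps. It is not the project's `rag_pipeline.py`: a toy bag-of-words similarity stands in for real embeddings and ChromaDB, no LLM is called, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and the sample `CHUNKS` are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a term-count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (stand-in for a vector-store lookup)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, retrieved):
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical chunks; in the real system these would come from ingested PDFs.
CHUNKS = [
    {"source": "paper_a.pdf", "text": "Perovskite solar cells degrade under humidity."},
    {"source": "paper_b.pdf", "text": "Graphene shows high electron mobility at room temperature."},
    {"source": "paper_c.pdf", "text": "Encapsulation improves perovskite stability in humid air."},
]

if __name__ == "__main__":
    question = "Why do perovskite cells degrade?"
    print(build_prompt(question, retrieve(question, CHUNKS)))
```

Keeping the source file name attached to every chunk is the design choice that matters here: the prompt can label each passage, and the answer can point back to the document it came from.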

🛠️ Setup & Usage

1. Clone the repo

git clone https://github.com/nandan645/mat-trix.git
cd mat-trix

2. Install dependencies

pip install -r requirements.txt

3. Scrape article metadata (required once or for daily updates)

python source.py

4. Download the articles listed in `mature_articles.json` (created by `source.py`)

python downloader.py

5. Start the full application

./start.sh

This runs both the FastAPI backend and Flask frontend.

6. Ingest new papers into the vector DB

Keep the server running, and in a new terminal run:

./update.sh

📅 Automation Tips

Add a cron job to run source.py and update.sh daily for auto-updates.
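
One way to give cron a single entry point is a small wrapper script that runs the update steps in order and stops at the first failure. The sketch below is an assumption-laden example, not part of the repository: the script name `daily_update.py` is hypothetical, `downloader.py` is included on the assumption that `update.sh` does not download PDFs itself, and the server started by `start.sh` is assumed to be running already.

```python
import subprocess
import sys

# Step order follows the setup instructions. Including downloader.py is an
# assumption; drop it if update.sh already fetches the PDFs.
STEPS = [
    [sys.executable, "source.py"],
    [sys.executable, "downloader.py"],
    ["./update.sh"],
]


def run_daily_update(run=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    `run` is injectable so the sequence can be exercised without
    executing anything. Returns the commands that completed successfully.
    """
    completed = []
    for cmd in STEPS:
        result = run(cmd, check=False)
        if result.returncode != 0:
            break
        completed.append(cmd)
    return completed
```

Saved as `daily_update.py` in the repository root with a final line calling `run_daily_update()`, it could be scheduled with a crontab entry such as `0 3 * * * cd /path/to/mat-trix && python daily_update.py`, where the path and the 03:00 schedule are placeholders.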

🙌 Acknowledgments

Nature.com for article metadata
LangChain, ChromaDB, Gemini
Playwright, FastAPI, Flask, BeautifulSoup

💡 Future Enhancements

Add filtering (date/topic/access type)
Expand to multiple journals or subjects
User-uploaded PDF support
Answer export/download options

❓ Need Help?

Open an issue or contact the maintainer.
