mohammad-gh009/DrugReasoner

DrugReasoner: Interpretable Drug Approval Prediction with a Reasoning-augmented Language Model

License: Apache 2.0 Python 3.8+ arXiv HuggingFace model Hugging Face

DrugReasoner is an AI-powered system for predicting drug approval outcomes using reasoning-augmented Large Language Models (LLMs) and molecular feature analysis. By combining advanced machine learning with interpretable reasoning, DrugReasoner provides transparent predictions that can accelerate pharmaceutical research and development.

Figure 1.pdf

✨ Key Features

  • 🤖 LLM-Powered Predictions: Uses a fine-tuned Llama-3.1-8B-Instruct model for drug approval prediction
  • 🧬 Molecular Analysis: Advanced SMILES-based molecular structure analysis
  • 🔍 Interpretable Results: Clear reasoning behind predictions for better decision-making
  • 📊 Similarity Analysis: Identifies similar approved/non-approved compounds for context
  • ⚡ Flexible Inference: Support for both single molecule and batch predictions

🛠️ Installation

  • To use DrugReasoner, you must first request access to the base model Llama-3.1-8B-Instruct on Hugging Face by providing your contact information. Once access is granted, you can run DrugReasoner either through the command-line interface (CLI) or integrate it directly into your Python workflows.

Prerequisites

  • Python 3.8 or higher
  • CUDA-compatible GPU (required for local inference, as noted under How to use)
  • Git
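
The prerequisites above can be checked with a short script before you start setup. This is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and the presence of `nvidia-smi` on the PATH is only a heuristic for an installed NVIDIA driver, not proof that CUDA works.

```python
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True or False."""
    return {
        # The README asks for Python 3.8 or higher.
        "python>=3.8": sys.version_info >= (3, 8),
        # Git is needed to clone the repository.
        "git": shutil.which("git") is not None,
        # Heuristic only: nvidia-smi on PATH suggests an NVIDIA driver is installed.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If `nvidia-smi` is reported missing, local inference is unlikely to work.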

Setup Instructions

  1. Clone the repository

    git clone https://github.com/mohammad-gh009/DrugReasoner.git
    cd DrugReasoner
  2. Create and activate a virtual environment

    Windows:

    cd src
    python -m venv myenv
    myenv\Scripts\activate

    Mac/Linux:

    cd src
    python -m venv myenv
    source myenv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Log in to your Hugging Face account. If you do not have one yet, follow Hugging Face's guides on creating an account and generating an access token.

    huggingface-cli login --token YOUR_TOKEN_HERE

🚀 How to use

Note: GPU is required for inference. If unavailable, use our Kaggle Notebook.

CLI Inference

python inference.py \
    --smiles "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O" "CC1=CC=C(C=C1)C(=O)O" \
    --output results.csv \
    --top-k 9 \
    --top-p 0.9 \
    --max-length 4096 \
    --temperature 1.0
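
When the SMILES strings come from another program, the same command can be assembled in Python and run with `subprocess`. The sketch below uses only the flags shown above. The helper names `build_inference_command` and `run_inference` are hypothetical, and it assumes the current working directory contains `inference.py`.

```python
import subprocess
import sys


def build_inference_command(smiles, output="results.csv", top_k=9,
                            top_p=0.9, max_length=4096, temperature=1.0):
    """Build the argument list for inference.py from a list of SMILES strings."""
    if not smiles:
        raise ValueError("at least one SMILES string is required")
    return [
        sys.executable, "inference.py",
        "--smiles", *smiles,
        "--output", output,
        "--top-k", str(top_k),
        "--top-p", str(top_p),
        "--max-length", str(max_length),
        "--temperature", str(temperature),
    ]


def run_inference(smiles, **options):
    """Run the CLI; assumes the current directory contains inference.py."""
    # Passing a list (not a shell string) avoids quoting problems with
    # characters such as parentheses and '=' that are common in SMILES.
    subprocess.run(build_inference_command(smiles, **options), check=True)
```

For example, `run_inference(["CC1=CC=C(C=C1)C(=O)O"], output="results.csv")` would launch the same command as the CLI example for a single molecule.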

Python API Usage

from inference import DrugReasoner

predictor = DrugReasoner()

results = predictor.predict_molecules(
    smiles_list=["CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"],
    save_path="results.csv",
    print_results=True,
    top_k=9,
    top_p=0.9,
    max_length=4096,
    temperature=1.0
)
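
For longer lists, one option is to call `predict_molecules` on smaller chunks so that partial results are written to disk as you go. This is a sketch under assumptions: `predict_in_chunks`, `chunk_size`, and the per-chunk file naming are hypothetical, and the return value of `predict_molecules` is collected as-is because its type is not documented here. Only the keyword arguments shown above are used.

```python
def predict_in_chunks(predictor, smiles, chunk_size=8, save_prefix="results"):
    """Call predictor.predict_molecules on successive chunks of `smiles`.

    Returns a list with one entry per chunk, holding whatever
    predict_molecules returned for that chunk.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    outputs = []
    for start in range(0, len(smiles), chunk_size):
        chunk = smiles[start:start + chunk_size]
        part = start // chunk_size
        outputs.append(
            predictor.predict_molecules(
                smiles_list=chunk,
                # Hypothetical naming scheme: one CSV file per chunk.
                save_path=f"{save_prefix}_part{part}.csv",
                print_results=False,
                # Sampling settings copied from the example above.
                top_k=9,
                top_p=0.9,
                max_length=4096,
                temperature=1.0,
            )
        )
    return outputs
```

With the `predictor` created above, `predict_in_chunks(predictor, my_smiles, chunk_size=4)` would write `results_part0.csv`, `results_part1.csv`, and so on, where `my_smiles` is your own list of SMILES strings.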

📊 Dataset & Model

  • Dataset: Hugging Face Dataset
  • Model: Hugging Face Model

📈 Performance

In our evaluation, DrugReasoner outperformed conventional baseline models on multiple metrics. Detailed comparisons are available in our paper (arXiv:2508.18579).

📝 Citation

If you use DrugReasoner in your research, please cite our work:

@misc{ghaffarzadehesfahani2025drugreasonerinterpretabledrugapproval,
      title={DrugReasoner: Interpretable Drug Approval Prediction with a Reasoning-augmented Language Model}, 
      author={Mohammadreza Ghaffarzadeh-Esfahani and Ali Motahharynia* and Nahid Yousefian and Navid Mazrouei and Jafar Ghaisari and Yousof Gheisari},
      year={2025},
      eprint={2508.18579},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.18579}, 
}

📜 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Accelerating drug discovery through AI-powered predictions