kari15/GENOVA-pytorch


GENOVA: Generative Modeling Framework for Highly Bioavailable and Blood Brain Barrier Permeant Drug Design

Python PyTorch License: MIT

Hasegawa K., Papadopoulos E., Xie E., Kementzidis G., Chorev M., Aktas B.H., Deng Y.
GENOVA: Generative Modeling Framework for Highly Bioavailable and Blood Brain Barrier Permeant Drug Design


Overview

GENOVA is a generative AI framework for the de novo design of highly bioavailable, blood-brain barrier (BBB) permeant drug candidates. As a proof-of-concept, GENOVA is applied to the discovery of novel BACE1 inhibitors as potential therapeutics for Alzheimer's Disease (AD).

GENOVA integrates:

  • A SELFIES-based Autoencoder for robust molecular representation
  • QSAR Neural Networks for predicting key pharmacological properties
  • A Transfer Learning (TL)-based WGAN-GP for de novo molecule generation
  • A Genetic Algorithm (GA) for multi-property fitness optimization
  • SAS-QED-PAINS filtering for compound selection
  • AutoDock Vina molecular docking for independent validation

Starting from over 2 million generated novel compounds, GENOVA identifies 190 BBB-permeable candidate BACE1 inhibitors that outperform all BACE1 inhibitors that have advanced to human clinical trials in terms of binding affinity and/or specificity.
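
The GA step listed above can be illustrated with a toy example. This is a minimal sketch and not the code in `models/GA.py`: the individuals are plain lists of floats, the two property functions stand in for QSAR predictions, and the weights, operators, and all function names are hypothetical.

```python
import random

# Hypothetical stand-ins for QSAR predictions; real models would score molecules.
def toy_properties(ind):
    return {
        "potency": -sum((x - 1.0) ** 2 for x in ind),
        "permeability": -sum(abs(x) for x in ind),
    }

WEIGHTS = {"potency": 0.7, "permeability": 0.3}  # assumed weights, for illustration only

def fitness(ind):
    """Collapse several property scores into one scalar via a weighted sum."""
    props = toy_properties(ind)
    return sum(WEIGHTS[name] * props[name] for name in WEIGHTS)

def crossover(a, b, rng):
    """Uniform crossover: each position is taken from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(ind, rng, sigma=0.1):
    """Add small Gaussian noise to every position."""
    return [x + rng.gauss(0.0, sigma) for x in ind]

def evolve(population, n_elite=2, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [list(ind) for ind in population]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[:n_elite]                      # elitism: the best survive unchanged
        parents = ranked[: max(2, len(ranked) // 2)]  # top half may reproduce
        children = []
        while len(elite) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = elite + children
    return sorted(pop, key=fitness, reverse=True)

init_rng = random.Random(42)
initial = [[init_rng.uniform(-2.0, 2.0) for _ in range(4)] for _ in range(20)]
result = evolve(initial)
initial_best = max(fitness(ind) for ind in initial)
final_best = fitness(result[0])
print(f"best fitness: {initial_best:.3f} -> {final_best:.3f}")
```

Because the elite individuals are copied into the next generation unchanged, the best fitness in the population can never decrease from one generation to the next.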


Framework Architecture

[Figure: GENOVA framework flowchart]

Repository Structure

GENOVA-pytorch/
│
├── Dataset/                        # All datasets used in this study
│   ├── 500k_small_molecule         # Subset of ChEMBL 33 bioactive small molecules for autoencoder training (500k datapoints; not included in this folder, available upon request)
│   ├── 100k_small_molecule         # Subset of ChEMBL 33 bioactive small molecules for WGAN-GP pre-training (100k datapoints)
│   ├── BACE1                       # BACE1 inhibitors (7,096 datapoints after preprocessing) 
│   ├── pIC50                       # pIC50 values of BACE1 inhibitors (7,096 datapoints after preprocessing) 
│   ├── logBB                       # FDA-approved CNS drugs with logBB values (1,021 datapoints)
│   ├── Bioavailability             # Compound bioavailability values from multiple sources (2,405 datapoints)
│   ├── specificity                 # BACE1 vs. BACE2 specificity scores calculated with AutoDock Vina (10,619 datapoints)
│   ├── SAS                         # Synthetic accessibility scores (100k datapoints)
│   └── BACE1_clinical_trial        # 11 BACE1 clinical trial candidate drugs
│
├── Runfiles/                       # Run scripts for each task in the pipeline
│   ├── run_AE_selfies.py           # Train the SELFIES autoencoder
│   ├── run_AE_smiles.py            # Train the SMILES autoencoder
│   ├── run_QSAR.py                 # Train QSAR neural networks
│   ├── run_WGANGP.py               # Pre-train WGAN-GP on 100k small molecules / BACE1 dataset
│   ├── run_WGANGP_TL.py            # Fine-tune WGAN-GP on BACE1 dataset (Transfer Learning)
│   └── run_GA.py                   # Run Genetic Algorithm optimization
│
├── models/                         # Model architecture definitions
│   ├── AE.py                       # Encoder-Decoder with bidirectional LSTM layers
│   ├── QSAR.py                     # Feed-forward QSAR neural networks (pIC50, logBB, bioavailability, SAS, specificity)
│   ├── WGANGP.py                   # Wasserstein GAN with Gradient Penalty (generator + critic)
│   ├── WGANGP_TL.py                # Wasserstein GAN with Gradient Penalty (generator + critic) + transfer learning
│   └── GA.py                       # Genetic Algorithm with elitism selection
│
├── Utility.py                      # Shared utility functions 
├── config.py                       # Configuration file (hyperparameters, file paths, thresholds, etc.)
├── requirements.txt                # Python dependencies
└── README.md
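
For readers unfamiliar with the `pIC50` dataset, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L. The helper below is a minimal sketch of that conversion. It assumes IC50 values are given in nanomolar (a common unit in ChEMBL exports, but not confirmed for this repository), and the function names are hypothetical.

```python
import math

def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)

def pic50_to_ic50_nm(pic50: float) -> float:
    """Inverse conversion: pIC50 back to IC50 in nanomolar."""
    return 10.0 ** (9.0 - pic50)

for ic50 in (1.0, 50.0, 1000.0):
    print(f"IC50 = {ic50:g} nM, pIC50 = {ic50_nm_to_pic50(ic50):.2f}")
```

Higher pIC50 means a more potent inhibitor: every tenfold drop in IC50 raises pIC50 by one unit.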

Installation

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (trained and tested on NVIDIA V100)
  • AutoDock Vina (for molecular docking)
  • RDKit (for cheminformatics utilities)
  • Open Babel (for molecular format conversion)
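
The prerequisites above can be checked before running the pipeline. The script below is a minimal sketch that uses only the standard library. It assumes the usual executable names `vina` and `obabel` and the import names `rdkit` and `torch`; adjust these if your installation differs. The function name `check_prerequisites` is hypothetical and not part of this repository.

```python
import importlib.util
import shutil
import sys

def check_prerequisites(min_python=(3, 8),
                        executables=("vina", "obabel"),
                        modules=("rdkit", "torch")):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    report = {}
    label = f"python>={min_python[0]}.{min_python[1]}"
    report[label] = sys.version_info >= min_python
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH
        report[f"exe:{exe}"] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec locates a top-level module without importing it
        report[f"module:{mod}"] = importlib.util.find_spec(mod) is not None
    return report

if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

This sketch does not test GPU availability; once PyTorch is installed, `torch.cuda.is_available()` reports whether a CUDA device is visible.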

License

This project is licensed under the MIT License. See LICENSE for details.

About

GENOVA is an AI-driven de novo drug design framework based on WGAN-GP and genetic algorithm optimization that generates pharmacologically optimized small molecules. Demonstrated on BACE1 inhibitors for Alzheimer’s disease, the framework helps identify candidates with improved potency, bioavailability, BBB permeability, and other important properties.
