GENOVA: Generative Modeling Framework for Highly Bioavailable and Blood Brain Barrier Permeant Drug Design

Hasegawa K., Papadopoulos E., Xie E., Kementzidis G., Chorev M., Aktas B.H., Deng Y.
GENOVA: Generative Modeling Framework for Highly Bioavailable and Blood Brain Barrier Permeant Drug Design

Overview

GENOVA is a generative AI framework for the de novo design of highly bioavailable, blood-brain barrier (BBB) permeant drug candidates. As a proof of concept, GENOVA is applied to the discovery of novel BACE1 inhibitors as potential therapeutics for Alzheimer's disease (AD).

GENOVA integrates:

A SELFIES-based Autoencoder for robust molecular representation
QSAR Neural Networks for predicting key pharmacological properties
A Transfer Learning (TL)-based WGAN-GP for de novo molecule generation
A Genetic Algorithm (GA) for multi-property fitness optimization
SAS-QED-PAINS filtering for compound selection
AutoDock Vina molecular docking for independent validation
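
The genetic-algorithm component in the list above can be illustrated with a minimal sketch of elitism plus a weighted multi-property fitness. This is not the code in models/GA.py: the function names (`evolve`, `toy_fitness`), the real-valued vectors standing in for molecule encodings, the equal weights, and every hyperparameter are assumptions made for illustration only.

```python
import random
from typing import Callable, List

Vector = List[float]


def toy_fitness(v: Vector) -> float:
    """Hypothetical multi-property fitness: a weighted sum of two scores.

    In a real pipeline each score would come from a trained property
    predictor; here they are simple stand-in functions with known optima.
    """
    score_a = -(v[0] - 1.0) ** 2   # best when v[0] == 1
    score_b = -(v[1] + 2.0) ** 2   # best when v[1] == -2
    return 0.5 * score_a + 0.5 * score_b


def evolve(
    population: List[Vector],
    fitness: Callable[[Vector], float],
    n_generations: int = 20,
    n_elite: int = 2,
    mutation_scale: float = 0.1,
    seed: int = 0,
) -> List[Vector]:
    """Run a simple GA and return the final population, best first."""
    rng = random.Random(seed)
    pop = [list(v) for v in population]
    for _ in range(n_generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        # Elitism: the top individuals are copied unchanged, so the best
        # fitness in the population can never decrease.
        elites = [list(v) for v in ranked[:n_elite]]
        # Truncation selection: only the better half may reproduce.
        parents = ranked[: max(2, len(ranked) // 2)]
        children = []
        while len(elites) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [g + rng.gauss(0.0, mutation_scale) for g in child]
            children.append(child)
        pop = elites + children
    return sorted(pop, key=fitness, reverse=True)


if __name__ == "__main__":
    init_rng = random.Random(1)
    start = [[init_rng.uniform(-5, 5), init_rng.uniform(-5, 5)] for _ in range(20)]
    best = evolve(start, toy_fitness)[0]
    print("best vector:", best, "fitness:", toy_fitness(best))
```

A weighted sum is only one way to combine several properties into a single fitness value; the weights and the combination rule used by GENOVA are set in the repository's own configuration and are not reproduced here.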

Starting from over 2 million generated novel compounds, GENOVA identifies 190 BBB-permeable candidate BACE1 inhibitors that outperform all BACE1 inhibitors that have advanced to human clinical trials in terms of binding affinity and/or specificity.
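
A funnel like the one described above typically ends with a rule-based filter. The sketch below shows one way a SAS-QED-PAINS filter could be expressed. The `Candidate` record, the cut-off values `MAX_SAS` and `MIN_QED`, and the example entries are hypothetical and are not taken from the GENOVA code or data. It relies only on the general conventions that SAS runs from 1 (easy to synthesize) to 10 (hard), that QED lies in [0, 1] with higher values being more drug-like, and that any PAINS substructure match is treated as disqualifying.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    name: str           # placeholder identifier, not a real compound
    sas: float          # synthetic accessibility score, 1 (easy) to 10 (hard)
    qed: float          # quantitative estimate of drug-likeness, 0 to 1
    pains_alerts: int   # number of PAINS substructure matches


# Hypothetical thresholds chosen for illustration only.
MAX_SAS = 4.0
MIN_QED = 0.5


def passes_filter(c: Candidate, max_sas: float = MAX_SAS, min_qed: float = MIN_QED) -> bool:
    """Keep a candidate only if it clears all three criteria."""
    return c.sas <= max_sas and c.qed >= min_qed and c.pains_alerts == 0


def select(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Apply the filter and preserve the input order."""
    return [c for c in candidates if passes_filter(c)]


if __name__ == "__main__":
    # Made-up values that exercise each rejection path.
    pool = [
        Candidate("mol_A", sas=2.5, qed=0.7, pains_alerts=0),  # passes
        Candidate("mol_B", sas=6.0, qed=0.8, pains_alerts=0),  # too hard to make
        Candidate("mol_C", sas=2.0, qed=0.3, pains_alerts=0),  # low drug-likeness
        Candidate("mol_D", sas=2.0, qed=0.9, pains_alerts=1),  # PAINS alert
    ]
    print([c.name for c in select(pool)])
```

In practice the three values would be computed with a cheminformatics toolkit such as RDKit rather than supplied by hand.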

Framework Architecture

Repository Structure

GENOVA-pytorch/
│
├── Dataset/                        # All datasets used in this study
│   ├── 500k_small_molecule         # Subset of ChEMBL 33 bioactive small molecules for autoencoder training (500k datapoints; not included in this folder, available on request)
│   ├── 100k_small_molecule         # Subset of ChEMBL 33 bioactive small molecules for WGAN-GP pre-training (100k datapoints)
│   ├── BACE1                       # BACE1 inhibitors (7,096 datapoints after preprocessing) 
│   ├── pIC50                       # pIC50 values of BACE1 inhibitors (7,096 datapoints after preprocessing) 
│   ├── logBB                       # FDA-approved CNS drugs with logBB values (1,021 datapoints)
│   ├── Bioavailability             # Compound bioavailability values from multiple sources (2,405 datapoints)
│   ├── specificity                 # BACE1 vs. BACE2 specificity scores calculated with AutoDock Vina (10,619 datapoints)
│   ├── SAS                         # Synthetic accessibility scores (100k datapoints)
│   └── BACE1_clinical_trial        # 11 BACE1 clinical trial candidate drugs
│
├── RunFiles/                       # Run scripts for each task in the pipeline
│   ├── run_AE_selfies.py           # Train the SELFIES autoencoder
│   ├── run_AE_smiles.py            # Train the SMILES autoencoder
│   ├── run_QSAR.py                 # Train QSAR neural networks
│   ├── run_WGANGP.py               # Pre-train WGAN-GP on 100k small molecules / BACE1 dataset
│   ├── run_WGANGP_TL.py            # Fine-tune WGAN-GP on BACE1 dataset (Transfer Learning)
│   └── run_GA.py                   # Run Genetic Algorithm optimization
│
├── models/                         # Model architecture definitions
│   ├── AE.py                       # Encoder-Decoder with bidirectional LSTM layers
│   ├── QSAR.py                     # Feed-forward QSAR neural networks (pIC50, logBB, bioavailability, SAS, specificity)
│   ├── WGANGP.py                   # Wasserstein GAN with Gradient Penalty (generator + critic)
│   ├── WGANGP_TL.py                # Wasserstein GAN with Gradient Penalty (generator + critic) + transfer learning
│   └── GA.py                       # Genetic Algorithm with elitism selection
│
├── Utility.py                      # Shared utility functions 
├── config.py                       # Configuration file (hyperparameters, file paths, thresholds, etc.)
├── requirements.txt                # Python dependencies
└── README.md
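
The pIC50 and logBB datasets listed above refer to standard log-scale quantities: pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, and logBB is the base-10 logarithm of the brain-to-blood concentration ratio. The helpers below are a minimal sketch of those definitions. The function names are hypothetical, and it is an assumption that raw IC50 values would arrive in nanomolar units; the repository's own preprocessing may differ.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)


def log_bb(c_brain: float, c_blood: float) -> float:
    """Return logBB = log10(C_brain / C_blood) for concentrations in the same units."""
    if c_brain <= 0 or c_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_brain / c_blood)


if __name__ == "__main__":
    print(pic50_from_ic50_nm(1.0))      # 1 nM
    print(pic50_from_ic50_nm(1000.0))   # 1 uM
    print(log_bb(2.0, 2.0))             # equal concentrations
```

On this scale a higher pIC50 means a more potent inhibitor, and a positive logBB means the compound is more concentrated in brain than in blood.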

Installation

Prerequisites

Python 3.8+
CUDA-compatible GPU (trained and tested on NVIDIA V100)
AutoDock Vina (for molecular docking)
RDKit (for cheminformatics utilities)
Open Babel (for molecular format conversion)
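
Before running the pipeline, the prerequisites above can be checked with a short script. This is a sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes the command-line tools are on `PATH` under their usual executable names (`vina` for AutoDock Vina, `obabel` for Open Babel). It does not check for a GPU.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites are visible from the current environment."""
    return {
        # Interpreter version required by the README.
        "python>=3.8": sys.version_info >= (3, 8),
        # RDKit is importable as the top-level package `rdkit`.
        "rdkit": importlib.util.find_spec("rdkit") is not None,
        # Executable names assumed; adjust if your install uses others.
        "vina": shutil.which("vina") is not None,
        "obabel": shutil.which("obabel") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```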

License

This project is licensed under the MIT License. See LICENSE.txt for details.
