MOL_GVAE is a machine learning project focused on developing graph-based variational autoencoders (GVAEs) for molecular data. The goal is to encode molecular structures into a latent space representation while preserving their chemical and structural properties. This enables tasks such as molecular generation, optimization, and property prediction.
The project leverages graph neural networks (GNNs) to process molecular graphs, where atoms are represented as nodes and bonds as edges. By combining GNNs with variational inference, MOL_GVAE learns probabilistic latent representations that can be used for downstream tasks like molecular design and virtual screening.
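As an illustration of the "atoms as nodes, bonds as edges" representation, here is a minimal sketch in plain Python. The `MolGraph` class and its method names are hypothetical and are not taken from the repository. The `edge_index` layout mirrors the 2 x num_edges COO convention used by PyTorch Geometric, with each undirected bond stored in both directions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MolGraph:
    """Hypothetical minimal molecular graph: atoms are nodes, bonds are undirected edges."""
    atoms: List[str]                                            # element symbol per node
    bonds: List[Tuple[int, int, int]] = field(default_factory=list)  # (i, j, bond order)

    def adjacency(self) -> List[List[int]]:
        """Dense symmetric adjacency matrix whose entries are bond orders."""
        n = len(self.atoms)
        adj = [[0] * n for _ in range(n)]
        for i, j, order in self.bonds:
            adj[i][j] = order
            adj[j][i] = order
        return adj

    def edge_index(self) -> List[List[int]]:
        """Edges in COO form: [sources, targets], each bond listed in both directions."""
        src, dst = [], []
        for i, j, _ in self.bonds:
            src += [i, j]
            dst += [j, i]
        return [src, dst]


# Heavy atoms of ethanol: C-C-O with two single bonds
ethanol = MolGraph(atoms=["C", "C", "O"], bonds=[(0, 1, 1), (1, 2, 1)])
```

In practice the node list would carry feature vectors (for example one-hot atom types) rather than bare element symbols, but the graph structure is the same.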
Key features of the project include:
- Graph Matching Algorithms: Incorporates Hungarian matching for aligning predicted and target graphs.
- High-Fidelity and Low-Fidelity Models: Supports multi-fidelity modeling for efficient training.
- Custom Loss Functions: Includes reconstruction loss, KL divergence, and other task-specific objectives.
- Extensibility: Modular design allows for easy integration of new datasets, models, and training pipelines.
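The matching and loss features listed above can be sketched together. A decoder may emit nodes in a different order than the target graph, so predicted nodes are first aligned to target nodes, and the reconstruction loss is computed on the matched pairs before the KL term is added. This is a minimal sketch under assumptions, not the code in matching.py: the function names, the squared-distance cost, and the `beta` weight are all illustrative. It uses SciPy's `linear_sum_assignment`, which solves the same assignment problem as the Hungarian method.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_nodes(pred_feats, target_feats):
    """Return (pred_idx, target_idx) minimizing total squared distance between matched nodes."""
    # cost[i, j] = squared Euclidean distance between predicted node i and target node j
    diff = pred_feats[:, None, :] - target_feats[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    return linear_sum_assignment(cost)


def matched_reconstruction_loss(pred_feats, target_feats):
    """Mean squared error over node pairs after optimal matching (assumed loss form)."""
    rows, cols = match_nodes(pred_feats, target_feats)
    return float(((pred_feats[rows] - target_feats[cols]) ** 2).sum(axis=-1).mean())


def kl_divergence(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return float(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar)))


def vae_loss(pred_feats, target_feats, mu, logvar, beta=1.0):
    """Matched reconstruction loss plus a beta-weighted KL term (beta is an assumption)."""
    recon = matched_reconstruction_loss(pred_feats, target_feats)
    return recon + beta * kl_divergence(mu, logvar)


# Target node features, and a prediction that is the same graph with its nodes permuted
target = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pred = target[[2, 0, 1]]
rows, cols = match_nodes(pred, target)
```

Because the prediction here is an exact permutation of the target, the matching recovers that permutation and the reconstruction loss vanishes. Without the matching step, the same prediction would be penalized purely for node ordering.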
- vgae.py: Contains a variational autoencoder adapted from PyTorch Geometric to include graph-level latents and a property predictor. The encoder and decoder can be user-defined.
- ae.py: Contains the encoder and decoder classes to be used with the VGAE.
- matching.py: Graph matching implementation.
- train.py: Training script for the model.
- utils.py: Contains utility functions, alongside a data_prep.py script to load the dataset.
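To make "graph-level latents and a property predictor" concrete, the sketch below shows one common way such a component can be built. It is an assumption, not a description of vgae.py: node embeddings are mean-pooled into a single vector, a Gaussian latent is sampled with the reparameterization trick, and a linear head maps the latent to a scalar property. All names, shapes, and weights are hypothetical.

```python
import numpy as np


def graph_level_latent(node_emb, w_mu, w_logvar, rng):
    """Pool node embeddings into one vector, then sample a graph-level latent."""
    pooled = node_emb.mean(axis=0)            # permutation-invariant readout (assumed: mean pooling)
    mu = pooled @ w_mu                        # mean of the approximate posterior
    logvar = pooled @ w_logvar                # log-variance of the approximate posterior
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps       # reparameterization trick
    return z, mu, logvar


def predict_property(z, w_out, b_out):
    """Linear property head on the graph-level latent (assumed form)."""
    return float(z @ w_out + b_out)


rng = np.random.default_rng(0)
num_nodes, emb_dim, latent_dim = 5, 8, 4

# Random stand-ins for encoder output and learned weights
node_emb = rng.standard_normal((num_nodes, emb_dim))
w_mu = rng.standard_normal((emb_dim, latent_dim))
w_logvar = rng.standard_normal((emb_dim, latent_dim))
w_out = rng.standard_normal(latent_dim)

z, mu, logvar = graph_level_latent(node_emb, w_mu, w_logvar, rng)
prop = predict_property(z, w_out, b_out=0.0)

# Reordering the nodes leaves the posterior parameters unchanged
_, mu_perm, _ = graph_level_latent(node_emb[::-1], w_mu, w_logvar, rng)
```

Mean pooling is used here because it makes the latent independent of node ordering. Sum or attention-based readouts are equally plausible choices.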
git clone https://github.com/barbacci-marco/GVAE.git
cd GVAE

Make sure you have conda or mamba installed.

conda env create -f environment.yml

This will create an environment named mol-gvae (you can rename it inside environment.yml).

conda activate mol-gvae

✅ Now you’re ready to train models with train.py or experiment with the autoencoder in ae.py.

python train.py --epochs 300 --batch-size 32