This repository contains the code for the paper "MMAF: Multimodal Attention Fusion for Molecular Toxicity Prediction" accepted at ICPR'26.
The code is organized as a standard research-code release: a single code/ directory with small entry-point scripts, plus configuration files, a setup check, and separate output folders.
```
code/
  train.py              # main entry point for all datasets
  train_tox21.py        # Tox21 entry point
  train_bace.py         # BACE entry point
  train_bbbp.py         # BBBP entry point
  train_hiv.py          # HIV entry point
  train_clintox.py      # ClinTox entry point
  train_sider.py        # SIDER entry point
  run_all.py            # run all non-Tox21 datasets
  check_setup.py        # verify dataset files
  dataset.py            # dataset metadata
  graph.py              # graph/fingerprint metadata
  model.py              # model-family metadata
  metrics.py            # reported metric names
  utility_functions.py  # command helpers
  pipelines/            # experiment implementations
config/
  tox21.yaml
  molnet_datasets.yaml
data/
README.md
results/
output/
script/
  train.sh
  run_all.sh
  check_data.sh
```
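train.py is described as the main entry point for all datasets, and the commands in this README pass `--dataset`, `--split`, `--config`, and `--quick`. The following is a hypothetical sketch of a parser for those flags, not the repository's code; the default split value and the choice lists are assumptions based on the dataset names used here.

```python
import argparse

# Dataset names as used for the lowercase CSV files in data/.
DATASETS = ["tox21", "bace", "bbbp", "hiv", "clintox", "sider"]
SPLITS = ["random", "scaffold"]


def build_parser():
    """Hypothetical CLI for a unified entry point such as code/train.py."""
    parser = argparse.ArgumentParser(description="Train a model on one dataset.")
    parser.add_argument("--dataset", choices=DATASETS, required=True)
    # Assumed default; the README always passes --split explicitly.
    parser.add_argument("--split", choices=SPLITS, default="random")
    # Optional YAML config path, e.g. config/tox21.yaml.
    parser.add_argument("--config", default=None)
    # Flag for a short smoke-test run.
    parser.add_argument("--quick", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["--dataset", "bbbp", "--split", "scaffold", "--quick"])` yields a namespace with `dataset="bbbp"`, `split="scaffold"`, and `quick=True`. Restricting `--dataset` and `--split` with `choices` makes a typo fail at parse time instead of partway through a run.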
On CUDA machines, first install PyTorch and PyTorch Geometric builds for the target CUDA version if needed, then install the remaining dependencies:

```bash
pip install -r requirements.txt
```
Create a data/ directory and put the dataset CSV files in it, using lowercase file names:

```
data/tox21.csv
data/bace.csv
data/bbbp.csv
data/hiv.csv
data/clintox.csv
data/sider.csv
```
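check_setup.py verifies the dataset files. A minimal sketch of such a check, assuming it only needs to confirm that the six lowercase CSV names listed above exist; the function name `missing_datasets` is hypothetical, and the real script may check more.

```python
from pathlib import Path
from typing import List

# File stems expected in data/ (lowercase, per this README).
EXPECTED = ["tox21", "bace", "bbbp", "hiv", "clintox", "sider"]


def missing_datasets(data_dir: str = "data") -> List[str]:
    """Return the expected CSV file names that are absent from data_dir."""
    root = Path(data_dir)
    # Compare exact names so that e.g. "Tox21.csv" is still reported as missing,
    # even on case-insensitive file systems.
    present = {p.name for p in root.iterdir() if p.is_file()} if root.is_dir() else set()
    return [f"{name}.csv" for name in EXPECTED if f"{name}.csv" not in present]
```

An empty list means all six files are in place; otherwise the list names the files still to add.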
Check the setup:

```bash
python code/check_setup.py
```

Tox21:

```bash
python code/train_tox21.py --config config/tox21.yaml
```

Other datasets:
```bash
python code/train_bbbp.py --split random
python code/train_bbbp.py --split scaffold
python code/train_bace.py --split random
python code/train_bace.py --split scaffold
python code/train_hiv.py --split scaffold
python code/train_hiv.py --split random
python code/pipelines/multi_random.py --dataset clintox
python code/pipelines/multi_scaffold.py --dataset clintox
python code/pipelines/multi_random.py --dataset sider
python code/pipelines/multi_scaffold.py --dataset sider
```
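The commands above choose between a random split and a scaffold split. A minimal sketch of the two strategies follows, assuming 80/10/10 fractions and the greedy scaffold assignment popularized by MoleculeNet/DeepChem (largest scaffold groups are placed first); the repository's actual fractions, seeds, and ordering may differ. The scaffold function is passed in as a parameter; with RDKit one could use `rdkit.Chem.Scaffolds.MurckoScaffold.MurckoScaffoldSmiles(smiles=s, includeChirality=False)`.

```python
import random
from collections import defaultdict


def random_split(n, frac_train=0.8, frac_valid=0.1, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into train/valid/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(frac_train * n)
    n_valid = int(frac_valid * n)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def scaffold_split(smiles, scaffold_fn, frac_train=0.8, frac_valid=0.1):
    """Group molecules by scaffold; each group goes entirely to one subset."""
    groups = defaultdict(list)
    for i, s in enumerate(smiles):
        groups[scaffold_fn(s)].append(i)
    # Largest groups first, ties broken by first index, so the split is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n = len(smiles)
    train_cut = frac_train * n
    valid_cut = (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cut:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cut:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because molecules sharing a scaffold never straddle subsets, a scaffold split tests generalization to unseen core structures and is usually harder than a random split on the same data.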
Single unified command:

```bash
python code/train.py --dataset bbbp --split random
python code/train_hiv.py --split scaffold
```

Run all non-Tox21 experiments:

```bash
python code/run_all.py
```

Quick smoke test:

```bash
python code/train.py --dataset bbbp --split random --quick
```

Results are written to:
```
results/
output/checkpoints/
output/splits/
output/invalid_smiles/
```
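The output/invalid_smiles/ folder suggests that SMILES strings which fail to parse are set aside and recorded. A minimal sketch of that step, assuming a parser that returns None on failure (RDKit's `Chem.MolFromSmiles` behaves this way); the function names and the two-column CSV layout are hypothetical.

```python
import csv
from pathlib import Path


def split_valid_smiles(smiles, parse):
    """Return (valid_indices, invalid_rows); invalid_rows holds (index, smiles) pairs."""
    valid, invalid = [], []
    for i, s in enumerate(smiles):
        try:
            mol = parse(s)
        except Exception:  # e.g. a missing (NaN) value instead of a string
            mol = None
        if mol is None:
            invalid.append((i, s))
        else:
            valid.append(i)
    return valid, invalid


def write_invalid(invalid, path):
    """Write the rejected rows to a CSV file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["row", "smiles"])
        writer.writerows(invalid)
```

Keeping the rejected rows in a file, rather than dropping them silently, makes it possible to check that two environments discarded the same molecules.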
When comparing with the paper table, use the same dataset CSV files, the same split type, and the same environment and GPU type. Differences in PyTorch, CUDA, or RDKit versions can change results slightly.
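Because results depend on the environment, it can help to record it alongside each run. A minimal sketch, assuming the package list below (a hypothetical choice); packages installed without Python packaging metadata are reported as None, and CUDA and GPU details are only filled in when PyTorch is importable.

```python
import platform
from importlib import metadata

# Distributions whose versions can shift results (list is an assumption).
PACKAGES = ["torch", "torch_geometric", "rdkit", "numpy"]


def environment_report(packages=PACKAGES):
    """Collect package, Python, CUDA, and GPU information; None marks anything unavailable."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    try:
        import torch
    except ImportError:
        report["cuda"] = None
        report["gpu"] = None
    else:
        report["cuda"] = torch.version.cuda  # None for CPU-only builds
        report["gpu"] = torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    return report
```

Saving the returned dictionary next to the results, for instance with `json.dump`, makes it easier to tell whether a gap to the paper table comes from the environment.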