SimSon

Pytorch Implementation of SimSon (Simple Contrastive Learning of SMILES for Molecular Property Prediction).

Data Directory

You can download datasets and model weights from the following link: https://drive.google.com/drive/folders/11vVMQrNog23Xjtb53CTAJ0loQXdmPwEv?usp=drive_link
All datasets, except for the pretrained dataset consisting of 1M PubChem entries, are available on Google Drive. You can download the 1M PubChem dataset directly from https://pubchem.ncbi.nlm.nih.gov For QM7 dataset, you should download QM7 from MoleculeNet (https://moleculenet.org/datasets-1) and convert mat file to csv format.


SimSon
├──data
│   ├──prediction
│   │   ├──bace.pth
│   │   ├──bbbp.pth
│   ├──tokenizer
│   │   ├──pubchem_part_tokenizer.json
├──models
│   ├──downstream
│   │   ├──tox21_best_model.pth
│   │   ├──lipophilicity_best_model.pth
│   ├──pretrained
│   │   ├──pretrained_best_model.pth
      .
      .
      .

Requirements

numpy==1.22.4
requests==2.32.3
scikit-learn==1.5.0
scipy==1.13.1
tokenizers==0.19.1
torch==2.3.1
torchaudio==2.3.1
torchvision==0.18.1
tqdm==4.66.4
transformers==4.42.3
x-transformers==1.31.6

Run

To finetune and inference all downstream datasets, please refer to the inference.py file or run the following code:


sh downstrem.sh

To train the models, run the following codes:


# pretrain the model
python main.py --task pretraining

# finetune the downstream models (bbbp example)
python main.py --task downstream --data bbbp --criterion bce --lr 5e-5 --num_classes 1

# inference the downstream models (bbbp example)
python main.py --task inference --data bbbp --criterion bce --lr 5e-5 --num_classes 1

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
README.md		README.md
SmilesEnumerator.py		SmilesEnumerator.py
config.py		config.py
dataloader.py		dataloader.py
downstream.sh		downstream.sh
losses.py		losses.py
main.py		main.py
model.py		model.py
tokenization.py		tokenization.py
trainer.py		trainer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SimSon

Data Directory

Requirements

Run

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SimSon

Data Directory

Requirements

Run

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages