LigandExplorer is a workflow that combines cheminformatics tools and machine learning methods to automatically extract and classify ligands from PDB structures. It applies graph theory to identify covalent and non-covalent ligands based on molecular connectivity, and uses machine learning models to filter out irrelevant molecules so that only biologically relevant ligands are retained.
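The connectivity idea can be illustrated with a small, dependency-free sketch. It assumes a simplified input of atoms (residue label, element, coordinates), infers bonds from a single distance cutoff (a hypothetical 1.9 Å; real tools usually use element-specific covalent radii), and groups atoms into connected components with breadth-first search. A ligand whose component also contains protein atoms is flagged as covalent. The names `Atom`, `infer_bonds`, `connected_components`, and `classify_ligand` are hypothetical and are not LigandExplorer's API.

```python
import math
from collections import deque
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Atom:
    residue: str  # hypothetical labels: "PROT" for polymer atoms, "LIG" for a candidate ligand
    element: str
    xyz: tuple    # (x, y, z) in angstrom


BOND_CUTOFF = 1.9  # hypothetical single cutoff (angstrom); real tools use per-element radii


def infer_bonds(atoms, cutoff=BOND_CUTOFF):
    """Return an adjacency list with an edge between every pair of atoms closer than `cutoff`."""
    adj = {i: set() for i in range(len(atoms))}
    for i, j in combinations(range(len(atoms)), 2):
        if math.dist(atoms[i].xyz, atoms[j].xyz) < cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Group atom indices into connected components using breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components


def classify_ligand(atoms, ligand_label="LIG", polymer_label="PROT"):
    """Return 'covalent' if the ligand shares a component with polymer atoms, else 'non-covalent'."""
    for comp in connected_components(infer_bonds(atoms)):
        labels = {atoms[i].residue for i in comp}
        if ligand_label in labels and polymer_label in labels:
            return "covalent"
    return "non-covalent"
```

For example, with protein atoms at (0, 0, 0) and (1.5, 0, 0), a ligand carbon at (3.2, 0, 0) is 1.7 Å from the second protein atom and is reported as covalent, while one at (5.0, 0, 0) is 3.5 Å away and is reported as non-covalent.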
V2.0 introduces a new GNN-based classification backend, replacing the previous LightGBM models as the default. The GNN models learn molecular features end-to-end directly from raw atomic properties (atomic number, 3D coordinates, residue boundaries), eliminating the need for hand-crafted features.
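To make "raw atomic properties" concrete, here is a minimal sketch of how such inputs are commonly turned into a graph for a distance-based GNN such as SchNet: node features are atomic numbers, directed edges connect atoms within a cutoff radius, and each edge carries its interatomic distance. The 5.0 Å cutoff and the `same_residue` edge flag (one possible way to encode residue boundaries) are assumptions, not details taken from LigandExplorer; `MolGraph` and `build_radius_graph` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    z: list            # atomic numbers (node features)
    pos: list          # 3D coordinates in angstrom
    residue_ids: list  # residue index per atom, used for the boundary flag
    edges: list = field(default_factory=list)  # (i, j, distance, same_residue)


def build_radius_graph(z, pos, residue_ids, cutoff=5.0):
    """Add a directed edge i -> j for every pair of distinct atoms closer than `cutoff`.

    The 5.0 angstrom default and the same_residue flag are illustrative assumptions.
    """
    graph = MolGraph(z=list(z), pos=list(pos), residue_ids=list(residue_ids))
    n = len(z)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(pos[i], pos[j])
            if d < cutoff:
                graph.edges.append((i, j, d, residue_ids[i] == residue_ids[j]))
    return graph
```

The double loop keeps the sketch readable; a real pipeline would typically delegate this step to PyTorch Geometric utilities operating on tensors.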
Two GNN models are included:
- Molecule Classifier: 8-class classification (peptide, glycan, RNA, DNA, lipid, ion, organic, and the new cyclic peptide class). Built on a SchNet backbone. Test accuracy: 99.66%.
- Ligand Relevance Classifier: binary classification that determines whether a molecule is a biologically relevant ligand. Uses pocket-aware graph construction and dual-channel (ligand + protein) pooling. Test accuracy: 97.61%, AUC: 99.41%.
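The pocket-aware, dual-channel idea can be sketched without any deep-learning library. The example below keeps only protein atoms within an assumed 8.0 Å of any ligand atom, then mean-pools per-node feature vectors separately over the ligand nodes and the pocket protein nodes and concatenates the two results, which is one common reading of "dual-channel pooling". In the real classifier the node features would come from GNN message passing; here they are supplied directly, and `select_pocket`, `mean_pool`, and `dual_channel_readout` are hypothetical names.

```python
import math


def select_pocket(ligand_pos, protein_pos, radius=8.0):
    """Return indices of protein atoms within `radius` angstrom (assumed value) of any ligand atom."""
    return [
        k for k, p in enumerate(protein_pos)
        if any(math.dist(p, l) <= radius for l in ligand_pos)
    ]


def mean_pool(vectors, dim):
    """Element-wise mean of equal-length feature vectors; an empty channel pools to zeros."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def dual_channel_readout(ligand_feats, protein_feats, pocket_idx):
    """Concatenate the ligand-channel mean with the pocket-protein-channel mean.

    Assumes the ligand has at least one atom, so its feature dimension can be read.
    """
    dim = len(ligand_feats[0])
    ligand_vec = mean_pool(ligand_feats, dim)
    protein_vec = mean_pool([protein_feats[k] for k in pocket_idx], dim)
    return ligand_vec + protein_vec  # list concatenation -> vector of length 2 * dim
```

Keeping the two channels separate until the final concatenation lets a downstream classifier weigh the ligand's own features and its protein environment independently, instead of letting a large pocket dominate a single pooled average.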
V2.0 adds cyclic peptide as a new molecular category. Cyclic peptides are now correctly identified and classified rather than being grouped with linear peptides.
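The structural difference between the two peptide classes can be expressed as a graph property: in a residue-level graph whose edges are inter-residue bonds, a linear peptide forms a path, while a cyclic peptide contains a ring. The sketch below detects such a ring with union-find. It only illustrates the distinction under that assumption; in V2.0 the class is predicted by the GNN Molecule Classifier, not by this rule, and `has_cycle` is a hypothetical helper.

```python
def has_cycle(n_residues, bonds):
    """Return True if the undirected residue graph contains a cycle (union-find).

    `bonds` is a list of (residue_a, residue_b) index pairs, assumed free of duplicates.
    """
    parent = list(range(n_residues))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b are already connected, so this bond closes a ring
        parent[root_a] = root_b
    return False
```

For a single connected peptide without duplicate bonds this is equivalent to checking whether the number of bonds is at least the number of residues; the union-find form also works when a structure contains several disconnected chains.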
Users can choose between the GNN and legacy LightGBM backends via the --backend flag:
# Use GNN backend (default)
ligandexplorer -i input.zip -o output/
# Use legacy LightGBM backend (7-class, no cyclic peptide support)
ligandexplorer -i input.zip -o output/ --backend lgbm
When using the GNN backend, GPU acceleration is available via the --device flag:
# Run on CPU (default)
ligandexplorer -i input.zip -o output/
# Run on GPU
ligandexplorer -i input.zip -o output/ --device cuda
The GNN backend requires PyTorch and PyTorch Geometric:
pip install torch torch_geometric
Key features:
- Automated Ligand Extraction: Automatically identifies and extracts ligands from PDB files.
- Intelligent Classification: Uses GNN models to classify molecules as peptides, cyclic peptides, glycans, nucleic acids (RNA and DNA), lipids, ions, or organic molecules, and to separate biologically relevant ligands from solvents and other irrelevant molecules.
- Graph Theory Application: Distinguishes between covalent and non-covalent ligands based on molecular connectivity using graph theory algorithms.
- High-Throughput Processing: Supports multi-core parallel processing for rapid handling of large PDB datasets.
- Flexible Customization: Offers various command-line options for users to tailor the workflow to their specific needs.
Prerequisites
Before installing, please ensure you have the necessary dependencies. You can install them using pip:
pip install numpy biopython networkx scikit-learn torch torch_geometric
If you need the legacy LightGBM backend, also install LightGBM:
pip install lightgbm
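After installing, a quick way to confirm the environment is to check that Python can locate each package. This sketch uses only the standard library (`importlib.util.find_spec`, which looks a module up without importing it). The mapping from pip names to import names (biopython to `Bio`, scikit-learn to `sklearn`) is standard for these packages, but the script itself is a hypothetical helper and not part of LigandExplorer.

```python
import importlib.util

# pip package name (as in the install commands above) -> top-level import name
REQUIRED = {
    "numpy": "numpy",
    "biopython": "Bio",
    "networkx": "networkx",
    "scikit-learn": "sklearn",
    "torch": "torch",
    "torch_geometric": "torch_geometric",
}
OPTIONAL = {"lightgbm": "lightgbm"}  # only needed for --backend lgbm


def check_modules(packages):
    """Return {pip_name: True/False} depending on whether the import name can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    for group, pkgs in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, found in check_modules(pkgs).items():
            print(f"[{group}] {name}: {'ok' if found else 'missing'}")
```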
Clone the Repository
Clone the latest version of the code to your local machine using Git:
git clone https://github.com/dptech-corp/ligandexplorer.git
cd ligandexplorer
Run the install script
Use the Python setup script to install LigandExplorer and its dependencies:
python setup.py install
Once installed, you can run the program using the ligandexplorer command.
Use the -h or --help flag to view all available options:
ligandexplorer -h
Here is a basic usage example:
# Basic usage (GNN backend, CPU)
ligandexplorer -i /path/to/your/input.zip -o /path/to/your/output_directory
# With GPU acceleration
ligandexplorer -i /path/to/your/input.zip -o /path/to/your/output_directory --device cuda
# Using legacy LightGBM backend
ligandexplorer -i /path/to/your/input.zip -o /path/to/your/output_directory --backend lgbm
Each of these commands processes all PDB structures within input.zip and saves the extracted and classified ligand information to output_directory.
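The input is a single archive of PDB structures. One way to build such an archive from a directory of `.pdb` files is Python's standard `zipfile` module. The flat layout and the `.pdb` extension filter below are assumptions about a convenient input, not requirements documented by LigandExplorer, and `make_input_zip` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def make_input_zip(pdb_dir, zip_path, pattern="*.pdb"):
    """Write every file matching `pattern` in `pdb_dir` into a flat zip archive.

    Returns the archived member names in sorted order.
    """
    pdb_files = sorted(Path(pdb_dir).glob(pattern))
    if not pdb_files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {pdb_dir}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in pdb_files:
            zf.write(path, arcname=path.name)  # store only the file name (flat layout)
    return [p.name for p in pdb_files]
```

For example, `make_input_zip("structures/", "input.zip")` followed by `ligandexplorer -i input.zip -o output/` runs the default GNN backend on every `.pdb` file in `structures/`.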
usage: ligandexplorer [-h] -o OUTPUT_DIR -i INPUT_ZIP [-f FIX_PDB_FILE] [-b BOX_SIZE] [-c CORE]
[-s STRICT_MODE] [-l LIG_IDENTIFY] [-e SILENCE_MODE] [-t DEBUG]
[-m {gnn,lgbm}] [-d {cpu,cuda}]
Ligand explorer, version: v2.0
options:
-h, --help show this help message and exit
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Define the output path
-i INPUT_ZIP, --input_zip INPUT_ZIP
Path of the input zip file (rar, zip, tar, tar.gz, tar.bz2, bz2)
-f FIX_PDB_FILE, --fix_pdb_file FIX_PDB_FILE
Fix missing main-chain atoms with PDBFixer (if PDBFixer is installed)
-b BOX_SIZE, --box_size BOX_SIZE
Define the docking grid box size (angstrom), default is 10
-c CORE, --core CORE Define the number of cores for multiprocessing, default is None (use all cores)
-s STRICT_MODE, --strict_mode STRICT_MODE
If strict mode is on, the workflow will remove identical ligands. Default is True
-l LIG_IDENTIFY, --lig_identify LIG_IDENTIFY
The workflow will determine the type of all ligands (ions, solvent, nucleic acid,
ligand, peptide, cyclic peptide). Default is True
-e SILENCE_MODE, --silence_mode SILENCE_MODE
Disable all output information. The workflow will run silently. Default is False
-t DEBUG, --debug DEBUG
Debug mode: output additional debug information. Default is False
-m {gnn,lgbm}, --backend {gnn,lgbm}
Model backend for ligand identification: gnn (default) or lgbm
-d {cpu,cuda}, --device {cpu,cuda}
Device for GNN inference: cpu (default) or cuda
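For scripted or batch runs, it can help to assemble the command line programmatically so that option values are checked before launching. This sketch only uses flags that appear in the help text above (`-i`, `-o`, `--backend`, `--device`, `--core`, `--box_size`). The helper names `build_command` and `run` are hypothetical, and the rule that rejects `--device cuda` together with the LightGBM backend is an assumption based on `--device` being described as the device for GNN inference.

```python
import subprocess

BACKENDS = ("gnn", "lgbm")
DEVICES = ("cpu", "cuda")


def build_command(input_zip, output_dir, backend="gnn", device="cpu", core=None, box_size=None):
    """Return an argv list for ligandexplorer using only options from its help text."""
    if backend not in BACKENDS:
        raise ValueError(f"backend must be one of {BACKENDS}")
    if device not in DEVICES:
        raise ValueError(f"device must be one of {DEVICES}")
    if device == "cuda" and backend != "gnn":
        # Assumption: --device only affects GNN inference, so reject a misleading combination.
        raise ValueError("--device cuda only applies to the gnn backend")
    cmd = ["ligandexplorer", "-i", str(input_zip), "-o", str(output_dir),
           "--backend", backend, "--device", device]
    if core is not None:
        cmd += ["--core", str(core)]  # omitted -> the tool's default (all cores)
    if box_size is not None:
        cmd += ["--box_size", str(box_size)]  # docking grid box size in angstrom
    return cmd


def run(input_zip, output_dir, **options):
    """Launch ligandexplorer; raises subprocess.CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(input_zip, output_dir, **options), check=True)
```

Passing an argument list (rather than a single shell string) to `subprocess.run` avoids quoting problems with paths that contain spaces.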
If you use LigandExplorer in your research, please cite:
Li, Y.; Zou, R.; Yang, M.; Wang, Y.; Liu, Z.; Zheng, H. LigandExplorer: An Automated Tool for Ligand Extraction from PDB Structures. J. Chem. Inf. Model. 2026, 66 (6), 3026–3035. DOI: 10.1021/acs.jcim.5c02921
BibTeX
@article{Li2026LigandExplorer,
author = {Li, Yaqi and Zou, Rongfeng and Yang, Maohua and Wang, Ying and Liu, Zhonghua and Zheng, Hang},
title = {LigandExplorer: An Automated Tool for Ligand Extraction from PDB Structures},
journal = {Journal of Chemical Information and Modeling},
year = {2026},
volume = {66},
number = {6},
pages = {3026--3035},
doi = {10.1021/acs.jcim.5c02921},
pmid = {41762111},
}
RIS (EndNote, Zotero, Mendeley)
TY - JOUR
AU - Li, Yaqi
AU - Zou, Rongfeng
AU - Yang, Maohua
AU - Wang, Ying
AU - Liu, Zhonghua
AU - Zheng, Hang
TI - LigandExplorer: An Automated Tool for Ligand Extraction from PDB Structures
JO - Journal of Chemical Information and Modeling
PY - 2026
VL - 66
IS - 6
SP - 3026
EP - 3035
DO - 10.1021/acs.jcim.5c02921
PM - 41762111
ER -