A deep learning framework for predicting Drug–Drug Interactions (DDIs) using multimodal Transformer embeddings and Graph Attention Networks (GAT).
Drug–Drug Interactions (DDIs) often go undetected because clinical trials are limited in scope and the number of possible drug combinations is vast. Traditional similarity-based methods fail to capture complex pharmacological relationships, while standard Graph Neural Networks (GNNs) aggregate neighbor information with fixed weights and therefore cannot learn to prioritize the most biologically relevant neighbors. This work addresses these gaps by integrating:
- Molecular representations (SMILES)
- Semantic drug descriptions
- Graph structural context (GAT)
to enable accurate and scalable DDI prediction.
The proposed framework consists of:
- Multimodal Encoding
  - SMILES strings and drug descriptions encoded using Transformer models (T5, SBERT, ChemBERTa)
- Fusion Layer
  - Concatenation + projection to unified embeddings
- Graph Attention Network
  - Learns drug interaction patterns via attention-based aggregation
- Link Prediction
  - MLP + Sigmoid for interaction probability
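The fusion step can be illustrated with a minimal plain-Python sketch. It assumes the per-modality embeddings are already computed; the helper name `fuse`, the toy sizes, and the weight values are hypothetical, and a real implementation would more likely use a learned PyTorch `nn.Linear` layer than hand-written loops.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def fuse(smiles_emb: Vector, desc_emb: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Concatenate two modality embeddings, then apply a linear projection.

    Hypothetical helper: `weight` has shape (out_dim, len(smiles_emb) + len(desc_emb)).
    """
    concat = smiles_emb + desc_emb  # list concatenation = feature concatenation
    if any(len(row) != len(concat) for row in weight):
        raise ValueError("weight rows must match the concatenated input size")
    # y = W x + b, computed one output dimension (row of W) at a time
    return [sum(w * x for w, x in zip(row, concat)) + b for row, b in zip(weight, bias)]


# Toy example: 2-dim SMILES embedding + 2-dim description embedding -> 2-dim fused vector
W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
b = [0.0, 0.5]
print(fuse([1.0, 2.0], [3.0, 4.0], W, b))  # [4.0, 6.5]
```

Concatenation lets the two encoders produce vectors of different sizes; the projection then maps the combined vector to the single embedding size used by the graph layers.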
We use two widely used biomedical datasets:
- DrugBank → Drug attributes (SMILES + descriptions)
- BioSNAP → Drug–drug interaction network
| Metric | Count |
|---|---|
| Total Drugs | 12,227 |
| Total SMILES | 12,227 |
| Valid SMILES | 11,574 |
| Removed Invalid Entries | 653 |
SMILES strings are validated using RDKit to ensure chemical correctness, and the 653 invalid entries are removed.
An undirected drug–drug interaction graph is constructed in which:
- Nodes → Drugs
- Edges → Known drug interactions
| Property | BioSNAP | DrugBank | Combined |
|---|---|---|---|
| Nodes | 1514 | 1706 | 1756 |
| Edges | 48,514 | 191,402 | 198,020 |
| Undirected | Yes | Yes | Yes |
| Isolated Nodes | No | No | No |
| Avg Degree | 64.08 | 224.38 | 225.53 |
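The average degrees in the table are consistent with 2|E|/|V|, as expected for an undirected graph in which every edge adds one to the degree of each endpoint (for BioSNAP, 2 × 48,514 / 1,514 ≈ 64.087). The sketch below shows one way to normalize raw interaction pairs into undirected edges and recompute these statistics in plain Python; the function names and drug IDs are hypothetical, and the project itself builds the graph with PyTorch Geometric.

```python
from typing import Dict, Iterable, Set, Tuple


def build_undirected_edges(pairs: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Normalize raw (drug_a, drug_b) pairs into a set of undirected edges.

    Each edge is stored once with sorted endpoints, so (A, B) and (B, A)
    collapse to the same edge; self-pairs are dropped.
    """
    edges = set()
    for a, b in pairs:
        if a != b:
            edges.add((a, b) if a < b else (b, a))
    return edges


def degree_stats(nodes: Iterable[str], edges: Set[Tuple[str, str]]) -> Tuple[float, int]:
    """Return (average degree, number of isolated nodes).

    Every edge endpoint must appear in `nodes`.
    """
    degree: Dict[str, int] = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1  # an undirected edge adds 1 to both endpoints
        degree[b] += 1
    avg = sum(degree.values()) / len(degree)  # equals 2|E| / |V|
    isolated = sum(1 for d in degree.values() if d == 0)
    return avg, isolated


pairs = [("DB1", "DB2"), ("DB2", "DB1"), ("DB2", "DB3"), ("DB3", "DB3")]
edges = build_undirected_edges(pairs)
print(sorted(edges))                                      # [('DB1', 'DB2'), ('DB2', 'DB3')]
print(degree_stats(["DB1", "DB2", "DB3", "DB4"], edges))  # (1.0, 1)
print(round(2 * 48_514 / 1_514, 3))                       # 64.087 (BioSNAP)
```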
- SMILES validation (RDKit)
- Removal of invalid entries
- Graph construction using PyTorch Geometric
- Random Link Split (train/val/test)
- Negative sampling for non-interacting pairs
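The link split and negative sampling can be sketched in plain Python as below. The function names, split fractions, and seed handling are illustrative assumptions; with PyTorch Geometric the same steps are typically handled by `torch_geometric.transforms.RandomLinkSplit` and `torch_geometric.utils.negative_sampling`, which may be what the project uses.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # undirected edge stored as (i, j) with i < j


def random_link_split(edges: List[Edge], val_frac: float, test_frac: float, seed: int = 0):
    """Shuffle positive edges and split them into train / val / test lists."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def sample_negatives(num_nodes: int, known: Set[Edge], k: int, seed: int = 0) -> List[Edge]:
    """Draw k distinct node pairs that are NOT known interactions.

    `known` should hold every known positive (train, val and test), so a
    held-out interaction is never mislabelled as a negative.
    """
    available = num_nodes * (num_nodes - 1) // 2 - len(known)
    if k > available:
        raise ValueError("not enough non-interacting pairs to sample from")
    rng = random.Random(seed)
    negatives: Set[Edge] = set()
    while len(negatives) < k:
        i, j = rng.sample(range(num_nodes), 2)  # two distinct nodes
        pair = (min(i, j), max(i, j))
        if pair not in known:
            negatives.add(pair)
    return sorted(negatives)


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (2, 5), (1, 5)]
train, val, test = random_link_split(edges, val_frac=0.1, test_frac=0.2)
neg_test = sample_negatives(num_nodes=6, known=set(edges), k=len(test))
print(len(train), len(val), len(test), neg_test)
```

In a full link-prediction setup the graph layers should also pass messages only over training edges, so validation and test interactions stay hidden during training; `RandomLinkSplit` returns separate graph objects per split with this in mind.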
- Transformer-based encoders generate embeddings:
  - Structural (SMILES)
  - Semantic (descriptions)
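How each Transformer's per-token outputs are reduced to a single vector per drug is not specified here. A common choice, and the default in Sentence-BERT, is masked mean pooling, sketched below in plain Python; applying the same pooling to T5 and ChemBERTa outputs is an assumption, and the helper name is hypothetical.

```python
from typing import List


def masked_mean_pool(token_embs: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of real tokens (mask == 1), ignoring padding tokens."""
    dim = len(token_embs[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embs, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in total]


tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row stands in for a padding token
print(masked_mean_pool(tokens, [1, 1, 0]))      # [2.0, 3.0]
```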
- Concatenation + linear projection
  - Produces a unified drug representation
- Two-layer Graph Attention Network (GAT)
  - Learns relational dependencies between drugs
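The attention-based aggregation performed by one GAT layer can be sketched in plain Python, following the single-head formulation of the original GAT: a shared linear map `W`, an attention vector `a`, LeakyReLU with slope 0.2, and a softmax over each node's neighbors plus a self-loop. The names and toy inputs are hypothetical; the project presumably uses PyTorch Geometric's `GATConv`, which adds multi-head attention and learned parameters, and stacks two such layers with a non-linearity in between.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
              W: Matrix, a: Vector) -> Dict[int, Vector]:
    """One single-head GAT layer (output non-linearity omitted).

    With z_j = W h_j and N(i) = neighbors of i plus i itself:
        e_ij     = LeakyReLU(a . [z_i || z_j])
        alpha_ij = softmax of e_ij over j in N(i)
        h'_i     = sum over j in N(i) of alpha_ij * z_j
    """
    z = {n: matvec(W, vec) for n, vec in h.items()}
    out: Dict[int, Vector] = {}
    for i in h:
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]  # add self-loop
        # z[i] + z[j] concatenates the two lists, so `a` has length 2 * dim
        scores = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j]))) for j in nbrs]
        m = max(scores)  # subtract the max before exp for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out[i] = [sum(al * z[j][d] for al, j in zip(alphas, nbrs)) for d in range(len(z[i]))]
    return out


h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for readability
# a = [0, 0, 1, 0] scores each neighbor by the first feature of z_j,
# so node 0 attends more to itself ([1, 0]) than to node 1 ([0, 1]).
print(gat_layer(h, neighbors, W, a=[0.0, 0.0, 1.0, 0.0]))
```

With an all-zero attention vector every score is equal and the layer reduces to a plain neighborhood mean; learning `a` is what lets the model weight some neighbors more than others.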
- Pairwise feature construction:
- MLP decoder → Sigmoid output
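The pairwise feature construction is not spelled out above, so the decoder sketch below assumes one common option: the element-wise (Hadamard) product of the two drug embeddings, which makes the score symmetric in the two drugs. Concatenation is another frequent choice. The helper names, layer sizes, and weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def pair_features(h_i: Vector, h_j: Vector) -> Vector:
    """ASSUMED pairwise feature: Hadamard product, so (i, j) and (j, i) match."""
    return [x * y for x, y in zip(h_i, h_j)]


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_interaction(h_i: Vector, h_j: Vector, W1: Matrix, b1: Vector,
                        w2: Vector, b2: float) -> float:
    """MLP decoder: ReLU hidden layer -> single logit -> sigmoid probability."""
    hidden = [max(0.0, v) for v in linear(W1, b1, pair_features(h_i, h_j))]
    logit = sum(w * v for w, v in zip(w2, hidden)) + b2
    return sigmoid(logit)


W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
print(predict_interaction([1.0, 2.0], [3.0, -1.0], W1, b1, w2=[1.0, 1.0], b2=-3.0))  # 0.5
```

During training one would usually keep the raw logit and use a binary cross-entropy loss on logits (e.g. PyTorch's `BCEWithLogitsLoss`) for numerical stability, applying the sigmoid only when reporting probabilities.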
| Model | Representation | AUC | AUPR |
|---|---|---|---|
| T5 | SMILES | 0.872 | 0.861 |
| SBERT | SMILES | 0.888 | 0.870 |
| ChemBERTa | SMILES | 0.829 | 0.787 |
| T5 | Description | 0.847 | 0.818 |
| SBERT | Description | 0.931 | 0.925 |
| ChemBERTa | Description | 0.817 | 0.764 |
| T5 | Fusion | 0.888 | 0.880 |
| SBERT | Fusion | 0.945 | 0.943 |
| ChemBERTa | Fusion | 0.866 | 0.831 |
Best Model: SBERT Fusion (AUC 0.945, AUPR 0.943)
Key Insight: For every encoder, fusing SMILES and description embeddings yields higher AUC and AUPR than either representation alone.
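Both metrics summarize how well the predicted probabilities rank true interactions above sampled non-interacting pairs. Since scikit-learn is part of the stack, they were likely computed with `sklearn.metrics.roc_auc_score` and `average_precision_score`, though that is an assumption; the plain-Python sketch below only shows what each quantity measures, on toy data.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Average precision: mean of precision@k taken at the rank k of each positive.

    Assumes no tied scores (scikit-learn groups tied scores into one threshold).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / n_pos


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))            # 0.75
print(average_precision(labels, scores))  # about 0.833
```

AUPR is sometimes computed instead as the trapezoidal area under the precision–recall curve, which can differ slightly from average precision; the table does not state which variant was used.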
- Python
- PyTorch & PyTorch Geometric
- HuggingFace Transformers
- RDKit
- Scikit-learn


