DIGEX-Curve: Generating Drug-Induced Gene Expression Profiles through Continuous Dose-Response Curve Modeling
Official repository for DIGEX-Curve: Generating Drug-Induced Gene Expression Profiles through Continuous Dose-Response Curve Modeling
To create the conda environment needed to run the code in this repository, run:
conda env create -f DIGEXCurve.yml
To use the Jupyter notebooks contained in this repository, additionally run the following commands:
conda activate DIGEXCurve
conda install ipykernel
python -m ipykernel install --user --name DIGEXCurve
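Once the environment is active, it can be worth confirming that the installed library versions match those reported for this work (PyTorch 2.6.0 and PyTorch Lightning 2.5.1). The following is a minimal sketch, not part of the repository: `check_versions` is a hypothetical helper, and the distribution names `torch` and `pytorch-lightning` are assumptions about how DIGEXCurve.yml installs these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this README; the distribution names are assumptions.
EXPECTED = {"torch": "2.6.0", "pytorch-lightning": "2.5.1"}


def check_versions(expected):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # no distribution installed under this name
        # Local build tags such as "+cu124" are ignored in the comparison.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

If the script prints nothing, both packages were found at the expected versions. A reported mismatch may only mean that the package is installed under a different distribution name.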
To obtain the basal gene expression profiles used for training and evaluation, place the required files in the 'Data' folder and follow the steps in the 'GEXPreprocessing' notebook.
To obtain the training dataset as described in the paper, place the required files in the 'Data' folder and follow the steps in the 'DIGEXPreprocessing' notebook.
To pretrain the model, run the following command:
python pretrain.py
To train the model, run the following command:
python train.py
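For unattended runs, the two commands above can be chained so that training starts only if pretraining exits successfully. The following is a minimal sketch, not part of the repository: `run_in_order` and `STEPS` are hypothetical names, and the sketch assumes that both scripts are launched from the repository root without arguments, as in the commands above.

```python
import subprocess
import sys

# The two steps from this README, run with the current Python interpreter.
STEPS = [
    [sys.executable, "pretrain.py"],
    [sys.executable, "train.py"],
]


def run_in_order(commands):
    """Run each command in turn and return how many completed.

    check=True raises subprocess.CalledProcessError on a non-zero exit
    code, so later steps never run after a failed earlier one.
    """
    completed = 0
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
        completed += 1
    return completed
```

Calling `run_in_order(STEPS)` from the repository root, inside the activated DIGEXCurve environment, runs pretraining followed by training.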
After training, follow the steps in the 'PerformanceMetrics' notebook to obtain performance metrics.
To obtain predictions for your own dataset, follow the steps in the 'MakePrediction' notebook.
All models were trained using PyTorch [1] 2.6.0 and PyTorch Lightning [2] 2.5.1. NumPy [3], pandas [4], cmapPy [5], scikit-learn [6], and SciPy [7] were used for data processing and evaluation.
[1] Paszke, A., Gross, S., Massa, F., et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems (NeurIPS), 2019.
[2] Falcon, W., and The PyTorch Lightning team. (2019). PyTorch Lightning. GitHub repository. Available at: https://github.com/Lightning-AI/lightning
[3] Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2.
[4] The pandas development team. (2024). pandas-dev/pandas: Pandas 2.2.3. Zenodo. https://doi.org/10.5281/zenodo.3509134
[5] Enache, O.M., Lahr, D.L., Natoli, T.E., et al. The GCTx format and cmap{Py, R, M} packages: resources for the optimized storage and integrated traversal of dense matrices of data and annotations. bioRxiv 227041. DOI: 10.1101/227041.
[6] Pedregosa, F., et al. Scikit-learn: Machine Learning in Python. JMLR 12, pp. 2825-2830, 2011.
[7] Virtanen, P., Gommers, R., Oliphant, T.E., et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17(3), 261–272 (2020). DOI: 10.1038/s41592-019-0686-2.