Pre-training strategy for antiviral drug screening with low-data graph neural network: A case study in HIV-1 K103N reverse transcriptase
Kajjana Boonpalit
School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
Hathaichanok Chuntakaruk
Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
Center of Excellence in Structural and Computational Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
Center for Artificial Intelligence in Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
Jiramet Kinchagawat
School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
CARIVA (Thailand) Company Ltd, Bangkok, Thailand
Peter Wolschann
Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
Supot Hannongbua
Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
Center of Excellence in Computational Chemistry (CECC), Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
Corresponding Author
Thanyada Rungrotmongkol
Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
Center of Excellence in Structural and Computational Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
Correspondence
Thanyada Rungrotmongkol, Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, 10330, Bangkok, Thailand.
Email: [email protected]
Sarana Nutanong, School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), 21210, Rayong, Thailand.
Email: [email protected]
Corresponding Author
Sarana Nutanong
School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
Kajjana Boonpalit and Hathaichanok Chuntakaruk contributed equally to this study.
Abstract
Graph neural networks (GNNs) offer an alternative approach to boost screening effectiveness in drug discovery. However, their efficacy is often hindered by limited datasets. To address this limitation, we introduced a robust GNN training framework and applied it to various chemical databases to identify potent non-nucleoside reverse transcriptase inhibitors (NNRTIs) against the challenging K103N-mutated HIV-1 reverse transcriptase (RT). Leveraging self-supervised learning (SSL) pre-training to tackle data scarcity, we screened 1,824,367 compounds using a multi-step approach that incorporated machine learning (ML)-based screening, absorption, distribution, metabolism, and excretion (ADME) prediction, drug-likeness analysis, and molecular docking. Ultimately, 45 compounds remained as potential candidates, 17 of which had previously been identified as NNRTIs, exemplifying the model's efficacy. The remaining 28 compounds are anticipated to be repurposed for new uses. Molecular dynamics (MD) simulations on the repurposed candidates revealed two promising preclinical drugs: one designed against Plasmodium falciparum and the other serving as an antibacterial agent. Both showed superior binding affinity compared to anti-HIV drugs. This conceptual framework could be adapted for other disease-specific therapeutics, facilitating the identification of potent compounds effective against both wild-type (WT) and mutant targets while revealing novel scaffolds for drug design and discovery.
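The multi-step screening described in the abstract can be pictured as a funnel in which each stage discards compounds that fail a criterion. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the `Compound` fields, the `run_funnel` helper, the toy records, and the ML and docking cutoffs are all hypothetical, the ADME stage is omitted for brevity, and the drug-likeness stage uses the commonly cited Veber thresholds (at most 10 rotatable bonds, polar surface area at most 140 Å²) purely as an illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Compound:
    """Toy record; every field and value here is hypothetical."""
    name: str
    ml_score: float       # assumed activity probability predicted by a GNN
    rotatable_bonds: int  # count of rotatable bonds
    tpsa: float           # topological polar surface area in square angstroms
    docking_score: float  # assumed convention: higher means better fit


Stage = Tuple[str, Callable[[Compound], bool]]


def run_funnel(
    compounds: Iterable[Compound], stages: List[Stage]
) -> Tuple[List[Compound], List[Tuple[str, int]]]:
    """Apply each stage in order and record how many compounds survive it."""
    survivors = list(compounds)
    counts = [("input", len(survivors))]
    for label, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        counts.append((label, len(survivors)))
    return survivors, counts


# Illustrative cutoffs only; the source does not state these values.
STAGES: List[Stage] = [
    ("ml_screen", lambda c: c.ml_score >= 0.5),
    ("drug_likeness", lambda c: c.rotatable_bonds <= 10 and c.tpsa <= 140.0),
    ("docking", lambda c: c.docking_score >= 60.0),
]

toy_library = [
    Compound("toy_A", 0.91, 4, 85.0, 72.0),
    Compound("toy_B", 0.30, 3, 60.0, 80.0),
    Compound("toy_C", 0.88, 14, 150.0, 75.0),
    Compound("toy_D", 0.75, 6, 110.0, 41.0),
]

hits, counts = run_funnel(toy_library, STAGES)
for label, n in counts:
    print(f"{label}: {n}")
```

Ordering the stages from cheapest to most expensive means the costly steps (docking, and later MD simulation) only run on the compounds that survive the earlier filters.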
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
Open Research
DATA AVAILABILITY STATEMENT
The GIN models, pre-training dataset, and downstream dataset used in this work are available in our GitHub repository at https://github.com/kajjana/HIV-SSL. The code accompanying this work is taken from the GitHub repositories of Ref. 11 (https://github.com/snap-stanford/pretrain-gnns) and Ref. 29 (https://github.com/yuyangw/MolCLR).
Supporting Information
Filename | Description |
---|---|
jcc27514-sup-0001-Supinfo.docx (Word 2007 document, 1.3 MB) | Data S1. Supporting Information. |
REFERENCES
- 1. H. C. Castro, N. I. Loureiro, M. Pujol-Luz, A. M. Souza, M. G. Albuquerque, D. O. Santos, L. M. Cabral, I. C. Frugulhetti, C. R. Rodrigues, Curr. Med. Chem. 2006, 13, 313.
- 2. G. Mirambeau, S. Lyonnais, D. Coulaud, L. Hameau, S. Lafosse, J. Jeusset, I. Borde, M. Reboud-Ravaux, T. Restle, R. J. Gorelick, E. Le Cam, PLoS One 2007, 2, e669.
- 3. H. C. Tsai, I. T. Chen, H. M. Chang, S. S. Lee, Y. S. Chen, Infect. Drug Resist. 2022, 15, 3857.
- 4. K. Steegen, M. Bronze, M. A. Papathanasopoulos, G. van Zyl, D. Goedhals, E. Variava, W. MacLeod, I. Sanne, W. S. Stevens, S. Carmona, J. Antimicrob. Chemother. 2017, 72, 210.
- 5. O. A. Tarasova, A. F. Urusova, D. A. Filimonov, M. C. Nicklaus, A. V. Zakharov, V. V. Poroikov, J. Chem. Inf. Model. 2015, 55, 1388.
- 6. C. Cai, S. Wang, Y. Xu, W. Zhang, K. Tang, Q. Ouyang, L. Lai, J. Pei, J. Med. Chem. 2020, 63, 8683.
- 7. R. Liu, S. Laxminarayan, J. Reifman, A. Wallqvist, J. Comput.-Aided Mol. Des. 2022, 36, 867.
- 8. S. Wang, Q. Sun, Y. Xu, J. Pei, L. Lai, Brief. Bioinform. 2021, 22, 6.
- 9. J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, Proceedings of the 34th International Conference on Machine Learning, Vol. 70, PMLR, Sydney, NSW, Australia 2017, p. 1263.
- 10. Y. Liu, S. Pan, M. Jin, C. Zhou, F. Xia, P. Yu, IEEE Trans. Knowl. Data Eng. 2022, 35, 5879.
- 11. W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, J. Leskovec, arXiv preprint, arXiv:1905.12265. 2019.
- 12. T. N. Kipf, M. Welling, arXiv preprint, arXiv:1611.07308. 2016.
- 13. Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, Y. Shen, Adv. Neural Inf. Process. Syst. 2020, 33, 5812.
- 14. Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, V. Pande, Chem. Sci. 2018, 9, 513.
- 15. A. Keshavarzi Arshadi, M. Salem, A. Firouzbakht, J. S. Yuan, J. Cheminfo. 2022, 14, 10.
- 16. X. Jia, A. Lynch, Y. Huang, M. Danielson, I. Lang'at, A. Milder, A. E. Ruby, H. Wang, S. A. Friedler, A. J. Norquist, J. Schrier, Nature 2019, 573, 251.
- 17. B. Ramsundar, P. Eastman, P. Walters, V. Pande, Deep learning for the life sciences: applying deep learning to genomics, microscopy, drug discovery, and more, O'Reilly Media, Inc, Sebastopol, CA, USA 2019.
- 18. S. Kim, J. Chen, T. Cheng, A. Gindulyte, J. He, S. He, Q. Li, B. A. Shoemaker, P. A. Thiessen, B. Yu, L. Zaslavsky, J. Zhang, E. E. Bolton, Nucleic Acids Res. 2023, 51, D1373.
- 19. A. Gaulton, L. J. Bellis, A. P. Bento, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, D. Michalovich, B. Al-Lazikani, J. P. Overington, Nucleic Acids Res. 2012, 40, D1100.
- 20. A. L. Chávez-Hernández, K. E. Juárez-Mercado, F. I. Saldívar-González, J. L. Medina-Franco, Biomolecules 2021, 11, 12.
- 21. J. Lindberg, S. Sigurdsson, S. Löwgren, H. O. Andersson, C. Sahlberg, R. Noréen, K. Fridborg, H. Zhang, T. Unge, Eur. J. Biochem. 2002, 269, 1670.
- 22. D. S. Wishart, Y. D. Feunang, A. C. Guo, E. J. Lo, A. Marcu, J. R. Grant, T. Sajed, D. Johnson, C. Li, Z. Sayeeda, N. Assempour, I. Iynkkaran, Y. Liu, A. Maciejewski, N. Gale, A. Wilson, L. Chin, R. Cummings, D. Le, A. Pon, C. Knox, M. Wilson, Nucleic Acids Res. 2018, 46, D1074.
- 23. K. Xu, W. Hu, J. Leskovec, S. Jegelka, arXiv preprint, arXiv:1810.00826. 2018.
- 24. W. Hamilton, Z. Ying, J. Leskovec, Adv. Neural Inf. Process. Syst. 2017, 30, 1025.
- 25. T. N. Kipf, M. Welling, arXiv preprint, arXiv:1609.02907. 2016.
- 26. P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, R. D. Hjelm, arXiv preprint, arXiv:1809.10341. 2018.
- 27. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Adv. Neural Inf. Process. Syst. 2013, 26, 3111.
- 28. R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, J. Leskovec, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Association for Computing Machinery, New York, NY 2018, p. 974. DOI: 10.1145/3219819.3219890
- 29. Y. Wang, J. Wang, Z. Cao, A. Barati Farimani, Nat. Mach. Intell. 2022, 4, 279.
- 30. J. B. Baell, G. A. Holloway, J. Med. Chem. 2010, 53, 2719.
- 31. D. F. Veber, S. R. Johnson, H. Y. Cheng, B. R. Smith, K. W. Ward, K. D. Kopple, J. Med. Chem. 2002, 45, 2615.
- 32. A. Ünlü, Ü. Ö. Özmen, S. Alyar, A. Öztürk, H. Alyar, A. B. Gündüzalp, J. Mol. Struct. 2023, 1293, 136318.
- 33. A. K. Ghose, V. N. Viswanadhan, J. J. Wendoloski, J. Comb. Chem. 1999, 1, 55.
- 34. M. R. Naylor, A. T. Bockus, M. J. Blanco, R. S. Lokey, Curr. Opin. Chem. Biol. 2017, 38, 141.
- 35. A. P. Bento, A. Hersey, E. Félix, G. Landrum, A. Gaulton, F. Atkinson, L. J. Bellis, M. De Veij, A. R. Leach, J. Cheminfo. 2020, 12, 51.
- 36. E. B. Lansdon, K. M. Brendza, M. Hung, R. Wang, S. Mukund, D. Jin, G. Birkus, N. Kutty, X. Liu, J. Med. Chem. 2010, 53, 4295.
- 37. G. Jones, P. Willett, R. C. Glen, A. R. Leach, R. Taylor, J. Mol. Biol. 1997, 267, 727.
- 38. T. J. Dolinsky, P. Czodrowski, H. Li, J. E. Nielsen, J. H. Jensen, G. Klebe, N. A. Baker, Nucleic Acids Res. 2007, 35, W522.
- 39. D. Case, H. M. Aktulga, K. Belfon, I. Ben-Shalom, J. Berryman, S. Brozell, D. Cerutti, T. Cheatham, G. A. Cisneros, V. Cruzeiro, T. Darden, R. Duke, G. Giambasu, M. Gilson, H. Gohlke, A. Götz, R. Harris, S. Izadi, S. Izmailov, P. Kollman, Amber 2022, University of California, CA, USA, 2022.
- 40. H. Chuntakaruk, K. Hengphasatporn, Y. Shigeta, C. Aonbangkhen, V. S. Lee, T. Khotavivattana, T. Rungrotmongkol, S. Hannongbua, Sci. Rep. 2024, 14, 1.
- 41. H. Chuntakaruk, K. Boonpalit, J. Kinchagawat, F. Nakarin, T. Khotavivattana, C. Aonbangkhen, Y. Shigeta, K. Hengphasatporn, S. Nutanong, T. Rungrotmongkol, S. Hannongbua, J. Comput. Chem. 2024, 45, 953.
- 42. T. Sulea, E. O. Purisima, Methods Mol. Biol. 2012, 819, 295.
- 43. G. Wolber, T. Langer, J. Chem. Inf. Model. 2005, 45, 160.
- 44. X. Chen, W. Xie, Y. Yang, Y. Hua, G. Xing, L. Liang, C. Deng, Y. Wang, Y. Fan, H. Liu, T. Lu, Y. Chen, Y. Zhang, J. Chem. Inf. Model. 2020, 60, 4640.
- 45. A. Tuerkova, B. J. Bongers, U. Norinder, O. Ungvári, V. Székely, A. Tarnovskiy, G. Szakács, C. Özvegy-Laczka, G. J. P. van Westen, B. Zdrazil, J. Chem. Inf. Model. 2022, 62, 6323.
- 46. T. Xu, M. Xu, W. Zhu, C. Z. Chen, Q. Zhang, W. Zheng, R. Huang, J. Med. Chem. 2022, 65, 4590.
- 47. D. Hwang, S. Yang, Y. Kwon, K. H. Lee, G. Lee, H. Jo, S. Yoon, S. Ryu, J. Chem. Inf. Model. 2020, 60, 5936.
- 48. G. W. Bemis, M. A. Murcko, J. Med. Chem. 1996, 39, 2887.
- 49. K. Ishiguro, S. I. Maeda, M. Koyama, arXiv preprint, arXiv:1902.01020. 2019.
- 50. Y. Rong, Y. Bian, T. Xu, W. Xie, Y. Wei, W. Huang, J. Huang, Adv. Neural Inf. Process. Syst. 2020, 33, 12559.
- 51. K. Yang, K. Swanson, W. Jin, C. Coley, P. Eiden, H. Gao, A. Guzman-Perez, T. Hopper, B. Kelley, M. Mathea, J. Chem. Inf. Model. 2019, 59, 3370.
- 52. M. T. Rosenstein, Z. Marx, L. P. Kaelbling, T. G. Dietterich, NIPS 2005 Workshop Transfer Learn. 2005, 89, 4.
- 53. L. Van der Maaten, G. Hinton, J. Mach. Learn. Res. 2008, 9, 11.
- 54. L. McInnes, J. Healy, J. Melville, arXiv preprint, arXiv:1802.03426. 2018.
- 55. R. P. Sheridan, J. Chem. Inf. Model. 2012, 52, 814.
- 56. H. Zhang, K. M. Saravanan, Y. Yang, Y. Wei, P. Yi, J. Z. H. Zhang, Brief. Bioinform. 2022, 23, 4.
- 57. B. C. Doak, B. Over, F. Giordanetto, J. Kihlberg, Chem. Biol. 2014, 21, 1115.
- 58. P. Srivab, S. Hannongbua, ChemMedChem 2008, 3, 803.
- 59. R. A. Spence, W. M. Kati, K. S. Anderson, K. A. Johnson, Science 1995, 267, 988.
- 60. D. Ramírez, J. Caballero, Molecules 2018, 23, 1038.
- 61. J. R. Balser, Cardiovasc. Res. 1999, 42, 327.
- 62. S. G. Sarafianos, B. Marchand, K. Das, D. M. Himmel, M. A. Parniak, S. H. Hughes, E. Arnold, J. Mol. Biol. 2009, 385, 693.
- 63. V. A. Braz, M. D. Barkley, R. A. Jockusch, P. L. Wintrode, Biochemistry 2010, 49, 10565.
- 64. J. Ren, J. Milton, K. L. Weaver, S. A. Short, D. I. Stuart, D. K. Stammers, Structure 2000, 8, 1089.
- 65. G. Li, Y. Wang, E. De Clercq, Acta Pharm. Sin. B. 2022, 12, 1567.
- 66. F. Esposito, A. Corona, E. Tramontano, Mol. Biol. Int. 2012, 2012, 586401.
- 67. L. Ding, C. Pannecouque, E. De Clercq, C. Zhuang, F.-E. Chen, J. Med. Chem. 2021, 64, 5067.
- 68. S. Alcaro, C. Alteri, A. Artese, F. Ceccherini-Silberstein, G. Costa, F. Ortuso, L. Parrotta, C. F. Perno, V. Svicher, Drug Resist. Updat. 2011, 14, 141.
- 69. K. Das, P. J. Lewi, S. H. Hughes, E. Arnold, Prog. Biophys. Mol. Biol. 2005, 88, 209.
- 70. E. Kodama, M. Orita, N. Masuda, O. Yamomoto, M. Fujii, T. Ohgami, S. Kageyama, M. Ohta, T. Hatta, H. Inoue, H. Suzuki, K. Sudo, Y. Shimizu, M. Matsuoka, Antiviral Chem. Chemother. 2008, 19, 133.