Clustering one million molecular structures on GPU within seconds
Junyong Gao, Mincong Wu, Jun Liao, Fanjun Meng, Changjun Chen (corresponding author)
Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, China
Correspondence
Changjun Chen, Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan 430074 Hubei, China.
Email: [email protected]
Abstract
Structure clustering is a common but time-consuming task in the life sciences. To date, most published tools do not support clustering with the root mean square deviation (RMSD) metric on graphics processing units (GPUs). In this work, we present a program written specifically for this purpose. It supports multiple threads on multiple GPUs. To demonstrate its performance, we apply the program to a 33-residue fragment of a Pin1 WW domain mutant. The dataset contains 1,400,000 snapshots, which are extracted from an enhanced sampling simulation and are widely distributed in conformational space. The test results show that the program is quite efficient. In particular, with two NVIDIA RTX 4090 GPUs and single-precision data, clustering 1 million snapshots completes in a few seconds (including the time to upload the data from host memory to the GPU, but excluding the time to read it from hard disk). This is hundreds of times faster than on a central processing unit (CPU). The program could be a powerful tool for quickly extracting the representative states of a molecule from thousands to millions of candidate structures.
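As a minimal illustration of the RMSD metric named in the abstract, the following Python sketch computes the RMSD between two conformations and assigns each snapshot to its nearest cluster center. It is not the authors' implementation: the function names `rmsd` and `assign_to_centers` and the toy coordinates are hypothetical, the snapshots are assumed to be already superimposed on a common reference (the optimal-superposition step is omitted), and everything runs serially on the CPU. The independent snapshot-to-center distance evaluations are the kind of work a GPU can carry out in parallel.

```python
import math


def rmsd(a, b):
    """RMSD between two conformations given as equal-length lists of (x, y, z).

    Assumes both conformations are already superimposed (no fitting is done here).
    """
    if len(a) != len(b) or not a:
        raise ValueError("conformations must be non-empty and of equal length")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def assign_to_centers(snapshots, centers):
    """For each snapshot, return the index of the center with the smallest RMSD."""
    return [
        min(range(len(centers)), key=lambda k: rmsd(s, centers[k]))
        for s in snapshots
    ]


# Hypothetical toy data: three two-atom "snapshots".
snapshots = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
centers = [snapshots[0], snapshots[2]]  # two centers chosen by hand
labels = assign_to_centers(snapshots, centers)
print(labels)
```

Each snapshot-to-center comparison is independent of the others, so the cost grows with the number of snapshots times the number of centers, which is why large datasets benefit from parallel hardware.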
Open Research
DATA AVAILABILITY STATEMENT
The clustering source code is part of FSATOOL, which is available on GitHub at https://github.com/fsatool/fsatool.github.io. Instructions for using the program and the trajectory data for the WW33 molecule are both provided on the web page https://github.com/fsatool/fsatool.github.io/wiki/Clustering.
Supporting Information
Filename | Description |
---|---|
jcc27470-sup-0001-supinfo.docx (Word 2007 document, 21.9 KB) | Data S1. Supporting Information. |
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.