Volume 45, Issue 32 pp. 2710-2718

SOFTWARE NOTE

Clustering one million molecular structures on GPU within seconds

Junyong Gao, Mincong Wu, Jun Liao, Fanjun Meng, Changjun Chen (Corresponding Author)

Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, China

Correspondence

Changjun Chen, Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan 430074 Hubei, China.

Email: [email protected]

orcid.org/0000-0002-6188-5223

First published: 14 August 2024

https://doi.org/10.1002/jcc.27470

Abstract

Structure clustering is a common but time-consuming task in the life sciences. To date, most published tools do not support clustering analysis on a graphics processing unit (GPU) with the root mean square deviation (RMSD) metric. In this work, we wrote dedicated code for this task. The program supports multiple threads on multiple GPUs. To demonstrate its performance, we apply it to a 33-residue fragment of a Pin1 WW domain mutant. The dataset contains 1,400,000 snapshots, which were extracted from an enhanced sampling simulation and are widely distributed in conformational space. The tests show that the program is efficient. In particular, with two NVIDIA RTX4090 GPUs and the single-precision data type, the clustering calculation on 1 million snapshots completes in a few seconds (including the time to upload data from host memory to the GPU, but excluding the time to read it from hard disk). This is hundreds of times faster than on a central processing unit (CPU). The program could be a powerful tool for rapidly extracting the representative states of a molecule from thousands to millions of candidate structures.
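To make the distance metric concrete, the following is a minimal CPU sketch in Python with NumPy of the root mean square deviation between two structures after optimal superposition, followed by assignment of each snapshot to its nearest cluster center. It is not the authors' GPU implementation, which is not shown on this page. The function names `kabsch_rmsd` and `assign_to_centers` are hypothetical, and the nearest-center assignment step is an assumption about how a center-based clustering scheme would use the metric.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """Minimum RMSD between two (n_atoms, 3) arrays after optimal superposition."""
    # Remove the translational degrees of freedom.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Singular values of the 3x3 covariance matrix give the best overlap.
    u, s, vt = np.linalg.svd(p.T @ q)
    # If the optimal transform would be a reflection, flip the smallest
    # singular value so that only proper rotations are considered.
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[-1] = -s[-1]
    e0 = (p ** 2).sum() + (q ** 2).sum()
    # Clamp at zero to guard against tiny negative values from round-off.
    msd = max((e0 - 2.0 * s.sum()) / len(p), 0.0)
    return float(np.sqrt(msd))


def assign_to_centers(frames, centers):
    """Label each frame with the index of the center at the smallest RMSD."""
    labels = np.empty(len(frames), dtype=np.int64)
    for i, frame in enumerate(frames):
        dists = [kabsch_rmsd(frame, center) for center in centers]
        labels[i] = int(np.argmin(dists))
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for two cluster centers with 33 sites each.
    centers = [rng.normal(size=(33, 3)) for _ in range(2)]
    # Noisy copies of the centers play the role of trajectory snapshots.
    frames = [centers[k % 2] + 0.01 * rng.normal(size=(33, 3)) for k in range(6)]
    print(assign_to_centers(frames, centers))
```

Every snapshot-to-center distance in the loop is independent of the others, which is the property that makes this kind of calculation a natural fit for many GPU threads.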

Open Research

DATA AVAILABILITY STATEMENT

The clustering source code is included in FSATOOL, which is available on GitHub at https://github.com/fsatool/fsatool.github.io. The usage of the program and the trajectory data of the WW33 molecule are both provided on the web page https://github.com/fsatool/fsatool.github.io/wiki/Clustering.

Volume 45, Issue 32, December 15, 2024, pages 2710-2718
