Volume 45, Issue 32 pp. 2710-2718
SOFTWARE NOTE

Clustering one million molecular structures on GPU within seconds

Junyong Gao

Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, China

Mincong Wu

Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, China

Jun Liao

Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, China

Fanjun Meng

Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, China

Changjun Chen

Corresponding Author

Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan, China

Correspondence

Changjun Chen, Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan 430074 Hubei, China.

First published: 14 August 2024

Abstract

Structure clustering is a common but time-consuming task in the life sciences. To date, most published tools do not support clustering analysis on graphics processing units (GPUs) with the root mean square deviation (RMSD) metric. In this work, we wrote a program specifically for this task. It supports multiple threads on multiple GPUs. To demonstrate its performance, we applied the program to a 33-residue fragment of a Pin1 WW domain mutant. The dataset contains 1,400,000 snapshots, which were extracted from an enhanced sampling simulation and are widely distributed in conformational space. Results from various tests show that the program is efficient. In particular, with two NVIDIA RTX 4090 GPUs and the single-precision data type, clustering 1 million snapshots completes in a few seconds (including the time to upload the data from host memory to the GPU, but excluding the time to read it from the hard disk). This is hundreds of times faster than on a central processing unit (CPU). The program could be a powerful tool for rapidly extracting the representative states of a molecule from among thousands to millions of candidate structures.
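To illustrate the RMSD metric that the clustering relies on, here is a minimal CPU sketch, assuming each snapshot is a NumPy array of shape (n_atoms, 3) with atoms in the same order. It computes RMSD after optimal rigid-body superposition (the Kabsch algorithm) and feeds it into a simple greedy "leader" clustering with an RMSD cutoff. The names `kabsch_rmsd` and `leader_cluster` and the cutoff are hypothetical, and leader clustering is a stand-in chosen for brevity, not necessarily the algorithm implemented in FSATOOL.

```python
from typing import List, Sequence, Tuple

import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) structures after optimally
    superposing p onto q (Kabsch algorithm)."""
    # Remove translation by centering both structures at the origin.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # 3x3 cross-covariance matrix H = sum_i p_i q_i^T.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so R is a proper rotation (det = +1),
    # not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and measure the residual per atom.
    diff = p @ rot.T - q
    return float(np.sqrt((diff * diff).sum() / len(p)))


def leader_cluster(
    frames: Sequence[np.ndarray], cutoff: float
) -> Tuple[List[int], List[int]]:
    """Hypothetical greedy 'leader' clustering: each frame joins the first
    existing center within `cutoff` RMSD; otherwise it becomes a new center.
    Returns (indices of center frames, cluster label of every frame)."""
    centers: List[int] = []
    labels: List[int] = []
    for i, frame in enumerate(frames):
        for cluster_id, c in enumerate(centers):
            if kabsch_rmsd(frame, frames[c]) < cutoff:
                labels.append(cluster_id)
                break
        else:
            # No center was close enough: this frame starts a new cluster.
            centers.append(i)
            labels.append(len(centers) - 1)
    return centers, labels
```

For example, a structure, a rotated and translated copy of it, and a copy scaled by a factor of 3 give `leader_cluster([base, moved, 3 * base], cutoff=0.5)` → `([0, 2], [0, 0, 1])`, because the rigid-body copy superposes exactly while the scaled copy does not. The sketch uses double precision for clarity, whereas the timings reported above used single precision on GPUs. Each frame-to-center RMSD evaluation is independent of the others, which is the kind of fine-grained parallel work that maps well onto GPU threads.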

DATA AVAILABILITY STATEMENT

The clustering source code is part of FSATOOL, which is available on GitHub at https://github.com/fsatool/fsatool.github.io. Instructions for using the program, together with the trajectory data for the WW33 molecule, are provided at https://github.com/fsatool/fsatool.github.io/wiki/Clustering.