Volume 45, Issue 32 pp. 2929-2940
RESEARCH ARTICLE

Enhancing protein-ligand binding affinity prediction through sequential fusion of graph and convolutional neural networks

Yimin Yang

Department of Physics, University of Science and Technology of China, Hefei, China

Ruiqin Zhang

Corresponding Author

Department of Physics, City University of Hong Kong, Hong Kong, China

Correspondence

Ruiqin Zhang, Department of Physics, City University of Hong Kong, Hong Kong 999077, China.

Email: [email protected]

Zijing Lin, Department of Physics, University of Science and Technology of China, Hefei 230026, China.

Email: [email protected]

Zijing Lin

Corresponding Author

Department of Physics, University of Science and Technology of China, Hefei, China

Hefei National Laboratory, University of Science and Technology of China, Hefei, China

First published: 02 September 2024
Citations: 1

Abstract

Predicting protein-ligand binding affinity is a crucial and challenging task in structure-based drug discovery. With the accumulation of complex structures and binding affinity data, various machine-learning scoring functions, particularly those based on deep learning, have been developed for this task and have outperformed their traditional counterparts. In this work, a fusion model that sequentially connects a graph neural network (GNN) and a convolutional neural network (CNN) is proposed for predicting protein-ligand binding affinity. In this model, the intermediate outputs of the GNN layers, which serve as supplementary descriptors of atomic chemical environments at different levels, are concatenated with the input features of the CNN. The model demonstrates a noticeable improvement in performance on the CASF-2016 benchmark compared to its constituent CNN models. The generalization ability of the model is evaluated by setting a series of thresholds for ligand extended-connectivity fingerprint similarity or protein sequence similarity between the training and test sets. A masking experiment reveals that the model can capture key interaction regions. Furthermore, the fusion model is applied to a virtual screening task for a novel target, PI5P4Kα, where the fusion strategy significantly improves the ability of the constituent CNN model to identify active compounds. This work offers a novel approach to enhancing the accuracy of deep learning models in predicting binding affinity through fusion strategies.
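
The similarity-threshold evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' code. It assumes that fingerprints are represented as sets of on-bit indices, that similarity is measured with the Tanimoto coefficient (a common choice for extended-connectivity fingerprints, but not stated in the abstract), and that training ligands too similar to any test ligand are removed. The function names `tanimoto` and `filter_training_set` and the toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty fingerprints are treated as having zero similarity (an assumption).
    return len(a & b) / union if union else 0.0


def filter_training_set(train_fps, test_fps, threshold):
    """Return indices of training ligands whose maximum similarity
    to any test ligand does not exceed the threshold."""
    kept = []
    for i, fp in enumerate(train_fps):
        max_sim = max((tanimoto(fp, t) for t in test_fps), default=0.0)
        if max_sim <= threshold:
            kept.append(i)
    return kept


# Toy fingerprints (hypothetical on-bit indices, not real ECFP output).
train_fps = [{1, 2, 3, 4}, {1, 2, 9}, {7, 8}]
test_fps = [{1, 2, 3, 5}]

# A stricter (lower) threshold leaves fewer training ligands.
for threshold in (0.9, 0.5, 0.2):
    kept = filter_training_set(train_fps, test_fps, threshold)
    print(f"threshold={threshold}: kept training indices {kept}")
```

Retraining on each filtered subset and scoring the same test set would then show how performance changes as the training data becomes less similar to the test ligands. The same pattern could in principle be applied to protein sequence similarity with a different similarity function.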

CONFLICT OF INTEREST STATEMENT

The authors declare no competing financial interest.

DATA AVAILABILITY STATEMENT

The source code developed in this work can be found at https://github.com/IanYMY/GCNN.
