Volume 43, Issue 11 pp. 1230-1238
Concise Report

Multi-modal Homogeneous Chemical Reaction Performance Prediction with Graph and Chemical Language Information

Shen Wang

State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian, Liaoning, 116024 China

Leicester International Institute, Dalian University of Technology, Panjin, Liaoning, 124221 China

Weiren Zhao

State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian, Liaoning, 116024 China

School of Chemical Engineering, Ocean and Life Sciences, Dalian University of Technology, Panjin, Liaoning, 124221 China

Yining Liu

State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian, Liaoning, 116024 China

School of Chemical Engineering, Ocean and Life Sciences, Dalian University of Technology, Panjin, Liaoning, 124221 China

Yang Li

Corresponding Author

State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian, Liaoning, 116024 China

School of Chemical Engineering, Ocean and Life Sciences, Dalian University of Technology, Panjin, Liaoning, 124221 China

E-mail: [email protected]
First published: 20 February 2025

Comprehensive Summary

Accurate prediction of chemical reaction performance helps guide synthetic development. To this end, we present MMHRP-GCL, a novel multi-modal model that predicts the yield, enantioselectivity, and activation energy of homogeneous chemical reactions by fusing information from the text and graph modalities. The model requires only 8 simple descriptors and Reaction SMILES, which are obtained without high-cost DFT computation, and it can handle reactions involving a varying number of molecules. Experimental results on 4 datasets show that MMHRP-GCL outperforms at least 7 generalized SOTA methods. An ablation study confirms the critical role of the complementarity between the graph and text modalities, as well as the significance of modality alignment and atomic features in prediction. Although there is still room for improvement in the interpretation of atomic relationships, the model has a remarkable ability to identify important atoms. A statistically interpretable study of feature importance and a test on a challenging dataset further demonstrate the utility and potential of the model. As a high-accuracy, low-cost, interpretable, and general multi-modal model, MMHRP-GCL provides valuable guidance on the design of forward predictors for homogeneous catalytic reactions.
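To illustrate why Reaction SMILES suits reactions with a varying number of molecules, the following minimal sketch splits a Reaction SMILES string into its reactant, agent, and product molecules. It is not the MMHRP-GCL preprocessing code, which this summary does not describe. The names `ReactionParts` and `split_reaction_smiles` are hypothetical. The sketch assumes the standard `reactants>agents>products` layout with `.` separating molecules, so it will also split multi-component species such as salts, and it assumes no trailing extension block follows the string.

```python
from typing import NamedTuple


class ReactionParts(NamedTuple):
    """Per-role molecule lists of one reaction (hypothetical helper type)."""
    reactants: list[str]
    agents: list[str]
    products: list[str]


def split_reaction_smiles(rxn: str) -> ReactionParts:
    """Split 'reactants>agents>products' into lists of molecule SMILES.

    Simplification: every '.' is treated as a molecule separator, so ionic
    pairs written as disconnected fragments are split into separate entries.
    """
    fields = rxn.strip().split(">")
    if len(fields) != 3:
        raise ValueError(f"expected 3 '>'-separated fields, got {len(fields)}")
    # An empty field (for example no agents) yields an empty list.
    return ReactionParts(*([m for m in f.split(".") if m] for f in fields))


# Acid-catalysed esterification: two reactants, one agent, two products.
parts = split_reaction_smiles("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
print(len(parts.reactants), len(parts.agents), len(parts.products))  # 2 1 2
```

Because each role is returned as a list of arbitrary length, a downstream encoder can process each molecule separately and pool the results, regardless of how many molecules a given reaction contains.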
