Multi-modal Homogeneous Chemical Reaction Performance Prediction with Graph and Chemical Language Information
Shen Wang
State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian, Liaoning, 116024 China
Leicester International Institute, Dalian University of Technology, Panjin, Liaoning, 124221 China
Weiren Zhao
State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian, Liaoning, 116024 China
School of Chemical Engineering, Ocean and Life Sciences, Dalian University of Technology, Panjin, Liaoning, 124221 China
Yining Liu
State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian, Liaoning, 116024 China
School of Chemical Engineering, Ocean and Life Sciences, Dalian University of Technology, Panjin, Liaoning, 124221 China
Corresponding Author
Yang Li
State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian, Liaoning, 116024 China
School of Chemical Engineering, Ocean and Life Sciences, Dalian University of Technology, Panjin, Liaoning, 124221 China
E-mail: [email protected]
Comprehensive Summary
Accurate prediction of chemical reaction performance provides guidance for synthetic development. To this end, we present a novel multi-modal model, MMHRP-GCL, that predicts the yield, enantioselectivity, and activation energy of homogeneous chemical reactions by fusing information from the text and graph modalities. The model requires only 8 simple descriptors and Reaction SMILES, obtained without high-cost DFT computation, and can handle reactions involving a varying number of molecules. Experimental results on 4 datasets show that MMHRP-GCL outperforms at least 7 generalized SOTA methods. An ablation study confirms the critical role of the complementarity between the graph and text modalities, as well as the significance of modality alignment and atomic features in prediction. Although there is still room for improvement in the interpretation of atomic relationships, the model has a remarkable ability to identify important atoms. A statistically interpretable study of feature importance and a test on a challenging dataset further demonstrate the utility and potential of the model. As a high-accuracy, low-cost, interpretable, and general multi-modal model, MMHRP-GCL provides valuable guidance on the design of forward predictors for homogeneous catalytic reactions.
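To illustrate why Reaction SMILES can accommodate a varying number of molecules, the following minimal sketch splits a reaction string of the standard form `reactants>agents>products` into per-role molecule lists. It is not the authors' preprocessing code: the function name `split_reaction_smiles` and the esterification example are assumptions introduced here for illustration, and the sketch deliberately ignores atom-mapping and fragment-grouping extensions, so a salt written as dot-separated ions would be treated as separate molecules.

```python
from typing import Dict, List


def split_reaction_smiles(rxn: str) -> Dict[str, List[str]]:
    """Split 'reactants>agents>products' into lists of molecule SMILES.

    Hypothetical helper for illustration only; not part of MMHRP-GCL.
    """
    parts = rxn.strip().split(">")
    if len(parts) != 3:
        raise ValueError(f"expected two '>' separators, got {len(parts) - 1}")
    roles = ("reactants", "agents", "products")
    # Molecules within one role are separated by '.'; an empty role yields [].
    return {
        role: [mol for mol in part.split(".") if mol]
        for role, part in zip(roles, parts)
    }


# Illustrative example (assumed, not from the paper):
# acetic acid + ethanol -> ethyl acetate + water, with sulfuric acid as agent.
example = "CC(=O)O.OCC>OS(=O)(=O)O>CC(=O)OCC.O"
parsed = split_reaction_smiles(example)
for role, molecules in parsed.items():
    print(role, len(molecules), molecules)
```

Because each role is just a list, a downstream encoder can be given two reactants in one reaction and five in the next without changing the input format; how MMHRP-GCL itself aggregates these molecules is described in the full paper, not here.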
Supporting Information
Filename | Description |
---|---|
cjoc202401186-sup-0001-supinfo.pdf (PDF document, 8.8 MB) | Appendix S1: Supporting Information |