Auto Machine Learning Assisted Preparation of Carboxylic Acid by TEMPO-Catalyzed Primary Alcohol Oxidation
Jia Qiu
Guangdong Laboratory Animals Monitoring Institute, Guangdong Provincial Key Laboratory of Laboratory Animals, Guangzhou, Guangdong, 510663 China
Guangzhou Municipal and Guangdong Provincial Key Laboratory of Molecular Target & Clinical Pharmacology, the NMPA and State Key Laboratory of Respiratory Disease, School of Pharmaceutical Sciences and the Fifth Affiliated Hospital, Guangzhou Medical University, Guangzhou, Guangdong, 511436 China
Bioland Laboratory, Guangzhou, Guangdong, 510005 China
Yougen Xu
Bioland Laboratory, Guangzhou, Guangdong, 510005 China
Shimin Su
Bioland Laboratory, Guangzhou, Guangdong, 510005 China
Yadong Gao
Bioland Laboratory, Guangzhou, Guangdong, 510005 China
Peiyuan Yu
Department of Chemistry, Southern University of Science and Technology, Shenzhen, Guangdong, 518055 China
Zhixiong Ruan (corresponding author)
Guangzhou Municipal and Guangdong Provincial Key Laboratory of Molecular Target & Clinical Pharmacology, the NMPA and State Key Laboratory of Respiratory Disease, School of Pharmaceutical Sciences and the Fifth Affiliated Hospital, Guangzhou Medical University, Guangzhou, Guangdong, 511436 China
Kuangbiao Liao (corresponding author)
Guangdong Laboratory Animals Monitoring Institute, Guangdong Provincial Key Laboratory of Laboratory Animals, Guangzhou, Guangdong, 510663 China
Bioland Laboratory, Guangzhou, Guangdong, 510005 China
Guangzhou Laboratory, Guangzhou, Guangdong, 510320 China
Comprehensive Summary
Although alcohol oxidations are considered well-established reactions, selecting productive conditions and predicting reaction yields for unseen alcohols remain major challenges. Herein, an auto machine learning (ML) model for the TEMPO-catalyzed oxidation of primary alcohols to the corresponding carboxylic acids is disclosed. A dataset of 3444 data points, covering 282 primary alcohols and 45 conditions, was generated by high-throughput experimentation (HTE). Using the HTE data and 105 descriptors, multi-label prediction was performed with AutoGluon (an open-source auto machine learning framework) and KNIME (an open-source data analytics platform). In an independent test of 240 reactions (a full matrix of 20 unseen alcohols and 12 conditions), AutoGluon with multi-label prediction (AGMP) gave excellent yield predictions. In an external test of 1308 reactions (covering 84 alcohols and 45 conditions), AGMP still performed well, with R² = 0.767 and MAE = 4.9%. The model also revealed that a newly generated descriptor, Y/N (a classification of reaction reactivity), was the most relevant descriptor for yield prediction, offering a new perspective on integrating HTE and ML in organic synthesis.
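To make the test-set sizes and the reported metrics concrete, the sketch below enumerates a full alcohol × condition test matrix and computes R² and MAE with the Python standard library. It is an illustration under stated assumptions, not the authors' AutoGluon/KNIME workflow: the helper names, placeholder labels, and toy yields are hypothetical, and yields are assumed to be given in percent, so MAE comes out in percentage points.

```python
"""Minimal sketch: full-matrix test-set enumeration and regression metrics.

Hypothetical helpers for illustration only; not the authors' AutoGluon/KNIME pipeline.
Yields are assumed to be percentages, so MAE is in percentage points.
"""
from itertools import product


def full_matrix(alcohols, conditions):
    """Return every (alcohol, condition) pair, i.e. a full test matrix."""
    return list(product(alcohols, conditions))


def mean_absolute_error(y_true, y_pred):
    """Mean of |observed - predicted|, in the same units as the yields."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("y_true and y_pred must be equal in length with >= 2 items")
    mean_obs = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed yields are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder labels standing in for 20 unseen alcohols and 12 conditions.
    alcohols = [f"alcohol_{i}" for i in range(20)]
    conditions = [f"condition_{j}" for j in range(12)]
    print(len(full_matrix(alcohols, conditions)))  # 240 = 20 x 12 pairs

    # Toy observed/predicted yields (%), not data from the study.
    observed = [10.0, 50.0, 90.0]
    predicted = [12.0, 48.0, 85.0]
    print(f"MAE = {mean_absolute_error(observed, predicted):.1f}")
    print(f"R2  = {r2_score(observed, predicted):.4f}")
```

For "unseen" alcohols to be meaningful, none of the test-matrix alcohols should appear in the training rows; the full matrix then scores every held-out alcohol under every tested condition. The same two metrics are also available as `r2_score` and `mean_absolute_error` in `sklearn.metrics`.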
Supporting Information
Filename | Description |
---|---|
cjoc202200555-sup-0001-Supinfo.pdf (PDF document, 9 MB) | Appendix S1: Supporting Information |