Abstract
Homogeneous catalysts enable faster chemical conversions with higher selectivity (stereo- and regioselectivity). Traditionally, catalysts are improved through empirical trials, in which the catalyst is functionalized by adding, removing or modifying groups within its structure and the catalytic activity is then re-evaluated. This procedure is inefficient, and unsuccessful trials waste resources. Machine learning (ML) approaches have been proposed to accelerate homogeneous asymmetric catalyst optimization. However, these often lack a general descriptor-generation procedure that can encode molecules from a broad region of chemical space. To overcome this, we propose a homogeneous catalyst graph neural network (HCat-GNet) that predicts catalyst selectivity from the SMILES of the participant molecules. We demonstrate its use in rhodium-catalyzed asymmetric 1,4-addition (RhCAA), a reaction of major importance in organic synthesis. We benchmark HCat-GNet against traditional ML methods for its ability to predict RhCAA stereoselectivity using two chiral diene ligand datasets: one for learning and one for final testing. On the learning dataset, traditional ML and HCat-GNet give comparable results. However, when presented with the unseen test dataset, traditional ML models perform poorly, while HCat-GNet retains a general ability to accurately predict product absolute stereochemistry and reaction stereoselectivity. Furthermore, HCat-GNet is interpretable, permitting analysis of how ligand substituents determine reaction selectivity. HCat-GNet shows greater potential for catalyst optimization than traditional ML because it can be trained on a non-fixed number of participant molecules and requires only their SMILES to create graph representations.
HCat-GNet allows more general models that accurately extrapolate into unseen regions of chemical space.