Abstract
Structurally and conformationally diverse databases are needed to train accurate neural networks or kernel-based potentials capable of exploring the complex free energy landscape of flexible functional organic molecules. Curating such databases for species beyond "simple" drug-like compounds or molecules composed of well-defined building blocks (e.g., peptides) is challenging, as it requires thorough mapping of chemical space and evaluation of both chemical and conformational diversity. Here, we introduce the OFF–ON (Organic Fragments From Organocatalysts that are Non-modular) database, a repository of 7,869 equilibrium and 67,457 non-equilibrium geometries of organic compounds and dimers aimed at describing conformationally flexible functional organic molecules, with an emphasis on photoswitchable organocatalysts. We demonstrate the relevance of this database by training a Local Kernel Regression (LKR) model on a low-cost semiempirical baseline and comparing it with a PBE0-D3 reference for several known catalysts, notably on the free energy surfaces of exemplary photoswitchable organocatalysts. Our results show that models trained on the OFF–ON dataset provide reliable predictions of the conformational behavior of virtually any (photoswitchable) organocatalyst or organic compound composed of H, C, N, O, F, and S atoms, thereby opening a computationally feasible route to explore complex free energy surfaces in order to rationalize and predict catalytic behavior.
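Training a kernel model "on a semiempirical baseline" is commonly done as Δ-learning: the model learns only the correction ΔE = E_ref − E_base, and a prediction is the cheap baseline energy plus that correction. The sketch below illustrates this idea with kernel ridge regression and a "local" molecular kernel that sums atom-pair kernels, so that the learned correction decomposes into per-atom contributions. It is a minimal sketch under stated assumptions, not the authors' implementation. The per-atom descriptors, the Gaussian atomic kernel, the parameters `sigma` and `lam`, and all function names are hypothetical. Real descriptors would encode each atom's chemical environment.

```python
import numpy as np

# Hypothetical illustration of Delta-learning with a local kernel.
# Each "molecule" is an (n_atoms, n_features) array of assumed per-atom descriptors.

def local_kernel(A, B, sigma=1.0):
    """Molecular kernel = sum of Gaussian kernels over all atom pairs (a, b).

    Summing atomic kernels makes the prediction a sum of local atomic contributions.
    """
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # (n_atoms_A, n_atoms_B)
    return np.exp(-d2 / (2.0 * sigma**2)).sum()

def gram(mols_a, mols_b, sigma=1.0):
    """Kernel matrix between two lists of molecules."""
    return np.array([[local_kernel(a, b, sigma) for b in mols_b] for a in mols_a])

def fit_delta(train_mols, e_base, e_ref, sigma=1.0, lam=1e-8):
    """Kernel ridge regression weights for the correction Delta = E_ref - E_base."""
    K = gram(train_mols, train_mols, sigma)
    delta = np.asarray(e_ref) - np.asarray(e_base)
    return np.linalg.solve(K + lam * np.eye(len(K)), delta)

def predict(test_mols, train_mols, alpha, e_base_test, sigma=1.0):
    """Corrected energy = cheap baseline energy + learned kernel correction."""
    K = gram(test_mols, train_mols, sigma)
    return np.asarray(e_base_test) + K @ alpha

if __name__ == "__main__":
    # Synthetic toy data only: random descriptors and made-up energies.
    rng = np.random.default_rng(0)
    mols = [rng.normal(size=(3, 4)) for _ in range(6)]
    e_base = rng.normal(size=6)                          # stand-in for semiempirical energies
    e_ref = e_base + np.array([m.sum() for m in mols])   # stand-in for reference energies
    alpha = fit_delta(mols, e_base, e_ref, sigma=0.5, lam=1e-10)
    print(predict(mols, mols, alpha, e_base, sigma=0.5) - e_ref)
```

The baseline carries most of the physics, so the kernel model only has to learn a smaller and smoother correction. This is the usual motivation for pairing a low-cost method with a learned correction.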
Supplementary materials
Supporting Information
Description of the database curation (including the functional groups covered and SMILES strings of the structures), assessment of the accuracy of the machine-learning potential, analysis of local atomic contributions to the LKR–OMP correction, and convergence of the free energy profiles.