From Organic Fragments to Photoswitchable Catalysts: The OFF-ON Structural Repository for Transferable Kernel-based Potentials

06 December 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Structurally and conformationally diverse databases are needed to train accurate neural network or kernel-based potentials capable of exploring the complex free energy landscape of flexible functional organic molecules. Curating such databases for species beyond “simple” drug-like compounds or molecules composed of well-defined building blocks (e.g., peptides) is challenging, as it requires thorough chemical space mapping and evaluation of both chemical and conformational diversity. Here, we introduce the OFF–ON (Organic Fragments From Organocatalysts that are Non-modular) database, a repository of 7,869 equilibrium and 67,457 non-equilibrium geometries of organic compounds and dimers aimed at describing conformationally flexible functional organic molecules, with an emphasis on photoswitchable organocatalysts. The relevance of this database is then demonstrated by training a Local Kernel Regression model on a low-cost semiempirical baseline and comparing it with a PBE0-D3 reference for several known catalysts, notably for the free energy surfaces of exemplary photoswitchable organocatalysts. Our results demonstrate that the OFF–ON dataset offers reliable predictions for simulating the conformational behavior of virtually any (photoswitchable) organocatalyst or organic compound composed of H, C, N, O, F, and S atoms, thereby opening a computationally feasible route to explore complex free energy surfaces in order to rationalize and predict catalytic behavior.
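
The idea of training a kernel model on top of a low-cost baseline, read here as learning a correction from the baseline toward a higher-level reference, can be illustrated with a minimal sketch. It uses a global Gaussian kernel ridge regression on a one-dimensional toy coordinate rather than the Local Kernel Regression model of this work, and the function names, the toy energy functions, and the hyperparameters `sigma` and `lam` are all assumptions chosen for illustration only.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Gaussian kernel between every row of A and every row of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_delta_model(X, e_base, e_ref, sigma, lam):
    # Kernel ridge regression on the difference (reference - baseline)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), e_ref - e_base)

def predict_corrected(X_new, e_base_new, X_train, alpha, sigma):
    # Corrected energy = baseline energy + learned correction
    return e_base_new + gaussian_kernel(X_new, X_train, sigma) @ alpha

# Hypothetical toy "energies" along one coordinate (not real chemistry)
def toy_baseline(x):
    return x[:, 0] ** 2

def toy_reference(x):
    return x[:, 0] ** 2 + 0.3 * np.sin(2.0 * x[:, 0])

sigma, lam = 0.5, 1e-6  # assumed hyperparameters, not tuned
X_train = np.linspace(-2.0, 2.0, 25).reshape(-1, 1)
alpha = fit_delta_model(X_train, toy_baseline(X_train),
                        toy_reference(X_train), sigma, lam)

X_test = np.linspace(-1.5, 1.5, 7).reshape(-1, 1)
e_pred = predict_corrected(X_test, toy_baseline(X_test), X_train, alpha, sigma)

# Largest deviation from the toy reference, with and without the correction
max_err = np.max(np.abs(e_pred - toy_reference(X_test)))
base_err = np.max(np.abs(toy_baseline(X_test) - toy_reference(X_test)))
print(f"baseline error: {base_err:.3f}, corrected error: {max_err:.3e}")
```

Learning only the difference between the two levels of theory keeps the regression target small and smooth, which is the usual motivation for such baseline-plus-correction schemes. A production model would replace the toy coordinate with atomic-environment descriptors and the toy functions with computed energies.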

Keywords

machine learning potential
database curation
free energy surfaces
photoswitchable organocatalysis

Supplementary materials

Title: Supporting Information
Description: Description of database curation, including functional groups included and SMILES strings of structures, assessments of accuracy of the machine-learning potential, analysis of local atomic contributions for the LKR–OMP correction, and convergence of the free energy profiles.
