Virtual Variables-Enabled Generation of Datasets for Prediction in Organic Synthesis: Digitalization of Small Molecules and Its Application to Functional Molecule Syntheses

05 September 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

In machine learning, large datasets are typically required to improve estimation accuracy. However, generating such datasets through traditional organic synthesis experiments is extremely challenging. Hence, there is strong demand for methods that predict outcomes accurately from limited amounts of data. In this study, we report a molecular technology based on generative artificial intelligence that generates data for unexplored conditions and establishes the most suitable relationships between different small molecules using virtual variables. Specifically, our approach reveals the relationships among three structurally different small molecules and represents them as virtual variables, which are then used to propose reaction conditions for synthesizing target molecules in high yields. We demonstrated the utility of this approach in small-molecule synthesis, using the iodination of polyfluoronaphthalenes as a case study. The resulting iodinated perfluoronaphthalenes have recently garnered significant attention as promising candidates for functional molecules. By computationally generating data that would be inaccessible through realistic reaction experiments, we successfully optimized the reaction conditions in 8 additional experiments. Moreover, we succeeded in depicting the virtual variables as electrostatic potentials using density functional theory calculations and in representing them as physicochemical indices. This study introduces a novel application of machine learning as a molecular technology and contributes to small-molecule synthesis in molecular science, using fewer than 100 data points from organic synthesis experiments.

Keywords

Machine learning
Small molecule synthesis
Small data
Prediction of reaction conditions
In silico data generation
