Abstract
In machine learning, large datasets are typically required to improve estimation accuracy. However, generating such datasets through traditional organic synthesis experiments is extremely challenging. Hence, there is a strong demand for methods that enable accurate prediction of outcomes from limited amounts of data. In this study, we report a molecular technology based on generative artificial intelligence that generates data for unexplored conditions and establishes the most suitable relationships between different small molecules using virtual variables. Specifically, our approach reveals the relationships among three structurally different small molecules and represents them as virtual variables, which are then used to propose reaction conditions for synthesizing target molecules in high yields. We demonstrate the utility of this approach in small-molecule synthesis, using the iodination of polyfluoronaphthalenes as a case study. The resulting iodinated perfluoronaphthalenes have recently garnered significant attention as promising candidates for functional molecules. By computationally generating data that would be inaccessible through realistic reaction experiments, we successfully optimized the reaction conditions in eight additional experiments. Moreover, we succeeded in depicting the virtual variables as electrostatic potentials using density functional theory calculations and in representing them as physicochemical indices. This study introduces a novel application of machine learning as a molecular technology and contributes to small-molecule synthesis in molecular science, using fewer than 100 data points from organic synthesis experiments.