Machine Learning-Guided Synthesis of Prospective Organic Molecular Materials: An Algorithm with Latent Variables for Understanding and Predicting Experimentally Unobservable Reactions

18 March 2025, Version 8
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Chemists have traditionally relied on heuristic approaches to qualitatively assess chemical structure-property relationships and interpret experimental outcomes. However, these methods are inherently limited in handling large volumes of data and integrating them effectively into experimental planning. Understanding the interrelationships among different substitution patterns of organic molecular materials is crucial for optimizing synthetic conditions and expanding their applicability. In this study, we developed a machine learning (ML) algorithm incorporating latent variables to predict unobservable reactions and synthetic conditions for a class of organic materials, perfluoro-iodinated naphthalene derivatives. The algorithm accurately estimated substitution pattern relationships and reaction yields, and its predictions were experimentally validated with high-yield outcomes. Our findings reveal that latent variables effectively capture underlying physicochemical relationships, achieving an R² value >0.99. This approach establishes an ML-guided framework that complements heuristic decision-making in chemistry and optimizes synthetic processes for the target molecule in an extrapolative manner. Further applications of this algorithm will focus on synthetic design and physicochemical property prediction, particularly for catalyst discovery and organic semiconductor optimization.
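For readers less familiar with the metric quoted in the abstract, the coefficient of determination (R²) compares the squared prediction errors with the spread of the observed values around their mean. The sketch below is a minimal illustration under stated assumptions, not the authors' evaluation code: the function name `r_squared` and the yield values are hypothetical placeholders invented for this example and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Placeholder yields in percent, invented for illustration only;
# these are NOT results from the preprint.
observed_yields = [10.0, 35.0, 60.0, 85.0]
predicted_yields = [12.0, 33.0, 61.0, 84.0]

score = r_squared(observed_yields, predicted_yields)
print(f"R^2 = {score:.4f}")
```

A value close to 1 means the predictions account for nearly all of the variance in the observed values; on its own it does not indicate how well a model extrapolates beyond the data it was scored on.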

Keywords

Machine-learning
Latent variables
NBO charges
Reaction prediction
Extrapolation
Perfluoro-iodonaphthalenes

Supplementary materials

Supporting Information
1. General information
2. Synthesis and characterization of substrate
3. Preparation of magnesium amide bases
4. Iodination reaction of polyfluoronaphthalenes
5. Computational studies
6. References
Appendix 1. Cartesian coordinates
Appendix 2. Details of predicted yields
Appendix 3. List of algorithms
Appendix 4. List of descriptors
