Abstract
Predicting chemical reaction outcomes using machine learning (ML) has emerged as a powerful tool for advancing materials synthesis. However, this approach requires large and diverse datasets, which are extremely limited in the field of nanomaterials synthesis due to inconsistent and non-standardized reporting in the literature and a lack of understanding of synthetic mechanisms. In this study, we extracted synthesis parameters for InP quantum dots (QDs) as inputs and the resultant properties (absorption, emission, diameter) as outputs from 72 publications. We “filled in” missing outputs using a data imputation method to prepare a complete dataset containing 216 entries for training and testing predictive ML models. To explore the best approach for categorizing input features, we defined the descriptor space in two ways (condensed and extended), based on either the chemical identity or the role of the reagents. With our best ML model, we achieved mean absolute errors (MAEs) as low as 20.29, 11.46, and 0.33 nm for absorption, emission, and diameter, respectively, across diverse synthetic methods. We used these models to deploy an accessible and interactive webapp for designing syntheses of InP (https://share.streamlit.io/cossairt-lab/indium-phosphide/Hot_injection/hot_injection_prediction.py). Using this webapp, we investigated the power of ML to uncover chemical trends in InP syntheses, such as the effects of common additives like zinc salts and trioctylphosphine. We also designed and conducted new experiments based on extensions of literature procedures and compared our experimentally measured properties to the predictions, thus evaluating the “real-life” accuracy of our models. Conversely, we used inverse design to obtain InP QDs with specific properties. Finally, we applied the same approach to train, test, and launch predictive models for CdSe QDs by expanding a previously published dataset.
Altogether, our data pre-processing method and ML implementations demonstrate the ability to design materials with targeted properties and explore underlying reaction mechanisms even when faced with limited data resources.
Supplementary materials
Supplementary information: Additional details for data acquisition, data imputation, Pearson correlation, datasets, code files, machine learning modeling, and experimental methods.