Abstract
Synthetic yield prediction using machine learning is intensively studied. While previous work focused on an ideal use case, high-throughput experimentation datasets, predicting yields from literature data remains elusive. We built a large literature-based dataset of more than a thousand reactions, focusing on the activation of carbon-oxygen bonds of phenol derivatives under nickel catalysis. Detailed reaction conditions and associated yields were manually curated and stored in an open-access database. We assessed the performance of state-of-the-art machine learning models on this dataset and explored their ability to make predictions for novel publications, coupling partners, and substrates. Our work shows that machine learning can have practical applications on well-designed yield prediction tasks, and it provides a unique public database for further improvement of these methods when adapted to literature chemical data.
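Evaluating on novel publications, coupling partners, or substrates implies that every reaction sharing a given publication (or partner, or substrate) is held out together, so that nothing from the held-out group leaks into training. The following is a minimal sketch of such a group-wise split using only the Python standard library. It is not the authors' code: the function name `split_by_group`, the record fields (`doi`, `substrate`, `yield`), and the example values are hypothetical placeholders chosen for illustration.

```python
import random
from collections import defaultdict


def split_by_group(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    `records` is a list of dicts; `group_key` names the field that defines
    a group (for example a publication identifier or a substrate).
    """
    # Collect records by group value.
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    # Shuffle group identifiers reproducibly, then hold out whole groups.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(test_fraction * len(group_ids)))
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


# Hypothetical toy records; the values are placeholders, not NiCOlit data.
reactions = [
    {"doi": "pub-A", "substrate": "s1", "yield": 82.0},
    {"doi": "pub-A", "substrate": "s2", "yield": 45.0},
    {"doi": "pub-B", "substrate": "s1", "yield": 10.0},
    {"doi": "pub-C", "substrate": "s3", "yield": 67.0},
    {"doi": "pub-D", "substrate": "s2", "yield": 93.0},
]

train, test = split_by_group(reactions, "doi", test_fraction=0.25)
```

Passing `"substrate"` instead of `"doi"` as `group_key` gives the analogous split for unseen substrates. A random row-wise split would instead place reactions from the same publication on both sides and would tend to overestimate performance on genuinely new work.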
Supplementary materials
Title
Supplementary Information
Description
Details on the code and the methods used to train the model and featurize the data. Additional information supporting the main manuscript.
Supplementary weblinks
Title
NiCOlit code and data
Description
The NiCOlit dataset is available.
The code used to generate the results is available.