Abstract
Accurate knowledge of the electronic properties of molecular excited states is fundamental for understanding the behavior of functional materials for organic electronics and sensors. In this work, we focus on determining the properties of the most intense peak in the electronic absorption spectra of organic molecules. For this purpose, we employed the quantum chemistry dataset QM-symex, which contains approximately 173,000 organic molecules together with time-dependent density functional theory (TD-DFT) data for the first ten electronic absorption transitions. Each molecule is specified by its Cartesian coordinates. From the original QM-symex, we built a new dataset named QM-symex-modif by converting the molecular Cartesian coordinates into the Simplified Molecular Input Line Entry System (SMILES) format and by selecting, for the most intense singlet absorption peak, the main transition orbitals together with the corresponding oscillator strength and transition energy. We employed twenty machine learning (ML) algorithms to predict these target properties as well as the highest occupied molecular orbitals (HOMOs). As inputs for the ML algorithms, we used several chemical descriptors generated with RDKit from the SMILES string of each molecule. The QM-symex-modif dataset significantly improved the accuracy of ML predictions of these key photophysical properties, and low mean absolute errors were obtained on the test set of 45,056 molecules. Additionally, a Shapley additive explanations (SHAP) analysis was carried out to evaluate the importance of the input descriptors for the investigated ML models. The analysis revealed several relationships involving the input descriptors. In particular, molecular weight stands out among the descriptors as highly important in determining HOMO values and the transition orbitals.
Supplementary materials
Title
Supplementary Material
Description
Additional information referred to in the text.
Supplementary weblinks
Title
Group Github
Description
The source code of this work, machine learning model parameters, input files, SHAP values, and output examples are available in our GitHub repository.