Machine learning prediction of electronic molecular excited state properties

18 October 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Accurate knowledge of the electronic properties of molecular excited states is fundamental to understanding the behavior of functional materials for organic electronics and sensors. In this work, we focus on determining the properties of the most intense peak in the electronic absorption spectra of organic molecules. For this purpose, we employed the QM-symex quantum chemistry dataset, which contains approximately 173,000 organic molecules together with time-dependent density functional theory (TD-DFT) data for their first ten electronic absorption transitions; each molecule is specified by its Cartesian coordinates. From the original QM-symex, we built a new dataset, named QM-symex-modif, by converting the molecular Cartesian coordinates into the Simplified Molecular Input Line Entry System (SMILES) format and, for each molecule, selecting the main transition orbitals of the most intense singlet absorption peak together with the corresponding oscillator strength and transition energy. We employed twenty machine learning (ML) algorithms to predict these target properties as well as the highest occupied molecular orbital (HOMO) energies. As inputs for the ML algorithms, we used several chemical descriptors computed with the RDKit toolkit from the SMILES representation of each molecule. The QM-symex-modif dataset significantly improved the accuracy of ML predictions of these key photophysical properties, and low mean absolute errors were obtained for the test set of 45,056 molecules. Additionally, a Shapley additive explanations (SHAP) analysis was carried out to evaluate the importance of the input parameters for the investigated ML models. This analysis revealed several relationships involving the input parameters; in particular, among the many descriptors considered, molecular weight is of great importance in determining the HOMO energies and the transition orbitals.
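The two numerical steps named in the abstract, keeping only the most intense singlet transition out of the ten computed per molecule and scoring predictions by mean absolute error, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the `Transition` record, its field names, and the helpers `most_intense` and `mean_absolute_error` are hypothetical, and breaking ties in oscillator strength toward the lower-energy transition is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Transition:
    """One TD-DFT singlet excitation (hypothetical record layout)."""
    energy_ev: float      # transition (excitation) energy in eV
    osc_strength: float   # oscillator strength f (dimensionless)
    main_orbitals: str    # dominant orbital pair, e.g. "HOMO->LUMO"


def most_intense(transitions: Sequence[Transition]) -> Transition:
    """Return the transition with the largest oscillator strength.

    Assumption: ties are resolved in favour of the lowest-energy transition.
    """
    if not transitions:
        raise ValueError("no transitions given")
    return max(transitions, key=lambda t: (t.osc_strength, -t.energy_ev))


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|y_true_i - y_pred_i|) over paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up values for illustration only; not taken from QM-symex.
    spectrum = [
        Transition(3.10, 0.02, "HOMO->LUMO"),
        Transition(3.85, 0.61, "HOMO-1->LUMO"),
        Transition(4.20, 0.15, "HOMO->LUMO+1"),
    ]
    peak = most_intense(spectrum)
    print(peak.main_orbitals, peak.energy_ev, peak.osc_strength)
    # HOMO-1->LUMO 3.85 0.61
```

In a real pipeline the selected `peak` fields would become the regression targets, and `mean_absolute_error` would be applied to held-out predictions such as the 45,056-molecule test set.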

Keywords

Machine Learning
Organic Electronics
Sensors
Absorption Maximum Peak
Excited State Properties
QM-symex dataset
Simplified Molecular Input Line Entry System (SMILES) Format

Supplementary materials

Supplementary Material: extra information referred to in the text.
