Abstract
The development of functional organic fluorescent materials calls for fast and
accurate predictions of photophysical parameters for applications such as
high-throughput virtual screening, yet the limitations of quantum mechanical
calculations make this task challenging. We establish a database covering
>4,300 solvated organic fluorescent dyes and develop a new machine learning (ML)
approach for efficient and accurate prediction of emission wavelength and
photoluminescence quantum yield (PLQY). Our feature engineering has given rise
to the Functionalized Structure Descriptor (FSD) and the Comprehensive General
Solvent Descriptor (CGSD), which enable a highly automated, black-box
computational framework with consistently good accuracy across different dye
families, the ability to describe substitution and solvent effects, efficiency
for large-scale predictions, and compatibility with on-the-fly learning.
Evaluations on unseen molecules give a mean absolute error (MAE) of 0.13 for
PLQY and 0.080 eV for emission energy, the latter comparable to time-dependent density functional
theory (TD-DFT) calculations. An online platform built on the ensemble model
provides predictions in various solvents (https://www.chemfluor.top/). Our
statistical learning methodology will complement quantum mechanical
calculations as an efficient alternative approach for the prediction of these
parameters.
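The emission error above is quoted in energy units (eV) rather than wavelength. The short Python sketch below illustrates, under stated assumptions, how such a figure can be obtained: emission maxima in nm are converted to photon energies with E = hc/λ (hc ≈ 1239.84 eV·nm), and the MAE is then averaged over paired predictions and measurements. The wavelength values and function names are hypothetical and for illustration only. They are not taken from the database or the authors' code.

```python
# Minimal sketch (assumption: not the authors' implementation).
# Converts emission wavelengths (nm) to photon energies (eV) and computes
# the mean absolute error (MAE) between predicted and observed values.

HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm (approximate)


def wavelength_to_ev(wavelength_nm: float) -> float:
    """Convert a photon wavelength in nm to its energy in eV via E = hc / lambda."""
    return HC_EV_NM / wavelength_nm


def mean_absolute_error(predicted, observed) -> float:
    """Average of |predicted - observed| over paired values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical experimental and predicted emission maxima (nm), illustration only
observed_nm = [450.0, 520.0, 610.0]
predicted_nm = [460.0, 515.0, 590.0]

observed_ev = [wavelength_to_ev(w) for w in observed_nm]
predicted_ev = [wavelength_to_ev(w) for w in predicted_nm]

mae_ev = mean_absolute_error(predicted_ev, observed_ev)
print(f"MAE (emission energy): {mae_ev:.3f} eV")
```

Because E is inversely proportional to λ, a fixed error in nm corresponds to a larger energy error at short wavelengths. Averaging in eV places dyes across the spectrum on a common energy scale. The same `mean_absolute_error` helper applies directly to PLQY, which is a dimensionless fraction between 0 and 1.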