Abstract
We present two open-source datasets that provide time-dependent density-functional tight-binding (TD-DFTB) electronic excitation spectra of organic molecules. These datasets represent predictions of UV-vis absorption spectra performed on optimized geometries of the molecules in their electronic ground state. The GDB-9-Ex dataset contains a subset of 96,766 organic molecules from the original open-source GDB-9 dataset. The ORNL_AISD-Ex dataset was created from GDB-9 molecular structures using a generative algorithm and consists of 10,502,904 organic molecules that contain between 5 and 71 non-hydrogen atoms. The data quantitatively reveal a close correlation between the magnitude of the gap between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) and the excitation energy of the lowest singlet excited state. The chemical variability of the large number of molecules was examined with a topological fingerprint estimation based on extended-connectivity fingerprints (ECFPs), followed by uniform manifold approximation and projection (UMAP) for dimension reduction. Both datasets were generated using a high-throughput workflow that used the DFTB+ software on the Andes cluster of the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL).
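As a rough illustration of how the correlation between HOMO-LUMO gaps and lowest singlet excitation energies could be quantified, the following minimal Python sketch computes a Pearson correlation coefficient. The abstract does not state which statistic was used, so the choice of Pearson's r is an assumption, as are the function name `pearson_r` and the variable names. The numerical values are hypothetical placeholders and are not taken from either dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values in eV, NOT taken from GDB-9-Ex or ORNL_AISD-Ex.
homo_lumo_gaps_ev = [3.0, 4.0, 5.0, 6.0]
lowest_singlet_ev = [2.6, 3.5, 4.6, 5.5]

r = pearson_r(homo_lumo_gaps_ev, lowest_singlet_ev)
print(f"Pearson r = {r:.3f}")
```

With real data, the two lists would be replaced by the per-molecule HOMO-LUMO gaps and lowest singlet excitation energies read from the dataset files; a value of r close to 1 would indicate the close linear correlation described above.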
Supplementary weblinks
GDB-9-Ex: Quantum chemical prediction of UV/Vis absorption spectra for GDB-9 molecules
ORNL_AISD-Ex: Quantum chemical prediction of UV/Vis absorption spectra for over 10 million organic molecules