Abstract
Polycyclic aromatic systems are highly important to numerous applications, especially to organic electronics and optoelectronics. High-throughput screening and generative models can help to identify new molecules that can advance these technologies but require large amounts of high-quality data, which is expensive to generate. In this report, we present the largest freely available data set of geometries and properties of cata-condensed poly(hetero)cyclic aromatic molecules calculated to date. Our data set contains ~500k molecules comprising 11 types of aromatic and antiaromatic building blocks calculated at the GFN1-xTB level and is representative of a highly diverse chemical space. The methodologies used to enumerate and compute the various structures and their electronic properties (including HOMO-LUMO gap, vertical and adiabatic ionization potential, and electron affinity) are detailed. Additionally, we benchmark the values against a ~50k data set calculated at the CAM-B3LYP-D3BJ/def2-SVP level and develop a fitting scheme to correct the xTB values to higher accuracy. These new data sets represent the second installment in the COMputational database of Polycyclic Aromatic Systems (COMPAS) Project.
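As a rough illustration of the idea of correcting xTB values against a higher-level benchmark subset, the following is a minimal sketch that assumes a simple linear relationship between the two levels of theory. The actual fitting scheme is described in the Supporting Information and may differ. The function names, the linear form, and all numbers below are hypothetical, and the data are synthetic rather than taken from COMPAS-2.

```python
import random


def fit_linear_correction(xtb, ref):
    """Ordinary least-squares fit of ref ~ slope * xtb + intercept (assumed form)."""
    n = len(xtb)
    mean_x = sum(xtb) / n
    mean_y = sum(ref) / n
    sxx = sum((x - mean_x) ** 2 for x in xtb)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xtb, ref))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_correction(xtb, slope, intercept):
    """Map low-level values onto the scale of the reference method."""
    return [slope * x + intercept for x in xtb]


def mean_abs_error(a, b):
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


# Synthetic stand-in for a benchmark subset: a made-up linear relation plus noise.
random.seed(0)
xtb_gap = [random.uniform(1.0, 4.0) for _ in range(200)]
ref_gap = [1.5 * x + 0.8 + random.gauss(0.0, 0.05) for x in xtb_gap]

slope, intercept = fit_linear_correction(xtb_gap, ref_gap)
corrected_gap = apply_correction(xtb_gap, slope, intercept)

mae_before = mean_abs_error(xtb_gap, ref_gap)
mae_after = mean_abs_error(corrected_gap, ref_gap)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"MAE before={mae_before:.3f} after={mae_after:.3f}")
```

In practice, the fit would be obtained on the smaller benchmark set and then applied to the remaining molecules that have only xTB values. A separate fit per property is a natural choice under this assumption.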
Supplementary materials
Title
Supporting Information for COMPAS-2
Description
General computational details, details of the xTB-correction, description of benchmarking procedure, histograms of data distribution, color-coded plots for all studied structural features, further analysis of the effect of sulfur on Etot.
Supplementary weblinks
Title
COMPAS-2 Repository
Description
All data included in the COMPAS-2 data sets. Jupyter notebooks for structure generation and analysis.