Abstract
Chemical databases are an essential tool for data-driven investigation of structure-property relationships and design of novel functional compounds. We introduce the first phase of the COMPAS Project – a COMputational database of Polycyclic Aromatic Systems. In this phase, we have developed two datasets containing the optimized ground-state structures and a selection of molecular properties of 34k and 9k cata-condensed polybenzenoid hydrocarbons (at the GFN2-xTB and B3LYP-D3BJ/def2-SVP levels, respectively), and have placed them in the public domain. Herein we describe the process of the dataset generation, detail the information available within the datasets, and show the fundamental features of the generated data. We analyze the correlation between the two types of computation as well as the structure-property relationships of the calculated species. The data and the insights gained from them can inform rational design of novel functional aromatic molecules for use in, e.g., organic electronics, and can provide a basis for additional data-driven machine- and deep-learning studies in chemistry.
Supplementary materials
Title
Supporting Information for COMPAS_Phase1
Description
General computational details, description of benchmarking procedure, histograms of data distribution, color-coded plots for all studied structural features, further analysis on D3 versus D4 corrections.
Supplementary weblinks
Title
Repository of COMPAS
Description
Freely accessible repository of the COMPAS database.