COMPAS-4: a Dataset of (BN)-Substituted Cata-Condensed Polybenzenoid Hydrocarbons – Data Analysis and Feature Engineering

20 March 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Incorporation of a BN pair into polycyclic aromatic hydrocarbons is a common approach for modulating their electronic properties. However, a conceptual and quantitative framework rationalizing the observed effects has not been developed, and general structure-property relationships remain elusive. In this work, we perform a data-driven investigation that leads to concrete principles for rational design of (BN)1-PBHs with targeted properties. We construct a new chemical database, COMPAS-4, which contains the geometries and properties of all possible (BN)1-PBH isomers up to 6 rings, calculated at both the GFN1-xTB and DFT (CAM-B3LYP/def2-SVP) levels of theory. We investigate the influence of BN-substitution on various molecular properties, including their molecular orbital energies and aromaticity, and define specific structural features that determine these properties. Notably, all of these features are chemically intuitive and simple to extract from the structure of the molecule, without any prior computation. We find that the most influential feature is the number of rings whose cyclic delocalization is disturbed as a result of the substitution.

Keywords

polycyclic aromatic hydrocarbons
Boron nitrogen doped
feature engineering
chemical dataset
high-throughput computations

Supplementary materials

Title: Supporting Information
Description: Additional analysis, including: three more properties, additional regression models, overview of the datasets, comparison of xTB and DFT.
