Abstract
Graphs are one of the most natural and powerful representations available for molecules: natural because they correspond intuitively to skeletal formulas, the language used by chemists worldwide, and powerful because they are highly expressive both globally (molecular topology) and locally (atomic properties). Graph kernels transform molecular graphs into fixed-length vectors that can be used as fingerprints in machine learning (ML) models. To date, such kernels have mostly focused on the atomic nodes of the graph. In this work, we developed an extended graph kernel that computes atom–atom, bond–bond, and bond–atom (AABBA) autocorrelations. The resulting AABBA representations were evaluated on a transition metal complex benchmark, chosen because these compounds are more complex than organic molecules. In particular, we tested different flavors of the AABBA kernel in predicting the energy barriers and bond distances of the Vaska’s complex dataset (Friederich et al., Chem. Sci., 2020, 11, 4584). For a variety of ML models, including neural networks, gradient boosting machines, and Gaussian processes, we showed that AABBA outperforms the baseline that includes only atom–atom autocorrelations. Dimensionality reduction studies also showed that many of the most relevant features come from the bond–bond and bond–atom autocorrelations. We believe that the AABBA graph kernel can accelerate the discovery of chemical compounds and inspire novel molecular representations in which both atomic and bond properties play an important role.
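The sketch below makes the three kinds of autocorrelation concrete. It is a minimal example, not the authors' implementation, and it rests on several illustrative assumptions.

- Each term uses the classic autocorrelation form. Products of node properties are summed over all ordered node pairs at topological distance d, for d = 0 … max_depth.
- Bond–bond distances are taken on the line graph, where two bonds are neighbors if they share an atom.
- The bond–atom distance is defined, hypothetically, as the distance from the atom to the nearer end of the bond.

The function name `aabba_vector` and the single scalar property per atom and per bond are also hypothetical. The published kernel may, for example, use metal-centered sums and several properties per node.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Topological (shortest-path) distances from `source`; None if unreachable."""
    dist = [None] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def aabba_vector(atom_props, bonds, bond_props, max_depth):
    """Hypothetical AABBA-style vector: [AA_0..AA_D, BB_0..BB_D, BA_0..BA_D]."""
    n_atoms, n_bonds = len(atom_props), len(bonds)

    # Atom graph adjacency and all-pairs atom-atom distances.
    atom_adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        atom_adj[a].append(b)
        atom_adj[b].append(a)
    atom_dist = [bfs_distances(atom_adj, i) for i in range(n_atoms)]

    # Line graph: two bonds are neighbors if they share an atom.
    bond_adj = [[] for _ in range(n_bonds)]
    for k, l in combinations(range(n_bonds), 2):
        if set(bonds[k]) & set(bonds[l]):
            bond_adj[k].append(l)
            bond_adj[l].append(k)
    bond_dist = [bfs_distances(bond_adj, k) for k in range(n_bonds)]

    aa = [0.0] * (max_depth + 1)
    bb = [0.0] * (max_depth + 1)
    ba = [0.0] * (max_depth + 1)

    # Atom-atom: sum of P_i * P_j over ordered atom pairs at distance d.
    for i in range(n_atoms):
        for j in range(n_atoms):
            d = atom_dist[i][j]
            if d is not None and d <= max_depth:
                aa[d] += atom_props[i] * atom_props[j]

    # Bond-bond: same sum over ordered bond pairs, using line-graph distances.
    for k in range(n_bonds):
        for l in range(n_bonds):
            d = bond_dist[k][l]
            if d is not None and d <= max_depth:
                bb[d] += bond_props[k] * bond_props[l]

    # Bond-atom: assumed distance = distance from the atom to the nearer bond end.
    for k, (a, b) in enumerate(bonds):
        for i in range(n_atoms):
            if atom_dist[a][i] is None:
                continue  # atom lies in a different connected component
            d = min(atom_dist[a][i], atom_dist[b][i])
            if d <= max_depth:
                ba[d] += bond_props[k] * atom_props[i]

    return aa + bb + ba


# Toy chain 0-1-2 with one scalar property per atom and per bond.
vec = aabba_vector([1.0, 2.0, 3.0], [(0, 1), (1, 2)], [1.0, 1.0], max_depth=2)
print(vec)  # [14.0, 16.0, 6.0, 2.0, 2.0, 0.0, 8.0, 4.0, 0.0]
```

Under these assumptions, the output has a fixed length of 3 × (max_depth + 1) for any graph. Keeping only the first block gives the atom–atom-only baseline, while the other two blocks add the bond–bond and bond–atom features.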
Supplementary materials
Supporting Information
The Supporting Information provides further information about the maximal metal-centered depths, computational details of the NN, GBM, and GP models, and additional details about feature relevance and dimensionality reduction.
Supplementary weblinks
Code and graph data
The code of the AABBA graph kernels is openly available, as are the u-NatQG graphs of the Vaska’s complex dataset, the associated AABBA vectors, and the code of all ML models.