The AABBA Graph Kernel: Atom–Atom, Bond–Bond, and Bond–Atom Autocorrelations for Machine Learning

02 September 2024, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Graphs are among the most natural and powerful representations available for molecules: natural because they correspond intuitively to skeletal formulas, the language used by chemists worldwide, and powerful because they are highly expressive both globally (molecular topology) and locally (atom and bond properties). Graph kernels transform molecular graphs into fixed-length vectors which, because they can measure similarity, can be used as fingerprints for machine learning (ML). To date, graph kernels have mostly focused on the atomic nodes of the graph. In this work, we developed a graph kernel based on atom–atom, bond–bond, and bond–atom (AABBA) autocorrelations. The resulting vector representations were tested on regression ML tasks on a dataset of transition metal complexes, a benchmark motivated by the higher complexity of these compounds relative to organic molecules. In particular, we tested different flavors of the AABBA kernel in the prediction of the energy barriers and bond distances of the Vaska’s complex dataset (Friederich et al., Chem. Sci., 2020, 11, 4584). For a variety of ML models, including neural networks, gradient boosting machines, and Gaussian processes, we showed that AABBA outperforms the baseline that includes only atom–atom autocorrelations. Dimensionality reduction studies also showed that the bond–bond and bond–atom autocorrelations yield many of the most relevant features. We believe that the AABBA graph kernel can accelerate the exploration of large chemical spaces and inspire novel molecular representations in which both atomic and bond properties play an important role.
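
To make the idea of an autocorrelation fingerprint concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the common product form in which the depth-d entry sums p_i * p_j over ordered pairs of nodes at graph distance d. The bond–bond variant shown here treats bonds as nodes of a line graph (two bonds are adjacent if they share an atom), which is an illustrative assumption; the actual AABBA definitions are given in the Supporting Information. All function names, the toy three-atom chain, and the property values are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency):
    """Breadth-first-search distances between all node pairs (unweighted graph)."""
    dist = {}
    for source in adjacency:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen[nbr] = seen[node] + 1
                    queue.append(nbr)
        dist[source] = seen
    return dist


def autocorrelation(adjacency, prop, max_depth):
    """Return [AC_0, ..., AC_max_depth].

    AC_d sums prop[i] * prop[j] over ordered pairs (i, j) at distance d.
    Counting ordered pairs is an assumed convention.
    """
    dist = shortest_path_lengths(adjacency)
    vec = [0.0] * (max_depth + 1)
    for i in adjacency:
        for j, d in dist[i].items():
            if d <= max_depth:
                vec[d] += prop[i] * prop[j]
    return vec


def line_graph(bonds):
    """Assumed bond graph: bonds are nodes, adjacent if they share an atom."""
    adj = {bond: set() for bond in bonds}
    for a, b in combinations(bonds, 2):
        if set(a) & set(b):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Toy chain 0-1-2 with made-up atom and bond property values.
atoms = {0: {1}, 1: {0, 2}, 2: {1}}
atom_prop = {0: 1.0, 1: 2.0, 2: 3.0}
bonds = [(0, 1), (1, 2)]
bond_prop = {(0, 1): 1.0, (1, 2): 2.0}

atom_vec = autocorrelation(atoms, atom_prop, max_depth=2)
bond_vec = autocorrelation(line_graph(bonds), bond_prop, max_depth=1)
fingerprint = atom_vec + bond_vec  # fixed-length vector for an ML model
print(fingerprint)  # [14.0, 16.0, 6.0, 5.0, 4.0]
```

Concatenating the atom-based and bond-based vectors gives a fixed-length fingerprint whose size depends only on the number of properties and the maximum depth, not on the size of the molecule.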

Keywords

graph kernel
metal complexes
autocorrelation
neural networks
gradient boosting
Gaussian processes
feature engineering
feature selection
dimensionality reduction
molecular graphs
property prediction

Supplementary materials

Title: Supporting Information
Description: Detailed description of the AABBA graph kernels, including the underlying equations, dimensionality of the resulting vectors, definition of the metal-centered edge origin, systematic lists of generic and NBO properties, metal-centered depth distribution, computational details of the NN, GBM, and GP models, and further information on feature relevance and dimensionality reduction.
