Abstract
Graphs are one of the most natural and powerful representations available for molecules: natural because they correspond intuitively to skeletal formulas, the language used by chemists worldwide, and powerful because they are highly expressive both globally (molecular topology) and locally (atom and bond properties). Graph kernels transform molecular graphs into fixed-length vectors which, because they can measure similarity, can be used as fingerprints for machine learning (ML). To date, graph kernels have mostly focused on the atomic nodes of the graph. In this work, we developed a graph kernel based on atom–atom, bond–bond, and bond–atom (AABBA) autocorrelations. The resulting vector representations were tested on regression ML tasks on a dataset of transition metal complexes, a benchmark motivated by the higher complexity of these compounds relative to organic molecules. In particular, we tested different flavors of the AABBA kernel in the prediction of the energy barriers and bond distances of the Vaska’s complex dataset (Friederich et al., Chem. Sci., 2020, 11, 4584). For a variety of ML models, including neural networks, gradient boosting machines, and Gaussian processes, we showed that AABBA outperforms the baseline that includes only atom–atom autocorrelations. Dimensionality reduction studies also showed that the bond–bond and bond–atom autocorrelations yield many of the most relevant features. We believe that the AABBA graph kernel can accelerate the exploration of large chemical spaces and inspire novel molecular representations in which both atomic and bond properties play an important role.
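To make the idea of a graph autocorrelation concrete, the sketch below computes a generic product autocorrelation: for each topological depth d, it sums the products of a property over all pairs of nodes separated by d edges. This is an illustrative assumption, not the AABBA definition itself, whose equations are given in the Supporting Information. The function names (`bfs_distances`, `autocorrelation`, `line_graph`), the toy three-atom chain, and the choice of atomic number and bond order as properties are all hypothetical. Bond–bond terms are obtained here by applying the same routine to the line graph, in which bonds become nodes; the bond–atom term is omitted because its construction cannot be inferred from the abstract alone.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, start):
    """Topological (shortest-path) distance from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def autocorrelation(adj, prop, max_depth):
    """Sum prop[i] * prop[j] over unordered node pairs at each depth 0..max_depth.

    Generic product autocorrelation (assumed form); depth 0 pairs each node
    with itself.
    """
    vec = [0.0] * (max_depth + 1)
    for i in adj:
        for j, d in bfs_distances(adj, i).items():
            if j >= i and d <= max_depth:  # count each unordered pair once
                vec[d] += prop[i] * prop[j]
    return vec


def line_graph(edges):
    """Adjacency of the line graph: two bonds are neighbors if they share an atom."""
    adj = {e: set() for e in edges}
    for e1, e2 in combinations(edges, 2):
        if set(e1) & set(e2):
            adj[e1].add(e2)
            adj[e2].add(e1)
    return adj


# Toy example (hypothetical): a linear O=C=O chain, atoms indexed 0-1-2.
edges = [(0, 1), (1, 2)]
atom_adj = {0: {1}, 1: {0, 2}, 2: {1}}
atomic_number = {0: 8, 1: 6, 2: 8}          # atom property
bond_order = {(0, 1): 2, (1, 2): 2}         # bond property

aa = autocorrelation(atom_adj, atomic_number, max_depth=2)       # atom–atom
bb = autocorrelation(line_graph(edges), bond_order, max_depth=2)  # bond–bond
fingerprint = aa + bb  # fixed-length vector for a fixed max_depth
print(fingerprint)
```

Because the vector length depends only on `max_depth` and the number of properties, molecules of different sizes map to vectors of the same length, which is what allows such representations to serve as ML fingerprints.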
Supplementary materials
Title
Supporting Information
Description
Detailed description of the AABBA graph kernels, including the underlying equations, dimensionality of the resulting vectors, definition of the metal-centered edge origin, systematic lists of generic and NBO properties, metal-centered depth distribution, computational details of the NN, GBM, and GP models, and further information on feature relevance and dimensionality reduction.
Supplementary weblinks
Title
Code and graph data
Description
Open code of the AABBA graph kernels, as well as u-NatQG graphs of the Vaska’s complex dataset, the associated AABBA vectors, and the code of all ML models.