Abstract
This work proposes a state-of-the-art hybrid kernel for calculating molecular similarity. Combined with Gaussian process (GP) models, the hybrid kernel predicts molecular properties with an accuracy comparable to that of the Directed Message Passing Neural Network (D-MPNN). The hybrid kernel consists of a marginalized graph kernel (MGK) and a radial basis function (RBF) kernel, which operate on molecular graphs and global molecular features, respectively. Bayesian optimization was used to obtain the optimal hyperparameters for both models. The comparisons are performed on 11 publicly available data sets. Our results show that the predictions of the two models are correlated and similarly accurate, and that an ensemble of both models performs better than either model alone. Through principal component analysis, we found that the features extracted by the hybrid kernel are similar to those extracted by D-MPNN. The advantage of D-MPNN lies in its computational efficiency, while the advantage of the graph kernel models lies in their inherent and accurate uncertainty quantification. All code for the graph kernel machines used in this work is available at https://github.com/Xiangyan93/Chem-Graph-Kernel-Machine.
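To make the hybrid-kernel idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the MGK Gram matrix has already been computed and that the graph and RBF kernels are combined by an elementwise product; the abstract does not state the combination rule. A normalized cosine-similarity matrix on random vectors stands in for the MGK, and the global features and targets are random placeholders. The function names (`rbf_kernel`, `hybrid_kernel`, `gp_predict`) are hypothetical and do not come from the Chem-Graph-Kernel-Machine repository. The GP posterior standard deviation illustrates the built-in uncertainty estimate.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    # Squared Euclidean distances between every row of X and every row of Y
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def hybrid_kernel(K_graph, X, Y, length_scale=1.0):
    # Assumption: graph similarity and feature similarity are multiplied elementwise
    return K_graph * rbf_kernel(X, Y, length_scale)

def gp_predict(K_train, K_cross, k_test_diag, y_train, noise=1e-2):
    # Standard GP regression posterior computed through a Cholesky factorization
    L = np.linalg.cholesky(K_train + noise * np.eye(len(y_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_cross @ alpha
    v = np.linalg.solve(L, K_cross.T)
    var = k_test_diag - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(0)
n_train, n_test = 20, 5
n_all = n_train + n_test

# Hypothetical stand-in for a normalized MGK: cosine similarity of random vectors
G = rng.random((n_all, 8))
G /= np.linalg.norm(G, axis=1, keepdims=True)
K_graph_all = G @ G.T

X_all = rng.normal(size=(n_all, 3))   # hypothetical global molecular features
y_train = rng.normal(size=n_train)    # hypothetical property values

K_all = hybrid_kernel(K_graph_all, X_all, X_all)
tr, te = slice(0, n_train), slice(n_train, None)
mean, std = gp_predict(K_all[tr, tr], K_all[te, tr], np.diag(K_all)[te], y_train)
print(mean.round(3), std.round(3))
```

The elementwise product of two positive semidefinite kernels is itself positive semidefinite (Schur product theorem), so the Cholesky factorization succeeds once a small noise term is added; a sum of the two kernels would also be a valid choice.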