Abstract
Effective models are of central importance in many scientific and engineering domains. However, analyzing such models, especially nonlinear ones, poses significant challenges. In this context, centered kernel alignment (CKA) has emerged as a promising model analysis tool that assesses the similarity between two embeddings. CKA's efficacy depends on selecting a kernel that adequately captures the underlying properties of the compared models. To adapt CKA to cheminformatics, we examine the properties of the linear and random forest (RF) kernels with respect to multilayer perceptrons (MLPs) and RFs. Furthermore, we demonstrate the utility of CKA in cheminformatics in three case studies in which we (1) investigate why optimizing the radius of circular fingerprints beyond two bonds results in only minor changes in model performance, (2) analyze the dependence between physicochemical properties and the molecular representations induced by graph neural networks (GNNs) that use addition as the readout operation, and (3) compare different graph readout operations in GNNs.
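For readers unfamiliar with CKA, the following is a minimal sketch of how it can be computed from two Gram matrices. It assumes the commonly used definition of CKA as a normalized Hilbert-Schmidt independence criterion between centered kernels. The function names (center_gram, cka, linear_kernel) are illustrative and not taken from the paper, and only the linear kernel is shown; the RF kernel examined in this work is not reproduced here.

```python
import numpy as np


def center_gram(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def linear_kernel(X):
    """Gram matrix of the linear kernel for an (n_samples, n_features) embedding."""
    return X @ X.T


def cka(K, L):
    """CKA between two Gram matrices computed on the same n samples.

    Assumed definition: <Kc, Lc>_F / (||Kc||_F * ||Lc||_F),
    where Kc and Lc are the centered Gram matrices.
    """
    Kc = center_gram(K)
    Lc = center_gram(L)
    hsic = np.sum(Kc * Lc)  # Frobenius inner product of the centered kernels
    return hsic / (np.linalg.norm(Kc) * np.linalg.norm(Lc))


# Toy embeddings of the same 20 samples (random data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Y = rng.normal(size=(20, 8))

# Linear CKA is invariant to orthogonal transformations and isotropic scaling.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
score_same = cka(linear_kernel(X), linear_kernel(2.0 * X @ Q))
score_other = cka(linear_kernel(X), linear_kernel(Y))
```

With the linear kernel, the score stays at 1 when one embedding is a rotated and rescaled copy of the other, and lies between 0 and 1 for unrelated embeddings. Other kernels can be substituted by passing different Gram matrices to cka.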
Supplementary materials
Title: Supporting Information
Description: The Supporting Information contains further details on model performance (PDF).