Abstract
This paper presents a novel approach to predicting critical micelle concentrations (CMCs) using graph neural networks (GNNs) augmented with Gaussian processes (GPs). The proposed model uses learned latent-space representations of molecules to predict CMCs and to estimate the uncertainty of each prediction. Its performance on a dataset containing nonionic, cationic, anionic and zwitterionic molecules is compared against that of a linear model built on extended-connectivity fingerprints (ECFPs). The GNN-based model performs slightly better than the linear ECFP model when there is enough well-balanced training data, and it achieves predictive accuracy comparable to that of published models that were evaluated on a narrower range of surfactant chemistries. We illustrate the applicability domain of our model using a molecular cartogram to visualize the latent space, which helps identify molecules for which predictions are likely to be erroneous. In addition to accurately predicting CMCs for some surfactant classes, the proposed approach can provide valuable insights into the molecular properties that influence CMCs.
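The idea of placing a GP on top of learned latent representations can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the latent vectors here are random stand-ins for GNN embeddings, the targets are synthetic stand-ins for log CMC values, and the RBF kernel, its hyperparameters, and the function names `rbf_kernel` and `gp_predict` are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(z_train, y_train, z_test, noise=1e-2, length_scale=1.0, variance=1.0):
    """Exact GP regression: posterior mean and standard deviation at z_test."""
    y_mean = y_train.mean()  # constant prior mean set to the training average
    k_tt = rbf_kernel(z_train, z_train, length_scale, variance)
    k_tt += noise * np.eye(len(z_train))  # observation noise on the diagonal
    k_st = rbf_kernel(z_test, z_train, length_scale, variance)

    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_st @ alpha

    v = np.linalg.solve(chol, k_st.T)
    var = variance - np.sum(v**2, axis=0)  # latent (noise-free) posterior variance
    return mean, np.sqrt(np.maximum(var, 0.0))


# Hypothetical data: 30 "molecules" with 4-dimensional latent vectors.
rng = np.random.default_rng(0)
z_train = rng.normal(size=(30, 4))
weights = np.array([0.8, -0.5, 0.3, 0.1])  # arbitrary synthetic relationship
y_train = z_train @ weights + 0.05 * rng.normal(size=30)

# Queries close to the training data versus far outside it.
z_near = z_train[:3] + 0.01
z_far = z_train[:3] + 10.0

mean_near, std_near = gp_predict(z_train, y_train, z_near)
mean_far, std_far = gp_predict(z_train, y_train, z_far)

print("std near training data:", std_near)
print("std far from training data:", std_far)
```

In this toy setting the predictive standard deviation grows for queries far from the training latents and the mean falls back to the training average, which is the behaviour that makes a GP head useful for flagging molecules outside the applicability domain.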
Supplementary weblinks
GitHub repository: Source code to reproduce the models, as well as the trained models and metrics.