Abstract
We explore the prediction of surfactant phase behavior with state-of-the-art machine learning methods, using a data set for twenty-three non-ionic surfactants in line with Bell, Phil. Trans. R. Soc. A 2016, 374, 20150137. Most of the machine learning classifiers we tested are capable of filling in missing data in a partially complete data set. However, strong data bias and a lack of chemical space information generally lead to poorer results for entire de novo phase diagram prediction. Although some machine learning classifiers perform better than others, these observations are largely robust to the particular choice of algorithm. Finally, we explore how de novo phase diagram prediction can be improved by including observations from state points sampled by analogy to commonly used experimental protocols. Our results indicate which factors should be considered when preparing for machine learning prediction of surfactant phase behavior in future studies.
Supplementary materials
Title
Data to support the main article
Description
• Machine learning performance using unaltered phase state labels (corresponding to Section 3.2).
• A detailed account of our approach to developing hyperparameters for the ML classifiers (corresponding to Section 3.4).
• Gap filling predictions for the full set of surfactants (corresponding to Section 4.1.1).
• Full phase diagram prediction for the full set of surfactants (corresponding to Section 4.1.2).
• Confusion plots for all studied surfactants (corresponding to Section 4.2.1).
• Correlation between regression metrics and maximum similarity (corresponding to Section 4.2.2).
• Further laboratory sampling predictions for additional surfactants (corresponding to Section 4.3).
Title
Data file containing information used to train the models in the main article
Description
We make available a machine-readable data set (machine-readable-data.txt) comprising the surfactants studied along with their phase behaviour at specific temperature and composition (weight fraction) points. For each surfactant, the calculated tail length (Å), tail volume (Å³) and head group area (Å² at 25 °C) are presented.