Leveraging High-throughput Molecular Simulations and Machine Learning for Formulation Design

Alex K. Chew; Mohammad Atif Faiz Afzal; Zach Kaplan; Eric M. Collins; Suraj Gattani; Mayank Misra; Anand Chandrasekaran; Karl Leswing; Mathew D. Halls

doi:10.26434/chemrxiv-2024-4lff6-v3

Materials Science

11 November 2024, Version 3

This is not the most recent version. A newer version of this content is available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Formulations, or mixtures of chemical ingredients, are ubiquitous in materials science, but optimizing their properties remains challenging due to the vast design space. Computational approaches offer a promising solution to traverse this space while minimizing trial-and-error experimentation. Using high-throughput classical molecular dynamics simulations, we generated a comprehensive dataset of over 30,000 solvent mixtures to evaluate three machine learning approaches that connect molecular structure and composition to property: formulation descriptor aggregation (FDA), formulation graph (FG), and Set2Set-based method (FDS2S). Our results demonstrate that our new FDS2S approach outperforms other approaches in predicting simulation-derived properties. Formulation-property relationships can reveal important substructures and identify promising formulations at least two to three times faster than random guessing. The models show robust transferability to experimental datasets, accurately predicting properties across energy, pharmaceutical, and petroleum applications. Our research demonstrates the utility of high-throughput simulations and machine learning tools to design formulations with promising properties.
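As a rough illustration of the descriptor-aggregation idea, the sketch below combines per-ingredient descriptor vectors into one formulation-level vector using a mole-fraction-weighted mean. The abstract does not state how FDA aggregates descriptors, so the weighting scheme, the function name `aggregate_descriptors`, and the toy descriptor values are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def aggregate_descriptors(
    descriptors: Sequence[Sequence[float]],
    mole_fractions: Sequence[float],
) -> List[float]:
    """Composition-weighted mean of per-ingredient descriptor vectors.

    Assumption: a weighted mean is used for illustration only; the source
    does not specify the aggregation used by FDA.
    """
    if len(descriptors) == 0:
        raise ValueError("at least one ingredient is required")
    if len(descriptors) != len(mole_fractions):
        raise ValueError("one mole fraction is required per ingredient")
    total = sum(mole_fractions)
    if total <= 0:
        raise ValueError("mole fractions must sum to a positive value")
    n_features = len(descriptors[0])
    if any(len(d) != n_features for d in descriptors):
        raise ValueError("all descriptor vectors must have the same length")
    # Normalize so the weights sum to 1 even if the inputs do not.
    weights = [x / total for x in mole_fractions]
    return [
        sum(w * d[j] for w, d in zip(weights, descriptors))
        for j in range(n_features)
    ]


# Toy binary mixture: two ingredients, two hypothetical descriptors each.
mixture = aggregate_descriptors([[1.0, 10.0], [3.0, 30.0]], [0.25, 0.75])
```

A weighted mean gives a fixed-length vector regardless of how many ingredients a formulation has, and the result does not depend on the order in which ingredients are listed, which is one reason this kind of aggregation is a common baseline for mixtures.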

Keywords

Formulations

Chemical Mixtures

Classical Molecular Dynamics Simulations

Formulation-Property Relationships

Quantitative Structure-Property Relationships

Machine Learning

Supplementary materials

Supplementary information document

The supporting information contains the comparison of formulation labels between molecular dynamics simulations and experiments, analysis of miscibility for binary mixtures using molecular dynamics simulations, best hyperparameters of formulation-property models when trained with 90% of the data, and description of the formulation dataset generated in this work and the curated literature datasets.

Version History

Jan 08, 2025 Version 4

Nov 11, 2024 Version 3

Oct 28, 2024 Version 2

Jun 18, 2024 Version 1

Version Notes

This revision mainly made formatting changes: (1) the abstract was shortened to fewer than 150 words; (2) the "Conclusion" section was renamed "Discussion"; (3) the Methods section was moved after the Discussion section; (4) the text was adjusted so that the manuscript flows correctly; (5) Fig. 2 of the previous submission was relabeled as Fig. 4 in this submission; (6) all figures and tables were moved to the end of the manuscript.

Metrics

Views: 3,993

Downloads: 1,755

License

The content is available under CC BY-NC 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
