Machine Learning Force Field Aided Cluster Expansion Approach to Configurationally Disordered Materials: Critical Assessment of Training Set Selection and Size Convergence

Jun-Zhong Xie; Xu-Yuan Zhou; Dong Luan; Hong Jiang

doi:10.26434/chemrxiv-2022-0kdtn

Theoretical and Computational Chemistry

28 February 2022, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Cluster expansion (CE) is a powerful theoretical tool for studying the configuration-dependent properties of substitutionally disordered systems. Typically, a CE model is built by fitting a few tens or hundreds of target quantities calculated by first-principles approaches. To validate the model, the convergence of the cross-validation (CV) score with respect to the training set size is commonly tested to verify that the training data are sufficient. However, such a test only confirms the convergence of the predictive capability of the CE model within the training set, and it is unknown whether convergence of the CV score leads to robust thermodynamic simulation results, such as the order-disorder phase transition temperature $T_{\rm c}$. In this work, using carbon-defective MoC$_{1-x}$ as a model system and aided by the machine-learning force field technique, a training data pool of about 13000 configurations is obtained efficiently and used to randomly generate different training sets of the same size. By conducting parallel Monte Carlo simulations with CE models trained on different randomly selected training sets, the uncertainty in the calculated $T_{\rm c}$ can be evaluated at different training set sizes. It is found that a training set size sufficient for the CV score to converge still leads to a significant uncertainty in the predicted $T_{\rm c}$, and that this uncertainty can be considerably reduced by enlarging the training set to a few thousand configurations. This work highlights the importance of using a large training set to build an optimal CE model that yields robust statistical modeling results, and the ability of the machine-learning force field approach to produce adequate training data efficiently.
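The resampling protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' code, and everything in it is an assumption made for illustration: the pool is synthetic (random correlation-function-like features, made-up effective cluster interactions, Gaussian noise), the fit is plain least squares, and the spread of the fitted coefficients across models stands in for the spread in $T_{\rm c}$, which in the actual work requires a Monte Carlo simulation per model. The names `loo_cv_score` and `resample_fits` are hypothetical helpers.

```python
import numpy as np


def loo_cv_score(X, y):
    """Leave-one-out CV score (RMS error) of an ordinary least-squares fit."""
    # For linear least squares the LOO residual is e_i / (1 - h_ii),
    # where h_ii is the i-th diagonal element of the hat matrix.
    Q, _ = np.linalg.qr(X)           # reduced QR; requires full column rank
    h = np.sum(Q**2, axis=1)         # hat-matrix diagonal
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(np.sqrt(np.mean((resid / (1.0 - h)) ** 2)))


def resample_fits(X_pool, y_pool, train_size, n_models, rng):
    """Fit n_models linear models on random, equally sized subsets of the pool."""
    coefs, cvs = [], []
    for _ in range(n_models):
        idx = rng.choice(len(y_pool), size=train_size, replace=False)
        coef, *_ = np.linalg.lstsq(X_pool[idx], y_pool[idx], rcond=None)
        coefs.append(coef)
        cvs.append(loo_cv_score(X_pool[idx], y_pool[idx]))
    return np.array(coefs), np.array(cvs)


# Synthetic stand-in for the data pool (assumed sizes and noise level).
rng = np.random.default_rng(0)
n_pool, n_clusters = 13000, 20
X_pool = rng.uniform(-1.0, 1.0, size=(n_pool, n_clusters))
true_eci = rng.normal(scale=0.05, size=n_clusters)
y_pool = X_pool @ true_eci + rng.normal(scale=0.01, size=n_pool)

results = {}
for train_size in (100, 400, 1600):
    coefs, cvs = resample_fits(X_pool, y_pool, train_size, n_models=10, rng=rng)
    # Mean CV score vs. model-to-model spread of the fitted coefficients.
    results[train_size] = (cvs.mean(), coefs.std(axis=0).mean())
    print(f"{train_size}: mean CV = {results[train_size][0]:.4f}, "
          f"coefficient spread = {results[train_size][1]:.5f}")
```

In this toy setting the CV score changes little with training set size while the model-to-model spread keeps shrinking, which is the kind of contrast the abstract reports between CV convergence and the uncertainty in $T_{\rm c}$.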

Keywords

machine learning force field

cluster expansion

configurationally disordered materials

transition metal carbides

Supplementary materials

Title: Supporting Information for "Machine Learning Force Field Aided Cluster Expansion Approach to Configurationally Disordered Materials: Critical Assessment of Training Set Selection and Size Convergence"

Description: The model deviation test for all the locally relaxed structures of the candidate configurations (Figures S1-S2). All CCF data from the Monte Carlo simulations in this work are also included (Figures S3-S12).

Now Published

Machine Learning Force Field Aided Cluster Expansion Approach to Configurationally Disordered Materials: Critical Assessment of Training Set Selection and Size Convergence

Jun-Zhong Xie, Xu-Yuan Zhou, Dong Luan, Hong Jiang (journal article)

Journal of Chemical Theory and Computation, Volume 18, Issue 6

Online publication date: Jun 03, 2022

Version History

Feb 28, 2022 Version 1

License

The content is available under CC BY NC ND 4.0

DOI

10.26434/chemrxiv-2022-0kdtn

Funding

National Natural Science Foundation of China

21873005, 21911530231

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
