Spatially resolved uncertainties for machine learning potentials

Esther Heid; Johannes Schörghuber; Ralf Wanzenböck; Georg K. H. Madsen

doi:10.26434/chemrxiv-2024-k27ps-v2

Theoretical and Computational Chemistry

02 August 2024, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning potentials have become an essential tool for atomistic simulations, yielding results close to those of ab-initio simulations at a fraction of the computational cost. With recent improvements in achievable accuracy, the focus has shifted to the composition of the dataset itself. Reliably identifying erroneously predicted configurations, so that they can be added to a given dataset, is therefore a high priority. Yet uncertainty estimation techniques have achieved mixed results for machine learning potentials, and a general, versatile method to correlate energy or atomic force uncertainties with the model error has remained elusive to date. In the current work, we show that epistemic uncertainty cannot correlate with model error by definition, but can be aggregated over groups of atoms to yield a strong correlation. We demonstrate that our method correctly estimates prediction errors both globally per structure and locally per atom. The direct correlation between local uncertainty and local error is used to design an active learning framework that identifies local sub-regions of a large simulation cell and then performs ab-initio calculations only for those sub-regions. We successfully use this method to perform active learning in the low-data regime for liquid water.
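To make the idea of aggregating uncertainty over groups of atoms concrete, the following is a minimal sketch and not the authors' implementation. It assumes that per-atom uncertainty comes from the disagreement of an ensemble of force predictions, that aggregation is a plain average over all atoms within a cutoff radius, and that there are no periodic boundary conditions. The function names (per_atom_uncertainty, aggregate_locally, most_uncertain_region) and the cutoff value are hypothetical and chosen for illustration only.

```python
import math


def per_atom_uncertainty(ensemble_forces):
    """Assumed uncertainty measure: ensemble disagreement per atom.

    ensemble_forces[m][i] is the force 3-vector on atom i predicted by model m.
    Returns, per atom, the square root of the summed per-component variances.
    """
    n_models = len(ensemble_forces)
    n_atoms = len(ensemble_forces[0])
    result = []
    for i in range(n_atoms):
        var_sum = 0.0
        for c in range(3):
            vals = [ensemble_forces[m][i][c] for m in range(n_models)]
            mean = sum(vals) / n_models
            var_sum += sum((v - mean) ** 2 for v in vals) / n_models
        result.append(math.sqrt(var_sum))
    return result


def aggregate_locally(positions, values, cutoff):
    """Average `values` over all atoms within `cutoff` of each atom (self included).

    Assumption: open boundaries, i.e. no periodic images are considered.
    """
    aggregated = []
    for p in positions:
        nearby = [v for q, v in zip(positions, values) if math.dist(p, q) <= cutoff]
        aggregated.append(sum(nearby) / len(nearby))
    return aggregated


def most_uncertain_region(positions, values, cutoff):
    """Return the atom with the highest aggregated value and the atoms around it."""
    local = aggregate_locally(positions, values, cutoff)
    centre = max(range(len(local)), key=local.__getitem__)
    members = [
        j for j, q in enumerate(positions)
        if math.dist(positions[centre], q) <= cutoff
    ]
    return centre, members


if __name__ == "__main__":
    # Toy data: two nearby atoms with larger uncertainty, one distant atom.
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    uncertainties = [1.0, 3.0, 0.5]
    centre, members = most_uncertain_region(positions, uncertainties, cutoff=1.5)
    print("centre atom:", centre, "sub-region atoms:", members)
```

In an active learning loop of the kind the abstract describes, the atoms returned by most_uncertain_region would define the sub-region passed on to an ab-initio calculation. How that sub-region is cut out and relaxed is not covered by this sketch.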

Keywords

Uncertainty

Machine learning

Prediction error

Machine learning potentials

Supplementary materials

Supporting Information

Additional benchmarks including high bias models, as well as several further uncertainty metrics for all systems studied.

Supplementary weblinks

Data and scripts

Transition1x, SrTiO3, and water datasets including all data generated during the active learning loops for water, as well as scripts to calculate locally aggregated uncertainties and cut and relax water boxes for the active learning study, and a Jupyter notebook for the Monte Carlo experiment.

Version History

Aug 02, 2024 Version 2

May 02, 2024 Version 1

Version Notes

Clarifications on the scope and applicability of the work, additional benchmarks, additional metrics and analyses.

Metrics

Views: 978

Downloads: 464

License

The content is available under CC BY 4.0

Funding

Austrian Science Fund, 10.55776/COE5

Austrian Science Fund, 10.55776/F81

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
