Abstract
Machine learning potentials have become an essential tool for atomistic simulations, yielding results close to ab-initio simulations at a fraction of the computational cost. With recent improvements in achievable accuracy, the focus has now shifted to the composition of the dataset itself. Reliably identifying erroneously predicted configurations in order to extend a given dataset is therefore a high priority. Yet uncertainty estimation techniques have achieved mixed results for machine learning potentials, and a general and versatile method to correlate energy or atomic force uncertainties with the model error has remained elusive to date. In the current work, we show that epistemic uncertainty cannot correlate with model error by definition, but can be aggregated over groups of atoms to yield a strong correlation. We demonstrate that our method correctly estimates prediction errors both globally per structure and locally per atom. The direct correlation between local uncertainty and local error is used to design an active learning framework that identifies local sub-regions of a large simulation cell and subsequently performs ab-initio calculations only for those sub-regions. We successfully use this method to perform active learning in the low-data regime for liquid water.
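The statistical argument behind aggregation can be illustrated with a small toy experiment. The sketch below is not the authors' Monte Carlo notebook. It assumes, purely for illustration, that each atomic error is a single zero-mean Gaussian draw whose standard deviation equals the predicted uncertainty, and that atoms fall into hypothetical groups that share an uncertainty scale. The function name `toy_correlations` and all parameter values are made up for this example.

```python
import numpy as np


def toy_correlations(n_groups=200, group_size=50, seed=0):
    """Compare per-atom and group-aggregated uncertainty/error correlation.

    Toy model (an assumption, not the paper's setup): each error is one
    draw from N(0, sigma^2), where sigma is the predicted uncertainty.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical groups (e.g. local neighbourhoods), each with its own
    # uncertainty scale, plus some atom-to-atom variation inside a group.
    group_scale = rng.uniform(0.5, 1.5, size=(n_groups, 1))
    sigma = group_scale * rng.uniform(0.8, 1.2, size=(n_groups, group_size))

    # One realised error per atom, drawn with standard deviation sigma.
    error = rng.normal(0.0, sigma)

    # Per-atom view: correlate each uncertainty with its absolute error.
    per_atom = np.corrcoef(sigma.ravel(), np.abs(error).ravel())[0, 1]

    # Aggregated view: root-mean-square over each group, then correlate.
    rms_sigma = np.sqrt(np.mean(sigma**2, axis=1))
    rms_error = np.sqrt(np.mean(error**2, axis=1))
    per_group = np.corrcoef(rms_sigma, rms_error)[0, 1]

    return per_atom, per_group


if __name__ == "__main__":
    per_atom, per_group = toy_correlations()
    print(f"per-atom correlation:   {per_atom:.2f}")
    print(f"per-group correlation:  {per_group:.2f}")
```

In this toy setting a single error is one random draw, so its magnitude tracks the uncertainty only weakly, whereas the root-mean-square over a group averages out the sampling noise. Root-mean-square aggregation is one possible choice here; the aggregation actually used in the work may differ.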
Supplementary materials
Title
Supporting Information
Description
Additional benchmarks, including high-bias models, as well as several further uncertainty metrics for all systems studied.
Supplementary weblinks
Title
Data and scripts
Description
Transition1x, SrTiO3, and water datasets, including all data generated during the active learning loops for water; scripts to calculate locally aggregated uncertainties and to cut and relax water boxes for the active learning study; and a Jupyter notebook for the Monte Carlo experiment.