Alarming structural error rates in MOF databases used in data driven workflows identified via a novel metal oxidation state-based method

10 October 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Metal-organic frameworks (MOFs) are a diverse class of porous materials composed of inorganic nodes joined by organic linkers. They are under investigation for a wide range of applications, including gas storage and separation, where they have been commercialized. Given the labor-intensive nature of synthesizing and testing individual MOFs, high-throughput computational screening and machine learning (ML) methods are increasingly viewed as essential for facilitating MOF development. However, the structural fidelity of the “computation-ready” MOF databases used in such studies remains largely unquantified. We introduce MOSAEC, an algorithm that detects chemically invalid structures on the basis of metal oxidation states. MOSAEC was manually validated against ~16k MOF structures from the popular CoRE database and was found to flag erroneous structures with 95% accuracy. Systematic examination of 14 leading experimental and hypothetical MOF databases containing >1.9 million MOFs reveals concerning structural error rates, exceeding 40% in most cases.
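
The abstract does not describe how MOSAEC works internally. The sketch below is not the MOSAEC algorithm; it only illustrates the general idea of flagging a structure whose estimated metal oxidation states are chemically implausible. The function name `flag_structure`, the tolerance `tol`, and the small reference table are all hypothetical and chosen for illustration, and the oxidation states are assumed to have been estimated beforehand by some other method.

```python
# Illustration only: NOT the MOSAEC implementation.
# The table is a small, incomplete sample of commonly observed oxidation
# states, included as an assumption to make the sketch runnable.
COMMON_OXIDATION_STATES = {
    "Zn": {2},
    "Cu": {1, 2},
    "Zr": {4},
    "Fe": {2, 3},
}


def flag_structure(metal_states, tol=0.25):
    """Return a list of reasons why a structure looks chemically suspect.

    metal_states: iterable of (element symbol, estimated oxidation state).
    tol: hypothetical tolerance on the distance to the nearest common state.
    """
    reasons = []
    for element, state in metal_states:
        allowed = COMMON_OXIDATION_STATES.get(element)
        if allowed is None:
            # No reference data for this element, so it cannot be judged here.
            continue
        nearest = min(allowed, key=lambda s: abs(s - state))
        if abs(nearest - state) > tol:
            reasons.append(
                f"{element}: estimated state {state:+g} is far from "
                f"common values {sorted(allowed)}"
            )
    return reasons
```

An empty list means no metal site was flagged under these assumptions. A structure with, say, a zinc site estimated near zero would be reported, which is the kind of signal that can point to missing hydrogens, missing counterions, or similar structural errors.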

Keywords

Metal Organic Frameworks
Materials Databases
Data Driven Methods
High-Throughput Screening

Supplementary materials

Title: Details of Validation
Description:
• Details of the MOSAEC algorithm and validation are provided.
• The manual validation sets used to check oxidation state accuracy, error sensitivity, and error flag accuracy.
• A complete list of structures flagged by MOSAEC as being problematic in each database screened in this work.
