Abstract
Metal-organic frameworks (MOFs) have attracted considerable attention as promising catalysts owing to their tunable porosity, high surface area, and the diversity of catalytic metal clusters and organic linkers available as building blocks. The presence of open metal sites (OMS) significantly influences the catalytic, adsorption, and separation capabilities of MOFs. However, common laboratory methods for identifying OMS are indirect and can be confounded by structural heterogeneity. Computational methods, including machine learning, play a central role in the rational design of MOFs, yet in silico detection of OMS still relies heavily on computationally expensive simulations. In this work, we use extreme gradient boosting (XGBoost) and random forest (RF) methods to predict the existence of OMS in various MOF compounds from structural and chemical features. RF achieved a higher prediction accuracy (0.891) than XGBoost (0.865). Average ionization energy, average electron affinity, and the fraction of electrons in d orbitals exhibited the highest importance scores across the two models. These prediction models not only provide novel insights into the structure-property relationship between MOFs and OMS, but also enable accurate and efficient exploration of MOFs that give rise to OMS, facilitating the engineering of sorption, separation, and catalytic properties.
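
To make two of the chemical descriptors named above concrete, the following is a minimal sketch of how an average ionization energy and a fraction of d electrons might be computed from a framework's elemental composition. It rests on several assumptions that are not taken from this work: the averages are atom-count weighted, d electrons are counted from neutral ground-state configurations, and the names `ELEMENTS` and `composition_features` are hypothetical. The copper framework HKUST-1 (Cu3C18H6O12 per formula unit) is used purely as an illustrative input. Average electron affinity could be computed in the same way from a table of electron affinities.

```python
from collections import namedtuple

# Per-element data (illustrative subset): atomic number, number of d electrons
# in the neutral ground-state configuration, first ionization energy in eV.
Element = namedtuple("Element", "z d_electrons ionization_ev")

ELEMENTS = {
    "H": Element(1, 0, 13.598),
    "C": Element(6, 0, 11.260),
    "O": Element(8, 0, 13.618),
    "Cu": Element(29, 10, 7.726),  # [Ar] 3d10 4s1
}


def composition_features(composition):
    """Return composition-level descriptors for a {symbol: atom count} mapping.

    Assumption: averages are weighted by atom count, and the d-electron
    fraction is (total d electrons) / (total electrons).
    """
    n_atoms = sum(composition.values())
    total_electrons = sum(ELEMENTS[s].z * n for s, n in composition.items())
    d_electrons = sum(ELEMENTS[s].d_electrons * n for s, n in composition.items())
    ionization_sum = sum(ELEMENTS[s].ionization_ev * n for s, n in composition.items())
    return {
        "avg_ionization_ev": ionization_sum / n_atoms,
        "frac_d_electrons": d_electrons / total_electrons,
    }


# Illustrative input: one formula unit of HKUST-1, Cu3(C9H3O6)2.
hkust1 = {"Cu": 3, "C": 18, "H": 6, "O": 12}
features = composition_features(hkust1)
print(features)
```

Descriptors of this kind, computed for each MOF and combined with structural features, would form the input table on which tree-based classifiers such as RF and XGBoost are trained.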