Abstract
Perovskite materials are renowned for their versatility and remarkable properties, but their vast compositional space makes it difficult to discover optimal candidates. Data-driven machine learning (ML) promises to expedite materials discovery; however, the trade-off between accuracy and efficiency across different ML models for predicting perovskite properties is not well understood. In this study, we comprehensively assessed a range of ML models for predicting the formation energy (Ef) and bandgap (Eg) of perovskites. We designed a protocol that extracts perovskite structures from three databases based on stoichiometry, the octahedral lattice motif, and alignment with established perovskite prototype structures. Benchmarking conventional ML algorithms (CML) against graph neural network (GNN) models across three datasets, we identified the GATGNN model as the top performer, combining exceptional prediction accuracy with computational efficiency. We further investigated the impact of data size on model performance and found that more than 1000 data points are needed for optimal prediction accuracy. Additionally, SHAP (SHapley Additive exPlanations) analysis provided insight into how the CML models arrive at their Ef and Eg predictions. Our study establishes a standardized benchmark for evaluating ML models across diverse datasets of perovskite materials, supporting model selection and the further development of perovskite materials in future materials science applications.
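As an illustration of the first screening criterion named above (stoichiometry), the following is a minimal sketch that checks whether a chemical formula reduces to a 1:1:3 ratio. It assumes the simple ABX3 case only; the actual protocol, including the octahedral-motif and prototype-alignment checks and any other stoichiometries the authors accept, is not specified in the abstract. The function names `parse_formula` and `has_abx3_stoichiometry` are hypothetical.

```python
import re
from functools import reduce
from math import gcd


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula without parentheses, e.g. 'CaTiO3'."""
    counts: dict[str, int] = {}
    # Each match is (element symbol, optional integer count).
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def has_abx3_stoichiometry(formula: str) -> bool:
    """Return True if the formula reduces to a 1:1:3 ratio (assumed ABX3 only)."""
    counts = parse_formula(formula)
    if len(counts) != 3:
        return False
    # Divide by the greatest common divisor so 'Sr2Ti2O6' is treated like 'SrTiO3'.
    divisor = reduce(gcd, counts.values())
    reduced = sorted(n // divisor for n in counts.values())
    return reduced == [1, 1, 3]


if __name__ == "__main__":
    for candidate in ["CaTiO3", "CsPbI3", "Sr2Ti2O6", "TiO2", "Ca2TiO4"]:
        print(candidate, has_abx3_stoichiometry(candidate))
```

A stoichiometry match is only a necessary condition: many ABX3 compounds are not perovskites, which is why a structural check of the kind the abstract describes would still be required.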
Supplementary materials
Title: SI
Description: Supplementary Information: Comparative Analysis of Classical Machine Learning and Graph Neural Network Models for Perovskite Property Prediction