Abstract
Density functional theory (DFT) is a ubiquitous first-principles method, but the approximate nature of the exchange-correlation functional inherently limits the accuracy of various computed properties. In this context, surrogate models based on machine learning have the potential to provide a more efficient and physically meaningful understanding of electronic properties, such as the exciton binding energy. Here, we construct a regression model based on gradient boosting on decision trees (CatBoost) to predict the exciton binding energy of 2D materials from simple physical descriptors, using the Computational 2D Materials Database (C2DB). Out of 22 atomic features, the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy levels exhibit the highest importance scores, with the HOMO being considerably more important than any other feature. We obtain an R² value of 0.8023 and a mean absolute error of 0.2138 eV using the nine most important features. Our work presents a rapid and interpretable prediction model for the exciton binding energy with high fidelity to DFT, and it can be extended beyond the C2DB dataset considered in this study.
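For readers unfamiliar with the two figures of merit quoted above, the following minimal sketch shows how the coefficient of determination (R²) and the mean absolute error are conventionally computed from reference and predicted values. The function names and the four energies are hypothetical placeholders chosen for illustration; they are not data from C2DB or output of the model described here.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the same units as the inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))


# Hypothetical exciton binding energies in eV (illustrative values only).
reference = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.7, 1.8]

print(f"R2  = {r2_score(reference, predicted):.4f}")
print(f"MAE = {mean_absolute_error(reference, predicted):.4f} eV")
```

An R² close to 1 means the model explains most of the variance in the reference values, while the mean absolute error gives the typical size of a prediction error directly in eV.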