Abstract
Traditional and non-classical machine learning models for solid-state structure prediction have predominantly relied on compositional features (derived from the properties of constituent elements) to predict the existence of a structure and its properties. However, the lack of structural information can lead to suboptimal property mapping and increased predictive uncertainty. To address this challenge, we introduce a strategy that generates and combines compositional and structural features while requiring minimal programming expertise. Our approach uses two open-source, interactive Python programs, Composition Analyzer Featurizer (CAF) and Structure Analyzer Featurizer (SAF). CAF generates numerical compositional features from a list of formulas provided in an Excel file, while SAF extracts numerical structural features from a .cif file by generating a supercell. In total, 133 features from CAF and 94 features from SAF were used, either individually or in combination, to cluster nine structure types in equiatomic AB intermetallics. The performance was comparable to that obtained with features from state-of-the-art featurizers in advanced machine learning models. Our SAF+CAF features provided a cost-efficient and reliable solution, even with the PLS-DA method, where a significant fraction of the most contributing features were the same as those identified in the more computationally intensive XGBoost models.
Supplementary materials
Title
Composition and structure analyzer/featurizer for explainable machine-learning models to predict solid state structures
Description
The SI contains a user-experience evaluation of openly available, open-source featurizers in solid-state chemistry. It also lists the features generated by the CAF/SAF software tools presented in the manuscript, along with additional features that may be included.
Supplementary weblinks
Title
Composition Analyzer/Featurizer (CAF)
Description
An interactive Python script that generates chemical compositional features and provides tools for filtering, sorting, and merging data.
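As a rough illustration of what a compositional feature is, the sketch below turns a binary formula into a few numbers. It is not CAF's code or its feature definitions: the function names, the feature names, and the tiny element table (atomic number and Pauling electronegativity for five elements) are assumptions made for this example only.

```python
import re

# Illustrative lookup table (assumption: CAF uses its own, much larger property set).
ELEMENT_DATA = {
    "Al": {"Z": 13, "chi": 1.61},
    "Ti": {"Z": 22, "chi": 1.54},
    "Fe": {"Z": 26, "chi": 1.83},
    "Co": {"Z": 27, "chi": 1.88},
    "Ni": {"Z": 28, "chi": 1.91},
}


def parse_formula(formula):
    """Return {element: amount} for a simple formula such as 'NiAl' or 'Fe2Al5'."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def binary_features(formula):
    """Compute hypothetical compositional features for a two-element formula."""
    counts = parse_formula(formula)
    if len(counts) != 2:
        raise ValueError(f"expected a binary formula, got {formula!r}")
    total = sum(counts.values())
    (a, n_a), (b, n_b) = counts.items()
    features = {"formula": formula}
    for key in ("Z", "chi"):
        p_a, p_b = ELEMENT_DATA[a][key], ELEMENT_DATA[b][key]
        # Composition-weighted mean and absolute difference of the property.
        features[f"{key}_weighted_mean"] = (n_a * p_a + n_b * p_b) / total
        features[f"{key}_abs_diff"] = abs(p_a - p_b)
    return features


if __name__ == "__main__":
    for formula in ("NiAl", "FeTi", "CoAl"):
        print(binary_features(formula))
```

Weighted means and differences are only two of many possible ways to combine elemental properties; the list of features CAF actually generates is given in the SI.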
Title
Structure Analyzer/Featurizer (SAF)
Description
A Python script designed to process CIF files and extract geometric features. These features include interatomic distances, information on atomic environments, and coordination numbers.
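The idea of deriving geometric features from a supercell can be sketched as follows. This is a minimal, assumption-laden example and not SAF's implementation: it hard-codes a CsCl-type AB cell instead of reading a CIF file, supports only orthogonal cells, uses a placeholder lattice parameter of 3.0 Å, and defines the coordination number with an arbitrary 5% distance tolerance. All function names are hypothetical.

```python
import itertools
import math


def neighbor_distances(cell_lengths, frac_sites, center_index, reps=1):
    """Distances from one site to every site image in a (2*reps+1)^3 supercell.

    Assumes an orthogonal cell; frac_sites is a list of (label, (x, y, z))
    tuples in fractional coordinates.
    """
    a, b, c = cell_lengths
    cx, cy, cz = frac_sites[center_index][1]
    distances = []
    for i, j, k in itertools.product(range(-reps, reps + 1), repeat=3):
        for idx, (label, (x, y, z)) in enumerate(frac_sites):
            if idx == center_index and (i, j, k) == (0, 0, 0):
                continue  # skip the central atom itself
            d = math.sqrt(((x + i - cx) * a) ** 2
                          + ((y + j - cy) * b) ** 2
                          + ((z + k - cz) * c) ** 2)
            distances.append((d, label))
    return sorted(distances)


def coordination_number(distances, tolerance=1.05):
    """Count neighbors within tolerance * shortest distance (assumed definition)."""
    d_min = distances[0][0]
    return sum(1 for d, _ in distances if d <= tolerance * d_min)


if __name__ == "__main__":
    a = 3.0  # placeholder lattice parameter in angstroms
    sites = [("A", (0.0, 0.0, 0.0)), ("B", (0.5, 0.5, 0.5))]
    dists = neighbor_distances((a, a, a), sites, center_index=0)
    print("shortest A-B distance:", round(dists[0][0], 3))
    print("coordination number of A:", coordination_number(dists))
```

Replicating the cell by one unit in each direction is enough here because the nearest neighbors lie within one cell length; larger or lower-symmetry cells would need a general lattice matrix and possibly more repetitions.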