Composition and structure analyzer/featurizer for explainable machine-learning models to predict solid state structures

15 October 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Traditional and non-classical machine learning models for solid-state structure prediction have predominantly relied on compositional features (derived from properties of constituent elements) to predict the existence of a structure and its properties. However, the lack of structural information can lead to suboptimal property mapping and increased predictive uncertainty. To address this challenge, we introduce a strategy that generates and combines both compositional and structural features while requiring minimal programming expertise. Our approach utilizes open-source, interactive Python programs named Composition Analyzer Featurizer (CAF) and Structure Analyzer Featurizer (SAF). CAF generates numerical compositional features from a list of formulas provided in an Excel file, while SAF extracts numerical structural features from a .cif file by generating a supercell. In total, 133 features from CAF and 94 features from SAF were used, either individually or in combination, to cluster nine structure types in equiatomic AB intermetallics. The performance was comparable to that obtained with features from state-of-the-art featurizers in advanced machine learning models. Our SAF+CAF features provided a cost-efficient and reliable solution, even with the PLS-DA method, where a significant fraction of the most contributing features were the same as those identified in the more computationally intensive XGBoost models.
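To make the idea of compositional features concrete, the following is a minimal sketch of how numerical descriptors could be derived from a binary formula. It is not CAF's implementation or feature list: the function names (`parse_formula`, `binary_features`), the four example features, and the small `ELEMENT_DATA` table (atomic number and Pauling electronegativity) are assumptions chosen purely for illustration.

```python
import re

# Hypothetical lookup table for illustration only:
# element symbol -> (atomic number, Pauling electronegativity).
ELEMENT_DATA = {
    "Na": (11, 0.93),
    "Cl": (17, 3.16),
    "Fe": (26, 1.83),
    "Al": (13, 1.61),
}


def parse_formula(formula):
    """Parse a simple formula such as 'FeAl' or 'Fe2Al3' into {element: count}.

    Only integer subscripts and no parentheses are handled in this sketch.
    """
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(amount) if amount else 1)
    return counts


def binary_features(formula):
    """Return a few composition-weighted descriptors for a binary compound."""
    counts = parse_formula(formula)
    if len(counts) != 2:
        raise ValueError(f"expected a binary formula, got {formula!r}")

    (a, n_a), (b, n_b) = counts.items()
    total = n_a + n_b
    x_a, x_b = n_a / total, n_b / total  # atomic fractions

    z_a, en_a = ELEMENT_DATA[a]
    z_b, en_b = ELEMENT_DATA[b]

    return {
        "Z_weighted_mean": x_a * z_a + x_b * z_b,
        "Z_abs_diff": abs(z_a - z_b),
        "EN_weighted_mean": x_a * en_a + x_b * en_b,
        "EN_abs_diff": abs(en_a - en_b),
    }


if __name__ == "__main__":
    for formula in ["NaCl", "FeAl"]:
        print(formula, binary_features(formula))
```

Each formula maps to one row of numbers, so a list of formulas becomes a feature table that can be passed to a clustering or classification method. Structural features of the kind SAF extracts would require the atomic coordinates in a .cif file and are not covered by this sketch.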

Keywords

machine learning
feature engineering
software
crystal structure
materials informatics

Supplementary materials

Title
Composition and structure analyzer/featurizer for explainable machine-learning models to predict solid state structures
Description
The SI contains a user-experience evaluation of openly available, open-source featurizers in solid-state chemistry. It also lists the features generated by the CAF and SAF software tools presented in the manuscript, along with additional features that may be included.
