CatEmbed: A Machine-Learned Representation Obtained via Categorical Entity Embedding for Predicting Adsorption and Reaction Energies on Bimetallic Alloy Surfaces

Clara Kirkvold; Brianna Collins; Jason Goodpaster

doi:10.26434/chemrxiv-2024-1r7v9

Theoretical and Computational Chemistry

22 May 2024, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine-learning models for predicting adsorption energies on metallic surfaces often rely on basic elemental properties together with electronic and geometric descriptors. Here, we apply categorical entity embedding, a featurization method inspired by natural language processing techniques, to predict adsorption energies on bimetallic alloy surfaces using categorical descriptors. Using this method, we develop a machine-learned representation from categorical descriptors (e.g., surface composition, adsorbate type, and site type) of the slab/adsorbate complex. By combining this representation with numerical features (e.g., slab metal stoichiometric ratios), we create the CatEmbed representation. Remarkably, decision tree models trained using CatEmbed, which includes no explicit geometric information, achieve a mean absolute error (MAE) of 0.12 eV. Additionally, we extend this technique to predict reaction energies on bimetallic surfaces, creating the CatEmbed-React representation, which achieves an MAE of 0.08 eV. These findings highlight the effectiveness of categorical entity embedding for predicting adsorption and reaction energies on bimetallic alloy surfaces.
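
As a rough illustration of the idea summarized in the abstract, the sketch below shows only the lookup-and-concatenate step of an entity-embedding featurization: each categorical descriptor is mapped to a dense vector, and the vectors are joined with a numerical feature. It is not the authors' implementation. The toy records, the embedding width `EMBED_DIM`, and the helper names `build_vocab`, `featurize`, and `mean_absolute_error` are all assumptions made for this example, and the embedding tables are randomly initialized here, whereas in an actual entity-embedding workflow they would be learned by training a neural network on the target energies.

```python
import numpy as np

# Hypothetical toy records (not from the paper):
# (surface composition, adsorbate, site type, stoichiometric fraction of first metal)
records = [
    ("Pt3Ni", "CO", "top", 0.75),
    ("PtNi", "OH", "bridge", 0.50),
    ("Pt3Ni", "OH", "hollow", 0.75),
]

EMBED_DIM = 4  # assumed embedding width per categorical descriptor
rng = np.random.default_rng(0)


def build_vocab(values):
    """Map each distinct category to an integer index."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


# One vocabulary and one embedding table per categorical column.
cat_columns = list(zip(*[r[:3] for r in records]))
vocabs = [build_vocab(col) for col in cat_columns]
# Random stand-ins for learned embeddings (assumption: a trained network
# would supply these weights in practice).
tables = [rng.normal(size=(len(v), EMBED_DIM)) for v in vocabs]


def featurize(record):
    """Look up each category's vector and append the numerical features."""
    cats, numeric = record[:3], record[3:]
    embedded = [tables[k][vocabs[k][c]] for k, c in enumerate(cats)]
    return np.concatenate(embedded + [np.asarray(numeric, dtype=float)])


def mean_absolute_error(y_true, y_pred):
    """MAE, the metric quoted in the abstract (in eV there)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Feature matrix: 3 descriptors x EMBED_DIM + 1 numerical feature per row.
X = np.stack([featurize(r) for r in records])
```

A matrix like `X` could then be passed to any tabular regressor, such as the decision tree models mentioned in the abstract. Records that share a category share the same embedding vector, which is what lets the model generalize across surfaces, adsorbates, and sites.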

Keywords

computational catalysis

catalysis screening

natural language processing

machine learning

feature engineering

adsorption energies

Supplementary materials

Supporting Information

Summary of each feature representation presented in main text, details on model and feature selection, performance of each model used to calculate average MAEs presented in text, and comparison of entity embedding network vs. CatBoost model performance.

Version History

May 22, 2024 Version 1

License

The content is available under CC BY NC 4.0

Funding

National Science Foundation

CHE-1945525

National Science Foundation Graduate Research Fellowship Program

2237827

Camille and Henry Dreyfus Foundation

Award ML-20-146

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
