Abstract
We evaluate the effectiveness of pre-trained and fine-tuned large language models (LLMs) for predicting the synthesizability of inorganic compounds and for selecting the precursors needed to perform inorganic synthesis. The predictions of fine-tuned LLMs are comparable to—and sometimes better than—those of recent bespoke machine learning models for these tasks, yet require only minimal user expertise, cost, and time to develop. This strategy can therefore serve both as an effective, strong baseline for future machine learning studies of various chemical applications and as a practical tool for experimental chemists.
Supplementary materials
Title
Supporting Information
Description
Description of data preparation. Plots of the distribution of the number of unique reactions and the number of precursors. Description of model construction and training. LLM prompts. Description of evaluation metrics. Tables of model performance for the synthesizability task. Description of methods and results for re-evaluating top-5 predictions using GPT-4, and code for associated statistical tests. Description of PU learning prompt modification experiments and table of results. Histogram of top-10 precursor occurrences. (PDF)