Text2Struc: Programmatic Crystal Structure Generation with Fine-Tuned Large Language Models

Viktoriia Baibakova

doi:10.26434/chemrxiv-2025-pvgp0-v2

13 February 2025, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Accelerating computational materials science relies not only on hardware advances but also on software that makes the relevant abstractions easier to work with. Creating and manipulating crystal structures is part of many routine materials science workflows. In this work, we demonstrate how fine-tuned large language models (LLMs) can generate crystal structures from textual descriptions. By fine-tuning a CodeGen model with low-rank adaptation (LoRA), we developed an interface that reduces errors and enables more flexible and powerful input, particularly for larger or more complex structures. Using our model, which we call Text2Struc, we compare reference structures from the Materials Project database against LLM-generated outputs and against outputs produced by executing API calls. We show that API calls are more accurate, especially for supercells and crystals with defects, as evidenced by a larger number of matches with the original structures. Furthermore, removing Crystallographic Information File (CIF) outputs during training improves generation fidelity: the model trained without CIFs has a higher success rate than the model that prints CIFs in addition to the generating code. We hypothesize that this is because the base model is oriented towards generating code. Our findings highlight the effectiveness of fine-tuning and API integration for automating crystal structure generation in materials science.
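The workflow summarized in the abstract (a model emits code, the code is executed against a structure API, and the result is compared with a reference structure) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Site` and `Crystal` classes, the helpers `make_supercell`, `remove_site`, and `same_structure`, the orthogonal-cell simplification, the hard-coded `generated_code` string standing in for model output, and the CsCl-like lattice values. The paper's actual API and matching criteria may differ.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Site:
    species: str
    frac: tuple  # fractional coordinates (x, y, z)


@dataclass
class Crystal:
    lengths: tuple  # cell edge lengths in angstroms (orthogonal cell assumed)
    sites: list


def make_supercell(cell, reps):
    """Repeat the cell reps = (na, nb, nc) times along each axis."""
    new_sites = []
    for shift in product(*(range(n) for n in reps)):
        for s in cell.sites:
            # Rescale fractional coordinates into the enlarged cell.
            frac = tuple((x + i) / n for x, i, n in zip(s.frac, shift, reps))
            new_sites.append(Site(s.species, frac))
    lengths = tuple(a * n for a, n in zip(cell.lengths, reps))
    return Crystal(lengths, new_sites)


def remove_site(cell, index):
    """Create a vacancy by deleting the site at the given index."""
    kept = [s for n, s in enumerate(cell.sites) if n != index]
    return Crystal(cell.lengths, kept)


def same_structure(a, b, tol=1e-6):
    """Naive match: equal cell lengths and equal site sets, ignoring order."""
    if any(abs(p - q) > tol for p, q in zip(a.lengths, b.lengths)):
        return False

    def key(cell):
        # Round and wrap coordinates so that 0.9999999 compares equal to 0.0.
        return sorted(
            (s.species, tuple(round(v, 6) % 1.0 for v in s.frac))
            for s in cell.sites
        )

    return key(a) == key(b)


# Illustrative CsCl-like unit cell (values are placeholders, not paper data).
unit = Crystal(
    lengths=(4.12, 4.12, 4.12),
    sites=[Site("Cs", (0.0, 0.0, 0.0)), Site("Cl", (0.5, 0.5, 0.5))],
)

# Hypothetical stand-in for code a fine-tuned model might emit for the prompt
# "2x2x2 supercell of CsCl with one Cs vacancy". Executing model output with
# exec is only acceptable in a sandbox; it is used here for brevity.
generated_code = "structure = remove_site(make_supercell(unit, (2, 2, 2)), 0)"
namespace = {
    "unit": unit,
    "make_supercell": make_supercell,
    "remove_site": remove_site,
}
exec(generated_code, namespace)
candidate = namespace["structure"]

# Reference structure built directly, playing the role of the database entry.
pristine = make_supercell(unit, (2, 2, 2))
reference = remove_site(pristine, 0)

print(len(candidate.sites))                  # number of sites after the vacancy
print(same_structure(candidate, reference))  # does the executed output match?
print(same_structure(candidate, pristine))   # defective vs. pristine supercell
```

One design consequence this sketch makes visible: the generated code stays a single short line regardless of supercell size, whereas writing out every atomic position explicitly grows with the number of sites. This is consistent with the abstract's observation that API calls are more accurate for supercells and defective crystals, although the sketch itself does not test that claim.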

Keywords

Large Language Models

Generative Models

Crystal Structures

Supplementary materials

Title: Supplementary Information for Text2Struc: Programmatic Crystal Structure Generation with Fine-Tuned Large Language Models

Description: The Supplementary Information contains examples from the dataset and additional details of the method.

Version History

Feb 13, 2025 Version 2

Feb 07, 2025 Version 1

Version Notes

Updated results, data access, authors, acknowledgements

License

The content is available under CC BY 4.0

Funding

Toyota Research Institute

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
