Abstract
Natural language processing with the help of large language models such as ChatGPT has become ubiquitous in many software applications and allows users to interact even with complex hardware or software in an intuitive way. The recent concepts of Self-Driving Labs and Material Acceleration Platforms stand to benefit greatly from being made more accessible to a broader scientific community, whether through enhanced user-friendliness or through fully automated generation of experimental workflows, derived from user input or previously published procedures, that can be run on the complex hardware of the platform. Here, two new datasets comprising over 1.5 million experimental procedures and their (semi)automatic annotations as action graphs, i.e., structured output, were created and used to train two different transformer-based large language models. These models strike a balance between performance, generality, and fitness for purpose, and can be hosted and run on standard consumer-grade hardware. Furthermore, the generation of node graphs from these action graphs is explored as a user-friendly and intuitive way of visualizing and modifying synthesis workflows that can be run on the hardware of a Self-Driving Lab or Material Acceleration Platform. Lastly, it is discussed how knowledge graphs, following an ontology imposed by the underlying node setup and software architecture, can be generated from the node graphs. All resources, including the datasets, the fully trained large language models, the node editor, and scripts for querying and visualizing the knowledge graphs, are made publicly available.
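To make the notion of an action graph as structured output concrete, a minimal sketch in Python is given below. The schema shown here (an `Action` with a name and a parameter dictionary, chained sequentially) is purely illustrative and hypothetical; the actual annotation schema used for the datasets and model training may differ.

```python
# A minimal sketch of an action graph as structured output, assuming a
# hypothetical schema; the paper's actual annotation format may differ.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str                          # e.g. "Add", "Stir", "Filter"
    parameters: dict = field(default_factory=dict)

# Hand-written example of what a model might emit for the sentence:
# "Add 5 mL of ethanol to the flask and stir for 10 minutes at 60 °C."
action_graph = [
    Action("Add", {"chemical": "ethanol", "amount": "5 mL", "vessel": "flask"}),
    Action("Stir", {"duration": "10 min", "temperature": "60 °C"}),
]

# For a strictly sequential procedure, the graph edges simply chain
# consecutive actions together.
edges = list(zip(action_graph, action_graph[1:]))

for src, dst in edges:
    print(f"{src.name} -> {dst.name}")
```

In this linear case the action graph reduces to a simple chain; branching, e.g., for steps prepared in parallel and later combined, would introduce additional edges, which is where a node-graph visualization becomes useful.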
Supplementary materials
Title
Supporting Information
Description
Contains more details on the dataset generation, the in-context learning prompts, exemplary outputs of the models discussed in the text, and additional figures and examples from the node editor.
Supplementary weblinks
Title
Github Repository
Description
Repository with the Python code and modules used in the project, as well as more extensive feature descriptions, documentation, a video showcasing the node editor, and links to further resources.