Abstract
Extracting knowledge from complex and diverse chemical texts is a pivotal task for both experimental and computational chemists. The task is still considered to be extremely challenging due to the complexity of the chemical language and scientific literature. This study explored the power of fine-tuned large language models (LLMs) on five intricate chemical text mining tasks: compound entity recognition, reaction role labelling, metal-organic framework (MOF) synthesis information extraction, nuclear magnetic resonance spectroscopy (NMR) data extraction, and the conversion of reaction paragraph to action sequence. The fine-tuned LLMs models demonstrated impressive performance, significantly reducing the need for repetitive and extensive prompt engineering experiments. For comparison, we guided GPT-3.5 and GPT-4 with prompt engineering and fine-tuned GPT-3.5 as well as other open-source LLMs such as Llama2, T5, and BART. The results showed that the fine-tuned GPT models excelled in all tasks. It achieved exact accuracy levels ranging from 69% to 95% on these tasks with minimal annotated data. It even outperformed those task-adaptive pre-training and fine-tuning models that were based on a significantly larger amount of in-domain data. Given its versatility, robustness, and low-code capability, leveraging fine-tuned LLMs as flexible and effective toolkits for automated data acquisition could revolutionize chemical knowledge extraction.