Comparing Optical Chemical Structure Recognition Tools

Lutz Weber; Aleksei Krasnov; Shadrack Barnabas; Timo Böhme; Stephen Boyer

doi:10.26434/chemrxiv-2023-d6kmg

Chemical Education

21 November 2023, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The extraction of chemical information from images, also known as optical chemical structure recognition (OCSR), has recently attracted renewed attention. This interest is driven by the machine learning methods introduced in recent years and by the possibility of training image models for specific tasks such as OCSR. In this paper we compare six open-access OCSR methods (DECIMER, ReactionDataExtractor, MolScribe, RxnScribe, MolVec and OSRA) using an independent test set of images from patents and patent applications. Patents are an application area of general interest, because high precision and recall matter to those analysing the intellectual property in chemistry patents. The methods showed different strengths depending on the image modality and chemistry category. Overall, the existing methods for image extraction remain unsatisfactory, indicating a need for further advances in the field. We therefore created a machine learning image classifier that assigns each image to one of four image categories and applies the best-performing OCSR method for that category. The classifier, image comparator tools and datasets have been made available to the public as open-access tools.
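The classify-then-route idea described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not name the four image categories, does not say which tool performed best for which category, and does not describe the classifier's interface. The category labels, the stub recognizers and the size-based placeholder classifier below are hypothetical and are not the authors' implementation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical category labels: the abstract only states that there are four.
CATEGORIES = ("category_a", "category_b", "category_c", "category_d")

# A recognizer takes raw image bytes and returns a structure string.
Recognizer = Callable[[bytes], str]


def make_stub(tool_name: str) -> Recognizer:
    """Build a placeholder recognizer that only labels its output.

    A real recognizer would call an OCSR tool; none is invoked here.
    """
    def recognize(image: bytes) -> str:
        return f"<{tool_name} output for {len(image)} bytes>"
    return recognize


def classify_image(image: bytes) -> str:
    """Placeholder classifier: derives a category from the image size.

    A real classifier would be a trained image model.
    """
    return CATEGORIES[len(image) % len(CATEGORIES)]


# Hypothetical routing table. Which tool is best for which category is an
# assumption here; in practice it would come from benchmark results.
ROUTES: Dict[str, Recognizer] = {
    "category_a": make_stub("tool_1"),
    "category_b": make_stub("tool_2"),
    "category_c": make_stub("tool_3"),
    "category_d": make_stub("tool_4"),
}


def recognize_structure(
    image: bytes,
    classifier: Callable[[bytes], str] = classify_image,
    routes: Dict[str, Recognizer] = ROUTES,
) -> Tuple[str, str]:
    """Classify the image, then apply the recognizer registered for its category."""
    category = classifier(image)
    return category, routes[category](image)


if __name__ == "__main__":
    category, result = recognize_structure(b"abcde")
    print(category, result)
```

Keeping the routing table separate from the classifier means the per-category choice of tool can be updated from new benchmark results without retraining or touching the classifier.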

Keywords

optical chemical structure recognition

image to structure

image to reaction

Supplementary weblinks

Title: Supplementary Material for Publication: Comparing Image-to-Chemistry Tools

Description: Datasets and software used in the publication

Version History

Feb 01, 2024 Version 2

Nov 21, 2023 Version 1

Metrics

Views: 1,712

Downloads: 873

License

The content is available under CC BY 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
