Abstract
The extraction of chemical information from images, also known as optical chemical structure recognition (OCSR), has recently gained renewed attention. This interest has been sparked by the machine learning methods introduced in recent years and by the possibility of training image models for specific tasks such as OCSR. In this paper, we compare six open-access OCSR methods (DECIMER, ReactionDataExtractor, MolScribe, RxnScribe, MolVec and OSRA) using an independent test set of images from patents and patent applications. This is an application area of general interest, because high precision and recall are important to those analysing the intellectual property in chemistry patents. The methods showed different strengths when predicting structures from images containing different modalities and chemistry categories. Overall, the performance of these existing image extraction methods remains unsatisfactory, indicating a need for further advances in the field. We have therefore created a machine learning image classifier that assigns each image to one of four image categories and applies the best-performing OCSR method for that category. The classifier, image comparator tools and datasets have been made publicly available under open access.
Supplementary weblinks
Title
Supplementary Material for Publication: Comparing Image-to-Chemistry Tools
Description
Datasets and Software used in the publication