Reaction impurity prediction using a data mining approach

Adarsh Arun; Zhen Guo; Simon Sung; Alexei Lapkin

doi:10.26434/chemrxiv-2022-0btmt

Organic Chemistry

06 July 2022, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Automated prediction of reaction impurities can facilitate rapid early-stage reaction development, synthesis planning and optimization. Existing reaction predictors are geared towards predicting the main product and are often black-box models, which makes erroneous outcomes difficult to troubleshoot. This work presents an automated, interpretable impurity prediction workflow based on mining large chemical reaction databases. A 14-step workflow was implemented in Python and RDKit using Reaxys® data. Potential chemical reactions between functional groups that share a reaction environment in the user-supplied query species can be accurately evaluated by directly mining the Reaxys® database for similar, or ‘analogue’, reactions involving these functional groups. Reaction templates are then extracted from the analogue reactions and applied to the relevant species in the original query to return impurities and transformations of interest. Three proof-of-concept case studies based on active pharmaceutical ingredients (paracetamol, agomelatine and lersivirine) were conducted, in which the workflow suggested the correct impurities within the top two outcomes. At every stage, suggested impurities can be traced back to the originating template and analogue reaction in the literature, allowing closer inspection and user validation. Because it is interpretable, unlike purely black-box solutions, this work could serve as a benchmark for more sophisticated algorithms or models, and it illustrates the potential of chemical data in impurity prediction.
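The two ideas at the core of the abstract, pairing functional groups that share a reaction environment and keeping every suggested impurity traceable to its template and analogue reaction, can be illustrated with a minimal plain-Python sketch. It is not the authors' 14-step workflow. The functional groups are hand-annotated rather than detected with RDKit, the example query is the textbook acetylation route to paracetamol rather than data from the paper's case study, all impurity, template and reaction labels are hypothetical placeholders, and ranking by the number of supporting analogue reactions is an assumption, since the abstract does not state the ranking criterion.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

# Hypothetical, hand-annotated query: species name -> functional groups present.
# A real workflow would detect these groups by substructure matching.
QUERY = {
    "4-aminophenol": {"aromatic amine", "phenol"},
    "acetic anhydride": {"anhydride"},
}


def functional_group_pairs(query):
    """Return every unordered pair of (species, group) sites in the mixture."""
    sites = sorted(
        (species, group)
        for species, groups in query.items()
        for group in groups
    )
    return list(combinations(sites, 2))


@dataclass(frozen=True)
class Suggestion:
    """One suggested impurity together with its provenance."""
    impurity: str      # placeholder label for the suggested impurity
    template: str      # placeholder label for the reaction template applied
    analogue_id: str   # placeholder ID of the analogue reaction record


def rank_by_support(suggestions):
    """Group suggestions by impurity and rank by count of supporting evidence.

    Assumed ranking rule: more distinct (template, analogue) pairs rank higher;
    ties are broken alphabetically by impurity label.
    """
    support = defaultdict(set)
    for s in suggestions:
        support[s.impurity].add((s.template, s.analogue_id))
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(impurity, sorted(evidence)) for impurity, evidence in ranked]


# Placeholder suggestions; none of these labels come from the paper.
EXAMPLE_SUGGESTIONS = [
    Suggestion("impurity_A", "template_1", "rxn_001"),
    Suggestion("impurity_B", "template_2", "rxn_002"),
    Suggestion("impurity_A", "template_1", "rxn_003"),
]

if __name__ == "__main__":
    for pair in functional_group_pairs(QUERY):
        print(pair)
    for impurity, evidence in rank_by_support(EXAMPLE_SUGGESTIONS):
        print(impurity, evidence)
```

Because each ranked entry carries its list of (template, analogue reaction) pairs, a user can inspect the literature precedent behind any suggestion, which is the interpretability property the abstract emphasizes.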

Keywords

impurities prediction

chemoinformatics

retrosynthesis

computer-aided synthesis planning

graphs

Version History

Jul 06, 2022 Version 1

License

The content is available under CC BY 4.0

Funding

NRF Singapore

C4T CREATE

Pharma Innovation Partnership in Singapore (PIPS)

Author’s competing interest statement

ZG and AAL are co-founders of Chemical Data Intelligence (CDI) Pte Ltd (cdi-sg.com), which was set up to commercially exploit chemical data networks.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
