Identification of the Core Chemical Structure in SureChEMBL Patents

Maria J. Falaguera; Jordi Mestres

doi:10.26434/chemrxiv.13660994.v1

Theoretical and Computational Chemistry

01 February 2021, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The SureChEMBL database provides open access to 17 million chemical entities mentioned in 14 million patents published since 1970. However, alongside molecules covered by patent claims, the database contains many starting materials and intermediate products of little pharmacological relevance. Herein, we introduce a new filtering protocol that automatically selects the core chemical structures best representing a congeneric series of pharmacologically relevant molecules in patents. The protocol is first validated against a selection of 890 SureChEMBL patents for which a total of 51,738 manually curated molecules are deposited in ChEMBL. Our protocol was able to select 92.5% of the molecules in ChEMBL from all 270,968 molecules in SureChEMBL for those patents. Subsequently, the protocol was applied to all 240,988 US pharmacological patents for which 9,111,706 molecules are available in SureChEMBL. The unsupervised filtering process selected 5,949,214 molecules (65.3% of the total number of molecules) that form highly congeneric chemical series in 188,795 of those patents (78.3% of the total number of patents).
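As a quick arithmetic check of the fractions quoted above, the following minimal sketch recomputes them from the raw counts in the abstract. The variable and function names are illustrative assumptions, not taken from the paper. The last value is only an estimate implied by the rounded 92.5% figure, not a count reported by the authors.

```python
# Counts quoted in the abstract (names are illustrative, not from the paper).
CHEMBL_CURATED = 51_738            # curated ChEMBL molecules in the 890 validation patents
RECALL_PERCENT = 92.5              # share of those molecules selected by the protocol
US_MOLECULES = 9_111_706           # SureChEMBL molecules in US pharmacological patents
US_SELECTED = 5_949_214            # molecules kept by the unsupervised filter
US_PATENTS = 240_988               # US pharmacological patents processed
US_PATENTS_WITH_SERIES = 188_795   # patents with a highly congeneric series


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


molecule_share = percent(US_SELECTED, US_MOLECULES)
patent_share = percent(US_PATENTS_WITH_SERIES, US_PATENTS)

# Approximate count implied by the rounded 92.5% figure (an estimate only).
recovered_estimate = round(CHEMBL_CURATED * RECALL_PERCENT / 100)

print(f"Molecules selected: {molecule_share}%")
print(f"Patents with a congeneric series: {patent_share}%")
print(f"Implied ChEMBL molecules recovered: about {recovered_estimate:,}")
```

Both recomputed percentages agree with the values stated in the abstract to one decimal place.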

Keywords

patent databases

SureChEMBL

Markush structures

Maximum common structure

Bioactive compounds

Now Published

Identification of the Core Chemical Structure in SureChEMBL Patents

Maria J. Falaguera, Jordi Mestres (journal article)

Journal of Chemical Information and Modeling, Volume 61, Issue 5

Online publication date: Apr 30, 2021

Version History

Feb 01, 2021 Version 1

License

The content is available under CC BY NC ND 4.0

Author’s competing interest statement

No conflict of interest
