COVID-19 Knowledge Extractor (COKE): A Tool and a Web Portal to Extract Drug - Target Protein Associations from the CORD-19 Corpus of Scientific Publications on COVID-19

Daniel Korn; Vera Pervitsky; Tesia Bobrowski; Vinicius Alves; Charles Schmitt; Cristopher Bizon; Nancy Baker; Rada Chirkova; Artem Cherkasov; Eugene Muratov; Alexander Tropsha

doi:10.26434/chemrxiv.13289222.v1

Theoretical and Computational Chemistry

Search within Theoretical and Computational Chemistry

COVID-19 Knowledge Extractor (COKE): A Tool and a Web Portal to Extract Drug - Target Protein Associations from the CORD-19 Corpus of Scientific Publications on COVID-19

26 November 2020, Version 1

Working Paper

Show author details

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Objective: The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-COV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-target relationships from the research literature on COVID-19 to assist drug repurposing efforts.

Materials and Methods: SciBiteAI ontological tagging of the COVID Open Research Dataset (CORD-19), a repository of COVID-19 scientific publications, was employed to identify drug-target relationships. Entity identifiers were resolved through lookup routines using UniProt and DrugBank. A custom algorithm was used to identify co-occurrences of protein and drug terms, and confidence scores were calculated for each entity pair.

Results: COKE processing of the current CORD-19 database identified about 3,000 drug-protein pairs, including 29 unique proteins and 500 investigational, experimental, and approved drugs. Some of these drugs are presently undergoing clinical trials for COVID-19.

Discussion: The rapidly evolving situation concerning the COVID-19 pandemic has resulted in a dramatic growth of publications on this subject in a short period. These circumstances call for methods that can condense the literature into the key concepts and relationships necessary for insights into SARS-CoV-2 drug repurposing.

Conclusion: The COKE repository and web application deliver key drug - target protein relationships to researchers studying SARS-CoV-2. COKE portal may provide comprehensive and critical information on studies concerning drug repurposing against COVID-19. COKE is freely available at https://coke.mml.unc.edu/ and the code is available at https://github.com/DnlRKorn/CoKE.

Keywords

COVID-19

CORD-19

COVID-19 data

Drug-target-disease associations

knowledge acquisition

text mining

Supplementary materials

Title

Description

Actions

Title

CoKE ChemRxiv

Description

Actions

Supplementary weblinks

Title

Description

Actions

Title

Description

Actions

View

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here .

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Version History

Nov 26, 2020 Version 1

Metrics

2,783

714

Views

Downloads

License

The content is available under CC BY NC ND 4.0

DOI

10.26434/chemrxiv.13289222.v1

Funding

NCATS [OT2TR003441]

NIH [1U01CA207160]

Author’s competing interest statement

The authors declare no conflicts of interest.

COVID-19 Knowledge Extractor (COKE): A Tool and a Web Portal to Extract Drug - Target Protein Associations from the CORD-19 Corpus of Scientific Publications on COVID-19

Authors

Abstract

Keywords

Supplementary materials

Supplementary weblinks

Comments

Version History

Metrics

License

DOI

Funding

Author’s competing interest statement

Share