COVID-19 Knowledge Extractor (COKE): A Tool and a Web Portal to Extract Drug - Target Protein Associations from the CORD-19 Corpus of Scientific Publications on COVID-19

26 November 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Objective: The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-COV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-target relationships from the research literature on COVID-19 to assist drug repurposing efforts.

Materials and Methods: SciBiteAI ontological tagging of the COVID Open Research Dataset (CORD-19), a repository of COVID-19 scientific publications, was employed to identify drug-target relationships. Entity identifiers were resolved through lookup routines using UniProt and DrugBank. A custom algorithm was used to identify co-occurrences of protein and drug terms, and confidence scores were calculated for each entity pair.

Results: COKE processing of the current CORD-19 database identified about 3,000 drug-protein pairs, including 29 unique proteins and 500 investigational, experimental, and approved drugs. Some of these drugs are presently undergoing clinical trials for COVID-19.

Discussion: The rapidly evolving situation concerning the COVID-19 pandemic has resulted in a dramatic growth of publications on this subject in a short period. These circumstances call for methods that can condense the literature into the key concepts and relationships necessary for insights into SARS-CoV-2 drug repurposing.

Conclusion: The COKE repository and web application deliver key drug - target protein relationships to researchers studying SARS-CoV-2. COKE portal may provide comprehensive and critical information on studies concerning drug repurposing against COVID-19. COKE is freely available at https://coke.mml.unc.edu/ and the code is available at https://github.com/DnlRKorn/CoKE.

Keywords

COVID-19
CORD-19
COVID-19 data
Drug-target-disease associations
knowledge acquisition
text mining

Supplementary materials

Title
Description
Actions
Title
CoKE ChemRxiv
Description
Actions

Supplementary weblinks

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy [opens in a new tab] - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here [opens in a new tab] .
This site is protected by reCAPTCHA and the Google Privacy Policy [opens in a new tab] and Terms of Service [opens in a new tab] apply.