PINNED: Identifying Characteristics of Druggable Human Proteins Using an Interpretable Neural Network

30 March 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The identification of human proteins that are amenable to pharmacologic modulation without significant off-target effects remains an important unsolved challenge. Computational methods have been devised to identify features that distinguish between “druggable” and “undruggable” proteins, finding that protein sequence, tissue and cellular localization, biological role, and position in the protein-protein interaction network are all important discriminant factors. However, many prior efforts to automate the assessment of protein druggability suffer from low performance or poor interpretability. We developed a neural network-based machine learning model that generates a druggability sub-score for each of four distinct categories and combines them to form an overall druggability score. The model achieves excellent performance in separating drugged and undrugged proteins in the human proteome, with an area under the receiver operating characteristic curve (AUC) of 0.95. Our use of multiple sub-scores allows potential protein targets of interest to be assessed on distinct contributors to druggability, yielding a more interpretable and holistic model for identifying novel targets.
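
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how an AUC can be computed from labels and scores. It is not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. It uses the standard interpretation of AUC as the probability that a randomly chosen positive (here, a drugged protein) scores higher than a randomly chosen negative (an undrugged protein), with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 1 (e.g. drugged) or 0 (e.g. undrugged)
    scores: iterable of model scores, higher meaning more likely positive
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three "drugged" and three "undrugged" proteins.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of proteins, which is fine for a sketch; a rank-based formulation would be preferable at whole-proteome scale.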

Keywords

Protein
Druggability
Machine learning
Neural network
Interpretable
AlphaFold
Protein pocket
Dark genome
Proteome
Drug target

Supplementary materials

Comparative model performance: Comparison of our model to other whole-proteome ML-based druggability models across sensitivity, specificity, accuracy, and AUC
Highest scoring undrugged proteins: The status of our highest scoring undrugged proteins in the Therapeutic Targets Database (TTD) and OpenTargets
All protein scores: The overall druggability scores and subscores for every protein in our dataset
Feature importances: The importances as measured by increase in test loss after random permutation for all features in our dataset
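
The feature importances in the supplementary materials are described as the increase in test loss after random permutation. A minimal sketch of that general technique follows, assuming a binary classifier that outputs probabilities and a log-loss objective. The loss choice, the function names (`log_loss`, `permutation_importance`, `toy_predict`), the number of repeats, and the toy data are all assumptions made for illustration and are not taken from the preprint.

```python
import math
import random


def log_loss(y_true, probs):
    """Mean binary cross-entropy, with probabilities clipped for stability."""
    eps = 1e-7
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in loss when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = log_loss(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(log_loss(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy setup: feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
y = [1.0 if row[0] > 0 else 0.0 for row in X]


def toy_predict(rows):
    # Stand-in for a trained model; it only looks at feature 0.
    return [1.0 / (1.0 + math.exp(-4.0 * row[0])) for row in rows]


importances = permutation_importance(toy_predict, X, y)
```

In this toy setup, shuffling feature 0 raises the loss substantially, while shuffling feature 1 leaves it unchanged because the stand-in model never reads that column. Applied to held-out test data, larger increases indicate features the model relies on more heavily.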
