Abstract
The identification of human proteins that are amenable to pharmacologic modulation without significant off-target effects remains an important unsolved challenge. Computational methods have been devised to identify features that distinguish “druggable” from “undruggable” proteins, finding that protein sequence, tissue and cellular localization, biological role, and position in the protein-protein interaction network are all important discriminant factors. However, many prior efforts to automate the assessment of protein druggability suffer from low performance or poor interpretability. We developed a neural network-based machine learning model that generates a druggability sub-score for each of four distinct categories and combines them into an overall druggability score. The model achieves excellent performance in separating drugged and undrugged proteins in the human proteome, with an area under the receiver operating characteristic curve (AUC) of 0.95. Our use of multiple sub-scores allows potential protein targets of interest to be assessed based on distinct contributors to druggability, yielding a more interpretable and holistic model for identifying novel targets.
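For readers less familiar with the AUC metric quoted above, the following is a minimal sketch of how it can be computed from model scores. It uses the pairwise definition: the probability that a randomly chosen positive (here, drugged) example receives a higher score than a randomly chosen negative (undrugged) one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the model or dataset described here.

```python
import numpy as np

def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]    # scores of positive ("drugged") examples
    neg = scores[~labels]   # scores of negative ("undrugged") examples
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical toy data: 1 = drugged, 0 = undrugged.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In the toy data, eight of the nine positive-negative pairs are ranked correctly, so the sketch returns 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.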
Supplementary materials
Title
Comparative model performance
Description
Comparison of our model to other whole-proteome ML-based druggability models across sensitivity, specificity, accuracy, and AUC
Title
Highest scoring undrugged proteins
Description
The status of our highest-scoring undrugged proteins in the Therapeutic Target Database (TTD) and Open Targets
Title
All protein scores
Description
The overall druggability scores and sub-scores for every protein in our dataset
Title
Feature importances
Description
The importance of every feature in our dataset, measured as the increase in test loss after random permutation of that feature
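The permutation-based importance measure described above can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: the loss is assumed to be binary cross-entropy, the number of repeats is arbitrary, and the names `log_loss`, `permutation_importance`, and `toy_predict`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def log_loss(y_true, p):
    """Binary cross-entropy (assumed loss; the source does not name one)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in test loss after shuffling each feature column."""
    rng = np.random.default_rng(seed)
    base = log_loss(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column, breaking its link to the labels.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(log_loss(y, predict(X_perm)) - base)
        importances[j] = np.mean(increases)
    return importances

# Hypothetical test set: feature 0 determines the label, feature 1 is noise.
rng = np.random.default_rng(42)
X_test = rng.normal(size=(200, 2))
y_test = (X_test[:, 0] > 0).astype(float)

def toy_predict(X):
    """Stand-in for a trained model; it only looks at feature 0."""
    return 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))

importances = permutation_importance(toy_predict, X_test, y_test)
```

In this toy setup, shuffling feature 0 raises the test loss, while shuffling feature 1 leaves it unchanged because the stand-in model never reads that column. A larger increase in loss indicates a feature the model relies on more heavily.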