MinKLIFSAI: a simple machine learning approach toward selective kinase inhibitor

01 October 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Achieving selectivity in kinase inhibition is a major challenge in drug discovery, largely because of the structural similarity among kinases. Can machine learning be leveraged to overcome this hurdle? Using different fingerprints may indeed lead to improved results. However, is there a single machine learning approach that can effectively address selectivity across all kinases? In this study, the author collected kinase activity data from the PubChem database (January 2023) using the UniProt ID of each kinase. Each UniProt ID is associated with its own dataset, and duplicate data points were removed. The data were then appended together, and any dataset containing fewer than 120 points was discarded. Each data point was categorized as either Active (1) or Inactive (0) based on the activity data. Two fingerprinting approaches were employed for predictions: MACCS fingerprints and Morgan2 (ECFP2) fingerprints with a 2048-bit representation. The combined dataset was then divided into two subsets, one with imbalanced data and another with balanced data. Random Forest and Artificial Neural Network models were applied to both datasets. Model performance was evaluated using accuracy, sensitivity, specificity, and area under the curve (AUC). The results showed that Morgan fingerprints performed slightly better than MACCS fingerprints. A total of 480 target IDs were produced, of which 452 were unique. For each dataset (balanced and imbalanced), two models were developed for both fingerprints, resulting in a combined total of 1920 predictions. Interestingly, the imbalanced data yielded higher specificity than the balanced data. All models have been deployed and made publicly available at github.com/phalem/minKLIFSAI. However, the current data on all kinases are not yet sufficient to enable machine learning to reliably discover selective inhibitors.
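The data preparation and evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the author's code: the function names, the representation of a record as a (compound, label) pair, and the use of random undersampling to obtain the balanced subset are all assumptions made for illustration. Only the 120-point threshold, the Active (1) / Inactive (0) labels, and the metric names come from the abstract. AUC is omitted because it requires predicted scores rather than hard labels.

```python
import random
from collections import Counter

MIN_POINTS = 120  # threshold stated in the abstract


def deduplicate(records):
    """Drop repeated (compound, label) pairs, keeping the first occurrence in order."""
    return list(dict.fromkeys(records))


def filter_small_datasets(datasets, min_points=MIN_POINTS):
    """Keep only per-target datasets that have at least `min_points` records."""
    return {uid: recs for uid, recs in datasets.items() if len(recs) >= min_points}


def undersample(records, seed=0):
    """Balance the classes by random undersampling of the majority class.

    Assumption: the abstract does not say how the balanced subset was built.
    """
    actives = [r for r in records if r[1] == 1]
    inactives = [r for r in records if r[1] == 0]
    n = min(len(actives), len(inactives))
    rng = random.Random(seed)
    return rng.sample(actives, n) + rng.sample(inactives, n)


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity from binary labels."""
    counts = Counter(zip(y_true, y_pred))
    tp, tn = counts[(1, 1)], counts[(0, 0)]
    fp, fn = counts[(0, 1)], counts[(1, 0)]
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Sensitivity: fraction of true actives predicted active.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Specificity: fraction of true inactives predicted inactive.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

In a sketch like this, specificity depends only on how the inactive compounds are classified, which is why training on the imbalanced data (typically dominated by one class) can shift specificity relative to the balanced data.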

Keywords

Kinase
Fingerprint
Selective inhibitor
Machine learning
Random Forest
Neural Network
