Abstract
Achieving selectivity in kinase inhibition is a major challenge in drug discovery, largely because of the structural similarity among kinases. Can machine learning be leveraged to overcome this hurdle? Using different fingerprints may improve results, but is there a single machine-learning approach that can effectively address selectivity across all kinases? In this study, the author collected kinase activity data from the PubChem database (January 2023) using the UniProt ID of each kinase. Each UniProt ID is associated with its own dataset, and duplicate data points were removed. The datasets were then appended together, and any dataset containing fewer than 120 data points was discarded. Each data point was labeled as either Active (1) or Inactive (0) based on the activity data. Two fingerprinting approaches were used for prediction: MACCS fingerprints and Morgan2 (ECFP2) fingerprints with a 2048-bit representation. The combined dataset was divided into two subsets, one with imbalanced data and one with balanced data. Random Forest and Artificial Neural Network models were applied to both subsets. Model performance was evaluated using accuracy, sensitivity, specificity, and area under the curve (AUC). Morgan fingerprints performed slightly better than MACCS fingerprints. A total of 480 target IDs were produced, of which 452 were unique. On each dataset (balanced and imbalanced), two models were developed for both fingerprints, resulting in a combined total of 1920 predictions. Interestingly, the imbalanced data yielded higher specificity than the balanced data. Each model has been deployed and made publicly available at github.com/phalem/minKLIFSAI. However, the current data on all kinases are not yet sufficient for machine learning to reliably discover selective inhibitors.
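The balancing and evaluation steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the abstract does not say how the balanced subset was built, so random undersampling of the majority class is an assumption here, and the function names `undersample` and `classification_metrics` are hypothetical. AUC is left out because it requires predicted scores rather than 0/1 labels.

```python
import random


def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Assumption: balancing is done by randomly undersampling the
    majority class; the abstract does not state the actual method.
    """
    rng = random.Random(seed)
    active = [i for i, y in enumerate(labels) if y == 1]
    inactive = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(active), len(inactive))
    return sorted(rng.sample(active, n) + rng.sample(inactive, n))


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty class or empty input) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (Active recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate (Inactive recall)
    }


if __name__ == "__main__":
    # Toy labels, invented purely for illustration (1 = Active, 0 = Inactive).
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print(classification_metrics(y_true, y_pred))
    print(undersample(y_true))
```

Specificity depends only on the Inactive class, so a model trained where Inactive compounds dominate can score well on it while still missing Actives. That is one plausible reading of the higher specificity reported for the imbalanced data.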
Supplementary weblinks
Title
Streamlit app for virtual screening
Description
This link contains the Streamlit app mentioned in the paper, together with the models.
Title
Main project link that includes the data used
Description
This will be the main repository for future contributions.
Title
Zenodo link for the figures, app, models, and data used
Description
This contains all the data required for the publication.