First report of q-RASAR modeling towards an approach of easy interpretability and efficient transferability

18 April 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Quantitative structure-activity relationship (QSAR) and read-across techniques have recently been merged into an emerging field, Read-across Structure-Activity Relationship (RASAR), which uses the chemical similarity concepts of read-across (an unsupervised step) to develop a supervised learning model (as in QSAR). The RASAR method has so far been used only for graded predictions or classification modeling. In this work, we apply RASAR for the first time to quantitative predictions (q-RASAR), using a case study of androgen receptor binding affinity data. We computed a number of error-based and similarity-based measures, such as the weighted standard deviation of the predicted values, the coefficient of variation of the computed predictions, the average similarity level of close training compounds for each query molecule, the standard deviation and coefficient of variation of similarity levels, the maximum similarity levels to positive and negative close training compounds, and a concordance measure indicating similarity to positive, negative, or both classes of close training compounds. We combined these additional measures with the selected chemical descriptors from the previously developed QSAR model, developed new partial least squares (PLS) models from the training set, and predicted the endpoint for the query data set. Interestingly, these new models surpass the original QSAR model in both internal and external validation quality. In this study, we also introduce a new similarity-based concordance measure that can contribute significantly to model quality. A q-RASAR model also has an advantage over read-across predictions in that it is easy to interpret and indicates the quantitative contributions of important chemical features.
The strategy described here should be applicable to other biological/toxicological/property data modeling for enhanced quality of predictions, easy interpretability, and efficient transferability.
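
As an illustration of the kind of similarity-based and error-based measures listed above, the following is a minimal sketch and not the authors' implementation. It assumes that compounds are represented as sets of feature indices, that similarity is the Tanimoto coefficient, that the k most similar training compounds count as "close", and that a fixed activity threshold separates positive from negative compounds. The function names, the parameters k and threshold, and the pos_neg_balance value (a simple stand-in, not the concordance measure introduced in the paper) are all hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rasar_measures(query, training, k=3, threshold=0.0):
    """Compute illustrative read-across measures for one query compound.

    training is a list of (feature_set, observed_value) pairs.
    k and threshold are assumed settings, not values from the paper.
    """
    if not training:
        raise ValueError("training set must not be empty")

    # Rank training compounds by similarity and keep the k closest.
    close = sorted(
        ((tanimoto(query, feats), y) for feats, y in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    sims = [s for s, _ in close]
    vals = [y for _, y in close]

    # Fall back to equal weights if no close compound shares any feature.
    weights = sims if sum(sims) > 0 else [1.0] * len(sims)
    total = sum(weights)

    # Similarity-weighted read-across prediction and its weighted spread.
    pred = sum(w * y for w, y in zip(weights, vals)) / total
    wsd = math.sqrt(sum(w * (y - pred) ** 2 for w, y in zip(weights, vals)) / total)
    cv_pred = wsd / abs(pred) if pred != 0 else 0.0

    # Average, standard deviation, and coefficient of variation of similarity.
    avg_sim = sum(sims) / len(sims)
    sd_sim = math.sqrt(sum((s - avg_sim) ** 2 for s in sims) / len(sims))
    cv_sim = sd_sim / avg_sim if avg_sim != 0 else 0.0

    # Highest similarity to a positive and to a negative close compound.
    max_pos = max((s for s, y in close if y >= threshold), default=0.0)
    max_neg = max((s for s, y in close if y < threshold), default=0.0)

    return {
        "pred": pred,
        "wsd_pred": wsd,
        "cv_pred": cv_pred,
        "avg_sim": avg_sim,
        "sd_sim": sd_sim,
        "cv_sim": cv_sim,
        "max_pos": max_pos,
        "max_neg": max_neg,
        # Hypothetical stand-in: >0 leans positive, <0 leans negative.
        "pos_neg_balance": max_pos - max_neg,
    }


# Toy data, invented purely for illustration.
toy_training = [
    ({1, 2, 3, 4}, 2.0),
    ({1, 2}, -1.0),
    ({5, 6}, 4.0),
    ({1, 2, 3, 4, 5, 6, 7, 8}, 1.0),
]
toy_measures = rasar_measures({1, 2, 3, 4}, toy_training, k=3, threshold=0.0)
```

In a workflow of the kind the abstract describes, values like these would be computed for every training and query compound and appended as extra columns next to the selected chemical descriptors before fitting the regression model.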

Keywords

q-RASAR
Read-across
QSAR
Similarity
Prediction

Supplementary materials

Data files
The .zip folder contains the original data set used for modeling with the SMILES of the compounds along with observed receptor binding affinity, the data files for best subset regression and intelligent consensus predictions, and all the reported models in the Excel format.

Supplementary Plots file
The file contains score plots, applicability domain plots, and randomization plots of individual PLS models.
