Abstract
Quantitative structure-activity relationship (QSAR) and read-across techniques have recently been merged into the emerging field of Read-across Structure-Activity Relationship (RASAR) modeling, which uses the chemical similarity concepts of read-across (an unsupervised step) to develop a supervised learning model (as in QSAR). The RASAR method has so far been used only for graded predictions or classification modeling. In this work, we attempt, for the first time, to apply RASAR to quantitative predictions (q-RASAR), using androgen receptor binding affinity data as a case study. We have computed a number of error-based and similarity-based measures: the weighted standard deviation of the predicted values, the coefficient of variation of the computed predictions, the average similarity level of close training compounds for each query molecule, the standard deviation and coefficient of variation of the similarity levels, the maximum similarity levels to positive and negative close training compounds, a concordance measure indicating similarity to positive, negative, or both classes of close training compounds, etc. We have combined these additional measures with the chemical descriptors selected in the previously developed QSAR model, developed new partial least squares (PLS) models from the training set, and predicted the endpoint for the query data set. Interestingly, these new models outperform the original QSAR model in both internal and external validation quality. In this study, we have also introduced a new similarity-based concordance measure that can contribute significantly to the model quality. A q-RASAR model also has an advantage over read-across predictions in that it is easy to interpret and indicates the quantitative contributions of important chemical features.
The strategy described here should be applicable to other biological/toxicological/property data modeling for enhanced quality of predictions, easy interpretability, and efficient transferability.
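As an illustration of the error-based and similarity-based measures listed above, the following is a minimal sketch and not the authors' implementation. It assumes Tanimoto similarity on fingerprints given as sets of on-bit indices, a fixed number k of close training compounds, similarity-weighted averaging, and a positive/negative split at the training-set mean response; all function names, parameter names, and dictionary keys are hypothetical. The concordance measure is not reproduced because its definition is not given here.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rasar_descriptors(query_bits, train_bits, train_y, k=3, threshold=None):
    """Similarity- and error-based measures for one query compound (illustrative only)."""
    # Assumption: compounds at or above the training mean count as "positive".
    if threshold is None:
        threshold = sum(train_y) / len(train_y)

    sims = [tanimoto(query_bits, bits) for bits in train_bits]
    # Indices of the k most similar ("close") training compounds.
    close = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    s = [sims[i] for i in close]
    y = [train_y[i] for i in close]

    # Similarity weights; fall back to uniform weights if all similarities are zero.
    total = sum(s)
    w = [si / total for si in s] if total > 0 else [1.0 / len(s)] * len(s)

    # Similarity-weighted prediction and its weighted standard deviation.
    pred = sum(wi * yi for wi, yi in zip(w, y))
    sd_pred = math.sqrt(sum(wi * (yi - pred) ** 2 for wi, yi in zip(w, y)))
    cv_pred = sd_pred / abs(pred) if pred else float("nan")

    # Mean, population standard deviation, and coefficient of variation of similarities.
    avg_sim = sum(s) / len(s)
    sd_sim = math.sqrt(sum((si - avg_sim) ** 2 for si in s) / len(s))
    cv_sim = sd_sim / avg_sim if avg_sim else float("nan")

    # Maximum similarity to positive and negative close training compounds.
    pos = [si for si, yi in zip(s, y) if yi >= threshold]
    neg = [si for si, yi in zip(s, y) if yi < threshold]

    return {
        "pred": pred,
        "sd_pred": sd_pred,
        "cv_pred": cv_pred,
        "avg_sim": avg_sim,
        "sd_sim": sd_sim,
        "cv_sim": cv_sim,
        "max_sim_pos": max(pos) if pos else 0.0,
        "max_sim_neg": max(neg) if neg else 0.0,
    }
```

In a q-RASAR-style workflow, values such as these would be appended to the selected chemical descriptors of each compound before fitting the PLS model.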
Supplementary materials
Title
Data files
Description
The .zip folder contains the original data set used for modeling with the SMILES of the compounds along with observed receptor binding affinity, the data files for best subset regression and intelligent consensus predictions, and all the reported models in the Excel format.
Title
Supplementary Plots file
Description
The file contains score plots, applicability domain plots, and randomization plots of individual PLS models
Supplementary weblinks
Title
Read-Across v4.0
Description
The read-across tool for computing read-across predictions along with several error and similarity measures is available for download from this link
Title
DTC Lab tools site
Description
DTC Lab tools such as best subset selection, PLS regression, and intelligent consensus predictions are available for download from this link