Alternative weighting schemes for fine-tuned extended similarity index calculations

Kenneth Lopez Perez; Anita Racz; David Bajusz; Camila Gonzalez; Karoly Heberger; Ramon Miranda-Quintana

doi:10.26434/chemrxiv-2024-0b8sf

Theoretical and Computational Chemistry



05 February 2024, Version 1

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Extended similarity indices (i.e., generalizations of pairwise similarity to more than two objects) have recently gained importance because of their simplicity, fast computation and superior performance in tasks such as diversity picking. However, they depend on several meta-parameters that should be optimized. Earlier, we extended the binary similarity indices to ‘discrete non-binary’ and ‘continuous’ data; here we continue by introducing and comparing multiple weighting functions. As a case study, we characterized the similarity of CYP enzyme inhibitors (4016 molecules after curation) with extended similarity indices based on 2D descriptors, MACCS and Morgan fingerprints. We used a statistical workflow based on the sum of ranking differences (SRD) and analysis of variance (ANOVA) to find the optimal weighting function(s). Overall, the fraction (“frac”) is the best weighting function. We also identified the optimal extended similarity indices and showed how they differ across data sets. We intend this work as a guideline for users of extended similarity indices on the available weighting options. Source code for the calculations is available at https://github.com/mqcomplab/MultipleComparisons.
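Extended similarity compares n fingerprints at once. Each bit position is classified as a 1-similarity, 0-similarity or dissimilarity counter, and a weighting function then scales each counter. The short sketch below shows where a weighting scheme enters that calculation. Several parts of it are assumptions rather than code or definitions taken from this work:

* The "fraction" weights follow our understanding of the earlier extended-similarity work: f_s(Δ) = Δ/n for similarity counters and f_d(Δ) = 1 - (Δ - n mod 2)/n for dissimilarity counters, where Δ = |2σ - n| and σ is a column sum.
* The power weight with base 3 and the fully weighted Sokal-Michener-type ratio are illustrative choices.
* All function names are hypothetical. This is not the API of the MultipleComparisons repository.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical names throughout; a sketch, not the MultipleComparisons code.
WeightPair = Tuple[Callable[[int], float], Callable[[int], float]]


def fraction_weights(n: int) -> WeightPair:
    """Assumed 'frac' scheme: f_s(d) = d/n, f_d(d) = 1 - (d - n mod 2)/n."""
    return (lambda d: d / n), (lambda d: 1 - (d - n % 2) / n)


def power_weights(n: int, base: int = 3) -> WeightPair:
    """Assumed power scheme: f_s(d) = base**-(n - d), f_d(d) = base**-(d - n mod 2)."""
    return (lambda d: float(base) ** -(n - d)), (lambda d: float(base) ** -(d - n % 2))


def no_weights(n: int) -> WeightPair:
    """Unweighted baseline: every counter contributes exactly 1."""
    return (lambda d: 1.0), (lambda d: 1.0)


def extended_counters(fps: Sequence[Sequence[int]],
                      scheme: Callable[[int], WeightPair],
                      c_threshold: Optional[int] = None) -> Dict[str, float]:
    """Classify every bit position of n binary fingerprints and accumulate
    raw and weighted counters (a = 1-similarity, d = 0-similarity, dis = dissimilarity)."""
    n = len(fps)
    if c_threshold is None:
        c_threshold = n % 2              # smallest meaningful coincidence threshold
    f_s, f_d = scheme(n)
    c = {"a": 0, "d": 0, "dis": 0, "w_a": 0.0, "w_d": 0.0, "w_dis": 0.0}
    for column in zip(*fps):             # iterate over bit positions
        sigma = sum(column)              # number of objects with this bit on
        delta = abs(2 * sigma - n)       # how lopsided the column is
        if delta > c_threshold and 2 * sigma > n:    # mostly 1s
            c["a"] += 1
            c["w_a"] += f_s(delta)
        elif delta > c_threshold:                    # mostly 0s
            c["d"] += 1
            c["w_d"] += f_s(delta)
        else:                                        # evenly split column
            c["dis"] += 1
            c["w_dis"] += f_d(delta)
    return c


def weighted_esm(c: Dict[str, float]) -> float:
    """Assumed fully weighted Sokal-Michener-type ratio:
    weighted similarity counters over all weighted counters."""
    sim = c["w_a"] + c["w_d"]
    return sim / (sim + c["w_dis"])


if __name__ == "__main__":
    fps = [[1, 1, 1, 0, 0],
           [1, 1, 0, 0, 1],
           [1, 0, 1, 0, 0],
           [1, 1, 0, 0, 1]]
    for name, scheme in [("none", no_weights), ("frac", fraction_weights),
                         ("power_3", power_weights)]:
        print(name, round(weighted_esm(extended_counters(fps, scheme)), 4))
    # prints: none 0.6 / frac 0.5556 / power_3 0.5135
```

The counts are the same for all three schemes. In this toy set they are two 1-similarity columns, one 0-similarity column and two dissimilarity columns. Only the weights change, and they shift the index from 0.6 down to about 0.51. Differences of this kind are what a comparison of weighting functions has to rank.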


Version History

Feb 05, 2024 Version 1


License

The content is available under CC BY NC ND 4.0


Funding

National Research Development and Innovation Office of Hungary (K134260)

National Research Development and Innovation Office of Hungary (FK146063)

Hungarian Academy of Sciences: János Bolyai Research Scholarship

New National Excellence Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund (ÚNKP-23-5)

National Institute of General Medical Sciences of the National Institutes of Health (R35GM150620)

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
