Evaluating molecular similarity measures: Do similarity measures reflect electronic structure properties?
Comments
While I highly appreciate the work done (especially the large number of molecules included), I find several points missing:
i) To my knowledge, the first similarity measure based on electron density was created by Carbó et al. (DOI: 10.1002/qua.560170612). It may also be worth mentioning DOI: 10.1016/s0065-3276(08)60021-0.
ii) Our multicriteria comparison of fingerprints & cosine similarity (Fig. 13 in [https://www.researchgate.net/publication/315513438]) shows the subordinate role of cosine similarity. A discussion would be warranted.
iii) Instead of the pairwise comparison options (cited frequently), it is expedient to make multiple (n-ary) comparisons, which are computationally faster and superior in diversity picking; see, e.g., [DOI: 10.1186/s13321-021-00505-3 and DOI: 10.1186/s13321-021-00504-4].
iv) It would be interesting to see whether better alternatives to “top area ratios” exist.
Best regards, Karoly
Response, Chad Risko: Feb 03, 2025, 20:49
Dr. Heberger - Thank you for your suggestions! Let us dig through these suggested references and come back with a response. There is a lot of excellent work in this space, and we want to be certain to appropriately account for it. Sincerely, Chad
Response, Rebekah Duke: Feb 07, 2025, 16:17
Thank you for your thoughtful comments and for directing our attention to these other works. Regarding the first similarity measure based on electron density, we acknowledge its foundational role in this class of measures. However, such methods require wavefunction calculations, making them computationally demanding and impractical for large-scale applications, particularly in machine learning (ML). In contrast, the fingerprint-based similarity measures studied in our work are computationally efficient, which enables their use in ML-driven analyses. We appreciate the reference to your discussion of cosine similarity and will consider incorporating it into future revisions of our work. Similarly, we appreciate your comments on n-ary comparisons. This approach could improve the computational efficiency of our KDE area ratio analysis. Our current work focuses on developing and validating the KDE area ratio method, so we see integrating n-ary comparisons as a direction for future research. Finally, while we compare our KDE area ratio analysis to some existing similarity measure validation methods (Figure S11), we welcome further validation in future work. Regards, Rebekah