Data Driven Estimation of Molecular Log-Likelihood using Fingerprint Key Counting

10 April 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Chemical similarity between two molecules is widely used in drug discovery and materials science for similarity search, for toxicological assessment, and as a foundation for QSAR models. This study describes models that estimate the log-likelihood that a given molecule belongs to a specific dataset, which is a form of similarity between a single molecule and a whole dataset. Two models are derived from simple counting of the fingerprint keys in the molecule, combined with statistics collected on the total number of observations in the dataset. The AtomLL model is shown to be useful for detecting outliers with unusual keys and shows the best baseline performance for class membership assignment. The MolLL model can detect outliers with an unusual number of repeats and also helps keep de novo molecular generation and optimization within scope. Their performance is compared to that of a kernel density estimator model based on molecular descriptors. The model code and some precomputed models are available as open source on GitHub.
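The general idea of scoring a molecule by counting fingerprint keys against dataset statistics can be illustrated with a small sketch. This is not the AtomLL or MolLL formulation from the paper. It assumes each molecule is already represented as a hypothetical mapping from an integer fingerprint key (for example, a hashed circular-substructure identifier) to the number of times that key occurs, and it uses add-one smoothing as an assumed way to keep unseen keys finite. The hypothetical `key_log_likelihood` scores only which keys are present, so molecules with keys rarely or never seen in the dataset score low. The hypothetical `repeat_log_likelihood` instead scores how many times each key occurs, so it can flag a molecule whose keys are common but repeated an unusual number of times.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Assumed representation: fingerprint key -> number of occurrences in the molecule.
Fingerprint = Mapping[int, int]


def count_keys(dataset: Iterable[Fingerprint]) -> Tuple[Counter, int]:
    """Count, for every key, how many molecules in the dataset contain it."""
    key_counts: Counter = Counter()
    n_molecules = 0
    for fp in dataset:
        n_molecules += 1
        key_counts.update(fp.keys())  # presence only; repeats are ignored here
    return key_counts, n_molecules


def key_log_likelihood(fp: Fingerprint, key_counts: Counter, n_molecules: int) -> float:
    """Sum of log smoothed frequencies with which the molecule's keys occur.

    Add-one (Laplace) smoothing gives unseen keys a small but nonzero
    probability, so molecules containing unseen keys are penalised.
    """
    total = 0.0
    for key in fp:
        p = (key_counts.get(key, 0) + 1) / (n_molecules + 2)
        total += math.log(p)
    return total


def count_repeats(dataset: Iterable[Fingerprint]) -> Dict[int, Counter]:
    """For every key, tally how often each repeat count was observed."""
    repeats: Dict[int, Counter] = defaultdict(Counter)
    for fp in dataset:
        for key, count in fp.items():
            repeats[key][count] += 1
    return repeats


def repeat_log_likelihood(fp: Fingerprint, repeats: Dict[int, Counter]) -> float:
    """Sum of log smoothed probabilities of each key's observed repeat count.

    All repeat counts never seen for a key share one extra smoothing bucket.
    """
    total = 0.0
    for key, count in fp.items():
        seen = repeats.get(key, Counter())  # .get does not trigger defaultdict
        n_obs = sum(seen.values())
        n_levels = len(seen) + 1  # observed counts plus one "unseen" bucket
        p = (seen.get(count, 0) + 1) / (n_obs + n_levels)
        total += math.log(p)
    return total


if __name__ == "__main__":
    # Toy dataset in which key 1 always occurs exactly once.
    dataset = [{1: 1, 2: 1}, {1: 1, 3: 1}, {1: 1, 2: 1}]
    counts, n = count_keys(dataset)
    repeats = count_repeats(dataset)
    for name, mol in [("typical", {1: 1, 2: 1}),
                      ("unseen key", {1: 1, 99: 1}),
                      ("many repeats", {1: 6})]:
        print(name,
              round(key_log_likelihood(mol, counts, n), 3),
              round(repeat_log_likelihood(mol, repeats), 3))
```

In this toy example, the presence-based score gives `{1: 6}` the same value as `{1: 1}` because it ignores repeats, while the repeat-based score ranks it lower. Both scores sum over keys without normalising for molecule size, so larger molecules tend to receive lower values; a practical model would need to account for that.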

Supplementary materials

Supplementary Plots: Supplementary Info for Data Driven Estimation of Molecular Log-Likelihood using Fingerprint Key Counting
