Building Block-Based Binding Predictions for DNA-Encoded Libraries

17 April 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

DNA-encoded libraries (DELs) make it possible to synthesize and screen millions of diverse compounds against a target of interest in a single experiment. However, although DEL selections produce large volumes of binding data at relatively low cost, they are susceptible to noise, so computational follow-up is needed to increase signal-to-noise ratios. In this work, we present a set of informatics tools for analyzing DEL selection data so that subsequent DEL screens probe productive regions of chemical space. Our approach segments DEL data at the level of individual building blocks to identify the productive building blocks in a library. We show that similar building blocks have similar probabilities of binding, and we use this relationship to predict the behavior of untested building blocks. Lastly, we build a model on the premise that the combined behavior of individual building blocks is predictive of the activity of the overall compound. On a holdout set, the model performs more than an order of magnitude better than random guessing, demonstrating that it can serve as a baseline for comparison against other machine learning models on DEL data.
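
The building-block-level segmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names (`building_block_binder_fractions`, `compound_score`), and the choice to combine per-block fractions by simple multiplication are all assumptions made for illustration only.

```python
from collections import defaultdict


def building_block_binder_fractions(compounds):
    """Map (position, building_block) to the fraction of compounds
    containing that block at that position that were labeled binders.

    `compounds` is an iterable of (blocks, is_binder) pairs, where
    `blocks` is a tuple with one building block per synthesis cycle.
    """
    counts = defaultdict(lambda: [0, 0])  # [binder count, total count]
    for blocks, is_binder in compounds:
        for position, block in enumerate(blocks):
            counts[(position, block)][0] += int(is_binder)
            counts[(position, block)][1] += 1
    return {key: binders / total for key, (binders, total) in counts.items()}


def compound_score(blocks, fractions, default=0.0):
    """Combine per-block binder fractions into one compound-level score.

    ASSUMPTION: fractions are multiplied as if positions were independent.
    The model in the paper may combine building blocks differently.
    Blocks never seen in the data fall back to `default`.
    """
    score = 1.0
    for position, block in enumerate(blocks):
        score *= fractions.get((position, block), default)
    return score


# Hypothetical two-cycle toy library; labels are invented for illustration.
toy_library = [
    (("A1", "B1"), True),
    (("A1", "B2"), True),
    (("A2", "B1"), False),
    (("A2", "B2"), False),
    (("A1", "B1"), True),
    (("A2", "B1"), True),
]

fractions = building_block_binder_fractions(toy_library)
ranked = sorted(
    {blocks for blocks, _ in toy_library},
    key=lambda blocks: compound_score(blocks, fractions),
    reverse=True,
)
print(ranked[0])  # highest-scoring building block combination
```

In this sketch an untested building block simply receives the `default` score. The abstract instead describes predicting untested blocks from similar tested ones, which would require a chemical similarity measure that is not modeled here.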

Keywords

DNA encoded libraries
drug discovery
clustering
open-source
methods
combinatorial chemistry
dimensionality reduction
chemical similarity

Supplementary materials

Title: Supporting Information: Building Block-Based Binding Predictions for DNA-Encoded Libraries
Description: The Supporting Information includes additional methods on how we constructed the HDBSCAN loss function and how we generated the DEL selection data. We also include additional data on hyperparameter optimization, evaluation of the method using 2D Tanimoto similarity and data tables for the figures presented in the main text.

Title: Compiled dataset of sEH binders and non-binders
Description: Data used to perform analysis in the paper.
