Abstract
DNA-encoded libraries (DELs) make it possible to synthesize and screen millions of diverse compounds against a target of interest in a single experiment. However, although DEL selections produce large volumes of binding data at relatively low cost, the selection process is susceptible to noise, necessitating computational follow-up to increase signal-to-noise ratios. In this work, we present a set of informatics tools for analyzing DEL selection data so that subsequent DEL screens probe productive regions of chemical space. Our approach segments DEL data at the level of individual building blocks to identify the productive building blocks in a library. We show that similar building blocks have similar probabilities of binding, an observation we then use to predict the behavior of untested building blocks. Lastly, we build a model on the premise that the combined behavior of individual building blocks is predictive of the activity of the overall compound. On a holdout set, the model performs more than an order of magnitude better than random guessing, demonstrating that it can serve as a baseline for comparison against other machine learning models on DEL data.
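The building block-level idea summarized in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the function names `building_block_rates` and `compound_score`, the toy data, and the choice to combine per-block binder fractions by multiplication are all hypothetical, since the abstract does not specify how building block behavior is combined.

```python
from collections import defaultdict


def building_block_rates(compounds):
    """Estimate a binder fraction for each (position, building block) pair.

    `compounds` is an iterable of (blocks, is_binder) pairs, where `blocks`
    is a tuple of building block IDs, one per synthesis cycle.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for blocks, is_binder in compounds:
        for position, block in enumerate(blocks):
            totals[(position, block)] += 1
            hits[(position, block)] += int(is_binder)
    return {key: hits[key] / totals[key] for key in totals}


def compound_score(blocks, rates, default=0.0):
    """Combine per-block binder fractions into one compound-level score.

    Multiplication is an illustrative assumption; unseen blocks fall back
    to `default`.
    """
    score = 1.0
    for position, block in enumerate(blocks):
        score *= rates.get((position, block), default)
    return score


# Hypothetical toy library: three synthesis cycles, binary binder labels.
toy = [
    (("A1", "B1", "C1"), True),
    (("A1", "B1", "C2"), True),
    (("A1", "B2", "C1"), False),
    (("A2", "B1", "C1"), False),
]
rates = building_block_rates(toy)
ranked = sorted(toy, key=lambda item: compound_score(item[0], rates), reverse=True)
```

In this toy example, compounds built from blocks that appear mostly in binders rank first, which is the sense in which productive building blocks can guide the design of a subsequent screen.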
Supplementary materials
Title
Supporting Information: Building Block-Based Binding Predictions for DNA-Encoded Libraries
Description
The Supporting Information includes additional methods describing how we constructed the HDBSCAN loss function and how we generated the DEL selection data. It also includes additional data on hyperparameter optimization, an evaluation of the method using 2D Tanimoto similarity, and data tables for the figures presented in the main text.
Title
Compiled dataset of sEH binders and non-binders
Description
Data used to perform the analysis in the paper.
Supplementary weblinks
Title
DEL analysis
Description
GitHub repository containing all associated code and scripts.