Abstract
DNA-encoded chemical libraries (DELs) enable highly efficient screening of billions of small molecules for binding to a target of interest and can provide valuable training data for machine learning models used in virtual screening. However, DEL screening data are notoriously noisy owing to variance in the synthetic yield of library members, the DNA amplification and sequencing process, the influence of the DNA tags on binding, and other factors. Here we present an analysis of a split-sample DEL screening strategy against Bruton’s tyrosine kinase (BTK) that combines a panel of affinity selections against the target protein at varying concentrations with a probabilistic model that estimates both the binding affinity and the relative input concentration of each library member. We evaluated the model by comparing its predictions to surface plasmon resonance (SPR) measurements of resynthesized compounds conjugated to DNA and found that this methodology ranked library members by binding affinity more accurately than enrichment metrics alone. Additionally, the method recovered a library member with potent binding affinity that would not have been detected in a standard DEL selection.
Supplementary materials
Supplementary Information: Supplementary figures, SPR and LCMS data