Abstract
The development of DNA-Encoded Library (DEL) technology has introduced new challenges for the analysis of chemical libraries. Unlike classical HTS libraries, a DEL cannot be modified once synthesized. It must therefore be treated as a stand-alone chemoinformatic object, represented both as a collection of independent molecules and as a single entity. For the analysis of such collections, the concept of a Chemical Library Space (CLS), in which entire libraries become the objects, is indispensable. In this article, we introduce, analyze and compare four vectorial library representations obtained using Generative Topographic Mapping (GTM), thereby formally defining CLS. These representations allow libraries to be compared effectively, and the resulting similarity relationships can be tuned and chemically interpreted. In particular, property-tuned CLS encodings make it possible to compare libraries simultaneously with respect to both property and chemotype distributions. We apply the various CLS encodings to the problem of selecting DELs that optimally match a reference collection (here, ChEMBL28), showing how the choice of CLS descriptors can help fine-tune the matching (overlap) criteria. The proposed CLS encodings offer a new, efficient way to perform polyvalent analysis of a space of thousands of chemical libraries.