A Benchmark Set of Bioactive Molecules for Diversity Analysis of Compound Libraries and Combinatorial Chemical Spaces

03 April 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Sources of commercially available compounds have grown continuously for several years, culminating in combinatorial Chemical Spaces that contain billions to trillions of molecules. To assess how well a compound collection provides relevant chemistry, a benchmark set of pharmaceutically relevant structures is required that enables an unbiased comparison. For this purpose, the ChEMBL database was mined for molecules displaying biological activity, and three benchmark sets of successively smaller orders of magnitude were created by systematic filtering and processing: Set L (‘large-sized’, 379k), Set M (‘medium-sized’, 25k), and Set S (‘small-sized’, 3k). Tailored for broad coverage of the physicochemical and topological landscape, benchmark Set S was then employed to analyze the chemical diversity of commercial combinatorial Chemical Spaces and enumerated compound libraries. Across the three search methods used, FTrees (pharmacophore features), SpaceLight (molecular fingerprints), and SpaceMACS (maximum common substructure), eXplore and REAL Space consistently performed best. In general, each Chemical Space provided a larger number of compounds that were more similar to the respective query molecule than those from the enumerated libraries, while also offering unique scaffolds for each method.
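
To make the idea of ranking library compounds by similarity to a benchmark query more concrete, the following is a minimal sketch of a generic fingerprint-based nearest-neighbour search using the Tanimoto coefficient. It is not the implementation of FTrees, SpaceLight, or SpaceMACS, and it does not search a combinatorial Chemical Space; the fingerprints are toy sets of on-bit indices, and the names `tanimoto`, `top_hits`, and the compound identifiers are hypothetical, introduced only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modelled here as the set of its on-bit indices (an assumption
# made for illustration; real tools use their own descriptor formats).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def top_hits(
    query: Fingerprint, library: Dict[str, Fingerprint], k: int
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    # Sort by descending similarity, then by name so ties are deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy data: one hypothetical query and a three-compound hypothetical library.
query = frozenset({1, 2, 3, 4})
library = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({1, 2, 3, 5}),
    "cpd_c": frozenset({7, 8}),
}
hits = top_hits(query, library, k=2)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

Repeating such a search for every query in a benchmark set, and comparing the similarity of the best hits returned by each source, gives one simple way to compare compound collections under these assumptions.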

Keywords

benchmark set
chemical diversity
Chemical Spaces
combinatorial chemistry
commercial compounds
LBDD
ultra-large chemical libraries
virtual screening

Supplementary materials

Supporting Information: Supporting figures and tables.
Benchmark Set L [379k]: Benchmark Set L 'large-sized' featuring 379k molecules.
Benchmark Set M [25k]: Benchmark Set M 'medium-sized' featuring 25k molecules.
Benchmark Set S [3k]: Benchmark Set S 'small-sized' featuring 2.9k molecules.
