A Benchmark Set of Bioactive Molecules for Diversity Analysis of Compound Libraries and Combinatorial Chemical Spaces

Alexander Neumann; Raphael Klein

doi:10.26434/chemrxiv-2025-vzjw3

Abstract

Sources of commercially available compounds have grown continuously for several years, culminating in combinatorial Chemical Spaces that contain billions to trillions of molecules. Assessing how well a compound collection provides relevant chemistry requires a benchmark set of pharmaceutically relevant structures that enables an unbiased comparison. For this purpose, the ChEMBL database was mined for molecules displaying biological activity, and three benchmark sets, each roughly an order of magnitude smaller than the last, were created by systematic filtering and processing: Set L (‘large-sized’, 379k), Set M (‘medium-sized’, 25k), and Set S (‘small-sized’, 3k). Tailored for broad coverage of the physicochemical and topological landscape, benchmark Set S was then used to analyze the chemical diversity of commercial combinatorial Chemical Spaces and enumerated compound libraries. Across the three search methods used, FTrees (pharmacophore features), SpaceLight (molecular fingerprints), and SpaceMACS (maximum common substructure), the eXplore and REAL Space consistently performed best. In general, each Chemical Space provided a larger number of compounds that were more similar to the respective query molecule than the enumerated libraries did, while each also offered unique scaffolds for each method.

Keywords

benchmark set

chemical diversity

Chemical Spaces

combinatorial chemistry

commercial compounds

LBDD

ultra-large chemical libraries

virtual screening

Supplementary materials

Supporting Information: Supporting figures and tables.

Benchmark Set L [379k]: Benchmark Set L 'large-sized' featuring 379k molecules.

Benchmark Set M [25k]: Benchmark Set M 'medium-sized' featuring 25k molecules.

Benchmark Set S [3k]: Benchmark Set S 'small-sized' featuring 2.9k molecules.
