Abstract
Purchasable chemical space has grown rapidly into the tens of billions of molecules, providing unprecedented opportunities for ligand discovery but also straining the tools that might exploit these molecules at scale. We have therefore developed ZINC-22, a database of commercially accessible small molecules derived from multi-billion-scale make-on-demand libraries. The new database and tools enable analog searching in this vast new space via a facile GUI, CartBlanche, drawing on similarity methods that scale sub-linearly in the number of molecules. The new library also uses data organization methods that enable rapid lookup of molecules and their physical properties, including conformations, partial atomic charges, cLogP values, and solvation energies, all crucial for molecular docking; such lookups had become slow with the database organization of previous versions of ZINC. As the libraries have continued to grow, we have asked whether molecular diversity has suffered, for instance because certain scaffolds have come to dominate via easy analoging. This has not occurred thus far: chemical diversity continues to grow with database size, with a one-log increase in Bemis-Murcko scaffolds for every two-log increase in database size. Most new scaffolds come from compounds with the highest heavy atom count. Finally, we consider the implications for databases like ZINC as the libraries grow towards and beyond the trillion-molecule range. ZINC is freely available to everyone and may be accessed at cartblanche22.docking.org, via Globus, and in the Amazon AWS and Oracle OCI clouds.
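The scaffold-scaling statement in the abstract can be read as a power law: one log of scaffold growth per two logs of database growth implies that the scaffold count rises roughly as the square root of the number of molecules. The sketch below only illustrates that arithmetic. The function names are hypothetical, the exponent of 0.5 is inferred from the abstract's wording, and no prefactor is assumed, so it yields relative growth rather than absolute scaffold counts.

```python
def implied_exponent(scaffold_logs: float = 1.0, size_logs: float = 2.0) -> float:
    """Power-law exponent implied by 'scaffold_logs' of scaffold growth
    per 'size_logs' of database growth (assumed reading of the abstract)."""
    if size_logs <= 0:
        raise ValueError("size_logs must be positive")
    return scaffold_logs / size_logs


def scaffold_growth_factor(size_ratio: float, exponent: float = 0.5) -> float:
    """Relative increase in scaffold count when the database grows by
    'size_ratio', assuming scaffolds scale as size ** exponent."""
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return size_ratio ** exponent


if __name__ == "__main__":
    # Hypothetical illustration: a 100-fold larger library.
    print(implied_exponent())
    print(scaffold_growth_factor(100.0))
```

Under this reading, a 100-fold larger library would contain about 10 times as many Bemis-Murcko scaffolds, so diversity keeps growing even though the number of molecules per scaffold also rises.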
Supplementary materials
Title
Supporting information for ZINC-22 paper
Description
S0. Access to databases to prevent molecules becoming unpatentable
S1. Source catalog contributions to ZINC-22
S2. Sharding script
S3. ZINC-22 numbering
S4. Software and Hardware overview
S5. Sn system overview
S6. Sb system overview
S7. Common Database Schema Overview
S8. Important management scripts for ZINC-22