Abstract
Transition metal complexes (TMCs) play a key role in several areas of high interest, including medicinal chemistry, renewable energy, and nanoporous materials. The development of new TMCs enabling these technologies remains challenged by the need to optimize multiple properties within large chemical spaces, in which the thirty transition metals of the periodic table can be combined with a virtually infinite number of ligands. In this work, we provide an open dataset, tmQMg-L, comprising 30K TMC ligands extracted from the Cambridge Structural Database. tmQMg-L combines size, diversity, and synthesizability at an unprecedented scale. Each ligand is characterized by geometric information and a rich fingerprint including electronic and steric features. The ligand charge and metal-coordination mode were assigned with a robust algorithm based on graphs and natural bond orbital theory. The tmQMg-L dataset was leveraged in the automated generation of 1.37M TMCs resulting from all possible combinations between a square planar palladium(II) scaffold and a pool of 50 different ligands. This TMC space was explored with a multiobjective genetic algorithm (MOGA) that optimized two properties over a Pareto front, namely the polarizability (alpha) and the HOMO-LUMO gap (epsilon). After exploring only 1% of the whole space (i.e., 13K TMCs), the MOGA yielded 130 diverse hits with maximal alpha and epsilon values. Despite the size of the space explored, the evolution of the hits was easily rationalized by analyzing how the MOGA selected ligands of different natures. Instead of the traditional mutation and crossover of fragments within a single ligand, the MOGA of this work implemented full-ligand genetic operations acting on all coordination sites, enforcing chemical diversity across all populations, including the last one containing the hits.
We believe that the combined use of the tmQMg-L dataset and this MOGA will enable the discovery of TMCs with optimal properties within diverse and vast chemical spaces.
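The two ideas the abstract relies on, selecting hits on a Pareto front of two maximized properties and applying genetic operations to whole ligands rather than to fragments, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a complex is represented as a tuple holding one ligand identifier per coordination site, and the names `pareto_front`, `mutate`, and `crossover` are hypothetical. It is not the implementation used in this work, whose operators and selection scheme are described in the Supporting Information.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: a complex is one ligand identifier per coordination site.
Complex = Tuple[str, ...]


def pareto_front(scores: Sequence[Tuple[float, float]]) -> List[int]:
    """Return indices of points that no other point dominates.

    Both objectives (e.g. alpha and epsilon) are maximized. Point j dominates
    point i if it is at least as good in both and strictly better in one.
    """
    front = []
    for i, (a_i, e_i) in enumerate(scores):
        dominated = any(
            a_j >= a_i and e_j >= e_i and (a_j > a_i or e_j > e_i)
            for j, (a_j, e_j) in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


def mutate(tmc: Complex, pool: Sequence[str], rng: random.Random) -> Complex:
    """Full-ligand mutation: replace the entire ligand at one random site."""
    site = rng.randrange(len(tmc))
    candidates = [lig for lig in pool if lig != tmc[site]]
    child = list(tmc)
    child[site] = rng.choice(candidates)
    return tuple(child)


def crossover(a: Complex, b: Complex, rng: random.Random) -> Complex:
    """Full-ligand crossover: each site inherits a whole ligand from a parent."""
    return tuple(rng.choice(pair) for pair in zip(a, b))
```

With hypothetical (alpha, epsilon) pairs `[(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]`, `pareto_front` keeps the first three points, since each of the last two is matched or beaten in both properties by at least one other point. Because ligands are swapped as whole units, every child produced by `mutate` or `crossover` is itself a combination of ligands from the pool.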
Supplementary materials
Supporting Information
The Supporting Information provides further details about the curation of the tmQMg-L dataset, the generation of the chemical space of 1.37M TMCs, the chemoinformatics descriptors, and the MOGA.