Abstract
Machine learning has experienced a drastic rise in interest and applications across all fields of chemistry, enabling researchers to leverage large chemical datasets to gain novel insights. The success of machine learning-driven projects in chemistry hinges on three key factors: access to robust and comprehensive datasets, a well-defined objective, and effective molecular representations that convert chemical structures into machine-readable formats. Transition metal complexes have lagged behind their organic counterparts on all three fronts. The large diversity of their structures, coordination numbers and coordination modes has made their translation into a machine-readable format an ongoing challenge. Here we introduce ELECTRUM, an electron configuration-based universal metal fingerprint for transition metal compounds. Its lightweight implementation enables the straightforward conversion of any transition metal complex into a simple fingerprint. Utilising a novel dataset generated from the Cambridge Structural Database (CSD), we demonstrate that ELECTRUM effectively captures the structural diversity of transition metal complexes. By plotting nearest-neighbor relationships in ELECTRUM space, we reveal meaningful clustering in two-dimensional representations. Furthermore, we use the ELECTRUM encoding to train machine learning models that predict the coordination numbers of metal complexes from ligand structures and metal identity alone. On a subset of these data, we further show that models can be trained to predict the oxidation states of metal complexes. These case studies showcase the potential of ELECTRUM as an easy-to-implement fingerprint for metal complexes. We invite the community to further test, validate and improve it.
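The abstract does not spell out how an electron configuration becomes a fingerprint. As a purely hypothetical illustration, and not the ELECTRUM encoding itself, the sketch below turns a metal's atomic number into an idealised Aufbau (Madelung-order) subshell occupancy vector, appends a placeholder ligand bit vector, and finds the nearest neighbour by Euclidean distance. The function names, the use of every subshell as a feature, and the ligand bits are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch only: this is NOT the ELECTRUM implementation.
L_LABELS = "spdf"


def madelung_subshells(max_n: int = 7) -> List[Tuple[int, int]]:
    """Subshells (n, l) with l <= 3, sorted by the Madelung rule (n + l, then n)."""
    shells = [(n, l) for n in range(1, max_n + 1) for l in range(min(n, 4))]
    return sorted(shells, key=lambda nl: (nl[0] + nl[1], nl[0]))


SUBSHELLS = madelung_subshells()
LABELS = [f"{n}{L_LABELS[l]}" for n, l in SUBSHELLS]  # "1s", "2s", "2p", ...


def aufbau_configuration(z: int) -> Dict[str, int]:
    """Idealised configuration of a neutral atom with atomic number z.

    Subshells are filled strictly in Madelung order, so known exceptions
    (e.g. Cr 3d5 4s1, Cu 3d10 4s1) are NOT reproduced.
    """
    remaining = z
    config: Dict[str, int] = {}
    for (_, l), label in zip(SUBSHELLS, LABELS):
        if remaining <= 0:
            break
        occupancy = min(2 * (2 * l + 1), remaining)  # capacity of an l subshell
        config[label] = occupancy
        remaining -= occupancy
    return config


def metal_vector(z: int) -> List[int]:
    """Fixed-length occupancy vector over all subshells (assumed feature set)."""
    config = aufbau_configuration(z)
    return [config.get(label, 0) for label in LABELS]


def complex_vector(z: int, ligand_bits: Sequence[int]) -> List[int]:
    """Concatenate the metal vector with a hypothetical ligand bit vector."""
    return metal_vector(z) + list(ligand_bits)


def nearest_neighbour(query: Sequence[int], library: Dict[str, List[int]]) -> str:
    """Name of the library entry with the smallest Euclidean distance to query."""
    return min(library, key=lambda name: math.dist(query, library[name]))


if __name__ == "__main__":
    ligand_bits = [1, 0, 1, 1]  # placeholder ligand fingerprint
    library = {
        "Co complex": complex_vector(27, ligand_bits),
        "Ru complex": complex_vector(44, ligand_bits),
    }
    fe_query = complex_vector(26, ligand_bits)  # Fe with the same ligand bits
    print(aufbau_configuration(26))
    print(nearest_neighbour(fe_query, library))  # Co complex
```

Because every subshell is a feature here, Fe sits closer to its period neighbour Co (distance 1) than to its group partner Ru (distance √92); a vector restricted to the valence ns and (n − 1)d occupancies would instead make Fe and Ru indistinguishable. Which trade-off suits a given prediction task is a design choice this sketch does not settle.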
Supplementary weblinks
ELECTRUM GitHub Repository: contains all the code and data utilised in this work.