Abstract
We present MultiModalTransformer (MMT), a novel deep learning architecture that directly predicts molecular structures from diverse spectroscopic data: 1H-NMR, 13C-NMR, HSQC, COSY, IR, and mass spectrometry (MS). Utilizing a modified Transformer model with attention mechanisms, MMT simultaneously processes multiple data modalities to focus on the most relevant spectral features. Our approach demonstrates significant advancements in automated structure determination, achieving up to 94% correct identifications for real experimental samples despite being trained solely on simulated spectra. To address the challenges of vast chemical space and limited experimental data, we introduce an innovative improvement cycle that allows MMT to adapt to new chemical spaces. The model's robustness is evidenced by its ability to maintain substantial predictive power even when starting with slightly incorrect molecular structures, identifying 56% of experimental molecules correctly from modified initial guesses. MMT provides explainable predictions through token-based analysis, offering insights into its decision-making process. We also present a user-friendly GUI that integrates the full improvement cycle workflow, facilitating practical application in chemistry laboratories. By leveraging diverse spectral inputs and adaptive learning techniques, MMT represents a significant step towards fully automated structure elucidation, potentially accelerating drug discovery and natural product research. The work also demonstrates that comprehensive chemical space coverage in training data is more critical than precise spectral accuracy.
Supplementary materials
Title
Enhancing Molecular Structure Elucidation: MultiModalTransformer for both simulated and experimental spectra
Description
Supporting Information: Enhancing Molecular Structure Elucidation: MultiModalTransformer for both simulated and experimental spectra
Supplementary weblinks
Title
MultiModalTransformer
Description
MultiModalTransformer is a transformer-based architecture that integrates various spectroscopic modalities (NMR, HSQC, COSY, IR) for automated molecular structure prediction, complete with a data generation pipeline and user-friendly HTML interface.
Title
Datasets for MultiModalTransformer project
Description
This folder contains all the necessary data related to the publication:
"Enhancing Molecular Structure Elucidation with the MultiModalTransformer for both simulated and experimental spectra"