Abstract
To improve compound identification and database searching, this study explores methods for simulating and matching Heteronuclear Single Quantum Coherence (HSQC) spectra. HSQC spectra serve as unique molecular fingerprints and offer a favorable balance between data collection time and information richness. We conducted a comprehensive evaluation of four HSQC simulation techniques: ACD-Labs (ACD), MestReNova (MNova), Gaussian NMR calculations (DFT), and a graph-based neural network (ML). For the latter two techniques, we developed a reconstruction logic to combine proton and carbon 1D spectra into HSQC spectra.
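As an illustration of what such a reconstruction step might look like, the following minimal sketch pairs each predicted proton shift with the shift of the carbon it is bonded to. It is not the implementation used in this work: the function name `reconstruct_hsqc`, the dictionary-based inputs, the rounding used to merge equivalent protons, and the approximate ethanol shift values are all assumptions made for this example.

```python
def reconstruct_hsqc(carbon_shifts, proton_shifts, attached_carbon, decimals=2):
    """Build HSQC-style (1H, 13C) peaks from 1D shift predictions.

    carbon_shifts:   {carbon atom id: 13C shift in ppm}
    proton_shifts:   {hydrogen atom id: 1H shift in ppm}
    attached_carbon: {hydrogen atom id: id of the carbon it is bonded to}
    Hydrogens bonded to heteroatoms are simply absent from attached_carbon.
    """
    peaks = set()
    for h_atom, c_atom in attached_carbon.items():
        # Skip pairs for which one of the two 1D predictions is missing.
        if h_atom not in proton_shifts or c_atom not in carbon_shifts:
            continue
        peak = (round(proton_shifts[h_atom], decimals),
                round(carbon_shifts[c_atom], decimals))
        # Using a set merges equivalent protons on the same carbon into one peak.
        peaks.add(peak)
    return sorted(peaks)


# Hypothetical input for ethanol (illustrative, approximate shift values).
carbon_shifts = {"C1": 18.1, "C2": 57.8}
proton_shifts = {"H1": 1.19, "H2": 1.19, "H3": 1.19,
                 "H4": 3.66, "H5": 3.66, "H6": 2.6}
attached_carbon = {"H1": "C1", "H2": "C1", "H3": "C1",
                   "H4": "C2", "H5": "C2"}  # H6 sits on oxygen: no HSQC peak

peaks = reconstruct_hsqc(carbon_shifts, proton_shifts, attached_carbon)
print(peaks)
```

In this sketch the hydroxyl proton produces no cross peak because it has no directly bonded carbon, which mirrors the one-bond C-H correlation that an HSQC experiment records.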
We implemented three peak-matching strategies (Minimum-Sum, Euclidean-Distance, and Hungarian-Distance), each combined with three padding strategies (zero-padding, peak truncation, and nearest-neighbor double assignment). We found that coupling these strategies with a robust simulation technique enables accurate identification of the correct molecule among similar analogues (regio- and stereoisomers) and allows fast and accurate searches of large databases. Furthermore, we demonstrated the efficacy of the best-performing methodology by correcting the structures of a set of previously misidentified molecules.
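To make the combination of a matching strategy and a padding strategy concrete, the sketch below scores an experimental peak list against a simulated one using an optimal one-to-one assignment with zero-padding. It is a sketch under stated assumptions, not the code from this work: the optimal assignment (the quantity the Hungarian algorithm computes) is found here by brute force over permutations, which is only practical for short peak lists, and the function names, the axis scaling factors `h_scale` and `c_scale`, and the normalization by peak count are hypothetical choices.

```python
from itertools import permutations
from math import hypot


def zero_pad(peaks, length):
    """Zero-padding: extend a peak list with (0.0, 0.0) entries."""
    return list(peaks) + [(0.0, 0.0)] * (length - len(peaks))


def scaled_distance(p, q, h_scale=1.0, c_scale=0.1):
    """Euclidean distance between two (1H, 13C) peaks.

    The scaling factors are assumed values that put both axes on a
    comparable footing; they are not taken from the publication.
    """
    return hypot((p[0] - q[0]) * h_scale, (p[1] - q[1]) * c_scale)


def optimal_assignment_cost(experimental, simulated):
    """Mean distance under the best one-to-one peak assignment.

    Brute force stands in for the Hungarian algorithm here; both return
    the same minimum, but brute force scales factorially.
    """
    n = max(len(experimental), len(simulated))
    if n == 0:
        return 0.0
    exp = zero_pad(experimental, n)
    sim = zero_pad(simulated, n)
    best = min(
        sum(scaled_distance(exp[i], sim[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best / n


# Hypothetical peak lists: one experimental spectrum, two candidates.
experimental = [(1.19, 18.1), (3.66, 57.8)]
candidate_a = [(3.66, 57.8), (1.19, 18.1)]   # same peaks, different order
candidate_b = [(1.25, 22.0), (3.40, 63.0)]   # a similar analogue

cost_a = optimal_assignment_cost(experimental, candidate_a)
cost_b = optimal_assignment_cost(experimental, candidate_b)
print(cost_a, cost_b)
```

Ranking database candidates by such a cost, lowest first, is one way a spectrum-based search could be organized. For longer peak lists a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the natural replacement for the permutation loop.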
These results indicate that effective HSQC spectra simulation and matching methodologies can substantially aid molecular structure elucidation. We also provide a Google Colab notebook so that researchers can apply our methods to their own data.
Supplementary weblinks
Title
HSQC Structure Elucidation
Description
GitHub: Google Colab implementation of the methodology presented in the publication