Abstract
Structural dereplication is an essential step in the study of natural products (NPs). The number of known NPs is so large that efficient dereplication is highly desirable. NMR spectroscopy remains the gold standard for structural identification. The 13C NMR spectrum is an effective molecular fingerprint, but its acquisition is time-consuming, especially for mass-limited NPs. Several alternative methods and tools have been proposed, but none has reached general use. Here, HSQCid, a new artificial intelligence tool based on contrastive learning between 1H-13C HSQC spectra and structures, is proposed for effective structural identification. Two structure encoders are compared, and a graph neural network is preferred over a Transformer. About 400K predicted data are split, with 80% used for training and 20% for testing. In addition, with 18K experimental data as an external test set, top-1 and top-5 accuracies reach 74.9% and 92.2%, respectively. Top-1 accuracy increases by at least 12% when the model is combined with other easily obtainable structure features, such as the total number of hydrogens attached to carbons, obtained from 1H NMR spectra. Further data analysis shows that filtering by structure features nearly eliminates the influence (>10%) of the difference between predicted and experimental data. Surprisingly, the influence of the number or ratio of quaternary carbons on identification accuracy is significant only in specific and rare cases (less than 3%). Furthermore, a benchmark method based on matching 13C peaks is compared and found to be markedly inferior to the proposed method. HSQCid will be made available online in the near future for free public use. It is believed that HSQCid will help pave the way toward high-throughput or highly effective structural dereplication of NPs.
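The retrieval step described in the abstract (rank candidate structures against an HSQC query, optionally narrow the pool with a structure feature, then score top-k accuracy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the HSQCid implementation: the two-dimensional embeddings stand in for encoder outputs, cosine similarity is assumed as the ranking score, the filter is assumed to be an exact match on the count of carbon-bound hydrogens, and the names `rank_candidates`, `top_k_accuracy`, and `ch_hydrogens` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(query_emb, candidates, ch_hydrogens=None):
    """Return candidate ids sorted by decreasing similarity to the query.

    If ch_hydrogens is given, keep only candidates whose count of
    carbon-bound hydrogens matches exactly (assumed filter rule).
    """
    pool = [
        c for c in candidates
        if ch_hydrogens is None or c["ch_hydrogens"] == ch_hydrogens
    ]
    ranked = sorted(pool, key=lambda c: cosine(query_emb, c["emb"]), reverse=True)
    return [c["id"] for c in ranked]


def top_k_accuracy(rankings, truths, k):
    """Fraction of queries whose true id appears among the first k ranked ids."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Toy data: hypothetical structure embeddings and hydrogen counts.
candidates = [
    {"id": "A", "emb": [1.0, 0.0], "ch_hydrogens": 10},
    {"id": "B", "emb": [0.9, 0.1], "ch_hydrogens": 12},
    {"id": "C", "emb": [0.0, 1.0], "ch_hydrogens": 12},
]
query = [1.0, 0.05]  # hypothetical spectrum embedding; the true structure is "B"

unfiltered = rank_candidates(query, candidates)
filtered = rank_candidates(query, candidates, ch_hydrogens=12)
```

In this toy case the unfiltered ranking places "A" ahead of the true structure "B", while the hydrogen-count filter removes "A" from the pool and moves "B" to the top, which is the kind of effect a structure-feature filter is meant to have on top-1 accuracy.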
Supplementary materials
Title
supporting information 1
Description
Data sources and quality evaluation; models by contrastive learning; peak matching methods; data partitioning of traditional Chinese medicine-related natural products
Title
supporting information 2
Description
Chemical structure classes of traditional Chinese medicine-related natural products