Abstract
In non-targeted analysis, spectral matching is one of the most commonly used approaches for annotating or identifying chemicals of emerging concern in complex samples by high-resolution mass spectrometry (HRMS). Conventional confidence assessment systems struggle to resolve multiple hits during library searches, which leaves identifications ambiguous. In this study, machine learning (ML) combined information extracted from MS/MS spectra with calibration-free predicted retention indices (RIs) to yield the probability of true positive (TP) for each hit. The workflow comprised a pre-trained molecular fingerprint (MF)-to-RI model, a cumulative neutral loss (CNL)-to-RI model trained on 693,681 spectra, and a binary classification model that incorporated five significant features from the universal library search algorithm and the difference between the RIs predicted by the two RI models. MF-derived and CNL-derived RI values were highly correlated (R² = 0.96 and 0.88 for the model training and testing datasets, respectively), suggesting reduced RI error for TP annotations. When applied to pesticide-spiked samples, the k-nearest neighbors algorithm achieved a weighted F1 score of 0.65 and a Matthews correlation coefficient of 0.30. In a proof-of-concept study of pesticide contaminants in black tea, integrating ML with reference spectral matching enhanced the exposome identification probability by half. This study highlights the potential of integrating ML with reference library matching to enhance the identification probability of contaminants.
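For readers unfamiliar with the two classification metrics reported in the abstract, the following is a minimal sketch of how a weighted F1 score and a Matthews correlation coefficient are computed for a binary true-positive/false-positive classifier. It is an illustration only: the labels are made up, the function names are hypothetical, and none of it is taken from the authors' scripts or data.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def f1_for_class(y_true, y_pred, cls):
    """F1 score of one class: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive=cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    n = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, c) * y_true.count(c) / n
        for c in sorted(set(y_true))
    )


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up labels: 1 = library hit is a true positive, 0 = false positive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

print(f"weighted F1 = {weighted_f1(y_true, y_pred):.2f}")
print(f"MCC         = {matthews_corrcoef(y_true, y_pred):.2f}")
```

Weighted F1 rewards correct calls in proportion to class size, whereas the Matthews correlation coefficient uses all four confusion-matrix cells and is therefore less flattering when one class dominates; this is one reason the two values can differ noticeably on imbalanced hit lists.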
Supplementary materials
Title
Supplementary Information for Integration of Transferable Prediction of Retention Index and Universal Library Search Enhances Exposome Identification Probability in RPLC/HRMS-Based Non-Targeted Analysis
Description
This file includes:
Supporting Figures S01–15
Supporting Tables S01–04
Supplementary weblinks
Title
Available Code for Integration of Transferable Prediction of Retention Index and Universal Library Search Enhances Exposome Identification Probability in RPLC/HRMS-Based Non-Targeted Analysis
Description
Contains all the scripts used in this work
Title
Available Models for Integration of Transferable Prediction of Retention Index and Universal Library Search Enhances Exposome Identification Probability in RPLC/HRMS-Based Non-Targeted Analysis
Description
Contains all the models obtained in this work