Abstract
Untargeted analysis based on high-resolution mass spectrometry is a key tool in clinical metabolomics, natural product discovery, and exposomics, but compound identification remains the major bottleneck. Currently, spectral library matching of MS2 fragmentation data is the standard workflow for confident compound annotation. Multi-stage fragmentation (MSn) provides deeper insight into substructures and enables validation of fragmentation pathways; however, the community lacks open MSn data for reference compounds. Here, we describe a high-throughput method for acquiring MSn trees and an automated workflow for extracting spectra and building open MSn libraries. By applying this pipeline to ~20,600 small molecules, we obtained MSn spectra for 16,391 unique compound structures in twelve days. This resource includes 1,126,997 MSn spectra and can be used for compound annotation by library matching, including matching of substructures, and for training machine learning models on substructure-fragmentation patterns. The workflow, implemented in mzmine and Python scripts, is open source and freely available.
Supplementary materials
Title: Supplementary Information
Description: Supplementary information