Abstract
Untargeted analysis based on high-resolution mass spectrometry is a key tool in clinical metabolomics, natural product discovery, and exposomics, but compound identification remains the major bottleneck. Currently, matching MS2 fragmentation spectra against spectral libraries is the standard workflow for confident compound annotation. Multi-stage fragmentation (MSn) provides deeper insight into substructures and enables validation of fragmentation pathways; however, the community lacks open MSn data for reference compounds. Here, we describe a high-throughput method for acquiring MSn trees and an automated workflow for extracting MSn spectra and building open MSn libraries. By applying this pipeline to 37,829 small molecules, we obtained MSn spectra for 30,008 unique compound structures within 23 days. This resource comprises 2,350,646 MSn spectra (merged and individual) and can be used for compound annotation by library matching, including matching of substructures, and for training machine learning models on substructure-fragmentation patterns. The workflow, implemented in mzmine and Python scripts, is open source and freely available.