Efficient generation of open multi-stage fragmentation mass spectral libraries

10 October 2024, Version 2

Abstract

Untargeted analysis based on high-resolution mass spectrometry is a key tool in clinical metabolomics, natural product discovery, and exposomics, but compound identification remains the major bottleneck. Currently, the standard workflow for confident compound annotation relies on MS2 fragmentation data and spectral library matching. Multi-stage fragmentation (MSn) provides deeper insight into substructures and enables validation of fragmentation pathways; however, the community lacks open MSn data for reference compounds. Here, we describe a high-throughput method for acquiring MSn trees and an automated workflow for extracting MSn spectra and building open MSn libraries. By applying this pipeline to 37,829 small molecules, we obtained MSn spectra for 30,008 unique compound structures within 23 days. This resource includes 2,350,646 MSn spectra (merged and individual) and can be leveraged for compound annotation by library matching, including matching of substructures, and for training machine learning models on substructure-fragmentation patterns. The workflow, implemented in mzmine and Python scripts, is open-source and freely available.
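To illustrate what annotation by spectral library matching involves, the following is a minimal sketch of a greedy cosine similarity between a query spectrum and library entries. It is not the mzmine implementation and does not reflect the authors' scoring settings; the function names `cosine_score` and `best_match`, the `(m/z, intensity)` tuple representation, and the 0.01 m/z tolerance are all assumptions made for this example.

```python
from math import sqrt


def cosine_score(query, reference, mz_tol=0.01):
    """Greedy cosine similarity between two centroided spectra.

    Each spectrum is a list of (mz, intensity) tuples. Peaks are paired
    within an absolute m/z tolerance (an assumed value), taking the pairs
    with the largest intensity product first; each peak is used once.
    """
    candidates = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= mz_tol:
                candidates.append((int_q * int_r, i, j))
    candidates.sort(reverse=True)

    used_q, used_r = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i in used_q or j in used_r:
            continue
        used_q.add(i)
        used_r.add(j)
        dot += product

    norm_q = sqrt(sum(intensity ** 2 for _, intensity in query))
    norm_r = sqrt(sum(intensity ** 2 for _, intensity in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def best_match(query, library, mz_tol=0.01):
    """Return (name, score) of the best-scoring library entry.

    `library` is a hypothetical dict mapping compound names to spectra.
    Returns (None, 0.0) for an empty library.
    """
    best_name, best_score = None, 0.0
    for name, spectrum in library.items():
        score = cosine_score(query, spectrum, mz_tol)
        if best_name is None or score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

In an MSn setting, the same scoring could be applied separately at each fragmentation stage, so that an MS3 spectrum of a fragment is compared against library MS3 spectra sharing that precursor; that extension is left out of the sketch.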

Keywords

MSnLib
Orbitrap ID-X
mzmine
MSn
Spectral library
metabolomics
small molecules
high-throughput
java
python

Supplementary materials

Supplementary Information
