Efficient generation of open multi-stage fragmentation mass spectral libraries

Corinna Brungs; Robin Schmid; Steffen Heuckeroth; Aninda Mazumdar; Matúš Drexler; Pavel Šácha; Pieter C. Dorrestein; Daniel Petras; Louis-Felix Nothias; Václav Veverka; Radim Nencka; Zdeněk Kameník; Tomáš Pluskal

doi:10.26434/chemrxiv-2024-l1tqh-v2

Analytical Chemistry

10 October 2024, Version 2

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Untargeted analysis based on high-resolution mass spectrometry is a key tool in clinical metabolomics, natural product discovery, and exposomics, but compound identification remains the major bottleneck. Currently, the standard workflow for confident compound annotation relies on MS2 fragmentation data and spectral library matching. Multi-stage fragmentation (MSn) provides deeper insight into substructures and enables validation of fragmentation pathways; however, the community lacks open MSn data for reference compounds. Here, we describe a high-throughput method for acquiring MSn trees and an automated workflow for extracting and building open MSn libraries. By applying this pipeline to 37,829 small molecules, we obtained MSn spectra for 30,008 unique compound structures within 23 days. This resource includes 2,350,646 MSn spectra (merged and individual) and can be leveraged for compound annotation based on library matching, including matching of substructures, and for training machine learning models on substructure-fragmentation patterns. The workflow, implemented in mzmine and Python scripts, is open-source and freely available.
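To make the idea of spectral library matching concrete, the following is a minimal Python sketch of one common approach: a greedy cosine similarity between a query spectrum and library spectra of the same MS level. It is an illustration only, not the scoring used in mzmine or in the authors' scripts. The names `Spectrum`, `cosine_score`, and `best_match`, the peak-list layout, and the 0.01 m/z tolerance are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class Spectrum:
    """Hypothetical container for one library or query spectrum."""
    name: str
    ms_level: int  # 2 for MS2, 3 for MS3, ...
    peaks: List[Peak]


def cosine_score(a: List[Peak], b: List[Peak], tol: float = 0.01) -> float:
    """Greedy cosine similarity; peaks pair up if their m/z differ by <= tol."""
    # Collect every candidate pair within the (assumed) m/z tolerance.
    pairs = []
    for i, (mz_a, int_a) in enumerate(a):
        for j, (mz_b, int_b) in enumerate(b):
            if abs(mz_a - mz_b) <= tol:
                pairs.append((int_a * int_b, i, j))

    # Take the strongest pairs first; each peak is used at most once.
    pairs.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in pairs:
        if i not in used_a and j not in used_b:
            dot += product
            used_a.add(i)
            used_b.add(j)

    norm_a = sqrt(sum(intensity ** 2 for _, intensity in a))
    norm_b = sqrt(sum(intensity ** 2 for _, intensity in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: Spectrum, library: List[Spectrum], tol: float = 0.01
) -> Optional[Tuple[Spectrum, float]]:
    """Return the best-scoring library entry at the query's MS level, if any."""
    candidates = [s for s in library if s.ms_level == query.ms_level]
    if not candidates:
        return None
    scored = [(s, cosine_score(query.peaks, s.peaks, tol)) for s in candidates]
    return max(scored, key=lambda item: item[1])


# Toy data, invented for illustration only.
library = [
    Spectrum("A", 2, [(100.0, 1.0), (150.0, 0.5)]),
    Spectrum("B", 2, [(80.0, 1.0), (120.0, 1.0)]),
    Spectrum("C", 3, [(100.0, 1.0), (150.0, 0.5)]),
]
query = Spectrum("query", 2, [(100.005, 1.0), (150.002, 0.5)])
hit, score = best_match(query, library)
print(hit.name, round(score, 3))
```

Restricting candidates to the same MS level is one simple way an MSn library could be searched level by level; real tools typically also filter by precursor m/z and apply intensity weighting, which this sketch omits.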

Supplementary materials

Supplementary Information

Version History

Mar 25, 2025 Version 3

Oct 10, 2024 Version 2

May 10, 2024 Version 1

Version Notes

Adding 3 new compound libraries, chemical space coverage

Metrics

Views: 3,932

Downloads: 2,073

License

The content is available under CC BY 4.0

Author’s competing interest statement

P.C.D. is an advisor and equity holder in the companies Cybele and Sirenas. He also works as a science advisor for the company bileOmix, in which he holds equity, too. In addition, P.C.D. is a scientific co-founder, advisor, and equity holder of the companies Ometa, Enveda, and Arome, with prior approval by the University of California San Diego. He has also consulted for DSM Animal Health in 2023. T.P., S.H., and R.S. are co-founders of the company mzio GmbH, which develops technologies related to mass spectrometry data processing.

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
