UmetaFlow: An untargeted metabolomics workflow for high-throughput data processing and analysis

Eftychia Eva Kontou; Axel Walter; Oliver Alka; Julianus Pfeuffer; Timo Sachsenberg; Omkar S. Mohite; Matin Nuhamunada; Oliver Kohlbacher; Tilmann Weber

doi:10.26434/chemrxiv-2022-z0t4g-v3

Analytical Chemistry

06 March 2023, Version 3

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Metabolomics experiments generate highly complex datasets whose manual inspection is time-consuming, labor-intensive, and sometimes error-prone. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, and molecular formula and structural predictions with an integration into the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, and development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). Finally, UmetaFlow is also offered as a web-based graphical user interface for parameter optimization and processing of smaller datasets. UmetaFlow was validated with in-house LC-MS/MS datasets of actinomycetes producing known secondary metabolites, as well as with commercial standards; it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking: UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets.
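The share of ground truth features detected, as reported in the abstract, is a recall-style metric. The following Python sketch illustrates how such a value could be computed by matching detected features to ground truth features within an m/z tolerance (in ppm) and a retention time tolerance. It is a minimal illustration only, not the authors' benchmarking procedure: the `Feature` class, the function names, the tolerances (10 ppm, 30 s), and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float  # mass-to-charge ratio
    rt: float  # retention time in seconds


def matches(truth: Feature, detected: Feature,
            ppm_tol: float = 10.0, rt_tol: float = 30.0) -> bool:
    """Return True if `detected` lies within both tolerances of `truth`."""
    ppm_error = abs(detected.mz - truth.mz) / truth.mz * 1e6
    return ppm_error <= ppm_tol and abs(detected.rt - truth.rt) <= rt_tol


def detection_recall(ground_truth: list[Feature], detected: list[Feature],
                     ppm_tol: float = 10.0, rt_tol: float = 30.0) -> float:
    """Fraction of ground truth features matched by at least one detected feature."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    found = sum(
        1 for t in ground_truth
        if any(matches(t, d, ppm_tol, rt_tol) for d in detected)
    )
    return found / len(ground_truth)


# Hypothetical example values (assumed for illustration, not from the paper).
ground_truth = [
    Feature(180.0634, 62.0),
    Feature(256.2402, 410.5),
    Feature(524.3711, 735.0),
    Feature(132.1019, 95.0),
]
detected = [
    Feature(180.0636, 63.1),   # about 1 ppm off: counts as a match
    Feature(256.2401, 409.0),  # below 1 ppm off: counts as a match
    Feature(524.3900, 735.2),  # about 36 ppm off: outside the m/z tolerance
    Feature(300.0000, 200.0),  # no ground truth counterpart
]

recall = detection_recall(ground_truth, detected)
print(f"Detected {recall:.0%} of ground truth features")
```

In this toy example two of the four ground truth features are recovered. Real benchmarks typically also handle adducts, isotopes, and one-to-one matching, which this sketch deliberately leaves out.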

Supplementary materials

Additional File 1. Figure S1. A detailed overview of UmetaFlow. Table S1. Important instrument, method, and sample-specific parameters for UmetaFlow parameter optimization. Table S2. The optimal parameters for OpenMS (UmetaFlow) for feature detection, formula, and structural predictions of the in-house datasets. Table S3. Feature detection, structural and formula predictions for pyracrimycin A in Streptomyces sp. NBC 00162, Streptomyces sp. CA-210063 and Streptomyces eridani. Table S4. The optimal parameters for OpenMS (UmetaFlow) for feature detection, quantification, and marker selection of the MTBLS733 QE HF dataset. Table S5. Feature identification, quantification, and marker selection performance of different untargeted metabolomic data processing software using the benchmark dataset MTBLS733. Table S6. The optimal parameters for OpenMS (UmetaFlow) for feature detection, quantification, and marker selection of the MTBLS736 tripleTOF dataset. Table S7. Feature identification, quantification, and marker selection performance of different untargeted metabolomic data processing software using the benchmark dataset MTBLS736.

Additional File 2. SI_Table_S8: All raw in-house data were analyzed both manually and with UmetaFlow for method validation.

Additional File 3. SI_Table_S9: Feature detection, structural and formula predictions for commercial standards germicidins A and B, kanamycin, tetracycline hydrochloride, thiostreptone, globomycin, ampicillin and apramycin.

Additional File 4. SI_Table_S10: Feature detection, structural and formula predictions for kirromycin and desferrioxamine B from extracts of Streptomyces collinus Tü 365 and epemicins A and B from extracts of Kutzneria sp. CA-103260.

Now Published

UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis

Eftychia E. Kontou, Axel Walter, Oliver Alka, Julianus Pfeuffer, Timo Sachsenberg, Omkar S. Mohite, Matin Nuhamunada, Oliver Kohlbacher, Tilmann Weber

Journal article: Journal of Cheminformatics, Volume 15, Issue 1

Online publication date: May 12, 2023

Version History

Mar 06, 2023 Version 3

Oct 20, 2022 Version 2

Oct 19, 2022 Version 1

Version Notes

In the newest version, we benchmarked the workflow with SCIEX tripleTOF data (MTBLS736) in addition to the existing QE data. We reduced the run-time of the re-quantification algorithm by 70%, built a web-based graphical user interface for entry-level users, and replaced the main tables with bar plots for easier interpretation. We now clearly specify the operating systems compatible with the two versions of UmetaFlow, as well as the most important parameters that users need to optimize.

Metrics

Views: 2,674

Downloads: 1,243

License

The content is available under CC BY 4.0

DOI

10.26434/chemrxiv-2022-z0t4g-v3

Funding

Novo Nordisk Fonden: NNF20CC0035580

Novo Nordisk Fonden: NNF16OC0021746

German Ministry for Research and Education (BMBF): FKZ: 31A535A

Forschungscampus MODAL: 3FO18501

Deutsche Forschungsgemeinschaft: TRR 261/1, Z03

Author’s competing interest statement

EEK, TW, OM, MN, OA and AW declare that they have no competing interests. OK and TS are principals of OpenMS LLC.

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
