ProteomicsML: An Online Platform for Community-Curated Datasets and Tutorials for Machine Learning in Proteomics

Tobias Rehfeldt; Ralf Gabriels; Robbin Bouwmeester; Siegfried Gessulat; Benjamin Neely; Magnus Palmblad; Yasset Perez-Riverol; Tobias Schmidt; Juan Antonio Vizcaíno; Eric Deutsch

doi:10.26434/chemrxiv-2022-2s6kx

Biological and Medicinal Chemistry

05 October 2022, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Dataset acquisition and curation are often the hardest and most time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography-ion mobility-mass spectrometry (LC-IM-MS) datasets, because the data are high-throughput, noisy, and require complex processing to get from raw to machine learning-ready formats. Although predictive proteomics is a growing field, labs that predict peptide behavior in LC-IM-MS setups often use unique and complex data processing pipelines to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based datasets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource provides data in easy-to-process formats, along with easy-to-follow tutorials that allow new users to work with even the most advanced algorithms in the field. ProteomicsML provides datasets that are useful for comparing state-of-the-art (SOTA) machine learning algorithms, as well as introductory material for teachers and newcomers to the field. The platform is freely available at https://www.proteomicsml.org/ and we welcome the entire proteomics community to contribute to the project at https://github.com/proteomicsml/.

Supplementary materials

Supplementary Table 1: Proteomics ML publications along with links to the ProteomeXchange datasets used for training or testing.

Supplementary Table 2: Public ProteomeXchange datasets that have been used for ML training or benchmarking.

License

The content is available under CC BY 4.0

Author’s competing interest statement

Tobias Schmidt and Siegfried Gessulat are employees of MSAID. MSAID makes machine learning-based software modules that are sold as part of Proteome Discoverer and also offers contract research.

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
