LSM1-MS2: A Foundation Model for MS/MS, Encompassing Chemical Property Predictions, Search and de novo Generation

Gabriel Asher; Mimoun Cadosh Delmar; Jennifer M. Campbell; Jack Geremia; Timothy Kassis

doi:10.26434/chemrxiv-2024-k06gb-v3

Analytical Chemistry

06 June 2024, Version 3

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We present LSM1-MS2, a pre-trained, self-supervised foundation model for tandem mass spectrometry (MS/MS) that uses a transformer architecture with custom tokenization for masked MS2 peak reconstruction. The model is fine-tuned on smaller, labeled datasets for tasks such as compound property prediction, spectral matching, and de novo molecular generation. LSM1-MS2 demonstrates superior performance compared to traditional supervised models, achieving high accuracy with minimal labeled data. It outperforms conventional methods in database lookups and molecular query retrievals and shows promising results in the emerging field of de novo molecular generation. The model's efficiency in spectral lookup tasks, with significantly reduced evaluation times, underscores its potential for large-scale applications. Our findings highlight the capability of self-supervised pre-training to enhance the predictive power of models for mass spectrometry, particularly in data-limited scenarios. The success of LSM1-MS2 in property prediction, database spectral lookup, and molecular generation paves the way for its application in metabolomics and drug discovery, facilitating robust and scalable analysis with reduced data requirements.

Keywords

MS2

Tandem Mass Spectrometry

Deep Learning

Machine Learning

Supplementary materials

Supplementary Information: Predicted properties.

Version History

Jun 06, 2024 Version 3

Mar 11, 2024 Version 2

Feb 14, 2024 Version 1

Version Notes

Added a section on de novo molecular generation, in addition to updated figures and results throughout the paper. Significant textual changes to improve clarity.

Metrics

Views: 3,418

Downloads: 1,677

License

The content is available under CC BY-NC-ND 4.0.

Author’s competing interest statement

All authors are employees of Matterworks Inc. and hold shares in the company.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
