GASP: A pan-specific predictor of family 1 glycosyltransferase specificity enabled by a pipeline for substrate feature generation and large-scale experimental screening

David Harding-Larsen; Christian Degnbol Madsen; David Teze; Tiia Kittilä; Mads Rosander Langhorn; Hani Gharabli; Mandy Hobusch; Felipe Mejia Otalvaro; Onur Kırtel; Gonzalo Nahuel Bidart; Stanislav Mazurenko; Evelyn Travnik; Ditte Hededam Welner

doi:10.26434/chemrxiv-2023-pr9ck

Catalysis

03 November 2023, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Glycosylation represents a major chemical challenge: although it is one of the most common reactions in nature, conventional chemistry struggles with stereochemistry, regioselectivity, and solubility issues. In contrast, family 1 glycosyltransferase (GT1) enzymes can glycosylate virtually any given nucleophilic group with perfect control over stereochemistry and regioselectivity. However, the appropriate catalyst for a given reaction must be identified among the tens of thousands of available sequences. Here, we present the Glycosyltransferase Acceptor Specificity Predictor (GASP) model, a data-driven approach to identifying reactive GT1:acceptor pairs. We trained a random forest-based acceptor predictor on literature data and validated it on independent, in-house generated data covering 1001 GT1:acceptor pairs, obtaining an AUROC of 0.79 and a balanced accuracy of 72%. GASP can parse all known GT1 sequences as well as all chemicals, the latter through a pipeline that generates 153 chemical features for a given molecule from its CID or SMILES (freely available at https://github.com/degnbol/GASP). In a comparative case study on the glycosylation of the anthelmintic drug niclosamide, GASP had an 83% hit rate, significantly outperforming the 53% hit rate of a random selection assay. However, for the glycosylation of the plant defensive compound DIBOA, GASP achieved a hit rate of 50% and could not match the 83% hit rate obtained with expert-selected enzymes. The hierarchical importance of the generated chemical features was investigated by negative feature selection, which revealed properties related to cyclization and atom hybridization status to be the most important characteristics for accurate prediction. Our study provides a ready-to-use GT1:acceptor predictor that can additionally be trained on other datasets using the automated feature generation pipelines.
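
For readers unfamiliar with the two validation metrics quoted in the abstract, the following is a minimal sketch of how AUROC and balanced accuracy can be computed for a binary reactive/non-reactive classifier. It is not the GASP code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = reactive GT1:acceptor pair, 0 = non-reactive.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical predicted reactivity scores from some classifier.
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("AUROC:", auroc(y_true, scores))
print("Balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

Balanced accuracy averages the per-class recall, so it is not inflated when non-reactive pairs greatly outnumber reactive ones. AUROC does not depend on the choice of threshold.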

Keywords

Glycosyltransferases

Machine Learning

High-throughput Assay

Enzyme Specificity

Supplementary materials

Appendix 1: Name, CID, and structure of acceptors used in NADH coupled enzyme assay to generate the independent in-house dataset.

SI for "GASP: A pan-specific predictor of family 1 glycosyltransferase specificity enabled by a pipeline for substrate feature generation and large-scale experimental screening": Supporting figures and tables for experimental and computational studies.

dataset1: In-house dataset used to test performance of GASP. Contains enzyme information, substrate information, rate, and reaction classification.

Supplementary weblinks

GitHub repository for GASP code: GitHub repository for all code related to the installation, training, optimisation, application, and validation of the GASP model.

Now Published

GASP: A Pan-Specific Predictor of Family 1 Glycosyltransferase Acceptor Specificity Enabled by a Pipeline for Substrate Feature Generation and Large-Scale Experimental Screening

David Harding-Larsen, Christian Degnbol Madsen, David Teze, Tiia Kittilä, Mads Rosander Langhorn, Hani Gharabli, Mandy Hobusch, Felipe Mejia Otalvaro, Onur Kırtel, Gonzalo Nahuel Bidart, Stanislav Mazurenko, Evelyn Travnik, Ditte Hededam Welner

Journal article, ACS Omega, Volume 9, Issue 25

Online publication date: Jun 11, 2024

Version History

Nov 03, 2023 Version 1

Metrics

Views: 703

Downloads: 312

License

The content is available under CC BY NC ND 4.0

Funding

Novo Nordisk Fonden

NNF18OC0034744, NNF10CC1016517, and NNF20CC0035580

Czech Ministry of Education, Youth and Sports

CETOCOEN Excellence CZ.02.1.01/0.0/0.0/17_043/0009632, ESFRI RECETOX RI LM2023069, ESFRI ELIXIR LM2023055

Horizon 2020

CETOCOEN EXCELLENCE No 857560

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
