TransPTM: a Transformer-Based Model for Non-Histone Acetylation Site Prediction

Lingkuan Meng; Xingjian Chen; Ke Cheng; Nanjun Chen; Zetian Zheng; Fuzhou Wang; Hongyan Sun; Ka-Chun Wong

doi:10.26434/chemrxiv-2023-txhw5

Theoretical and Computational Chemistry

04 October 2023, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Protein acetylation is one of the most extensively studied post-translational modifications (PTMs) because of its significant roles across a myriad of biological processes. Although many computational tools for acetylation site identification have been developed, there is a lack of benchmark datasets and bespoke predictors for non-histone acetylation site prediction. To address these problems, we contribute both a dataset and a predictor benchmark in this study. First, we construct a non-histone acetylation site benchmark dataset, NHAC, which includes 11 subsets organized by sequence length, ranging from 11 to 61 amino acids. Each sequence length has 886 positive samples and 4707 negative samples. Second, we propose a transformer-based neural network model, TransPTM, for non-histone acetylation site prediction. Our model uses the pre-trained protein language model ProtT5 to construct the site’s feature space. The GNN framework consists of three TransformerConv layers for feature extraction and a multilayer perceptron (MLP) module for classification. In experiments, TransPTM achieves competitive performance for non-histone acetylation site prediction against three state-of-the-art (SOTA) tools. It improves our understanding of the PTM mechanism and provides a theoretical basis for developing drug targets for diseases. Moreover, the created PTM dataset fills the gap in non-histone acetylation site datasets and is beneficial to the related communities. The source code and data utilized by TransPTM are accessible at https://www.github.com/TransPTM.
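
The abstract describes NHAC samples as fixed-length sequences of 11 to 61 amino acids around candidate sites. As an illustration only, the following minimal sketch shows one common way such samples could be cut from a protein sequence. It assumes that each window is centered on a lysine (K), that window lengths are odd, and that positions beyond the protein ends are padded with "X"; none of these details are stated in the abstract, and the function name lysine_windows is hypothetical rather than taken from the TransPTM code.

```python
from typing import List, Tuple

PAD = "X"  # assumed padding symbol; the abstract does not specify one


def lysine_windows(sequence: str, length: int) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in `sequence`.

    Each window has exactly `length` residues with the lysine at its center.
    Positions that fall outside the protein are filled with PAD.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd number")
    half = length // 2
    padded = PAD * half + sequence + PAD * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In `padded`, residue i sits at index i + half, so the slice
            # [i, i + length) is centered on it.
            windows.append((i + 1, padded[i:i + length]))
    return windows
```

For example, lysine_windows("MKTAYIAKQR", 5) returns [(2, "XMKTA"), (8, "IAKQR")]. Calling the function with lengths such as 11 or 61 would give windows at the two ends of the range named in the abstract.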

Keywords

Non-histone acetylation

Deep learning

Transformer

Protein language model

Supplementary weblinks

TransPTM: source code and dataset

Metrics

Views: 743

Downloads: 367

License

The content is available under CC BY NC ND 4.0

Funding

National Natural Science Foundation of China: 32170654

National Natural Science Foundation of China: 32000464

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
