Abstract
Antimicrobial peptides (AMPs) are small bioactive molecules that have emerged as promising compounds for treating a wide range of diseases. Their effectiveness stems from the variety of mechanisms they can use both to kill microbes and to modulate immune responses. However, the AMP chemical space (AMPCS) is huge: it is estimated that there are more than 10^65 unique peptide sequences with 50 residues or fewer, which represents a major challenge for discovering new promising sequences and for identifying common features, motifs, or relevant biological functions shared by these peptides. We therefore present a new approach based on network science and similarity searching to discover new potential AMPs, specifically antiparasitic peptides (APPs). We took advantage of network-based representations of the APP chemical space (APPCS) to retrieve valuable information, using three types of networks: chemical space (CSN), half-space proximal (HSPN), and metadata (METN). Centrality measures were applied to identify the most important and non-redundant nodes, and these peptides were used as queries (Qs) against the graph database starPepDB to discover new potential APPs through similarity searching with group fusion (MAX-SIM rule) models. We evaluated the performance of the multi-query similarity searching models (mQSSMs) on five benchmarking datasets of APPs/non-APPs. The predictions of the best mQSSMs show strong to very strong predictive agreement, with external Matthews correlation coefficient (MCC) values ranging from 0.834 to 0.965. Outstanding results were attained by the mQSSM that uses 219 Qs from both the CSN and HSPN networks (219Q_0.5_HB-HC-Singletons_CSN-HSPN) with a similarity threshold of 0.5, which reached MCC values greater than 0.85 on external datasets. We then compared the performance metrics of our mQSSMs with those of the APP prediction servers AMPDiscover and AMPFun. The model proposed in this report outperformed these machine learning approaches with statistically significant differences, showing the considerable potential of the method. After applying our method and additional filters, we proposed 95 repurposed leads as potential APPs, which had not previously been associated with this activity. In addition, we explored sequence similarities and motifs shared by these peptides, which can serve as templates for searching for and designing new promising APPs. The analyses show that the similarity models proposed in this study could contribute to identifying APPs with high effectiveness and reliability. Our models and pipeline are freely available through the starPep toolbox software at http://mobiosd-hub.com/starpep.
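As an illustration of the two technical ingredients named in the abstract, the following minimal Python sketch shows multi-query similarity searching with group fusion under the MAX-SIM rule, followed by the Matthews correlation coefficient used to score the predictions. It is a sketch under stated assumptions, not the starPep implementation: the pairwise similarity here is a stand-in based on Python's difflib, and the function names (similarity, max_sim_score, predict, mcc) are hypothetical. Only the MAX-SIM rule, the 0.5 threshold, and the use of MCC come from the text.

```python
from difflib import SequenceMatcher
from math import sqrt


def similarity(a: str, b: str) -> float:
    """Stand-in pairwise similarity in [0, 1].

    Assumption: this difflib ratio only illustrates the idea; it is not
    the similarity measure used by the authors' software.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def max_sim_score(candidate: str, queries: list[str]) -> float:
    """Group fusion by the MAX-SIM rule: keep the best score over all queries."""
    return max(similarity(candidate, q) for q in queries)


def predict(candidates: list[str], queries: list[str], threshold: float = 0.5) -> list[bool]:
    """Label a candidate as a hit when its fused score reaches the threshold."""
    return [max_sim_score(c, queries) >= threshold for c in candidates]


def mcc(y_true: list[bool], y_pred: list[bool]) -> float:
    """Matthews correlation coefficient from binary labels and predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In use, predict would be called with the query peptides selected from the networks and a list of candidate sequences, and mcc would compare the resulting labels with the known APP/non-APP annotations of a benchmarking dataset.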
Supplementary materials
The Supporting Information is available free of charge at Zenodo: https://doi.org/10.5281/zenodo.5650160.
Supporting Information 1 (SI1) contains FASTA files.
Supporting Information 2 (SI2) contains an MS Word file with tables.
Supporting Information 3 (SI3) contains GraphML files.
Supporting Information 4 (SI4) is an Excel file.
Supporting Information 5 (SI5) contains FASTA files.
Supporting Information 6 (SI6) contains Excel files.
Supporting Information 7 (SI7) has three kinds of files. SI7-A contains three folders with the original results for each mQSSM generated, as well as an Excel file with statistical parameters. SI7-B is an Excel file with the performance parameters of the 21 best mQSSMs proposed, together with the ranking of these models. SI7-C and SI7-D are PDF files with the results of multiple comparisons among our mQSSMs and against literature algorithms, respectively.
Supporting Information 8 (SI8) contains FASTA files.
Supporting Information 9 (SI9) contains a PowerPoint file.