Inverse design of viral infectivity-enhancing peptide fibrils from continuous protein-vector embeddings

07 March 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Amyloid-like nanofibers formed by self-assembling peptides can promote viral gene transfer for therapeutic applications. Traditionally, new sequences are discovered either by screening large libraries or by creating derivatives of known active peptides. However, the discovery of de novo peptides, which are not sequence-related to any known active peptide, is limited by the difficulty of rationally predicting structure-activity relationships, because their activities typically depend on multiple parameters across multiple scales. Here, we used a small library of 163 peptides to predict de novo sequences for viral infectivity enhancement with a machine learning (ML) approach based on natural language processing. Specifically, we trained an ML model on continuous vector representations of the peptides, which have previously been shown to retain relevant information embedded in the sequences. We used the trained ML model to sample the sequence space of peptides with 6 amino acids (6-mers) and identify promising candidates. These 6-mers were then further screened for charge and aggregation propensity. The resulting 16 new 6-mers were tested, and 4 of them were active, a hit rate of 25%. Strikingly, these de novo sequences are the shortest active peptides for infectivity enhancement reported so far and show no sequence relation to the training set. Moreover, by screening the chemical space, we discovered the first hydrophobic peptide fibrils with a moderately negative surface charge that can enhance infectivity. Hence, this ML strategy is a time- and cost-efficient way to expand the chemical space of short functional self-assembling peptides, as exemplified here for therapeutic viral gene delivery.
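The abstract combines three computational steps: embedding peptides as continuous ProtVec-style vectors, regressing infectivity on those vectors (Table S3 identifies the model as LASSO), and Monte Carlo sampling of the 6-mer sequence space. The Python sketch below illustrates that pipeline under loudly stated assumptions: the 3-gram vectors are deterministic pseudo-random placeholders rather than trained ProtVec embeddings, the coefficients are hypothetical sparse placeholders standing in for a fitted LASSO model, and every function name is invented for illustration. Summing the vectors of all overlapping 3-grams follows the general ProtVec recipe; whether the authors summed, averaged, or otherwise pooled the vectors is not stated on this page.

```python
import random
from functools import lru_cache

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids
DIM = 100  # the original ProtVec used 100-dimensional 3-gram vectors


@lru_cache(maxsize=None)
def trigram_vector(trigram, dim=DIM):
    """PLACEHOLDER for a trained ProtVec lookup: a deterministic
    pseudo-random vector per 3-gram, seeded by the 3-gram string."""
    rng = random.Random("protvec:" + trigram)
    return tuple(rng.gauss(0.0, 1.0) for _ in range(dim))


def embed(seq, dim=DIM):
    """ProtVec-style embedding: sum of the vectors of all overlapping 3-grams
    (the same set as the three non-overlapping reading frames combined)."""
    vec = [0.0] * dim
    for i in range(len(seq) - 2):
        for k, x in enumerate(trigram_vector(seq[i:i + 3], dim)):
            vec[k] += x
    return vec


def placeholder_weights(dim=DIM, n_nonzero=10, seed=2):
    """HYPOTHETICAL sparse coefficients standing in for a fitted LASSO model
    (L1 regularization drives most coefficients to exactly zero)."""
    rng = random.Random(seed)
    weights = [0.0] * dim
    for k in rng.sample(range(dim), n_nonzero):
        weights[k] = rng.gauss(0.0, 1.0)
    return weights


def predict(seq, weights, bias=0.0):
    """Linear model score: bias + dot(weights, embed(seq))."""
    return bias + sum(w * x for w, x in zip(weights, embed(seq, len(weights))))


def monte_carlo_screen(weights, bias=0.0, n_samples=10_000, top_k=10,
                       length=6, seed=1):
    """Draw random peptides uniformly from the 20 amino acids, score each
    unique sequence once, and return the top_k (sequence, score) pairs."""
    rng = random.Random(seed)
    scored = {}
    for _ in range(n_samples):
        seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        if seq not in scored:
            scored[seq] = predict(seq, weights, bias)
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    w = placeholder_weights()
    for seq, score in monte_carlo_screen(w, n_samples=2000, top_k=5):
        print(seq, round(score, 3))
```

Uniform random sampling is used here because the full 6-mer space (20^6, about 64 million sequences) is large but the scoring function is cheap; a real run would replace `trigram_vector` with trained embeddings and `placeholder_weights` with coefficients fitted on the 163-peptide training set.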

Keywords

Self-assembling peptides
amyloid fibril
retroviral transduction enhancer
chemical space
sequence prediction
machine learning

Supplementary materials

Inverse design of viral infectivity-enhancing peptide fibrils from continuous protein-vector embeddings
The Supporting Information (17 pages) contains Sections 1–11 with Figures S1–S9 and Tables S1 and S2. Section 1 provides further information on the trained regression model; Sections 2 and 3 describe the selection process based on aggregation prediction and N-gram similarity; Section 4 evaluates the property-activity correlation of the de novo peptides and the training set. Sections 5–8 summarize detailed experimental data: TEM, infection rates, cell viability, and FT-IR spectra. Table S1 summarizes the infection data and the predicted and experimental aggregation behavior of the training set, and Table S2 summarizes the complete physicochemical properties of the de novo peptides.
Table S3
Top 12320 sequences from the Monte Carlo screening with the ProtVec LASSO model, with their predicted infectivity, hydrophobicity, and net charge.
Table S4
Top 3669 peptides with a net positive charge, with aggregation predictions from Aggrescan, APPNN, and PATH.
Table S5
N-gram similarity matrix between the top 3669 peptides and the 163 peptides of the training set.
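Tables S4 and S5 describe two post-processing steps: keeping candidates with a net positive charge and comparing candidates with the training set by N-gram similarity. The Python sketch below illustrates both under stated assumptions: the net charge is a crude count (+1 per K or R, -1 per D or E, histidine and termini ignored), and `ngram_similarity` is a hypothetical metric defined as the mean Jaccard index over 1- to 3-gram sets, since the exact metric behind Table S5 is not given on this page. The aggregation predictors named for Table S4 (Aggrescan, APPNN, PATH) are external tools and are not reproduced here.

```python
def net_charge(seq):
    """Crude net charge near neutral pH (ASSUMPTION): +1 per K/R, -1 per D/E.
    Histidine and the termini are ignored; real pipelines may differ."""
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")


def ngrams(seq, n):
    """Set of all contiguous substrings of length n."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def ngram_similarity(a, b, n_max=3):
    """HYPOTHETICAL metric: mean Jaccard index of the n-gram sets, n = 1..n_max."""
    scores = []
    for n in range(1, n_max + 1):
        sa, sb = ngrams(a, n), ngrams(b, n)
        union = sa | sb
        scores.append(len(sa & sb) / len(union) if union else 0.0)
    return sum(scores) / len(scores)


def similarity_matrix(candidates, training):
    """Rows are candidate peptides; columns are training-set peptides."""
    return [[ngram_similarity(c, t) for t in training] for c in candidates]


if __name__ == "__main__":
    # Made-up sequences for illustration only; not taken from the paper.
    candidates = ["KLWAFK", "DEAGSE", "RVFWKI"]
    training = ["KIKIKI", "FWKRLL"]
    positive = [c for c in candidates if net_charge(c) > 0]
    for cand, row in zip(positive, similarity_matrix(positive, training)):
        print(cand, net_charge(cand), "max similarity:", round(max(row), 3))
```

Taking the row maximum of the matrix gives, for each candidate, its closest training-set neighbor; a low maximum is one way to quantify the "no sequence relation to the training set" claim, though the authors' criterion may differ.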
