Combining Bayesian optimization with sequence- or structure-based strategies for optimization of protein-peptide binding

17 April 2024, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

This study introduces a novel Bayesian Optimization (BO) method to support the design and optimization of bioactive peptide sequences in the context of a fully automated closed-loop Design-Make-Test (DMT) pipeline. Using the major histocompatibility complex class I receptor system as a test case, we showed that BO can efficiently navigate vast sequence spaces. Starting from a single peptide lead sequence in the $\mu$M IC50 range, the method can optimize a peptide sequence to its optimal binding affinity in fewer than 5 DMT cycles, with 96 peptide sequences per batch. We extensively evaluated its performance under various conditions and with different parameters, providing valuable insights for peptide optimization tasks in future closed-loop DMT environments. Different sequence- and structure-based initialization strategies for generating the initial batch of peptide sequences were also tested, as well as different molecular fingerprints and protein language models. Additionally, the method developed here natively handles various peptide sequence lengths and scaffolds (e.g. macrocycles) and supports arbitrary non-standard amino acids or residue modifications. The source code of our method, Mobius, is publicly available under the Apache license at https://git.scicore.unibas.ch/schwede/mobius.
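To make the closed-loop idea concrete, the following is a minimal, illustrative sketch of a batched BO loop over peptide sequences. It is not the Mobius implementation and does not use the Mobius API. Everything in it is an assumption made for illustration: the one-hot encoding, the RBF-kernel Gaussian process, the greedy top-k selection by expected improvement, the candidate generator based on random point mutations, and the hypothetical `toy_assay` function that stands in for the Make and Test steps. It is also restricted to fixed-length peptides built from the 20 standard amino acids.

```python
from math import erf
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 standard residues only

def one_hot(peptides):
    """Encode equal-length peptides as flat one-hot vectors."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    X = np.zeros((len(peptides), len(peptides[0]) * len(AMINO_ACIDS)))
    for row, pep in enumerate(peptides):
        for pos, aa in enumerate(pep):
            X[row, pos * len(AMINO_ACIDS) + index[aa]] = 1.0
    return X

def rbf_kernel(A, B, length_scale=2.0):
    """Squared-exponential kernel between two sets of feature vectors."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(X_train, y_train, X_cand, noise=1e-2):
    """Posterior mean and standard deviation of a constant-mean GP."""
    mean_y = y_train.mean()
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - mean_y))
    mu = mean_y + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v**2).sum(0)  # prior variance of the RBF kernel is 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """Expected improvement for maximization (e.g. of a pIC50-like score)."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return (mu - best) * cdf + sigma * pdf

def mutate(peptide, rng, n_mutations=1):
    """Replace n_mutations randomly chosen positions with a different residue."""
    chars = list(peptide)
    for pos in rng.choice(len(chars), size=n_mutations, replace=False):
        chars[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != chars[pos]])
    return "".join(chars)

def propose_batch(peptides, scores, rng, batch_size=96, n_draws=2000):
    """Design step: rank untested mutants of the best peptides by EI."""
    seeds = [peptides[i] for i in np.argsort(scores)[-10:]]
    tested = set(peptides)
    candidates = set()
    for _ in range(n_draws):
        seed = seeds[rng.integers(len(seeds))]
        cand = mutate(seed, rng, n_mutations=int(rng.integers(1, 3)))
        if cand not in tested:
            candidates.add(cand)
    candidates = sorted(candidates)
    mu, sigma = gp_posterior(
        one_hot(peptides), np.asarray(scores, dtype=float), one_hot(candidates)
    )
    ei = expected_improvement(mu, sigma, max(scores))
    top = np.argsort(ei)[::-1][:batch_size]
    return [candidates[i] for i in top]

def toy_assay(peptide, target="ACDEFGHIK"):
    """Hypothetical stand-in for Make and Test: matches to a hidden target."""
    return float(sum(a == b for a, b in zip(peptide, target)))

def run_campaign(lead, n_cycles=5, batch_size=96, seed=0):
    """Run n_cycles design rounds starting from a single lead peptide."""
    rng = np.random.default_rng(seed)
    peptides, scores = [lead], [toy_assay(lead)]
    for _ in range(n_cycles):
        batch = propose_batch(peptides, scores, rng, batch_size)
        peptides += batch
        scores += [toy_assay(p) for p in batch]
    return peptides, scores
```

Calling `run_campaign("GGGGGGGGG")` runs five cycles of 96 peptides each from a single lead, mirroring the batch size and cycle count quoted above. In a real DMT setting the toy scoring function would be replaced by synthesis and binding measurements, and the one-hot encoding by the fingerprints or language-model embeddings the study compares.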

Keywords

Bayesian optimization
Closed-loop DMT platform
Major Histocompatibility Complex
Peptide
Active learning

Supplementary materials

Title: Supplementary materials: Combining Bayesian optimization with sequence- or structure-based strategies for optimization of peptide-protein binding
Description: Supplementary figures showing the different sequence-based strategies; correlations between experimental pIC50 values and predictions from the Gaussian Process Regression model, Rosetta, and FoldX; a comparison of different sequence descriptors and fingerprint methods; and plots of the evolution of the pIC50 during the DMT optimization process using different initialization strategies.
