Abstract
Peptide aggregation is a long-standing challenge in chemical peptide synthesis, limiting its efficiency and reliability. Although data-driven methods have improved our understanding of many sequence-based phenomena, no comprehensive approach addresses so-called “non-random difficult couplings” (generally linked to aggregation) during solid-phase peptide synthesis. Here, we combine existing peptide synthesis datasets with newly acquired experimental data to build a predictive model that deciphers the role of individual amino acids in triggering aggregation. First, we identified and experimentally validated that amino acid composition is a stronger predictor of aggregation than sequence-based patterns. This finding motivated a composition vector representation, which exposes the aggregation propensities of individual amino acids. Applying an ensemble of trained models, we predict the aggregation properties of peptides and recommend optimized synthesis conditions. By elucidating each amino acid’s influence, this method has the potential to accelerate synthesis optimization using existing data, offering a robust framework for understanding and controlling peptide aggregation.
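As an illustration of what a composition vector representation can look like, the following minimal sketch maps a peptide sequence to the counts (or fractions) of each residue type, discarding residue order. The function name `composition_vector`, the restriction to the 20 canonical amino acids, their alphabetical ordering, and the optional normalization are assumptions made for this example; the featurization used in this work may differ.

```python
from collections import Counter

# The 20 canonical amino acids in one-letter code.
# Assumption: alphabetical order; any fixed order would work equally well.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence: str, normalize: bool = True) -> list[float]:
    """Return residue counts (or fractions) for a peptide, ignoring residue order.

    Hypothetical helper for illustration only, not the authors' implementation.
    """
    seq = sequence.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"Unsupported residues: {sorted(unknown)}")
    counts = Counter(seq)
    # Divide by the peptide length when normalizing; otherwise keep raw counts.
    total = len(seq) if (normalize and seq) else 1
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Two peptides with the same composition but different order
    # map to the same vector, which is the point of the representation.
    print(composition_vector("AAGV") == composition_vector("VGAA"))
```

Because the vector has one entry per amino acid, a model trained on it can attribute predicted aggregation to individual residue types, at the cost of ignoring positional information.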
Supplementary materials
Title
Supplementary Material
Description
Supplementary material including dataset statistics, computational analysis, experimental procedures, and analytical data.