Abstract
Characterizing chemical bonds and interactions in proteins is essential for deepening our understanding of these systems and for applications such as drug design and protein engineering. Despite the importance of such investigations, a systematic framework for them has been lacking. Here, we present a machine-learning-based approach for discovering chemical connections in proteins. The method enables the identification of effective descriptors and the prediction of atomic constructs that host specific chemical bonds. To demonstrate its applicability, we integrate our predictive modeling with experimental observations of covalent nitrogen-oxygen-sulfur (NOS) linkages between lysine and cysteine. Analyzing over 86,000 protein structures and their X-ray validation reports, we unveil sixty-nine NOS linkages beyond the previously known lysine-cysteine cases, spanning lysine-cysteine, glycine-cysteine, and arginine-cysteine pairs. The proposed method is readily adaptable to any chemical bond or interaction, opening the way to the discovery of various chemical connections within protein structures.