Design of Enzymes for Biocatalysis, Bioremediation, and Biosensing using Variational Autoencoder-Generated Latent Spaces

11 October 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Enzymes enable sustainable and environmentally friendly solutions in industrial biocatalysis, bioremediation, and biosensing. Evolutionary data have proven pivotal for narrowing down the vast sequence space during enzyme optimization. However, capturing all important dependencies among residues is challenging because coevolution influences individual positions nonlinearly. To overcome this challenge, deep learning methods are being actively trained on protein sequence data. While they have demonstrated a strong capacity to capture protein evolution, strategies to leverage this information for the design of promising biocatalysts remain largely unexplored. Here, we introduce evolutionary trajectories generated by a deep-learning framework based on variational autoencoders. We optimized this framework and used it, together with the latent space geometry, to produce a set of deep-learning-based ancestral sequences of the model enzymes, haloalkane dehalogenases. The generated novel proteins were expressed and experimentally characterized; the soluble variants showed stability and activity at the level of the wild type. We also identified a major limitation: sequences distant from the template tend to accumulate many insertions and deletions, which are known to compromise protein solubility. Taking this limitation into account, we demonstrate that the geometry of the latent space, together with the generative potential of variational autoencoders, can be used to diversify natural protein sequences.

Keywords

machine learning
ancestral sequence reconstruction
haloalkane dehalogenase
enzyme activity
protein stability
substrate specificity
