Abstract
Biochemical transformations may substantially improve the synthetic efficiency of complex functional molecules by reducing the number of synthetic steps or by avoiding harsh conditions and toxic solvents or reactants. However, access to biochemical reaction data is limited, which reduces the opportunities to find alternative routes and to discover synergies with organic synthesis. We propose a workflow to explore this sparsely populated synthetic biological domain. Using a molecular graph method, we predict feasible biosynthetic reactions. The products are inferred from functional transformations learned from reactions that were excerpted from the literature and recorded in the KEGG database. With this approach we expanded the KEGG dataset of biochemical transformations approximately fourfold. To identify enzymes that could catalyse the novel reactions, we propose a transformer model that learns from reaction SMILES and the amino acid sequences of native enzymes and predicts promiscuous enzymes for potential substrates. The transformer model also calibrates the feasibility of the predicted reactions and narrows the search for promiscuous enzymes within the enzyme pool. Finally, the populated biological reaction space is visualised in a two-dimensional t-SNE diagram.
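The abstract does not state how the functional transformations are encoded. The following is a minimal sketch that assumes each known reaction is available as an atom-mapped pair of heavy-atom graphs with implicit hydrogens. Under that assumption, a transformation can be extracted as the set of bonds whose order changes. It can then be re-applied to any substrate that contains a matching reaction centre. The graph format and the names `extract_rule`, `apply_rule` and `make_bonds` are hypothetical. This is not the authors' implementation, and a realistic rule would also encode neighbouring atoms, charges and hydrogens.

```python
from itertools import permutations

# Hypothetical molecule format: (atoms, bonds), where atoms maps atom id -> element
# and bonds maps frozenset({id_a, id_b}) -> bond order. Hydrogens are implicit.

def make_bonds(*triples):
    """Build a bond dict from (atom_a, atom_b, order) triples."""
    return {frozenset((a, b)): order for a, b, order in triples}

def extract_rule(substrate, product):
    """Reaction centre of an atom-mapped reaction (same atom ids on both sides):
    the atoms involved, plus every bond whose order changes (0 = no bond)."""
    atoms, s_bonds = substrate
    _, p_bonds = product
    changes = []
    for pair in set(s_bonds) | set(p_bonds):
        old, new = s_bonds.get(pair, 0), p_bonds.get(pair, 0)
        if old != new:
            a, b = sorted(pair)
            changes.append((a, b, old, new))
    centre = {i: atoms[i] for a, b, _, _ in changes for i in (a, b)}
    return centre, sorted(changes)

def apply_rule(rule, molecule):
    """Apply a rule at every place where the centre's elements and old bond
    orders match (brute-force subgraph search; no valence or context checks)."""
    centre, changes = rule
    atoms, bonds = molecule
    rule_ids = sorted(centre)
    products, seen = [], set()
    for image in permutations(sorted(atoms), len(rule_ids)):
        match = dict(zip(rule_ids, image))
        if any(atoms[match[r]] != centre[r] for r in rule_ids):
            continue
        if any(bonds.get(frozenset((match[a], match[b])), 0) != old
               for a, b, old, _ in changes):
            continue
        new_bonds = dict(bonds)
        for a, b, _, new in changes:
            key = frozenset((match[a], match[b]))
            if new == 0:
                new_bonds.pop(key, None)  # bond broken
            else:
                new_bonds[key] = new      # bond formed or order changed
        signature = frozenset(new_bonds.items())
        if signature not in seen:         # drop duplicate products
            seen.add(signature)
            products.append((dict(atoms), new_bonds))
    return products

# Learn "alcohol -> carbonyl" from ethanol -> acetaldehyde, then apply it to propan-2-ol.
ethanol = ({1: "C", 2: "C", 3: "O"}, make_bonds((1, 2, 1), (2, 3, 1)))
acetaldehyde = ({1: "C", 2: "C", 3: "O"}, make_bonds((1, 2, 1), (2, 3, 2)))
rule = extract_rule(ethanol, acetaldehyde)
print(rule)  # ({2: 'C', 3: 'O'}, [(2, 3, 1, 2)])

propan_2_ol = ({1: "C", 2: "C", 3: "C", 4: "O"},
               make_bonds((1, 2, 1), (2, 3, 1), (2, 4, 1)))
products = apply_rule(rule, propan_2_ol)
print(len(products), products[0][1][frozenset((2, 4))])  # 1 2  (acetone: C=O)
```

Because this simplified rule checks only the reaction centre, it also fires on groups where the reaction cannot occur. For example, it yields two candidates for dimethyl ether, which has no hydrogen on oxygen to lose. This kind of over-generation is what a downstream feasibility check is meant to filter out.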
Supplementary materials
Title: Amino acid sequences
Description: Data required for the model