Abstract
Data-driven modeling has emerged as a new paradigm for biocatalyst design and discovery. Biocatalytic databases that integrate enzyme structure and function data are urgently needed. Here, we describe IntEnzyDB, an integrated structure-kinetics database for facile statistical modeling and machine learning. IntEnzyDB employs a relational architecture with a flattened data structure, which allows rapid data operations. This architecture also makes it easy for IntEnzyDB to incorporate additional types of enzyme function data. IntEnzyDB contains enzyme kinetics and structure data from six Enzyme Commission classes. Using 1019 enzyme structure-kinetics pairs, we investigated the efficiency-perturbing propensity of mutations that are close to or distal from the active site. The statistical results show that efficiency-enhancing mutations are globally encoded, whereas deleterious mutations are much more likely to occur close to the active site than distal to it. Finally, we describe a web interface that allows public users to access the enzymology data stored in IntEnzyDB. IntEnzyDB will provide a computational facility for data-driven modeling in biocatalysis and molecular evolution.
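To illustrate the kind of analysis a flattened relational layout supports, the following is a minimal sketch, not the IntEnzyDB schema or code. The table names (`structure`, `kinetics`), column names, the 10 Å distance cutoff, and all rows are hypothetical and synthetic; they only show how structure and kinetics records sharing a key can be joined and grouped into "close" and "distal" mutations.

```python
import sqlite3

def build_toy_db():
    """Create an in-memory database with two flat tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE structure (enzyme_id TEXT, mutation TEXT, dist_to_active_site REAL);
        CREATE TABLE kinetics  (enzyme_id TEXT, mutation TEXT, kcat_km_ratio REAL);
    """)
    # Synthetic rows for illustration only; distances in angstroms.
    conn.executemany("INSERT INTO structure VALUES (?, ?, ?)", [
        ("ENZ_A", "A10G", 4.0), ("ENZ_A", "L50V", 6.5), ("ENZ_A", "K200R", 22.0),
        ("ENZ_B", "D15N", 8.0), ("ENZ_B", "S120T", 18.0), ("ENZ_B", "E300Q", 30.0),
    ])
    # kcat_km_ratio = mutant (kcat/KM) divided by wild-type (kcat/KM); < 1 is deleterious.
    conn.executemany("INSERT INTO kinetics VALUES (?, ?, ?)", [
        ("ENZ_A", "A10G", 0.1), ("ENZ_A", "L50V", 0.5), ("ENZ_A", "K200R", 1.4),
        ("ENZ_B", "D15N", 2.0), ("ENZ_B", "S120T", 0.8), ("ENZ_B", "E300Q", 1.1),
    ])
    return conn

def deleterious_fraction(conn, cutoff=10.0):
    """Return the fraction of deleterious mutations per region ('close' or 'distal')."""
    rows = conn.execute("""
        SELECT CASE WHEN s.dist_to_active_site <= ? THEN 'close' ELSE 'distal' END AS region,
               AVG(CASE WHEN k.kcat_km_ratio < 1.0 THEN 1.0 ELSE 0.0 END) AS frac
        FROM structure AS s
        JOIN kinetics AS k
          ON s.enzyme_id = k.enzyme_id AND s.mutation = k.mutation
        GROUP BY region
    """, (cutoff,)).fetchall()
    return dict(rows)

if __name__ == "__main__":
    print(deleterious_fraction(build_toy_db()))
```

Because both tables are flat and share the same key columns, adding another type of function data in this sketch would amount to adding one more table with the same key, which is one plausible reading of why such a layout is easy to extend.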
Supplementary materials
SI pdf: supporting figures and tables.
SI zip: code.