Abstract
Protein pKa prediction is essential for investigating the pH-dependent relationship between protein structure and function. In this work, we introduce DeepKa, a deep-learning-based protein pKa predictor trained on 12,809 pKa values derived from continuous constant pH molecular dynamics (CpHMD) simulations of 279 soluble proteins. The CpHMD method implemented in the Amber molecular dynamics package was used for these simulations (Huang, Harris, and Shen J. Chem. Inf. Model. 2018, 58, 1372-1383).
Notably, to deal with the finite-size effect, we propose representing electrostatics with grid charges rather than the previously used atomic charges. We show that the prediction accuracy of DeepKa is close to that of the benchmark CpHMD simulations, validating DeepKa as an efficient protein pKa predictor. In addition, the training dataset created in this study can be used to develop future machine-learning-based protein pKa predictors. Finally, the new grid charge representation is general and applicable to other problems, such as protein-ligand binding affinity prediction.
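The abstract does not say how atomic charges are converted into grid charges. Purely as a generic illustration of the idea, the sketch below assumes a common trilinear ("cloud-in-cell") assignment, in which each atomic charge is split among the 8 grid points of the cubic cell that contains it, weighted by proximity. The function name `charges_to_grid` and its parameters are hypothetical, and this is not presented as DeepKa's published procedure.

```python
import math


def charges_to_grid(coords, charges, origin, spacing, n):
    """Spread point charges onto an n x n x n grid by trilinear (cloud-in-cell) weighting.

    Hypothetical helper for illustration only; DeepKa's actual scheme may differ.
    coords:  list of (x, y, z) atom positions (e.g. in Angstrom)
    charges: list of atomic partial charges (e.g. in units of e)
    origin:  (x0, y0, z0) position of grid point [0][0][0]
    spacing: grid spacing, in the same length unit as coords
    Atoms outside the interior cells of the grid (including those lying on
    its upper faces) are skipped.
    """
    grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for (x, y, z), q in zip(coords, charges):
        # Fractional grid coordinates of the atom.
        fx = (x - origin[0]) / spacing
        fy = (y - origin[1]) / spacing
        fz = (z - origin[2]) / spacing
        # Index of the lower corner of the enclosing cell.
        i, j, k = math.floor(fx), math.floor(fy), math.floor(fz)
        if not (0 <= i < n - 1 and 0 <= j < n - 1 and 0 <= k < n - 1):
            continue
        # Offsets within the cell, each in [0, 1).
        dx, dy, dz = fx - i, fy - j, fz - k
        # Distribute q over the 8 cell corners; the weights sum to 1,
        # so the total charge of each in-box atom is conserved.
        for di in (0, 1):
            wx = dx if di else 1.0 - dx
            for dj in (0, 1):
                wy = dy if dj else 1.0 - dy
                for dk in (0, 1):
                    wz = dz if dk else 1.0 - dz
                    grid[i + di][j + dj][k + dk] += q * wx * wy * wz
    return grid


if __name__ == "__main__":
    # One +1 charge at the center of a single cubic cell: each corner receives 1/8.
    g = charges_to_grid([(0.5, 0.5, 0.5)], [1.0], (0.0, 0.0, 0.0), 1.0, 2)
    print(g[0][0][0], g[1][1][1])
```

Because the weights of each atom sum to one, the grid preserves the net charge of the atoms it covers. A learned model can then treat the result as a fixed-size 3D image regardless of how many atoms lie near the titratable site.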