Abstract
This work presents a variant of an electrostatic embedding scheme that allows arbitrary machine-learned potentials trained on molecular systems in vacuo to be embedded. The scheme is based on physically motivated models of electronic density and polarizability, which yields a generic model that does not rely on an exhaustive training set. It requires only in vacuo single-point QM calculations to provide training densities and molecular dipolar polarizabilities. As an example, the scheme is applied to create an embedding model for the QM7 dataset using Gaussian process regression with only 445 reference atomic environments. The model was tested on the SARS-CoV-2 protease complex with PF-00835231, giving an RMSE of 2 kcal/mol in the predicted embedding energy relative to explicit DFT/MM calculations.
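To make the notion of a molecular dipolar polarizability built from atomic contributions concrete, the following is a minimal sketch and not the implementation used in this work. It assumes isotropic atomic polarizabilities and the undamped point-dipole (Applequist) interaction model; the Thole model named in the Supporting Information additionally damps the dipole tensor at short range, which is omitted here. The function name `molecular_polarizability` and its inputs are hypothetical, and consistent units (for example atomic units) are assumed.

```python
import numpy as np

def molecular_polarizability(positions, alphas):
    """Molecular polarizability tensor from an undamped point-dipole model.

    positions: (N, 3) array of atomic coordinates (hypothetical input).
    alphas:    (N,) array of isotropic atomic polarizabilities.
    Returns a (3, 3) tensor. Thole damping is NOT applied (assumption).
    """
    positions = np.asarray(positions, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    n = len(alphas)
    A = np.zeros((3 * n, 3 * n))
    for i in range(n):
        # Diagonal block: inverse atomic polarizability.
        A[3 * i:3 * i + 3, 3 * i:3 * i + 3] = np.eye(3) / alphas[i]
        for j in range(i + 1, n):
            r = positions[j] - positions[i]
            d = np.linalg.norm(r)
            # Undamped dipole-dipole interaction tensor.
            T = (np.eye(3) - 3.0 * np.outer(r, r) / d**2) / d**3
            A[3 * i:3 * i + 3, 3 * j:3 * j + 3] = T
            A[3 * j:3 * j + 3, 3 * i:3 * i + 3] = T
    # Induced dipoles solve A @ mu = F; summing all 3x3 blocks of the
    # inverse gives the response of the total dipole to a uniform field.
    B = np.linalg.inv(A)
    return B.reshape(n, 3, n, 3).sum(axis=(0, 2))

def induction_energy(alpha_mol, field):
    """Linear-response induction energy -1/2 F^T alpha F in a uniform field."""
    field = np.asarray(field, dtype=float)
    return -0.5 * field @ alpha_mol @ field
```

For two identical atoms on the z axis, this reproduces the known analytic result that mutual polarization enhances the response along the bond and reduces it perpendicular to the bond.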
Supplementary materials
Supporting Information
Sections:
S1 Molecules excluded from the dataset
S2 Selection of reference atomic environments
S3 Modified sparse GPR
S4 Molecular dipolar polarizability from Thole model
S5 Results with MBIS volumes
S6 Calculation of electrostatic potential
S7 χ from MBIS partitioning
S8 Absolute embedding energy prediction errors
S9 Learning workflow
S10 Prediction workflow