Abstract
Markov State Models (MSMs) have been widely applied to understand folding mechanisms and predict long-timescale dynamics from ensembles of short molecular simulations. Most MSM estimators enforce detailed balance, assuming that trajectory data are sampled at equilibrium. This is rarely the case for ab initio folding studies, however, and as a result, MSMs can severely underestimate protein folding stabilities from such data. To remedy this problem, we have developed an enhanced-sampling protocol in which (1) unbiased folding simulations are performed and sparse time-lagged independent component analysis (tICA) is used to obtain features that best capture the slowest events in folding, (2) umbrella sampling along the resulting reaction coordinate is performed to observe folding and unfolding transitions, and (3) the thermodynamics and kinetics of folding are estimated using multiensemble Markov models (MEMMs). Using this protocol, the folding pathways, rates, and stabilities of a designed alpha-helical hairpin, Z34C, can be predicted in good agreement with experimental measurements. These results indicate that accurate simulation-based estimates of absolute folding stabilities are within reach, with implications for the computational design of folded mini-proteins and peptidomimetics.