Abstract
Molecular simulation is a mature and versatile tool set, widely used across many disciplines and appearing in more than 30,000 publications each year. However, its methodological development has long struggled with a tradeoff between accuracy/resolution and speed. Significant improvement in both, beyond the present state of the art, is necessary for simulation to reliably substitute for many expensive and laborious experiments in molecular biology, materials science and nanotechnology. Previously, the ubiquitous issue of severe waste of computational resources in all forms of molecular simulation due to repetitive local sampling was raised, and the local free energy landscape approach was proposed to address it. This approach is derived from a simple idea: first learn local distributions, then dynamically assemble them to infer the global joint distribution of a target molecular system. Compared with conventional explicit-solvent molecular dynamics simulations, a simple and approximate implementation of this theory in protein structure refinement achieved an acceleration of about six orders of magnitude without loss of accuracy. While this initial test revealed tremendous benefits for addressing repetitive local sampling, some implicit assumptions need to be articulated. Here, I present a more thorough discussion of repetitive local sampling; potential options for learning local distributions; a more general formulation with potential extension to the simulation of near-equilibrium molecular systems; the prospect of developing computation-driven molecular science; the connection to mainstream protein structure prediction/refinement based on residue pair distance distributions; and the fundamental difference in how averaging is utilized compared with the conventional molecular simulation framework based on the potential of mean force.
This more general development is termed the local distribution theory, which lifts the restriction to strict thermodynamic equilibrium so that it can potentially be applied widely to general soft condensed molecular systems.