Abstract
Long-time atomistic molecular dynamics simulations typically reach timescales of microseconds, whereas experiments on the kinetics of amyloid fibril formation typically report timescales of minutes or hours. This timescale gap of roughly 9 orders of magnitude presents a major challenge in the design of computer simulation methods for studying protein aggregation events. Coarse-grained molecular simulations of amyloid fibril formation are therefore crucial for understanding the molecular mechanism behind the formation of these structures, which are implicated in diseases such as Alzheimer’s, Parkinson’s, and Type II diabetes. Network Hamiltonian simulations of aggregation are centered on a Hamiltonian function that takes the graph structure of a system of aggregating proteins as input and returns the total energy of that system. In the graph, or network, representation of the system, each protein molecule is represented as a node, and non-covalent bonds between proteins are represented as edges. The parameters of each network Hamiltonian model, i.e. a set of coefficients that determine the degree to which each topological degree of freedom is favored or disfavored, must be determined model by model, and doing so is a well-known technical challenge.
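To make the idea of a Hamiltonian defined on a graph concrete, the following minimal Python sketch computes an energy as a weighted sum of graph statistics. The linear form, the particular statistics (edges, 2-stars, triangles), the sign convention (a negative coefficient favors a feature), and the function names are all illustrative assumptions, not the published models.

```python
from itertools import combinations


def graph_statistics(n_nodes, edges):
    """Count a few topological features of a simple undirected graph.

    Nodes stand for protein molecules, edges for non-covalent bonds.
    The choice of features here is an assumption made for illustration.
    """
    adj = {i: set() for i in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Each edge appears in two adjacency sets.
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    # A 2-star is a pair of edges that share a node.
    two_stars = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    # Each triangle is counted once, as an unordered triple of nodes.
    triangles = sum(
        1
        for u, v, w in combinations(range(n_nodes), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )
    return {"edges": n_edges, "two_stars": two_stars, "triangles": triangles}


def network_hamiltonian(n_nodes, edges, coefficients):
    """Return a total energy as a coefficient-weighted sum of statistics.

    Assumed convention: a negative coefficient lowers the energy and
    therefore favors the corresponding feature.
    """
    stats = graph_statistics(n_nodes, edges)
    return sum(coefficients[name] * value for name, value in stats.items())


if __name__ == "__main__":
    # Hypothetical coefficients: bonds favored, branching and triangles penalized.
    theta = {"edges": -1.0, "two_stars": 0.5, "triangles": 2.0}
    chain = [(0, 1), (1, 2), (2, 3)]  # a linear, fibril-like chain
    print(network_hamiltonian(4, chain, theta))
```

In this toy setting, the coefficients in `theta` play the role of the model parameters that have to be fitted.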
Here, a type of artificial intelligence (AI) called a genetic algorithm is introduced for autonomously parameterizing network Hamiltonian models: an initial set of randomly parameterized models, typically of low fibril yield (e.g. < 5 %), seeds the evolution of subsequent model generations, ultimately leading to high fibril yield models (e.g. > 70 %). The methodology is demonstrated by applying it to optimize previously published network Hamiltonian models for the 5 key amyloid fibril topologies that have been reported in the Protein Data Bank (PDB), and by showing that the AI-generated models produce fibril yields that match or surpass the previously published yields in all cases. The authors also aim to encourage more widespread use of the network Hamiltonian methodology by introducing a free, open-source implementation of the genetic algorithm that can be used to fit network Hamiltonian models to other self-assembling systems.
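The evolutionary loop described above can be sketched in a few lines of Python. This is a generic elitist genetic algorithm, not the authors' implementation: the selection scheme, uniform crossover, Gaussian mutation, all default values, and the `toy_yield` fitness function (a hypothetical stand-in for the fibril yield that would be measured by simulating each candidate model) are assumptions chosen for illustration.

```python
import random


def evolve(fitness, n_params, pop_size=30, generations=40,
           elite_frac=0.2, mutation_scale=0.3, seed=0):
    """Evolve parameter vectors toward higher fitness.

    Returns the best parameter vector found and the best fitness
    observed at the start of each generation.
    """
    rng = random.Random(seed)
    # Generation 0: randomly parameterized models.
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    n_elite = max(2, int(elite_frac * pop_size))
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        elite = ranked[:n_elite]  # the fittest models survive unchanged
        children = []
        while len(children) < pop_size - n_elite:
            mother, father = rng.sample(elite, 2)
            # Uniform crossover: each coefficient comes from one parent.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Gaussian mutation perturbs every coefficient slightly.
            child = [gene + rng.gauss(0.0, mutation_scale) for gene in child]
            children.append(child)
        population = elite + children
    return max(population, key=fitness), history


def toy_yield(params, target=(-2.0, 1.0, 3.0)):
    """Hypothetical fitness in (0, 1]; equals 1.0 only at `target`."""
    dist2 = sum((p - t) ** 2 for p, t in zip(params, target))
    return 1.0 / (1.0 + dist2)


if __name__ == "__main__":
    best, history = evolve(toy_yield, n_params=3)
    print("first-generation best:", history[0])
    print("final best:", toy_yield(best))
```

Because the elite models are carried over unchanged and `toy_yield` is deterministic, the best fitness cannot decrease between generations. With a stochastic fitness such as a simulated fibril yield, each evaluation would be noisy and would likely need to be averaged over repeated simulations.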