Abstract
Quantum mechanics/molecular mechanics (QM/MM) simulations are crucial for understanding enzymatic reactions, but their accuracy depends heavily on the quantum-mechanical method used. Semiempirical methods are computationally efficient but often lack accuracy in complex systems. This work presents a novel multi-objective evolutionary strategy for optimizing semiempirical Hamiltonians, designed to improve their performance in enzymatic QM/MM simulations while remaining broadly applicable to condensed-phase systems. Our methodology combines automated parameter optimization against reference potential energy surfaces, atomic charges, and gradients, computed at the ab initio or density functional theory (DFT) level, with comprehensive validation through minimum free energy path (MFEP) calculations. To demonstrate its effectiveness, we applied our approach to the GFN2-xTB Hamiltonian using two enzymatic systems that involve hydride transfer reactions and whose activation energy barriers the original Hamiltonian severely underestimates: crotonyl-CoA carboxylase/reductase (CCR) and dihydrofolate reductase (DHFR). The optimized parameters reproduced potential and free energy surfaces significantly better, closely matching higher-level DFT calculations. In an efficient two-stage optimization process, we first developed parameters for CCR using reaction path data, then refined these parameters for DHFR by incorporating a targeted set of additional training geometries. This strategy minimized the computational cost while achieving accurate descriptions of both systems, as validated through QM/MM simulations using the Adaptive String Method (ASM). Our method is an efficient way to optimize semiempirical methods for studying larger systems and longer timescales, with potential applications in studies of enzymatic reaction mechanisms, drug design, and enzyme engineering.
Supplementary materials
Title
Supporting Information
Description
The Supporting Information provides comprehensive details about our methodology, parameters, and results. It explains the theoretical foundations of GFN2-xTB's energy terms and their dependence on element-specific parameters, the ASM setup for the CCR and DHFR enzymatic systems, the dual-level correction scheme for PMF profiles, our GFN2-xTB re-parametrization strategy, and the PCA methodology used to analyze QM/MM trajectories. It also includes the training-set generation procedures and comprehensive tables of the optimized parameters from our parametrization workflow.
Two GitHub repositories containing the code and data are provided. Both are released under the MIT License and include detailed documentation for reproducing our work. Users need Amber24 compiled with GFN2-xTB API support, Gaussian16 for reference calculations, and a Python environment with the required dependencies. Gaussian16 is only needed to reproduce our exact results; any QM package whose output can be read by the cclib library can be used instead.
Supplementary weblinks
Title
Data and Software Availability, materials
Description
The data and results necessary for reproduction are available in this GitHub repository, which is organized into three main sections: (1) Training Sets, containing the quantum mechanical data at the M06-2X-D3/def2-TZVP level used for parameter optimization, including IRC and scan calculations for the CCR and DHFR systems; (2) Optimized Parameters, providing the final GFN2-xTB parameter sets; and (3) ASM Calculations, containing the results of the ASM simulations. The repository also includes the input files and configuration data needed to reproduce the QM/MM simulations.
Title
Data and Software Availability, the code implementation
Description
The code implementation is available in this GitHub repository, which contains a Python package that implements our multi-objective evolutionary strategy for optimizing GFN2-xTB parameters. The package includes the core DDGA algorithm, objective function implementations, and working examples for replicating the parameter optimization for CCR and DHFR systems.
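As a rough illustration of what a multi-objective evolutionary search over parameters can look like, the following minimal sketch scores each candidate by three separate errors (energies, charges, gradients) and keeps the Pareto-optimal candidates between generations. It is not the DDGA algorithm from the repository: the function names, the Gaussian mutation scheme, and the `toy_model` standing in for a GFN2-xTB evaluation are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import random


def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def objectives(params, model, reference):
    """One error per target property; lower is better for each."""
    pred = model(params)
    keys = ("energies", "charges", "gradients")
    return tuple(rmse(pred[k], reference[k]) for k in keys)


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored):
    """Keep the (params, objectives) pairs that no other pair dominates."""
    return [s for s in scored if not any(dominates(o[1], s[1]) for o in scored)]


def evolve(model, reference, start, generations=50, offspring=20,
           sigma=0.1, seed=0):
    """Mutate members of the current front and keep the non-dominated set."""
    rng = random.Random(seed)
    front = [(tuple(start), objectives(start, model, reference))]
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            parent = rng.choice(front)[0]
            # Gaussian mutation of every parameter (an assumed, simple choice).
            child = tuple(p + rng.gauss(0.0, sigma) for p in parent)
            children.append((child, objectives(child, model, reference)))
        front = pareto_front(front + children)
    return front


def toy_model(params):
    """Hypothetical stand-in for a semiempirical calculation."""
    a, b = params
    return {
        "energies": [a * x for x in (0.0, 0.5, 1.0)],
        "charges": [b, -b],
        "gradients": [a, a, a],
    }


# Reference data generated from known "true" parameters (1.0, 0.5).
reference = toy_model((1.0, 0.5))
front = evolve(toy_model, reference, start=(0.0, 0.0))
best_params, best_errors = min(front, key=lambda s: sum(s[1]))
```

Keeping the errors as separate objectives, rather than collapsing them into one weighted sum up front, avoids committing to relative weights for energies, charges, and gradients before the trade-offs between them are known.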