Multi-objective evolutionary strategy for improving semiempirical Hamiltonians in the study of enzymatic reactions at the QM/MM level of theory

José Luís Velázquez-Libera; Rodrigo Recabarren; Esteban Vöhringer-Martinez; Yamisleydi Salgueiro; J. Javier Ruiz-Pernía; Julio Caballero; Iñaki Tuñón

doi:10.26434/chemrxiv-2025-pvztk

Theoretical and Computational Chemistry


13 February 2025, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Quantum mechanics/molecular mechanics (QM/MM) simulations are crucial for understanding enzymatic reactions, but their accuracy depends heavily on the quantum-mechanical method used. Semiempirical methods offer computational efficiency but often struggle with accuracy in complex systems. This work presents a novel multi-objective evolutionary strategy for optimizing semiempirical Hamiltonians, specifically designed to enhance their performance in enzymatic QM/MM simulations while remaining broadly applicable to condensed-phase systems. Our methodology combines automated parameter optimization, targeting ab initio or density functional theory (DFT) reference potential energy surfaces, atomic charges, and gradients, with comprehensive validation through minimum free energy path (MFEP) calculations. To demonstrate its effectiveness, we applied our approach to improve the GFN2-xTB Hamiltonian using two enzymatic systems that involve hydride transfer reactions where the activation energy barrier is severely underestimated: crotonyl-CoA carboxylase/reductase (CCR) and dihydrofolate reductase (DHFR). The optimized parameters showed significant improvements in reproducing potential and free energy surfaces, closely matching higher-level DFT calculations. Through an efficient two-stage optimization process, we first developed parameters for CCR using reaction path data, then refined these parameters for DHFR by incorporating a targeted set of additional training geometries. This strategic approach minimized the computational cost while achieving accurate descriptions of both systems, as validated through QM/MM simulations using the Adaptive String Method (ASM). Our method represents an efficient approach for optimizing semiempirical methods to study larger systems and longer timescales, with potential applications in studies of enzymatic reaction mechanisms, drug design, and enzyme engineering.
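As a rough illustration of the multi-objective idea described in the abstract, the sketch below scores candidate parameter sets against reference energies, charges, and gradients, and keeps the candidates that no other candidate dominates. It is not the authors' algorithm or code: `toy_model`, `evolve`, the Gaussian mutation, and the summed-error tie-break are all hypothetical choices made for this example, and the toy model stands in for a real semiempirical calculation.

```python
import math
import random

def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def objectives(params, reference, model):
    """One error per target property: energies, charges, gradients."""
    predicted = model(params)
    return tuple(rmse(predicted[key], reference[key])
                 for key in ("energies", "charges", "gradients"))

def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scored):
    """Keep (params, scores) pairs whose scores no other pair dominates."""
    return [(p, s) for p, s in scored
            if not any(dominates(t, s) for _, t in scored)]

def toy_model(params):
    """Hypothetical stand-in for a semiempirical calculation (not GFN2-xTB)."""
    a, b = params
    return {
        "energies": [a * x for x in (0.0, 1.0, 2.0)],
        "charges": [b, -b],
        "gradients": [a + b, a - b],
    }

def summed_error(pair):
    """Assumed tie-break: rank candidates by the sum of their objectives."""
    return sum(pair[1])

def evolve(reference, generations=30, pop_size=20, sigma=0.1, seed=0):
    """Mutate, score, and select; return the final non-dominated set."""
    rng = random.Random(seed)
    population = [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(pop_size)]
    for _ in range(generations):
        # Each parent produces one child by Gaussian mutation.
        children = [tuple(g + rng.gauss(0, sigma) for g in p) for p in population]
        scored = [(p, objectives(p, reference, toy_model))
                  for p in population + children]
        front = pareto_front(scored)
        rest = [ps for ps in scored if ps not in front]
        # Non-dominated candidates survive first, then the best of the rest.
        ranked = sorted(front, key=summed_error) + sorted(rest, key=summed_error)
        population = [p for p, _ in ranked[:pop_size]]
    return pareto_front([(p, objectives(p, reference, toy_model))
                         for p in population])

# Reference data generated by the toy model itself, so an exact fit exists.
reference = toy_model((1.0, 0.5))
front = evolve(reference)
```

Because the candidate with the lowest summed error is always non-dominated and is ranked first, this selection never loses its best candidate from one generation to the next. A production workflow would replace `toy_model` with actual QM/MM single-point calculations and would likely weight or normalize the objectives, since energies, charges, and gradients carry different units.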

Keywords

Semiempirical methods

QM/MM simulations

Multi-objective optimization

Parameter optimization

Hydride transfer

Enzyme catalysis

Supplementary materials

Supporting Information

The Supporting Information details our methodology, parameters, and results. It explains the theoretical foundations of GFN2-xTB's energy terms and their dependence on element-specific parameters, the ASM setup for the CCR and DHFR enzymatic systems, the dual-level correction scheme for PMF profiles, our GFN2-xTB re-parametrization strategy, and the PCA methodology for analyzing QM/MM trajectories. It also includes the procedures for generating the training sets and comprehensive tables of the optimized parameters from our parametrization workflow. Two GitHub repositories containing code and data are provided; both are released under the MIT License and include detailed documentation for reproducing our work. Users need Amber24 compiled with GFN2-xTB API support, Gaussian16 for reference calculations (required only to reproduce our exact results; any QM package whose output the cclib library can read may be used instead), and a Python environment with the required dependencies.


Supplementary weblinks

Data and Software Availability, materials

The data and results necessary for reproduction are available in a GitHub repository organized into three main sections: (1) Training Sets, containing quantum mechanical data at the M06-2X-D3/def2-TZVP level used for parameter optimization, including IRC and scan calculations for the CCR and DHFR systems; (2) Optimized Parameters, providing the final GFN2-xTB parameter sets; and (3) ASM Calculations, containing the results of the ASM simulations. The repository also includes the input files and configuration data needed to reproduce the QM/MM simulations.

Data and Software Availability, the code implementation

The code implementation is available in a GitHub repository containing a Python package that implements our multi-objective evolutionary strategy for optimizing GFN2-xTB parameters. The package includes the core DDGA algorithm, objective function implementations, and working examples for replicating the parameter optimization for the CCR and DHFR systems.


Version History

Apr 23, 2025 Version 2

Feb 13, 2025 Version 1

Metrics

Views: 366

Downloads: 107

License

The content is available under CC BY 4.0


Funding

ANID/CONICYT FONDECYT Postdoctorado: 3240216

ANID FONDECYT REGULAR: 1210138

MCIN/AEI/10.13039/501100011033/ and “ERDF A way of making Europe”: PID2021-123332OB-C22

ANID/CONICYT FONDECYT Postdoctorado: 3210695

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
