Abstract
This article presents a novel algorithm for the calculation of analytic energy gradients from second-order Møller-Plesset perturbation theory within the Resolution-of-the-Identity approximation (RI-MP2), designed to achieve high performance on multi-GPU clusters. The algorithm uses GPUs for all major steps of the calculation, including integral generation, formation of all required intermediate tensors, solution of the Z-vector equation, and gradient accumulation. The implementation in the EXtreme Scale Electronic Structure System (EXESS) software package includes a tailored, highly efficient multi-stream scheduling system that hides CPU-GPU data transfer latencies and allows nodes with 8 A100 GPUs to operate at over 80% of theoretical peak floating-point performance. Comparative performance analysis shows a significant reduction in computational time relative to traditional multi-core CPU-based methods, with our approach achieving up to a 95-fold speedup over the single-node performance of established software such as Q-Chem and ORCA. Additionally, we demonstrate that pairing our implementation with the molecular fragmentation framework in EXESS can drastically lower the computational scaling of RI-MP2 gradient calculations from quintic to sub-quadratic, enabling further substantial savings in runtime while retaining high numerical accuracy in the resulting gradients.