Abstract
In many areas of molecular simulation, nuclear densities are represented by an ensemble of nuclear configurations or phase-space points. The size of the ensemble directly affects both the accuracy and the computational cost of subsequent calculations of observables. In the present work, we address how many configurations are needed and how to select them most efficiently. We focus on the nuclear ensemble method for electronic spectroscopy, where thousands of sampled configurations are usually required for sufficiently converged spectra. The proposed representative sampling technique dramatically reduces the required sample size. Using an exploratory method, we model the density of a large sample in the space of transition properties. The representative subset of nuclear configurations is then optimized with simulated annealing by minimizing its Kullback-Leibler divergence from the full density. High-level calculations are performed only for the selected subset of configurations. We tested the algorithm on the electronic absorption spectra of three systems: (E)-azobenzene, the simplest Criegee intermediate, and the hydrated nitrate anion. Typically, a few dozen nuclear configurations provided sufficiently accurate spectra. A strongly forbidden transition of the nitrate anion was the most challenging case because rare geometries carry disproportionately high transition intensities. This problematic case was easily diagnosed within the present approach. We also discuss various exploratory methods and a possible extension to dynamical simulations.
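The selection step described above can be illustrated with a minimal sketch. The abstract does not specify the density model, so this example assumes a Gaussian kernel density estimate (KDE) on a standardized two-dimensional space of transition properties, estimates D_KL(p_full || p_subset) as a Monte Carlo average over the full sample, and runs simulated annealing with single-point swap moves and geometric cooling. The function names (`representative_subset`, `kl_divergence`, etc.), the bandwidth, the temperature schedule, and the synthetic data are all hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def standardize(points):
    """Scale each coordinate to zero mean and unit variance."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    stds = [math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) or 1.0
            for d in range(dims)]
    return [tuple((p[d] - means[d]) / stds[d] for d in range(dims)) for p in points]


def kernel_matrix(points, bandwidth):
    """Unnormalized Gaussian kernel values K[i][j] between all pairs of points."""
    inv = 0.5 / bandwidth ** 2
    return [[math.exp(-inv * sum((a - b) ** 2 for a, b in zip(pi, pj)))
             for pj in points] for pi in points]


def kl_divergence(full_sums, sub_sums, n_full, n_sub):
    """Monte Carlo estimate of D_KL(p_full || p_sub) over the full sample.

    Both KDEs use the same kernel and bandwidth, so the kernel normalization
    constant cancels in the log-ratio and only the 1/n prefactors remain.
    """
    total = 0.0
    for sp, sq in zip(full_sums, sub_sums):
        # Clamp guards against log(0) or tiny negative round-off in sub_sums.
        total += math.log(sp / n_full) - math.log(max(sq, 1e-300) / n_sub)
    return total / len(full_sums)


def representative_subset(points, k, bandwidth=0.3, n_steps=3000,
                          t_start=0.05, t_end=1e-4, seed=0):
    """Choose k sample indices (assumes k < len(points)) whose KDE best matches the full KDE."""
    rng = random.Random(seed)
    pts = standardize(points)
    n = len(pts)
    K = kernel_matrix(pts, bandwidth)
    full_sums = [sum(row) for row in K]

    chosen = rng.sample(range(n), k)          # random initial subset
    chosen_set = set(chosen)
    sub_sums = [sum(K[i][j] for j in chosen) for i in range(n)]
    cost = kl_divergence(full_sums, sub_sums, n, k)
    best, best_cost = list(chosen), cost

    for step in range(n_steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        # Trial move: swap one selected point for one unselected point.
        pos = rng.randrange(k)
        j_out = chosen[pos]
        j_in = rng.randrange(n)
        while j_in in chosen_set:
            j_in = rng.randrange(n)
        # Incremental update of the subset kernel sums: O(n) per step.
        trial = [s + row[j_in] - row[j_out] for s, row in zip(sub_sums, K)]
        trial_cost = kl_divergence(full_sums, trial, n, k)
        delta = trial_cost - cost
        # Metropolis acceptance criterion.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            chosen[pos] = j_in
            chosen_set.discard(j_out)
            chosen_set.add(j_in)
            sub_sums, cost = trial, trial_cost
            if cost < best_cost:
                best, best_cost = list(chosen), cost
    return sorted(best), best_cost


# Hypothetical synthetic data: (excitation energy, log10 intensity) in two bands.
data_rng = random.Random(42)
data = ([(data_rng.gauss(3.0, 0.2), data_rng.gauss(-1.0, 0.3)) for _ in range(150)]
        + [(data_rng.gauss(4.0, 0.15), data_rng.gauss(-2.5, 0.3)) for _ in range(50)])
subset, cost = representative_subset(data, k=10)
print(f"selected indices: {subset}")
print(f"estimated KL divergence: {cost:.4f}")
```

Precomputing the full kernel matrix lets each annealing step update the subset density in O(N) rather than O(Nk), which keeps the cheap exploratory sample large while the expensive high-level calculations would be run only for the returned indices. The bandwidth and cooling parameters here are arbitrary choices for the toy data and would need tuning in practice.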