Abstract
Crystal structure prediction (CSP) methods are prone to overestimating the number of potential polymorphs of organic molecules. In this work, we aim to reduce this overprediction by systematically applying molecular dynamics simulations and biased sampling methods to cluster subsets of structures that can easily interconvert at finite temperature and pressure. Following this approach, we rationally reduce the number of putative polymorphs in CSP-generated crystal energy landscapes. The approach uses unsupervised clustering to analyze independent finite-temperature molecular dynamics trajectories and hence identify a representative structure for each cluster of distinct lattice energy minima that are effectively equivalent at finite temperature and pressure. Biased simulations are used to reduce the impact of limited sampling time and to estimate the work associated with polymorphic transformations. We demonstrate the proposed approach by studying the polymorphs of urea and succinic acid, reducing an initial set of over 100 energetically plausible CSP structures to 12 and 27, respectively, including the experimentally known polymorphs. The simulations also indicate the types of disorder and stacking errors that may occur in real structures.
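The merging step summarized above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it assumes that each trajectory has already been reduced to a set of clustered state labels, that two CSP structures are equivalent whenever their trajectories visit a common state, and that the lowest lattice energy member serves as the representative. The function name `merge_interconverting`, the input layout, and the representative rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def merge_interconverting(visited, lattice_energy):
    """Group CSP structures whose MD trajectories visit a common state.

    visited: dict, structure id -> set of state labels seen in its trajectory
             (assumed to come from a prior unsupervised clustering step)
    lattice_energy: dict, structure id -> lattice energy (any consistent unit)
    Returns a dict, representative id -> sorted list of member ids.
    """
    # Union-find forest: every structure starts as its own group.
    parent = {s: s for s in visited}

    def find(s):
        # Follow parent links to the root, halving the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # Merge any two structures that share at least one visited state.
    first_seen = {}
    for s, labels in visited.items():
        for label in labels:
            if label in first_seen:
                parent[find(s)] = find(first_seen[label])
            else:
                first_seen[label] = s

    groups = defaultdict(list)
    for s in visited:
        groups[find(s)].append(s)

    # Assumption: the lowest lattice energy member represents each group.
    return {
        min(members, key=lattice_energy.__getitem__): sorted(members)
        for members in groups.values()
    }


# Toy input: four hypothetical structures; A, B and C are linked through shared states.
visited = {"A": {0}, "B": {0, 1}, "C": {1}, "D": {2}}
energies = {"A": -100.0, "B": -101.5, "C": -99.0, "D": -98.0}
reduced = merge_interconverting(visited, energies)
```

In this toy input, A and C never share a state directly but are merged through B, which mirrors the idea that interconversion need only connect structures through a chain of finite-temperature transitions.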