Progressive Alignment of Crystals: Reproducible and Efficient Assessment of Crystal Structure Similarity

30 May 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

In silico crystal structure prediction of organic molecules often generates millions of candidate structures. These candidates must be compared to remove duplicates prior to further analysis (e.g., optimization with electronic structure methods) and are ultimately compared with experimentally determined structures. The agreement of predicted and experimental structures forms the basis for evaluating results from the Cambridge Crystallographic Data Centre (CCDC) blind assessment of crystal structure prediction, which further motivates the need for rigorous alignments. We argue that evaluating crystal structure packings in a reproducible manner requires not only calculating a coordinate root-mean-square deviation (RMSD) for N molecules (or N asymmetric units), but also reporting metrics that describe the shape of the compared molecular clusters, to account for the alternative approaches used to prioritize selection of molecules. Here we describe a flexible algorithm called Progressive Alignment of Crystals (PAC) that evaluates crystal packing similarity using coordinate RMSD, and we introduce radius of gyration (Rg) as a metric to quantify the shape of the superimposed clusters. We show that the absence of metrics describing cluster shape adds ambiguity to the results of the CCDC blind assessments, because it is not possible to determine whether the superposition algorithm prioritized tightly packed molecular clusters (i.e., to minimize Rg) or prioritized reduced RMSD (i.e., via possibly elongated clusters with relatively larger Rg). For example, we show that when the PAC algorithm described here uses “single linkage” to prioritize molecules for inclusion in the superimposed clusters, our results are nearly identical to those calculated by the widely used program COMPACK. However, we favor the lower Rg values obtained with “average linkage” for molecule prioritization because the resulting RMSDs more equally reflect the importance of packing along each dimension. We conclude by showing that the PAC algorithm is faster than COMPACK when using a single process, demonstrating its utility for biomolecular crystals, and presenting parallel scaling up to 64 processes in the open-source code Force Field X.
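To illustrate why RMSD alone leaves cluster shape ambiguous, the following minimal sketch computes both metrics for two toy clusters. It is not the PAC implementation or any Force Field X API: the function names, the unweighted (non-mass-weighted) form of Rg, and the toy coordinates are all assumptions made for illustration, and the coordinates are taken to be already superimposed.

```python
import itertools
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points.

    Assumption: every point has equal weight; a mass-weighted form
    may be preferred in practice.
    """
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    sq = sum(sum((p[i] - center[i]) ** 2 for i in range(3)) for p in coords)
    return math.sqrt(sq / n)


def rmsd(a, b):
    """Coordinate RMSD between two equal-length, already-superimposed point sets."""
    if len(a) != len(b):
        raise ValueError("point sets must have the same length")
    sq = sum(sum((p[i] - q[i]) ** 2 for i in range(3)) for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def shift_x(coords, dx):
    """Return a copy of coords displaced by dx along x (a toy 'prediction')."""
    return [(x + dx, y, z) for x, y, z in coords]


# Hypothetical compact cluster: the 8 corners of a cube with side 2.
compact = [tuple(float(v) for v in p) for p in itertools.product((-1, 1), repeat=3)]
# Hypothetical elongated cluster: 8 points in a row along x, spacing 2.
elongated = [(float(x), 0.0, 0.0) for x in range(-7, 8, 2)]

for name, cluster in (("compact", compact), ("elongated", elongated)):
    predicted = shift_x(cluster, 0.1)
    print(name, "RMSD:", round(rmsd(cluster, predicted), 3),
          "Rg:", round(radius_of_gyration(cluster), 3))
```

Both toy comparisons have the same RMSD (0.1 in the units of the coordinates), yet Rg is √3 ≈ 1.73 for the compact cluster and √21 ≈ 4.58 for the elongated one. Reporting Rg alongside RMSD therefore distinguishes the two cases.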

Keywords

structure comparison
crystal packing
crystal structure prediction
radius of gyration
Force Field X
