Abstract
Reversed-phase (RP) liquid chromatography is an important tool for the characterization of materials and products in the pharmaceutical industry. Method development is still challenging in this application space, particularly when dealing with closely related compounds. Models of chromatographic selectivity are useful for predicting which of the hundreds of available columns are likely to have very similar, or very different, selectivity for the application at hand. The hydrophobic subtraction model (HSM1) has been widely employed for this purpose; the column database for this model currently stands at 750 columns. In previous work, we explored a refinement of the original HSM1 (HSM2) and found that increasing the size of the dataset used to train the model dramatically reduced the number of gross errors in the selectivity predictions made using the model. In this paper we describe further work in this direction (HSM3), this time based on a much larger dataset (43,329 total measurements) containing selectivities for compounds that cover a broader range of physicochemical properties than those used for HSM1. The dataset includes multiple compounds that are actual active pharmaceutical ingredients and related synthetic intermediates and impurities, as well as multiple pairs of closely related structures (e.g., geometric and cis-/trans- isomers). The HSM3 model is based on retention measurements for 75 compounds using 13 RP stationary phases and a mobile phase of 40/60 acetonitrile/25 mM ammonium formate buffer at pH 3.2. This data-driven model produced predictions of ln(alpha) (chromatographic selectivity using ethylbenzene as the reference compound) with average absolute errors of approximately 0.033, which corresponds to errors in alpha of about 3 %.
In some cases, the prediction of the trans-/cis- selectivities for positional and geometric isomers was relatively accurate, and the driving forces for the observed selectivity could be inferred by examining the relative magnitudes of the terms in the HSM3 model. For some geometric isomer pairs, however, the interactions mainly responsible for the observed selectivities could not be rationalized because of large uncertainties in particular terms of the model. This suggests that more work is needed to explore other HSM-type models and to keep expanding the training dataset so that the predictive accuracy of these models continues to improve.
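The correspondence stated above between an average absolute error of about 0.033 in ln(alpha) and an error of about 3 % in alpha follows from exponentiating the logarithmic error. The short sketch below illustrates that arithmetic only; the function name `relative_alpha_error` is hypothetical and is not part of the HSM3 model or its supplementary files.

```python
import math


def relative_alpha_error(ln_alpha_error: float) -> float:
    """Relative error in alpha implied by an absolute error in ln(alpha).

    If ln(alpha_pred) = ln(alpha_true) + e, then
    alpha_pred / alpha_true = exp(e), so the relative error is exp(e) - 1.
    """
    # expm1 computes exp(x) - 1 accurately for small x
    return math.expm1(ln_alpha_error)


# Average absolute error in ln(alpha) quoted in the abstract
over = relative_alpha_error(0.033)    # prediction too high
under = relative_alpha_error(-0.033)  # prediction too low

print(f"over-prediction:  {over:+.2%}")
print(f"under-prediction: {under:+.2%}")
```

For errors this small, exp(e) - 1 is close to e itself, which is why an error of 0.033 in ln(alpha) can be read directly as roughly 3 % in alpha.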
Supplementary materials
Title
Supplemental Figures and Tables
Description
Figures, tables, and explanations that support the main manuscript.
Title
Final Parameters for HSM3
Description
Column and solute parameters for the HSM3 model.
Title
WC_second_kernel_full_database
Description
Database of retention measurements used to develop the HSM3 model.
Title
Quality Control Data
Description
Quality control measurements made in the course of acquiring the full dataset.