Abstract
Thousands of studies on predicting small-molecule retention times have been published in recent decades. The ultimate goal is, without doubt, the transferable prediction of retention times: we want to train a model on a set of compounds from one dataset and then use it to predict retention times for a different set of compounds from another dataset. Unfortunately, retention times may change massively, even under nominally identical chromatographic conditions. Retention order is much better preserved, yet even the retention order of compounds may change if chromatographic conditions vary. Here, we systematically study which chromatographic conditions result in notable changes in retention order. We then present a machine learning model that predicts retention order or, more precisely, a retention order index, taking chromatographic conditions into account. Finally, we show how to map the retention order index to retention times. Disentangling these two tasks enables retention time prediction across chromatographic conditions and compound classes.
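To make the second step concrete, the following is a minimal sketch of how a predicted retention order index could be mapped to retention times for one target dataset. It is not the authors' implementation: the function name `fit_monotone_map`, the anchor values, and the choice of a monotone piecewise-linear fit through anchor compounds are all assumptions made for illustration.

```python
from bisect import bisect_right


def fit_monotone_map(anchor_index, anchor_rt):
    """Build a monotone map from order index to retention time.

    Hypothetical helper: anchors are compounds measured in the target
    dataset for which an order index has also been predicted.
    """
    pairs = sorted(zip(anchor_index, anchor_rt))
    xs = [p[0] for p in pairs]

    # Crude monotonicity fix: take a running maximum of the anchor
    # retention times so the fitted curve never decreases.
    ys = []
    running_max = float("-inf")
    for _, rt in pairs:
        running_max = max(running_max, rt)
        ys.append(running_max)

    def predict(index):
        # Outside the anchor range, clamp to the nearest anchor value.
        if index <= xs[0]:
            return ys[0]
        if index >= xs[-1]:
            return ys[-1]
        # Linear interpolation between the two neighbouring anchors.
        hi = bisect_right(xs, index)
        lo = hi - 1
        frac = (index - xs[lo]) / (xs[hi] - xs[lo])
        return ys[lo] + frac * (ys[hi] - ys[lo])

    return predict


# Illustrative anchors only: order indices and retention times in minutes.
to_rt = fit_monotone_map([100, 300, 500], [1.0, 4.0, 10.0])
print(to_rt(200), to_rt(400))
```

Because the map is fitted separately for each dataset, the order index model can be shared across chromatographic setups while the dataset-specific time scale is absorbed by the anchors.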
Supplementary materials
Title
Supplementary Table 2. List of RepoRT datasets used for retention order statistics and model evaluation
Description
All datasets from RepoRT are listed, together with the evaluation scenario in which each dataset is used. The table also indicates which datasets are missing important metadata (HSM and Tanaka parameters, pH, void volume estimate, column temperature, flow rate) and which datasets were removed from evaluation following manual curation.
Supplementary weblinks
Title
Code for model training, evaluation and application
Description
GitHub repository containing the code to train, evaluate and apply the two-step retention time prediction models.