Abstract
We present a model for estimating the price of a reagent from its chemical structure. It is intended to support reagent selection in library design. The model is a Random Forest regressor trained on the MolPort catalog of 302K reagents, with the log of the price as the target. As descriptors we use features computed with RDKit: chiral Morgan fingerprints, RDKit's medicinal chemistry descriptors, and counts of undetermined chiral centers. The model's out-of-bag performance is 34% of variance explained in log price. When predicting on known reagents, the model explains 91% of the variance in log price. We analyze the model to understand the errors it makes, and show that the compounds with the highest errors differ only subtly in structure from similar molecules but differ greatly in price.
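The gap between the out-of-bag figure and the figure for known reagents can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestRegressor` and uses synthetic binary features as a stand-in for fingerprint bits. The data, the weights, and the hyperparameters are all hypothetical and are not taken from the MolPort catalog or from the model described here; the second score is read here as an in-sample fit, which is an interpretation rather than something the abstract states.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for fingerprint bits (NOT real reagent data).
n_samples, n_bits = 400, 64
X = rng.integers(0, 2, size=(n_samples, n_bits))

# Hypothetical log-price target: a few bits carry signal, the rest is noise.
weights = np.zeros(n_bits)
weights[:5] = [1.0, -0.8, 0.6, 0.5, -0.4]
log_price = X @ weights + rng.normal(scale=1.0, size=n_samples)

# oob_score=True makes the forest score each sample using only the trees
# whose bootstrap sample did not contain it.
model = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
model.fit(X, log_price)

oob_r2 = model.oob_score_                 # R^2 on out-of-bag predictions
in_sample_r2 = model.score(X, log_price)  # R^2 on the training data itself

print(f"out-of-bag R^2: {oob_r2:.2f}")
print(f"in-sample R^2:  {in_sample_r2:.2f}")
```

In a sketch like this the in-sample score is higher than the out-of-bag score, because each tree has already seen most of the samples it is scored on. The out-of-bag value is therefore the more honest estimate of performance on reagents the model has not seen.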