Abstract
Lipophilicity is a physicochemical property with wide relevance in drug design that is also applied in areas such as food chemistry, environmental chemistry, and computational biology. This descriptor strongly influences the absorption, distribution, permeability, bioaccumulation, protein binding, and biological activity of bioorganic compounds. Lipophilicity is commonly expressed as the n-octanol/water partition coefficient (PN) for neutral molecules, whereas for molecules with ionizable groups the distribution coefficient (D) at a given pH is used. The logDpH is usually predicted by applying a pH correction to the logPN based on the pKa of the ionizable molecule, often ignoring the partition of the ionic species (PI) because the partitioning of charged species is difficult to predict. In this work, we studied the impact of PI on the prediction of the lipophilicity of small drug-like molecules by modeling a set of 225 experimental logDpH values using both the formalism that applies only a pH correction (see Eq. 1) and the one that also considers the partition of ionic species (see Eq. 2). Our findings show that predictions improve when the ionic partition is considered, whereas ignoring its contribution can lead to inadequate computational predictions. In this context, we developed machine learning algorithms to determine in which cases PI should be considered. The results indicate that small, compact, and hydrophobic molecules with a higher likelihood of being in their ionic state at a specific pH were better modeled using Eq. 2. Finally, we validated our findings using test and external sets, for which logistic regression, random forest classification, and support vector machine models predicted the better formalism for determining the logDpH of each molecule with high accuracy, sensitivity, and specificity.
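
To make the difference between the two formalisms concrete, the following is a minimal sketch in Python. It assumes the commonly used expressions for a monoprotic acid or base; Eq. 1 and Eq. 2 are not reproduced in this abstract, so their exact form in the paper may differ. The function names, the `acid` flag, and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math


def log_d_ph_correction(log_p_n, pka, ph, acid=True):
    """logD from a pH correction of logP_N only (ionic partition neglected).

    Assumed form: logD = logP_N - log10(1 + 10**delta),
    with delta = pH - pKa for acids and pKa - pH for bases.
    """
    delta = (ph - pka) if acid else (pka - ph)
    return log_p_n - math.log10(1.0 + 10.0 ** delta)


def log_d_with_ionic_partition(log_p_n, log_p_i, pka, ph, acid=True):
    """logD including the partition of the ionic species (P_I).

    Assumed form: logD = log10(P_N + P_I * 10**delta) - log10(1 + 10**delta).
    """
    delta = (ph - pka) if acid else (pka - ph)
    numerator = 10.0 ** log_p_n + 10.0 ** (log_p_i + delta)
    return math.log10(numerator) - math.log10(1.0 + 10.0 ** delta)


if __name__ == "__main__":
    # Illustrative values only, not taken from the data set of this work.
    log_p_n, log_p_i, pka, ph = 3.0, 0.0, 4.0, 7.4
    print(log_d_ph_correction(log_p_n, pka, ph))
    print(log_d_with_ionic_partition(log_p_n, log_p_i, pka, ph))
```

Under these assumed forms, the two estimates agree when the ionic species barely partitions into n-octanol, and they diverge as the ionized fraction and PI grow, which is the regime where neglecting PI is expected to matter most.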