Prediction of Toluene/Water Partition Coefficient in the SAMPL9 Blind Challenge: Assessment of Machine Learning and IEF-PCM/MST Continuum Solvation Models

William J. Zamora Ramírez; Antonio Viayna; Silvana Pinheiro; Carles Curutchet; Laia Bisbal; Rebeca Ruiz; Clara Ràfols; F. Javier Luque

doi:10.26434/chemrxiv-2023-fg64s

Physical Chemistry


30 March 2023, Version 1

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

In recent years, partition systems other than the widely used biphasic n-octanol/water system have received increased attention as a means of gaining insight into the molecular features that dictate the lipophilicity of compounds. In particular, the difference between the n-octanol/water and toluene/water partition coefficients has proven to be a valuable descriptor for studying the propensity of molecules to form intramolecular hydrogen bonds and to exhibit chameleon-like properties that modulate solubility and permeability. In this context, this study reports the experimental toluene/water partition coefficients (logPtol/w) for a series of 16 drugs that were selected as an external test set in the framework of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenge. This external set has been used by the computational community to calibrate their methods in the current edition (SAMPL9) of this contest. The study also investigates the performance of two computational strategies for the prediction of logPtol/w. The first relies on the development of two machine learning (ML) models, which are built by combining a selection of 11 molecular descriptors with either multiple linear regression (MLR) or random forest regression (RFR) to target a dataset of 252 experimental logPtol/w values. The second consists of the parametrization of the IEF-PCM/MST continuum solvation model from B3LYP/6-31G(d) calculations to predict the solvation free energies of 163 compounds in toluene and benzene. The performance of the ML and IEF-PCM/MST models has been calibrated against external test sets, including the compounds that define the SAMPL9 logPtol/w challenge. The results are used to discuss the merits and weaknesses of the two computational approaches.
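As an illustration of how a continuum solvation model yields a partition coefficient, the sketch below applies the standard thermodynamic relation log P = (ΔG_solv,water − ΔG_solv,organic) / (RT ln 10) and computes the n-octanol/water minus toluene/water difference mentioned above. It is a minimal sketch and not the authors' workflow: the function names, the temperature of 298.15 K, and the numerical free energies are assumptions chosen for illustration only.

```python
import math

# Gas constant in kcal/(mol*K); free energies below are assumed to be in kcal/mol.
R_KCAL = 1.987204e-3


def log_p_from_solvation(dg_water, dg_organic, temperature=298.15):
    """Return log P (organic/water) from solvation free energies in kcal/mol.

    A more favorable (more negative) solvation free energy in the organic
    phase than in water gives a positive log P.
    """
    return (dg_water - dg_organic) / (R_KCAL * temperature * math.log(10))


def delta_log_p(log_p_oct_w, log_p_tol_w):
    """Return the difference between n-octanol/water and toluene/water log P."""
    return log_p_oct_w - log_p_tol_w


# Hypothetical solvation free energies (kcal/mol), not taken from the paper.
dg_wat = -6.0
dg_tol = -8.0
log_p_tol_w = log_p_from_solvation(dg_wat, dg_tol)
print(f"log P (toluene/water) = {log_p_tol_w:.2f}")
```

At 298.15 K the factor RT ln 10 is about 1.36 kcal/mol, so each 1.36 kcal/mol of difference between the two solvation free energies shifts log P by roughly one unit.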

Keywords

SAMPL9

Toluene/Water Partition Coefficient

Hydrophobicity

Machine Learning

Continuum Solvation Models

multiple linear regression

random forest regression

IEF-PCM/MST

logP

Supplementary materials

SI: Figures and Tables


Now Published

Prediction of toluene/water partition coefficients in the SAMPL9 blind challenge: assessment of machine learning and IEF-PCM/MST continuum solvation models

William J. Zamora, Antonio Viayna, Silvana Pinheiro, Carles Curutchet, Laia Bisbal, Rebeca Ruiz, Clara Ràfols, F. Javier Luque

Journal article: Physical Chemistry Chemical Physics, Volume 25, Issue 27

Online publication date: 2023

Version History

Mar 30, 2023 Version 1


License

The content is available under CC BY NC ND 4.0


Funding

Vice Chancellor for Research of the University of Costa Rica

Research projects 115-C2-126 and 908-C3-610

State Research Agency/Spanish Ministry of Science and Innovation

AEI/10.13039/501100011033; grants MDM-2017-0767, PID2020-115812GB-I00, PID2020-115374GB-100, PID2020-117646RB-I00 and CEX2021-001202-M

Generalitat de Catalunya

2021SGR00671

National Institutes of Health

R01GM124270

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
