Prediction of intrinsic solubility for drug-like organic compounds using Automated Network Optimizer (ANO) for physicochemical feature and hyperparameter optimization

07 November 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Accurate prediction of aqueous solubility remains a critical challenge in the chemical and pharmaceutical industries, with significant consequences for drug development and delivery. This study revisits this well-explored problem using modern computational resources. We apply an automated network optimizer (ANO) model that integrates two optimization processes, one for molecular features and one for hyperparameters, which streamlines the traditionally complex hyperparameter search and supports efficient interpretation of molecular properties. With feature optimization, our deep neural network model improves both the speed and accuracy of molecular property prediction, achieving an average performance of R² = 0.991. This result outperforms conventional hyperparameter optimization methods such as grid search and random search in predicting the intrinsic solubility of 3,745 compounds across four external experimental datasets. Using feature importance analysis, we identify key molecular features and structures that significantly influence solubility. Additionally, combining three molecular fingerprints (Morgan, MACCS keys, and Avalon) with molecular descriptors enhances model performance and gives a deeper understanding of the relationship between molecular structure and solubility within the physicochemical feature optimization process. These findings underscore the potential of machine learning models to improve predictive modeling of physical properties, to apply automated modeling and feature selection to new chemical datasets, and to offer explainable insights into the principles that drive solubility predictions.
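To make the idea of searching over features and hyperparameters together more concrete, the following is a minimal sketch and not the ANO method itself. It assumes a toy k-nearest-neighbour regressor in place of the deep neural network and plain random search in place of Bayesian optimization, and it scores each candidate by R² on a held-out split. All names (`r2_score`, `knn_predict`, `joint_search`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_X, train_y, x, k, cols):
    """Average the targets of the k nearest training rows,
    measuring distance only over the selected feature columns."""
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in cols)

    nearest = sorted(range(len(train_X)), key=lambda j: dist(train_X[j], x))[:k]
    return sum(train_y[j] for j in nearest) / k


def joint_search(train_X, train_y, val_X, val_y, n_features, n_trials=50, seed=0):
    """Random search over (feature subset, k); assumption: a stand-in
    for the Bayesian optimization named in the keywords."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        # Sample a non-empty feature subset and one hyperparameter jointly.
        size = rng.randint(1, n_features)
        cols = tuple(sorted(rng.sample(range(n_features), size)))
        k = rng.randint(1, 5)
        preds = [knn_predict(train_X, train_y, x, k, cols) for x in val_X]
        score = r2_score(val_y, preds)
        if best is None or score > best[0]:
            best = (score, cols, k)
    return best


# Synthetic data (illustrative only): the target depends on feature 0,
# while features 1 and 2 are uninformative.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
y = [2.0 * row[0] for row in X]

best_score, best_cols, best_k = joint_search(X[:40], y[:40], X[40:], y[40:], n_features=3)
print("best R^2:", best_score, "features:", best_cols, "k:", best_k)
```

Each trial samples the feature subset and the hyperparameter in the same draw, so the search can find combinations that a two-stage procedure (select features first, tune second) might miss. A Bayesian optimizer would replace the uniform sampling with proposals informed by earlier trials.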

Keywords

automated network optimizer (ANO)
hyperparameter optimization
Bayesian optimization
intrinsic solubility
feature importance analysis

Supplementary materials

The supplementary materials include essential additional information to support the main text, comprising one table, six figures, and an appendix that details the machine learning methodology employed in this study.