RandomNets Improve Neural Network Regression Performance via Implicit Ensembling

24 December 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Artificial feed-forward neural networks have long been recognized as powerful machine learning models and are widely used in QSAR and QSPR modeling of molecular properties. Inspired by Random Forest models and the robust techniques of sample and feature bagging, the RandomNets model was developed as an efficient, vectorized solution for ensemble creation, training, and inference. The model adds an extra dimension to the tensors passing through the neural network, combined with input feature masking and optional subsampling of the dataset during training. This vectorized approach improves efficiency and simplifies training and inference of the implicit ensemble. Training a 25-member implicit ensemble requires only twice the time of a comparable baseline network but significantly improves prediction performance, as measured by R² and MSE on test sets from 133 bioactivity datasets, with an average performance increase of around 25%. Compared to the conceptually similar input masking technique using dropout, the implicit ensemble demonstrates reduced sensitivity to hyperparameter choices, similar or improved performance, and a fourfold reduction in training time. Additionally, the implicit ensemble provides the standard deviation of individual predictions, which can help identify uncertain predictions.
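The mechanism described in the abstract (an extra ensemble dimension on the tensors, fixed input feature masks, and optional subsampling during training) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `ImplicitEnsembleSketch`, the helper functions, the single hidden layer, the `keep_prob` and `sample_keep_prob` parameters, and the choice to redraw the sample subset at every step are all assumptions made for illustration.

```python
import torch
from torch import nn


class ImplicitEnsembleSketch(nn.Module):
    """Hypothetical sketch: n_members small MLPs evaluated in one batched pass."""

    def __init__(self, n_features, n_hidden=32, n_members=25, keep_prob=0.5, seed=0):
        super().__init__()
        gen = torch.Generator().manual_seed(seed)
        # Fixed binary feature mask per member (feature bagging).
        # Shape: (members, 1, features), so it broadcasts over the batch.
        mask = (torch.rand(n_members, 1, n_features, generator=gen) < keep_prob).float()
        self.register_buffer("feature_mask", mask)
        # Every weight tensor carries a leading ensemble dimension.
        self.w1 = nn.Parameter(
            torch.randn(n_members, n_features, n_hidden, generator=gen) * n_features ** -0.5
        )
        self.b1 = nn.Parameter(torch.zeros(n_members, 1, n_hidden))
        self.w2 = nn.Parameter(
            torch.randn(n_members, n_hidden, 1, generator=gen) * n_hidden ** -0.5
        )
        self.b2 = nn.Parameter(torch.zeros(n_members, 1, 1))

    def forward(self, x):
        # x: (batch, features) -> (members, batch, features) after masking.
        h = x.unsqueeze(0) * self.feature_mask
        h = torch.relu(torch.bmm(h, self.w1) + self.b1)  # (members, batch, hidden)
        out = torch.bmm(h, self.w2) + self.b2            # (members, batch, 1)
        return out.squeeze(-1)                           # (members, batch)


def training_step(model, optimizer, x, y, sample_keep_prob=1.0):
    """One MSE step; optionally each member sees a random subset of the batch."""
    member_preds = model(x)                              # (members, batch)
    sq_err = (member_preds - y.unsqueeze(0)) ** 2
    # Assumed subsampling scheme: an independent keep/drop draw per member and sample.
    keep = (torch.rand_like(sq_err) < sample_keep_prob).float()
    loss = (sq_err * keep).sum() / keep.sum().clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def predict_with_uncertainty(model, x):
    """Ensemble mean and per-sample standard deviation across members."""
    model.eval()
    with torch.no_grad():
        member_preds = model(x)
    return member_preds.mean(dim=0), member_preds.std(dim=0)
```

Because all members share one batched matrix multiplication, the ensemble is trained and evaluated in a single forward pass rather than in a loop over separate networks. The standard deviation returned by `predict_with_uncertainty` corresponds to the spread of individual member predictions that the abstract suggests for flagging uncertain predictions.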

Keywords

QSAR
Ensemble
Neural Networks
PyTorch

Supplementary materials

Title: Supplementary plots and figures
Description: Supplementary Information for: RandomNets Improve Neural Network Regression Performance via Implicit Ensembling