RandomNets Improve Neural Network Regression Performance via Implicit Ensembling

Esben Jannik Bjerrum

doi:10.26434/chemrxiv-2024-dsh9t

Theoretical and Computational Chemistry

24 December 2024, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Artificial feed-forward neural networks have long been recognized as powerful machine learning models and are widely used in QSAR and QSPR modeling of molecular properties. Inspired by Random Forest models and the robust techniques of sample and feature bagging, the RandomNets model was developed as an efficient, vectorized solution for ensemble creation, training, and inference. The model adds an extra dimension to the tensors passing through the neural network, combined with input feature masking and optional subsampling of the dataset during training. This vectorized approach improves efficiency and simplifies training and inference of the implicit ensemble. Training a 25-member implicit ensemble requires only twice the time of a comparable baseline network but significantly improves prediction performance, as measured by R² and MSE on test sets from 133 bioactivity datasets, with an average performance increase of around 25%. Compared to the conceptually similar input masking technique using dropout, the implicit ensemble demonstrates reduced sensitivity to hyperparameter choices, similar or improved performance, and a fourfold reduction in training time. Additionally, the implicit ensemble provides the standard deviation of individual predictions, which can help identify uncertain predictions.
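The mechanism described in the abstract (an extra ensemble dimension on the tensors, a fixed input feature mask per member, optional subsampling of training samples, and a per-prediction standard deviation) can be illustrated with a minimal NumPy sketch. This is not the author's implementation, which is available in the linked GitHub repository; every name, layer size, and fraction below (`keep_fraction`, `sample_fraction`, `forward`, `masked_mse`) is a hypothetical choice made for illustration, and the sketch covers only the forward pass and a masked loss, not training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; none of these values come from the paper.
n_samples, n_features, n_hidden, n_members = 8, 16, 32, 25
keep_fraction = 0.5    # assumed fraction of input features each member sees
sample_fraction = 0.8  # assumed fraction of training samples each member sees

# One fixed binary feature mask per ensemble member (feature bagging).
masks = (rng.random((n_members, n_features)) < keep_fraction).astype(np.float64)

# Per-member weights, stacked along a leading ensemble axis.
W1 = rng.normal(0.0, 0.1, (n_members, n_features, n_hidden))
b1 = np.zeros((n_members, 1, n_hidden))
W2 = rng.normal(0.0, 0.1, (n_members, n_hidden, 1))
b2 = np.zeros((n_members, 1, 1))


def forward(X):
    """Return per-member predictions with shape (n_members, n_samples)."""
    # Broadcast the batch to (n_members, n_samples, n_features) and mask it.
    Xm = X[None, :, :] * masks[:, None, :]
    # Batched matrix products run all members in one vectorized call.
    H = np.maximum(Xm @ W1 + b1, 0.0)  # ReLU hidden layer
    return (H @ W2 + b2)[..., 0]


def masked_mse(preds, y, sample_mask):
    """MSE where each member is scored only on its own sample subset."""
    sq_err = (preds - y[None, :]) ** 2 * sample_mask
    return sq_err.sum() / max(sample_mask.sum(), 1)


X = rng.random((n_samples, n_features))
y = rng.random(n_samples)

member_preds = forward(X)            # (n_members, n_samples)
y_mean = member_preds.mean(axis=0)   # ensemble prediction per sample
y_std = member_preds.std(axis=0)     # spread across members per sample

# Optional sample bagging: each member is scored on a random subset of samples.
sample_mask = rng.random((n_members, n_samples)) < sample_fraction
loss = masked_mse(member_preds, y, sample_mask)
```

Because all members share one batched matrix product, adding members increases the tensor size rather than the number of separate forward passes, which is the sense in which the ensemble is implicit. The `y_std` array corresponds to the per-prediction standard deviation that the abstract suggests as an indicator of uncertain predictions.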

Supplementary materials

Title: Supplementary plots and figures

Description: Supplementary Information for: RandomNets Improve Neural Network Regression Performance via Implicit Ensembling

Supplementary weblinks

Title: GitHub Repository

Description: Source code for the RandomNets model

Metrics

Views: 553

Downloads: 201

License

The content is available under the CC BY-NC-ND 4.0 license.

Author’s competing interest statement

The author is a scientist and consultant with both personal and professional interests in promoting their work. While the source code for this model has been released under an open-source LGPL license, making it freely available for use and testing, there is an inherent tension between contributing unpaid work and sustaining a livelihood. Nevertheless, commercial support for the model is available through Cheminformania Consulting. The author acknowledges this balance between academic contribution and economic interests and remains committed to furthering scientific progress through open access and collaboration.

Ethics

The author has declared that ethics committee/IRB approval is not relevant to this content.
