Abstract
Benchmarks are crucial for driving progress in scientific disciplines. To be effective, a benchmark should closely mimic real-world tasks while remaining computationally efficient, so that it stays accessible and repeatable. Surrogate models that are practically indistinguishable from ground-truth observations within the bounds of the explored dataset dramatically reduce the computational burden of running benchmarks without sacrificing quality, but building them requires a large amount of initial data. In materials science and chemistry, relevant optimization tasks can be challenging due to their complexity: variables may be hierarchical, noisy, multi-fidelity, multi-objective, high-dimensional, and non-linearly correlated, and may mix numerical and categorical parameters subject to linear and non-linear constraints. Because simulating or experimentally verifying such tasks is difficult, benchmarks are essential. This study addressed these challenges by generating 173,219 quasi-random hyperparameter combinations across 23 hyperparameters and using them to train CrabNet on the Matbench experimental band gap dataset, at a total computational cost of 387 RTX 2080 Ti GPU-days. The results were stored in a free-tier shared MongoDB Atlas database, yielding a regression dataset that maps hyperparameter combinations to metrics such as MAE, RMSE, computational runtime, and model size for the CrabNet model trained on the Matbench experimental band gap benchmark task. To make the surrogate faithful to the underlying training runs, heteroskedastic noise was incorporated into the regression dataset, and bad hyperparameter combinations were excluded. Rather than assuming Gaussian noise as in traditional approaches, percentile ranks were computed within each group of identical parameter sets to capture the heteroskedastic noise. This approach can be applied to other benchmark datasets, bridging the gap between optimization benchmarks with low computational overhead and realistically complex, real-world optimization scenarios.
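To make the two key methodological steps above concrete, the minimal sketch below illustrates (i) drawing quasi-random hyperparameter combinations with a scrambled Sobol sequence via scipy.stats.qmc and (ii) computing percentile ranks within groups of identical parameter sets with pandas, as an empirical alternative to a Gaussian noise model. The dimensions, bounds, and column names (e.g., param_hash, mae) are hypothetical placeholders, not the actual schema of the published dataset.

```python
import pandas as pd
from scipy.stats import qmc

# (i) Quasi-random sampling: scrambled Sobol points in d dimensions,
# scaled to per-hyperparameter bounds (toy 3-D example; the study used 23).
sampler = qmc.Sobol(d=3, scramble=True, seed=0)
unit_points = sampler.random(n=8)  # 8 points in the unit hypercube [0, 1)^3
points = qmc.scale(unit_points, [1e-5, 16, 1], [1e-2, 512, 12])

# (ii) Heteroskedastic noise via within-group percentile ranks.
# Each row is one training run; rows sharing a param_hash were run with
# an identical hyperparameter combination (values are illustrative).
runs = pd.DataFrame({
    "param_hash": ["a", "a", "a", "b", "b", "b"],
    "mae":        [0.42, 0.45, 0.40, 0.38, 0.39, 0.51],
})

# Percentile rank of each repeat within its own group; no Gaussian
# assumption is made about the shape of the error distribution.
runs["mae_pct_rank"] = runs.groupby("param_hash")["mae"].rank(pct=True)
print(runs)
```

Because the ranks are computed per group, groups with widely scattered repeats and groups with tightly clustered repeats are each characterized by their own empirical distribution, which is what allows the noise to vary across the hyperparameter space.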
Supplementary weblinks
Title: Materials Science Optimization Benchmarks GitHub Repository
Description: A collection of benchmarking problems and datasets for testing the performance of advanced optimization algorithms in the fields of materials science and chemistry.

Title: Materials Science Optimization Benchmark Dataset for High-dimensional, Multi-objective, Multi-fidelity Optimization of CrabNet Hyperparameters
Description: Zenodo snapshot of the CrabNet hyperparameter dataset.