PROSAC as a selection tool for SO-PLS regression: a strategy for multi-block data fusion

Jose Antonio Diaz-Olivares; Ryad Bendoula; Wouter Saeys; Maxime Ryckewaert; Ines Adriaens; Xinyue Fu; Matti Pastell; Jean-Michel Roger; Ben Aernouts

doi:10.26434/chemrxiv-2024-r57wp

Analytical Chemistry

Search within Analytical Chemistry

PROSAC as a selection tool for SO-PLS regression: a strategy for multi-block data fusion

28 February 2024, Version 1

Working Paper

Show author details

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Spectral data from multiple sources can be integrated into multi-block fusion chemometric models, such as sequentially orthogonalized partial-least squares (SO-PLS), to improve the prediction of sample quality features. Pre-processing techniques are often applied to mitigate extraneous variability, unrelated to the response variables. However, the selection of suitable pre-processing methods and identification of informative data blocks becomes increasingly complex and time-consuming when dealing with a large number of blocks. The problem addressed in this work is the efficient pre-processing, selection and ordering of data blocks for targeted applications in SO-PLS. We introduce the PROSAC-SO-PLS methodology, which employs pre-processing ensembles with response-oriented sequential alternation calibration (PROSAC). This approach identifies the best pre-processed data blocks and their sequential order for specific SO-PLS applications. The method uses a stepwise forward selection strategy, facilitated by the rapid Gram-Schmidt process, to prioritize blocks based on their effectiveness in minimizing prediction error, as indicated by the lowest prediction residuals. To validate the efficacy of our approach, we showcase the outcomes of three empirical near-infrared (NIR) datasets. Comparative analyses were performed against partial-least-squares (PLS) regressions on single-block pre-processed datasets and a methodology relying solely on PROSAC. The PROSAC-SO-PLS approach consistently outperformed these methods, yielding significantly lower prediction errors. This has been evidenced by a reduction in the root-mean-squared error of prediction (RMSEP) ranging from 5 to 25% across seven out of the eight response variables analyzed. The PROSAC-SO-PLS methodology offers a versatile and efficient technique for ensemble pre-processing in NIR data modeling. It enables the use of SO-PLS minimizing concerns about pre-processing sequence or block order and effectively manages a large number of data blocks. This innovation significantly streamlines the data pre-processing and model-building processes, enhancing the accuracy and efficiency of chemometric models.

Keywords

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here .

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Version History

Feb 28, 2024 Version 1

Metrics

452

258

Views

Downloads

Citations

License

The content is available under CC BY NC ND 4.0

DOI

10.26434/chemrxiv-2024-r57wp

Funding

Fonds Wetenschappelijk Onderzoek

1S76322N

Fonds Wetenschappelijk Onderzoek

V418621N

KU Leuven

C3/19/037

Ghent University

BOF - Ines Adriaens

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content

PROSAC as a selection tool for SO-PLS regression: a strategy for multi-block data fusion

Authors

Abstract

Keywords

Comments

Version History

Metrics

License

DOI

Funding

Author’s competing interest statement

Ethics

Share