Uncovering the Impact of Spectroscopic Data Reduction Techniques on the Process Control Mode Pattern Recognition: The Case of Industrial Penicillin Production

09 February 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Process Analytical Technologies (PAT) often rely on real-time spectroscopy, which enables fast process control and monitoring. However, for long-running processes, real-time spectroscopy can generate excessively large databases that are challenging to manage and do not necessarily lead to better process control. It is therefore crucial to reduce the amount of spectroscopic data while retaining the information essential for process control. This work explores various data reduction techniques to address this issue. IndPenSim, a simulated spectroscopic probing dataset, was used as an oracle model to study the impact of data reduction techniques on the resulting process control identification. The data analysis pipeline consists of principal component analysis (PCA) for visualization, followed by truncation and pre-processing (e.g. baseline correction). We also discuss the impact of data size reduction techniques (e.g. spectral data column selection, data binning, and region of interest (ROI) selection) on different chemometric models (e.g. PCA, PLS-DA, SIMCA, and KNN). Finally, the study examines the impact of data reduction on the control strategy for a realistic industrial fed-batch penicillin simulator. The multi-class classification performance was analyzed, and the results were interpreted to determine the best approach for controlling the process. Overall, the study provides insights into data reduction techniques for real-time spectroscopy in PAT, which can improve the efficiency and accuracy of process control and monitoring.
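
As a rough illustration of the three data size reduction techniques named in the abstract (ROI selection, data binning, and spectral column selection), followed by PCA scores for visualization, a minimal Python sketch is given here. It is not the authors' pipeline: the function names, the spectral axis range, the bin width, the column step, and the random stand-in data are all assumptions made for illustration, and IndPenSim data is not loaded.

```python
import numpy as np


def select_roi(spectra, axis_values, low, high):
    """Keep only the columns whose spectral axis value lies in [low, high]."""
    mask = (axis_values >= low) & (axis_values <= high)
    return spectra[:, mask], axis_values[mask]


def bin_spectra(spectra, bin_width):
    """Average adjacent columns in groups of bin_width.

    Trailing columns that do not fill a complete bin are dropped.
    """
    n_rows, n_cols = spectra.shape
    n_bins = n_cols // bin_width
    trimmed = spectra[:, : n_bins * bin_width]
    return trimmed.reshape(n_rows, n_bins, bin_width).mean(axis=2)


def select_columns(spectra, step):
    """Keep every step-th spectral column."""
    return spectra[:, ::step]


def pca_scores(X, n_components=2):
    """Project mean-centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Hypothetical stand-in data: 50 spectra with 2200 channels each.
rng = np.random.default_rng(0)
axis_values = np.linspace(200.0, 2400.0, 2200)  # assumed spectral axis
spectra = rng.normal(size=(50, 2200))

roi, roi_axis = select_roi(spectra, axis_values, 800.0, 1800.0)
binned = bin_spectra(spectra, bin_width=10)
subsampled = select_columns(spectra, step=5)
scores = pca_scores(binned, n_components=2)

print(roi.shape, binned.shape, subsampled.shape, scores.shape)
```

Each reduction shrinks the number of spectral columns while leaving the number of observations unchanged, so the same downstream model (PCA here; PLS-DA, SIMCA, or KNN in the study) can be fitted to each reduced matrix and compared.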

Keywords

PLSDA
PCA
SIMCA
machine learning
Raman spectrum analysis
chemometrics
penicillin
Industrial data set

Supplementary materials

Supporting Information - Uncovering the Impact of Spectroscopic Data Reduction Techniques on the Process Control Mode Pattern Recognition: The Case of Industrial Penicillin Production
The supporting information contains plots and models generated using specific data reduction techniques for matrices M4, M5, and M6. The methods resulted in PLSDA and SIMCA plots, KNN prediction plots, and spectral information. Model performance was evaluated using specificity, selectivity, and precision values, and KNN model statistics were computed for data binning.
