Abstract
Process Analytical Technologies (PAT) often rely on real-time spectroscopy, which enables fast process monitoring and control. However, for long-running processes, real-time spectroscopy can generate excessively large databases that are challenging to manage and do not necessarily lead to better process control. It is therefore crucial to reduce the amount of data generated by real-time spectroscopy while retaining the information essential for process control. This work explores various data reduction techniques to address this issue. IndPenSim, a simulated spectroscopic probing dataset, was used as an oracle model to study the impact of data reduction techniques on the identification of process control modes. The data analysis pipeline uses principal component analysis (PCA) for visualization, followed by truncation and pre-processing (e.g. baseline correction). We also discuss the impact of data size reduction techniques (e.g. spectral data column selection, data binning, and region of interest (ROI) selection) on different chemometric models (e.g. PCA, PLS-DA, SIMCA, and KNN). Finally, the study examines the impact of data reduction on the control strategy for a realistic industrial fed-batch penicillin simulator. The multi-class classification performance was analyzed, and the results were interpreted to determine the best approach for controlling the process. Overall, the study provides insights into data reduction techniques for real-time spectroscopy in PAT, which can improve the efficiency and accuracy of process control and monitoring.
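As a rough illustration of the three data size reduction techniques named in the abstract (ROI selection, column selection, and data binning), the following minimal Python sketch applies each to a toy spectrum. It is not the study's implementation: the function names, the toy wavenumber axis, and the parameter values are all hypothetical, and the study's actual settings are not stated here.

```python
from statistics import fmean


def select_roi(wavenumbers, spectrum, low, high):
    """ROI selection: keep only channels whose wavenumber lies in [low, high]."""
    kept = [(w, y) for w, y in zip(wavenumbers, spectrum) if low <= w <= high]
    return [w for w, _ in kept], [y for _, y in kept]


def select_every_nth(values, step):
    """Column selection: keep every `step`-th channel, starting with the first."""
    return values[::step]


def bin_spectrum(values, bin_width):
    """Binning: average consecutive groups of `bin_width` channels.

    A trailing partial group is averaged over the channels it contains.
    """
    return [
        fmean(values[i:i + bin_width])
        for i in range(0, len(values), bin_width)
    ]


# Toy data (hypothetical): 12 channels on a 400-510 axis with step 10.
wavenumbers = [400.0 + 10.0 * i for i in range(12)]
spectrum = [float(i) for i in range(12)]

roi_w, roi_y = select_roi(wavenumbers, spectrum, 420.0, 490.0)  # 8 channels
thinned = select_every_nth(roi_y, 2)                            # 4 channels
binned = bin_spectrum(roi_y, 4)                                 # 2 channels
```

Each step shrinks the number of columns per spectrum. Binning averages neighbouring channels rather than discarding them, which is one reason it may retain more information than plain column selection at the same reduction factor.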
Supplementary materials
Title
Supporting Information - Uncovering the Impact of Spectroscopic Data Reduction Techniques on the Process Control Mode Pattern Recognition: The Case of Industrial Penicillin Production
Description
The supporting information contains plots and models generated using specific data reduction techniques for matrices M4, M5, and M6. It includes PLS-DA and SIMCA plots, KNN prediction plots, and spectral information. Model performance was evaluated using specificity, selectivity, and precision values, and KNN model statistics were computed for data binning.