Abstract
This paper addresses the challenges of cell line development (CLD), the lengthy and ambiguous clone-screening process in upstream biopharmaceutical production. Typically, only a small subset of the data from the later CLD stages is used to select lead clones manually. To address this, we introduce multivariate data analysis (MVDA) as an automated, data-driven approach that integrates CLD data from all scales and stages. Its aims are to identify criteria for earlier, more accurate, and more efficient selection of high-performing cell lines, and to provide detailed knowledge of the metabolic patterns prevalent in productive, stable cell lines. CLD is a multi-scale screening process that runs from micro-scale single-cell cloning (SSC) and well plates to minibioreactor (MBR) production runs for screening cell line stability. Using four historical CLD datasets covering hundreds of clonal CHO cell lines producing three unique target mAbs, the MVDA identified which early micro-scale CLD stages, and which criteria, have predictive potential for discarding more cell lines earlier in CLD. Using decision trees, we found that the SSC and the 6-well scale-up are the early-CLD stages most predictive of cell line performance in the production runs, with generalised thresholds of VCC_(6-well) > 38% and q_(p,Beacon) < 40% for safely deselecting poor-performing cell lines across various mAb targets. The MVDA also revealed the metabolic patterns typical of highly productive, stable cell lines: elevated oxidative metabolism, modified glutamine and ammonium metabolism, and reduced lactate-induced cell death. Including these metabolic parameters as selection criteria would be novel, and it demonstrates the enhanced information retrieval that the MVDA method offers.