Abstract
We have developed a multivariate approach for rapid exploration of differential protein profiles acquired from distinct tissue regions. Spatially targeted proteomics is a technology for analyzing the proteome of specific cell types and functional regions within tissue. Spatial context is often essential to understanding biological processes, but interpreting complex protein profiles (e.g., of key tissue subregions) can be challenging because of the high dimensionality of the data. We applied our approach to a published spatially targeted proteomics dataset collected from Staphylococcus aureus-infected murine kidney at 4 and 10 days post-infection. Rather than relying on the traditional univariate, targeted approach of tracking individual proteins, the workflow rapidly filters complex biological data to identify the most relevant species among hundreds to thousands of measured molecules. We used principal component analysis (PCA) for dimensionality reduction and for grouping correlated and anticorrelated proteins among regions and timepoints previously measured by mass spectrometry using micro-liquid extraction surface analysis (microLESA). Subsequently, k-means clustering of the PCA-processed data was used to group samples in an unsupervised manner. Interpretation of the resulting cluster centers revealed a subset of the detected proteins that differentiate among spatial regions of infection over the two timepoints. These proteins are involved in glycolysis and the tricarboxylic acid (TCA) cycle, calcium-dependent processes, and cytoskeletal organization. Gene ontology analysis of the protein subsets in each cluster uncovered patterns related to tissue damage and repair, as well as to calcium-related defense mechanisms, during staphylococcal infection.
By applying this analysis in an infectious disease case study, we observed differential proteomic changes across abscess regions over time, reflecting the dynamic nature of host-pathogen interactions.
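The filtering workflow described above (PCA, then k-means on the principal component scores, then reading proteins off the cluster centers) can be sketched in Python, the language of the accompanying notebooks. This is a minimal sketch under stated assumptions, not the authors' code: the samples-by-proteins matrix is centered but not scaled, two components are kept, k is chosen by the mean silhouette score (the supplementary information mentions silhouette-score determination, but the exact procedure here is assumed), and each cluster center is mapped back to protein space through the PCA loadings. The data are synthetic and the protein names P0 to P5 are hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Center the samples-by-proteins matrix X and project it onto its leading
    principal components (via SVD). Returns (scores, loadings)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]             # (n_components, n_proteins)
    return Xc @ loadings.T, loadings         # scores: (n_samples, n_components)


def kmeans(Z, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Next seed: the sample farthest from every seed chosen so far.
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def mean_silhouette(Z, labels):
    """Mean of s(i) = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance
    to the rest of i's cluster and b_i is the smallest mean distance to any other
    cluster. Samples in singleton clusters contribute 0."""
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(Z))
    for i in range(len(Z)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)   # D[i, i] = 0 is not counted
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


def top_proteins(center, loadings, names, n=2):
    """Map a cluster center from PC space back to (centered) protein space and
    return the n proteins with the largest absolute deviation from the mean."""
    profile = center @ loadings              # (n_proteins,)
    order = np.argsort(-np.abs(profile))[:n]
    return [(names[i], round(float(profile[i]), 2)) for i in order]


# Synthetic stand-in for a samples-by-proteins intensity matrix (NOT study data):
# 12 samples, 6 hypothetical proteins; two sample groups differ in P0 and P1.
rng = np.random.default_rng(1)
names = [f"P{i}" for i in range(6)]
X = rng.normal(0.0, 0.1, size=(12, 6))
X[:6, 0] += 3.0
X[6:, 1] += 3.0

scores, loadings = pca(X, n_components=2)
sil = {k: mean_silhouette(scores, kmeans(scores, k)[0]) for k in range(2, 6)}
best_k = max(sil, key=sil.get)
labels, centers = kmeans(scores, best_k)
for j, c in enumerate(centers):
    print(f"cluster {j}: {top_proteins(c, loadings, names)}")
```

In this synthetic example the two proteins reported for each cluster carry opposite signs, which is how anticorrelated proteins appear when a center is projected back through the loadings. NumPy is used only to keep the sketch dependency-light; `sklearn.decomposition.PCA`, `sklearn.cluster.KMeans`, and `sklearn.metrics.silhouette_score` are better-tested building blocks for the same steps.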
Supplementary materials
Title
Spatial Proteomics Supplementary Information
Description
Supplemental figures describing missing values per dataset, silhouette score determination, and results obtained with the full zero-filled dataset.
Title
Spatial Proteomics Code for Zero-Filled Dataset
Description
Jupyter notebook with code analyzing the zero-filled dataset
Title
Spatial Proteomics Code for Dataset with No Imputation
Description
Jupyter notebook with code analyzing the dataset with no imputation
Supplementary weblinks
Title
Spatial Proteomics Code
Description
GitHub repository containing code for analyzing both the non-imputed and zero-filled datasets, comma-separated values (CSV) files listing protein IDs per protein class for each of the two datasets, and a text file containing protein group data.