Proteomics: Clinical and research applications in respiratory diseases
ABSTRACT
The proteome is the study of the protein content of a definable component of an organism in biology. However, the tissue-specific expression of proteins and the varied post-translational modifications, splice variants and protein–protein complexes that may form, make the study of protein a challenging yet vital tool in answering many of the unanswered questions in medicine and biology to date. Indeed, the spatial, temporal and functional composition of proteins in the human body has proven difficult to elucidate for many years. Given the effect of microRNA and epigenetic regulation on silencing and enhancing gene transcription, the study of protein arguably provides more accurate information on homeostasis and perturbation in health and disease. There have been significant advances in the field of proteomics in recent years, with new technologies and platforms available to the research community. In this review, we briefly discuss some of these new technologies and developments in the context of respiratory disease. We also discuss the types of data science approaches to analyses and interpretation of the large volumes of data generated in proteomic studies. We discuss the application of these technologies with regard to respiratory disease and highlight the potential for proteomics in generating major advances in the understanding of respiratory pathophysiology into the future.
Abbreviations
-
- AC
-
- adenocarcinoma
-
- BALF
-
- bronchoalveolar lavage fluid
-
- CCL5
-
- C-C Motif Chemokine Ligand 5
-
- DAVID
-
- Database for Annotation, Visualization and Integrated Discovery
-
- ELF
-
- epithelial lining fluid
-
- GO
-
- gene ontology
-
- ILD
-
- interstitial lung disease
-
- IPF
-
- idiopathic pulmonary fibrosis
-
- KEGG
-
- Kyoto Encyclopedia of Genes and Genomes
-
- LASSO
-
- least absolute shrinkage and selection operator
-
- LC
-
- liquid chromatography
-
- MALDI-TOF-MS
-
- matrix-assisted laser desorption/ionization-time of flight-MS
-
- MMP
-
- matrix metalloprotease
-
- MS
-
- mass spectrometry
-
- NSCLC
-
- non-small cell lung carcinoma
-
- PC
-
- principal component
-
- PCA
-
- PC analysis
-
- PLSDA
-
- partial least squares discriminant analysis
-
- PTM
-
- post-translational modification
-
- SCC
-
- squamous carcinoma
-
- SERPINA3
-
- Serpin Family A Member 3
-
- SOMAmer
-
- slow off-rate modified aptamer
-
- SSc
-
- scleroderma
-
- TGM2
-
- transglutaminase 2
-
- UPR
-
- unfolded protein response
-
- VIP
-
- variable importance in projection
INTRODUCTION
Proteomics is the study of ‘proteomes’ or the study and characterization of the protein composition of a cell, organ or other definable compartment of living organisms. Proteins are compounds of one or more long chains of amino acids and are vital parts of all living organisms. Amino acids are compounds composed of both a carboxyl (-COOH) and an amino (-NH2) group and form the building blocks of proteins. These proteins are generated from translation of mRNA and provide the principal information on how cells or organs function.1 The lung is a fascinating and complex arena for proteomic studies, with innate and adaptive immune systems, extracellular matrix/interstitium, resident and recruited leucocytes and an epithelial lining that is constantly exposed to the external environment. Pulmonary diseases remain a major contributor to global morbidity and mortality and there are many difficult questions that remain unanswered in pulmonary pathophysiology.2 Proteomics has the potential to address many of these shortcomings.
Protein is generated from translation of mRNA, yet flow of information from DNA to mRNA and then protein is confounded by epigenetic changes and microRNA which can work to alter, amplify or dampen these genetic signals.3, 4 The human genome consists of approximately 31 000 protein-coding genes,5 and remains largely unchanged throughout life. Therefore, study of DNA and mRNA sequences does not account for changing environmental influences. Nucleic acid studies provide data on the potential for organ and cellular pathobiology and risk of disease and perturbation.5, 6 However, the human proteome adds incredible complexity to the human genome. The tissue-specific expression of genes, translation of protein and subsequent splice variants, post-translational modifications (PTM) and protein–protein complexes/interactions7 and regulation of protein abundance largely at a translational level8 mean that interrogating the human proteome is arguably more challenging than interrogating the human genome (Fig. 1).

Historically, technologies available to quantify protein in biological matrices were limited, costly and cumbersome. There has been considerable recent progress. The human proteome has been tentatively mapped using an integrative omics’ approach (transcriptomics and antibody-based techniques) and represents a major step forward for proteomic research.9 Central to this has been the development of the human protein atlas, an invaluable research tool for protein localization and tissue expression, which includes a proteome map of normal human lung.10, 11 One of the major focuses of proteomic research to date has been identification of accurate disease biomarkers and targets for intervention.12 Proteomics arguably has the most potential of the ‘omics’ fields to provide new knowledge on disease pathogenesis, generate reliable biomarkers and facilitate discovery of new therapeutic strategies for human disease.
In this review, we discuss some of the recent advances in proteomic technology and describe current proteomic applications including mass spectrometry (MS) and aptamer approaches. We also detail several bioinformatic techniques and workflows to approach, analyse and interpret proteomic data. Finally, we highlight the application of proteomic technology to respiratory diseases and discuss some of the potential future uses of these technologies.
PROTEOMIC APPLICATIONS AND CHALLENGES
Herein, we provide a basic guide to some proteomic applications, namely MS and aptamer approaches. An important consideration is that proteomic platforms are constantly evolving, have mixed versatility, difficulty and technical challenges. For instance, the field includes diverse projects from cell organelle protein expression profiling to human blood biomarker identification. Certain platforms may be better suited to addressing different scientific questions over others. All proteomic approaches will not be covered here and the interested reader is directed to a comprehensive review of proteomic applications and MS elsewhere.13
Challenges in proteomic applications have been significant and have dampened enthusiasm for these platforms over the years. The spectrum of proteins that exist span a dynamic concentration range of at least 12 logs and this has hampered progress.14, 15 For instance, albumin, a large abundant protein in plasma is separated from the rarest measurable plasma proteins by 10 orders of magnitude.15 The complexity of proteins involving splice variant and PTM has also generated difficulties. The exact frequency of PTM is unclear, although the top 15 experimentally validated modifications represent the bulk of reported PTM.16 Common PTM are listed in Table 1. Moreover, splice variants add further complexity. Indeed, fibronectin, an important component of the pulmonary interstitium, has more than 20 known isoforms.17 Despite these hurdles, new advances have improved our technical abilities and these challenges have become less daunting.
Phosphoserine | 4-hydroxyproline |
Phosphothreonine | Pyrrolidone carboxylic acid |
N-linked glycosylation | N-acetylalanine |
N-6 acetyllysine | O-linked glycosylation |
Glycyl lysine isopeptide | Phosphotyrosine |
Citrullination |
Mass spectrometry
Significant advances in MS technology have accumulated in the last decade. These advances have improved the ability of these platforms to accurately measure thousands of proteins in a biological matrix.
Protein extraction from biological samples requires preformed knowledge of study design as different types of biological matrices and methods of extraction may induce bias and affect protein quantity and activity. For instance, blood within a tissue sample may give non-representative falsely elevated results for certain proteins. Lysis and digestion of a biological matrix can generate peptide mixtures which need some degree of fractionation or enrichment to be compatible with proteomic applications. Fractionation can be achieved based on charge, isoelectric point or hydrophobicity properties of peptides and is typically achieved using gel electrophoresis, affinity chromatography or isoelectric focusing.13 Specific subsets of peptides can be enriched by targeting PTM (e.g. phosphorylation and acetylation) using affinity resins or antibody immunoprecipitation. Liquid chromatography (LC) is then applied to the reduced samples for further separation and sample reduction. MS is the next crucial analytical step as information garnered is then used to identify varied proteins. In brief, as MS measures the mass to charge ratio of ions (m/z) in gas phase, peptides must be transferred into the gas phase and then ionized. Once ionized, peptide precursor ions are submitted to the mass spectrometer where the m/z ratio is measured. Single precursor ions are then selected and subjected to tandem MS to generate characteristic fragment ions. This combination of precursor m/z ratio and its fragment ions is then matched to known peptide sequences from curated protein databases for protein identification. There are multiple technologies and methods for peptide fractionation, enrichment, ionization and types of mass spectrometers commercially available.13 MS has been used widely in biomarker studies of respiratory disease to date including chronic obstructive pulmonary disease (COPD),18-20 acute respiratory distress syndrome (ARDS)21-23 and interstitial lung disease (ILD).24-26
Aptamer-based techniques
Aptamers are short single-stranded RNA or DNA oligonucleotides that bind specific parts of a target molecule with high affinity and specificity.27 Aptamer generation is less expensive and less arduous than antibody generation, and aptamers are not known to be toxic or immunogenic.28 In recent years, a new class of aptamer has been developed, termed slow off-rate modified aptamers (SOMAmer), which consist of single-stranded DNA-based molecular recognition elements.29, 30 They are fully synthetic and developed in vitro using libraries of randomized sequences through modifications of the systematic evolution of ligands by exponential enrichment (SELEX) process. The selected SOMAmers have distinct recognizable nucleotide sequences and act as protein-binding elements with defined shapes. The nucleotide sequences can be recognized by complementary hybridization probes. The assay takes advantage of the slow dissociation rate between SOMAmer and their cognate proteins. Non-cognate interactions between SOMAmer and protein will dissociate rapidly. The cognate SOMAmers are hybridized to complementary probes on a standard DNA microarray. The SOMAmer data quantitatively represent the protein concentration in the sampled matrix. This is achieved by converting the assay signal in relative fluorescent units to protein concentration.31 SOMAmers have been used to develop biomarker tools in several forms of respiratory disease including lung cancer,32-34 pulmonary tuberculosis35, 36 and idiopathic pulmonary fibrosis (IPF).37, 38
BIOINFORMATIC ANALYSIS OF PROTEOMIC DATA
New proteomic experimental technologies generate large volumes of data, but a major challenge lies in analysing these data to provide new biological insight. The fields of bioinformatics, computational biology and systems biology have developed techniques to facilitate curating, analysis and interpretation of ‘omics’ data with many of these approaches described as either data-driven or knowledge-based.39 Data-driven approaches rely only on protein data to identify proteins of interest in differentiating clinical or biological groups, and knowledge-based approaches rely on previously reported functions and pathways.
Data-driven analysis
The goal of data-driven analysis is to use proteomic data to discover new proteins that are associated with certain experimental or clinical groups, without employing prior knowledge of these proteins’ functions. One way to begin analysis of a new proteomic data set involves employing data-driven tools to enable visualization of the overall differences in protein expression data between clinical or biological groups. A common method used to visualize protein expression is a volcano plot, which displays information about each protein's fold change in expression across groups on the x-axis versus the significance of this change (determined by t-test or other statistical analysis) on the y-axis40 (Fig. 2A). As determining statistical significance in large proteomic data sets may involve performing many statistical tests, it is important to correct for multiple comparisons to control for the Type I error rate in order to reduce the number of false-positive findings. The Bonferroni41 and Benjamini–Hochberg42 correction are common tests used in order to control for this, and can also be displayed on the volcano plot (Fig. 2A).



Hierarchical clustering is another visualization technique that additionally highlights the presence of protein clusters that differentiate multiple groups of interest. The hierarchical clustering algorithm employs a distance metric (such as Pearson's correlation coefficient, Euclidean distance or others described by Jaskowiak et al.43) to cluster samples and proteins in terms of similarity. Identified clusters can then be displayed as dendrograms, with an associated heat map of colour intensity to display changes in expression of each protein across groups of interest (Fig. 2B).
Two other data-driven analytical approaches used to visualize differences between clinical or biological groups employ linear algebra: principal component (PC) analysis (PCA) and partial least squares discriminant analysis (PLSDA).44, 45 PCA and PLSDA algorithms identify weighted linear combinations (or ‘patterns’) of measured proteins that capture variance across the samples. Each sample can then be plotted on these key combinations (called latent variables (LV) in PLSDA and PC in PCA), generating an interpretable scores plot in which differences between groups of interest may be visualized. Although PCA and PLSDA create figures that can look similar, an important difference between them is that the PLSDA algorithm also receives information about patient groups and searches for variance that differentiates these groups, making it a ‘supervised’ approach (Fig. 2C,D). In contrast, PCA only evaluates overall variance (without information about groups), making it ‘unsupervised’.
Another approach for evaluating large proteomic data sets is correlation network analysis, which enables graphical visualization of significant correlations between protein pairs. Correlation networks are constructed by calculating Pearson's or Spearman's correlation coefficients between measured proteins. A map is then created indicating significant connections and the strength of each correlation (Fig. 2E). These graphs allow quick identification of highly connected proteins that may be network regulators, and how these interactions change across groups. Again, multiple comparison tests should be used to reduce Type I error.
In addition to visualization, data-driven approaches are also useful for eliminating proteins that are not relevant to a biological or clinical question of interest. In proteomic data sets, considerably large numbers of proteins may be unchanged between groups of interest, masking the important and differentially regulated proteins. In this case, quantitative feature selection techniques inherent to some data-driven approaches can be used to identify subsets, or ‘minimum signatures’, of proteins that best separate the groups of interest. Two such examples are the least absolute shrinkage and selection operator method (LASSO)46 and selection using variable importance in projection (VIP) scores in PLSDA.47
Knowledge-based analysis
Knowledge-based bioinformatic tools take advantage of prior knowledge to analyse proteomic data sets in the context of known protein function and ontology. These tools enable identification of biological pathways that are both enriched in the data set and known to be involved in specific functions and processes.
Knowledge-based analysis employs previously generated databases where proteins have been tagged with unique identifier labels. UniProt IDs48 are the most commonly used protein identifiers, although gene IDs (with gene identifiers given by Ensembl49) or the Enzyme Commission numbering system50 are also often used in proteomics. The choice in which identifier to use depends on the type of proteins that are being measured, and on which identifiers a given knowledge-based database will accept. Once proteins are linked with unique identifiers, prior knowledge databases with annotated information about biological functions and pathways can be employed to identify associated processes. One such tool is gene ontology (GO).51 GO terms, which standardize the naming of genes and gene products, are used to report the specific ‘biological processes’, ‘molecular functions’ and ‘cellular compartments’ annotations associated with measured genes and proteins. Similar to GO terms, the Kyoto Encyclopedia of Genes and Genomes (KEGG) links protein and gene names with their functions and chemical information.52 KEGG differs from GO in that it is more focused on known protein interactions. KEGG's mapped pathways include those describing metabolism, human disease, signal transduction and many others. Other pathway databases include Reactome,53 PANTHER pathways54 and WikiPathways.55
In addition to biological annotation, knowledge-based analysis can be used to identify functions or pathways that are significantly enriched in data sets of interest. This involves comparing how many times a certain pathway is included in the protein set of interest with how many times it appears in a reference (control) set of proteins or genes (such as that organism's genome). A P-value can be calculated and used to determine if the pathway is significantly enriched in the proteomic data set of interest. One such tool that both annotates proteins and performs functional enrichment analyses is the Database for Annotation, Visualization and Integrated Discovery (DAVID).56 DAVID's strength is that it performs enrichment analyses on multiple annotation types (such as GO terms and KEGG pathways) and displays the results in both charts and clustered heat maps. Other knowledge-based enrichment analyses and visualization tools include Cytoscape57 and its ClueGO plug in,58 EnrichNet,59 and the commercial Ingenuity Pathway Analysis (IPA),60 as well as others as described by Laukens et al.61
Combining data-driven and knowledge-based analysis techniques
Data-driven and knowledge-based proteomic analyses complement each other well when combined. In this process, data-driven tools can identify key minimum signatures of proteins that differentiate the groups of interest, with knowledge-based tools providing a deeper biological context for this smaller list of proteins. For example, a feature selection technique (LASSO or VIP scores) or a volcano plot can be used to narrow down the proteomic data set into a list of the proteins that vary between the groups of interest. These identified significant proteins can subsequently be labelled with UniProt IDs and input into DAVID to discover enriched pathways and biological processes, generating new hypotheses regarding mechanisms of action associated with disease.
PROTEOMIC APPLICATIONS IN RESPIRATORY DISEASE
Diseases of the respiratory system remain a major source of global morbidity and mortality.2 Proteomic discovery in lung is a rapidly evolving field, and currently much of the focus has been centred on the role of proteomics in lung cancer (Fig. 3).






Idiopathic pulmonary fibrosis
IPF is the most common form of ILD and is invariably fatal with a median survival of 2–3 years.62 IPF aetiology and pathogenesis are poorly understood.63 The disease results in aberrant accumulation of extracellular matrix within the interstitium of the lung, promoting impaired gas exchange and respiratory failure.37
However, recent studies have started to explore differences in protein expression and profiling in IPF patients. Comparative proteomic analysis of lung tissue samples derived from IPF patients and human donor transplant lungs using 2D gel electrophoresis and matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF)-MS demonstrated significant differences in protein expression.25 Fifty-one proteins were upregulated and 38 downregulated in IPF lung compared with normal. Proteins involved in unfolded protein response (UPR) were upregulated and immunohistochemistry confirmed induction of markers of UPR within type 2 pneumocytes. Furthermore, there was downregulation of antioxidants and structural epithelial proteins supporting epithelial cell injury as a key feature of IPF pathogenesis. The ability to differentiate between different types of ILD pathology using proteomic profiles would mark a major advancement in ILD management. Landi et al. employed bronchoalveolar lavage fluid (BALF) derived from IPF, sarcoidosis, Langerhans cell histiocytosis and scleroderma (SSc)-associated ILD patients to examine differentially expressed protein profiles.26 They reported novel findings supporting the regulation of ILD pathogenesis by factors in alternative complement activation, blood coagulation, protein folding and Slit-Robo signalling. The acquisition of BALF however may be challenging in chronic lung disease. Recent work by our group applied novel aptamer approaches to investigate the blood plasma proteome in IPF patients from the Correlating Outcomes With Biochemical Markers to Estimate Time-progression (COMET) study.38 SOMAmer were measured in IPF patients and then analysed to generate a panel of six plasma biomarkers to predict disease progression based on a composite disease progression index. IPF patients with high levels of inducible T-cell costimulatory (ICOS) and trypsin 3 (TRY3) and low levels of ficolin-2 (FCN2), cathepsin-S (Cath-S), legumain (LGMN) and soluble vascular endothelial growth factor receptor 2 (VEGFsR2) predicted poorer progression-free survival. We next examined the differential expression of plasma proteins in healthy volunteers and IPF patients. In this recent proof of concept study, we employed hierarchical clustering of statistically significant differentially expressed proteins in IPF patients and healthy volunteers, demonstrating visually distinct plasma proteomes between healthy volunteers and IPF patients37 (Fig. 4). This study highlights the potential use of proteomic profiles derived from easily accessible blood in the diagnostic workup of ILD patients. Foster et al. recently employed two different proteomic platforms to BALF from IPF patients and demonstrated the increased expression of osteopontin.64 This work importantly validated previous studies and results across quantitative proteomic platforms.65 Schiller et al. recently applied quantitative label-free MS to address common protein regulations across apparently heterogeneous lung fibrosis tissue from human patients (including IPF).66 They report a possible common regulator, MZB1 (Marginal Zone B and B1 Cell-Specific Protein) + plasma B cells, present at high prevalence in both fibrotic lung and skin tissue including IPF, hypersensitivity pneumonitis (HP), cryptogenic organizing pneumonia (COP), SSc-associated ILD and unclassifiable ILD.



Asthma
Asthma is a chronic inflammatory airway disorder characterized by variable airflow obstruction.67 The disease is associated with exposure to aeroallergens which leads to immunological changes within the airway epithelium. To date, there are several studies that have examined the role of proteomic technologies in both development of biomarkers and improved understanding of asthma pathogenesis.
Initial studies using high-performance liquid chromatography (HPLC) resulted in discovery of the chemokine CCL5 (RANTES (Regulated Upon Activation, Normally T-Expressed And Presumably Secreted)) as a BALF biomarker of allergic inflammation and eosinophilic activation in asthma patients.68 A further study of endobronchial biopsies in a small number of asthma patients and healthy volunteers using MS also identified CCL5 as a biomarker. These authors used pathway analysis to identify biologically important functional pathways including acute phase response, cell–cell signalling and tissue development in asthmatic airways compared with controls.69 Hamsten et al. also demonstrated alterations in CCL5 plasma protein levels with significantly lower levels reported in children with persistent asthma compared with controls.70 Wu et al. used LC–MS/MS of BALF samples after allergen challenge in asthma patients to describe the complex biological pathways activated in the lung.71 They found approximately 150 proteins that were upregulated in response to allergen exposure in BALF, and the upregulated proteins were associated with wide ranging functional pathways including proteolysis, inflammation, cell proliferation and signal transduction. Potentially interesting upregulated proteins included matrix metalloproteinase (MMP) 9 and Serpin Family A Member 3 (SERPINA3). MMP9 is a matrix metalloproteinase involved in lung remodelling that is generated in part by airway neutrophils.72 Proteomic studies of sputum samples from asthma patients have also been employed to study asthma pathobiology. Gharib et al. examined airway sputum samples from 10 patients and reported 17 target proteins including alpha 1-antichymotrypsin.73 Sputum samples are acquired by non-invasive means and therefore provide an advantage over other types of pulmonary sampling. Aptamer approaches have also been reported in studies of asthma. Loza et al. reported increases in serum C-reactive protein (CRP) and IgE and reductions in serum carbonic anhydrase 6 and osteomodulin in severe asthma.74
Chronic obstructive pulmonary disease
COPD is a common disease with global impact and high related morbidity and mortality. COPD is characterized by airflow obstruction that is poorly reversible.75 There is obstruction of small airways and destruction of distal alveolar structures resulting in air trapping, impaired gas exchange, cough, dyspnoea and sputum production.76 Proteomic approaches have been utilized in studies of COPD from BALF, tissue and blood for biomarker discovery. Nano-LC–MS techniques identified 76 differentially expressed proteins in BALF from COPD patients, and pathway analysis identified biological processes including inflammatory processes, glycolysis and oxidation reduction.77 Given the issues with dilution of epithelial lining fluid (ELF) on BALF acquisition, one investigative group obtained ELF directly from the airway using microprobes during bronchoscopy and then applied microfluidics-based nano-LC–MS/MS to identify and quantify proteins in the ELF of COPD patients. They identified elevated levels of lactotransferrin, high-mobility group protein B1 (HGMB1) and alpha-1 antichymotrypsin in ELF from COPD patients compared with healthy controls.78 Interestingly, alpha-1 antichymotrypsin encoded for by the SERPINA3 gene is reportedly elevated in the sputum of asthma patients, possibly reflecting a shared mechanism in chronic inflammatory airway disorders.71 Studies have also examined proteins in sputum to better understand COPD pathogenesis. Baraniuk et al. identified a higher abundance of mucin 5AC in sputum from COPD and healthy smokers. Patients with emphysema features had higher levels of defensins and protein components of neutrophil extracellular traps (NETS).79 Lee et al. employed MALDI-TOF-MS in tissue samples from COPD patients and healthy smokers.19 They reported significant upregulation of MMP13 mainly in alveolar macrophages and thioredoxin-like 2 (TXL2) in bronchial epithelium compared with healthy smokers. A further comprehensive study of tissue, plasma and sputum in COPD, IPF and alpha-1-antitrypsin deficiency patients identified the protein transglutaminase 2 (TGM2) as a COPD-specific protein.80 Tissue levels of TGM2 associated with disease severity, and sputum and plasma levels of TGM2 correlated with forced expiratory volume in 1 s (FEV1) % predicted values.
Lung cancer
The detection of lung cancer during the early phases of disease is crucial to providing optimal management strategies and potential cure, as it is often diagnosed at an advanced stage.81 Therefore, the discovery of accurate and reliable biomarkers is an important goal. Extensive use of proteomic research applications has occurred in the lung cancer field but the proposed biomarkers have yet to be adopted for clinical applications.82 Most lung cancer proteomic studies have been undertaken in diseased tissue samples; however, some studies have been carried out on serum, blood, BALF, pleural fluid and saliva.82
Wu et al. studied plasma samples from lung adenocarcinoma (AC) (non-small cell lung carcinoma, NSCLC) patients and age- and gender-matched healthy controls and reported nine candidate proteins that discriminated between cancer and health.83 These proteins included gelsolin (GSN), galectin-1 (LGALS1) and actin cytoplasmic 1 (ACTB). It may be possible to use blood proteomics to stratify risk of developing lung cancer. One study applied proteomics to plasma from never, current or former smokers and reported a significant association with plasma apolipoprotein E (APOE) levels and the development of squamous metaplasia in the lungs, supporting the potential to develop proteomic plasma biomarkers capable of predicting pre-malignant and early forms of lung cancer.84 However, lung tissue samples from cancer patients have received more extensive analysis. Numerous studies have analysed proteomic changes within lung tissue samples. Pernemalm et al. used isobaric tags for relative and absolute quantitation (ITRAQ)-based quantitative proteomics to compare lung cancer tissue samples associated with 2-year relapse and those without relapse. Using pathway analysis, they reported that tumours associated with relapse had a higher dependence on glycolysis and higher hypoxia-inducible factor (HIF) activity.85 Kikuchi et al. pooled samples of lung AC, squamous carcinoma (SCC) and control tissue and used shotgun proteomics to profile the lung tumour proteome. They found higher levels of Mapsin (SERPINAB5) in SCC tissue samples and identified, for the first time, dysregulation of the p21-activated kinases in NSCLC.86
TRANSLATING PROTEOMIC STUDIES
To date, the results of many proteomic studies in medicine have been centred on the development of reliable biomarkers for disease and outcomes. For instance, the largest aptamer study of plasma proteins to date was employed to risk stratify patients with cardiovascular disease.87 However, it is important to note that the study of proteins may have vast implications in medicine and science. Proteomic platforms may be used to generate hypothesis on disease pathophysiology, develop new therapies and novel strategies and assess for clinical efficacy and safety of new drugs.88-91 Given the array of proteomic tools available including ELISA, aptamer and MS platforms, an important goal for the field is an improved understanding of accuracy and reproducibility across proteome-specific platforms. These questions remain difficult to address without large-scale cross-platform studies in humans. We have previously shown significant correlations between protein measured by both ELISA and aptamer techniques within the same human cohort.38 However, this is an area where further study is required. The future of proteomics is exciting and likely to yield major advances in medicine. Recent work has shown how proteomics may be integrated with genomic data to demonstrate overlap between quantitative gene, protein and disease-associated loci, with evidence of causal links between specific proteins and disease.92 These advances may lead to accurate mapping in real-time of disease states, biological pathways and therapeutic targets.
CONCLUSION
The last two decades have ushered in a timely revolution in proteomics. New technologies and modifications of old ones are facilitating studies of thousands of proteins in biological samples, allowing for an ever improved understanding of protein expression, function and dynamics. Leveraging the power of proteomics to provide an accurate estimate of immediate health or disease remains an achievable and vital goal. The continued evolution and expansion of proteomic technologies such as aptamer approaches and the parallel development of bioinformatic tools and applications will facilitate this goal. While challenges remain, evolving proteomic applications and the era of integrating genomic and proteomic human data in disease and health will alter the current architecture of how we understand, diagnose and manage human disease in the lung and elsewhere in the body.
Acknowledgements
K.C.N. was supported by a Department of Education Graduate Assistance in Areas of National Need (GAANN) Fellowship awarded to the Biomedical Engineering Department at the University of Michigan (PR Award Number: P200A150170). D.N.O.D. was supported by NIH grant K99HL139996 and B.B.M. was supported by NIH grants AI117229 and HL127805.
The Authors
K.C.N. is a graduate student in the Department of Biomedical Engineering at the University of Michigan, Ann Arbor. She is interested in using computational approaches to gain systems-level insight into immunological diseases. Her current work uses data-driven analysis techniques to identify key cytokine and cellular relationships involved in IPF and COPD. K.B.A. is an Assistant Professor in the Department of Biomedical Engineering at the University of Michigan, Ann Arbor. Dr K.B.A. uses experimental and computational techniques to understand and uncover the immune cell communication networks present in the inflammatory environment of disease states. Specifically, she investigates diseases affecting mucosal immunology, such as HIV, IPF and COPD. B.B.M. is a Professor in the Department of Internal Medicine, Division of Pulmonary and Critical Care Medicine and the Department of Microbiology and Immunology at the University of Michigan. Dr B.B.M. has a long history of research in the field of pulmonary fibrosis and has published extensively in IPF. Her recent work involves the use of omics including microbiome and proteomic analysis to identify novel host and microbiota-related mechanisms involved in IPF disease progression. D.N.O.D. is an Assistant Professor of Internal Medicine, Division of Pulmonary and Critical Care Medicine at the University of Michigan, Ann Arbor. He has specific interest in IPF and other forms of ILD. His research interests include host innate immune and pulmonary interactions in chronic lung injury and the use of proteomic biomarkers in the study of IPF pathogenesis and progression.