An Overview of Current Protein-Profiling Technologies
UNIT 13.1
Proteomics and the Analysis of Proteomic Data: An Overview of Current Protein-Profiling Technologies

Contributed by Erol E. Gulcicek, Christopher M. Colangelo, Walter McMurray, Kathryn Stone, Terence Wu, Hongyu Zhao, Heidi Spratt, Alexander Kurosky, Baolin Wu, and Kenneth Williams
Current Protocols in Bioinformatics (2005) 13.1.1-13.1.31, Supplement 10
Copyright © 2005 by John Wiley & Sons, Inc.

In recent years, several proteomic methodologies have been developed that now make it possible to identify, characterize, and comparatively quantify the relative level of expression of hundreds of proteins that are coexpressed in a given cell type or tissue, or that are found in biological fluids such as serum. These advances have resulted from the integration of diverse scientific disciplines including molecular and cellular biology, protein/peptide chemistry, bioinformatics, analytical and bioanalytical chemistry, and the use of instrumental and software tools such as multidimensional electrophoretic and chromatographic separations and mass spectrometry. In this unit, some of the common protein profiling technologies are reviewed, along with the accompanying data analysis tools that are available to help interpret the resulting data. A summary of abbreviations used is provided in Table 13.1.1.

Table 13.1.1 List of Commonly Used Abbreviations

Abbreviation   Definition
dB             Database
2DGE           Two-dimensional gel electrophoresis
CF             Chromatofocusing
DIGE           Differential (fluorescence) gel electrophoresis
ESI            Electrospray ionization
FFE            Free-flow electrophoresis
FTMS           Fourier transform mass spectrometer
HPLC           High-performance liquid chromatography
HT             High throughput
ICAT           Isotope-coded affinity tag
IEF            Isoelectric focusing
IMAC           Immobilized metal-affinity chromatography
IT             Ion trap
iTRAQ          Applied Biosystems trademark name for multiplexed isobaric tagging technology for relative and absolute quantitation
LC             Liquid chromatography
LIMS           Laboratory Information Management Systems
MALDI          Matrix-assisted laser desorption/ionization
MS             Mass spectrometry
MS/MS          Tandem mass spectrometry
MudPIT         Multidimensional protein identification technology
NPS            Nonporous silica
PhIAT          Phosphoprotein isotope-coded affinity tag
PMF            Peptide mass fingerprint
PPV            Positive predictive value
PTM            Post-translational modifications
Q and S        Preparative anion (Q) and cation (S) exchange protein chromatography
QTOF           Quadrupole time-of-flight
RP             Reversed-phase
SBEAMS         Systems Biology Experiment Analysis System
SCX            Strong cation-exchange chromatography
SDS-PAGE       Sodium dodecyl sulfate polyacrylamide gel electrophoresis
SEC            Size-exclusion chromatography
SELDI          Surface-enhanced laser desorption ionization
SILAC          Stable isotope labeling by amino acids in cell culture
SPDBC          Simultaneous Peak Detection and Baseline Correction
SPF            Simple peak finding
TOF            Time-of-flight
YPED           Yale Protein Expression Database

One of the most fundamental approaches to understanding the functions of individual proteins in complex cellular processes is to correlate protein expression levels with biological changes, e.g., differentiation, growth conditions, cell-cycle stage, disease state, or an external stimulus (Fig. 13.1.1). Although DNA microarray analysis offers a massively parallel approach to genome-wide mRNA expression analysis, there is often no direct relationship between the in vivo concentration of an mRNA and its encoded protein. Differential rates of translation of mRNAs into protein and differential rates of protein degradation in vivo are two examples of factors that may confound the extrapolation of mRNA to protein expression profiles. Gygi et al. (1999a) estimated that the correlation between protein and mRNA abundance for yeast is only 0.4. They found yeast genes with similar mRNA levels that had protein levels that differed by 20-fold. Conversely, they found invariant, steady-state levels of proteins which had mRNA levels that varied by 30-fold, similar to the >10-fold range observed by Futcher et al. (1999). A more recent study found that protein concentrations in yeast can vary by >100-fold for a given mRNA concentration (Greenbaum et al., 2003). Protein expression analysis thus offers a potentially large advantage in that it measures the level of the biological effector protein molecule. Moreover, microarray analysis cannot detect, identify, or quantify post-translational protein modifications, which often play a key role in modulating protein function. Additionally, microarray analysis is not suitable for monitoring the most complex human proteome, the serum/plasma proteome. Because cells release proteins into the bloodstream, the serum/plasma proteome provides a unique and readily available resource to monitor changes occurring throughout the human body. However, the 10^10 range in plasma protein concentrations (i.e., from 0 to 5 pg/ml for interleukin 6 up to 35 to 50 mg/ml for albumin) and the potential occurrence of an estimated 10 million or more immunoglobulin sequences make elucidation of the plasma proteome a daunting challenge (Anderson and Anderson, 2002).

Figure 13.1.1 Increasing complexity from the genome to the proteome (adapted from National Heart, Lung, and Blood Institute; courtesy, Susan Old and Tom Kodadek).

Recent advances in technology, instrumentation, molecular biology, and bioinformatics have made it possible to begin to analyze entire units of cellular components, such as the genome, transcriptome, and more recently, the proteome. These advances provide the opportunity to begin to monitor changes in human tissue proteomes that are associated with differentiation, apoptosis, disease, and other important biological modifiers. The ultimate goal of proteomics is to comprehensively identify all proteins, their associated biological activities, post-translational modifications, and protein-protein interactions occurring in a given cell, and determine how this "proteome" is altered in response to a modifier. Two of the factors that contribute to the enormity of the challenge of proteomics and the very modest progress to date are the 100-fold increased level of complexity of the proteome as compared to the genome (Fig. 13.1.1) and the estimated 10^10 dynamic range of protein concentrations.

Despite major technological improvements, advances in understanding of the human proteome so far have been modest. As one quantitative example, the annual rate of FDA-approved plasma protein–based clinical diagnostic assays has actually declined over the last 10 years (Fig. 13.1.2). This is clearly at odds with popular expectations that advances in "genomics and proteomics are transforming the clinical landscape through diagnostic application of knowledge on large numbers of new proteins" (Anderson and Anderson, 2002). The authors of this unit believe that one of the major reasons for the slow rate of progress in the development of protein diagnostics is that the simple test paradigm often used in current practice (i.e., that the change in concentration of a single protein will be able to serve as a marker for the unambiguous diagnosis of a disease) does not adequately account for biological diversity and the pleiotropic causes and effects of many diseases.

Figure 13.1.2 Declining rate of introduction of new plasma protein analytes in FDA-approved clinical tests (adapted from Anderson and Anderson, 2002, with permission from the American Society for Biochemistry and Molecular Biology).

Figure 13.1.3 Block diagram of commonly used protein profiling workflows in mass spectrometry. The workflow is divided into three basic categories and generally flows from proteins to peptides, and to LC and MS strategies. Examples of individual techniques within the categories are described in blocks, and possible workflow combinations are connected with arrows. Refer to Table 13.1.1 for a list of abbreviations.

In this regard, it is worthwhile to recall the successful DNA microarray research carried out on predicting the outcome of breast cancer. When comparing mRNA expression profiles from biopsies of 98 primary breast cancers that either had or had not metastasized within 5 years of diagnosis, Van't Veer et al. (2002) found that 5,000 of 25,000 genes interrogated were differentially regulated (i.e., with at least a two-fold difference and a p value of less than 0.01 in more than five tumors) between these two groups. Using a supervised classification methodology, they identified a gene expression signature strongly predictive of a short interval to distant metastases. They found that a classification system based on 70 genes outperformed all clinical variables in predicting the likelihood of distant metastases within five years. The odds ratio for metastases among tumors with a gene signature associated with a poor prognosis, as compared with those having a signature associated with a good prognosis, was ∼15 using a cross-validation procedure. These results suggest that, while a fundamental biological change like cancer is likely to alter the relative level of expression of thousands of proteins in a given tissue type, only a small subset of these changes will be sufficiently robust to be predictive in large numbers of patients. Furthermore, these data suggest that proteomic technologies capable of analyzing large
can be separated into three