Large-Scale Proteomics and Its Future Impact on Medicine
The Pharmacogenomics Journal (2001) 1, 15–22
© 2001 Nature Publishing Group All rights reserved 1470-269X/01 $15.00
www.nature.com/tpj

PERSPECTIVES

GL Corthals1 and PS Nelson2

1Mass Spectrometry & Protein Analysis Laboratory, Garvan Institute of Medical Research, Sydney, NSW, Australia; 2Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

In the last 25 years we have gone from studying single genes to entire genomes. It's amazing that we already have the genome sequences of 599 viruses and viroids, 205 naturally occurring plasmids, 185 organelles, 31 eubacteria, seven archaea, one fungus, one plant, and two mammals. Due to the public availability of the human genome, at least 30 disease genes have been positionally cloned.1 There have indeed been many impressive advances in the fields of genomics,1–5 but along with these feats the HGP has also catalysed the need for post-genomic technologies. The post-genomics era is spawning and proteomics is set to play a major role in defining biological systems at a molecular level. Here we look at large-scale proteomics in the post-genome era and reflect on its future impact on the study of biological systems in medical research.

The field of proteomics transcends genomics as it aims to convert raw gene sequence data, and measurement of gene expression, to information describing the actions of those proteins controlling biological systems. The public International Human Genome Sequencing Consortium estimates that there are 31 000 protein-encoding genes in the human genome, of which they can now provide a list of 22 000; Celera finds about 26 000. Thus only about 1.1–1.4% of the genome actually encodes protein.1 This number of coding genes in the human sequence compares with 6000 for a yeast cell, 13 000 for a fly, 18 000 for a worm and 26 000 for a plant. Furthermore, only 94 of 1278 protein families in our genome appear to be specific to vertebrates and these cover elementary cellular functions such as metabolism or protein expression (ie transcription of DNA into RNA, and translation of RNA into protein). It is remarkable that the biggest difference between humans, worms or flies is the complexity of our proteins: more domains (modules) per protein and attendant novel combinations resulting from alternate assemblies of these modules.6 Two groups have attempted to order this complexity by developing computational methods that associate proteins that are functionally linked through a metabolic pathway or structural complex.6,7 Importantly, these approaches do not rely on direct sequence similarity; rather, they group proteins that are part of the same pathway or assembly (Figure 1) and define them as being 'functionally linked'. While these and other software tools will continue to play a pivotal role in our understanding and representation of biological systems, critical information needed to construct useful models of pathways and assemblies is only gained through experimental proteomics data. The size and complexity of the task can be appreciated by assuming between one and ten functional links per protein, which results in 6000–60 000 links for a single yeast cell and a rapid temporal and spatial propagation of interactions in protein cascades and networks. The need to assign function, characterize this molecular interplay and ultimately define cellular and organismal properties in the context of a biological system will require a combination of computational methods and high-throughput proteomics.

Figure 1 Functional annotation by computation that ideally interfaces with experimental proteomics. (a) The 'Rosetta Stone' approach described by Marcotte et al,6 identifies putative interactions between gene products. The existence of fused genes in some (for example, genes A and B in organism n) but not all organisms implies that proteins encoded by separated genes interact, either physically or functionally. (b) Two unrelated pathways consisting of several individual proteins and protein complexes are shown. The sequence and structural similarities of the proteins are indicated by a common shape; the functional links are indicated by a common colour. Some proteins have several links because they are part of the same pathway as well as the same assembly. Common function is discovered based on the phylogenetic profile,6 domain-fusion analysis6,7 and expression correlation.6 From Sali13 and Doolittle.14

The concept of genome-wide protein profiling is 20 years old, but has become a reality only recently through current sequence databases and technically impressive advances with the integration of powerful analytical tools, ranging from two-dimensional electrophoresis (2-DE) combined with mass spectrometry (MS), yeast two-hybrid system, to protein chips containing protein and peptide libraries, coupled with high-throughput functional screening assays. Currently, however, there is much investment in measurement of mRNA expression. It is highly unlikely that there exists a simple unidirectional or linear relationship between the transcriptome and proteome: the two data sets are distinctly different and both have idiosyncratic control and regulation over biological effects. We see at least three reasons to perform both measurements. Firstly, temporal and spatial expression of proteins is not apparent from gene expression analysis. Secondly, post-translational modifications (PTMs), static and dynamic, are rich in variety (more than 200 documented8) and typically not apparent from primary sequence data. Furthermore, even with the knowledge of a consensus sequence there are currently no general rules describing when and where phosphorylation, or other PTMs, takes place and how this functionally affects a biological system. Thirdly, and perhaps most importantly, inducible protein–ligand interactions cannot be measured by transcriptomics. A comprehensive approach to measuring expression profiles and interactions of biological systems is therefore warranted (Figure 2). Both transcriptomics and proteomics must be used for our measurements, and we must now equally invest in proteome-directed research. In the immediate future these large-scale molecular screening methods will contribute in areas such as accelerated and accurate diagnosis of patients and further classification of diseases such as tumours so that timely treatment definition or selection can be realised.

Figure 2 Comprehensive large-scale analysis of biological systems: combined analysis of transcriptomic, proteomic (including 2-DE,16 and ICAT11) and yeast two-hybrid data (y2h data from Schwikowski et al15) with software mining tools to display functions and linkages.

Table 1 Protein analysis technologies and their applications in large-scale proteomics. While enzymatic activity, receptor ligand, cytokine assays, biosensor, etc are integral to proteomics, they have not been included in the table due to their relative low-speed screening

In asking how proteomics will impact on medicine, we should start by asking what it can enable now. A description of technologies (Table 1) and their applications reveals that we already have a huge variety of technologies allowing the global analysis of systems. Central to most screening techniques lies the mass spectrometer as it enables outstanding speed and accuracy in protein identification.9 Large-scale proteomics is and will be used for three distinct applications: Expression proteomics, Cell map proteomics10 and Modular proteomics.

Expression proteomics is for the qualitative and quantitative display of protein expression profiles of tissue extracts or cells. This approach generates 2-D arrays of proteins (2-D gels) and via differential pattern display one can pin-point differences between normal vs perturbed states. Promising new technologies such as isotope coded affinity tags11 that may complement, or by-pass, 2-D gel technology will also provide a means for global expression analysis. The expression proteomics approach is highly adaptable for analysis and categorisation of drug influence and disease state as well as the influence of biological stimuli, identification of body fluids and tissue biopsies, examining PTMs and discovery of new disease markers. It [...]

[...] amongst the functional modules. General (affinity-based) approaches that can simplify protein mixtures and link related protein–protein and protein–DNA interactions are set to make massive inroads for high-throughput modular protein analysis.

What are the potential impacts of proteomics on medicine? The potentials are huge, but these must be tempered by the current practical technical issues. Ultimately, protein profiles should be able to provide qualitative and quantitative read-outs of patient serum [...]

[...] genomic screening techniques, such as differential display-PCR, cDNA microarrays, and serial analysis of gene expression (SAGE), it should make drugs better, safer and more efficacious. In the long term, we anticipate the ability to dissect biological systems through robust extraction procedures where one can study functional modules. This requires looking at only a small subset of proteins from a complete mixture, and herein lies one challenge: to screen a particular constellation of proteins that are critical [...]