
The Pharmacogenomics Journal (2001) 1, 15–22 © 2001 Nature Publishing Group. All rights reserved 1470-269X/01 $15.00 www.nature.com/tpj PERSPECTIVES

Large-scale proteomics and its future impact on medicine

GL Corthals1 and PS Nelson2

1Mass Spectrometry & Protein Analysis Laboratory, Garvan Institute of Medical Research, Sydney, NSW, Australia; 2Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

In the last 25 years we have gone from studying single genes to entire genomes. It is remarkable that we already have the genome sequences of 599 viruses and viroids, 205 naturally occurring plasmids, 185 organelles, 31 eubacteria, seven archaea, one fungus, one plant, and two mammals. Due to the public availability of the human genome, at least 30 disease genes have been positionally cloned.1 There have indeed been many impressive advances in the fields of genomics,1–5 but along with these feats the HGP has also catalysed the need for post-genomic technologies. The post-genome era is spawning […] and proteomics is set to play a major role in defining biological systems at a molecular level. Here we look at large-scale proteomics in the post-genome era and reflect on its future impact on the study of biological systems in medical research.

The field of proteomics transcends genomics as it aims to convert raw gene sequence data, and measurement of […], to information describing the actions of those proteins controlling biological systems. The public International Human Genome Sequencing Consortium estimates that there are 31 000 protein-encoding genes in the human genome, of which they can now provide a list of 22 000; Celera finds about 26 000. Thus only about 1.1–1.4% of the genome actually encodes protein.1 This number of coding genes in the human sequence compares with 6000 for a yeast cell, 13 000 for a fly, 18 000 for a worm and 26 000 for a plant. Furthermore, only 94 of 1278 protein families in our genome appear to be specific to vertebrates and these cover elementary cellular functions such as […] or protein expression (ie transcription of DNA into RNA, and translation of RNA into protein). It is remarkable that the biggest difference between humans, worms or flies is the complexity of our […]: more domains (modules) per protein and attendant novel combinations resulting from alternate assemblies of these modules.6 Two groups have attempted to order this complexity by developing computational methods

Figure 1 Functional annotation by computation that ideally interfaces with experimental proteomics. (a) The ‘Rosetta Stone’ approach described by Marcotte et al,6 identifies putative interactions between gene products. The existence of fused genes in some (for example, genes A and B in organism n) but not all organisms implies that proteins encoded by separated genes interact, either physically or functionally. (b) Two unrelated pathways consisting of several individual proteins and protein complexes are shown. The sequence and structural similarities of the proteins are indicated by a common shape; the functional links are indicated by a common colour. Some proteins have several links because they are part of the same pathway as well as the same assembly. Common function is discovered based on the phylogenetic profile,6 domain-fusion analysis6,7 and expression correlation.6 From Sali13 and Doolittle.14
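The domain-fusion reasoning in Figure 1(a) can be illustrated with a minimal sketch. This is not the published algorithm of Marcotte et al or Enright et al: it assumes that each gene has already been reduced to a tuple of domain labels, and the function name `rosetta_stone_links`, the data layout and the organism names are hypothetical choices made only for illustration. Two domains are reported as putatively linked when they occur fused in one gene in at least one organism and as separate genes in at least one other.

```python
from itertools import combinations


def rosetta_stone_links(genomes):
    """Toy domain-fusion analysis (illustrative assumption, not the published method).

    genomes maps an organism name to a list of genes; each gene is a tuple
    of domain labels, e.g. ("A", "B") for a gene in which A and B are fused.
    """
    # Step 1: record every pair of domains found fused within a single gene.
    fused = {}
    for organism, genes in genomes.items():
        for gene in genes:
            for a, b in combinations(sorted(set(gene)), 2):
                fused.setdefault((a, b), set()).add(organism)

    # Step 2: keep a pair only if some organism carries both domains
    # in separate genes, as in the figure.
    links = {}
    for (a, b), fusion_organisms in fused.items():
        separated = set()
        for organism, genes in genomes.items():
            a_alone = any(a in g and b not in g for g in genes)
            b_alone = any(b in g and a not in g for g in genes)
            if a_alone and b_alone:
                separated.add(organism)
        if separated:
            links[(a, b)] = {
                "fused_in": fusion_organisms,
                "separate_in": separated,
            }
    return links


# Hypothetical input mirroring the figure: A and B are separate genes in
# organism_1 but fused in organism_n; C is never fused with anything.
example_genomes = {
    "organism_1": [("A",), ("B",), ("C",)],
    "organism_n": [("A", "B"), ("C",)],
}
links = rosetta_stone_links(example_genomes)
print(links)
```

In this toy input only the pair (A, B) is reported. A real analysis would start from sequence alignments rather than pre-assigned domain labels and would need to filter out promiscuous domains that fuse with many partners.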

Figure 2 Comprehensive large-scale analysis of biological systems: combined analysis of transcriptomic, proteomic (including 2-DE,16 and ICAT11) and yeast two-hybrid data (y2h data from Schwikowski et al15) with software mining tools to display functions and linkages.

that associate proteins that are functionally linked through a metabolic pathway or structural complex.6,7 Importantly, these approaches do not rely on direct sequence similarity, rather they group proteins that are part of the same pathway or assembly (Figure 1) and define them as being ‘functionally linked’. While these, and other software tools will continue to play a pivotal role in our understanding and representation of biological systems, critical information needed to construct useful models of pathways and assemblies is only gained through experimental proteomics data. The size and complexity of the task can be appreciated by assuming between one and ten functional links per protein that results in 6000–60 000 links for a single yeast cell and a rapid temporal and spatial propagation of interactions in protein cascades and networks. The need to assign function, characterize this molecular interplay and ultimately define cellular and organismal properties in the context of a biological system will require a combination of computational methods and high-throughput proteomics.

The concept of genome-wide protein profiling is 20 years old, but has become a reality only recently through


Table 1 Protein analysis technologies and their applications in large-scale proteomics. While enzymatic activity, receptor-ligand, cytokine and biosensor assays, etc, are integral to proteomics, they have not been included in the table because of their relatively low screening speed

current sequence databases and technically impressive advances with the integration of powerful analytical tools, ranging from two-dimensional electrophoresis (2-DE) combined with mass spectrometry (MS), yeast two-hybrid system, to protein chips containing protein and […] libraries, coupled with high-throughput functional screening assays. Currently, however, there is much investment in measurement of mRNA expression. It is highly unlikely that there exists a simple unidirectional or linear relationship between the transcriptome and proteome: the two data sets are distinctly different and both have idiosyncratic control and regulation over biological effects. We see at least three reasons to perform both measurements. Firstly, temporal and spatial expression of proteins is not apparent from gene expression analysis. Secondly, post-translational modifications (PTMs), static and dynamic, are rich in variety (more than 200 documented8) and typically not apparent from primary sequence data. Furthermore, even with the knowledge of a consensus sequence there are currently no general rules describing when and where […], or other PTMs, takes place and how this functionally affects a biological system. Thirdly, and perhaps most importantly, inducible protein–ligand interactions cannot be measured by transcriptomics. A comprehensive approach to measuring expression profiles and interactions of biological systems is therefore warranted (Figure 2). Both transcriptomics and proteomics must be used for our measurements, and we must now equally invest in proteome-directed research. In the immediate future these large-scale molecular screening methods will contribute in areas such as accelerated and accurate diagnosis of patients and further classification of diseases such as tumours so that timely treatment definition or selection can be realised.

In asking how proteomics will impact on medicine, we should start by asking what it can enable now. A description of technologies (Table 1) and their applications reveals that we already have a huge variety of technologies allowing the global analysis of systems. Central to most screening techniques lies the mass spectrometer as it enables outstanding speed and accuracy in protein identification.9 Large-scale proteomics is and will be used for three distinct applications: Expression proteomics, Cell map proteomics10 and Modular proteomics.

Expression proteomics is for the qualitative and quantitative display of protein expression profiles of tissue extracts or cells. This approach generates 2-D arrays of proteins (2-D gels)


and via differential pattern display one can pin-point differences between normal vs perturbed states. Promising new technologies such as isotope coded affinity tags11 that may complement, or by-pass, 2-D gel technology will also provide a means for global expression analysis. The expression proteomics approach is highly adaptable for analysis and categorisation of drug influence and disease state as well as the influence of biological stimuli, identification of body fluids and tissue biopsies, examining PTMs and discovery of new disease markers. It holds particular promise for the pharma/biotech industry in the fields of disease-marker discovery, toxicology and drug-target validation.

Cell map proteomics aims at measuring the subcellular protein content by the purification of organelles and their protein complexes followed by mass-spectrometric identification of the components. Systematic identification of subcellular protein content allows a refined display and categorisation of the cellular processes based on location. Applications are similar to those pointed out above.

Modular proteomics aims to rationalise how biological systems are built up. To fully understand biological systems we believe that protein function cannot easily be predicted from studying singular components. Instead cellular functions are carried out by ‘modules’ made up of many species of interacting molecules. These functional modules exist as a critical level of biological organisation.12 Proteomics must aim to generate data in the form of discrete functional modules based on interacting proteins. Accordingly proteomics should attempt to validate functional modules by classifying and cataloguing those proteins that contribute to the functional process in question. The data measurements will be used in conjunction with the sophisticated algorithms described above,6,7 which in turn will lead to the elucidation of common biological ‘network architectures’. Higher-level properties of cells, such as the ability to integrate information from multiple sources, will be described by a pattern of connections amongst the functional modules. General (affinity-based) approaches that can simplify protein mixtures and link related protein–protein and protein–DNA interactions are set to make massive inroads for high-throughput modular protein analysis.

What are the potential impacts of proteomics on medicine? The potentials are huge, but these must be tempered by the current practical technical issues. Ultimately, protein profiles should be able to provide qualitative and quantitative read-outs of patient serum proteins that reflect a given disease state or a response to disease treatment. Comprehensive qualitative and quantitative protein analyses may replace current diagnostic laboratory technologies. Proteomics may provide prognostic data that reflect tumour behaviour in a fashion that incorporates both intrinsic tumour characteristics as well as host-response factors. On a more immediate front, proteomics will identify common pathways of drug action that indicate side-effect profiles, synergistic and antagonistic interactions, drug-resistance, and drug efficacy before entering into expensive and time-dependent clinical trials.

Ahead lies a road of limitations and opportunities. Despite the power of proteomic techniques several limitations remain both in theory and in practice. For instance, in humans, proteins have a dynamic range of expression of more than six orders of magnitude, rendering simultaneous analysis of all system components extremely difficult. There is as yet no known amplification of proteins that is generally applicable. The analysis of most PTMs is essentially not solved but needs to be addressed so that we can more expansively analyse biological systems. The differential solubility of proteins poses a problem for all areas of proteomics. These limitations of the technology are actively being addressed.

New methods enabling the simplification and quantitation of large protein mixtures are currently being developed,11 as well as exciting developments in […].6,7 Initially proteomics will be used for drug development; where combined with current genomic screening techniques, such as differential display-PCR, cDNA microarrays, and serial analysis of gene expression (SAGE), it should make drugs better, safer and more efficacious. In the long term, we anticipate the ability to dissect biological systems through robust extraction procedures where one can study functional modules. This requires looking at only a small subset of proteins from a complete mixture, and herein lies one challenge: to screen a particular constellation of proteins that are critical to a biological process. As David Baltimore recently wrote: ‘Celera’s achievement of producing a draft sequence in only a year of data-gathering is a testament to what can be realized today with the new capillary sequencers, sufficient computing power and the faith of investors.’ With a little ‘faith’ the timing looks right for proteomics to make its impact.

DUALITY OF INTEREST
None declared.

Correspondence should be sent to Garry L Corthals, Mass Spectrometry & Protein Analysis Laboratory, Garvan Institute of Medical Research, 384 Victoria St, Sydney, NSW 2010, Australia.
Tel: +61 2 9295 8193
Fax: +61 2 9295 8194
E-mail: g.corthals@garvan.org.au

REFERENCES
1 International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409: 860–921.
2 Liang P, Pardee AB. Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 1992; 257: 967–971.
3 Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C, Hwang SY et al. Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci USA 1997; 94: 13057–13062.
4 DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 1996; 14: 457–460.
5 Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science 1995; 270: 484–487.
6 Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D. A combined algorithm for genome-wide prediction of protein function. Nature 1999; 402: 83–86.
7 Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA. Protein interaction maps for complete genomes based on gene fusion events. Nature 1999; 402: 86–90.
8 Krishna RG, Wold F. Post-translational modification of proteins. Adv Enzymol Relat Areas Mol Biol 1993; 67: 265–298.
9 Corthals GL, Gygi SP, Aebersold R, Patterson SD. Identification of proteins by mass spectrometry. In: Rabilloud T (ed). Proteome Research: 2D Gel Electrophoresis and Detection Methods. Springer: New York, 1999, pp 197–231.
10 Blackstock WP, Weir MP. Proteomics: quantitative and physical mapping of cellular proteins. Trends Biotechnol 1999; 17: 121–127.
11 Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 1999; 17: 994–999.
12 Hartwell LH, Hopfield JJ, Leibler S, Murray AW. From molecular to modular cell biology. Nature 1999; 402 (Suppl): C47–C52.
13 Sali A. Functional links between proteins. Nature 1999; 402: 23, 25–26.
14 Doolittle RF. Do you dig my groove? Nat Genet 1999; 23: 6–8.
15 Schwikowski B, Uetz P, Fields S. A network of protein-protein interactions in yeast. Nat Biotechnol 2000; 18: 1257–1261.
16 Corthals GL, Wasinger VC, Hochstrasser DF, Sanchez J-C. The dynamic range of protein expression: a challenge for proteomic research. Electrophoresis 2000; 21: 1104–1115.

Pharmacogenomics, ethnicity, and susceptibility genes

DW Nebert1,2 and AG Menon2,3

1Department of Environmental Health; 2Center for Environmental Genetics, University of Cincinnati Medical Center, Cincinnati, OH; 3Department of Molecular Genetics, Biochemistry & Microbiology, University of Cincinnati Medical Center, Cincinnati, OH, USA

INTRODUCTION
Pharmacogenetic differences can be 10- to more than 40-fold between individuals within an ‘ethnic’ group, while the mean variation between ethnic groups is rarely more than 2- to 3-fold. This observation is the hallmark of complex (or, multifactorial) traits, ie contribution of multiple genes and environmental factors on these genes. The concept now emerging is that the total number of alleles encoding all the features related to ‘categorization of race or ethnic group’ (skin color, hair color and texture, facial traits) will probably be quite small (eg in the dozens), compared with the number of alleles that encode interindividual differences even within the same ethnic group; the same is predicted to be true for alleles that govern variation in blood pressure, ability to taste, capacity to metabolize drugs, etc. Because of considerable genetic admixture in most human populations, federal governments must rethink their mandates that minorities and ethnic groups be included for political reasons as ‘distinct groups’ in each and every clinical study. The idea of grouping by ethnicity certainly has its advantages in studies of traits that are predominantly expressed in specific populations, and may not be an unwarranted starting point in such population-based studies. The most successful approach to search for allelic effects is therefore likely to require judicious choice of study populations, based on the knowledge of the trait being studied: in some cases a relatively unadmixed population, in other cases a diverse, more admixed population. In designing pharmacogenomic studies, it is therefore imperative for the clinical researcher to be aware of and appreciate the richness and diversity of alleles that exist (but to very differing degrees) in each racial and ethnic group, rather than to study such groups with the idea that they are genetically ‘pure.’

Adverse drug reactions, due in large part to interindividual variability in drug response, rank between the fourth and sixth leading cause of death in the US.1 Pharmacogenetics and pharmacogenomics represent ‘the studies of variability in drug response due to heredity.’ Pharmacogenetic advances, which have exploded during the past decade due to our rapidly increasing knowledge of the human genome, should help in the substantial reduction of drug-caused morbidity and mortality. In a recent listing of many dozens of pharmacogenetic differences,2 the majority of the diversity occurs in drug metabolism genes, with some present in drug receptor and transporter genes, and many others not yet explained on a molecular basis. In principle, a pharmacogenetic disorder might reflect allelic differences in any ‘susceptibility gene.’ If one considers that a drug (or other environmental chemical, metabolite, or heavy metal) may theoretically act as an agonist or antagonist (or activator or inhibitor) of virtually any gene product (target) in the cell, then it is likely that most, if not every, gene in the human genome might be considered either directly or indirectly to be a susceptibility gene.3 Pharmacogenetic differences are often 10-fold to more than 40-fold between the highest and lowest individual in any given population,2 yet the mean difference between any two ethnic populations is rarely greater than 2- or 3-fold.4 The fact that interindividual variance is much greater than group variation is a hallmark of complex traits and is the subject of this invited ‘Perspectives’ article.

RACE AND ETHNICITY
The term ‘race’ is derived from the Latin generatio (n) and generare (v), meaning ‘generation’ and ‘to engender’, respectively. Webster’s Dictionary defines race as ‘any of the different varieties of mankind, mainly the Caucasoid, Mongoloid, and Negroid groups, distinguished by color of skin and type of hair.’ Also, ‘any geographical, national, or tribal ethnic grouping.’ Some anthropologists regard Arabs, Jews, Latinos and Spaniards as ‘distinct races’, whereas most would prefer to call these ‘ethnic groups.’ Most regard the origin of human races similar to that of animal or plant spec-
