From Sequence to Function: Case Studies in Structural and Functional Genomics

4 From Sequence to Function: Case Studies in Structural and Functional Genomics One of the main challenges facing biology is to assign biochemical and cellular functions to the thousands of hitherto uncharacterized gene products discovered by genome sequencing. This chapter discusses the strengths and limitations of the many experimental and computational methods, including those that use the vast amount of sequence information now available, to help determine protein structure and function. The chapter ends with two individual case studies that illustrate these methods in action, and show both their capabilities and the approaches that still must be developed to allow us to proceed from sequence to consequence. 4-0 Overview: From Sequence to Function in the Age of Genomics 4-1 Sequence Alignment and Comparison 4-2 Protein Profiling 4-3 Deriving Function from Sequence 4-4 Experimental Tools for Probing Protein Function 4-5 Divergent and Convergent Evolution 4-6 Structure from Sequence: Homology Modeling 4-7 Structure From Sequence: Profile-Based Threading and “Rosetta” 4-8 Deducing Function From Structure: Protein Superfamilies 4-9 Strategies for Identifying Binding Sites 4-10 Strategies for Identifying Catalytic Residues 4-11 TIM Barrels: One Structure with Diverse Functions 4-12 PLP Enzymes: Diverse Structures with One Function 4-13 Moonlighting: Proteins with More than One Function 4-14 Chameleon Sequences: One Sequence with More than One Fold 4-15 Prions, Amyloids and Serpins: Metastable Protein Folds 4-16 Functions for Uncharacterized Genes: Galactonate Dehydratase 4-17 Starting From Scratch: A Gene Product of Unknown Function 4-0 Overview: From Sequence to Function in the Age of Genomics Genomics is making an increasing contribution to the study of protein structure and function The relatively new discipline of genomics has great implications for the study of protein structure and function. The genome-sequencing programs are providing more amino-acid sequences of proteins of unknown function to analyze than ever before, and many computational and experimental tools are now available for comparing these sequences with those of proteins of known structure and function to search for clues to their roles in the cell or organism. Also underway are systematic efforts aimed at providing the three-dimensional structures, subcellular locations, interacting partners, and deletion phenotypes for all the gene products in several model organisms. These databases can also be searched for insights into the functions of these proteins and their corresponding proteins in other organisms. Sequence and structural comparison can usually give only limited information, however, and comprehensively characterizing the function of an uncharacterized protein in a cell or organism will always require additional experimental investigations on the purified protein in vitro as well as cell biological and mutational studies in vivo. Different experimental methods are required to define a protein’s function precisely at biochemical, cellular, and organismal levels in order to characterize it completely, as shown in Figure 4-1. In this chapter we first look at methods of comparing amino-acid sequences to determine their similarity and to search for related sequences in the sequence databases. Sequence comparison alone gives only limited information at present, and in most cases, other experimental and structural information is also important for indicating possible biochemical function and mechanism of action. We next provide a summary of some of the genome-driven experimental tools for probing function. We then describe computational methods that are being developed to deduce the protein fold of an uncharacterized protein from its sequence. The existence of large families of structurally related proteins with similar functions, at least at the biochemical level, is enabling sequence and structural motifs characteristic of various functions to be identified. Protein structures can also be screened for possible ligand-binding sites and catalytic active sites by both computational and experimental methods. As we see next, predicting a protein’s function from its structure alone is complicated by the fact that evolution has produced proteins with almost identical structures but different functions, proteins with quite different structures but the same function, and even multifunctional proteins which have more than one biochemical function and numerous cellular and physiological functions. We shall also see that some proteins can adopt more than one stable protein fold, a change which can sometimes lead to disease. The chapter ends with two case histories illustrating how a range of different approaches were combined to determine aspects of the functions of two uncharacterized proteins from the genome sequences of E. coli and yeast, respectively. Figure 4-1 Time and distance scales in functional genomics The various levels of function of proteins encompass an enormous range of time (scale on the left) and distance (scale on the right). Depending on the time and distance regime involved, different experimental approaches are required to probe function. Since many genes code for proteins that act in processes that cross multiple levels on this diagram (for example, a protein kinase may catalyze tyrosine phosphorylation at typical enzyme rates, but may also be required for cell division in embryonic development), no single experimental technique is adequate to dissect all their roles. In the age of genomics, interdisciplinary approaches are essential to determine the functions of gene products. Definitions genomics: the study of the DNA sequence and gene content of whole genomes. 130 Chapter 4 From Sequence to Function ©2004 New Science Press Ltd Overview: From Sequence to Function in the Age of Genomics 4-0 Time Process Example System Example Detection Methods Distance –15 10 electron photosynthetic optical sec 1 Å transfer reaction center spectroscopy proton triosephosphate fast 10–9 transfer isomerase kinetics sec 2–10 Å 10–6 fastest catalase, fumarase, kinetics sec enzyme reactions carbonic anhydrase typical trypsin, protein kinase A, kinetics, –3 10 enzyme reactions ketosteroid isomerase time-resolved X-ray, sec nuclear magnetic resonance slow cytochrome P450, kinetics, sec enzyme reactions/cycles phosphofructokinase low T X-ray, Å – nm nuclear magnetic resonance, mass spectroscopy min/ protein synthesis/ budding yeast cell light microscopy, nm – µm hour cell division genetics, optical probes µ day/ embryonic mouse embryo genetics, m – m year development microscopy, microarray analysis References Houry, W.A. et al.: Identification of in vivo substrates initiative on yeast proteins. J. Synchrotron. Radiat. of the chaperonin GroEL. Nature 1999, 402:147–154. 2003, 10:4–8. Brazhnik, P. et al.: Gene networks: how to put the function in genomics. Trends Biotechnol. 2002, 20:467–472. Koonin E.V. et al.: The structure of the protein universe Tefferi, A. et al.: Primer on medical genomics parts and genome evolution. Nature 2002, 420:218–223. I–IV. Mayo. Clin. Proc. 2002, 77:927–940. Chan, T.-F. et al.: A chemical genomics approach toward understanding the global functions of the O’Donovan, C. et al.: The human proteomics initiative Tong, A.H. et al.: Systematic genetic analysis with target of rapamycin protein (TOR). Proc.Natl Acad.Sci. (HPI). Trends Biotechnol. 2001, 19:178–181. ordered arrays of yeast deletion mutants. Science USA 2000, 97:13227–13232. 2001, 294:2364–2368. Oliver S.G.: Functional genomics: lessons from yeast. Guttmacher A.E. and Collins, F.S.: Genomic medicine— Philos.Trans. R. Soc. Lond. B. Biol. Sci. 2002, 357:17–23. von Mering, C. et al.: Comparative assessment of large- a primer. N. Engl. J. Med. 2002, 347:1512–1520. scale data sets of protein–protein interactions. Quevillon-Cheruel, S. et al.: A structural genomics Nature 2002, 417:399–403. ©2004 New Science Press Ltd From Sequence to Function Chapter 4 131 4-1 Sequence Alignment and Comparison S.c. Kss1 INNQNSGFSTLSDDHVQYFTYQILRALKSIHSAQVI H.s. Erk2 LKTQH-----LSNDHICYFLYQILRGLKYIHSANVL Sequence comparison provides a measure of the relationship between genes HRDIKPSNLLLNSNCDLKVCDFGLARCLASSSDSRET HRDLKPSNLLLNTTCDLKICDFGLARVA----DPDHD The comparison of one nucleotide or amino-acid sequence with another to find the degree Figure 4-2 Pairwise alignment Part of an of similarity between them is a key technique in present-day biology. A marked similarity alignment of the amino-acid sequences of the between two gene or protein sequences may reflect the fact that they are derived by evolution kinase domains from two ERK-like kinases of from the same ancestral sequence. Sequences related in this way are called homologous and the MAP kinase superfamily, Erk2 from humans the evolutionary similarity between them is known as homology. Unknown genes from and Kss1 from yeast. The region shown covers the kinase catalytic loop and part of the activation newly sequenced genomes can often be identified by searching for similar sequences in data- loop (see Figure 3-24). Identical residues bases of known gene and protein sequences using computer programs such as BLAST and highlighted in purple show the extensive FASTA. Sequences of the same protein from different species can also be compared in order similarity between these two homologous to deduce evolutionary relationships. Two genes that have evolved fairly recently from a kinases (their evolutionary relationship can be seen in Figure 4-5). To

Load more