19 Comparative Genomics

Ross C. Hardison

R.C. Hardison
Center for Comparative Genomics and Bioinformatics, Huck Institutes of Life Sciences, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, PA 16802, USA
e-mail: [email protected]

M.R. Speicher et al. (eds.), Vogel and Motulsky's Human Genetics: Problems and Approaches, DOI 10.1007/978-3-540-37654-5_19, © Springer-Verlag Berlin Heidelberg 2010

Abstract

Comparative genomics harnesses the power of sequence comparisons within and between species to deduce not only evolutionary history but also insights into the function, if any, of particular DNA sequences. Changes in DNA and protein sequences are subject to three evolutionary processes: drift, which allows some neutral changes to accumulate, negative selection, which removes deleterious changes, or positive selection, which acts on adaptive changes to increase their frequency in a population. Quantitative data from comparative genomics can be used to infer the type of evolutionary force that likely has been operating on a particular sequence, thereby predicting whether it is functional. These predictions are good but imperfect; their primary role is to provide useful hypotheses for further experimental tests of function. Rates of evolutionary change vary both between functional categories of sequences and regionally within genomes. Even within a functional category (e.g. protein or gene regulatory region) the rates vary. A more complete understanding of variation in the patterns and rates of evolution should improve the predictive accuracy of comparative genomics. Proteins that show signatures of adaptive evolution tend to fall into the major functional categories of reproduction, chemosensation, immune response and xenobiotic metabolism. DNA sequences that appear to be under the strongest evolutionary constraint are not fully understood, although many of them are active as transcriptional enhancers. Human sequences that regulate gene expression tend to be conserved among placental mammals, but the phylogenetic depth of conservation of individual regulatory regions ranges from primate-specific to pan-vertebrate.

Contents

19.1 Goals, Impact, and Basic Approaches of Comparative Genomics
  19.1.1 How Biological Sequences Change Over Time
  19.1.2 Purifying Selection
  19.1.3 Models of Neutral DNA
  19.1.4 Adaptive Evolution
19.2 Alignments of Biological Sequences and Their Interpretation
  19.2.1 Global and Local Alignments
  19.2.2 Aligning Protein Sequences
  19.2.3 Aligning Large Genome Sequences
19.3 Assessment of Conserved Function from Alignments
  19.3.1 Phylogenetic Depth of Alignments
  19.3.2 Portion of the Human Genome Under Constraint
  19.3.3 Identifying Specific Sequences Under Constraint
19.4 Evolution Within Protein-Coding Genes
  19.4.1 Comparative Genomics in Gene Finding
  19.4.2 Sets of Related Genes
  19.4.3 Rates of Sequence Change in Different Parts of Genes
  19.4.4 Evolution and Function in Protein-Coding Exons
  19.4.5 Fast-Changing Genes That Code for Proteins
  19.4.6 Recent Adaptive Selection in Humans
  19.4.7 Human Disease-Related Genes
19.5 Evolution in Regions That Do Not Code for Proteins or mRNA
  19.5.1 Ultraconserved Elements
  19.5.2 Evolution Within Noncoding Genes
  19.5.3 Evolution and Function in Gene Regulatory Sequences
  19.5.4 Prediction and Tests of Gene Regulatory Sequences
19.6 Resources for Comparative Genomics
  19.6.1 Genome Browsers and Data Marts
  19.6.2 Genome Analysis Workspaces
19.7 Concluding Remarks
References

19.1 Goals, Impact, and Basic Approaches of Comparative Genomics

Comparative genomics uses evolutionary theory to glean insights into the function of genomic DNA sequences. By comparing DNA and protein sequences between species or among populations within a species, we can estimate the rates at which various sequences have evolved and infer chromosomal rearrangements, duplications and deletions. This evolutionary reconstruction can then be used to predict functional properties of the DNA. Sequences that are needed for functions common to the species being compared are expected to change little over evolutionary time, whereas sequences that confer an adaptive advantage when altered are expected to have greater divergence between species. Furthermore, sequence comparisons can help in predicting what role is played by a particular functional region, e.g., coding for a protein or regulating the level of expression of a gene.

These insights from comparative genomics are having a strong impact on medical genetics, and their role is expected to become more pervasive in the future. When profound mutant phenotypes lead to the discovery of genes in model organisms (bacteria, yeast, flies, etc.), the human genome is immediately searched for homologs, which frequently are discovered to be involved in similar processes. Control of the cell cycle [76] and defects in DNA repair associated with cancers [24, 47] are particularly famous examples. In studies of the noncoding regions of the human genome, conservation has become almost a proxy for function [20, 26, 64], and we will explore the power and limitations of this approach more in this chapter. The mapping and genotyping of millions of polymorphisms in humans [32] coupled with the availability of genome sequences of species closely related to humans [14, 70] has stimulated great interest in discovering genes and control sequences that are adaptive in humans, which may provide clues to the genetic elements that make us uniquely human (see Chaps. 8 and 16). As more and more loci are implicated in disease and susceptibility to diseases, identifying strong candidates for the causative mutations becomes more challenging. Research in comparative genomics is helping to meet this challenge by generating estimates across the human genome of sequences likely to be conserved for functions common to many species as well as sequences showing signs of adaptive change. Finding disease-associated markers in either type of sequence could rapidly narrow the search for mutations that cause a phenotype.

19.1.1 How Biological Sequences Change Over Time

All DNA sequences are subject to change, and these changes provide the fuel for evolution. Replication is highly accurate but not perfect, and despite the correction of many replication errors by repair processes during S-phase, a small fraction is retained as altered sequences. Mutagens in the environment can damage DNA, and some of these induced mutations escape repair. In addition, DNA bases can change spontaneously, for example, oxidative deamination of cytosine to produce uracil. The mutation rate is the number of sequence changes escaping correction and repair that accumulate per unit of time. The average mutation rate in humans has been estimated to be about 2 changes in 10^8 sites per generation [43, 57]. Thus for a diploid genome of 6 × 10^9 bp, about 120 new mutations arise in each generation. As will be discussed later in more detail, the mutation rate varies among loci and depends on the context, with transitions at CpG dinucleotides occurring about ten times more frequently than other mutations.

Mutations can be substitutions of one nucleotide for another, deletions of strings of nucleotides, insertions of nucleotides, or rearrangements of chromosomes, including duplications of DNA segments. Substitutions are about ten times as frequent as the length-changing alterations, with transitions greatly favored over transversions.

population). However, many of the new mutations will have no effect on the individual; we call these mutations with no functional consequence polymorphisms or neutral changes. The frequency of these polymorphisms will increase or decrease depending on the results of matings and survival of progeny. The vast majority will be transitory in the population, with most headed for loss. However, the stochastic fluctuations in allele frequencies will allow some to eventually increase to a high frequency. Thus, some of the neutral changes lead to fixed differences. In fact, Kimura [40] and others have argued that such neutral changes are the major contributors to the overall evolution of the genome.

In order for a sequence change to have an effect on an organism, the change has to occur in a region that