Positive Selection in Transcription Factor Genes

POSITIVE SELECTION IN TRANSCRIPTION FACTOR GENES ALONG THE HUMAN LINEAGE
by
GABRIELLE CELESTE NICKEL

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Thesis Adviser: Dr. Mark D. Adams

Department of Genetics
CASE WESTERN RESERVE UNIVERSITY
January 2009

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of Gabrielle Nickel, candidate for the Ph.D. degree*.

Helen Salz (chair of the committee)
Mark Adams
Radhika Atit
Peter Harte
Joe Nadeau

(date) August 28, 2008

*We also certify that written approval has been obtained for any proprietary material contained therein.
TABLE OF CONTENTS

Table of Contents
List of Tables
List of Figures
List of Abbreviations
Glossary
Abstract
Chapter 1: Introduction and Background
    Origin of Modern Humans
        Human-chimpanzee morphological divergence
        Human-chimpanzee comparative genomics
    Human Molecular Evolution
        Adaptive protein evolution
        Changes in gene regulatory sequence
        “Less-is-More” hypothesis of gene loss
        Gene duplication in human evolution
        Epigenetic regulation of gene expression
    Gene Expression Differences
        Human-chimpanzee expression differences
        Selective pressures on gene expression
    Positive Selection
        Detecting positive selection through population genetic analysis
        Detecting positive selection by phylogenetic analysis
    Transcription Factors
    Conclusion
Chapter 2: An empirical test for branch-specific positive selection
    Abstract
    Introduction
    Materials and Methods
        Selection of genes and DNA sequencing
        Phylogenetic analysis
        Simulations testing the performance of codeml
        Empirical tests of positive selection using sequences simulated under a model of neutral evolution
    Results
        Sequencing of transcription factor genes and tests of selection
        Effect of phylogenetic breadth on predictions of positive selection
        Simulations to assess sensitivity of the strict branch+site test
        Alternative null model using an empirical test
        Comparison with previous predictions of positive selection
    Discussion
Chapter 3: Human PAML Browser: A database of positive selection on human genes using phylogenetic analysis
    Abstract
    Introduction
    Data Sources and Processing
        Input data
        Statistical analysis
    Results
        User interface
        Database organization
    Discussion
    Conclusion
Chapter 4: Demonstrating functional divergence using genome wide expression arrays in the positively selected gene CDX4
    Abstract
    Introduction
    Materials and Methods
        DNA sequencing
        Phylogenetic analysis
        cDNA clone construction
        Microarray
    Results
    Discussion
Chapter 5: Conclusion and Future Directions
    The Evolution of Modern Traits
    Future Directions
        Picking candidate genes
        In vivo studies
            Transgenic mice
            Gene targeted mice
        Target gene discovery
            ChIP-chip
            Yeast-2-hybrid
        Luciferase gene expression studies
        MicroRNA screens
        Investigating selection around a phenotype
    Positive Selection Along the Human Lineage
Appendix
Literature Cited

LIST OF TABLES

Table 1.1 The dominant hypothesis of human molecular evolution and important references
Table 2.1 The percent of primer pairs that produced high quality sequence from human templates and macaque and chimpanzee masks
Table 2.2 Sequence coverage of transcription factor genes and data sources
Table 2.3 Evolutionary models and parameter sets for codeml analysis
Table 2.4 Branch+site test site classes
Table 2.5 codeml results for transcription factor genes with significant results in the strict branch+site test of positive selection
Table 2.6 Comparison of the results from the strict branch+site and empirical tests
Table 2.7 Comparison of predictions of positive selection by different methods
Table 2.8 Tests for positive selection on other primate branches that exhibited dN/dS > 1
Table 2.9 Effect of synonymous and nonsynonymous substitution on predictions of positive selection
Table 3.1 Evolutionary models used by codeml
Table 3.2 Summary of results of tests of selection on human genes
Table 3.3 Gene ontology categories over-represented among genes with p<0.05 in the strict branch+site test
Table 4.1 Statistical analysis of CDX4 using different methods to predict positive selection
Table 4.2 The results from the strict branch+site model for CDX4
Table 4.3 123 genes differentially regulated by human and chimpanzee CDX4
Table 4.4 Cell lines chosen for CDX4 gene expression assays
Table 5.1 Luciferase promoter-reporter tests with controls
Table 5.2 List of experiments and controls associated with luciferase promoter-reporter analysis
Table A.1 codeml results from 175 Transcription Factors
Table A.2 Sensitivity analysis: Factors that affect predictions of positive selection

LIST OF FIGURES

Figure 1.1 Phylogenetic tree of the order Primates
Figure 1.2 Mechanisms of human molecular evolution
Figure 2.1 Probability of accurate ancestral sequence reconstruction
Figure 2.2 Phylogenetic tree of the primate species used in phylogenetic analyses
Figure 2.3 Comparison of the results for predictions of positive selection using codeml for the full primate+rodent alignment set to the minimal alignment set
Figure 2.4 Distribution of likelihood ratio statistic (LRS) values from the strict branch+site test of simulated gene sets
Figure 2.5 Prediction of positive selection across a range of branch lengths representative of human genes
Figure 2.6 Contribution of dN/dS values to predictions of positive selection
Figure 2.7 Comparison of the strict branch+site test, 50:50 mixture test, and empirical test on simulated sequences
Figure 2.8 Comparison of the strict branch+site test and empirical test on human genes
Figure 3.1 Unrooted mammalian species tree of the organisms used in the construction of the database and the phylogenetic tree used in PAML analysis
Figure 3.2 PAML database summary results for Iroquois homeobox 3 (IRX3)
Figure 3.3 Multispecies protein alignment
