The Pennsylvania State University
The Graduate School
Department of Biology

STUDIES OF GENE EXPRESSION EVOLUTION: GENES ON THE INACTIVE X CHROMOSOME AND DUPLICATE GENES

A Dissertation in Biology
by
Chungoo Park

© 2010 Chungoo Park

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

December 2010

The dissertation of Chungoo Park was reviewed and approved* by the following:

Kateryna D. Makova
Associate Professor of Biology
Dissertation Co-Advisor
Chair of Committee

Laura Carrel
Associate Professor of Biochemistry and Molecular Biology
Dissertation Co-advisor

Francesca Chiaromonte
Professor of Statistics

Webb Miller
Professor of Biology and Computer Science and Engineering

Claude dePamphilis
Professor of Biology

Douglas R. Cavener
Professor of Biology
Head of the Department of Biology

*Signatures are on file in the Graduate School

ABSTRACT

Understanding the determinants of the rate of protein evolution is one of the major goals in molecular evolution. Among the potential variables, expression abundance is one of the most important factors for determining protein evolutionary rates; the variation in gene expression appears to contribute to the evolutionary divergence and phenotypic diversity among species and individuals. Here we perform studies to characterize variation in gene expression patterns on the inactive X chromosome, and across duplicate genes in mammals. Specifically, several questions are addressed in greater detail in this dissertation. First, what genomic signals determine the expression status of genes on the inactive X chromosome? Second, does selection operate differently on genes that escape inactivation vs. genes that are inactivated? Third, do genomic features and motifs predict candidate X-linked mental retardation (XLMR) genes? Fourth, what drives the rapid expression divergence observed between human paralogs?
To investigate these issues, we use genome-scale gene expression data and bioinformatic analyses. We find that (1) the majority of the sequences enriched in the vicinity of inactivated genes are found within L1 repeats (indicating an involvement of L1 repeats in X chromosome inactivation), and these sequences capture most of the genomic signal determining inactivation; some unique or overrepresented motifs are also found in boundary regions (indicating that they are candidates for the boundary elements separating genes with different X inactivation profiles); (2) escape genes experience stronger purifying selection than inactivated genes at both the protein-coding and gene expression levels, and this effect largely results from the importance of the function and dosage of escape genes; (3) sequence motifs that are overrepresented exclusively in either XLMR or non-XLMR genes effectively capture genomic signals that distinguish between the two groups; and (4) turnover of transcription start sites, structural heterogeneity of coding sequences, and divergence of cis-regulatory regions between duplicate gene copies play a pivotal role in determining the expression divergence of duplicate genes. Results from these studies provide valuable insights into the regulation of inactive X expression and the mechanism of X chromosome inactivation, and will further aid in our understanding of long-range control of gene expression on the X chromosome. Moreover, they provide important information for understanding human transcriptome heterogeneity, complexity, and evolution.
TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS

Chapter 1 Introduction
    X chromosome inactivation
    Gene duplication
    References

Chapter 2 Genomic Environment Predicts Expression Patterns on the Human Inactive X Chromosome
    Abstract
    Synopsis
    Introduction
    Results
        Description of the Escape and Inactivated Subgenomes Analyzed in Xp22
        Analysis of Oligomers Enriched in Either E or I Subgenomes
        Classification of Genes as Either Inactivated or Escaping Inactivation Based on Surrounding Oligomers
        Genes Classified Correctly and Misclassified Genes
    Discussion
    Methods
        Transcripts
        Oligomer enrichment analysis
        LDA
    References

Chapter 3 Strong Purifying Selection at Genes Escaping X Chromosome Inactivation
    Abstract
    Introduction
    Results and Discussion
    Methods
    References

Chapter 4 Studies of boundary elements model at the boundary region between escape and inactivated genes
    Introduction
    Results
        A comprehensive computational analysis of boundary regions using chromosome-wide human XCI data
        A comprehensive computational analysis for boundary regions in USP9X gene cluster
        Genomic factors examined are not sufficient to explain how genes escape XCI according to the boundary elements model
        Caveats to our approaches
    Conclusions
    Methods
        Analysis using genome-wide human XCI data
        Scanning the USP9X gene cluster for candidate elements controlling XCI
        Analysis using comparative XCI data in the USP9X gene cluster
    References

Chapter 5 A Computational Approach to Candidate Gene Prioritization for X-Linked Mental Retardation using Clinically-Informed Binary Filtering and Motif-Based Linear Discriminatory Analysis
    Abstract
    Background