Downloaded from Ensembl V41 (6), All

IN SILICO APPROACHES TO INVESTIGATING MECHANISMS OF GENE REGULATION

by

SHANNAN JANELLE HO SUI

B.Sc., The University of British Columbia, 2000

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES (Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

March 2008

© Shannan Janelle Ho Sui, 2008

Abstract

Identification and characterization of regions influencing the precise spatial and temporal expression of genes is critical to our understanding of gene regulatory networks. Connecting transcription factors to the cis-regulatory elements that they bind and regulate remains a challenging problem in computational biology. The rapid accumulation of whole genome sequences and genome-wide expression data, and advances in alignment algorithms and motif-finding methods, provide opportunities to tackle the important task of dissecting how genes are regulated.

Genes exhibiting similar expression profiles are often regulated by common transcription factors. We developed a method for identifying statistically over-represented regulatory motifs in the promoters of co-expressed genes using weight matrix models representing the specificity of known factors. Application of our methods to yeast fermenting in grape must revealed elements that play important roles in utilizing carbon sources. Extension of the method to metazoan genomes via incorporation of comparative sequence analysis facilitated identification of functionally relevant binding sites for sets of tissue-specific genes, and for genes showing similar expression in large-scale expression profiling studies. Further extensions address alternative promoters for human genes and coordinated binding of multiple transcription factors to cis-regulatory modules.

Sequence conservation reveals segments of genes of potential interest, but the degree of sequence divergence among human genes and their orthologous sequences varies widely. Genes with a small number of well-distinguished, highly conserved non-coding elements proximal to the transcription start site may be well-suited for targeted laboratory promoter characterization studies. We developed a “regulatory resolution” score to prioritize lists of genes for laboratory gene regulation studies based on the conservation profile of their promoters.

Additionally, genome-wide comparisons of vertebrate genomes have revealed surprisingly large numbers of highly conserved non-coding elements (HCNEs) that cluster near genes associated with transcription and development. To further our understanding of the genomic organization of regulatory regions, we developed methods to identify HCNEs in insects. We find that HCNEs in insects are similar in function and organization to their vertebrate counterparts. Our data suggest that microsynteny in insects has been retained to keep large arrays of HCNEs intact, forming genomic regulatory blocks that surround the key developmental genes they regulate.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations and Acronyms
Acknowledgements
Co-authorship Statement

Chapter 1: Introduction
  1.1 Background and significance
  1.2 Structure of eukaryotic regulatory regions
    1.2.1 The core promoter
    1.2.2 Complexities in promoter architecture
    1.2.3 Enhancers
    1.2.4 Chromatin structure and its effects on gene regulation
    1.2.5 Current strategies for improving our understanding of gene regulation
  1.3 Laboratory-based methods for regulatory region identification
    1.3.1 Reporter constructs
    1.3.2 DNA binding assays
    1.3.3 In vitro selection
    1.3.4 Transcript profiling to locate proximal promoters
  1.4 Computational methods for regulatory element prediction
    1.4.1 Repositories of gene regulatory information
    1.4.2 Computational identification of TFBSs
  1.5 Conservation analysis for the identification of regulatory regions
    1.5.1 Algorithms for multi-species sequence alignments
    1.5.2 Identifying and measuring evolutionary constraint
    1.5.3 Highly conserved noncoding elements in metazoan genomes
  1.6 Thesis overview and chapter objectives
  1.7 References

Chapter 2: oPOSSUM: Identification of Over-represented Transcription Factor Binding Sites in Sets of Co-expressed Genes
  2.1 Introduction
  2.2 Methods
    2.2.1 Automated retrieval of human-mouse orthologs
    2.2.2 Phylogenetic footprinting
    2.2.3 Detection of TF binding sites
    2.2.4 Discovery of over-represented binding sites
    2.2.5 NF-κB microarray experiment
    2.2.6 Simulations using random sampling
    2.2.7 Parameter selection for validation studies
  2.3 Results
    2.3.1 Validation using reference gene sets
    2.3.2 Application to transcript profiling data
    2.3.3 Specificity assessment
    2.3.4 Noise tolerance
    2.3.5 Web implementation
    2.3.6 The oPOSSUM application programming interface (API)
  2.4 Discussion
    2.4.1 Performance
    2.4.2 Challenges
    2.4.3 Utility
  2.5 Conclusions
  2.6 References

Chapter 3: Integrated Tools for Analysis of Regulatory Motif Over-representation
  3.1 Introduction
  3.2 Methods
    3.2.1 Over-representation analysis

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    241 pages
