Bioinformatics Analyses of Genomic Imprinting

Bioinformatics Analyses of Genomic Imprinting Dissertation zur Erlangung des Grades des Doktors der Naturwissenschaften der Naturwissenschaftlich-Technischen Fakultät III Chemie, Pharmazie, Bio- und Werkstoffwissenschaften der Universität des Saarlandes von Barbara Hutter Saarbrücken 2009 Tag des Kolloquiums: 08.12.2009 Dekan: Prof. Dr.-Ing. Stefan Diebels Berichterstatter: Prof. Dr. Volkhard Helms Priv.-Doz. Dr. Martina Paulsen Vorsitz: Prof. Dr. Jörn Walter Akad. Mitarbeiter: Dr. Tihamér Geyer Table of contents Summary________________________________________________________________ I Zusammenfassung ________________________________________________________ I Acknowledgements _______________________________________________________II Abbreviations ___________________________________________________________ III Chapter 1 – Introduction __________________________________________________ 1 1.1 Important terms and concepts related to genomic imprinting __________________________ 2 1.2 CpG islands as regulatory elements ______________________________________________ 3 1.3 Differentially methylated regions and imprinting clusters_____________________________ 6 1.4 Reading the imprint __________________________________________________________ 8 1.5 Chromatin marks at imprinted regions___________________________________________ 10 1.6 Roles of repetitive elements ___________________________________________________ 12 1.7 Functional implications of imprinted genes _______________________________________ 14 1.8 Evolution and parental conflict ________________________________________________ 16 1.8.1 Occurence of imprinting _________________________________________________ 16 1.8.2 Embryonic development and parental conflict_________________________________ 16 1.8.3 Evolution of imprinting regulatory elements __________________________________ 18 1.8.4 Natural selection on imprinted genes________________________________________ 20 1.9 Previous bioinformatics research related to imprinting ______________________________ 22 Chapter 2 – Materials and Methods ________________________________________ 25 2.1 Molecular databases and annotation resources ____________________________________ 25 2.1.1 NCBI ________________________________________________________________ 25 2.1.2 UCSC Genome Browser _________________________________________________ 26 2.1.3 Ensembl Genome Browser________________________________________________ 27 2.2 CpG islands _______________________________________________________________ 28 2.2.1 CpGobs/CpGexp, the margin effect and artifact CpG islands _____________________ 28 2.2.2 The sliding window method_______________________________________________ 30 2.2.3 Segmentation methods ___________________________________________________ 30 2.2.4 CpG clustering _________________________________________________________ 32 2.2.5 The UCSC elongation method _____________________________________________ 32 2.3 Repetitive elements _________________________________________________________ 33 2.3.1 RepeatMasker__________________________________________________________ 33 2.3.2 Tandem Repeats Finder __________________________________________________ 34 2.4 Alignments and conserved elements ____________________________________________ 36 2.4.1 Blastz ________________________________________________________________ 36 2.4.2 Pairwise evolutionary conserved elements ___________________________________ 37 2.4.3 PhastCons most conserved elements ________________________________________ 37 2.5 Annotations of regulatory regions and polymorphisms ______________________________ 39 2.6 Motif search _______________________________________________________________ 42 2.7 Homology and evolution _____________________________________________________ 45 2.7.1 Orthologs and paralogs __________________________________________________ 45 2.7.2 Estimation of selection___________________________________________________ 46 i Table of contents 2.8 Custom Perl scripts __________________________________________________________ 50 2.8.1 Merging transcript variants into genes _______________________________________ 50 2.8.2 Calculating overlaps with binary search ______________________________________ 51 2.9 Statistical Tests _____________________________________________________________ 53 2.9.1 Chi square test__________________________________________________________ 56 2.9.2 t test __________________________________________________________________ 57 2.9.3 Wilcoxon test __________________________________________________________ 58 2.9.4 Correlation_____________________________________________________________ 59 Chapter 3 – Results _____________________________________________________ 61 3.1 Characteristics of human and mouse CpG islands __________________________________ 61 3.1.1 Effects of different algorithms and repetitive sequences on CpG island identification __ 61 3.1.2 Promoter CpG islands possess pronounced characteristics and are reliably detected____ 64 3.1.3 General differences between human and mouse CpG islands______________________ 66 3.1.4 The (TpG+CpA)/(2*CpG) ratio is correlated to epigenetic properties of CpG islands __ 67 3.1.5 Summary and conclusions of chapter 3.1 _____________________________________ 71 3.2 CpG islands in imprinted and non-imprinted regions ________________________________ 71 3.2.1 Different sequence properties of human and mouse imprinted and non-imprinted genes 71 3.2.2 General CpG island properties do not distinguish imprinted genes _________________ 73 3.2.3 Estimating CpG deamination effects on CpG islands in imprinted regions ___________ 75 3.2.4 Supporting evidence from genome-wide conservation studies_____________________ 75 3.2.5 Enrichment of tandem repeats______________________________________________ 76 3.2.6 Summary and conclusions of chapter 3.2 _____________________________________ 79 3.3 Sequence conservation at imprinted loci__________________________________________ 80 3.3.1 Low recovery rates of orthologs of imprinted genes_____________________________ 80 3.3.3 Properties of pairwise and genome-wide conserved elements _____________________ 81 3.3.4 Features of the promoter regions of imprinted genes ____________________________ 86 3.3.5 CpG-rich motifs in intragenic and intronic conserved regions _____________________ 88 3.3.6 Weak conservation of exonic sequences______________________________________ 90 3.3.7 Summary and conclusions of chapter 3.3 _____________________________________ 92 3.4 Divergence and conservation of protein-coding imprinted genes_______________________ 93 3.4.1 Contrasting evolution of rodent imprinted genes _______________________________ 93 3.4.2. Divergence at the base of rodent imprinting __________________________________ 95 3.4.3 Reconstruction of ancestral evolutionary patterns ______________________________ 97 3.4.4 Assessing ongoing evolution with single nucleotide polymorphisms________________ 99 3.4.5 Other factors influencing the low conservation of imprinted genes _________________ 99 3.4.6 Paralogous genes may facilitate divergence __________________________________ 100 3.4.7 Summary and conclusions of chapter 3.4 ____________________________________ 103 Chapter 4 – Discussion _________________________________________________ 105 4.1 Imprinted genes versus control genes and the genome ______________________________ 105 4.1.1 Choosing appropriate control groups _______________________________________ 105 4.1.2 Imprinting candidates ___________________________________________________ 106 4.2 CpG islands associated with human and mouse imprinted and biallelically expressed genes 108 4.2.1 Performance of alternative methods for CpG island identification_________________ 108 ii Table of contents 4.2.2 Recommendable strategies for detection of functional CpG islands _______________ 109 4.2.3 Special features of CpG islands in imprinted regions __________________________ 109 4.3 Influence of CpG deamination in imprinted regions _______________________________ 110 4.4 Possible epigenetic functions of tandem repeats __________________________________ 111 4.5 Connections between imprinted genes and the X chromosome_______________________ 112 4.6 Is there an "imprinting transcription factor"?_____________________________________ 114 4.7 Distinguishing patterns of conservation and divergence ____________________________ 115 4.7.1 Possible contributions to murine speciation__________________________________ 115 4.7.2 Reconstruction of ancient evolutionary patterns ______________________________ 116 4.7.3 Maternally expressed genes and female-specific benefits _______________________ 118 4.7.4 A critical look on sequence-based methods to keep track of protein evolution_______ 118 4.8 Paralogous genes and the evolution of imprinting _________________________________ 119 4.9 Conclusions and outlook ____________________________________________________ 121 Appendices ___________________________________________________________ 123 Appendix A _________________________________________________________________ 123 Table A1: Locations and data of genomic sequences _______________________________ 123 Table A2: Overlap of CpG islands with selected repetitive elements___________________ 132 Table A3: Ananlogous promoter CpG islands ____________________________________ 133 Table A4: Overlap of TJ CGIs with cpg CGIs ____________________________________ 134 Table A5: Overlap of cpg CGIs with TJ CGIs ____________________________________ 134 Table A6: Median values for CpG islands groups ______________________________

Bioinformatics Analyses of Genomic Imprinting

The Rise and Fall of the Bovine Corpus Luteum

Identification of the Binding Partners for Hspb2 and Cryab Reveals

Searching the Genomes of Inbred Mouse Strains for Incompatibilities That Reproductively Isolate Their Wild Relatives

A Multistep Bioinformatic Approach Detects Putative Regulatory

Functional Parsing of Driver Mutations in the Colorectal Cancer Genome Reveals Numerous Suppressors of Anchorage-Independent

Congenital Disorders of Glycosylation from a Neurological Perspective

Nutrient Health and EROEN Compounds Resegsterixsteextics: Production * Gets Cartrai, Agairaxxxix

Methods in and Applications of the Sequencing of Short Non-Coding Rnas" (2013)

Noelia Díaz Blanco

Molecular Effects of Isoflavone Supplementation Human Intervention Studies and Quantitative Models for Risk Assessment

Current Trends in Gene Recovery Mediated by the CRISPR-Cas System Hyeon-Ki Jang 1, Beomjong Song2,Gue-Hohwang1 and Sangsu Bae 1

Gene Expression During Normal and FSHD Myogenesis Tsumagari Et Al