Bioinformatics Analyses of Genomic Imprinting
Total Page:16
File Type:pdf, Size:1020Kb
Bioinformatics Analyses of Genomic Imprinting Dissertation zur Erlangung des Grades des Doktors der Naturwissenschaften der Naturwissenschaftlich-Technischen Fakultät III Chemie, Pharmazie, Bio- und Werkstoffwissenschaften der Universität des Saarlandes von Barbara Hutter Saarbrücken 2009 Tag des Kolloquiums: 08.12.2009 Dekan: Prof. Dr.-Ing. Stefan Diebels Berichterstatter: Prof. Dr. Volkhard Helms Priv.-Doz. Dr. Martina Paulsen Vorsitz: Prof. Dr. Jörn Walter Akad. Mitarbeiter: Dr. Tihamér Geyer Table of contents Summary________________________________________________________________ I Zusammenfassung ________________________________________________________ I Acknowledgements _______________________________________________________II Abbreviations ___________________________________________________________ III Chapter 1 – Introduction __________________________________________________ 1 1.1 Important terms and concepts related to genomic imprinting __________________________ 2 1.2 CpG islands as regulatory elements ______________________________________________ 3 1.3 Differentially methylated regions and imprinting clusters_____________________________ 6 1.4 Reading the imprint __________________________________________________________ 8 1.5 Chromatin marks at imprinted regions___________________________________________ 10 1.6 Roles of repetitive elements ___________________________________________________ 12 1.7 Functional implications of imprinted genes _______________________________________ 14 1.8 Evolution and parental conflict ________________________________________________ 16 1.8.1 Occurence of imprinting _________________________________________________ 16 1.8.2 Embryonic development and parental conflict_________________________________ 16 1.8.3 Evolution of imprinting regulatory elements __________________________________ 18 1.8.4 Natural selection on imprinted genes________________________________________ 20 1.9 Previous bioinformatics research related to imprinting ______________________________ 22 Chapter 2 – Materials and Methods ________________________________________ 25 2.1 Molecular databases and annotation resources ____________________________________ 25 2.1.1 NCBI ________________________________________________________________ 25 2.1.2 UCSC Genome Browser _________________________________________________ 26 2.1.3 Ensembl Genome Browser________________________________________________ 27 2.2 CpG islands _______________________________________________________________ 28 2.2.1 CpGobs/CpGexp, the margin effect and artifact CpG islands _____________________ 28 2.2.2 The sliding window method_______________________________________________ 30 2.2.3 Segmentation methods ___________________________________________________ 30 2.2.4 CpG clustering _________________________________________________________ 32 2.2.5 The UCSC elongation method _____________________________________________ 32 2.3 Repetitive elements _________________________________________________________ 33 2.3.1 RepeatMasker__________________________________________________________ 33 2.3.2 Tandem Repeats Finder __________________________________________________ 34 2.4 Alignments and conserved elements ____________________________________________ 36 2.4.1 Blastz ________________________________________________________________ 36 2.4.2 Pairwise evolutionary conserved elements ___________________________________ 37 2.4.3 PhastCons most conserved elements ________________________________________ 37 2.5 Annotations of regulatory regions and polymorphisms ______________________________ 39 2.6 Motif search _______________________________________________________________ 42 2.7 Homology and evolution _____________________________________________________ 45 2.7.1 Orthologs and paralogs __________________________________________________ 45 2.7.2 Estimation of selection___________________________________________________ 46 i Table of contents 2.8 Custom Perl scripts __________________________________________________________ 50 2.8.1 Merging transcript variants into genes _______________________________________ 50 2.8.2 Calculating overlaps with binary search ______________________________________ 51 2.9 Statistical Tests _____________________________________________________________ 53 2.9.1 Chi square test__________________________________________________________ 56 2.9.2 t test __________________________________________________________________ 57 2.9.3 Wilcoxon test __________________________________________________________ 58 2.9.4 Correlation_____________________________________________________________ 59 Chapter 3 – Results _____________________________________________________ 61 3.1 Characteristics of human and mouse CpG islands __________________________________ 61 3.1.1 Effects of different algorithms and repetitive sequences on CpG island identification __ 61 3.1.2 Promoter CpG islands possess pronounced characteristics and are reliably detected____ 64 3.1.3 General differences between human and mouse CpG islands______________________ 66 3.1.4 The (TpG+CpA)/(2*CpG) ratio is correlated to epigenetic properties of CpG islands __ 67 3.1.5 Summary and conclusions of chapter 3.1 _____________________________________ 71 3.2 CpG islands in imprinted and non-imprinted regions ________________________________ 71 3.2.1 Different sequence properties of human and mouse imprinted and non-imprinted genes 71 3.2.2 General CpG island properties do not distinguish imprinted genes _________________ 73 3.2.3 Estimating CpG deamination effects on CpG islands in imprinted regions ___________ 75 3.2.4 Supporting evidence from genome-wide conservation studies_____________________ 75 3.2.5 Enrichment of tandem repeats______________________________________________ 76 3.2.6 Summary and conclusions of chapter 3.2 _____________________________________ 79 3.3 Sequence conservation at imprinted loci__________________________________________ 80 3.3.1 Low recovery rates of orthologs of imprinted genes_____________________________ 80 3.3.3 Properties of pairwise and genome-wide conserved elements _____________________ 81 3.3.4 Features of the promoter regions of imprinted genes ____________________________ 86 3.3.5 CpG-rich motifs in intragenic and intronic conserved regions _____________________ 88 3.3.6 Weak conservation of exonic sequences______________________________________ 90 3.3.7 Summary and conclusions of chapter 3.3 _____________________________________ 92 3.4 Divergence and conservation of protein-coding imprinted genes_______________________ 93 3.4.1 Contrasting evolution of rodent imprinted genes _______________________________ 93 3.4.2. Divergence at the base of rodent imprinting __________________________________ 95 3.4.3 Reconstruction of ancestral evolutionary patterns ______________________________ 97 3.4.4 Assessing ongoing evolution with single nucleotide polymorphisms________________ 99 3.4.5 Other factors influencing the low conservation of imprinted genes _________________ 99 3.4.6 Paralogous genes may facilitate divergence __________________________________ 100 3.4.7 Summary and conclusions of chapter 3.4 ____________________________________ 103 Chapter 4 – Discussion _________________________________________________ 105 4.1 Imprinted genes versus control genes and the genome ______________________________ 105 4.1.1 Choosing appropriate control groups _______________________________________ 105 4.1.2 Imprinting candidates ___________________________________________________ 106 4.2 CpG islands associated with human and mouse imprinted and biallelically expressed genes 108 4.2.1 Performance of alternative methods for CpG island identification_________________ 108 ii Table of contents 4.2.2 Recommendable strategies for detection of functional CpG islands _______________ 109 4.2.3 Special features of CpG islands in imprinted regions __________________________ 109 4.3 Influence of CpG deamination in imprinted regions _______________________________ 110 4.4 Possible epigenetic functions of tandem repeats __________________________________ 111 4.5 Connections between imprinted genes and the X chromosome_______________________ 112 4.6 Is there an "imprinting transcription factor"?_____________________________________ 114 4.7 Distinguishing patterns of conservation and divergence ____________________________ 115 4.7.1 Possible contributions to murine speciation__________________________________ 115 4.7.2 Reconstruction of ancient evolutionary patterns ______________________________ 116 4.7.3 Maternally expressed genes and female-specific benefits _______________________ 118 4.7.4 A critical look on sequence-based methods to keep track of protein evolution_______ 118 4.8 Paralogous genes and the evolution of imprinting _________________________________ 119 4.9 Conclusions and outlook ____________________________________________________ 121 Appendices ___________________________________________________________ 123 Appendix A _________________________________________________________________ 123 Table A1: Locations and data of genomic sequences _______________________________ 123 Table A2: Overlap of CpG islands with selected repetitive elements___________________ 132 Table A3: Ananlogous promoter CpG islands ____________________________________ 133 Table A4: Overlap of TJ CGIs with cpg CGIs ____________________________________ 134 Table A5: Overlap of cpg CGIs with TJ CGIs ____________________________________ 134 Table A6: Median values for CpG islands groups ______________________________