Genome Folding in Evolution and Disease
Johannes Gutenberg University Mainz
Faculty of Biology
Institute of Organismic and Molecular Evolution
Computational Biology and Data Mining Group

Dissertation

Genome folding in evolution and disease

Jonas Ibn-Salem
February, 2018

Jonas Ibn-Salem
Genome folding in evolution and disease
Dissertation, February, 2018

Johannes Gutenberg University Mainz
Computational Biology and Data Mining Group
Institute of Organismic and Molecular Evolution
Faculty of Biology
Ackermannweg 4
55128 Mainz

Contents

Abstract
1 Introduction
  1.1 Regulation of gene expression
  1.2 Distal regulation by enhancers
  1.3 Methods to probe the 3D chromatin architecture
    1.3.1 Microscopy-based techniques to visualize the genome in 3D
    1.3.2 Proximity-ligation based method to quantify chromatin interactions
  1.4 Hierarchy of chromatin 3D structure
    1.4.1 Chromosomal territories and inter-chromosomal contacts
    1.4.2 A/B compartments
    1.4.3 Topologically associating domains (TADs)
    1.4.4 Hierarchy of domain structures across genomic length scales
    1.4.5 Chromatin looping interactions
    1.4.6 TAD and loop formation by architectural proteins
  1.5 Dynamics of chromatin structure
    1.5.1 Dynamics across the cell cycle
    1.5.2 Dynamics across cell types and differentiation
  1.6 Evolution of chromatin organization
  1.7 Disruption of chromatin architecture in disease
  1.8 Aims of this thesis
  1.9 Structure of this thesis
2 Paralog genes in the 3D genome architecture
  Preamble
  Abstract
  2.1 Introduction
  2.2 Materials and methods
    2.2.1 Selection of pairs of paralog genes
    2.2.2 Enhancers to gene association
    2.2.3 Topological associating domains
    2.2.4 Hi-C interaction maps
    2.2.5 Randomization
    2.2.6 Statistical tests
  2.3 Results
    2.3.1 Distribution of paralog genes in the human genome
    2.3.2 Co-expression of paralog gene pairs across tissues
    2.3.3 Paralog genes share enhancers
    2.3.4 Co-localization of paralogs in TADs
    2.3.5 Distal paralog pairs are enriched for long-range chromatin contacts
    2.3.6 Close paralogs have fewer contacts than expected
    2.3.7 Paralogs in mouse and dog genome
    2.3.8 Orthologs of human paralogs show conserved co-localization
  2.4 Discussions
  2.5 Conclusion
  Acknowledgements
3 Stability of TADs in evolution
  Preamble
  Abstract
  3.1 Introduction
  3.2 Results
    3.2.1 Identification of evolutionary rearrangement breakpoints from whole-genome alignments
    3.2.2 Rearrangement breakpoints are enriched at TAD boundaries
    3.2.3 Clusters of conserved non-coding elements are depleted for rearrangement breakpoints
    3.2.4 Rearranged TADs are associated with divergent gene expression between species
  3.3 Discussion
  3.4 Conclusion
  3.5 Methods
    3.5.1 Rearrangement breakpoints from whole-genome alignments
    3.5.2 Topologically associating domains and contact domains
    3.5.3 Breakpoint distributions at TADs
    3.5.4 Quantification of breakpoint enrichment
    3.5.5 Expression data for mouse and human orthologs
    3.5.6 Classification of TADs and genes according to rearrangements and GRBs
    3.5.7 Source code and implementation details
  Declarations
    Availability of data and material
    Competing interests
    Authors' contributions
    Acknowledgments
4 Position effects of rearrangements in disease genomes
  Preamble
  Abstract
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Selection of subjects with apparently balanced chromosome abnormalities
    4.2.2 Clinical descriptions of DGAP cases
    4.2.3 Analysis of genes bordering the rearrangement breakpoints
    4.2.4 Assessment of disrupted functional elements and chromatin interactions bordering rearrangement breakpoints
    4.2.5 Ontological analysis of genes neighboring breakpoints
    4.2.6 Quantitative real-time PCR
    4.2.7 Assessment of DGAP breakpoints overlapping with non-coding structural variants in public databases
  4.3 Results
    4.3.1 Genomic characterization of non-coding breakpoints
    4.3.2 Identification of genes with potential position effects
    4.3.3 Identification of subjects with shared non-coding chromosome alterations and phenotypes
  4.4 Discussion
  Declarations
    Acknowledgements
    Web Resources
5 Prediction of chromatin looping interactions
  Preamble
  Abstract
  5.1 Introduction
  5.2 Results
    5.2.1 CTCF motif pairs as candidate chromatin loop anchors
    5.2.2 Similarity of ChIP-seq signals at looping CTCF motifs
    5.2.3 Genomic sequence features of CTCF motif pairs are associated with looping
    5.2.4 Chromatin loop prediction using 7C
    5.2.5 Prediction performance evaluation
    5.2.6 Prediction performance of 7C with sequence features and single TF ChIP-seq data sets
    5.2.7 Comparison of transcription factors by prediction performance
    5.2.8 Prediction performance in other cell types and for different TFs
    5.2.9 The high resolution of ChIP-nexus improves prediction performance
  5.3 Discussion
  5.4 Conclusion
  5.5 Methods
    5.5.1 CTCF motifs in the human genome
    5.5.2 Loop interaction data for training and validation
    5.5.3 ChIP-seq datasets in GM12878 cells
    5.5.4 ChIP-seq data types
    5.5.5 ChIP-nexus data processing for RAD21 and SMC3
    5.5.6 Similarity of ChIP-seq profiles as correlation of coverage around motifs
    5.5.7 Genomic sequence features of chromatin loops
    5.5.8 Chromatin Loop prediction model
    5.5.9 Training and validation of prediction model
    5.5.10 Analysis of prediction performance
    5.5.11 Implementation of 7C and compatibility to other tools
  Declarations
    Availability of data and material
    Competing interests
    Authors' contributions
    Acknowledgments
6 Discussion
  6.1 Co-regulation of functionally related genes in TADs
  6.2 Evolution by gene duplications and altered regulatory environments
  6.3 TADs are stable across large evolutionary time-scales
  6.4 Gene expression changes by altered TADs in disease
  6.5 Towards predicting regulatory pathomechanisms of structural variants
  6.6 Towards studying genome folding in specific conditions
    6.6.1 Interpretation of non-coding variants by interacting genes
    6.6.2 Cell type-specific regulatory interactions between enhancers and genes
    6.6.3 Variability of long-range interactions across individuals
    6.6.4 Genome folding in single cells
    6.6.5 Constantly improving targeted experimental methods
  6.7 Molecular mechanisms driving genome folding
  6.8 Conclusions
A Supporting Information: Co-regulation of paralog genes
B Supplementary Data: Stability of TADs in evolution
  B.1 Supplementary Tables
  B.2 Supplementary Figures
C Supplemental Data: Position effects of rearrangements in disease genomes
  C.1 Supplemental Note
    C.1.1 Case Reports
    C.1.2 Nucleotide Level Nomenclature for DGAP karyotypes
  C.2 Supplemental Figure
  C.3 Supplemental Table Legends
D Supplemental Information: Prediction of chromatin looping interactions
  D.1 Supplementary Tables
  D.2 Supplementary Figures
E Contribution to individual publications
Zusammenfassung (Summary)
Bibliography

Abstract

The human genome is hierarchically folded in the three-dimensional nucleus. Pairwise chromatin contacts cluster in discrete chromosomal regions, termed topologically associating domains (TADs). Whether TADs play an essential role in gene expression regulation in evolution and genetic diseases is analyzed in this thesis by computationally integrating genome-wide contact maps with various