METHODS AND ANALYSES IN THE STUDY OF HUMAN DNA METHYLATION

by

KE HU

Submitted in partial fulfillment of the requirements
For the Degree of Doctor of Philosophy

Department of Electrical Engineering and Computer Science
CASE WESTERN RESERVE UNIVERSITY
May, 2018


CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of
Ke Hu
candidate for the degree of Doctor of Philosophy*.

Committee Chair: Dr. Jing Li
Committee Member: Dr. Angela Ting
Committee Member: Dr. Fulai Jin
Committee Member: Dr. Xusheng Xiao

Date of Defense: March 29, 2018

*We also certify that written approval has been obtained for any proprietary material contained therein.


Table of Contents

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
ABSTRACT

CHAPTER 1 INTRODUCTION
  1.1 DNA METHYLATION IN MAMMALS
  1.2 METHODS TO MEASURE DNA METHYLATION
  1.3 DISSERTATION ORGANIZATION

CHAPTER 2 DETECTION OF CPG SITES WITH MULTI-MODAL DNA METHYLATION LEVEL DISTRIBUTIONS
  2.1 MOTIVATION
  2.2 METHODS
    2.2.1 Data
    2.2.2 Gaussian Mixture Model Clustering
    2.2.3 Detection of multimodal CpG sites
    2.2.4 Associating GMM cluster labels with genotypes
  2.3 RESULTS
    2.3.1 Genome-wide survey of mmCpG sites in GAW20 dataset
    2.3.2 Association between mmCpGs and SNPs
  2.4 DISCUSSION

CHAPTER 3 DISCOVERING DNA METHYLATION CO-OCCURRENCE PATTERNS
  3.1 MOTIVATION
  3.2 METHODS
    3.2.1 Characteristics
    3.2.2 Workflow
  3.3 RESULT
    3.3.1 Experiment Summary
    3.3.2 DNA methylation co-occurrence pattern analysis
    3.3.3 Potential ASM detection
    3.3.4 Efficiency
  3.4 DISCUSSION

CHAPTER 4 GENOME WIDE PROFILING OF ALLELE-SPECIFIC DNA METHYLATION
  4.1 MOTIVATION
  4.2 METHODS
    4.2.1 Data
    4.2.2 Analysis flow
    4.2.3 The proposed ASM detection method
      4.2.3.1 Step 1: Mapping and methylation calling
      4.2.3.2 Step 2: Candidate region definition
      4.2.3.3 Step 3 ASM detection based on a Graph model
      4.2.3.4 Step 4 Final analysis
    4.2.4 Genome annotation
    4.2.5 CTCF binding data
    4.2.6 RNA-seq data
    4.2.7 SNP calling
    4.2.8 Checking consistency between heterozygous alleles and ASM partitions
    4.2.9 amrfinder result
  4.3 RESULT
    4.3.1 ASM is ubiquitous across the genome and is cell line specific
    4.3.2 Enrichment in female X chromosomes
      4.3.2.1 Overlaps with RefSeq Genes.
      4.3.2.2 Overlaps with ENCODE regulatory elements.
      4.3.2.3 Relationship with gene expression levels.
    4.3.3 ASM significantly overlaps imprinted gene regions
      4.3.3.1 Majority of imprinted genes have ASMs.
      4.3.3.2 Imprinted genes overlap strong ASMs.
      4.3.3.3 Variability of ASM in imprinted regions in different cell lines.
      4.3.3.4 Overlaps with promoter regions and correlation with gene expression.
    4.3.4 ASM patterns in autosomes
      4.3.4.1 ASM distributions.
      4.3.4.2 Overlaps with regulatory elements and relations with expression levels.
    4.3.5 Heterozygous SNPs located in identified ASM regions strongly support read partitions
    4.3.6 Comparison with amrfinder
  4.4 DISCUSSION

CHAPTER 5 A GENERAL BISULFITE SEQUENCE CLUSTERING AND VISUALIZATION TOOL
  5.1 BACKGROUND