Biq Analyzer: Visualization and Quality Control for DNA Methylation Data from Bisulfite Sequencing
Total Page:16
File Type:pdf, Size:1020Kb
A T G A T G A A A C G A CT G BiQ A T G A C G A Analyzer bisulphite sequencing simplified Computational Epigenetics PhD Thesis by Christoph Bock Computational Epigenetics Bioinformatic methods for epigenome prediction, DNA methylation mapping and cancer epigenetics Dissertation zur Erlangung des akademischen Grades des Doktors der Naturwissenschaften (Dr. rer. nat.) im Fach Informatik der Naturwissenschaftlich-Technischen Fakultäten der Universität des Saarlandes von Christoph Bock eingereicht im Mai 2008 ii Datum der Einreichung: 31. Mai 2008 Gutachter: Prof. Dr. Thomas Lengauer, Ph.D. Prof. Dr. Jörn Walter Prof. Dr. Martin Vingron Datum des Kolloquiums: 2. Oktober 2008 Dekan der Fakultät: Prof. Dr. Joachim Weickert Vorsitzender des Kolloquiums: Prof. Dr. Gerhard Weikum Protokollant Dr. Mario Albrecht iii iv CONTENTS LIST OF TABLES ..................................................................................................................... vi LIST OF FIGURES .................................................................................................................. vii ACKNOWLEDGMENT ............................................................................................................ x ABSTRACT .............................................................................................................................. xi KURZFASSUNG ..................................................................................................................... xii Part A. Introduction into Computational Epigenetics ................................................................. 1 A-1 Outline ......................................................................................................................... 1 A-2 Two facets of epigenetic inheritance........................................................................... 1 A-3 Mechanisms of epigenetic regulation.......................................................................... 2 A-4 Generation, low-level processing and quality control of epigenetic data ................... 4 A-5 Epigenome data analysis ............................................................................................. 6 A-6 Epigenome prediction: inferring epigenetic states from the DNA sequence .............. 9 A-7 Cancer epigenetics: toward improved diagnosis and therapy ................................... 10 A-8 Outline of the remainder of this thesis ...................................................................... 12 Part B. Epigenome Prediction ................................................................................................... 14 B-1 Outline ....................................................................................................................... 14 B-2 Predicting DNA methylation based on the genomic DNA sequence ....................... 14 B-3 EpiGRAPH: A user-friendly tool for advanced (epi-) genome analysis and prediction................................................................................................................... 26 B-4 CpG island mapping by epigenome prediction ......................................................... 43 B-5 An optimization-based approach to CpG island annotation ..................................... 62 Part C. DNA Methylation Mapping .......................................................................................... 75 C-1 Outline ....................................................................................................................... 75 C-2 BiQ Analyzer: Visualization and quality control for DNA methylation data from bisulfite sequencing .......................................................................................... 75 C-3 Insights from computational analysis of high-resolution DNA methylation data….. ...................................................................................................................... 78 C-4 Inter-individual variation of DNA methylation and its implications for large-scale epigenome mapping ................................................................................ 89 v Part D. Cancer Epigenetics ..................................................................................................... 107 D-1 Outline ..................................................................................................................... 107 D-2 Relevance of the methyl-CpG binding protein MeCP2 for Polycomb recruitment in cancer cells....................................................................................... 108 D-3 Optimizing a DNA-methylation-based biomarker of chemotherapy resistance for use in clinical settings ....................................................................... 119 Part E. Conclusion and Outlook ............................................................................................. 129 E-1 Outline ..................................................................................................................... 129 E-2 Conclusion ............................................................................................................... 129 E-3 Outlook .................................................................................................................... 133 Part F. List of Publications ..................................................................................................... 136 Part G. References .................................................................................................................. 138 vi LIST OF TABLES TABLE 1. METHODS FOR GENOME -WIDE MAPPING OF EPIGENETIC INFORMATION .................................................... 5 TABLE 2. LARGE -SCALE EPIGENOME MAPPING PROJECTS ......................................................................................... 8 TABLE 3. DNA-RELATED ATTRIBUTES DIFFER SIGNIFICANTLY BETWEEN METHYLATED AND UNMETHYLATED CPG ISLANDS .................................................................................................................... 19 TABLE 4. THE PREDICTIVE POWER OF ATTRIBUTE CLASSES DIFFERS REMARKABLY ; CONTROL EXPERIMENTS CONFIRM THE CHOICE OF THE PREDICTION METHOD ............................................................... 22 TABLE 5. TWELVE CPG ISLANDS WERE ANALYZED EXPERIMENTALLY TO VALIDATE OUR PREDICTIONS ................ 24 TABLE 6. LIST OF DEFAULT ATTRIBUTES INCLUDED IN EPI GRAPH ....................................................................... 30 TABLE 7. PREDICTION PERFORMANCE FOR DNA METHYLATION AND PROMOTER ACTIVITY AT CPG ISLANDS ........................................................................................................................................................ 50 TABLE 8. A SUBSET OF CPG ISLANDS EXHIBIT HIGHLY SIGNIFICANT OVERLAP WITH MULTIPLE EPIGENETIC MODIFICATIONS SIMULTANEOUSLY ........................................................................................... 52 TABLE 9. PREDICTION PERFORMANCE FOR THE DISTINCTION BETWEEN CPG ISLANDS THAT OVERLAP WITH A PARTICULAR EPIGENETIC MODIFICATION AND THOSE THAT DO NOT ................................................. 52 TABLE 10. PERFORMANCE COMPARISON BETWEEN THE COMBINED EPIGENETIC SCORE AND THE CPG ISLAND LENGTH ............................................................................................................................................ 60 TABLE 11. FUNCTIONS FOR COMPUTATIONAL SIMULATION OF EXPERIMENTAL METHODS FOR DNA METHYLATION MAPPING ............................................................................................................................... 94 TABLE 12. CORRELATION BETWEEN HIGH -RESOLUTION IMPROVEMENT AND ITS POTENTIAL PREDICTORS ........... 102 TABLE 13. STATISTICAL EVALUATION OF CANDIDATE BIOMARKERS ASSESSING MGMT PROMOTER METHYLATION ............................................................................................................................................ 125 vii LIST OF FIGURES FIGURE 1. CARRIERS OF EPIGENETIC INFORMATION : DNA AND NUCLEOSOME ......................................................... 2 FIGURE 2. PREDICTED DNA STRUCTURE DIFFERS IN THE NEIGHBORHOOD OF METHYLATED CPG ISLANDS COMPARED WITH THEIR UNMETHYLATED COUNTERPARTS .............................................................. 20 FIGURE 3. OUTLINE OF EPI GRAPH’ S SOFTWARE ARCHITECTURE .......................................................................... 29 FIGURE 4. DOCUMENTATION OF EPI GRAPH ANALYSES IN THE X-GRAF FORMAT ................................................ 33 FIGURE 5. RESULTS SCREENSHOT OF EPI GRAPH’ S MACHINE LEARNING MODULE QUANTIFYING THE PREDICTABILITY OF ULTRACONSERVED ELEMENTS BASED ON DIFFERENT GROUPS OF (EPI -) GENOMIC ATTRIBUTES ................................................................................................................................... 37 FIGURE 6. RESULTS SCREENSHOT OF EPI GRAPH’ S STATISTICAL ANALYSIS MODULE IDENTIFYING SIGNIFICANT GENE REGULATORY DIFFERENCES BETWEEN ULTRACONSERVED ELEMENTS THAT ARE RESTRICTED TO MAMMALS AND THOSE THAT ARE ALSO PRESENT IN BIRDS ................................................... 38 FIGURE 7. EPI GRAPH RESULTS SCREENSHOTS INDICATING THAT PROMOTERS OF MONOALLELICALLY EXPRESSED GENES ARE ENRICHED WITH REPRESSIVE HISTONE MODIFICATIONS AND CAN BE PREDICTED BIOINFORMATICALLY .................................................................................................................. 39 FIGURE 8. RESULTS SCREENSHOT OF EPI GRAPH’ S DIAGRAM GENERATION MODULE HIGHLIGHTING