I Have Never Taken Any Compliments to Heart Because Deep Down Inside I Know That All of Them Actually Belong to You Both. Thanks
I have never taken any compliments to heart because deep down inside I know that all of them actually belong to you both. Thanks for everything, mom and dad.

‘I may not have gone where I intended to go, but I think I have ended up where I needed to be.’
- Douglas Adams

Promoter:
Prof. dr. ir. Tim De Meyer
BioBiX - Lab. of Bioinformatics & Computational Genomics
Department of Mathematical Modelling, Statistics and Bioinformatics
Faculty of Bioscience Engineering
Ghent University, Belgium

Co-promoter:
Prof. dr. ir. Wim Van Criekinge
BioBiX - Lab. of Bioinformatics & Computational Genomics
Department of Mathematical Modelling, Statistics and Bioinformatics
Faculty of Bioscience Engineering
Ghent University, Belgium

Co-promoter:
Prof. dr. Wim Vanden Berghe
PPES - Lab. of Protein Chemistry, Proteomics & Epigenetic Signalling
Department of Biomedical Sciences
Faculty of Pharmaceutical, Veterinary and Biomedical Sciences
University of Antwerp, Belgium

Dean: Prof. dr. ir. Marc Van Meirvenne
Rector: Prof. dr. Anne De Paepe

Developing Bioinformatics Applications for the Analysis of Epigenetic Next-Generation Sequencing Data

ir. Sandra Steyaert

Promoter: Prof. Dr. ir. Tim De Meyer
Co-promoters: Prof. Dr. ir. Wim Van Criekinge & Prof. Dr. Wim Vanden Berghe

Thesis submitted in fulfilment of the requirements for the degree of Doctor (PhD) in Applied Biological Sciences

Dutch translation of the title
Ontwikkeling van bioinformatica toepassingen voor de analyse van epigenetische nieuwe generatie sequeneringsdata

Cover Illustration
The cover illustration shows the photograph (known as photo 51) of the X-ray crystallograph pattern of DNA obtained by Raymond Gosling under supervision of Rosalind Franklin in 1952. It was critical evidence in identifying the structure of DNA. Both James Watson and Francis Crick were struck by the simplicity and symmetry of this pattern. Franklin, an experienced crystallographer, didn’t even know that Watson and Crick had access to her X-ray diffraction data.
The data clearly indicated to Watson and Crick that the B-form of DNA was a double helix and provided key information about its dimensions. That information was enough for them to build their model, for which they eventually earned the 1962 Nobel Prize in Physiology or Medicine. This picture was converted to ASCII art using IMG2TXT (www.DeGraeve.com).

Reference
Steyaert Sandra (2016), Developing Bioinformatics Applications for the Analysis of Epigenetic Next-Generation Sequencing Data. PhD thesis. Ghent University.

Printing
University Press, Zelzate
Printing of this thesis was financially supported by MDxHealth.

ISBN 978-90-5989-955-1

The author and the promoter give the authorization to consult and to copy parts of this work for personal use only. Every other use is subject to the copyright laws. Permission to reproduce any material contained in this work should be obtained from the author.

Author: Sandra Steyaert
Promoter: Tim De Meyer

Members of the examination committee

Prof. dr. ir. Koen Dewettinck
Chairman
Department of Food Safety and Food Quality
Faculty of Bioscience Engineering
Ghent University, Belgium

Prof. dr. Tina Kyndt
Secretary
Department of Molecular Biotechnology
Faculty of Bioscience Engineering
Ghent University, Belgium

Prof. dr. Jacques Balthazart
GIGA Neurosciences
University of Liege, Belgium

Prof. dr. Frédéric Chédin
Department of Molecular and Cellular Biology
University of California, Davis, CA, USA

Dr. ir. Lieven Verbeke
Department of Information Technology (IBCN)
Faculty of Engineering and Architecture
Ghent University, Belgium

Dr. ir. Pieter-Jan Volders
VIB Medical Biotechnology Center
Ghent University, Belgium

Prof. dr. ir. Tim De Meyer
Promoter
Department of Mathematical Modelling, Statistics and Bioinformatics
Faculty of Bioscience Engineering
Ghent University, Belgium

Prof. dr. ir. Wim Van Criekinge
Co-promoter
Department of Mathematical Modelling, Statistics and Bioinformatics
Faculty of Bioscience Engineering
Ghent University, Belgium

Prof. dr. Wim Vanden Berghe
Co-promoter
Department of Biomedical Sciences
Faculty of Pharmaceutical, Veterinary and Biomedical Sciences
University of Antwerp, Belgium

Table of Contents

Preface
List of abbreviations

Research goals & Outline
  1 Research goals
  2 Outline

Summary & Samenvatting
  3 Summary
  4 Samenvatting

I Introduction to genetics, epigenetics and imprinting
  1 Molecular biology, genetics & genomics
    1.1 The cell
    1.2 Genes & DNA
    1.3 The central dogma of molecular biology
    1.4 Gene regulation
  2 Epigenetics
    2.1 Histone modifications
    2.2 RNA
    2.3 DNA methylation
  3 Imprinting & Allele-specific expression
    3.1 Parent-of-origin allele-specific expression
    3.2 Random allele-specific expression
  4 Neuroepigenetics
    4.1 The emergence of a new epigenetic subdiscipline
      4.1.1 Neuroplasticity
      4.1.2 Dynamic & distinct DNA methylation in the brain
    4.2 An avian model for investigating the neurobiological basis of learning

II Introduction to sequencing technologies
  1 DNA sequencing
    1.1 Sanger sequencing
    1.2 Next-generation sequencing
      1.2.1 Illumina Sequencing
      1.2.2 NGS advances & applications
    1.3 Next-next-generation sequencing
      1.3.1 Pacific Biosciences SMRT sequencing
      1.3.2 Life Technologies FRET sequencing
      1.3.3 Oxford Nanopore Technologies
      1.3.4 Ion Torrent Semiconductor sequencing
  2 Detection of DNA methylation
    2.1 Bisulfite-based methods
    2.2 Enrichment-based methods
    2.3 Other methods
  3 NGS data analysis
    3.1 Primary analysis
    3.2 Secondary analysis
    3.3 Tertiary analysis
      3.3.1 Genotype calling
      3.3.2 Differential expression analysis

III Detection of monoallelic DNA methylation
  1 SNP-guided identification of monoallelic DNA methylation events from enrichment-based sequencing data
    1.1 Abstract
    1.2 Introduction
    1.3 Materials & Methods
    1.4 Results
    1.5 Discussion
    1.6 Additional Data
    1.7 Author’s contributions
  2 INTERMEZZO Disadvantages of the pipeline for SNP-guided identification of monoallelic DNA methylation events
    2.1 Rationale & bioinformatics pipeline
    2.2 Disadvantages
    2.3 Optimization
  3 A mixture model for the omics-based identification of monoallelically expressed loci and their deregulation in breast cancer
    3.1 Abstract
    3.2 Introduction
    3.3 Materials & Methods
      3.3.1 Samples & data preprocessing
      3.3.2 Data-analytical framework
      3.3.3 Overlap with monoallelic methylation & 27/450k Infinium HumanMethylation Array validation
    3.4 Results
      3.4.1 SNP tracing & data filtering
      3.4.2 Detection of imprinting
      3.4.3 Validation of putative imprinted regions
      3.4.4 Differential imprinting
    3.5 Discussion
    3.6 Author’s contributions

IV Characterization of DNA methylation dynamics in the ontogeny of a songbird
  1 A genome-wide search for epigenetically regulated genes in zebra finch using MethylCap-seq and RNA-seq
    1.1 Abstract
    1.2 Introduction
    1.3 Materials & Methods
    1.4 Results
    1.5 Discussion
    1.6 Data submission
    1.7 Acknowledgements
    1.8 Author’s contributions

V Conclusion & future perspectives
  1 Conclusion & future perspectives

Bibliography
Nawoord
Curriculum Vitae

Appendix
  A Appendix A
    A.1 Detection of monoallelic DNA methylation
      A.1.1 Samples
      A.1.2 Additional methods
        A.1.2.1 Data-analytical framework: general discussion
        A.1.2.2 Examples
      A.1.3 WGBS datasets
      A.1.4 Illumina Human Body Map 2.0
      A.1.5 SNP tracing and data filtering
      A.1.6 Detection of monoallelically methylated loci
      A.1.7 Functional location
      A.2.2 Allele-specific expression validation
    A.2 A mixture model for the omics-based identification of monoallelically expressed loci and their deregulation in cancer
      A.2.1 Samples
      A.2.3 Data preprocessing
      A.2.4 Detection of (differential) imprinting
        A.2.4.1 Additional data filtering using an empirical Bayes approach
        A.2.4.2 Additional data filtering using a goodness of fit test
        A.2.4.3 An introduction to Likelihood Ratio Tests
        A.2.4.4 PMF calculation: population allele frequency, binomial test and sequencing error rate
        A.2.4.5 Imprinting factor calculation
        A.2.4.6 Detection of differential imprinting
      A.2.5 SNP tracing & data filtering
      A.2.6 Mixture distributions
      A.2.7 Comparison with external references
      A.2.8 Differential imprinting
  B Appendix B
    B.1 Characterization of DNA methylation dynamics in the ontogeny of a songbird
      B.1.1 (Differential) methylation and expression analysis & enrichment analysis
      B.1.2 Validation: single locus specific DNA methylation quantification by CpG pyrosequencing

Preface
Those already familiar with the content of a PhD thesis might notice that this particular one is not