Identification of Functional Genetic Variants in Inflammatory Bowel Disease by Genome and Transcriptome Sequencing

Identification of Functional Genetic Variants in Inflammatory Bowel Disease by Genome and Transcriptome Sequencing Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultät der Christian-Albrechts-Universität zu Kiel vorgelegt von Matthias Barann August, 2012 Kiel, Deutschland 1 2 Referent/in: Prof. Dr. Philip Rosenstiel Korreferent/in: Prof. Dr. Dr. h.c. Thomas Bosch Tag der mündlichen Prüfung: 31.10.2012 Zum Druck genehmigt: 31.10.2012 gez. Prof. Dr. Wolfgang J. Duschl, Dekan 3 Parts of this dissertation are contained in the following two manuscripts. Rosenstiel R, Barann M, Klostermeier UC, Sheth V, Ellinghaus D, Rausch T, Korbel J, Nothnagel M, Krawczak M, Gilissen C, Veltman J, Forster M, Stade B, McLaughlin S, Lee CC, Fritscher-Ravens A, Franke A, Schreiber S. Whole Genome Sequence of a Crohn disease trio Barann M, Klostermeier UC, Esser D, Ammerpohl O, Siebert R, Sudbrak R, Lehrach H, Schreiber S, Rosenstiel R. Janus – Investigating the two faces of transcription 4 Contents 1. Introduction ............................................................................................................................. 7 1.1. Pathogenesis of inflammatory bowel diseases ...................................................................................... 8 1.2. Physical barrier function of the gut ......................................................................................................... 11 1.3. Pathogen recognition ................................................................................................................................... 12 1.4. Complement System ..................................................................................................................................... 13 1.5. Autophagocytosis .......................................................................................................................................... 14 1.6. Unfolded protein response ......................................................................................................................... 15 1.7. Proteasome ....................................................................................................................................................... 16 1.8. Major histocompatibility complex and antigen presentation ....................................................... 17 1.9. Cytokines ........................................................................................................................................................... 18 1.10. Next generation sequencing as diagnostic tool for clinical application .................................. 20 1.11. RNA sequencing ........................................................................................................................................... 21 2. Aims of the study ................................................................................................................... 23 3. Material and methods ........................................................................................................... 24 3.1. Extraction of nucleic acids from blood samples .................................................................................. 24 3.2. Library generation .......................................................................................................................................... 24 3.3. Sequencing ....................................................................................................................................................... 25 3.4. Sequence analysis .......................................................................................................................................... 26 3.5. SNV annotation ............................................................................................................................................... 27 3.6. Detection of de novo SNVs .......................................................................................................................... 28 3.7. Analysis of short structural variations (sSVs) ........................................................................................ 28 3.8. Analysis of long structural variations (lSVs) .......................................................................................... 29 3.9. Demographic origin of sequenced subjects......................................................................................... 29 3.10. Single nucleotide variant distribution along chromosomes ........................................................ 29 3.11. STRING network analysis of missense single nucleotide variants .............................................. 29 3.12. Selection of differentially expressed transcripts ............................................................................... 30 3.13. Regions of homozygosity.......................................................................................................................... 30 3.14. Calculation of child’s CD risk relative to its parents ......................................................................... 31 3.15. SNV verification by Sanger sequencing ............................................................................................... 31 3.16. Identification of sense/antisense pairs in transcriptomic data .................................................... 31 4. Results .................................................................................................................................... 38 4.1. General mapping and variant calling statistics ................................................................................... 38 4.2. Verification of de novo SNVs by Sanger sequencing .......................................................................... 39 5 4.3 Geographic origin determination based on genetic variants ......................................................... 43 4.4. Genetic variants in genes associated with monogenic phenocopies of Crohn’s disease .... 44 4.5. Genetic variants with known CD-association and other variants in the associated genes. 45 4.6. Genetic variants in genes associated with other inflammatory diseases .................................. 52 4.7. SNVs predicted to be damaging including all genes ........................................................................ 53 4.8. Identification of mutational hotspots ..................................................................................................... 55 4.9. Pathway analysis of genes affected by missense variants in the child ....................................... 56 4.10. Expression changes in the child compared to the parents and genetic variants in the respective genes. .................................................................................................................................................... 58 4.11. Strand specific transcriptome analysis ................................................................................................. 58 5. Discussion .............................................................................................................................. 65 5.1. Sequencing and variant calling performance ...................................................................................... 65 5.2. Investigation of de novo variants .............................................................................................................. 66 5.3. Genetic variants concerning monogenic phenocopies of Crohn’s disease .............................. 66 5.4. Crohn’s disease associated risk variants ................................................................................................. 66 5.5. Regions of homozygosity in relation to known Crohn’s disease risk loci .................................. 68 5.6. Genetic variants associated with other inflammatory diseases .................................................... 68 5.7. Everything else - the remaining genetic variability ........................................................................... 69 5.8. Differential expression and genetic variants in differentially expressed genes ...................... 73 5.9. Strand specific transcriptome analysis ................................................................................................... 75 5.10. Conclusions .................................................................................................................................................... 77 5.11. Perspective ..................................................................................................................................................... 79 6. Summary (English) ................................................................................................................ 79 7. Summary (German) ............................................................................................................... 80 8. References ............................................................................................................................. 82 9. List of Figures ........................................................................................................................ 91 10. List of Tables ........................................................................................................................ 92 11. List of abbreviations ..........................................................................................................

Identification of Functional Genetic Variants in Inflammatory Bowel Disease by Genome and Transcriptome Sequencing

NIH Public Access Author Manuscript Reprod Biomed Online

Two Locus Inheritance of Non-Syndromic Midline Craniosynostosis Via Rare SMAD6 and 4 Common BMP2 Alleles 5 6 Andrew T

High-Density P300 Enhancers Control Cell State Transitions Steven Witte1,2, Allan Bradley2, Anton J

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

Genetic Analysis of Hereditary Sensory, Motor and Autonomic

Supplementary Table 1: Adhesion Genes Data Set

Essential Genes and Their Role in Autism Spectrum Disorder

The Interactome of KRAB Zinc Finger Proteins Reveals the Evolutionary History of Their Functional Diversification

Learning from Cadherin Structures and Sequences: Affinity Determinants and Protein Architecture

Supplementary Table S4. FGA Co-Expressed Gene List in LUAD

Aneuploidy: Using Genetic Instability to Preserve a Haploid Genome?

A Common Analgesic Enhances the Anti-Tumour Activity of 5-Aza-2’- Deoxycytidine Through Induction of Oxidative Stress