Identification of Functional Genetic Variants in Inflammatory Bowel Disease by Genome and Transcriptome Sequencing
Total Page:16
File Type:pdf, Size:1020Kb
Identification of Functional Genetic Variants in Inflammatory Bowel Disease by Genome and Transcriptome Sequencing Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultät der Christian-Albrechts-Universität zu Kiel vorgelegt von Matthias Barann August, 2012 Kiel, Deutschland 1 2 Referent/in: Prof. Dr. Philip Rosenstiel Korreferent/in: Prof. Dr. Dr. h.c. Thomas Bosch Tag der mündlichen Prüfung: 31.10.2012 Zum Druck genehmigt: 31.10.2012 gez. Prof. Dr. Wolfgang J. Duschl, Dekan 3 Parts of this dissertation are contained in the following two manuscripts. Rosenstiel R, Barann M, Klostermeier UC, Sheth V, Ellinghaus D, Rausch T, Korbel J, Nothnagel M, Krawczak M, Gilissen C, Veltman J, Forster M, Stade B, McLaughlin S, Lee CC, Fritscher-Ravens A, Franke A, Schreiber S. Whole Genome Sequence of a Crohn disease trio Barann M, Klostermeier UC, Esser D, Ammerpohl O, Siebert R, Sudbrak R, Lehrach H, Schreiber S, Rosenstiel R. Janus – Investigating the two faces of transcription 4 Contents 1. Introduction ............................................................................................................................. 7 1.1. Pathogenesis of inflammatory bowel diseases ...................................................................................... 8 1.2. Physical barrier function of the gut ......................................................................................................... 11 1.3. Pathogen recognition ................................................................................................................................... 12 1.4. Complement System ..................................................................................................................................... 13 1.5. Autophagocytosis .......................................................................................................................................... 14 1.6. Unfolded protein response ......................................................................................................................... 15 1.7. Proteasome ....................................................................................................................................................... 16 1.8. Major histocompatibility complex and antigen presentation ....................................................... 17 1.9. Cytokines ........................................................................................................................................................... 18 1.10. Next generation sequencing as diagnostic tool for clinical application .................................. 20 1.11. RNA sequencing ........................................................................................................................................... 21 2. Aims of the study ................................................................................................................... 23 3. Material and methods ........................................................................................................... 24 3.1. Extraction of nucleic acids from blood samples .................................................................................. 24 3.2. Library generation .......................................................................................................................................... 24 3.3. Sequencing ....................................................................................................................................................... 25 3.4. Sequence analysis .......................................................................................................................................... 26 3.5. SNV annotation ............................................................................................................................................... 27 3.6. Detection of de novo SNVs .......................................................................................................................... 28 3.7. Analysis of short structural variations (sSVs) ........................................................................................ 28 3.8. Analysis of long structural variations (lSVs) .......................................................................................... 29 3.9. Demographic origin of sequenced subjects......................................................................................... 29 3.10. Single nucleotide variant distribution along chromosomes ........................................................ 29 3.11. STRING network analysis of missense single nucleotide variants .............................................. 29 3.12. Selection of differentially expressed transcripts ............................................................................... 30 3.13. Regions of homozygosity.......................................................................................................................... 30 3.14. Calculation of child’s CD risk relative to its parents ......................................................................... 31 3.15. SNV verification by Sanger sequencing ............................................................................................... 31 3.16. Identification of sense/antisense pairs in transcriptomic data .................................................... 31 4. Results .................................................................................................................................... 38 4.1. General mapping and variant calling statistics ................................................................................... 38 4.2. Verification of de novo SNVs by Sanger sequencing .......................................................................... 39 5 4.3 Geographic origin determination based on genetic variants ......................................................... 43 4.4. Genetic variants in genes associated with monogenic phenocopies of Crohn’s disease .... 44 4.5. Genetic variants with known CD-association and other variants in the associated genes. 45 4.6. Genetic variants in genes associated with other inflammatory diseases .................................. 52 4.7. SNVs predicted to be damaging including all genes ........................................................................ 53 4.8. Identification of mutational hotspots ..................................................................................................... 55 4.9. Pathway analysis of genes affected by missense variants in the child ....................................... 56 4.10. Expression changes in the child compared to the parents and genetic variants in the respective genes. .................................................................................................................................................... 58 4.11. Strand specific transcriptome analysis ................................................................................................. 58 5. Discussion .............................................................................................................................. 65 5.1. Sequencing and variant calling performance ...................................................................................... 65 5.2. Investigation of de novo variants .............................................................................................................. 66 5.3. Genetic variants concerning monogenic phenocopies of Crohn’s disease .............................. 66 5.4. Crohn’s disease associated risk variants ................................................................................................. 66 5.5. Regions of homozygosity in relation to known Crohn’s disease risk loci .................................. 68 5.6. Genetic variants associated with other inflammatory diseases .................................................... 68 5.7. Everything else - the remaining genetic variability ........................................................................... 69 5.8. Differential expression and genetic variants in differentially expressed genes ...................... 73 5.9. Strand specific transcriptome analysis ................................................................................................... 75 5.10. Conclusions .................................................................................................................................................... 77 5.11. Perspective ..................................................................................................................................................... 79 6. Summary (English) ................................................................................................................ 79 7. Summary (German) ............................................................................................................... 80 8. References ............................................................................................................................. 82 9. List of Figures ........................................................................................................................ 91 10. List of Tables ........................................................................................................................ 92 11. List of abbreviations ..........................................................................................................