Systems Genetics in Polygenic Autoimmune Diseases
Total Page:16
File Type:pdf, Size:1020Kb
From the Institute of Experimental Dermatology Director: Prof. Saleh Ibrahim Systems genetics in polygenic autoimmune diseases Dissertation for Fulfillment of Requirements for the Doctoral Degree of the University of Lübeck from the Department of Medicine Submitted by Yask Gupta from New-Delhi, India Lübeck 2015 i First referee:- Prof. Dr. med Saleh Ibrahim Second referee:- Prof. Dr. rer. nat. Jeanette Erdman Chairman:- Prof. Dr. Horst-Werner Stüzbecher Data of oral examination:- 7 June 2016 ii iii Contents 1. Introduction 9 1.1 Autoimmune diseases 3 1.2 Autoimmune bullous diseases 4 1.3 Epidermolysis bullosa acquisita (EBA): an autoantibody-mediated autoimmune skin disease 5 1.4 Role of miRNA in autoimmune diseases 5 1.5 Genes in autoimmune diseases 7 1.6 Gene networks 8 1.6.1 Gene/Protein interaction database 9 1.6.2 Ab intio prediction of gene networks 11 1.7 QTL and expression QTL 12 1.8. Prediction of ncRNA 13 1.8.1 Support vector machine and random forest 15 1.9. Structure of the thesis 18 2. Aims of the thesis 21 3. Materials and methods 23 3.1 Generation of a four way advanced intercross line 23 iv 3.2 Experimental EBA and observation protocol 24 3.2.1 Generation of recombinant peptides 24 3.3 Extraction of genomic DNA for genotypic analysis 25 3.4 Generation of microarray data (miRNAs/genes) and bioinformatics analysis 25 3.5 Software for Expression QTL analysis 26 3.5.1 HAPPY 27 3.5.2 EMMA 28 3.5.3 Implementation of software for expression QTL mapping 30 3.6 Ab initio Gene and microRNAs networks prediction 31 3.6.1 MicroRNAs 32 3.6.2 Genes 33 3.6.3 MicroRNAs gene target prediction 34 3.7 Visualization of interaction networks 35 3.8 Network databases (STRING and IPA) 36 3.9 Prediction of ncRNA tools 37 3.9.1 Dataset 38 3.9.2 Features for classification 39 3.9.3 Classification system 41 v 3.9.4 Implementation of support vector machines and random forest on the post transcriptional non coding RNA dataset 42 3.9.5 Work flow and output of ptRNApred 45 4. Results 47 4.1 MicroRNA Expression Profiling 47 4.1.1 Expression QTL mapping 47 4.1.2 Epistasis in miRNA 54 4.1.3 Genetic overlap of ASBD QTL (EBA) and expression QTL (miRNAs) 59 4.1.4 Expression and co-expression of miRNAs in ASBD (EBA) 61 4.1.5 Causal miRNA network in skin 65 4.2 Gene expression profiling 67 4.2.1 Differentially expressed and co-expressed genes in EBA 68 4.2.2 Expression QTL mapping 74 4.2.3 Combining protein interaction networks with eQTL 80 4.3 MicroRNA – Gene target prediction 88 4.4 Prediction of non-coding RNA 92 4.4.1 Validation of the method 95 4.4.2 Validation of the algorithm 97 vi 4.4.3 Validation of the feature number 97 4.4.4 Variable importance 99 4.4.5 Performance on a non-eukaryotic system 101 4.4.6 Performance on mRNA 101 5. Discussion 102 5.1 Candidate gene for regulation of miRNA 102 5.2 Regulating QTL for gene-expression 105 5.3 miRNA-targets in EBA 109 5.4 ptRNA-prediction tool 111 6. Conclusions 115 7. Summary 116 8. Zusammenfassung 120 9. References 125 10.1 List of abbreviations 139 10.2 List of figures 141 10.3 List of tables 146 11. Acknowledgements 148 12. Curriculum vitae 150 vii 13. Supplements 153 Section S1: Features for classification. 153 Set 1: Dinucleotide properties 153 Set 2: Secondary structure properties 154 a. The Minimum free energy (MFE) structure 154 b. The ensemble free energy 156 c. The centroid structure 156 d. The maximum expected accuracy (MEA) structure 157 e. The frequency of the MFE representative in the complete ensemble of secondary structures and the ensemble diversity 157 viii 1. Introduction The complexity of the human body has fascinated scientists for centuries. To study it, the biology-based interdisciplinary field called ‘systems biology’ was introduced (Kitano, 2002a, b). According to its main principle, in order to understand a biological system, one can modulate it, monitor the response, and formulate a mathematical model to describe those semantics. In system genetics, natural variation in the whole population perturbs biological processes (Civelek and Lusis, 2014; Farber, 2012; Feltus, 2014). Thus, the ultimate goal of system genetics is to understand the complexities of biological traits, by the identification of genes, pathways and networks associated with human diseases (van der Sijde et al., 2014). Genome-wide association studies (GWAS) have enabled the identification of genetic polymorphisms (SNPs) that are correlated to disease phenotypes (Gregersen et al., 2012). However, while data from GWAS studies may statistically determine single gene associations, it is not sufficient to explain molecular mechanisms (Farber, 2012). Thus, the development of adequate mouse models mimicking human diseases is required (Kung and Huang, 2001). Previously, the generation of recombinant inbred mouse strains was used as a powerful tool for mapping the genomic loci underlying ix the complex traits (QTLs) (Gonzales and Palmer, 2014). However, this data requires further fine mapping to eventually indicate the exact causal genes of deviant phenotypes (Mott and Flint, 2002). To shorten this process and dissect the molecular effects, gene expression data from inbred strains can be combined with genomic variations to identify the genes that control the traits (eQTL) (Cookson et al., 2009). The discovery of micro RNA (miRNA) introduced an additional level of complexity to the molecular mechanisms leading to disease (Lee et al., 1993). This class of small non-coding RNA molecules has been reported to control gene expression by diverse mechanisms (Capuano et al., 2011; Dai et al., 2013; Paraboschi et al., 2011). Similar to eQTL, the genomic loci controlling the miRNA expression can also be derived using chip technology (Liu et al., 2012). An observed statistical association of miRNA levels with mRNA levels is indicative of a mutual or joint genetic control. We seek those genetic loci for which variations affect these statistical interactions. Preferably, that locus is close (‘cis’) to the regions coding either of the two interacting genes. This, in turn, can lead to the identification of possible molecular pathways and help to unravel new therapeutic approaches. Recently, additional classes of non-coding RNAs such as snoRNA, snRNA, telomerase RNA etc. have been discovered to contribute to disease phenotypes (Esteller, 2011). Therefore, to gain an overview of the complete gene network, the integration of additional classes of non-coding RNA beyond miRNA is required. 2 However, the current computational algorithms cannot yet predict all the existing classes of non-coding RNA. Hence, in this work, I sought to address the following aims: i) To find the possible interaction network of genes and miRNAs that play an important role in the pathogenesis of autoimmune skin disease, using experimental Epidermolysis Bullosa Acquisita (EBA) as a model for autoimmune diseases. ii) To develop software which can predict different classes of post-transcriptional RNAs (a class of non-coding RNA) based on their sequences and structural properties using machine learning algorithms. 1.1 Autoimmune diseases One of the main functions of the immune system is to discriminate between self and non-self antigens (Parkin and Cohen, 2001). Failure to differentiate between self and foreign antigens can lead to a sustained immune response and development of autoimmune diseases (ADs). More than 80 ADs have been identified, affecting nearly 100 million people worldwide (Cooper et al., 2009; Cooper and Stroehla, 2003; Jacobson et al., 1997). ADs are differentiated from other types of diseases on the basis of of Witebsky´s postulates. According to these postulates ADs should fulfil at least two or more of the following criteria: 1) a specific adaptive immune response directed to an affected tissue or organ; 2) presence of auto-reactive T cells or auto- antibodies (AAbs) in the affected organ or tissue; 3) induction of the disease in healthy humans or animals by transfer of auto-reactive T cells or AAbs; 4) induction 3 of the disease in animals by immunization with auto-antigen; 5) immune- modulation and suppression of autoimmune response. Numerous risk factors have been associated with ADs, however the complete pathoaetiology of these diseases remains unknown (Rioux and Abbas, 2005). Nonetheless, it is clear that both genetic and environmental factors play a critical role in the pathology of autoimmunity. 1.2 Autoimmune bullous diseases Production of auto-antibodies (AAbs) directed against different structural proteins of the skin and the mucous membranes lead to the development of organ-specific autoimmune disorders in the skin called autoimmune bullous disease (AIBDs) (Sticherling and Erfurt-Berge, 2012). AIBDs can be classified into four distinct groups based on their clinical, histological and immunopathological criteria, i.e. pemphigus and the group of pemphigoid diseases, and dermatitis herpetiformis Duhring (Zillikens, 2008). In the group of pemphigus diseases, AAbs are directed against desmosomal proteins, leading to a loss of adhesion between adjacent epidermal keratinotcytes and intra-epithelial blister formation (Amagai et al., 1995). In pemphigoid diseases, AAbs are formed against hemidesmosomal proteins, resulting in the dysadhesion of basal keratinocytes to the underlying basement membrane (BM) and formation of subepidermal blisters (Borradori and Sonnenberg, 1999). 4 1.3 Epidermolysis bullosa acquisita (EBA): an autoantibody-mediated autoimmune skin disease Epidermolysis bullosa acquisita (EBA) is a severe chronic inflammatory subepidermal ABD which is characterized by the production of AAbs against type VII collagen (Ludwig, 2013). Type VII collagen is a major component of anchoring fibrils that is situated at the DEJ (dermal epidermal junction).