Role of Genomic Variants in the Response to Biologics Targeting Common Autoimmune Disorders
by Gordana Lenert, PhD

A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Master of Science

Ottawa-Carleton Joint Program in Bioinformatics
Carleton University
Ottawa, Canada

© 2016 Gordana Lenert

Abstract

Autoimmune diseases (AID) are common chronic inflammatory conditions initiated by the loss of immunological tolerance to self-antigens. Chronic immune response and uncontrolled inflammation provoke diverse clinical manifestations, causing impairment of various tissues, organs or organ systems. To avoid disability and death, AID must be managed in clinical practice over long periods with complex and closely controlled medication regimens. Anti-tumor necrosis factor biologics (aTNFs) are targeted therapeutic drugs used for AID management. However, despite being very successful therapeutics, aTNFs fail to induce remission in one third of AID phenotypes.

In our research, we investigated the genomic variability of AID phenotypes in order to explain the unpredictable lack of response to aTNFs. Our hypothesis is that the key genetic factors responsible for aTNF unresponsiveness are positioned at the crossroads between the aTNF therapeutic processes that generate remission and the pathogenic, or disease, processes that lead to the expression of AID phenotypes. To find these key genetic factors at the intersection of the curative and the disease pathways, we combined genomic variation data collected from publicly available, curated AID genome-wide association studies (AID GWAS) for each disease. Using the collected data, we prioritized genes and other genomic structures, defined the key disease pathways and networks, and related the results to known data using bioinformatics approaches. We queried the AID results against known data on the aTNF interventional pathways.
Our findings allowed us to infer potential genetic factors and pathways responsible for the therapeutic effects of aTNFs in AID. A multitude of publicly available bioinformatics tools and databases allowed us to extract the knowledge, analyse it and provide orthogonal evidence for the results. The results support the existence of at least two different sets of pathways responsible for AID pathology, most probably reflecting subtypes of AID. Only one set of pathways might be influenced by aTNFs, which offers an explanation of why aTNF therapy is not always effective. Additionally, our results narrow down the complex common genomic variability responsible for aTNF unresponsiveness to a set that could be tested in the future. If our results are confirmed by functional assays and/or clinical trials, the lack of response could be predicted ahead of aTNF therapy, leading to better patient selection and improved prognostic outcomes. Given the extremely high cost of standard aTNF therapy, the cost of genotyping ahead of medication would be easily justified.

Table of Contents

ABSTRACT
1. INTRODUCTION
  1.1. Motivation
  1.2. Autoimmune diseases: burden in society, nature and treatment
    1.2.1. Genetics of autoimmune diseases
  1.3. Non-responsiveness to anti-TNF therapy
    1.3.1. Tumor necrosis factor alpha
    1.3.2. Anti-tumor necrosis factor biologics
    1.3.3. Therapeutic role of aTNFs
    1.3.4. Efficacy and safety of aTNFs
  1.4. GWAS, variant database, tools, and analyses; genotype-phenotype relationship
    1.4.1. Importance of genome wide association studies
    1.4.2. GWAS methodology
      1.4.2.1. Human Genome Project
      1.4.2.2. HapMap Project
      1.4.2.3. Linkage Disequilibrium
      1.4.2.4. 1000 Genomes Project
      1.4.2.5. The Encyclopedia of DNA Elements (ENCODE) project
      1.4.2.6. Genotyping biochip
      1.4.2.7. Other advancements contributing to GWAS implementation
    1.4.3. GWAS database
  1.5. Genotype–phenotype relationship in AID
    1.5.1. From SNPs and genes to molecular networks and pathways
    1.5.2. Basic concepts of biological networks and pathways
      1.5.2.1. Biological pathways
      1.5.2.2. Biological networks
  1.6. Biological Ontologies
    1.6.1. Gene Ontology
  1.7. Research Approach
2. MATERIAL AND METHODS
  2.1. Data collection of GWAS AID association SNPs
    2.1.1. Software tools and databases used for data collection
      2.1.1.1. NHGRI Catalog
      2.1.1.2. PheGenI
  2.2. SNP-gene analyses: mapping AID SNP to genes and other genomic structures
    2.2.1. Rationale
    2.2.2. Procedure
    2.2.3. Software tools and databases
      2.2.3.1. HaploReg
      2.2.3.2. RegulomeDB
      2.2.3.3. dbSNP
    2.2.4. Evaluation of missense SNPs impact on proteins coded by missSNP AID genes
      2.2.4.1. Rationale
      2.2.4.2. PolyPhen-2 software tool and procedure
  2.3. Functional analyses of GWAS AID SNPs: gene-pathway prioritization
    2.3.1. Rationale
    2.3.2. AID GWAS SNP pathway analysis
      2.3.2.1. Rationale
      2.3.2.2. Procedure for finding AID SNP pathways
        2.3.2.2.1. Manual search for AID SNP pathways
        2.3.2.2.2. AID SNP pathway enrichment analyses
        2.3.2.2.3. Parsing of the AID SNP pathways
    2.3.3. AID SNP GWAS network analysis
      2.3.3.1. Cytoscape
        2.3.3.1.1. Rationale
        2.3.3.1.2. Procedure
  2.4. Functional analyses of AID GWAS SNP data using Gene Ontology
    2.4.1. Gene Ontology Enrichment Analysis
    2.4.2. Rationale
    2.4.3. Procedure
  2.5. Software tools and databases used in prioritization and GO enrichment analyses
    2.5.1. STRING
      2.5.1.1. Procedure
    2.5.2. ConsensusPathDB
      2.5.2.1. Procedure
    2.5.3. DiseaseConnectDB
  2.6. Finding interactions of AID GWAS SNPs with non-coding RNAs
    2.6.1. Rationale
    2.6.2. Procedure
  2.7. Overview of the bioinformatics software tools and databases used in the study
3. RESULTS
  3.1. AID-associated single nucleotide polymorphisms (AID SNPs)
  3.2. Gene candidate identification
    3.2.1. Identification of gene candidate dataset for coding non-synonymous AID SNPs (missense SNP gene dataset)
      3.2.1.1. Missense AID SNP dataset evaluation
      3.2.1.2. Assessment of additional genes in LD with missense SNPs
    3.2.2. Identification of gene candidate dataset based on non-coding AID GWAS SNPs (ncSNP gene dataset)
      3.2.2.1. Evaluation of non-coding SNP dataset
      3.2.2.2. Functional assessment of non-coding RNA genes
      3.2.2.3. Functional assessment of microRNA genes
  3.3. Functional analyses of GWAS AID SNP gene set: gene-pathway prioritization
      3.3.1.2. Network analyses using STRING software tools (STRING network analyses)
    3.3.2. Pathway analyses
      3.3.2.1. Identification of KEGG pathway dataset for the GWAS AID SNP genes
        3.3.2.1.1. Identification of KEGG pathway dataset for the GWAS AID missSNP genes
        3.3.2.1.2. Identification of KEGG pathway dataset for the GWAS AID ncSNP genes
        3.3.2.1.3. Classification of AID SNP KEGG pathway into modules
      3.3.2.2. Identification of KEGG pathways of five anti-TNF biologics and TNF
      3.3.2.3. Curative pathways: KEGG pathway dataset common to both GWAS AID SNP pathways and TNF signaling pathways
      3.3.2.4. Pathways enrichment for AID GWAS SNP gene sets by STRING
      3.3.2.5. Pathways enrichment of AID GWAS SNP genes by ConsensusPathDB
  3.4. Functional analyses of AID GWAS SNP data using Gene Ontology
    3.4.2. GO term enrichments by ConsensusPathDB
  3.5. Disease prioritization
    3.5.1. Disease Connect DB enrichment for AID SNP dataset
    3.5.2. STRING disease enrichment for AID GWAS SNP dataset
4. DISCUSSION
  4.1. GWAS AID Associations
    4.1.1. P value
    4.1.2. Rare vs. common (and low frequency common) SNPs
    4.1.3. Genomic context, location, and LD of associated AID SNPs
    4.1.4. Why there are no AID SNPs located on sex chromosomes?
    4.1.5. Immunochip and its influence on AID SNPs
  4.2. Functional effects of SNP-gene prioritization
    4.2.1. Functional effects of AID missense SNPs
    4.2.2. Regulatory effect of AID missense SNPs
    4.2.3. Functional effects of AID synonymous coding SNPs
    4.2.4. Functional effect of non-coding AID SNPs
    4.2.5. Regulatory effect of AID ncSNP associations
    4.2.6. Genes linked to ncSNPs
    4.2.7. Relevance of eQTL results linked to AID SNPs
    4.2.8. MicroRNA genes
  4.3. Network and pathway analyses
    4.3.1. Network analyses or gene-network prioritization
    4.3.2. Pathway analyses or gene-pathway prioritization
  4.4. Gene Ontology helps clarify the function of GWAS AID SNPs
    4.4.1. GO disease terms enrichment for AID GWAS SNP gene sets
  4.5. Pleiotropy at the gene and disease level
5. CONCLUSION
6. REFERENCES
7. TABLES AND FIGURES
  7.1. Tables:
    Table 1. AID-associated SNPs and genes for each disease
    Table 2. Missense GWAS AID SNPs characteristics: I part (amino acid change caused by missSNPs, PolyPhen-2 evaluation of functional consequences)
    Table 3.