
<p><strong>Identifying Novel Disease-variants and Understanding the </strong><br><strong>Role of the STAT1-STAT4 Locus in SLE </strong></p><p>A dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of </p><p>Doctor of Philosophy </p><p>in the Immunology Graduate Program of the College of Medicine by </p><p>Zubin H. Patel <br>B.S., Worcester Polytechnic Institute, 2009 </p><p>John B. Harley, M.D., Ph.D., Committee Chair <br>Gurjit Khurana Hershey, M.D., Ph.D. <br>Leah C. Kottyan, Ph.D. <br>Harinder Singh, Ph.D. <br>Matthew T. Weirauch, Ph.D. </p><p><strong>Abstract </strong></p><p>Systemic Lupus Erythematosus (SLE), or lupus, is an autoimmune disorder caused by an overactive immune system in which both innate and adaptive immune pathways are dysregulated. It can affect all major organ systems and may lead to inflammation of the serosal and mucosal surfaces. Lupus pathogenesis is driven by genetic factors, environmental factors, and gene-environment interactions. Heredity accounts for a substantial proportion of SLE risk, and the role of specific genetic risk loci is well established. Identifying the causal genetic variants and their underlying molecular mechanisms has been a major area of investigation. This thesis describes two efforts: the development of an analytical approach that identifies candidate rare variants from trio analyses, and a fine-mapping analysis of the <em>STAT1-STAT4 </em>locus, a well-replicated SLE-risk locus. For the <em>STAT1-STAT4 </em>locus, subsequent functional studies demonstrated genotype-dependent gene expression, transcription factor binding, and DNA regulatory activity. <br>Rare variants are defined as variants with an allele frequency of less than 1% in ancestral populations. Associating these variants with disease remains a statistical and logistical challenge because a well-powered study requires a large number of individuals. 
Trio-based approaches are currently used to identify disease-associated rare variants. We developed a novel analytical approach that analyzes high-depth sequencing data from trio studies to nominate candidate disease-associated rare variants. Our approach applies multiple low-stringency filters that remove sequencing artifacts while retaining true variation for analysis. We tested the approach on a single trio sequenced from three different DNA sources (blood, buccal, and saliva), and it increased the concordance between the three sources from 96% to 99.999%. We implemented the approach in CASSI (Cincinnati Analytical Suite for Sequencing Informatics), an analytical suite for use by the broader research community. </p><p>Common variants are defined as variants with a minor allele frequency greater than 1%. Many common variants have been associated with lupus through genome-wide association (GWA) studies and targeted sequencing studies. One of the best-replicated lupus associations lies in the <em>STAT1-STAT4 </em>locus. More than 25 studies have identified this association across multiple ancestries, but a causal variant had not been identified. Using a fine-mapping analysis with complementary frequentist and Bayesian approaches, we nominate a single-nucleotide polymorphism, rs11889341, as causal. We show that this variant is shared across all ancestries in our trans-ancestral discovery and replication cohorts. We also validate the allele-dependent function of this variant by demonstrating an expression quantitative trait locus (eQTL), by identifying differential transcription factor binding, and by showing allele-dependent gene-regulatory activity. Overall, our analytical approaches enable the identification and functional validation of novel risk variants, both common and rare. Our biological experiments suggest a potential mechanism for the association between rs11889341 and lupus risk. 
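</p><p>The Bayesian half of the fine-mapping summarized above reports its results as 99% credible sets (as the Chapter 3 figure titles describe): the smallest group of variants whose posterior probabilities of being causal sum to at least 0.99. The Python sketch below shows one common way to build such a set. It is illustrative only, and its modeling choices are assumptions not stated in this abstract: exactly one causal variant per locus, an equal prior probability for every variant, and Wakefield's approximate Bayes factor with a hypothetical prior effect variance of 0.04 on the log-odds scale. The function names and example variants are hypothetical.</p>

```python
import math
from typing import Dict, List, Tuple


def log_approx_bayes_factor(beta: float, se: float, prior_var: float = 0.04) -> float:
    """Log of Wakefield's approximate Bayes factor favoring association over the null.

    beta, se: per-variant log-odds estimate and its standard error.
    prior_var: assumed prior variance W of the true effect (hypothetical default).
    """
    v = se ** 2                            # sampling variance V of the estimate
    z_sq = (beta / se) ** 2                # squared Wald statistic
    shrink = prior_var / (v + prior_var)   # W / (V + W)
    # log ABF = 0.5 * log(V / (V + W)) + (z^2 / 2) * W / (V + W)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z_sq * shrink


def credible_set(stats: Dict[str, Tuple[float, float]],
                 level: float = 0.99,
                 prior_var: float = 0.04) -> List[Tuple[str, float]]:
    """Return (variant, posterior probability) pairs that form the credible set.

    Assumes a single causal variant and a uniform prior over variants, so each
    posterior probability is that variant's ABF divided by the sum of all ABFs.
    """
    log_abf = {name: log_approx_bayes_factor(b, s, prior_var)
               for name, (b, s) in stats.items()}
    top = max(log_abf.values())            # subtract the maximum to avoid overflow
    weights = {name: math.exp(x - top) for name, x in log_abf.items()}
    total = sum(weights.values())
    ranked = sorted(((name, w / total) for name, w in weights.items()),
                    key=lambda item: item[1], reverse=True)
    chosen: List[Tuple[str, float]] = []
    cumulative = 0.0
    for name, pp in ranked:                # add variants until the target mass is reached
        chosen.append((name, pp))
        cumulative += pp
        if cumulative >= level:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical summary statistics: variant -> (log-odds estimate, standard error).
    example = {"var_a": (0.30, 0.03), "var_b": (0.29, 0.03), "var_c": (0.05, 0.04)}
    for name, pp in credible_set(example):
        print(f"{name}\t{pp:.3f}")
```

<p>With these made-up inputs, var_a and var_b carry similar evidence, so both are needed to reach 99% cumulative probability, whereas the weakly associated var_c is excluded; lowering the level to 0.9 leaves var_a alone. Computing such sets separately in each ancestral cohort and comparing them is one way to look for a variant shared across ancestries. The single-causal-variant assumption is the main simplification to revisit before applying this to real data.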
</p><p><strong>Acknowledgements </strong></p><p>The work described in this thesis would not have been possible without the support and guidance of the following individuals. I would like to thank my PhD mentor <strong>Dr. John B. Harley </strong>for providing me with invaluable guidance and mentorship throughout my PhD. His efforts in securing funding for my project have allowed me to use state-of-the-art analytical and experimental technologies throughout my studies. He has challenged me to improve as a scientist and allowed me to experience some failures so that I could learn from them and improve. I would like to thank <strong>Dr. Kenneth Kaufman</strong>, who has generously offered his time and expertise to help me develop as a bioinformatician. He patiently taught me about next-generation sequencing and helped me develop the CASSI story. He always made himself available to answer my questions about the genetic datasets while I was analyzing the <em>STAT1-STAT4 </em>locus. I would like to thank <strong>Dr. Louis Muglia </strong>for going above and beyond his role as my Senior Physician-Scientist mentor. Dr. Muglia has been an advocate for me throughout my PhD and has offered key insights into my projects. I have always enjoyed our quarterly meetings, where we have talked not only about my research but also about my life outside of research. I would like to thank <strong>Dr. Harinder Singh </strong>for serving on my thesis committee and for his mentorship throughout my PhD studies. He has challenged me during my committee meetings and made me a more sophisticated thinker. It has been a privilege to learn from a pioneer in the field of transcription factor signaling. 
</p><p>I would like to thank <strong>Arthur Lynch</strong>, <strong>Joshua Lee</strong>, <strong>Connor Schroeder</strong>, and <strong>Avery Maddox </strong>for their help with the cloning projects. </p><p>I would like to thank <strong>Dr. David Hildeman </strong>for providing me with encouragement and critical feedback during my first couple of years in graduate school. I would like to thank <strong>Isabel Castro </strong>for all of her help navigating the administrative intricacies of graduate school. Isabel has always been willing to answer my numerous questions about graduate school paperwork. I would like to thank <strong>Dr. Neeru Hershey </strong>for serving on my PhD committee and for leading the MD/PhD program at UC. On my committee, Dr. Hershey has provided helpful critical feedback that improved my scientific thinking. Outside of my committee, Dr. Hershey has always been a great physician-scientist role model. She has always been receptive to my feedback and has worked tirelessly to improve the MD/PhD program, and I have reaped the benefits of that work as an MD/PhD student. I would like to thank <strong>Dr. Erin Zoller </strong>for her critical feedback throughout my graduate studies. Erin challenged me from the first day to think through the scientific method, and her constant critical feedback allowed me to develop as a scientist. I would like to thank <strong>Daniel Miller </strong>for being a great friend and colleague. Daniel has been an invaluable resource in troubleshooting and improving my experiments. I would like to thank <strong>Carmy Forney </strong>for being a great friend and colleague. She provided invaluable help with the luciferase experiments. I would like to thank <strong>Xiaoming Lu </strong>for being a great friend and fellow graduate student. Xiaoming has provided me with a great deal of guidance throughout my PhD project. We have had the opportunity to collaborate on projects and learn from each other’s experience. Xiaoming has also been a great sounding board for my analytical and experimental ideas. I would like to thank <strong>Dr. Matthew Weirauch </strong>for his invaluable mentorship. He graciously agreed to be a co-mentor for my application to join the Multidisciplinary Training Grant. As my co-mentor, Matt has provided valuable feedback on the experiments validating my genetic findings in the <em>STAT1-STAT4 </em>locus. Matt has also provided critical feedback on my oral presentations and helped me improve as a presenter. I owe a great deal of gratitude to <strong>Dr. Leah C. Kottyan</strong>. She has always been an advocate and mentor for me, and I would not have been able to get through my graduate studies without her support. She has always been available to help me with both my analyses and my experiments. Through her constant mentorship, I have become a better scientist. I would like to dedicate this thesis to my parents, <strong>Hasmukh </strong>and <strong>Madhuri Patel</strong>, and my wife, <strong>Shradha Patel</strong>. I would not have had the opportunity to study in a PhD program without my parents’ efforts in immigrating to this country; my success rests on the foundation of their sacrifices. A special thanks goes to Shradha for being a supportive partner through the highs and lows of graduate school. She has been patient and understanding throughout my graduate studies. A special thanks also goes to my brother <strong>Kaushal Patel</strong>, who started graduate school with me. It has been great to have someone with whom I could commiserate over the downs and celebrate the ups of graduate studies. 
</p><p><strong>Table of Contents </strong></p><p><strong>Abstract <br>Acknowledgements <br>Table of Contents <br>List of Tables and Figures <br>List of Abbreviations <br>Chapter 1: Introduction </strong></p><p><em>Immune System in Health and Disease </em></p><p>Hypersensitivity <br>Immunodeficiency </p><p><a href="#0_1"><em>Systemic Lupus Erythematosus </em></a></p><p>Clinical Features & Diagnostic Criteria <br>Prevalence of Lupus <br>Pathoetiology of Lupus <br>Cytokines in Lupus <br>Type I interferons <br><a href="#0_0">Type II Interferons </a><br>Interleukin-17 (IL-17) <br>Interleukin-6 (IL-6) <br>Interleukin-21 (IL-21) <br><a href="#0_9">Interleukin-2 (IL-2) </a><br>B-cell Activating Factor (BAFF) <br>Immune cell dysfunction in lupus <br>B-cells <br>T-cells <br>Dendritic Cells <br>Monocytes <br>Neutrophils <br>Complement system in lupus <br>Mouse Models of Lupus <br>Spontaneous Models of Lupus <br>Induced Models of Lupus <br>Lupus Therapeutics </p><p><a href="#0_3"><em>Identifying the Genetic Component of SLE </em></a></p><p>Linkage Studies <br>Genome-wide Association Studies (GWAS) <br>Monogenic Lupus <br>Identifying Coding variants associated with Lupus </p><p><a href="#0_3"><em>Statistical Methods to identify lupus risk variants </em></a></p><p>Common Variants <br>Frequentist Methods <br>Bayesian Methods <br>Identifying Rare Variants in Association Studies <br>Sequencing Based Approach to identify rare variants <br><a href="#0_23">Identifying expression quantitative trait loci (eQTLs) </a></p><p><a href="#0_5"><em>Regulation of Gene Expression </em></a></p><p>Transcription Factors <br>Alternative Splicing <br>RNA Interference <br>Long non-coding RNA (lncRNA) <br>RNA-binding Proteins </p><p><a href="#0_3"><em>STAT1 & STAT4 Proteins </em></a></p><p>STAT1 & STAT4 Function in the Immune Signaling <br>Role of STAT1 and STAT4 in Lupus Pathogenesis <br>Genetic Associations with STAT1/STAT4 locus in SLE </p><p><a href="#0_28"><em>Conclusion </em></a></p><p><a href="#0_3"><strong>Chapter 2: The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors </strong></a></p><p><a href="#0_3"><strong>Chapter 3: A plausibly causal functional lupus-associated risk variant in the STAT1-STAT4 locus </strong></a></p><p><strong>Chapter 4: Discussion </strong></p><p><em>Summary of Chapter 2 </em><br><a href="#0_30"><em>Summary of Chapter 3 </em></a><br><a href="#0_3"><em>Implication of increased STAT1 expression </em></a><br><a href="#0_31"><em>Use of Lymphoblastoid Cell-lines (LCLs) to study rs11889341 biology </em></a><br><a href="#0_31"><em>Identification of STAT1 or STAT4 eQTL in other relevant immune cell subtypes </em></a><br><a href="#0_32"><em>Understanding the role of HMGA1 in rs11889341 biology </em></a><br><a href="#0_3"><em>Determining whether rs11889341 is causal </em></a><br><em>STAT1-STAT4 association in other autoimmune disease </em><br><a href="#0_33"><em>Evolution of the STAT1-STAT4 locus </em></a><br><a href="#0_34"><em>SLE-risk Haplotype in Human History </em></a><br><a href="#0_35"><em>Summary and Conclusions </em></a></p><p><a href="#0_3"><strong>References </strong></a></p><p><strong>List of Tables and Figures </strong></p><p><strong>Chapter 1 </strong></p><p><strong>Table 1: Selected Mouse models of SLE <br>Table 2: A survey of STAT1-STAT4 associations with SLE </strong></p><p><strong>Chapter 2 </strong></p><p><strong>Table 1 | Sequencing quality parameters for all three individuals in the blood, buccal, and saliva trio <br>Table 2 | Concordance analysis of DNA from three individuals, collected from three biological sources and sequenced. <br>Table 3 | DNA from the proband (child) was collected from three biological sources and sequenced. <br>Table 4 | Candidate causative sequence variants were identified in unfiltered and filtered data from the same trio that was sequenced three times using different DNA sources. </strong></p><p><strong>Figure 1 | Depth of Coverage <br>Figure 2 | Genotype Quality Score <br>Figure 3 | Alternate Allele Ratio <br>Figure 4 | Filter Schema <br>Figure 5 | Effect of applying multiple filters <br>Figure 6 | Impact of Filters upon Data Quality <br>Figure 7 | Graphical Representation of the Cincinnati Analytical Suite for Sequencing Informatics </strong></p><p><strong>Chapter 3 </strong></p><p><strong>Table S1: Discovery and replication cohorts. <br>Table S2: List of oligonucleotides and primers used in experiments. <br>Table S3: Bayesian association analysis of discovery cohort demonstrates shared variants across ancestries. <br>Table S4: Bayesian association analysis of replication cohort demonstrates shared variants across ancestries. </strong></p><p><strong>Figure 1: STAT1-STAT4 variants show a genome-wide association in a multi-ethnic discovery cohort. <br>Figure 2: Discovery cohort Bayesian analysis identifies a small group of genetic variants in the STAT4 gene that comprise the 99% credible set in multiple ancestral analyses. <br>Figure 3: The single variant shared across the 99%-credible sets of the replication cohort also has the strongest association in a weighted trans-ancestral meta-analysis. <br>Figure 4: The lupus risk allele of rs11889341 increases HMGA1 binding and decreases repressor regulatory activity in a genotype-dependent manner. <br>Figure S1: A single major genetic effect is accounted for by the genotyped variant rs11889341. <br>Figure S2: Ancestrally-Informed Credible Set variants show the strongest association in a weighted trans-ancestral meta-analysis of the discovery cohort. <br>Figure S3: The association of lupus-risk variants within the STAT4 gene replicates in a replication cohort. <br>Figure S4: Bayesian replication analysis identifies variants that comprise the 99% credible set in the European American and Asian American replication cohorts. <br>Figure S5: AICS variant overlaps with H3K4me1 ChIP-seq peaks in lymphoblastoid cell lines (LCLs). </strong></p>