Identifying Novel Disease-variants and Understanding the Role of the STAT1-STAT4 Locus in SLE

A dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Immunology Graduate Program of the College of Medicine

by

Zubin H. Patel
B.S., Worcester Polytechnic Institute, 2009

John B. Harley, M.D., Ph.D., Committee Chair
Gurjit Khurana Hershey, M.D., Ph.D.
Leah C. Kottyan, Ph.D.
Harinder Singh, Ph.D.
Matthew T. Weirauch, Ph.D.

Abstract

Systemic lupus erythematosus (SLE), or lupus, is an autoimmune disorder caused by an overactive immune system with dysregulation of both innate and adaptive immune pathways. It can affect all major organ systems and may lead to inflammation of the serosal and mucosal surfaces. The pathogenesis of lupus is driven by genetic factors, environmental factors, and gene-environment interactions. Heredity accounts for a substantial proportion of SLE risk, and the role of specific genetic risk loci has been well established. Identifying the specific causal genetic variants and the underlying molecular mechanisms has been a major area of investigation. This thesis describes efforts to develop an analytical approach to identify candidate rare variants from trio analyses and a fine-mapping analysis at the STAT1-STAT4 locus, a well-replicated SLE-risk locus. For the STAT1-STAT4 locus, subsequent functional biological studies demonstrated genotype-dependent gene expression, transcription factor binding, and DNA regulatory activity.

Rare variants are classified as variants across the genome with an allele frequency of less than 1% in ancestral populations. Identifying these variants remains a statistical and logistical challenge because of the large number of individuals needed for a well-powered study. Currently, trio-based approaches are used to identify disease-associated rare variants.
We developed a novel analytical approach to analyze high-depth sequencing data from trio studies to nominate candidate disease-associated rare variants. Our approach uses multiple low-stringency filters to remove sequencing artifacts while leaving true variation for the analysis. We tested our approach using trio data generated from three different DNA sources and increased the concordance rate between the three DNA sources from 96% to 99.999%. We implemented our analytical approach in an analytical suite, CASSI (Cincinnati Analytical Suite for Sequencing Informatics), for use by the research community at large.

Common variants are defined as variants with a minor allele frequency greater than 1%. Many common variants have been associated with lupus through genome-wide association (GWA) studies and targeted sequencing studies. One of the well-replicated lupus associations is in the STAT1-STAT4 locus. Over 25 studies have identified this association in multiple different ancestries, but a causal variant has not been identified. Using a fine-mapping analysis with complementary frequentist and Bayesian approaches, we nominate a single-nucleotide polymorphism, rs11889341, as causal. We show that this variant is shared across all ancestries in our trans-ancestral discovery and replication cohorts. We also validate the allele-dependent functionality of this variant through an expression quantitative trait locus, by identifying differential transcription factor binding, and by showing allele-dependent activity of a gene-regulatory element.

Overall, our analytical approaches allow for the identification and functional biological validation of novel risk variants, both common and rare. Our biological experiments provide insight into a potential mechanism to explain the association between rs11889341 and lupus risk.

Acknowledgements

The work described in this thesis would not have been possible without the support and guidance of the following individuals.
I would like to thank my PhD mentor, Dr. John B. Harley, for providing me with invaluable guidance and mentorship throughout my PhD. His efforts in securing funding for my project have allowed me to use state-of-the-art analytical and experimental technologies throughout my studies. He has challenged me to improve as a scientist and allowed me to experience some failures throughout my PhD so that I may learn and improve myself.

I would like to thank Dr. Kenneth Kaufman, who has generously offered his time and expertise to help me develop as a bioinformatician. He patiently taught me about next-generation sequencing and helped me develop the CASSI story. He always made himself available to answer my questions about the genetic datasets while I was analyzing the STAT1-STAT4 locus.

I would like to thank Dr. Louis Muglia for going above and beyond his role as my Senior Physician-Scientist mentor. Dr. Muglia has been an advocate for me throughout my PhD and has offered key insights into my projects. I have always enjoyed my quarterly meetings with him, where I have been able to talk with him about not only my research but also my non-research life.

I would like to thank Dr. Harinder Singh for serving on my thesis committee and for his mentorship throughout my PhD studies. He has challenged me during my committee meetings and made me a more sophisticated thinker. It has been a privilege to learn from a pioneer in the field of transcription factor signaling.

I would like to thank Arthur Lynch, Joshua Lee, Connor Schroeder, and Avery Maddox for their help with the cloning projects.

I would like to thank Dr. David Hildeman for providing me with encouragement and critical feedback during my first couple of years in graduate school.

I would like to thank Isabel Castro for all of her help with navigating the administrative intricacies of graduate school.
Isabel has always been willing to answer the numerous questions I have had regarding graduate school paperwork.

I would like to thank Dr. Neeru Hershey for serving on my PhD committee and for leading the MD/PhD program at UC. On my committee, Dr. Hershey has provided helpful critical feedback to improve my scientific thinking. Outside of my committee, Dr. Hershey has always been a great physician-scientist role model. She has always been receptive to my feedback. She has worked tirelessly to improve the MD/PhD program. As an MD/PhD student, I have reaped the benefits of her tireless work.

I would like to thank Dr. Erin Zoller for her critical feedback throughout my graduate studies. Erin has challenged me from the first day to think through the scientific method. Her constant critical feedback allowed me to develop as a scientist.

I would like to thank Daniel Miller for being a great friend and colleague. Daniel has been an invaluable resource in helping me troubleshoot and improve my experiments.

I would like to thank Carmy Forney for being a great friend and colleague. She provided invaluable help with the luciferase experiments.

I would like to thank Xiaoming Lu for being a great friend and fellow graduate student. Xiaoming has provided me with a great deal of guidance throughout my PhD project. We have had the opportunity to collaborate on projects with each other and learn from each other’s experiences. Xiaoming has also been a great sounding board for my analytical and experimental ideas.

I would like to thank Dr. Matthew Weirauch for his invaluable mentorship. He graciously agreed to be a co-mentor for my application to join the Multidisciplinary Training Grant. As my co-mentor, Matt has provided valuable feedback for experiments validating my genetic findings in the STAT1-STAT4 locus. Matt has always provided me with critical feedback on my oral presentations and helped me improve as a presenter.
I owe a great deal of gratitude to Dr. Leah C. Kottyan. She has always been an advocate and mentor for me. I would not have been able to get through my graduate studies without her support and mentorship. She has always been available to help me through both my analyses and experiments. She has advocated for me throughout graduate school. Through her constant mentorship, I have become a better scientist.

I would like to dedicate this thesis to my parents, Hasmukh and Madhuri Patel, and my wife, Shradha Patel. I would not have had the opportunity to study in a PhD program without the efforts of my parents in immigrating to this country. My success has been built on the foundation of their sacrifices. A special thanks goes to Shradha for being a supportive partner through the highs and lows of graduate school. She has been patient and understanding throughout my graduate studies.

A special thanks goes to my brother, Kaushal Patel, who started graduate school with me. It has been great to have someone with whom I can commiserate and celebrate the downs and ups of graduate studies.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables and Figures
List of Abbreviations
Chapter 1: Introduction