Bioinformatics Software and Methods for Genome-Wide Association and Chip-Seq Studies

BIOINFORMATICS SOFTWARE AND METHODS FOR GENOME-WIDE ASSOCIATION AND CHIP-SEQ STUDIES by Ryan Patrick Welch A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Bioinformatics) in the University of Michigan 2013 Doctoral Committee: Professor Michael Boehnke, Co-Chair Associate Research Professor Laura J. Scott, Co-Chair Professor Goncalo R. Abecasis Professor David T. Burke Professor Margit Burmeister Assistant Professor Maureen Sartor Peter J. Woolf, FoodWiki LLC Dedication To my parents, Richard and Diane, for their love, support, and providing every opportunity a son could hope for. ii Acknowledgments I am very fortunate in my graduate career to have been advised by some truly wonderful mentors. My first lab rotation was with Dr. Peter Woolf, whom I credit with giving a former engineer an interest in statistics, where none previously existed (not an easy task!) I can recall many long meetings, gone completely off topic to the research at hand, that were very inspiring and shaped the way I would think for the rest of my time in graduate school. While our work together did not become a part of my thesis, I can safely say I would not have arrived at this destination without his guidance. I cannot thank him enough. I would like to thank Prof. Michael Boehnke and Prof. Laura Scott for their mentoring and support, and for taking a chance on a Bioinformatics student. They provided me an amazing opportunity to learn in a world class environment, to present and learn at many conferences, and to make a meaningful contribution to type 2 diabetes genetics. Both were infinitely patient with my frequent “it’s almost done!” software projects. And I certainly would not be the public speaker I am today without their advice about giving scientific presentations, and their willingness to listen to my rambling practice talks. I am also grateful to have studied under the guidance of Prof. Maureen Sartor. While we began working together only a year or so before the end of my studies, she taught me many things both about science, and about life, in that short time that I will not forget. I would like to thank my committee members: Prof. David Burke, Prof. Margit Burmeister, Prof. Goncalo Abecasis for their advice, and taking the time to attend my frequent (and long) committee meetings. One of the greatest benefits to working in our group was the opportunity to work with outstanding postdoctoral fellows and staff members, without whom I could not have finished my dissertation. I would like to thank Dr. Cristen Willer and Dr. Tanya Teslovich for their mentoring, and giving me the chance to work on two very successful thesis projects. Dr. Anne Jackson and Dr. Heather Stringham helped me on countless projects and I would not have made the progress I did without them. Terry Gliedt and Peter iii Chines taught me how to write good software, how to manage databases, and provided me many tools and tricks that I was fortunate to learn. My dissertation would not have been possible without support from many staff members and friends within the Bioinformatics Program, and the Center for Statistical Genetics (CSG). Susan Burke, Dawn Keene, Peggy White, and Julia Eussen were amazingly patient with all of problems and questions. I am sure I always had a perfectly good reason to show up in their offices at the last minute. I think it goes without saying that graduate school is not just about science, but about the relationships and friends that are made throughout. I am fortunate to have made many friends through the Bioinformatics Program, my lab, and my housemates, without whom I would not have had nearly as much fun as I did. From the early days, I remember many great outings with friends Patrick Harrington, Yongsheng Huang, Abhik Shah, Angel Alvarez, Greg Boggy, Huy Vuong, Jayson Falkner, Arun Manoharan, and Datta (whose last name is too long for me to even attempt to spell.) I thank my many housemates from over the years Steffen Froehlich, Chris Iacovella, Andreas Sophocleous, Gang Su, Michael Workman, Aaron Reina, Cameron Nelson, and Mark Reppell, for giving me a great place to come home to, and a moderately clean one at that. I am grateful to my friends at CSG and SPH, who made many long days in the office enjoyable: Clement Ma, Adrian Tan, Pranav Yajnik, Robert Weyant, Peng Zhang, and Matthew Zawistowski. And finally, but most certainly not least, I cannot express my gratitude and thanks to longtime friends Ted Arneault and Connie Chen for listening to my ramblings and frustrations, and being there when I needed it most. iv Table of Contents Dedication ........................................................................................................................ii Acknowledgments ........................................................................................................... iii List of Figures ..................................................................................................................ix List of Tables ................................................................................................................. xiii List of Abbreviations ...................................................................................................... xiv Abstract ..........................................................................................................................xv Chapter 1 Introduction .................................................................................................. 1 Chapter 2 Snipper: a tool for extracting and searching biological annotations of genes implicated by trait-associated SNPs ................................................................................ 8 2.1 Introduction ........................................................................................................ 8 2.2 Methods ............................................................................................................. 9 2.2.1 Implementation ............................................................................................ 9 2.2.2 Databases and gene annotations .............................................................. 10 2.2.3 Usage ........................................................................................................ 10 2.2.4 HTML report .............................................................................................. 11 2.2.5 Snipper interface ....................................................................................... 16 2.2.6 Example ..................................................................................................... 17 2.3 Conclusion ....................................................................................................... 18 2.4 Supplementary Data ........................................................................................ 18 Chapter 3 LocusZoom: regional visualization of genome-wide association scans ..... 19 3.1 Introduction ...................................................................................................... 19 3.2 Implementation ................................................................................................. 19 v 3.2.1 Features and functionality .......................................................................... 19 3.2.2 Usage ........................................................................................................ 21 3.3 Conclusion ....................................................................................................... 23 Chapter 4 Large-scale association study using the Metabochip array reveals new loci influencing glycemic traits ............................................................................................. 24 4.1 Introduction ...................................................................................................... 24 4.2 Methods ........................................................................................................... 25 4.2.1 Study design .............................................................................................. 25 4.2.2 Studies....................................................................................................... 26 4.2.3 Phenotypes ................................................................................................ 27 4.2.4 Trait transformations and adjustment ........................................................ 27 4.2.5 Genotyping and quality control .................................................................. 28 4.2.6 Statistical analysis ..................................................................................... 29 4.2.7 Fasting glucose, fasting insulin, and 2-hr glucose associated signals selection strategy ................................................................................................... 29 4.2.8 Fine-mapping of known glycemic trait loci ................................................. 30 4.2.9 Associations of glycemic trait variants with related traits ........................... 30 4.2.10 SNP/gene biology and functional annotation .......................................... 31 4.3 Results ............................................................................................................. 34 4.3.1 Approaches to identify loci associated with glycemic traits ........................ 34 4.3.2 Fine-mapping of established loci ............................................................... 48 4.3.3 Pathway analysis

Bioinformatics Software and Methods for Genome-Wide Association and Chip-Seq Studies

Ovarian Gene Expression in the Absence of FIGLA, an Oocyte

Whole Exome Sequencing in Families at High Risk for Hodgkin Lymphoma: Identification of a Predisposing Mutation in the KDR Gene

Goat Anti-Actin-Like 7B Antibody Peptide-Affinity Purified Goat Antibody Catalog # Af1021b

Genome-Wide Association Studies of Smooth Pursuit and Antisaccade Eye Movements in Psychotic Disorders: ﬁndings from the B-SNIP Study

Methylation of Leukocyte DNA and Ovarian Cancer

A Rapidly Evolving Actin Mediates Fertility and Developmental Tradeoffs in Drosophila

Supercomputing Pipeline for the Ark (SPARK) Thesis

The Human Gene Connectome As a Map of Short Cuts for Morbid Allele Discovery

Epigenetic Mechanisms Are Involved in the Oncogenic Properties of ZNF518B in Colorectal Cancer

The Human Gene Connectome As a Map of Short Cuts for Morbid Allele Discovery

Isolation and Identification of Proteins from Swine Sperm Chromatin and Nuclear Matrix

Content Based Search in Gene Expression Databases and a Meta-Analysis of Host Responses to Infection