INTEGRATED ANALYSIS OF GENETIC AND PROTEOMIC DATA

By
David Michael Reif

Dissertation
Submitted to the Faculty of the
Graduate School of Vanderbilt University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
in
Human Genetics

December, 2006
Nashville, Tennessee

Approved:
Professor James E. Crowe
Professor Douglas H. Fisher
Professor Jonathan L. Haines
Professor Jason H. Moore
Professor Scott M. Williams

Copyright © 2006 David Michael Reif
All Rights Reserved

This work is dedicated to my family (Mom, Dad, and Dan) for teaching me how to work hard and play nice with others. And to Alison, for making me happier than I have ever been, no matter what else is going on.

ACKNOWLEDGMENTS

My graduate training was supported by the Vanderbilt University Interdisciplinary Graduate Program (1st year), the NIH Human Genetics Training Grant (2nd-3rd years), and my mentor, Jason H. Moore (4th-5th years).

I want to acknowledge the vast contributions and support of the scientists and staff at both Vanderbilt and Dartmouth Medical School. In the Vanderbilt Center for Human Genetics Research, I wish to thank Jackie Bartlett, Kylee Spencer, Tricia Thornton-Wells, Jacob McCauley, Scott Dudek, Jeff Canter, Marylyn Ritchie, Kim Taylor, Alicia Davis, Lynn Roberts, and Maria Comer. I thank Chun Li for his insights into teaching and his philosophy on statistics in science. At Dartmouth, I would like to thank Todd Holden and Nate Barney. Special thanks go to Bill White at Dartmouth for his friendship and extensive help with computational issues, as well as stimulating discussions on topics relating to science and beyond. Special thanks also go to Brett McKinney for his support on both scientific and personal levels. Time and again, his inquisitiveness and optimism helped me find solutions to uncooperative problems.
I am greatly indebted to the members of my thesis committee (James Crowe, Jr., Douglas Fisher, Jonathan Haines, Jason Moore, and Scott Williams) for their invaluable time, guidance, friendship, and support. Their ability to effectively guide a project involving biological, computational, genetic, and immunological aspects is a testament to their diverse interdisciplinary expertise and commitment to training students.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

Chapter

I. INTRODUCTION

II. INTEGRATED ANALYSIS OF GENETIC, GENOMIC, AND PROTEOMIC DATA
    A case for integrated analysis of multiple data types
    Organisms as complex systems
    Biological complexity along the progression from genotype to phenotype
    Methodology concerns and missing data
    Joint analysis simulation study
        Simulation models
        Datasets
        Data analysis
        Software and hardware
        Simulation results and discussion
    Relevance of the joint analysis simulation study and application to real data
        How realistic are the disease models?
        How realistic is the scenario in which key functional proteins will be missing from the data analyzed?
        How realistic is the scenario in which functional SNPs are measured when key functional proteins are not?
    Conclusions and future directions
    Summary of key issues
    Acknowledgments
    References

III. PROTEOMIC BIOMARKERS ASSOCIATED WITH ADVERSE EVENTS FOLLOWING SMALLPOX VACCINATION
    Introduction
    Subjects, materials, and methods
        Study subjects
        Clinical assessments
        Sample collection
        Proteomic assay
        Statistical analysis methods
    Results
    Discussion
    Acknowledgments
    References

IV. GENETIC POLYMORPHISMS ASSOCIATED WITH ADVERSE EVENTS FOLLOWING SMALLPOX VACCINATION
    Introduction
    Subjects, materials, and methods
        Study subjects
        Clinical assessments
        Identification of genetic polymorphisms
        Statistical analysis
    Results
        Demographic characteristics of subjects included in genetic analyses
        Genetic associations with adverse events
    Discussion
        Biological mechanisms contributing to adverse events
        Relationship between genetic results and proposed model of adverse events
        Summary and future directions
    Acknowledgments
    References

V. FEATURE SELECTION USING RANDOM FORESTS FOR THE INTEGRATED ANALYSIS OF MULTIPLE SIMULATED DATA TYPES
    Introduction
    Methods
        Random forests
        Data simulation
        Data analysis
    Results
    Discussion
    Acknowledgments
    References

VI. INTEGRATED ANALYSIS OF GENETIC AND PROTEOMIC DATA IDENTIFIES BIOMARKERS ASSOCIATED WITH ADVERSE EVENTS FOLLOWING SMALLPOX VACCINATION
    Introduction
    Subjects, materials, and methods
        Study subjects
        Clinical assessments
        Identification of genetic polymorphisms
        Quantification of serum cytokine levels
        Random forests
        Decision trees
        Data analysis strategy