ENVIRONMENT-WIDE ASSOCIATIONS TO DISEASE AND DISEASE-RELATED PHENOTYPES

A DISSERTATION SUBMITTED TO THE PROGRAM IN BIOMEDICAL INFORMATICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Chirag Jagdish Patel
August 2011

© 2011 by Chirag Jagdish Patel. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/mg775gw7130

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Atul Butte, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Jayanta Bhattacharya

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Mark Cullen

Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

ABSTRACT

Common diseases arise out of a combination of both genetic and environmental influences. Advances in genomic technology have enabled investigators to create hypotheses regarding the contribution of genetic factors at a breathtaking pace. However, the assessment of multiple and specific environmental factors, and of their interactions with the genome, has not kept pace.
We lack high-throughput analytic methodologies to comprehensively and systematically associate multiple physical and specific environmental factors, or the “envirome”, to disease and human health. We claim that the creation of hypotheses regarding the environmental contribution to disease is practicable through high-throughput analytic methods that have been well established in genomics. In the following dissertation, we develop and apply methods to systematically and comprehensively associate specific factors of the envirome with disease states, prioritizing factors for in-depth future study.

The disciplines that currently study the environmental determinants of health include toxicology and epidemiology, which operate on molecular and population scales, respectively. This dissertation proposes approaches in both of these disciplines. For example, we have developed a framework to conduct the first “Environment-wide Association Study” (EWAS), systematically associating environmental factors to disease on a population scale. We have applied this framework to investigate type 2 diabetes and heart disease in cohorts that are representative of the United States population, finding novel and robust associations in diverse and independent cohorts. Given the lack of explained risk resulting from current-day genome-wide studies, the time is ripe to usher in a more comprehensive study of the environment, or “enviromics”, toward better understanding of multifactorial diseases and their prevention.

ACKNOWLEDGEMENTS

Foremost, I thank my advisor, Dr. Atul Butte, for his undying confidence, inspiration, and guidance. Even just three years ago, it was far from my belief that the scientist whom I admired from afar would eventually take me on as a student and teach me how to compute, see, and enlighten. For Dr. Atul Butte’s supervision I am forever indebted and most fortunate.

I am also indebted to my dissertation committee, Drs.
Jay Bhattacharya, Mark Cullen, John Ioannidis, and Robert Tibshirani. Much of this work has come out of discussions with these individuals, and it is inspired by and stands on their fundamental teachings. I thank my academic advisors, Drs. Mark Musen and Betty Cheng, for encouraging me to keep taking courses that enabled this work.

I thank my many friends and colleagues in the Butte Laboratory and in the Biomedical Informatics program whom I continue to look up to and draw inspiration from. I feel honored and privileged to be among you. In particular, I thank Dr. Rong Chen, Alex Morgan, Joel Dudley, and Nick Tatonetti for providing support and encouragement when it was least expected but most needed.

From teaching me how to read and write to gifting me the newest computers, I thank my parents, Neela and Jagdish Patel. I will always be grateful to them for initiating this most rewarding journey of lifelong learning. I thank my brother, Ankur Patel, for his unflagging support and faith through thick and thin.

I thank my in-laws, Tapan and Kokila Chaudhuri, for their support and encouragement. I do not have the words to thank my partner in life, Trina Chaudhuri. I hope that I can some day enable her to achieve her aspirations as she has done for me.

I am grateful to the National Library of Medicine and Applied Biosystems, Inc. for financial support. I thank the Centers for Disease Control and Prevention (CDC), the National Center for Health Statistics (NCHS), and the staff and individuals who take part in the National Health and Nutrition Examination Survey (NHANES). In particular, I thank Vijay Gambhir and Peter Meyer of the CDC/NCHS for their support in accessing and processing NHANES restricted genetic data. I am grateful again to Dr. Atul Butte for providing funds to access the NHANES restricted data.

I thank the staff of the Biomedical Informatics Training program and the Butte Laboratory, Mary Jeanne Oliva, Susan Aptekar, Alex Skrenchuk, Dr.
Russ Altman, and Dr. Larry Fagan. Without the support of these institutions and people, this work would not have been possible.

A portion of the work in this dissertation derives from two published articles and two articles currently in review for publication:

Chapter 2:
1. Patel, C.J. and A.J. Butte, Predicting environmental chemical factors associated with disease-related gene expression data. BMC Med Genomics, 2010. 3(1): p. 17.

Chapter 4:
2. Patel, C.J., J. Bhattacharya, and A.J. Butte, An Environment-Wide Association Study (EWAS) on type 2 diabetes mellitus. PLoS ONE, 2010. 5(5): p. e10746.
3. Patel, C.J., M.R. Cullen, J.P.A. Ioannidis, and A.J. Butte, Non-genetic associations and correlation globes for determinants of lipid levels: an environment-wide association study. Submitted, 7/2011.

Chapter 5:
4. Patel, C.J., R. Chen, J.P.A. Ioannidis, and A.J. Butte, Systematic identification of interaction effects between validated genome- and environment-wide associations on Type 2 Diabetes Mellitus. Submitted, 8/2011.

In the Chapter 2 work, I devised the methodology and wrote the manuscript with my advisor, Atul Butte. In the Chapter 4 work, I devised the “Environment-wide Association Study” (EWAS) framework and carried out the analyses. For the EWAS on Type 2 Diabetes, I wrote the manuscripts with Jay Bhattacharya and Atul Butte. For the EWAS on serum lipid levels, I wrote and edited the manuscripts with Mark Cullen, John Ioannidis, and Atul Butte. Finally, in the Chapter 5 work, I devised the “Gene-Environment-Wide Association Study” (G-EWAS) framework and implemented the software to carry out the analyses. Rong Chen and Atul Butte provided the database of curated genetic information. I interpreted the data and wrote the manuscript with Rong Chen, John Ioannidis, and Atul Butte.
TABLE OF CONTENTS

CHAPTER 1: INTRODUCING MULTI-DIMENSIONAL AND DATA-DRIVEN APPROACHES TO CREATE HYPOTHESES REGARDING ENVIRONMENTAL ASSOCIATIONS TO DISEASE
    What is the “Environment”? What is the “Envirome”?
    Creation of robust hypotheses connecting the environment, genome, and multifactorial disease
    Creating hypotheses comprehensively on a population scale
    Creating hypotheses comprehensively on a molecular or toxicological scale
    Discussion

CHAPTER 2. MAPPING MULTIPLE TOXICOLOGICAL RESPONSES TO COMPLEX DISEASE
    INTRODUCTION
    METHOD TO PREDICT ENVIRONMENTAL ASSOCIATION TO GENE EXPRESSION RESPONSE
    RESULTS
        Verification Phase
        Predicting Environmental Chemicals Associated with Cancer Data Sets
        Clustering Significant Predictions by PubChem-derived Biological Activity
    DISCUSSION

CHAPTER 3. METHODS TO EXECUTE ENVIRONMENT-WIDE ASSOCIATIONS ON DISEASE AND DISEASE-RELATED PHENOTYPES ON POPULATIONS.
