Downloaded in June 2010 [252], Supplemented with the Greengenes Taxonomy from the Previous Iteration [251] and Cyanodb [253]
THE DEVELOPMENT OF A MICROBIOME REFERENCE THAT SPANS THOUSANDS OF INDIVIDUALS

by
DANIEL THOMAS MCDONALD
B.S., University of Colorado, 2008

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirement for the degree of Doctor of Philosophy
Department of Computer Science
2015

This thesis entitled:
The Development of a Microbiome Reference that Spans Thousands of Individuals
written by Daniel Thomas McDonald
has been approved for the Department of Computer Science

Professor Robin Dowell
Professor Ken Krauter

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

IRB protocol # 12-0582

McDonald, Daniel Thomas (Ph.D., Computer Science)
The Development of a Microbiome Reference that Spans Thousands of Individuals
Thesis directed by Professor Rob Knight

The research objective of this thesis is to measure the extent of microbial diversity associated with the human large intestine to an accuracy within the limits of the V4 region of the 16S rRNA gene at 97% similarity. This gene has become a powerful tool in assessing microbiome composition, and in recent years, a significant amount of research has shown an intimate relationship between the microbiome and human health. Unlike the human genome, the bulk of which is shared across the human population, the human microbiome has no common component. What has been observed is a range of configurations, with factors such as age and BMI being strongly associated with these differences. To date, however, no project has aimed to scope the range of microbiome configurations, and thus our concept of what it means to be healthy (from a microbial perspective) is nonexistent.
International efforts such as the American Gut Project will not only help us to understand more about our microbial constituents, but also pave the way toward understanding how these communities can be manipulated for the benefit of human health. The structure of this thesis is to first provide background about the microbiome through a commentary on the history of 16S, and a review on microbiome research. Following this, the next series of chapters is concerned with building the case for large-scale microbiome studies leading up to the American Gut Project. The second half of the thesis emphasizes the computational difficulties of the research, and specific contributions made to the processing and analysis of sequence data that enable insight into the microbiome. These contributions include a file format recognized as a standard by the Genomic Standards Consortium, a novel method for transferring taxonomies to benefit taxonomic curation, and a practical biological example of the use of reproducible and executable IPython Notebooks. Last, the thesis discusses a software tool that has been useful in the analysis of next-generation sequence data, and a few microbiome analyses.

ACKNOWLEDGEMENTS

Progress in the sciences is contingent on human interaction. Collaboration is essential as it brings in new ideas and bridges gaps in experience. But the human side of science goes beyond collaboration; it is helping others discover the unknown, encouraging people to grow, empathizing with challenges and sacrifices, and of course patience.

Ten years ago, I was in the process of failing out of an undergraduate degree in computer science and was perpetually injured due to snowboarding more than sleeping. I was disillusioned, literally a wreck, and teetering on dropping out. By chance from an emergency room visit, I met Jeremy Widmann, who was a researcher with Rob Knight. Jeremy and I became good friends, and he subsequently introduced me to Rob.
We identified a research project to work on based on a mutual interest in networks, and it’s been an amazing ride ever since. Rob’s commitment and dedication is unparalleled, and he is nothing short of profoundly inspirational. Without a doubt, I would not be putting the final touches on a PhD thesis had it not been for that chance encounter, and Rob’s patience, persistence, encouragement and support.

I thank Robin Dowell whose advice I’ve sought and valued through graduate school, and for offering to co-advise me for the tail end of my graduate career. I also thank Nikolaus Correll, Ken Anderson and Ken Krauter for their service on the thesis committee, and Henry Tufo for his service on the thesis proposal. Thank you to Phil Hugenholtz, whose painstaking curation efforts and endless ideas for improving taxonomy helped to set an initial direction for my graduate studies.

Much of the work in this thesis was done in collaboration with the incredible members of the Knight lab (former and current) and in collaboration with fantastic individuals from around the world in other groups. Their names are included where appropriate in the text.
I’d specifically like to thank (in no particular order) Jerry Kennedy whom I shared an office with for years and who would entertain insane ideas or issues as they arose, Ulla Westermann who tirelessly kept reimbursements and purchasing organized, Jeff DeReus for doing wonders to the compute infrastructure, Greg Caporaso for the well organized and productive code sprints, Justine Debelius for her relentless drive into the American Gut data, Jose Clemente for always being open to bounce ideas off of, Antonio Gonzalez for being perpetually positive and forward thinking, Greg Humphrey and Donna Berg-Lyons who make the wetlab magic happen, Cathy Lozupone for her excitement and detailed insight into the microbiome, Jessica Metcalf for always being willing to give detailed feedback on papers, Se Jin Song for answering my naïve molecular questions, Gail Ackermann for painstakingly managing the American Gut IRB protocols, Julia Goodrich for the late night phylogenetics work, Jeremy Widmann for getting me involved in this madness, Elaine Wolfe for being the public face of the American Gut help account, Adam Robbins-Pianka for the frequent and great late-night conversations and instrumental work on the American Gut website, Josh Shorenstein for the work on the American Gut localization and Vioscreen, Yoshiki Vazquez Baeza for frequent laughs and making Chrome do terrible things, Luke Ursell for helping with the organization of this thesis, Embriette Hyde for maintaining the American Gut blog, Luke Thompson for resolving the LaTeX necessary for participant results, and Jose Navas and Amnon Amir for their investigation into filtering for bloom sequences in the American Gut.
I’d especially like to thank Scott Handley, Karin Rengefors, Naiara Rodríguez-Ezpeleta and Konrad Paszkiewicz who organize and run the Workshop on Genomics in the Czech Republic, a workshop at which I’ve taught for the last four years, and which has been one of the most remarkable experiences of my academic career. Over the last two years, I’ve had the opportunity to begin an investigation into the microbiome of ICU patients made possible thanks to the efforts of Paul Wischmeyer.

For all of the students in the inaugural year of the Interdisciplinary Quantitative Biology program: I think a random walk is a reasonable null model for graduate school. The Interdisciplinary Quantitative Biology program is now beginning its fourth year, and I would like to thank Andrea Stith, Jana Watson-Capps, Emilia Costales, Kim Kelley, Kim Little and Janice Jones for keeping the program alive and running. And thank you to Tom Cech for making the program happen, and for listening to all the crazy ideas that the students have about it. Thank you to Rajshree Shrestha and Jackie DeBoard for helping to navigate the surprisingly difficult to figure out graduate school requirements.

Thank you to my friends, and in particular, my old roommates Aimee d’Emery, Elias Santistevan, and Adam Robbins-Pianka who put up with my odd hours and weird travel schedule. Finally, thank you to my parents Pam and David McDonald who put me on the path of playing with computers, and my sister Kate and her wife Max for unwavering advice and support. And, last, I’d like to abundantly thank my wife Alina for her incredible support and patience over the last four years.

CONTENTS

CHAPTER
I. Ribosomal RNA, the lens into life
    Commentary
    Concluding remarks
II. From molecules to dynamic communities
    Introduction
    From catalogs to robust, reproducible community patterns
    How do we know which microbes are present?
    Is there a core human microbiome?
    Microbial community states associated with disease
    Changes in the microbiome over time
    Conclusions and outlook
    Concluding remarks