Meyer Cornellgrad 0058F 10073
Total Page:16
File Type:pdf, Size:1020Kb
METHODS FOR FUNCTIONAL INFERENCE IN THE PROTEOME AND INTERACTOME A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Michael Joseph Meyer January 2017 © 2017 Michael Joseph Meyer ALL RIGHTS RESERVED METHODS FOR FUNCTIONAL INFERENCE IN THE PROTEOME AND INTERACTOME Michael Meyer Cornell University 2017 Over the past several decades, biology has become an increasingly data-driven science. Due in large part to new techniques that allow massive collection of biological data, including next-generation sequencing and high-throughput experimental screening, many of the limitations currently facing the field are in the organization and interpretation of these data. In this dissertation, I present several computational methods and resources designed to organize and perform functional inference on these systems-level biological data sources. In Chapters 2 and 3, I describe the construction of a database and web tool to aid in foundational genomics research by providing predictions of interacting protein domains in interactomes and all-by-all conversions of popular variant identification formats. In Chapter 4, I describe the construction of the first whole-interactome protein interaction network in the fission yeast S. pombe, and, through comparisons with other complete networks in human and the budding yeast S. cerevisiae, demonstrate principles of functional evolution. Finally, in Chapters 5 and 6, I propose two new methods for functional genomic inference—an algorithm to predict cancer driver genes and mutations through 3D atomic clustering of somatic mutations and an ensemble machine learning method to predict the 3D interfaces of protein interactions by taking into account the evolutionary relationships and biophysical properties of proteins. Taken together, this suite of computational resources will help researchers interpret biological function on a genomic scale. BIOGRAPHICAL SKETCH Michael was born in 1989 to Joseph and Nicolette Meyer, and grew up in Worthington, Ohio as the eldest of three children. He graduated from Thomas Worthington High School in 2007 and attended the Massachusetts Institute of Technology from 2007 to 2011, where he earned his bachelor’s degree in Biological Engineering. In the summer of 2010 he interned at Ion Torrent, a startup DNA sequencing company founded by Dr. Jonathan Rothberg, where he helped develop the Ion Torrent Personal Genome Machine. He returned in the summer of 2011 after a successful buyout by Life Technologies. In 2011 he married his high school sweetheart, Sarai Meyer (Itagaki), and they moved to Ithaca, New York to begin their Ph.D.s at Cornell University. Michael entered the Tri- Institutional Training Program in Computational Biology and Medicine (CBM) and completed laboratory rotations at Cornell’s Ithaca campus and at the Weill Cornell Medical School in New York City. He ultimately joined the computational systems biology lab of Dr. Haiyuan Yu in the Weill Institute for Cell and Molecular Biology in Ithaca, where he completed the research presented in this dissertation. iii For my grandfathers v i ACKNOWLEDGEMENTS I am incredibly fortunate to have had the support of many people over the course of my life and studies. There are an awful lot of people to thank. But first, I’d like to acknowledge my incredible fortune to have been born in a time and place where scholarly aspirations are relevant and valued, without which none of this would have been possible. To my parents—Mom and Dad, thank you for giving me every chance to succeed, for investing in me, and for showing me how to be a good person. Mom, thank you for your unconditional love, home-cooked meals, and for giving me that problem-solving itch that I’m still trying to scratch. Dad, thank you pushing me, allowing me to learn from my mistakes, accept failure, and think for myself. And thank you both for being there from a distance to help me through these past five years. And to the rest of my family—thanks for putting up with me, for caring about my well-being, and for occasionally mustering the bravery to ask about my research. To all of my teachers throughout the years—Kathy McCulloch, Mark Hill, Peter Scully, and so many others, thank you for inspiring me. To my most recent mentor, Haiyuan, thank you for entrusting me with a great deal of responsibility in the early years of your lab, and for providing advice when I needed it, but still allowing me the freedom to determine my own course of study and explore my passions. To my committee—Chris, David, and Olivier— thank you for you for your genuine interest in my work and your feedback over the years. To several generations of labmates—thank you for being my daily buffer against the slog of graduate school—for the laughs, distractions, and taunts during March Madness. To the amazing undergraduate researchers I’ve worked with—Ryan, Philip, Mark, and Aaron— thank you for your enthusiasm, willingness to learn, and major contributions to much of the v work in this dissertation. To the old guard—Jishnu, Amanda, Tommy, and Robert—thanks for being there from the beginning and for helping to establish a productive and supportive research environment. Jishnu, thanks for showing me the ropes, for the most interesting arguments, and for sharing your encyclopedic knowledge of the literature. Tommy, thanks for your tireless efforts to make FissionNet a success, and for teaching me a thing or two about the wet lab in the process. To the new kids—Juan, Siwei, Antoine, and Charles—good luck. Juan, thank you for arriving at just the right time to remind me why I love what I do, for all the white-boarding, and for helping to create some really cool stuff. Thanks for the late- night code jams and after-work beers, for provocative discussions of philosophy, and for one awkward bro hug. And to Sarai—thanks for, well, everything. For all the little stuff—slipping butter between two halves of a grab-and-go muffin on a Monday morning, counting cats on summer evening walks, and leaving me little notes to find around the house. Thank you for being there, for smiling and laughing, for spontaneity and routine. And thank you for going with me on this first of many adventures—we made it. vi CONTRIBUTIONS I would also like to acknowledge the work of others that has gone into each of the chapters presented in this dissertation. Due to the collaborative nature of biological research, I have had the pleasure to work with many gifted scientists, several of whom are co-authors on the publications presented here. (* indicates co-first author) Chapter 2: Meyer, M.J.*, Das, J.*, Wang, X.*, and Yu, H. (2013). INstruct: a database of high- quality 3D structurally resolved protein interactome networks. Bioinformatics 29, 1577- 1579. Chapter 3: Meyer, M.J.*, Geske, P.*, and Yu, H. (2016). BISQUE: locus- and variant-specific conversion of genomic, transcriptomic and proteomic database identifiers. Bioinformatics 32, 1598-1600. Chapter 4: Vo, T.V.*, Das, J.*, Meyer, M.J.*, Cordero, N.A., Akturk, N., Wei, X., Fair, B.J., Degatano, A.G., Fragoza, R., Liu, L.G., et al. (2016). A Proteome-wide Fission Yeast Interactome Reveals Network Evolution Principles from Yeasts to Human. Cell 164, 310-323. Chapter 5: Meyer, M.J.*, Lapcevic, R.*, Romero, A.E.*, Yoon, M., Das, J., Beltran, J.F., Mort, M., Stenson, P.D., Cooper, D.N., Paccanaro, A., and Yu H. (2016). mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome. Human Mutation 37, 447-456. Chapter 6: Meyer, M.J.*, Beltran, J.F.*, Fragoza, R., Rumack, A., and Yu, H. (2016). A pan- interactome map of protein interaction interfaces. Under Review. vii TABLE OF CONTENTS Biographical Sketch . iii Dedication . iv Acknowledgements . v Contributions . vii Table of Contents . viii 1. Introduction: data and inference in the genomics era . 1 1.1 A biological data explosion . 1 1.2 New challenges . 2 1.3 Perspectives for the future . 3 1.4 References . 5 2. INstruct: a database of high quality 3D structurally resolved protein interactome networks . 8 2.1 Abstract . 8 2.2 Introduction . 8 2.3 Methods . 10 2.3.1 Identifying high-quality binary interactions . 10 2.3.2 Interaction Interface Inference and Validation . 12 2.4 Usage . 14 2.5 Looking ahead . 17 2.6 References . 20 3. BISQUE: locus- and variant-specific conversion of genomic, transcriptomic, and proteomic database identifiers . 23 3.1 Abstract . 23 3.2 Introduction . 23 3.3 Methods . 24 3.3.1 Database dependencies . 26 3.3.2 Database updates . 28 3.3.3 Identifiers . 29 3.3.4 Conversion architecture . 30 3.3.5 Determining optimal paths in the conversion graph . 30 3.3.6 Locus- and variant-specific conversion . 31 3.3.7 Conversion quality filtering options . 36 3.4 Results . 38 3.4.1 Usage . 38 3.4.2 Database & conversion scope . 39 3.4.3 Comparison with other conversion utilities . 40 3.5 Discussion . 42 3.6 References . 45 viii 4. A Proteome-wide fission yeast interactome reveals network evolution principles from yeasts to human . 47 4.1 Abstract . ..