Biochemical and Functional Characterization of Human RNA
Biochemical and Functional Characterization of Human RNA Binding Proteins
by
Peter Freese
A.B., Harvard University (2012)

Submitted to the Graduate Program in Computational and Systems Biology in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY
February 2018
© Massachusetts Institute of Technology 2018. All rights reserved.

Author
Computational and Systems Biology
December 22, 2017

Certified by
Christopher B. Burge
Professor of Biology and Biological Engineering
Thesis Supervisor

Accepted by
Christopher B. Burge
Director, Computational and Systems Biology Graduate Program

Biochemical and Functional Characterization of Human RNA Binding Proteins
by Peter Freese
Submitted to the Program in Computational and Systems Biology on December 22, 2017, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

RNA not only shuttles information between DNA and proteins but also carries out many other essential cellular functions. Nearly all steps of an RNA’s life cycle are controlled by approximately one thousand RNA binding proteins (RBPs) that direct RNA splicing, cleavage and polyadenylation, localization, translation, and degradation. Despite the central role of RBPs in RNA processing and gene expression, they have been less well studied than DNA binding proteins, in part due to the historical dearth of technologies to probe RBP binding and activity in a high-throughput, comprehensive manner. In this thesis, I describe the affinity landscapes of the largest set of human RBPs to date elucidated through a high-throughput version of RNA Bind-N-Seq (RBNS), an unbiased in vitro assay that determines the primary sequence, secondary structure, and contextual preferences of an RBP.
The 78 RBPs bound an unexpectedly low diversity of RNA motifs, implying convergence of binding specificity toward a small set of RNA motifs characterized by low compositional complexity. Offsetting the low diversity of sequence motifs, extensive preferences for contextual features beyond short linear motifs were observed, including bipartite motifs, flanking nucleotide content, and preference for or against RNA structure. These features likely refine which motif occurrences are selected in cells, enabling RBPs that bind the same linear motif to act on distinct subsets of transcripts. Additionally, RBNS data are integrated with complementary in vivo binding sites from enhanced crosslinking and immunoprecipitation (eCLIP) and functional (RNAi/RNA-seq) data produced through collaborative efforts with the ENCODE consortium. These data enable creation of “RNA maps” of RBP activity in pre-mRNA splicing and gene expression levels, either with (eCLIP) or without (RBNS) crosslinking-based assays. The mapping and characterization of RNA elements recognized by over 200 human RBPs is also presented in two human cell lines, K562 and HepG2 cells. Together, these novel data augment the catalog of functional elements encoded in the human genome to include those that act at the RNA level and provide a basis for how RBPs select their RNA targets, a fundamental requirement in more fully understanding RNA processing mechanisms and outcomes.

Thesis Supervisor: Christopher B. Burge
Title: Professor of Biology and Biological Engineering

Acknowledgments

I would like to thank my advisor, Prof. Chris Burge, for providing an incredibly rich environment in which to conduct research and for teaching me to be a rigorous computational biologist grounded in fundamental biological questions.
Because I had never performed research on RNA or touched high-throughput sequencing data sets before joining, the lab has been a wonderful place to learn new skills, formulate and test hypotheses, and receive feedback that has fueled my intellectual growth. A particular thanks goes out to Burge lab postdoc Daniel Dominguez and fellow CSB graduate student Maria Alexis, who have been phenomenal RBNS collaborators over the past years and central to this work as well as my overall scientific development; without either, I’m sure the RBNS project would have been nowhere near as successful or impactful as it has turned out to be. Additionally, I would like to thank all other past and present Burge lab members for their feedback and insights over the past five years, as well as their friendship and great company at our lab social events and beer hours, creating a multitude of memorable experiences outside of the lab. In particular, I’d like to thank Athma for being an always pleasant and engaging baymate, Alex and Nicole for being early RBNS collaborators who whetted my appetite for studying protein-RNA interactions, Jennifer and Yevgenia for our lighthearted coffee breaks, Matt for always being encouraging and asking great biological questions, Bridget for planning great graduate student social events and group exercise class outings, Canadian Peter for his always outspoken and thought-provoking discussions, Marvin, Ana, Emma, and Kayla for their smiling faces and fresh perspectives, Genny for being a friendly face who always made me feel welcome in the Burge lab, Cassie, Myles, Amanda, and Tsultrim for being amazing technicians who helped my understanding of the technical complexities of the RBNS assay, and Jason for being generous in taking me under his wing during my rotation despite being on paternity leave.

My Thesis Advisory Committee members, Phil Sharp and Manolis Kellis, have been very generous in providing their guidance and expertise at my committee meetings as well as facilitating my professional development. The connections I have made with and through them have been and will continue to be incredibly valuable. I would also like to thank Melissa Moore for agreeing to serve as my external committee member and taking time out of her incredibly busy schedule to support my scientific endeavors.

I would like to thank my ENCODE RBP collaborators, who have been an instrumental source of feedback on my work and incredibly generous in sharing their time, data, and expertise in our group efforts. Through our work, I have learned a tremendous amount not only about RBP-RNA interactions and regulation but also the complexities and promise of studying biological problems through massive high-throughput integrative functional genomic assays. In particular, Brent Graveley’s overall leadership of the RBP project has been instrumental in its success, Xintao Wei has always been a patient and responsive collaborator who made the logistics of working on a high-throughput consortium project very smooth, and Gene Yeo and his lab members Eric Van Nostrand and Gabe Pratt provided excellent scientific guidance and insights into RNA-RBP biology through their improved experimental and computational CLIP methods.

My undergraduate research advisor Irene Chen and postdoc Kirill Korolev initially sparked my interest in conducting scientific research and showed me the challenges and rewards of seeing a research project through from start to finish. I would likely not have pursued graduate studies without these formative academic role models and their support of my ongoing scientific development.

I would like to acknowledge the amazing friendship of my CSB 2012 classmates Vincent, Colette, Mandy, Mariana, Nezar, and Rotem, with whom I’ve shared over 5 years of incredible outings throughout MIT, Boston, and the Northeast. We’ve celebrated countless birthdays, taken summer trips to explore and relax, and supported each other through the ups and downs of a graduate career in a way that I couldn’t have imagined when we started this program together. It has been a pleasure to befriend the many other CSBs in years above and below me as we gather at retreats and other CSB events to share our research, successes, and struggles. Additionally, CSB administrator Jacquie Carota has done a phenomenal job of making the graduate program run smoothly and has always been a warm, smiling face to run into and chat with on the 2nd floor of Building 68.

Finally, I would like to thank my family members, who have been incredibly patient and generous in their support of my PhD endeavors and broader educational upbringing that has allowed me to be where I am today. My mom, dad, and sister Kelly have been endless supporters of my scientific passions and research, and always provided me with a down-to-earth perspective of the incredible fortunes I have been lucky enough to receive through graduate education at MIT. Going home to Minnesota to visit them and my extended family as well as our phenomenal vacations have helped me contextualize my research and career within a broader perspective. This work and my intellectual growth as a graduate student would not have been possible without their unflagging support.

Contents

1 Introduction
1.1 RNA binding proteins
1.1.1 RNA binding domains (RBDs) and sequence-specific recognition of RNA
1.1.2 RBP-mediated regulation of pre-mRNA splicing, RNA stability, and RNA translation
1.2 Approaches for the study of RNA binding proteins
1.2.1 In vitro RNA-RBP profiling techniques
1.2.2 In vivo RNA-RBP profiling techniques
1.2.3 Genetic studies of RNA profiling after RBP perturbation
1.2.4 Structural studies of RBPs and RBP-RNA interactions