Exploiting network-based approaches for understanding gene regulation and function

Sarath Chandra Janga

A dissertation submitted to the University of Cambridge in candidature for the degree of Doctorate of Philosophy

April 2010

Darwin College, University of Cambridge
MRC Laboratory of Molecular Biology
Cambridge, United Kingdom

Previous page: A portrait of the transcriptional regulatory network of the budding yeast, Saccharomyces cerevisiae. Each circle represents the transcriptional interconnections from all other chromosomes to one of the chromosomes. Evidently, all chromosomes are transcriptionally controlled by factors encoded on many of the 16 chromosomes in this organism, which are marked by the letters ‘a’ through ‘p’.

Declaration of originality

This dissertation describes work I carried out at the Medical Research Council Laboratory of Molecular Biology in Cambridge between January 2008 and April 2010. The contents are my original work, although much has been influenced by the collaborations in which I took part. I have not submitted the work in this dissertation for any other degree or qualification at any other university.

Sarath Chandra Janga
April, 2010
Cambridge, United Kingdom

Acknowledgements

First of all, I would like to express my gratitude to Dr. Madan Babu, without whose continuous support throughout my doctoral work it would have remained just a dream for me to carry out my thesis work at the MRC Laboratory of Molecular Biology. Madan has been not only an excellent supervisor but also a good friend who was always supportive of my research interests, allowing me to work independently on a wide range of problems during my stay here. He has been a source of great inspiration on various occasions and a great scientific colleague to work with. In short, I probably could not have had a more understanding and motivating supervisor.

I am also very grateful to Dr. Sarah Teichmann, whose equally supportive words from time to time have been a motivation to finish my doctoral work in a short time. I have learnt from her the art of venturing into uncharted territories of molecular biology without fear. I am also thankful for the kind support and warm welcome that I received from Dr. Cyrus Chothia from the first day that I came to LMB.

I consider myself very fortunate to be in a wonderful lab with a lot of energetic and highly motivating people working on fundamental problems of molecular biology. Indeed, I must admit that I have learnt at least as much from my colleagues and seminars at LMB as I have learnt from reading books and papers, not to mention the fun that I had during numerous lunch and dinner breaks with various members of the lab, and the TCB group in particular. I especially would like to thank A Wuster, B Lang, AJ Venkatakrishnan, D Hebenstreit, D Wilson, E Levy, G Chalancon, J Su, N Mittal, P Kota, R Janky, S De, T Perica, V Charoensawan and J Gsponer for making my stay at LMB a memorable experience.

I am also greatly indebted to all my scientific friends, collaborators and mentors, both in the past and during my PhD, for having helped me learn about and venture into diverse areas of molecular biology. In no particular order, I would like to sincerely thank Agustino Martinez-Antonio (Irapuato, Mexico) for his confidence in my abilities, Ernesto Perez-Rueda (Cuernavaca, Mexico) for his kind hospitality during my visits to Mexico, Gabriel Moreno-Hagelsieb (Waterloo, Canada) for being a great mentor and an excellent scientific friend, Heladia Salgado (Cuernavaca, Mexico) for her energy and her patience with my requests for data, Andrew Emili (Toronto, Canada) for giving me the opportunity to work on an unsolved mystery, Denis Thieffry (Marseille, France) for teaching me to focus on important ideas, and many other colleagues for scientific discussions over the years, which made me a mature and independent scientist.
I would also like to take this opportunity to offer my gratitude to all colleagues, administrative staff and the heads of division, Venki Ramakrishnan and Kiyoshi Nagai, at LMB, whose continuous support has made it possible for me to develop a career in science. I am also grateful for the financial support that I received from the Cambridge Commonwealth Trust (CCT) and the Medical Research Council during my PhD.

Last but not least, I am most indebted to my family (my parents and sister) as well as my near and dear ones, who have been continuously supportive of my adventures in science and understanding of my reasons for being silent for months. My very presence on this planet would not have been possible if not for my mother, who passed away long before I knew what maths and science are all about. I dedicate this thesis in her name.

Abbreviations

3C      Chromosome Conformation Capture
ArcA    Aerobic respiration control protein A
BDBH    Bi-Directional Best Hits
BLAST   Basic Local Alignment Search Tool
cAMP    cyclic Adenosine MonoPhosphate
ChIP    Chromatin ImmunoPrecipitation
CLIP    Cross Linking and Immuno-Precipitation
COGs    Clusters of Orthologous Groups
CRP     cAMP Receptor Protein
CT      Chromosomal Territory
DBTBS   DataBase of Transcriptional regulation in Bacillus subtilis
DNA     DeoxyriboNucleic Acid
EC      Enzyme Commission
FDR     False Discovery Rate
FIS     Factor for Inversion Stimulation
FISH    Fluorescent In Situ Hybridization
FFL     Feed Forward Loop
FNR     regulator of Fumarate and Nitrate Reduction
GBA     Guilt By Association
GC      Genomic Context
GO      Gene Ontology
GR      Global Regulator
GRN     Gene Regulatory Network
HMM     Hidden Markov Model
hnRNP   heterogeneous nuclear RiboNucleoProtein
HNS     Histone-like Nucleoid Structuring protein
HU      Heat Unstable protein
IHF     Integration Host Factor
LAD     Lamina Associated Domain
LCMS    Liquid Chromatography-Mass Spectrometry
LCR     Locus Control Region
MALDI   Matrix-Assisted Laser Desorption/Ionization
MCL     Markov CLuster algorithm
mRNA    Messenger RNA
NAP     Nucleoid Associated Protein
PAB     PolyAdenylate-Binding protein
PI/PPI  Protein Interactions
PTM     Post-Translational Modification
PTN     Post-Transcriptional Network
PTS     PhosphoTransferase System
RBD     RNA Binding Domain
RBP     RNA Binding Protein
RIP     RNP ImmunoPrecipitation
RNA     RiboNucleic Acid
RNP     RiboNucleoProtein complex
RRM     RNA Recognition Motif
TAP     Tandem Affinity Purification
TF      Transcription Factor
TG      Target Gene
TPI     Target Proximity Index
TRN     Transcriptional Regulatory Network

Summary

It is becoming increasingly clear in the post-genomic era that proteins in a cell do not work in isolation but rather in the context of other proteins and cellular entities during their lifetime. This has led to the notion that cellular components can be visualized as wiring diagrams composed of different molecules such as proteins, DNA, RNA and metabolites. These systems approaches for quantitatively and qualitatively studying dynamic biological systems have provided unprecedented insights, at varying levels of detail, into cellular organization and the interplay between different processes. The work in this thesis attempts to use these systems or network-based approaches to understand the design principles governing different cellular processes and to elucidate the functional and evolutionary consequences of the observed principles.

Chapter 1 is an introduction to the concepts of networks and graph theory. It summarizes the properties that are frequently studied in biological networks and gives an overview of the different kinds of cellular networks that are amenable to graph-theoretical analysis, with particular emphasis on transcriptional, post-transcriptional and functional networks.

In Chapter 2, I address two questions: how and why are genes organized in a particular fashion on bacterial genomes, and what constraints do bacterial transcriptional regulatory networks impose on their genomic organization? I then extend this one step further to unravel the constraints imposed on the network of TF-TF interactions and relate them to the numerous phenotypes they can impart to growing bacterial populations.

Chapter 3 presents an overview of our current understanding of eukaryotic gene regulation at different levels and then shows evidence for the existence of a higher-order organization of genes across and within chromosomes that is constrained by transcriptional regulation. The results emphasize that a specific organization of genes across and within chromosomes, one that allows efficient control of transcription within the nuclear space, has been selected during evolution.

Chapter 4 first summarizes different computational approaches for inferring the function of uncharacterized genes and then discusses network-based approaches currently employed for predicting function. I then present an overview of a recent high-throughput study performed to provide a ‘systems-wide’ functional blueprint of the bacterial model, Escherichia coli K-12, with insights into the biological and evolutionary significance of previously uncharacterized proteins.

In Chapter 5, I focus on post-transcriptional regulatory networks formed by RBPs. I discuss the sequence attributes and functional processes associated with RBPs and the methods used to construct the networks they form, and finally examine the structure and dynamics of these networks based on recent publicly available data. The results obtained here show that RBPs exhibit distinct gene expression dynamics compared to other classes of proteins in a eukaryotic cell.

Chapter 6 provides a summary of the important aspects of the findings presented in this thesis and their practical implications. Overall, this dissertation presents a framework which