Inferring Biological Networks from Genome-Wide Transcriptional And

INFERRING BIOLOGICAL NETWORKS FROM GENOME-WIDE TRANSCRIPTIONAL AND FITNESS DATA By WAZEER MOHAMMAD VARSALLY A thesis submitted to The University of Birmingham for the degree of Doctor of Philosophy College of Life and Environmental Sciences School of Biosciences The University of Birmingham July 2013 I University of Birmingham Research Archive e-theses repository This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder. ABSTRACT In the last 15 years, the increased use of high throughput biology techniques such as genome-wide gene expression profiling, fitness profiling and protein interactomics has led to the generation of an extraordinary amount of data. The abundance of such diverse data has proven to be an essential foundation for understanding the complexities of molecular mechanisms and underlying pathways within a biological system. One approach of extrapolating biological information from this wealth of data has been through the use of reverse engineering methods to infer biological networks. This thesis demonstrates the capabilities and applications of such methodologies in identifying functionally enriched network modules in the yeast species Saccharomyces cerevisiae and Schizosaccharomyces pombe. This study marks the first time a mutual information based network inference approach has been applied to a set of specific genome-wide expression and fitness compendia, as well as the integration of these multi- level compendia. This work highlights how network inference can infer potentially novel biological relationships by identifying gene modules that exhibit similar response across samples. This information can then be used to identify strong candidate interactions which can be tested experimentally. In particular, this work has generated hypotheses in S. pombe that have led to a deeper understanding of the relationship between ribosomal proteins and energy metabolism, a recently discovered pathway termed riboneogenesis. To date, this link has only been reported in S. cerevisiae. Experimental validation of this hypothesis using ChIP-chip data has led to new theories on the role of energy metabolism enzymes in controlling ribosome II biogenesis in S. pombe, including the novel finding that fructose-1, 6-bisphosphatase (FBP1) may have roles in both gluconeogenesis and riboneogenesis. This thesis also demonstrates how the use of multi-level data allows for comprehensive insight into nuclear functions of the S. pombe nonsense-mediated mRNA decay protein, UPF1. This study provides a substantial amount of evidence demonstrating the role of UPF1 in DNA replication. The applicability of fitness data in identifying targets of metal and metalloid toxicity in S. cerevisiae has also been investigated. Ultimately, this thesis reports the effectiveness of using a systems biology and network inference approach to identify, elucidate and understand complex biological pathways in S. cerevisiae and S. pombe. III ACKNOWLEDGEMENTS First and foremost I want to dedicate my thesis to my parents, Reshad and Seeyreen Varsally. Without their unconditional love, support and sacrifice, I would not be where I am today. Their encouragement and motivation throughout the years, especially during the tougher times gave me the strength to continue pushing forward. I would also like to thank my brother Nadiim Varsally, who has always met me with a smile, no matter the situation. Professionally, I would like to thank my supervisors Dr. Francesco Falciani and Dr. Saverio Brogna for their guidance and support throughout both my undergraduate and PhD study. Their expertise and insight in their respective fields have been invaluable in helping me progress as a scientist. I would like to thank all my colleagues. Philipp Antczak, for his vast scientific insight, no matter what problem I had, he always had a solution. To Nil Turan-Jurdzinski for her listening skills and company, especially when the office was quiet. Anna Stincone for teaching me some Italian in our often very entertaining conversations. Jaanika Kronberg for her amazing cake baking skills, they always made my day brighter. Thanks to Harriet Davies whose energy and enthusiasm never failed to make me smile. To Sandip De for his exceptional experimental work in S. pombe which helped validate my results and I would also like to thank the rest of the group, Kim Clarke, Rita Gupta, Helani Munasinghe and Peter Davidsen. I wish to thank all my friends who have been there for me over the years. Special thanks to Anisah for her often humorous but trustworthy advice. To Vanica and Chahat for all the good times we’ve shared and to Yash, for our incredibly long conversations about life. IV LIST OF PUBLICATIONS [1] Varsally, Wazeer and Saverio Brogna. UPF1 involvement in nuclear functions. Biochemical Society Transactions 40.4 (2012): 778. [2] De, Sandip and Varsally, Wazeer and Falciani, Francesco and Brogna, Saverio. Ribosomal proteins' association with transcription sites peaks at tRNA genes in Schizosaccharomyces pombe. RNA 17.9 (2011): 1713-1726. V CONTENTS CHAPTER 1: INTRODUCTION AND BACKGROUND ................................................................ 1 1.1 Introduction to systems biology approaches for omics data analysis ................................. 1 1.1.1 The analysis of genome-wide omics datasets ............................................................. 1 1.1.2 Microarray data processing and normalisation........................................................... 2 1.1.3 Data exploration ......................................................................................................... 3 1.1.4 Identification of differentially expressed genes ......................................................... 4 1.1.5 Functional analysis ..................................................................................................... 5 1.1.6 The current state of network inference ....................................................................... 6 1.1.7 Modularisation approaches......................................................................................... 9 1.2 Other types of genome-wide data used in this study .......................................................... 9 1.2.1 The power of fitness data ........................................................................................... 9 1.2.2 An introduction to ChIP-chip analysis ..................................................................... 11 1.3 The biological systems of relevance ................................................................................ 13 1.3.1 Saccharomyces cerevisiae (budding yeast) and Schizosaccharomyces pombe (fission yeast) ........................................................................................................................... 13 1.3.2 Differences and similarities between S. cerevisiae and S. pombe ............................ 14 1.3.3 A review of system biology studies in yeast ............................................................ 15 1.4 Why focus on the relationship between ribosome biogenesis and energy metabolism? .. 20 1.5 Yeast ribosome biogenesis and energy metabolism ......................................................... 22 1.5.1 Yeast ribosomal proteins .......................................................................................... 22 1.5.2 The link between ribosome biogenesis and energy metabolism pathways .............. 23 1.6 Aims and outline of this thesis ......................................................................................... 31 CHAPTER 2: INFERENCE AND ANALYSIS OF A Saccharomyces cerevisiae GENE FITNESS NETWORK ...................................................................................................................................... 33 2.1 Introduction ...................................................................................................................... 33 2.2 Methods ............................................................................................................................ 35 2.2.1 The Biological System ............................................................................................. 35 2.2.2 Analysis Strategy ...................................................................................................... 37 2.2.3 Data processing ........................................................................................................ 39 2.2.4 Network Inference .................................................................................................... 41 2.2.5 Network analysis: Topology .................................................................................... 43 2.2.6 Network Analysis: Modularisation .......................................................................... 43 VI 2.2.7 Ribosomal proteins first neighbour analysis ............................................................ 44 2.3 Results .............................................................................................................................. 45 2.3.1 A

Inferring Biological Networks from Genome-Wide Transcriptional And

Selecting Molecular Markers for a Specific Phylogenetic Problem

An Automated Pipeline for Retrieving Orthologous DNA Sequences from Genbank in R

Analysis of Gene Expression Data for Gene Ontology

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

Proteomics Provides Insights Into the Inhibition of Chinese Hamster V79

Involvement of DPP9 in Gene Fusions in Serous Ovarian Carcinoma

Integrating Single-Step GWAS and Bipartite Networks Reconstruction Provides Novel Insights Into Yearling Weight and Carcass Traits in Hanwoo Beef Cattle

Ythdc2 Is an N6-Methyladenosine Binding Protein That Regulates Mammalian Spermatogenesis

"Phylogenetic Analysis of Protein Sequence Data Using The

Análise Integrativa De Perfis Transcricionais De Pacientes Com

New Syllabus

Molecular Effects of Isoflavone Supplementation Human Intervention Studies and Quantitative Models for Risk Assessment