Towards Relating the Evolution of the Gene Repertoire in Mammals to Tissue Specialisation
Shiri Freilich
Wolfson College

This dissertation is submitted to the University of Cambridge for the degree of Doctor of Philosophy

21 December 2006

To Leon, who was the wind blowing in my sails, in the deep blue sea of this journey of ours.

This Thesis is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text.

This Thesis does not exceed the specified length limit of 300 pages as defined by the Biology Degree Committee.

This Thesis has been typeset in 12pt font according to the specifications defined by the Board of Graduate Studies and the Biology Degree Committee.

Summary: Towards Relating the Evolution of the Gene Repertoire in Mammals to Tissue Specialisation

The sequencing efforts of recent years have provided a rich source of data for investigating how gene content determines similarity and uniqueness in a species’ phenotype. Work described in this PhD Thesis attempts to relate innovations in the gene repertoire along the mammalian lineage to the most obvious phenotypic characteristic of animals: the appearance of highly differentiated tissue types. Several different approaches, outlined below, have been followed to address some aspects of this problem.

Initially, a comprehensive study of the pattern of expansion of the complement of enzymes in various species was performed in order to obtain a better view of the principles underlying the expansion of the gene repertoire in mammals. Although several studies have described a tendency toward an increase in sequence redundancy in mammals, not much is known about the way such sequence redundancy reflects functional redundancy. Our analysis indicates that different kingdoms differ in the pattern of expansion of their enzymatic set.
Whereas for unicellular species one can observe a strong correlation between the gene content and the enzymatic reaction repertoire, for multicellular species we observe a tendency toward increased functional redundancy.

Furthermore, the relationship between the expansion of gene families in mammals and the differentiation of tissue types was studied by analysing mouse gene expression data from many tissues. We show that duplication events lead, on average, to a more specific expression of the duplicate proteins. However, this tendency is observed only when the duplication event postdated the transition to multicellularity. The combined expression distribution of all members of a protein family implies a complementary expression between family members. Therefore, the analysis provides large-scale, corroborative support for the subfunctionalization theory, which states that the division of expression of an ancestor gene between its daughter duplicates promotes the retention of the duplicates in the genome.

Based on the assumption that the physiological role of a tissue is determined by the unique composition of genes expressed in that tissue, we characterised each tissue with regard to the distribution of function and phyletic origin of the expressed genes. Pre-metazoan genes (which are predominantly involved in metabolic functions) show a stronger tendency to be globally expressed, whereas metazoan-specific genes (which are predominantly involved in regulatory functions) show a stronger tendency to be specifically expressed.

Finally, the phyletic origin of the mammalian metabolic genes was studied, together with the relationship between innovations in this metabolic set and the development of new, tissue-specific pathways.
Overall, the analysis demonstrates that tissue differentiation is at least in part achieved by tissue specialisation of metabolism, either by differentially expressing the ancient metabolic proteins or by gaining new metabolic proteins participating in tissue-specific functions.

Preface

This Thesis is the end result of a challenging but also immensely rewarding Ph.D. work at the European Bioinformatics Institute (EBI). For this experience, there are several people whom I would like to thank.

Firstly, I would like to express my deep gratitude to my supervisor, Prof. Janet Thornton. It was a great privilege working for her, both scientifically and personally. I couldn't have wished for a better supervisor and I will miss her guidance.

Throughout my research in the Thornton group I was fortunate enough to enjoy a most stimulating and amiable environment. I would like to thank the past and present members of the group for allowing me such an experience.

I would like to thank my family: my husband Leon, who encouraged me to apply, always took an interest in my work, and was a true partner and a source of encouragement and support; my father Moshe, my mother Tzipor and my two brothers, Ofer and Ariel, whose love and support were my source of comfort throughout my stay away from home.

My Ph.D. started in March 2002 and is ending in summer 2006. The past four and a half years in Cambridge have been wonderful. I loved my research and I have acquired precious friends. During that time my son, Idan, was born.

The conclusion of this Thesis takes place in Israel with a different, sad tone: the sounds of war. On this note, I would like to wish the good people from both sides of the border better days, days of peace.

Table of Contents

SUMMARY: TOWARDS RELATING THE EVOLUTION OF THE GENE REPERTOIRE IN MAMMALS TO TISSUE SPECIALISATION
PREFACE
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1: INTRODUCTION
  1.1 MAJOR EVENTS IN THE PHYLOGENESIS OF MAMMALS: FROM A SINGLE-CELL PROTIST TO THE APPEARANCE OF VERTEBRATES
    1.1.1 The appearance of a multicellular ancestor of animals
    1.1.2 Early metazoa
  1.2 INNOVATIONS IN THE GENE REPERTOIRE ALONG THE MAMMALIAN LINEAGE
  1.3 DIFFERENTIATION OF TISSUES IN VERTEBRATES
  1.4 CONSERVATION AND VARIATION BETWEEN ADULT MAMMALIAN TISSUES
  1.5 METHODOLOGY
    1.5.1 BLAST and PSI-BLAST
    1.5.2 Estimation of synonymous and nonsynonymous substitution rates (dN/dS)
    1.5.3 Functional annotation methods
    1.5.4 Gene Ontology (GO)
    1.5.5 Definition and identification of orthologous and paralogous proteins
    1.5.6 Microarray data acquisition
      1.5.6.1 Acquisition of microarray data
      1.5.6.2 The Affymetrix Gene Chip
  1.6 OUTLINE OF THESIS
CHAPTER 2: THE COMPLEMENT OF ENZYMATIC SETS IN DIFFERENT SPECIES
  2.1 OVERVIEW
  2.2 INTRODUCTION
  2.3 METHODS
    2.3.1 Construction of the query list
    2.3.2 Construction of enzymatic sets for each species
    2.3.3 Data retrieved from KEGG database
    2.3.4 Determination of the representation of a species or a group of species in the query list