Phylogenetic Profiling of the Arabidopsis Thaliana Proteome

Open Access Research2004Gutiérrezet al.Volume 5, Issue 8, Article R53 Phylogenetic profiling of the Arabidopsis thaliana proteome: what comment proteins distinguish plants from other organisms? Rodrigo A Gutiérrez*†§, Pamela J Green‡, Kenneth Keegstra*† and John B Ohlrogge† Addresses: *Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI 48824-1312, USA. †Department of Plant Biology, Michigan State University, East Lansing, MI 48824-1312, USA. ‡Delaware Biotechnology Institute, University of Delaware, 15 § Innovation Way, Newark, DE 19711, USA. Current address: Department of Biology, New York University, 100 Washington Square East, New reviews York, NY 10003, USA. Correspondence: Rodrigo A Gutiérrez. E-mail: [email protected] Published: 15 July 2004 Received: 16 March 2004 Revised: 10 May 2004 Genome Biology 2004, 5:R53 Accepted: 7 June 2004 The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2004/5/8/R53 reports © 2004 Gutiérrez et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL Phylogenetic<p>Theopportunitycodingof life</p> genes availability to profilingof decipher <it>Arabidopsis of theof the the complete genetic Arabidopsis thaliana genomefactors </it>onthaliana that sequence define the proteome: ofbasis plant <it>Arab of form whattheir idopsisand proteinspattern function. thaliana of distingu sequence To </it>together beginish similarityplants this task, from wi to thwe other organismsthose have organisms? of classified other across organisms the the nuc threlear proe domains protein-vides an deposited research Abstract Background: The availability of the complete genome sequence of Arabidopsis thaliana together with those of other organisms provides an opportunity to decipher the genetic factors that define plant form and function. To begin this task, we have classified the nuclear protein-coding genes of Arabidopsis thaliana on the basis of their pattern of sequence similarity to organisms across the three domains of life. refereed research refereed Results: We identified 3,848 Arabidopsis proteins that are likely to be found solely within the plant lineage. More than half of these plant-specific proteins are of unknown function, emphasizing the general lack of knowledge of processes unique to plants. Plant-specific proteins that are membrane- associated and/or targeted to the mitochondria or chloroplasts are the most poorly characterized. Analyses of microarray data indicate that genes coding for plant-specific proteins, but not evolutionarily conserved proteins, are more likely to be expressed in an organ-specific manner. A large proportion (13%) of plant-specific proteins are transcription factors, whereas other basic cellular processes are under-represented, suggesting that evolution of plant-specific control of gene expression contributed to making plants different from other eukaryotes. interactions Conclusions: We identified and characterized the Arabidopsis proteins that are most likely to be plant-specific. Our results provide a genome-wide assessment that supports the hypothesis that evolution of higher plant complexity and diversity is related to the evolution of regulatory mechanisms. Because proteins that are unique to the green plant lineage will not be studied in other model systems, they should be attractive priorities for future studies. information Background than 99% of the terrestrial biomass and support most of the Plants have played a major role in the geochemical and cli- biodiversity on Earth), plants are essential for humans as the matic evolution of our planet. Today, in addition to their fun- main source of food, provide raw materials for many types of damental ecological importance (plants account for more industry and chemicals for medical applications. It is thus Genome Biology 2004, 5:R53 R53.2 Genome Biology 2004, Volume 5, Issue 8, Article R53 Gutiérrez et al. http://genomebiology.com/2004/5/8/R53 daunting to realize how little we understand about them. For processes that can tell us about the unique aspects of plants example, only approximately 10% of the genes of Arabidopsis and other organisms. One way to promote discoveries that are thaliana, the best explored model system for plant biologists, of special significance to plants or other lineages is to focus have been characterized experimentally [1]. attention on those genes that are found preferentially in that lineage. However, few comparative genomic studies have What makes plants different from other organisms? This is a been carried out that focus on lineage-specific aspects of the central question in plant biology that has a complex, multilay- sequenced genomes [5]. The recent availability of several ered answer. The genomic sequence of Arabidopsis, pub- complete genome sequences from all three domains of life lished in December 2000, allows us to begin answering this (Eukarya, Bacteria, Archaea) makes comparative genomic fundamental question from the perspective of genetic infor- strategies particularly attractive to tackle this issue. mation. By identifying the similarities and the differences in the gene content of this model plant as compared with organ- As a first step towards identifying plant-specific genes, we isms in other phylogenetic lineages, one can begin to differen- classified the nuclear-encoded protein-coding genes of Arabi- tiate ancient processes that are shared by cells in plants and dopsis on the basis of their pattern of sequence similarity to other organisms, from those that evolved independently in protein sequences of organisms that belong to Eukarya, Bac- the plant lineage and contributed to determining plant form teria and Archaea. We then used this initial classification to and function. identify 3,848 Arabidopsis proteins that are likely to be found only in green plants. We believe that these plant-specific pro- The vast majority of the genes in Arabidopsis are encoded in teins may play important roles in processes that are unique to the nuclear genome. Most of the original genetic content of and of significance to green plants. We found that many the cyanobacterial ancestor of the chloroplast and the proteo- plant-specific proteins are known or putative transcription bacterial ancestor of the mitochondrion have been trans- factors, indicating that evolution of plant-specific mecha- ferred to the plant cell nucleus. The remaining 79 chloroplast- nisms of regulating gene transcription was important for the encoded proteins are mostly components of the photosystem evolution of the plant lineage. However, many plant-specific and electron-transport chain or are involved in protein syn- proteins have no known function. From our analysis of the thesis [2]. Most of the 58 genes encoded in the mitochondrial latter proteins, we suggest that plant-specific processes that genome are devoted to components of the respiratory chain occur in chloroplasts or mitochondria, and/or that are associ- and tRNAs [3]. ated with membranes, are the most understudied. Interest- ingly, plant genes encoding plant-specific proteins were often The initial analysis of the sequenced Arabidopsis genome expressed in an organ-specific manner. To facilitate the func- suggested that 70% of the Arabidopsis gene products can be tional characterization of plant-specific proteins, we have assigned to functional categories on the basis of sequence compiled and integrated information from multiple public similarity to genes of known function in other organisms [1]. databases (for example, TIGR, MIPS, TAIR, SIGNAL) and However, the proportion of Arabidopsis genes with related computer prediction programs into a searchable database counterparts in other organisms with completed genome that is accessible from the worldwide web [6]. sequences (Escherichia coli, Synechocystis sp. PC6803, Sac- charomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens) vary greatly depending on Results the functional category: from 8-23% in transcription to 48- Arabidopis protein-coding genes exhibit diverse 60% in protein synthesis when using a stringent BLASTP cut- phylogenetic profiles off value of E < 10-30. These results suggest that plant tran- To try and understand what makes plants different from scription factors have evolved independently, but that other organisms at the genetic level, we sought to identify and proteins that participate in other cellular processes are highly characterize the Arabidopsis protein-coding genes that are conserved in the plant lineage [1]. present in plant species but not in organisms outside the Plantae. As a first step we investigated the pattern of similar- Determining the function of proteins deduced from genomic ity of Arabidopsis protein sequences among organisms that sequences is a central goal in this post-genomic era. The belong to the three domains of life (Eukarya, Bacteria and importance of this pursuit is emphasized by the observation Archaea). We determined the phylogenetic profile (PP) of that even for some of the most extensively studied organisms, each Arabidopsis protein sequence by recording the presence such as E. coli, a high proportion of the predicted genes are as or absence of similar sequences in protein sets from other yet uncharacterized [4]. How do we prioritize the characteri- organisms.

Phylogenetic Profiling of the Arabidopsis Thaliana Proteome

Maternally Inherited Pirnas Direct Transient Heterochromatin Formation

"Phylogenetic Profiling"

Pellegrini M. Using Phylogenetic Profiles To

The Biogrid Interaction Database

Flybase: Updates to the Drosophila Melanogaster Knowledge Base Aoife Larkin 1,*, Steven J

Integrating Genomic Data to Predict Transcription Factor Binding

Scalable Phylogenetic Profiling Using Minhash Uncovers Likely Eukaryotic Sexual Reproduction Genes

Review Article Protein-Protein Interaction Detection: Methods and Analysis

Mapping Global and Local Co-Evolution Across 600 Species to Identify Novel Homologous Recombination Repair Genes

Annual Scientific Report 2011 Annual Scientific Report 2011 Designed and Produced by Pickeringhutchins Ltd

The Drosophila Melanogaster Transcriptome by Paired-End RNA Sequencing

Rnacentral: Progress in the Development of the Non-Coding RNA Sequence Database