BMC Genomics Biomed Central
Total Page:16
File Type:pdf, Size:1020Kb
BMC Genomics BioMed Central Research article Open Access Cross genome comparisons of serine proteases in Arabidopsis and rice Lokesh P Tripathi and R Sowdhamini* Address: National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore 560 065, India Email: Lokesh P Tripathi - [email protected]; R Sowdhamini* - [email protected] * Corresponding author Published: 09 August 2006 Received: 11 April 2006 Accepted: 09 August 2006 BMC Genomics 2006, 7:200 doi:10.1186/1471-2164-7-200 This article is available from: http://www.biomedcentral.com/1471-2164/7/200 © 2006 Tripathi and Sowdhamini; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: Serine proteases are one of the largest groups of proteolytic enzymes found across all kingdoms of life and are associated with several essential physiological pathways. The availability of Arabidopsis thaliana and rice (Oryza sativa) genome sequences has permitted the identification and comparison of the repertoire of serine protease-like proteins in the two plant species. Results: Despite the differences in genome sizes between Arabidopsis and rice, we identified a very similar number of serine protease-like proteins in the two plant species (206 and 222, respectively). Nearly 40% of the above sequences were identified as potential orthologues. Atypical members could be identified in the plant genomes for Deg, Clp, Lon, rhomboid proteases and species-specific members were observed for the highly populated subtilisin and serine carboxypeptidase families suggesting multiple lateral gene transfers. DegP proteases, prolyl oligopeptidases, Clp proteases and rhomboids share a significantly higher percentage orthology between the two genomes indicating substantial evolutionary divergence was set prior to speciation. Single domain architectures and paralogues for several putative subtilisins, serine carboxypeptidases and rhomboids suggest they may have been recruited for additional roles in secondary metabolism with spatial and temporal regulation. The analysis reveals some domain architectures unique to either or both of the plant species and some inactive proteases, like in rhomboids and Clp proteases, which could be involved in chaperone function. Conclusion: The systematic analysis of the serine protease-like proteins in the two plant species has provided some insight into the possible functional associations of previously uncharacterised serine protease-like proteins. Further investigation of these aspects may prove beneficial in our understanding of similar processes in commercially significant crop plant species. Background olysis plays a major role in diverse cellular processes by The proper functioning of a cell is ensured by the precise regulating biochemical activities of proteins in concert regulation of protein levels that in turn are regulated by a with other cellular mechanisms such as post-translational balance between the rates of protein synthesis and degra- modification[1,2]. Serine proteases are one of the largest dation. Protein degradation is mediated by proteolysis, groups of proteolytic enzymes involved in numerous reg- which paves the way for recycling of amino acids into the ulatory processes. They catalyse the hydrolysis of specific cellular pool. In addition to degradation, regulated prote- peptide bonds in their substrates and this activity depends Page 1 of 31 (page number not for citation purposes) BMC Genomics 2006, 7:200 http://www.biomedcentral.com/1471-2164/7/200 on a set of amino acids in the active site of the enzyme, This paper reports the first bioinformatics genome-wide one of which is always a serine. They include both survey of plant serine proteases and cross-comparisons of exopeptidases that act on the termini of polypeptide several serine protease families across monocots and chains and endopeptidases that act in the interior of dicots. Species-specific proteases and existence of evolu- polypeptide chains and belong to many different protein tionary changes like lateral gene transfer and gene shuf- families that are grouped into clans[3,4]. A clan is defined fling were observed at the catalytic domain. Further as a group of families, the members of which are likely to experimental characterization of Arabidopsis and rice ser- have a common ancestor. Currently there are over 50 ser- ine proteases that are likely to be involved in basic func- ine-protease families known as classified by MEROPS tions associated with various physiological processes in database[5], an information resource for peptidases and plants would be useful in expanding the current under- their inhibitors. standing of these processes in regulating plant growth and development. This would prove beneficial in understand- Serine proteases appear to be the largest class of proteases ing protease processes in plants and application of this in plants[2]. Consistent with their regulatory role, plant knowledge to important crop species. serine proteases have been shown to be involved in diverse processes regulating plant development and Results and discussion defense responses[2,6-8]. In addition, some of plant ser- Despite the differences in genome sizes between Arabidop- ine carboxypeptidases (Family S10) function as acyltrans- sis and rice (125 Mb and 389 Mb respectively) and ferases rather than hydrolases, suggesting diversification encoded number of genes (25,498 and 37,544 non-trans- of function to regulate secondary metabolism in plants [9- posable-element-related-protein-coding[14,15] and 11]. However, the function and regulation of plant pro- 55,890 in total as in TIGR[20] rice database), the two teases is poorly understood primarily due to lack of iden- plant species appear to encode a very similar number of tification of their physiological substrates. While there is serine protease-like proteins (206 and 222 putative num- extensive literature available on plant serine proteases, bers, respectively; Table 1; see additional files 1.2: Table particularly in Arabidopsis thaliana, most of these studies S1.pdf, Table S2.pdf). These include several putative ser- are either restricted to specific families [11-13] or specific ine protease homologues that are apparently catalytically physiological aspects (please see the references above). inactive, since they either lack the amino acid residues Arabidopsis thaliana and Oryza sativa constitute model sys- that are believed to be essential for catalysis or carry tems for Dicotyledons and Monocotyledons respectively, amino acid substitutions at positions corresponding to the major evolutionary lineages in angiosperms[14,15]. catalytically active sites. Such inactive-enzyme homo- The availability of the complete genomic sequence of logues are not restricted to serine proteases, but have been these two species renders it possible to carry out a compre- identified in diverse enzyme families, chiefly among pro- hensive analysis of serine-protease families to gain teins involved in signaling pathways and proteins that are insights into their function and to ascertain their evolu- apparently extracellular in function and are conserved tionary relationships. across metazoan species. They are believed to have acquired newer and more distinct functions, in the course The current analysis aims at the identification and analysis of evolution, thereby adding to the complexity of various of serine protease families encoded in the genomes of the regulatory networks in the cell[21]. We discuss below the two plant species. The domain organization of the puta- occurrence, phylogenetic patterns of serine protease tive serine proteases in the two plant genomes have been domains in the plant genomes by grouping according to analysed employing several sensitive sequence search MEROPS classification[5]. Only serine protease families methods [16-18] that have provided insights into differ- with non-zero occurrence of putative members in the two ent domain architectures. To understand the evolutionary genomes are discussed. relationships among the Arabidopsis and rice serine pro- tease-like proteins, phylogenetic analysis was performed DegP proteases (Family S1) based at their protease domains. Cross genome compari- DegP proteases are a group of ATP-independent serine son of the serine proteases across the two genomes has led proteases that belong to MEROPS[5] family S1 (trypsin to the identification of members likely to be conserved family). E.Coli DegP is a peripheral membrane protein across the two species. Predictions of the sub-cellular located in the periplasmic side of the plasma membrane. localizations of serine protease-like proteins to obtain fur- It is a heat-shock protein that combines both chaperone ther insight into their possible functional associations and proteolytic activities that switch in a temperature- were carried out using TargetP[19]. Comparison of plant dependent manner. The chaperone activity is predomi- serine proteases with other major eukaryotic lineages nant at low temperatures, while the proteolytic activity reveals selective expansion of serine protease families in takes over at higher temperatures[22]. Crystal structure of plants and identification of plant specific serine proteases. E.coli DegP reveals that the