Glygly-CTERM and Rhombosortase: a C-Terminal Protein Processing Signal in a Many-To-One Pairing with a Rhomboid Family Intramembrane Serine Protease
Total Page:16
File Type:pdf, Size:1020Kb
GlyGly-CTERM and Rhombosortase: A C-Terminal Protein Processing Signal in a Many-to-One Pairing with a Rhomboid Family Intramembrane Serine Protease Daniel H. Haft1*, Neha Varghese2 1 Department of Bioinformatics, J. Craig Venter Institute, Rockville, Maryland, United States of America, 2 Georgia Institute of Technology, Atlanta, Georgia, United States of America Abstract The rhomboid family of serine proteases occurs in all domains of life. Its members contain at least six hydrophobic membrane-spanning helices, with an active site serine located deep within the hydrophobic interior of the plasma membrane. The model member GlpG from Escherichia coli is heavily studied through engineered mutant forms, varied model substrates, and multiple X-ray crystal studies, yet its relationship to endogenous substrates is not well understood. Here we describe an apparent membrane anchoring C-terminal homology domain that appears in numerous genera including Shewanella, Vibrio, Acinetobacter, and Ralstonia, but excluding Escherichia and Haemophilus. Individual genomes encode up to thirteen members, usually homologous to each other only in this C-terminal region. The domain’s tripartite architecture consists of motif, transmembrane helix, and cluster of basic residues at the protein C-terminus, as also seen with the LPXTG recognition sequence for sortase A and the PEP-CTERM recognition sequence for exosortase. Partial Phylogenetic Profiling identifies a distinctive rhomboid-like protease subfamily almost perfectly co-distributed with this recognition sequence. This protease subfamily and its putative target domain are hereby renamed rhombosortase and GlyGly-CTERM, respectively. The protease and target are encoded by consecutive genes in most genomes with just a single target, but far apart otherwise. The signature motif of the Rhombo-CTERM domain, often SGGS, only partially resembles known cleavage sites of rhomboid protease family model substrates. Some protein families that have several members with C-terminal GlyGly-CTERM domains also have additional members with LPXTG or PEP-CTERM domains instead, suggesting there may be common themes to the post-translational processing of these proteins by three different membrane protein superfamilies. Citation: Haft DH, Varghese N (2011) GlyGly-CTERM and Rhombosortase: A C-Terminal Protein Processing Signal in a Many-to-One Pairing with a Rhomboid Family Intramembrane Serine Protease. PLoS ONE 6(12): e28886. doi:10.1371/journal.pone.0028886 Editor: Maureen J. Donlin, Saint Louis University, United States of America Received August 16, 2011; Accepted November 16, 2011; Published December 14, 2011 Copyright: ß 2011 Haft, Varghese. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This project has been funded in whole with federal funds from the National Human Genome Research Institute, under grant number R01 HG004881, and National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under contract number HHSN272200900007C. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: [email protected] Introduction Exosortase, the proposed sorting enzyme for PEP-CTERM domain proteins, is a highly hydrophobic protein with eight LPXTG and PEP-CTERM provide examples of C-terminal predicted transmembrane helices. Just as all species with LPXTG protein sorting signals proteins have a sortase, all species with PEP-CTERM proteins Many members of the Firmicutes have collections of proteins have an exosortase. This relationship led to the in silico discovery of that share similar C-terminal regions with a tripartite architecture exosortase by Partial Phylogenetic Profiling [4]. In many archaea, consisting of the signature motif LPXTG, a transmembrane (TM) a similar C-terminal putative sorting signal, PGF-CTERM, pairs alpha helix, and a cluster of basic residues [1]. The signature motif with archaeosortase A, a distant homolog of exosortase, and contains a cleavage site for the transpeptidase sortase A (EC appears involved in the processing of S-layer glycoproteins [5]. 3.4.22.70), which separates the target protein from its C-terminal The sortase/LPXTG system and exosortase/PEP-CTERM helix between the Thr and Gly, and reattaches it to the cell wall system are not related by homology, but show similar patterns envelope [2]. The sorting signal and the sortase that acts on it are in their results from comparative genomics analyses. Proteins with jointly present, or jointly absent, in all reference genomes [3], and LPXTG or PEP-CTERM at the C-terminus always have some their relationship usually is many-to-one. form of signal peptide at the N-terminus. PEP-CTERM domains, The PEP-CTERM homology domain, found only in Gram- like LPXTG regions, can appear as a sequence suffix, that is, an negative bacteria, has the same C-terminal location in proteins extra region shared by a select few proteins in a family whose and same tripartite architecture as LPXTG, but has a different members otherwise exhibits full-length homology [4]. signature motif, Pro-Glu-Pro [4]. As with LPXTG, proteins A paralogous domain recognized by a specific protein-sorting bearing PEP-CTERM domains are found in a minority of species, machinery has been described in the oral pathogen Porphyromonas but species with at least one often have twenty or more. gingivalis [6]. The Por secretion signal clearly differs from LPXTG PLoS ONE | www.plosone.org 1 December 2011 | Volume 6 | Issue 12 | e28886 Rhombosortase and Its GlyGly-CTERM Cleavage Signal and PEP-CTERM in its architecture, and may serve a type of GlyGly-CTERM regions in a genome are homologous sorting system that satisfies different types of constraints. However, through paralogous domain formation, rather than lineage-specific paralogous families of C-terminal domains that do similar through convergent evolution match the tripartite architecture of PEP-CTERM and LPXTG Shewanella baltica OS195 has ten GlyGly-CTERM proteins. Only can be detected, and these may strongly imply the existence of one two of these (,YP_001555385.1 and YP_001556128.1), S8/S53 or more previously undescribed sorting systems. Biocuration that, family proteases (Pfam accession PF00082) with overall sequence for a large set of reference genomes, catalogs exactly which ones identify below 20%, are detectably similar by pairwise alignment or carry instances of a proposed protein-sorting domain and which membership in the same Pfam [15] HMM. Other homology families do not creates a binary vector (a list of 19s and 09s, representing represented in this set are YP_001555110.1 in Pfam family PF11949 YES states and NO states) called a phylogenetic profile. This (DUF3466), the trypsin homolog YP_001557123.1 (PF00089), the vector becomes the key input through which a powerful data putative nuclease or phosphatase YP_001556017.1), the metallo- mining algorithm, Partial Phylogenetic Profiling, can discover the protease YP_001552571 (PF05547), the von Willebrand factor type corresponding processing protein in silico [4]. A domain protein YP_001556203.1, and thioredoxin domain Theworkreportedherediscussesacandidateprotein-sorting protein YP_001553411.1 (PF01323). Two additional proteins, signal that features a glycine-rich signature motif and that is YP_001554502.1 and YP_001556760.1, are unclassified and each widespread among (but rare outside) the Proteobacteria. A unrelated to all the others outside of the GlyGly-CTERM region. hidden Markov model (HMM) [7], TIGR03501 in our However, in a multiple sequence alignment (see Figure 1), TIGRFAMs database [8], provides a starting point for further comparison over twenty-one columns shows the ten average 45% investigation. Our evidence from comparative genomics now pairwise sequence identity in the GlyGly-CTERM region. This associates this sorting signal not simply with the presence of region includes a column in which nine of ten residues are aromatic some rhomboid protease (EC 3.4.21.105), an intramembrane (Trp, Tyr, or Phe),. It is highly hydrophobic, but includes three serine protease family found in all domains of life, but with the columns dominated by potentially helix-disrupting small residues presence of a particular subfamily. Crystallography suggests (Gly, Ala, Ser) or Pro. In this same stretch, the six most closely related thattherhomboidproteaseactivesiteserineresidesten sequences average a remarkable 58% sequence identity to each Angstroms deep in the outer leaflet of the plasma membrane other. It is very unlikely such high levels of sequence identity could [9]. A structure bundling six transmembrane helices limits occur through convergent evolution, especially toward a simple access to the active site serine on TM4, so gating functions biophysical constraint such as transmembrane alpha helix formation closely regulate protease activity [10]. An emerging under- capability. The GlyGly-CTERM region in Shewanella baltica standing of rhomboid intramembrane proteases points to an therefore must be viewed as a homology domain. High-scoring expanding set of cellular functions and disease associations: homologs to Shewanella GlyGly-CTERM proteins often exhibit mitochondrial remodeling,