<<

BMC Genomics BioMed Central

Research article Open Access homologues in Neil D Rawlings* and Alex Bateman

Address: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK Email: Neil D Rawlings* - [email protected]; Alex Bateman - [email protected] * Corresponding author

Published: 16 September 2009 Received: 28 November 2008 Accepted: 16 September 2009 BMC Genomics 2009, 10:437 doi:10.1186/1471-2164-10-437 This article is available from: http://www.biomedcentral.com/1471-2164/10/437 © 2009 Rawlings and Bateman; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract Background: Peptidase family A1, to which pepsin belongs, had been assumed to be restricted to eukaryotes. The tertiary structure of pepsin shows two lobes with similar folds and it has been suggested that the has arisen from an ancient duplication and fusion event. The only sequence similarity between the lobes is restricted to the motif around the aspartate and a hydrophobic-hydrophobic-Gly motif. Together, these contribute to an essential structural feature known as a psi-loop. There is one such psi-loop in each lobe, and so each lobe presents an active Asp. The human immunodeficiency virus peptidase, retropepsin, from peptidase family A2 also has a similar fold but consists of one lobe only and has to dimerize to be active. All known members of family A1 show the bilobed structure, but it is unclear if the ancestor of family A1 was similar to an A2 peptidase, or if the ancestral retropepsin was derived from a half-pepsin gene. The presence of a pepsin homologue in a prokaryote might give insights into the evolution of the pepsin family. Results: Homologues of the aspartic peptidase pepsin have been found in the completed genomic sequences from seven species of bacteria. The bacterial homologues, unlike those from eukaryotes, do not possess signal , and would therefore be intracellular acting at neutral pH. The bacterial homologues have Thr218 replaced by Asp, a change which in has been shown to confer activity at neutral pH. No pepsin homologues could be detected in any archaean genome. Conclusion: The peptidase family A1 is found in some species of bacteria as well as eukaryotes. The bacterial homologues fall into two groups, one from oceanic bacteria and one from plant symbionts. The bacterial homologues are all predicted to be intracellular , unlike the eukaryotic . The bacterial homologues are bilobed like pepsin, implying that if no horizontal gene transfer has occurred the duplication and fusion event might be very ancient indeed, preceding the divergence of bacteria and eukaryotes. It is unclear whether all the bacterial homologues are derived from horizontal gene transfer, but those from the plant symbionts probably are. The homologues from oceanic bacteria are most closely related to memapsins (or BACE-1 and BACE-2), but are so divergent that they are close to the root of the phylogenetic tree and to the division of the A1 family into two subfamilies.

Background hydrolyse proteins to amino and peptides for nutri- Peptidases are widespread enzymes that catalyse the tion and recycling, but they also perform some of the hydrolysis of bonds. Not only do peptidases most important post-translational processing events lead-

Page 1 of 10 (page number not for citation purposes) BMC Genomics 2009, 10:437 http://www.biomedcentral.com/1471-2164/10/437

ing to the activation (or inactivation) of many other pro- because, apart from motifs contributing to each psi-loop, teins, including other enzymes and peptide hormones. there is no observable similarity in the sequence. Over 2% of the protein coding in a genome encode This led to the hypothesis that an ancestral gene duplica- peptidases, and there are over 500 peptidase genes in the tion had taken place, followed by gene fusion [9,9]. This . Peptidases exist in at least six catalytic raises the question of how did the original gene product types, depending on the nature of the nucleophile in the function, being only one half of a modern-day pepsin? catalytic reaction (either the hydroxyl of a serine or threo- The discovery of a peptidase, retropepsin (EC 3.4.23.16), nine, the thiol of a cysteine, or activated water bound within the genome of the human immunodeficiency virus either to a metal ion (metallopeptidases) or aspartate or (HIV) that also represents a single lobe of an extremely glutamate residues). Peptidases are grouped into over 250 divergent pepsin (so divergent that it is the representative different evolutionary families [1]. of a different family, A2, in MEROPS, but within the same clan or superfamily AA) and which has to dimerize to be The pepsin family (MEROPS family A1 [2]) contains active, provided evidence to answer this question [10]. aspartic-type . Activated water is bound by Proteins with a single pepsin lobe also exist in eukaryotes. two aspartate residues each of which is located in a motif The structure of the Saccharomyces cerevisiae protein Ddi1 Xaa-Xaa-Asp-Xbb-Gly-Xbb where Xaa is a hydrophobic (DNA-damage inducible protein 1), which is not a pepti- amino and Xbb is either Ser or Thr. A third important dase, has been resolved and shown to contain a domain residue is a which interacts with the and with a retropepsin-like structure, and the protein also has is located on a beta-hairpin loop known as the "flap" to dimerize to be active [11]. This domain was probably which forms part of the roof to the active site. A second derived from a retroelement and represents an even more motif, part of a structure known as the psi-loop [3] con- ancient divergence than that of the A1 and A2 families, as sists of two hydrophobic residues followed by a glycine. has been shown in a recent phylogeny of clan AA pepti- Two of these motifs are conserved in all sequences known dases [12]. It has been assumed that the HIV peptidase or predicted to be active within family A1. The psi-loops represents the ancestral state and that the ancestral half- are important in stabilizing the structure, especially the pepsin also had to dimerize to be active. However, it is active centre. Many members of family A1 have three con- also possible that the HIV peptidase was derived from a served disulphide bridges. Homologues are known from normal, bilobed pepsin when a virus captured half of its animals, plants, fungi and protozoa, but no homologues host's gene. This "chicken and egg" situation might be have been reported from prokaryotes. Family A1 has resolvable if pepsin homologues could be found in recently been divided into two subfamilies, with the prokaryotes, because this would represent an extremely majority of characterized peptidases in subfamily A1A. distant sequence divergence. If a prokaryote pepsin Peptidases within this subfamily include: (EC homologue contained only one lobe, then it would sup- 3.4.23.1), gastricsin (EC 3.4.23.3) and (EC port the hypothesis of the ancient gene duplication and 3.4.23.4), which are digestive enzymes in the ; fusion event. However, if the prokaryote pepsin homo- D (EC 3.4.23.5), which is a lysosomal logue were bilobed, then either a) this hypothesis is incor- that degrades phagocytozed peptides and proteins; renin rect; b) the gene duplication precedes the most recent (EC 3.4.23.15), which is a peptidase that processes angi- common ancestor of prokaryotes and eukaryotes; or c) the otensinogen; and memapsins 1 (or BACE-2, EC bacterial homologue genes were derived by horizontal 3.4.23.45) and 2 (BACE-1, EC 3.4.23.46), widely distrib- gene transfer. uted peptidases of unknown function but which may be involved in Alzheimer's disease (memapsin-2 is currently We report the discovery of the first prokaryote pepsin under investigation as a drug target [4]). Subfamily A1B homologues in the completed genome sequences of sev- contains predominantly peptidase homologues from eral . plants including nucellin [5] and (EC 3.4.23.12; [6]). Besides peptidases, family A1 also con- Results and Discussion tains a number of proteins which cannot be peptidases Over 960 completely sequenced prokaryote genomes because the active site aspartates are not conserved. These were analysed (914 bacterial and 55 archaean), and 953 include pregnancy-associated glycoproteins [7] and a of these contained no detectable homologue of pepsin. xylanase inhibitor [8]. This included all of the archaean genomes. However, pep- sin homologues were detected in seven bacterial genomes, The tertiary structures of several members of this family and these are shown in Table 1. Characteristics of the pre- have been determined, and each shows a bilobed protein. dicted proteins of these homologues are shown in Table 2. Each lobe contains one of the active site aspartates in a All the bacteria containing pepsin homologues are mem- psi-loop and the two lobes are structurally similar. This bers of the class , and all except similarity can only be observed in the tertiary structure Marinomonas are members of the order ;

Page 2 of 10 (page number not for citation purposes) BMC Genomics 2009, 10:437 http://www.biomedcentral.com/1471-2164/10/437

Table 1: Homologues of pepsin from completely sequenced bacterial genomes

Species Genome accession ProteinID Gene locus Base pairs

Colwellia psychrerythraea CP000083 AAZ24813 CPS_0549 c(546351..547676) Marinomonas sp. MWYL1 NC_009654 YP_001340503 Mmwyl1_1642 1847648..1848868 amazonensis CP000507 ABL98994 Sama_0787 c(969066..970403) Shewanella denitrificans CP000302 ABE54094 Sden_0804 923752..925041 Shewanella loihica CP000606 ABO24862 Shew_2996 3556266..3557951 Shewanella sediminis CP000821 ABV38171 Ssed_3567 4347419..4348963 Sinorhizobium medicae CP000738 ABR61223 Smed_3492 2467002..2468108

Members of peptidase family A1 found in the proteomes derived from completely sequenced bacterial genomes. Base pairs in parentheses preceded by the letter "c" are on the complimentary strand with respect to the database entry.

Marinomonas is a member of the order Oceanospirillales. monas as the seed sequences for 19 and 14 iterations Despite the genomes having been completely sequenced, respectively (returning 2888 known pepsin homologues); no pepsin homologue was detected in the proteomes of these also failed to find any new bacterial homologues. the following Shewanella species: S. baltica, S. frigidima- rina, S. halifaxensis, S. oneidensis, S. pealeana, S. putrefaciens, Four homologues were found among the environmental S. sp. ANA-3, S. sp. MR-4, S. sp. MR-7, S. sp. W3-18-1 or samples nucleotide sequence database (see Table 3), and S. woodyi, nor in Sinorhizobium meliloti. all were from marine environments. The predicted protein sequences of AACY023056044 and AACY024067159 BlastP searches of the NCBI non-redundant database were virtually identical. None of these fragments were using the bacterial pepsin homologues as queries returned identical to any known pepsin homologue, but each was only one other bacterial sequence. This was the most closely related to a homologue from Shewanella and A5A_A0203 gene product from Vibrio cholerae strain are likely to be derived from at least three other bacterial MZO-2 (UniProt accession A6A7Y6). This sequence con- species. Because these are fragments only, they are not sists of 382 residues with the peptidase unit predicted to considered further. be residues 14-257, and the active site residues to be Asp41, Phe87 and Asp249 (numbered according to the Several of the bacteria with pepsin homologues are translated coding sequence). The genome of this particu- marine and psychrophilic, including Colwellia psychreryth- lar strain is incomplete and no pepsin homologue is raea (formerly Vibrio psychroerythus) [13] and Shewanella present in any of the other thirteen strains of Vibrio chol- detrificans [14]. Shewanella loihica is also marine but asso- erae for which the genome sequences are complete. ciated with deep-sea, hydrothermal vents. Shewanella ama- Because this sequence might be a contaminant, it has not zonensis was isolated from marine sediments from the been included in the analysis. A search of the UniProt shallow waters of the Amazon river delta, and Shewanella sequence database using HMMER also failed to return any sediminis from sediments in Halifax Harbour, Canada other sequences from bacteria. Two Psi-Blast searches of [15]. Marinomonas sp. MWYL1 is a salt marsh species ini- the NCBI non-redundant protein sequence database were tially isolated from the root surface of the grass Spartina undertaken using the sequences of the pepsin anglica [16]. Exceptionally, Sinorhizobium medicae is found homologues from Shewanella amazonesis and Marino- associated with root nodules of Medicago species [17].

Table 2: Predicted characteristics of bacterial homologues of pepsin

Species Length Predicted active site residues Alien_Hunter score (threshold)

Colwellia psychrerythraea 441 D36, Y68, D300 6.975 (14.690) Marinomonas sp. MWYL1 406 D49, F95, D274 13.692 (13.678) Shewanella amazonensis 445 D37, Y69, D282 4.976 (13.455) Shewanella denitrificans 429 D37, Y69, D287 8.983 (10.423) Shewanella loihica 561 D41, Y73, D399 3.441 (15.338) Shewanella sediminis 514 D41, Y88, D364 12.942 (15.906) Sinorhizobium medicae 368 D53,F97,D243 5.649 8.762 (16.612)

Predicted active site residues are numbered according to the translation of the respective coding sequence. The threshold scores for Alien Hunter analysis are given in parenthesis after the raw score; a different threshold exists for each genome analysis. The pepsin homologue in Sinorhizobium medica is derived from a gene that is split between two 5000-residue windows in the Alien Hunter analysis, hence there are two scores.

Page 3 of 10 (page number not for citation purposes) BMC Genomics 2009, 10:437 http://www.biomedcentral.com/1471-2164/10/437

Table 3: Bacterial pepsin homologues in environmental samples

Accession source closest pepsin homologue % identity

AACY023084992 Eastern Pacific ocean surface water Shewanella amazonensis 38% AACY023056044 Eastern Pacific ocean surface water Shewanella denitrificans 31% AACY024067159 Eastern Pacific ocean surface water Shewanella loihica 43% AAFZ01015571 microbial mat from gray whale carcass in the Santa Cruz Basin, Pacific Ocean Shewanella denitrificans 37% (depth 1674 metres)

There are completed genome for several other Shewanella Fig. 1 shows that of the cysteines forming the three disul- species that are also marine (S. baltica and S. frigidimarina, phide bridges found in many of the eukaryotic homo- for example), but these do not contain pepsin homo- logues, only those forming the first are conserved in the logues, so there is no apparent correlation between envi- Sinorhizobium and Marinomonas homologues. Cys282 ronment and presence of a pepsin homologue. from the third disulphide bridge is retained in the sequences from Colwellia, Shewanella loihica and S. sedi- An alignment containing only bacterial homologues, minis, which may mean that the proteins are thiol- human pepsin A and memapsins 1 and 2, generated by dependent. Disulphide bridges would not be expected in extracting the sequences from the MUSCLE alignment and intracellular proteins. The Sinorhizobium and Marinomonas removing all the gap-only columns, is shown in Fig. 1. As homologues also differ in having Phe75 instead of Tyr can be seen from this figure, both active site aspartates (although it must be acknowledged that the alignment (Asp32 and Asp215) are conserved, indicating that the here is by no means certain); although this is a common bacterial homologues are bilobed. This means that each replacement, none of the homologues with Phe75 has homologue has the same structure for the peptidase unit ever been biochemically characterized or shown to be cat- as pepsin, and each would be active in the monomeric alytically active. Replacement of Tyr75 for Phe in rhizo- form. puspepsin by site-directed mutagenesis led to some, weakened activity [21]. The hydrophobic-hydrophobic- None of the proteins were predicted to possess a signal Gly motifs in the psi-loops are conserved in all the bacte- peptide. This means that unlike the majority of members rial homologues except Sinorhizobium and Marinomonas . of peptidase family A1, these bacterial proteins are not Because these bacterial homologues have inserts here, the secreted (the aspartic peptidase BcAP1 from the plant alignment is uncertain. Although it is possible that the pathogenic Botrytis cinerea also does not possess a inserts might compensate for the loss of the hydrophobic- signal peptide nor disulfide bonds [18]). Many members hydrophobic-Gly motif, which would imply a different of family A1 are active at acidic pH, being secreted in the structure in this region, it is much more likely that the two stomach, or to the lysosome or plant and fungal vacuoles, proteins lacking these motifs are not active as peptidases. but a cytoplasmic peptidase would presumably be active at neutral pH. Amongst the mammalian pepsin homo- From the solved tertiary structure the residues forming the logues, renin is secreted into the blood and is also active S1 and S1' substrate-binding pockets for human pepsin A at neutral pH. This change in pH optimum has been are known. Pepsin A prefers large hydrophobic residues explained by the replacement of Thr218 by Ala, thereby (Phe and Leu) in substrates for both P1 and P1', and the preventing a forming which affects the substrate-binding pockets are correspondingly lined with acidity of the active site residue Asp215. That this replace- hydrophobic residues [22]. The bacterial homologues, ment is also found in the HIV retropepsin, which is also with the exception of that from Marinomonas, have most active at neutral pH, supports this hypothesis [19]. of the substrate-binding residues conserved except Thr77, Intriguingly, six of the seven bacterial homologues also which is replaced by another hydrophobic residue, and have Ala218 (see Fig. 1). In pepsin A, Thr218 is hydrogen- Thr218, which is replaced by Ala (see Fig. 1). bonded to Asp303, but in renin both residues are replaced by Ala and site-directed mutagenesis of Ala303 for Asp The phylogenetic tree (Fig. 2) shows that all bar two of the lowered the pH optimum [20]. In the sequences from bacterial sequences are close to the origin of the division Shewanella and Colwellia, Asp303 is replaced by Leu, between subfamilies on the tree, implying that horizontal which is an isosteric replacement. In the Sinorhizobium transfer of genes from a recent eukaryote species to a bac- sequence Asp303 is replaced by Arg, and in the Marino- terium is unlikely (the key is found in Additional File 1). monas sequence it is replaced by Ser. There would be no This is confirmed by the results from Alien Hunter (see problem accommodating the smaller Ser in the available Table 2) which finds horizontal transfer of genes unlikely space, but the larger Arg could be more problematic. in all species except Marinomonas. Considering that Mari-

Page 4 of 10 (page number not for citation purposes) BMC Genomics 2009, 10:437 http://www.biomedcentral.com/1471-2164/10/437

1 2 3 4 5 6 7 8 9 12345678901234567890123456789012345678901ABC2345AB678901234567890ABCD12345A678901234567890A12345678901A23AB * /1 1\ * PEPA_HUMAN ------VDEQPLENYLDMEYFGTIGIGTPAQDFTVVFDTGSSNLWVP---SVYC--SSLACTNHNRFNPED----SSTYQ-STSETVSITYGTGSM-TGILGYDTVQV-GG-- MEM1_HUMAN ------AMVDNLQGDSGRGYYLEMLIGTPPQKLQILVDTGSSNFAVAGTPHSYIDT------YFDTER----SSTYR-SKGFDVTVKYTQGSW-TGFVGEDLVTIPKG-- MEM2_HUMAN ------EMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHR------YYQRQL----SSTYR-DLRKGVYVPYTQGKW-EGELGTDLVSIPHG-- BACP_COLWE ------MTTLLRIPITNVFGQGDYSVTLHLGSEKSPVNLLLDTGSSTLVVKST------TYQANL----DKKLT-PSTAVQEVNYGVGGW-NGPVIFTSINI--HEH BACP_SHEDE ------MSKHFIPLPLTNVLADGGYSASVCLGSQLAKVNLIIDTGSSTLVVHEN------RYQGMH----DTRLQ-STSLAQQVSYGVGGW-FGSVVHTRFNI-LD-- BACP_SHEAM ------MSKRIIELKLTNLLAEGGYTACLRLGSEAQSVNLIIDTGSSTLVVHES------GYKAGN----DKELK-PTTLAQEVNYGLGGW-LGAVVHTSIHA-GE-- BACP_SHELO ------MSSTTNARTLKVPLVNVYAKGGYSAQLRIGAEKHPLNLILDTGSSTLVVSGS------DYVEER----DTALT-PTSFAQEVLYGIGGW-DGPVVYTRVGI-REDA BACP_SHESE ------MAGQIKARTLKVPITNVYAKGDYCAQLRLGSQAQRANLILDTGSSTLVVKQS------AVKQNV(15)DNLLT-VTSLAQEVNYGIGGW-DGPVVTTRVAL-CEES BACP_SINME MRKGGVRSTVEAENTLVFELRRGAITNNGATPWWVSATVGTSTVPQTFKFMMDTGTTNTWVT---AKSCST--SECTQHDSFDETA----SSTFVWQDQSLTPIDFSGWGNMDVNLGNDVWVL-SD-- BACP_MARMW ----MEKFLSVIGEGKGISVSLDRGPFQNNGASPWYAYVGVGTPEQALKFSFDTGSDFIWVT---SSLCKV--DTCHHYGNQEFAY--QISSSFSWVNSQTIEVNFGPWGNMEVKTGKDLFTL-TP-- Consensus/80% ...... b.h..hps.t..sY.spl.lG*..p.hpllbDTGSSsbhV...... ap...... sspbp.sssbs.pVsYt.Gtb.pG.ls.s.hsl.....

10 11 12 CDEFGHIJKLMNOPQRST456789012ABCDEFG345678A90123456ABCDE789ABCDEFGHIJKLMNOPQRS0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop &&& PEPA_HUMAN ------ISDTNQIFG------LSETEP-GSFLYYAP-----FDG------ILGLAYPSIS------MEM1_HUMAN ------FNTSFLVNI------ATIFESENFFLPGIK-----WNG------ILGLAYATLA------MEM2_HUMAN ------PNVTVRANI------AAITESDKFFINGSN-----WEG------ILGLAYAEIA------BACP_COLWE CH---KSDHPQSIAKSPSMNLSNAPVA------LVHSDKQSKTFGD------ADG------ILGLAYHHLNKSFDFSDYFTEHKITPALSYPWPFSHEV------S BACP_SHEDE ------VSVDDMPLA------LVHHEAEH-SFIN------ADG------IWGMAYHSLNRSYDMSEYLTANAIAPPATYPWPFPEND----QPQFPLPEFQ BACP_SHEAM ------LAMADTPLA------LVHAEAEH-TFLD------ADG------IWGMAYHPLNRGYDLTTYLTSHGIDPASTFPWPFPDEA----HPL-----VK BACP_SHELO ID-FDLHDDHVPHAHHP-IVLEQCPVA------VVSSLAQETSFAD------ADG------ILGLAYDALNKVYDLKNYFHHYGIAPALTFPWPFDEEI(32)HQATETSFNS BACP_SHESE CE(6)KRLPHLHLARQP-LVLDACPLA------IVSSVKQQQTFGD------ADG------ILGLGYHHLNRSYDLKAYLESHQYSPAVTYPWPFADAI(35)KKKTELPFSS BACP_SINME ------KNAIDQDVT------VDFWLS--QSYSGARFGELIQDG------FIACGPYGDN------BACP_MARMW ------EALGQSTSGSVSLLSSDVYLA--QSYQGVQFRELDWDGGIGIPSTTGLSDSLSMYRSFRAMTPSSKT------Consensus/80%...... sh.s.shs.. hs..bs....b.s.. ...hDG.. . . IbGbtY..ls......

13 14 15 16 17 18 19 20 qrstuvwxyzABCD0123456789012345678AB901234567890ABCDEFGHIJKLMNOPQRSTUVW1234567890123AB4567A8901234ABC56789012ABC3456789012A345678 /2 PEPA_HUMAN ------SSGATPVFDNIWNQGLVSQ--DLFSVYLSADDQ------SGSVVIFGGIDSS--YYTG-SLNWVPV---TVEGYWQI---TVDSITMNGE-AIACAE MEM1_HUMAN ------KPSSSLETFFDSLVTQANIPN---VFSMQMCGAGL------PVAGS---GTNGGSLVLGGIEPS--LYKG-DIWYTPI---KEEWYYQI---EILKLEIGGQSLNLDCR MEM2_HUMAN ------RPDDSLEPFFDSLVKQTHVPN---LFSLQLCGAGF------PLNQSEVLASVGGSMIIGGIDHS--LYTG-SLWYTPI---RREWYYEV---IIVRVEINGQDLKMDCK BACP_COLWE EDLVSFKKLMWKYPEHDVTPYFTLLEEQQLSAN--K-FAFYSKRSAIHVNKS(6)KIPSDELA---KDPLNKGLLILGGGEEQTDLYQG-EFQSIKV---EHDVYYNV---DLKSVQVGTSQPVIA-- BACP_SHEDE SEASDFKQLIHGLPEQDVATAFTLMEQQGLVSN--R-FAFIAHRSSIHHAKA---NMTPESLA---LDPLNQGMLILGGDERLSQYFQG-EFTDLKV---VHDRYYNV---NLLSLSLAGGAPIKL-- BACP_SHEAM QQLDDFKALLPQLPEEDVATCFTLMEERGLCAN--R-FTLITRRSSIHVTAP---NACANTLA---ADPLNQGLLILGSHAADAAPNHG-CAVDLKV---VHDRYYNV---NLKAVRWQNDTDIAL-- BACP_SHELO EDLRQFKRFLRTQPTDDLPPYFSNLTSHGLTPN--K-FAFFARRSSIHVASQ(6)--SANEFAFLGQDPLNQGWLILGGGEEHTELYQG-EFKTIAV---IENRYYNV---NLVGLKIGDGEMIPSLL BACP_SHESE EDLKQFKQFLWQQPEQDLIPYFTALEEHGLTAN--K-FAFYSRRSSIHVTNG---DLSPQQLA---NDPLNRGWLILGGGEEHTELYQG-DFHTIRV---DHDIYYNV---TLTSIRVGDQPPIPA-- BACP_SINME ------SRSNLIFDALWYSGALAS--PWMSYWVDYTID------GAAVDSGEMIFGGSNAD--KYDPDTVISLDI---V-PGIWTH---MGDSIVVDGAEIFT--- BACP_MARMW ------EPTSFHFFQSLLDQGVVSSTCPYVTFLTD--ID------NKSGQVEFGQLNQD--YRENREYLFLPW(4)A-PYLWTS(6)IGDTQVISPESITS--- Consensus/80%...... ppsh.shFs.l.ppthsss....Ftbbhp.ssb...... s.s.G lIbGGhp.p..bapG.pb..l.l.....p.Yapl....l.slplss...h....

21 22 23 24 25 26 ABCDEFGHIJKLMNOPQRSTUVWXYZ9012345678901234567890123456789ABCDEFGHIJKL0123456789ABC012345AB6789012ABC34ABCDEFGHIJKLMNOPQRS5678 2\ * /3 PEPA_HUMAN ------GCQAIVDTGTSLLTGPTSPIANIQSDIGASE------NSDGDMVVSC---SAISSL--PDIVFTI---NG------VQYP MEM1_HUMAN EYN------ADKAIVDSGTTLLRLPQKVFDAVVEAVARASLIPEFSDG-----FWTGSQLACWTNSETPWSYFPKISIYL---RD------ENSSRSFR MEM2_HUMAN EYN------YDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDG-----FWLGEQLVCWQAGTTPWNIFPVISLYL---MG------EVTNQSFR BACP_COLWE ------EPL----NQKHLKSY-----FTNAIIDTGAGGIVLTAEIYQQVINDLIAYNANFAQCLAPF---KEMSAQLVGIDINELKLEQWPTVYFNF---VSE----PATANTE----EKIVQLA BACP_SHEDE ------PSA----LEAQLSRG-----SSNAIIDTGASLVSLPSQAFTQVIAELSQTVPQAAKLLAPF(5)KVTQAQSQGIAMSELNLALWPALEFHF---EGA------ADGRASLS BACP_SHEAM ------PTA----QAAHLCRG-----SSNAIVDSGASLICLPAVAFTPLMQRLWARLPDSKQLTSPF(5)QIMAAQKQGIDSSLLNLDDWPRLEFVF---EGA------NGENVCLG BACP_SHELO TQS(6)QAL(29)DGQHLHTH(10)-QSNAIIDTGASAIVLTRHLFGALTQALESIDPKFTQQIAPF---SDIAFQQIGIDMAELQLERWPDLTFYF---VGP(6)APSCDDPDCPPGCEPIAIR BACP_SHESE ------PAL----DEKYRKNY-----YSNAIVDTGASAIALTEPLYRALLKSLAAISPKFVPLIAPF---KEISAQERGIDASLLDLSQWPDITFTF---LGD(10)AGTQNSD----RQAVSIV BACP_SINME ------SPALLIDTGASEIKGEQAEIDKLIAAVTLDGQLPL------SPDNPDHYAY------PDLVFNL(4)DG------STGQ BACP_MARMW ------TPYFFALDTGSSQFKGDLDVMTTLYNIATKTKE------DLVIEI(4)SG------NIGK Consensus/80% ...... sptIlD*Gs*.l.hs..hbp.lhp.l...... s.p..sb..s.h.....Psl.b.b....G...... pb.

27 28 29 30 31 32 901AB23456ABCDEF78901234567890123456789012345678901234567890123456ABCDEFGHIJKLMNOPQRSTUVWXYZabc 3\ &&& PEPA_HUMAN VPP--SAYIL------QSEGSCISGFQGMNLPTESGELWILGDVFIRQYFTVFDRANNQVGLAPVA------MEM1_HUMAN ITILPQLYIQPMMGAGLNY-ECYRFG-----ISPSTNALVIGATVMEGFYVIFDRAQKRVGFAASPCAEIAGAAV------MEM2_HUMAN ITILPQQYLRPVEDVATSQDDCYKFA-----ISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRT------BACP_COLWE CTP--ETYWQ--INTPNYDKACFRLL---SQLPQWPNQSIIGLPLLNNYYVVFDRSKDKTGVVKFAQQKE------BACP_SHEDE CPA--ECYWQ--LNSPAPGRAVFKLM---GQLAGWPAQSILGLPLLNPYFVLFQRDSGEFGTVRFAAHRQKS------BACP_SHEAM VDA--HHYWQ--VNAPDYGKAVFKLM---GQLPHWPAQSILGLPLFNPYRVVFRRDLGEQGIVRFIPHPPLSQGPVSQSPLNQNPINQSKAMP-- BACP_SHELO VTP--QHYWQ--INTPTQGKACFKLL---SQLPNWPNQSILGLPLMSAYYVVFDRSVDDKGVIRFAEQRPLQTSEVQHAR------BACP_SHESE CRP--ESYWQ--LNAPGEGKACFKLL---SQLPNWPNQTLLGLPLLNDHLVIFDRSVHKTGVIRFAKQK------BACP_SINME LVFPPRAYFNYIEAGTDQGKWQIAMT---VLEDLGENSLIFGRNLLDRLYSEWEYDTSGTSVAGKTIRLAQRV------BACP_MARMW LVITSDVYDVLIEAGENKGEVIAQFQ---PMENADYIA-LVGSVLMDYLYTVYEFRVVQREGENCIIPVGMWIFNKEKGPKVIVTTQNKPARIFN Consensus/80% hs...phYbp..bssss..cssbpbh...sbbs..ss..lbG.slbp.aaslF-Rs..p.Ghh..h......

Key to sequences of peptidase units: PEPA_HUMAN: human pepsin A; MEM1_HUMAN, human memapsin 1; MEM2_HUMAN, human memapsin 2; BACP_COLWE, Colwellia psychrerythraea; BACP_SHEDE, Shewanella denitrificans; BACP_SHEAM, Shewanella amazonensis; BACP_SHELO, Shewanella loihica; BACP_SHESE, Shewanella sediminis; BACP_SINME, Sinorhizobium medicae; BACP_MARMW, Marinomonas sp. MWYL1.

AlignmentFigure 1 of bacterial pepsin homologues with human pepsin A and memapsins 1 and 2 Alignment of bacterial pepsin homologues with human pepsin A and memapsins 1 and 2. Residues are numbered according to mature human pepsin A. Inserts relative to pepsin A are indicated by letters. Each active site residue (Asp32, Tyr75 and Asp215) is indicated by an asterisk. The hydrophobic-hydrophobic-Gly motifs in the psi-loops are indicated by ampersands. A disulphide bridge is indicated by a slash over one cysteine followed by a number and a backslash preceded by the same number over the second cysteine. The Chroma software [37] has been used to highlight residues according to amino acid properties and to generate the consensus line below each alignment block showing 80% conservation or more. Inserts found in only one sequence have been removed and are indicated by the number of amino acids excised in parentheses.

Page 5 of 10 (page number not for citation purposes) BMC Genomics 2009, 10:437 http://www.biomedcentral.com/1471-2164/10/437

AAA60061 PEPA HUMAN AAH29055 PEP2B GADMO XP 602493 PEPAF RABIT PAG1 BOVIN CHYM BOVIN PEPE CHICK PEPC HUMAN Animals CATE HUMAN O57572 CHIHA O65390 ARATH NP 001031219 Q9XEC4 ARATH ASPR HORVU Q9N9H3 NECAM Plants CATD HUMAN CARP RENI2 MOUSE EAW71846 NAPSA HUMAN ASP2 BLAGE Q9GN67 EIMTE Q9GYX7 BOOMI Q9Y006 PLAFA

PLM1 PLAFA Animals Q8IM16 PLAF7 PLM2 PLAFA O65453 ARATH Q9LQA9 ARATH O76830 CAEEL A6UC43 SINMW A0WNB8 9GAMM AAW45937 PEPA ASPSA

PEPF ASPFU Bacteria PENP PENJA YE12B YEAST Q9HDT6 TRIHA CARP CRYPA CARP PODAN A1YWA4 ASPNG A1YWA5 ASPNG CARP SYNRA CARP RHICH EAL90084 CARP RHIMI CARP IRPLA CARP3 CANAL CARP2 CANAL CARP1 CANAL CARP4 CANAL CARP6 CANAL CARP5 CANAL CARP CANTR Fungi CARP8 CANAL CARP1 CANPA CARP2 CANPA A7A122 YEAS7 MKC7 YEAST YPS3 YEAST BAR1 YEAST CARP9 CANAL CARP7 CANAL ABK64120 YPS1 SCHPO AXP1 YARLI A1YWA6 ASPNG Q9C6M0 ARATH Q9LS40 ARATH Q9M356 ARATH Q9LNJ3 ARATH Q9LHE3 ARATH Q9LEW3 ARATH NP 196638 O23792 TOBAC AAL91289 Q9LYS8 ARATH Q9SJG1 ARATH Q9FL43 ARATH Q9M2U7 ARATH O04496 ARATH O23500 ARATH Q9FHE2 ARATH Q9LZL3 ARATH O22282 ARATH Q9FGI3 ARATH Q9C8C9 ARATH NEP1 NEPGR AAP72988 Q9XIR2 ARATH NP 850251 Q9C864 ARATH AAV85724 Q9SJJ2 ARATH Q9ZUU5 ARATH Q9SL33 ARATH

Q9SZV6 ARATH Plants Q9SV77 ARATH Q8S8N7 ARATH AAY78833 Q8H0K8 WHEAT Q9SVD1 ARATH Q9LTW4 ARATH Q9M1M1 ARATH FPD6 ARATH Q9FFC3 ARATH Q0WQ50 ARATH ASPL2 ARATH AAY56415 Q9MA42 ARATH Q9M8R6 ARATH ASPL1 ARATH Q9SZT5 ARATH AAB80784 NP 190704 Q9SD15 ARATH NP 190703 Q9C6Y5 ARATH AAL32740 Q9SZC6 ARATH ASP1 ORYSI Q9M9A8 ARATH Q9FMH3 ARATH Q9SN13 ARATH A8FZ98 SHESH Q489F7 COLP3 A3QHB4 SHELP A1S3N8 SHEAM

Q12R32 SHEDO Bacteria BACE1 HUMAN BACE2 HUMAN Q8I6Z5 PLAF7 0.1

PhylogeneticFigure 2 tree derived from members of peptidase family A1 Phylogenetic tree derived from members of peptidase family A1. The tree was generated for all peptidase unit sequences of family A1 holotypes, plus those from the sequences of the bacteria listed in Table 1. The tree is unrooted, but the sequence of -5, which is very divergent, was chosen as the outgroup. Homologues from bacteria are highlighted with a blue background. Branches that were present in 90% of the bootstrap trees are shown in red. Key: [see Additional file 1].

Page 6 of 10 (page number not for citation purposes) BMC Genomics 2009, 10:437 http://www.biomedcentral.com/1471-2164/10/437

nomonas is a commensal organism living on the root sur- species diverged around 350 million years ago [24]); face of a grass, the origin of a pepsin homologue via while that of human and nemepsin-2 from horizontal transfer of a gene from the host is not unex- the nematode Ancylostoma caninum is 52% (the species pected. The Marinomonas homologue is most closely diverged around 660 million years ago [23]). The muta- related to that from Sinorhizobium. These sequences do not tion rate for pepsin homologues amongst bacteria must cluster with the other bacterial homologues on the tree. therefore be considerably higher than eukaryote members Sinorhizobium is also a commensal organism, living in the of the family, and this rapid mutation rate might explain root nodules of legumes, but the region containing the their placing near the root of the phylogenetic tree. gene was not predicted to be the result of horizontal trans- fer. There are therefore two groups of bacterial , The evidence as to whether the ancestral pepsin gene in one containing sequences from Shewanella and Colwellia bacteria originated from a horizontal gene transfer from a and one containing sequences from Sinorhizobium and eukaryote or not can be summed up as follows. The posi- Marinomonas. tioning of most of the bacterial pepsins close to the diver- gence of the two subfamilies on the phylogenetic tree, the To further investigate whether the pepsin homologues in cytoplasmic location and lack of disulfide bridges and the Shewanella species were derived from horizontal gene scores from Alien Hunter imply no lateral transference; transfer, we attempted to estimate the rate of mutation. the absence of homologues in other closely related bacte- We hypothesized that if the bacterial pepsins are of rial species and Archaea and the apparent fast mutation ancient origin then their rate of mutation must be very rate are points in favour of horizontal gene transfer. low otherwise we would not be able to see the sequence similarity to eukaryotic proteins. We calculated the per- If the bacterial genes are the result of lateral gene transfer- centage identities between the pepsin sequences from ence from an ancient eukaryote gene, then there is no Shewanella species and compared these with the percent- known current eukaryote gene that resembles it, because age identities from other peptidase families. The families nearly all modern pepsin homologues are secreted pro- of 2 (peptidase family A8) and the ClpP teins and the ancient gene product would presumably be subunit of Clp (family S16) were chosen cytoplasmic. The eukaryotic peptidases that are most because these are well conserved in bacteria, and all the closely related to the bacterial homologues are the mem- Shewanella species have only one homologue per family. apsins, as can be seen in Fig. 2. The memapsins and bac- The percentage identities with respect to the homologues terial pepsins cluster with high confidence. It is possible from Shewanella denitrificans are shown in Table 4, which that memapsins, which are widely distributed in mamma- shows that the ClpP subunit is the most stable, followed lian tissues, might be closer to the ancestral peptidase by signal peptidase 2 and then pepsin. The pepsin homo- from which all other vertebrate pepsin homologues are logues are changing at twice the rate of the other two fam- derived. ilies. The closest homologues to S. denitrificans signal peptidase 2 and ClpP are the respective proteins from S. Aspartic-type peptidases unrelated to pepsin are known in loihica. However, of the Shewanella pepsin homologues, bacteria, including the gpr peptidase from Bacillus megate- that from S. loihica is most distantly related to S. denitrifi- rium (peptidase family A25) [25] and from cans, even more distantly related than the sequence from (peptidase family A26) [26]. Neither pepti- a species in a different genus, Colwellia. To put the percent- dase is inhibited by pepstatin. Sporulation factor SpoIIGA age identities into a geological time-frame, percentage (peptidase family U4) [27] has also been claimed to be an identities were calculated for cathepsin D, a pepsin homo- aspartic peptidase because of the presence of a single Asp- logue found only in animals where divergence times can Ser-Gly motif (the protein is assumed to be active as a be estimated from the fossil record. The percentage iden- dimer), but this alone is not sufficient because a same tity between cathepsin D sequences from human and pig motif occurs around the active site Asp of the serine-type is 88% (human and pig diverged around 65 million years peptidase subtilisin. Pepstatin-sensitive aspartic pepti- ago [23]), and human and Xenopus tropicalis is 73% (the dases have previously been found in Escherichia coli and

Table 4: Comparison of replacements among bacteria with pepsin homologues

Family Shewanella Shewanella Shewanella Colwellia Sinorhizobium Marinomonas sp. amazonensis loihica sediminis psychrerythraea medicae MWYL1

A1 (pepsin) 55.5 42.4 43.6 41.3 17.1 15.5 A8 (signal peptidase 2) 70.8 72.4 67.6 48.3 35.8 46.4 S16 (endpeptidase Clp) 89.0 90.9 89.9 78.1 61.6 67.2

The protein sequences from Shewanella denitrificans were compared with orthologous protein sequences from other bacteria. Sequences were aligned with Muscle [30] and percentage identities calculated using Alistat http://hmmer.janelia.org/.

Page 7 of 10 (page number not for citation purposes) BMC Genomics 2009, 10:437 http://www.biomedcentral.com/1471-2164/10/437

Haemophilus influenzae, but the sequences were not mation (NCBI) FTP site http://www.ncbi.nlm.nih.gov/ homologous to that of pepsin but homologues of gluco- Ftp/[30]. Each proteome was submitted to the MEROPS nate permease [28]. A recent publication [29] has reported batch Blast Internet utility [31]. This service performs a an acidic peptidase from a Synergistes species isolated from BlastP search [32] against a representative set of peptidase the anaerobic digester used for treatment of tannery solid sequences for every sequence in the proteome library. For waste. This peptidase is inhibited 75% by 0.01 mM pep- each sequence that is detected as homologous to a pepti- statin, but not by inhibitors of other catalytic types. No dase sequence (where the expect value is less than 0.001) sequence is available, but this may be the first character- the extent of the peptidase unit is determined and the ized pepsin homologue from a bacterium. Unlike the potential active site residues are predicted. The peptidase homologues from the marine bacteria, this is presumably unit is defined as the subdomains determined from the a secreted protein. No complete genome sequence for any tertiary structure of the type example peptidase (in this Synergistes species is publicly available, so we were unable case human pepsin A) that bear the known active site res- to test for the presence of a pepsin homologue. The idues. genomes of Dethiosulfovibrio peptidovorans and Therma- naerovibrio acidaminovorans, both of which are members of The MEROPS batch Blast software determines to which the same bacterial phylum Synergistetes, have been par- peptidase family a sequence belongs. To further character- tially sequenced (genome projects 20741 and 29531, ize the sequences, each sequence returned in the MEROPS respectively). These were searched with the amino acid batch Blast output was submitted to a further BlastP sequences of pepsin homologues from Shewanella amazo- search against the complete library of peptidase unit nensis and Sinorhizobium medicae, but no pepsin homo- sequences. During this second round of searching, logues were found, even at an E value of 10. homologous fragments are distinguished from possible false positive fragments by using a more rigorous expect Conclusion value threshold (< e-10). A sequence was defined as a frag- The pepsin family of peptidases (family A1) is not just ment if in the BlastP alignment between the query and hit confined to eukaryotes, but is also present in some bacte- sequences there was no overlap in the regions of sequence ria. These bacterial homologues are predicted to be intra- where active site residues were predicted to be. cellular, whereas nearly all other members of family A1 enter the secretory pathway. The bacterial homologues are Any pepsin homologue detected was used in a further structurally similar to pepsin, consisting of two lobes each BlastP search of the NCBI non-redundant protein of which bears one active site Asp. Two of the genes for sequence database to try to find more distantly related bacterial pepsins might be derived from horizontal gene homologues. The protein sequence database at NCBI transfer from a eukaryote, but whether all the bacterial from environmental samples was also searched for homo- genes for the bacterial pepsins are so derived is debatable, logues using BlastP. Additionally, TBlastN searches [32] and evidence is presented for and against the case. If no were done against the environmental samples nucleotide horizontal gene transfer has occurred, then the hypothet- sequence database and against all nucleotide sequences in ical gene duplication and fusion event that gave rise to the the NCBI non-redundant nucleotide sequence database. ancestral pepsin gene would therefore have occurred The UniProt database was searched using the HMMER before the divergence of bacteria and eukaryotes. If so, this software http://hmmer.janelia.org/ and a Hidden Markov has implications for the origin of the viral and retrotrans- Model built from any pepsin homologues found. Psi-Blast poson peptidases in family A2, which were either derived [32] searches of the NCBI non-redundant database were from the single-lobed ancestral sequence, or secondarily also undertaken, accumulating hits with an E-value of from the horizontal transfer of half a pepsin gene from a 0.005 or less. host. For the retropepsin gene to be derived by vertical transfer the ancestral gene would have to have been Sequence alignments present in a prokaryote. However, retroviruses infect only Sequences were aligned using MUSCLE [33] to holotypes eukaryotes and retrotransposons are unknown in prokary- of the pepsin family. In the MEROPS database a holotype otes, making a more recent horizontal transfer the more sequence is chosen to be a representative of each different likely origin of the retropepsin gene. peptidase characterized biochemically. Holotypes have also been selected for every different human, mouse and Methods Arabidopsis thaliana gene product predicted to be a pepti- Data sources and analyses dase homologue. Any sequences where the active site res- All the complete, predicted proteomes (in FastA format) idues were not conserved were removed, leaving 135 from bacterial genome sequencing projects were down- sequences. loaded from the National Center for Biotechnology Infor-

Page 8 of 10 (page number not for citation purposes) BMC Genomics 2009, 10:437 http://www.biomedcentral.com/1471-2164/10/437

Phylogenetic tree 3. Cooper JB, Khan G, Taylor G, Tickle IJ, Blundell TL: X-ray analyses of aspartic proteinases. II. Three-dimensional structure of A phylogenetic tree derived from the MUSCLE alignment the hexagonal crystal form of porcine pepsin at 2.3 A resolu- was generated using the PROML (protein maximum like- tion. J Mol Biol 1990, 214:199-222. lihood) program from the Phylip package [34]. A consen- 4. Ghosh AK, Gemma S, Tang J: beta-Secretase as a therapeutic target for Alzheimer's disease. Neurotherapeutics 2008, sus tree was also generated using the BOOTSEQ, PROML 5:399-408. and CONSENSE programs from the Phylip package for a 5. Bi X, Khush GS, Bennett J: The rice nucellin gene ortholog subset of the holotype library, retaining all holotypes OsAsp1 encodes an active aspartic without a plant- specific insert and is strongly expressed in early embryo. from subfamily A1A plus the sequences of nepenthesin Plant Cell Physiol 2005, 46:87-98. and nucellin from subfamily A1B. The number of boot- 6. Athauda SB, Matsumoto K, Rajapakshe S, Kuribayashi M, Kojima M, Kubomura-Yoshida N, Iwamatsu A, Shibata C, Inoue H, Takahashi K: straps was 100, otherwise default values were used. Enzymic and structural characterization of nepenthesin, a unique member of a novel subfamily of aspartic proteinases. Signal peptide prediction Biochem J 2004, 381:295-306. 7. Hughes AL, Green JA, Piontkivska H, Roberts RM: Aspartic protei- The SignalP server http://www.cbs.dtu.dk/services/Sig nase phylogeny and the origin of pregnancy-associated glyc- nalP/ was used to predict signal peptides [35]. oproteins. Mol Biol Evol 2003, 20:1940-1945. 8. Sansen S, De Ranter CJ, Gebruers K, Brijs K, Courtin CM, Delcour JA, Rabijns A: Structural basis for inhibition of Aspergillus Prediction of horizontally transfer DNA niger xylanase by triticum aestivum xylanase inhibitor-I. J Biol In order to find regions where DNA might have been hor- Chem 2004, 279:36022-36028. 9. Tang J, James MNG, Hsu IN, Jenkins JA, Blundell TL: Structural evi- izontally transferred between species, rather than verti- dence for gene duplication in the evolution of the acid pro- cally transferred from a common ancestor, the program teases. Nature 1978, 271:618-621. Alien Hunter was used [36]. This program uses composi- 10. Lapatto R, Blundell T, Hemmings A, Overington J, Wilderspin A, Wood S, Merson JR, Whittle PJ, Danley DE, Geoghegan KF, et al.: X- tional biases (including CG content and third codon ray analysis of HIV-1 proteinase at 2.7 Å resolution confirms usage) to detect potential horizontally transferred genetic structural homology among retroviral enzymes. Nature 1989, 342:299-302. material. The default value window of 5000 bases was 11. Sirkis R, Gerst JE, Fass D: Ddi1, a eukaryotic protein with the used. Any region scoring higher than the calculated retroviral protease fold. J Mol Biol 2006, 364:376-387. threshold for the genome is potentially derived by hori- 12. Llorens C, Futami R, Renaud G, Moya A: Bioinformatic flowchart and database to investigate the origins and diversity of Clan zontal transfer. AA peptidases. Biol Direct 2009, 4:3. 13. Methe BA, Nelson KE, Deming JW, Momen B, Melamud E, Zhang X, Moult J, Madupu R, Nelson WC, Dodson RJ, et al.: The psy- Authors' contributions chrophilic lifestyle as revealed by the genome sequence of NDR conceived the study, performed the sequence analy- Colwellia psychrerythraea 34H through genomic and pro- ses and drafted the manuscript. AB participated in the teomic analyses. Proc Natl Acad Sci USA 2005, 102:10913-10918. 14. Brettar I, Christen R, Hofle MG: Shewanella denitrificans sp. nov., design and helped to draft the manuscript. Both authors a vigorously denitrifying bacterium isolated from the oxic- read and approved the final manuscript. anoxic interface of the Gotland Deep in the central Baltic Sea. Int J Syst Evol Microbiol 2002, 52:2211-2217. 15. Hau HH, Gralnick JA: Ecology and biotechnology of the genus Additional material Shewanella. Annu Rev Microbiol 2007, 61:237-258. 16. Van Landschoot A, de Ley J: Intra- and intergeneric similarities of the rRNA cistrons of Alteromonas, Marinomonas (gen. Additional file 1 nov.) and some other gram-negative bacteria. J Gen Microbiol 1983, 129:3057-3074. Key to Fig. 2. (phylogenetic tree derived from members of peptidase 17. Rome S, Fernandez MP, Brunel B, Normand P, Cleyet-Marel JC: family A1). the file contains the key to the tips of the phylogenetic tree. Sinorhizobium medicae sp. nov., isolated from annual Medi- Click here for file cago spp. Int J Syst Bacteriol 1996, 46:972-980. [http://www.biomedcentral.com/content/supplementary/1471- 18. Ten Have A, Dekkers E, Kay J, Phylip LH, van Kan JA: An aspartic 2164-10-437-S1.doc] proteinase gene family in the filamentous fungus Botrytis cin- erea contains members with novel features. Microbiology 2004, 150:2475-2489. 19. Ido E, Han HP, Kezdy FJ, Tang J: Kinetic studies of human immu- nodificiency virus type 1 protease and its active-site hydro- gen bond mutant A28S. J Biol Chem 1991, 266:24359-24366. Acknowledgements 20. Yamauchi T, Nagahama M, Hori H, Murakami K: Functional char- We would like to thank Dr K. Woodwark, Dr P. Gardner and Dr A. J. Bar- acterization of Asp-317 mutant of human renin expressed in rett for helpful suggestions and discussions. We would also like to thank the COS cells. FEBS Lett 1988, 230:205-208. 21. Park YN, Aikawa J, Nishiyama M, Horinouchi S, Beppu T: Involve- diligent reviewers of this manuscript who made many constructive sugges- ment of a residue at position 75 in the catalytic mechanism tions. This work was supported by the Wellcome Trust [grant number of a fungal aspartic proteinase, Rhizomucor pusillus pepsin. WT077044/Z/05/Z]. Replacement of tyrosine 75 on the flap by asparagine enhances catalytic efficiency. Protein Eng 1996, 9:869-875. 22. Abad-Zapatero C, Rydel TJ, Neidhart DJ, Luly J, Erickson JW: Inhib- References itor binding induces structural changes in porcine pepsin. Adv 1. Rawlings ND, Morton FR, Kok CY, Kong J, Barrett AJ: MEROPS: Exp Med Biol 1991, 306:9-21. the peptidase database. Nucleic Acids Res 2008, 36:D320-D325. 23. Lee MS: Molecular clock calibrations and metazoan diver- 2. Rawlings ND, Barrett AJ: Evolutionary families of peptidases. gence dates. J Mol Evol 1999, 49:385-391. Biochem J 1993, 290:205-218.

Page 9 of 10 (page number not for citation purposes) BMC Genomics 2009, 10:437 http://www.biomedcentral.com/1471-2164/10/437

24. Hugall AF, Foster R, Lee MS: Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol 2007, 56:543-563. 25. Carroll TM, Setlow P: Site-directed mutagenesis and structural studies suggest that the germination protease, GPR, in spores of Bacillus species is an atypical protease. J Bacteriol 2005, 187:7119-7125. 26. Vandeputte-Rutten L, Kramer RA, Kroon J, Dekker N, Egmond MR, Gros P: Crystal structure of the outer membrane protease OmpT from Escherichia coli suggests a novel catalytic site. EMBO J 2001, 20:5033-5039. 27. Imamura D, Zhou R, Feig M, Kroos L: Evidence that the Bacillus subtilis SpoIIGA protein is a novel type of signal-transducing . J Biol Chem 2008, 283:15287-15299. 28. Hill J, Phylip LH: Bacterial aspartic proteinases. FEBS Lett 1997, 409:357-360. 29. Kumar AG, Nagesh N, Prabhakar TG, Sekaran G: Purification of extracellular acid protease and analysis of fermentation metabolites by Synergistes sp. utilizing proteinaceous solid waste from tanneries. Bioresour Technol 2008, 99:2364-2372. 30. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al.: Database resources of the National Center for Biotechnol- ogy Information. Nucleic Acids Res 2008, 36:D13-D21. 31. Rawlings ND, Morton FR: The MEROPS batch BLAST: a tool to detect peptidases and their non-peptidase homologues in a genome. Biochimie 2008, 90:243-259. 32. Altschul SF, Madden TL, Schöffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402. 33. Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 2004, 5:113. 34. Felsenstein J: PHYLIP - phylogeny inference package. Cladistics 1989, 5:164-166. 35. Emanuelsson O, Brunak S, von Heijne G, Nielsen H: Locating pro- teins in the cell using TargetP, SignalP and related tools. Nat Protoc 2007, 2:953-971. 36. Vernikos GS, Parkhill J: Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands. Bioinformatics 2006, 22:2196-2203. 37. Goodstadt L, Ponting CP: CHROMA: consensus-based colour- ing of multiple alignments for publication. Bioinformatics 2001, 17:845-846.

Publish with BioMed Central and every scientist can read your work free of charge "BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime." Sir Paul Nurse, Cancer Research UK Your research papers will be: available free of charge to the entire biomedical community peer reviewed and published immediately upon acceptance cited in PubMed and archived on PubMed Central yours — you keep the copyright

Submit your manuscript here: BioMedcentral http://www.biomedcentral.com/info/publishing_adv.asp

Page 10 of 10 (page number not for citation purposes)