Protein Science (2000), 9:1930–1934. Cambridge University Press. Printed in the USA. Copyright © 2000 The Protein Society

The PA domain: A protease-associated domain

PIERS MAHON¹ and ALEX BATEMAN²
¹ Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, United Kingdom
² The Sanger Centre, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
(Received December 7, 1999; Final Revision July 20, 2000; Accepted August 3, 2000)

Abstract

We have identified a similarity between the apical domain of the human transferrin receptor and several other protein families. This domain is found associated with two different families of peptidases. Therefore, we term it the PA domain, for protease-associated domain. The PA domain is found inserted within a loop of the peptidase domain of family M8/M33 zinc peptidases. The PA domain is also found in a vacuolar sorting receptor and in a ring finger protein of unknown function that may be a cell surface receptor. The PA domain may mediate substrate determination of peptidases or form protein–protein interactions.

Keywords: aminopeptidase Y; cell wall-associated protease; transferrin receptor; vacuolar receptor

Reprint requests to: Alex Bateman, The Sanger Centre, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom; e-mail: [email protected].

Trafficking of soluble proteins through subcellular compartments is crucial to eukaryotic life. This process depends on integral membrane proteins that act as receptors. These bind lumenal proteins in one subcellular location. After transport of the receptor/lumenal protein complex to the target organelle, the lumenal proteins are released. While studying the structure of a plant-specific vacuolar receptor, BP-80, we found distant homology in one domain to the mammalian transferrin receptor, which is involved in the endocytosis of transferrin. The occurrence of this domain in such diverse trafficking receptors was striking, and led us to further characterize the domain. The publication of the three-dimensional structure of the ectodomain of the transferrin receptor (Lawrence et al., 1999) allowed us to relate the sequence of these domains to a known structure.

We used the common structural domain between BP-80 and the human transferrin receptor (residues 184–384 of P02786) as a query using PSI-BLAST (Altschul et al., 1997) with an E-value threshold of 0.01. The search converged after eight iterations. Using the BP-80 domain (residues 27–182 of P93484) as a query converges to essentially the same set of proteins. We have called this domain the PA domain, for protease-associated domain. Based on the known structure of the transferrin receptor, the PA domain is 170–210 amino acids long and has a β-sandwich structure with two peripheral helices (see Fig. 1A). The PSI-BLAST searches reveal that the PA domain occurs in four distinct families of proteins, as described below.

The plant vacuolar receptors homologous to BP-80 (Kirsch et al., 1994; Paris et al., 1997) were found in the second round of searching with an E-value of 0.002. They have a large lumenal region that contains a PA domain at the N-terminus and three EGF domains preceding the C-terminal transmembrane helix and the short cytosolic region that contains a tyrosine targeting motif. These proteins bind the NPIR motif of soluble proteins destined for the lytic vacuole (Kirsch et al., 1994, 1996). The protein is found in the prevacuolar compartments of plant cells (Sanderfoot et al., 1998). There are at least 10 of these receptors in Arabidopsis thaliana, and they are conserved in both monocot and dicot plants, suggesting an important role for this family.

We found that C-RZF (Tranque et al., 1995) contains an N-terminal PA domain, separated by a transmembrane helix predicted with Tmpred (Hofmann & Stoffel, 1993) from a previously recognized C-terminal ring finger domain. Members of this family were found in round 5 of the PSI-BLAST search with an E-value of 0.005. Members of the RZF family are found in fungi, plants, and metazoa. Every member has a signal peptide predicted by SignalP (Nielsen et al., 1999). The PA domain from this family is more closely related to that from the plant vacuolar receptors, suggesting that the plant-specific vacuolar receptors may have evolved from this family. We predict the PA domain to be extracellular and the ring finger intracellular. This conflicts with Tranque et al. (1995), who suggest that C-RZF is a soluble nuclear protein. However, the differential sedimentation method used in that paper to support nuclear localization does not distinguish nuclear and membrane locations. Because we predict C-RZF to be an integral membrane protein rather than a soluble one, Triton X-114 phase partitioning (Brusca & Radolf, 1994) would resolve this conflict: our structure predicts that C-RZF will be enriched in the Triton detergent phase, whereas the transcription factor hypothesis of Tranque et al. predicts enrichment in the aqueous phase.

The structure of the transferrin receptor ectodomain has recently been published (Lawrence et al., 1999) and shows that the receptor contains three structural domains. The first domain is an inactive protease related to known M8/M33 amino- and carboxypeptidases. Inserted within a loop in the protease domain is the apical domain. The third, all-helical C-terminal domain is involved in receptor dimerization (Lawrence et al., 1999). The apical domain is equivalent to the PA domain (see Fig. 1B,C) and allows us to define the domain boundaries of the PA domain. Rawlings and Barrett have examined the domain structure of the related glutamate carboxypeptidase/prostate-specific membrane antigen protein (PSM) and call the PA domain domain D (Rawlings & Barrett, 1997). PSM has two known splice variants: one is membrane bound, whereas the other is cytosolic (Su et al., 1995). This is the only clear example of the PA domain occurring in the cytosol. The cytosolic form may predispose cells to localized folate deficiency, a possible factor in prostate cancer progression (Heston, 1997).

Fig. 1. A: The structure of the PA domain. The structure was drawn using MOLSCRIPT (Kraulis, 1991) on Protein Data Bank file 1CX8. Secondary structures are shown in ribbon representation, with those aligned in (B) shaded black and marked with designations from Lawrence et al. (1999). The dotted line represents the apical traverse that is found to be variable in length. B: A multiple alignment of the core conserved regions of representative PA domains. We constructed a multiple alignment of PA domains with CLUSTALW followed by manual adjustment. Although we know the structure of the whole domain, only the regions that can be confidently aligned are shown. The first column gives the SWISS-PROT or TrEMBL name. The TrEMBL names are followed by a five-letter SWISS-PROT species designation. The names are followed by the start and end points of the alignment in the whole sequence. Conserved residues in the alignment are highlighted using the CLUSTALX color scheme from the JALVIEW program (M. Clamp, pers. obs.) in default mode. The secondary structure is marked with an arrow for β-strands and a cylinder for α-helix; the designations for secondary structures are taken from Lawrence et al. (1999). Numbers in brackets indicate the length of insertions not shown in the alignment. C: A schematic representation of the domain organization of PA domain-containing proteins. The SWISS-PROT or SP-TrEMBL accession number for a representative of each domain organization is shown in brackets. The species distribution of each group is also shown. The boundaries of the domains in this figure may differ from those of the structural domain.

The PA domain is found in the Pyrolysin family of subtilases (Siezen & Leunissen, 1997), which includes bacterial endopeptidases such as c5a protease, involved in immune response evasion, and plant subtilases such as cucumisin that are involved in plant pathogen defense and development. Members of this family were found in the first iteration of PSI-BLAST with an E-value of 0.0001. The PA domain is equivalent to the I domain of Siezen (1999) and vr13 of Siezen and Leunissen (1997). Removal of the PA domain in the Lactococcus lactis cell-envelope proteinase changed the caseinolytic specificity of the enzyme (Bruinenberg et al., 1994). They suggest that this region is involved in substrate specificity.

In addition to the above four families, the PA domain was found in a protein from Deinococcus radiodurans (DRA0325), where it is N-terminal to a region homologous to the peptide-N4-(N-acetyl-β-D-glucosaminyl) asparagine amidase F (PNGase F). The function of PNGase F is to cleave the β-aspartoglucosylamine bond of N-linked glycans. The PA domain is found in isolation in the Drosophila protein CG9849 and in association with a glycosyl hydrolase of family 47 in CG5682. The function of these Drosophila proteins awaits experimental determination.

We propose that the PA domain is a novel protein–protein interaction domain for the following reasons: (1) The linkage via the PA domain of the two trafficking receptors BP-80 and the transferrin receptor suggests that this domain is responsible for their conservation of function, i.e., binding soluble proteins. (2) The proposed model for transferrin binding to its receptor (Lawrence et al., 1999) uses a large patch of the PA domain for the binding. (3) In subtilases, the PA domain has been shown to be a determinant of protease specificity, but its presence is not essential for catalytic activity (Bruinenberg et al., 1994). In the deletion mutant of the outer envelope protease of L. lactis (involved in the uptake of essential amino acids from milk), the relative amounts of the breakdown products of the substrates αs1- and β-casein changed with respect to the wild-type, but the general cleavage pattern remained the same. This suggests that the mutant was active toward all the cleavage sites in the caseins, but with differing effectiveness from the wild-type. A protein interaction domain could achieve this by binding the substrate and inhibiting or enhancing the presentation of certain cleavage sequences to the active site. (4) PA domains have been recruited in multiple independent evolution events to both subtilases and M8/M33 proteases. This suggests that the role of the PA domain in proteases is independent of the active site type. The specificity determination properties of the PA domain outlined above would provide such a role.

Our hypothesis that the PA domain is a novel protein interaction domain with a particular role in binding substrate proteins in subtilases and in trafficking receptors would predict the following results: (1) Removal of a PA domain from an active protease should reduce the binding affinity of the enzyme for protein substrates, reducing kcat and increasing koff for the reaction. (2) Monoclonal antibodies to the PA domain in the outer envelope protease of L. lactis (Bruinenberg et al., 1994) should preferentially inhibit the activity of the subtilase toward its native protein substrates, substantially over and above any inhibition toward short, cleavable peptides. (3) Swapping PA domains between proteases with different targets, for instance, between the c5a protease of Streptococcus pyogenes (Chen & Cleary, 1990) and the outer envelope protease of L. lactis, should change the substrate specificity in a predictable manner. (4) Monoclonal antibodies to the PA domains of BP-80 and the transferrin receptor should be competitive inhibitors of their respective binding functions.

If we are correct in our hypothetical function for the PA domain, then the structure of RZF proteins is suggestive of a new family of receptors or signal transducers. We predict the protein interaction domain of the RING zinc finger to be in the cytosol and the protein
interaction domain of the PA domain to be extracellular. Both could interact with protein substrates on opposite sides of the membrane. Finally, we note that Gram-negative bacteria swap domains in cell surface proteins in what appears to be a passive evasion of the immune response (Boyle, 1995). The bacterial subtilases with the PA domain are also extracellular and some, for instance, the c5a protease, are involved in the active evasion of the immune response (Ji et al., 1996). Domain recombination by bacteria in these proteases could provide a method of generating diversity in the active immunity evasion system through the shuffling of protease activity determinants.

Acknowledgments

We would like to thank Chris Ponting for very helpful comments on the manuscript.

References

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.
Boyle MDP. 1995. Variation of multifunctional surface binding proteins: A virulence strategy for group A streptococci? J Theor Biol 173:415–426.
Bruinenberg PG, Doesburg P, Alting AC, Exterkate FA, de Vos WM, Siezen RJ. 1994. Evidence for a large dispensable segment in the subtilisin-like catalytic domain of the Lactococcus lactis cell-envelope proteinase. Protein Eng 7:991–996.
Brusca JS, Radolf JD. 1994. Isolation of integral membrane proteins by phase partitioning with Triton X-114. Methods Enzymol 228:182–193.
Chen CC, Cleary PP. 1990. Complete nucleotide sequence of the streptococcal C5a peptidase gene of Streptococcus pyogenes. J Biol Chem 265:3161–3167.
Heston WDW. 1997. Characterization and glutamyl preferring carboxypeptidase function of prostate specific membrane antigen: A novel folate hydrolase. Urology 49:104–112.
Hofmann K, Stoffel W. 1993. TMbase: A database of membrane spanning proteins segments. Biol Chem Hoppe-Seyler 347:166.
Ji YD, McLandsbrough L, Kondagunta A, Cleary PP. 1996. C5a peptidase alters clearance and trafficking of group A streptococci by infected mice. Infect Immunity 64:503–510.
Kirsch T, Paris N, Butler JM, Beevers L, Rogers JC. 1994. Purification and initial characterization of a potential plant vacuolar targeting receptor. Proc Natl Acad Sci USA 91:3403–3407.
Kirsch T, Saalbach G, Raikhel NV, Beevers L. 1996. Interaction of a potential vacuolar targeting receptor with amino and carboxyl terminal targeting determinants. Plant Physiol 111:469–474.
Kraulis P. 1991. MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures. J Appl Crystallogr 24:946–950.
Lawrence C, Ray S, Babyonyshev M, Galluser R, Borhani D, Harrison S. 1999. Crystal structure of the ectodomain of human transferrin receptor. Science 286:779–782.
Nielsen H, Brunak S, von Heijne G. 1999. Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng 12:3–9.
Paris N, Rogers SW, Jiang L, Kirsch T, Beevers L, Phillips TE, Rogers JC. 1997. Molecular cloning and further characterization of a probable plant vacuolar sorting receptor. Plant Physiol 115:29–39.
Rawlings N, Barrett A. 1997. Structure of membrane glutamate carboxypeptidase. Biochim Biophys Acta 1339:247–252.
Sanderfoot AA, Sharif UA, Marty-Mazars D, Rapoport I, Kirchhausen T, Marty F, Raikhel NV. 1998. A putative vacuolar cargo receptor partially colocalizes with AtPEP12p on a prevacuolar compartment in Arabidopsis roots. Proc Natl Acad Sci USA 95:9920–9925.
Siezen RJ. 1999. Multi-domain, cell-envelope proteinases of lactic acid bacteria. Antonie van Leeuwenhoek 76:139–155.
Siezen RJ, Leunissen JAM. 1997. Subtilases: The superfamily of subtilisin-like serine proteases. Protein Sci 6:501–523.
Su SL, Huang I-P, Fair WR, Powell CT, Heston WDW. 1995. Alternatively spliced variants of prostate-specific membrane antigen RNA: Ratio of expression as a potential measurement of progression. Cancer Res 55:1441–1443.
Tranque P, Crossin KL, Cirelli C, Edelman GM, Mauro VP. 1995. Identification and characterization of a RING zinc finger gene (C-RZF) expressed in chicken embryo cells. Proc Natl Acad Sci USA 93:3105–3109.
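
The text reports a PSI-BLAST search seeded with residues 184–384 of P02786, using an E-value threshold of 0.01. As a minimal sketch of how such a query might be prepared, the Python below cuts a 1-based, inclusive residue range out of a sequence and assembles a command line. Everything beyond those reported numbers is an assumption of this sketch, not the authors' procedure: the helper names are hypothetical, the placeholder sequence is not the real receptor, the command uses the present-day NCBI BLAST+ `psiblast` program rather than the 1997-era tool, and the 0.01 threshold is treated as the profile-inclusion threshold. The iteration cap of 20 is illustrative; the paper reports convergence after eight iterations.

```python
from typing import List


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end, numbered from 1 and inclusive at both ends."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("region lies outside the sequence")
    return sequence[start - 1:end]


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def build_psiblast_command(query_fasta: str, database: str,
                           inclusion_ethresh: float = 0.01,
                           max_iterations: int = 20) -> List[str]:
    """Assemble (but do not run) a BLAST+ psiblast argument list.

    Assumption: the 0.01 threshold in the text is the inclusion threshold.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-inclusion_ethresh", str(inclusion_ethresh),
        "-num_iterations", str(max_iterations),
    ]


if __name__ == "__main__":
    # Placeholder only: substitute the real P02786 sequence before use.
    placeholder = "X" * 400
    domain = extract_region(placeholder, 184, 384)
    print(to_fasta("query_184-384", domain), end="")
    print(" ".join(build_psiblast_command("query.fasta", "nr")))
```

Building the argument list separately from running it keeps the sketch testable without a BLAST installation; the list can be passed to `subprocess.run` once a real query file and database are in place.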
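
Prediction (1) in the text links removal of the PA domain to weaker substrate binding, a lower kcat, and a higher koff. The worked relations below make that link explicit. They assume a simple one-substrate Michaelis–Menten scheme, which the paper does not state, and they introduce the association rate constant $k_{\mathrm{on}}$, a symbol not used in the text.

```latex
\begin{aligned}
E + S \;&\underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}}{\rightleftharpoons}}\; ES
\;\xrightarrow{\;k_{\mathrm{cat}}\;}\; E + P \\[4pt]
K_d &= \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}, \qquad
K_M = \frac{k_{\mathrm{off}} + k_{\mathrm{cat}}}{k_{\mathrm{on}}}, \qquad
\frac{k_{\mathrm{cat}}}{K_M} = \frac{k_{\mathrm{cat}}\,k_{\mathrm{on}}}{k_{\mathrm{off}} + k_{\mathrm{cat}}}
\end{aligned}
```

Under these assumptions, raising $k_{\mathrm{off}}$ at fixed $k_{\mathrm{on}}$ increases both $K_d$ and $K_M$ and lowers $k_{\mathrm{cat}}/K_M$, and lowering $k_{\mathrm{cat}}$ reduces $k_{\mathrm{cat}}/K_M$ further. A PA-deleted protease would therefore be expected to show reduced catalytic efficiency toward protein substrates.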