Copyright 0 1988 by the Genetics Society of America

Many Products From a Few Loci: Assignmentof Human Salivary Proline-Rich to Specific Loci

Karen M. Lyons,* Edwin A. Azen,* Patricia A. Goodman?and Oliver Smithies* *Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin 53706, and tDepartment of Neurology, University of Cal$ornia, San Francisco, California 94143 Manuscript received January 14, 1988 Revised copy accepted May 12, 1988

ABSTRACT Earlier studies of protein polymorphisms led to the description of 13 linked loci thought to encode the human salivary proline-rich proteins (PRPs). However, more recent studies at theDNA level have shown that there areonly six which encode PRPs. The present study was undertaken in order to reconcile these observations. Nucleotide and decoded amino acid sequences from each of the six genes were compared with the available protein sequence data for PRPs. This analysisallowed assignment of the PmF, PmS and Pe proteins to the PRBI locus, the G1 protein to the PRB3 locus, the Po protein to the PRB4 locus, the Ps protein to the PRB2 locus, and the CONl and CON2 proteins to the PRB4 locus. Correlations between insertion/deletion RFLPs and PRP protein pheno- types were observed for the PmF, PmS, GI and CON2 proteins. Our overall analysis indicates that in many instances several proteins previously considered to be the products of separate loci are actually proteolytic cleavage products of a large precursor specified by one or other of the six genes identified at the DNA level. Our analysis also demonstrates that some of the "null" alleles proposed to occur at 11 of the 13 loci in the earlier genetic studies, are actually productive alleles having alterations at proteolytic cleavage sites within the relevant precursor protein. The absence of cleavage leads to the persistence of longer precursor peptides not resolved electrophoretically, concurrently with an absence of the smaller PRPs seen when cleavage occurs.

HE human proline-rich proteins (PRPs) are a DENNISTON1980), PC (KARN, GOODMAN and Yu T heterogeneous group of more than 20 proteins 1985), Po (AZENand Yu 1984a), and Pe (AZENand that constitute approximately 70% of the protein con- YU 1984a). The glycosylated PRPs were proposed to tent of human saliva. They are characterized by an be encoded by three loci: CONl (AZENand Yu 1984b), abundance of the amino acids proline (25-42%), gly- CON2 (AZENand Yu 1984b), and GZ (AZEN,HURLEY cine (16-22%) and glutamic acid/glutamine (15- and DENNISTON1979). Null alleles, characterized by 28%), which together make up 70-88% of their total the absence of the corresponding PRP on electropho- amino acid content. PRPs have been classifiedas resis gels, were postulated for eleven of the thirteen either acidic, basic, or glycosylated. They all contain loci. Null alleles were not described for thePr and PC a series of proline-rich tandem repeats 16-21 amino loci. acids in length (reviewed by BENNICK1987). DNA clones corresponding to members of the hu- Electrophoretic studies of polymorphic PRPs led to man PRP multigene family were isolated by AZENet the description of 13 linked loci, each of which en- al. (1984). Subsequent DNA studies (MAEDA 1985; coded one of these polymorphic PRPs (reviewed by Maeda et al. 1985) suggested that the human PRP AZENand MAEDA 1988). Linkage studies indicated multigene family contains only six genes rather than that the loci covered a genetic distance of 15 cM, the thirteen proposed in the earlier genetic studies. A suggesting that either the PRP cluster spanned determination of the physical organization of the PRP a very large physical distance, or that recombination gene complex (H.-S. KIM, unpublished observation) frequently occurs within the multigene family (GOOD- has confirmed that there areonly six genes encoding MAN et al. 1985). The loci proposed to encode acidic PRPs. PRPs were Pr (AZENand OPPENHEIM1973; AZENand The DNA studies also revealed that the PRP mul- DENNISTON1974), Pa (FREIDMAN,MERRITT and RI- tigene family can be dividedinto two subfamilies based VAS 1975; AZEN 1977), Db (AZENand DENNISTON on the DNA sequences of the tandem repeats that 1974) andPZF (AZENand DENNISTON198 1). The loci comprise the third exons of these genes: PRHl and proposed to encode basic PRPs were PmF (IKEMOTO PRH2 form one subfamily (the PRH genes) and en- et aZ. 1977), PmS (AZENand DENNISTON1980; AN- code acidic PRPs (KIM and MAEDA 1986; AZENet d. DERSON, KAUFFMANand KELLER 1982), Ps (AZENand 1987);PRBl, PRBB, PRB3 and PRB4 (the PRB genes) Genetics 120 255-265 (September, 1988) 256 K. M. Lyons et al. form the secondsubfamily (MAEDA 1985). Length method of PONCZet al. (1982). Genomic DNA (5 Mg per variants, caused by insertions or deletions of DNA, lane) was digested withEcoRI, electrophoresed in 0.8% agarose gels, and transferred to nitrocellulose according to have been identified at each of the PRB loci (O’CON- the method of SOUTHERN(1 975) with modifications as de- NELL et al. 1987; LYONS,STEIN and SMITHIES1988). scribed by WAHL,STERN and STARK(1979). Filters were MAEDAet al. (1985) demonstrated by cDNA studies hybridized to anick-translated 980 bp HinfI fragment from that differential mRNA splicing and proteolytic proc- PRBl (AZENet al. 1984). Hybridization conditions were as essingof products of the PRP loci can potentially described by VANINet al. (1983). generate multiple PRPs from a single transcription unit, and BENNICK(1 987)has compared the available RESULTS ANDDISCUSSION protein sequences with the decoded amino acid se- quences for these cDNAs. Whileboth of these studies Overall assignments: Our general strategy for as- show that a single locus can encode multiple PRPs, signing the basic and glycosylated PRPs to the PRB they only allowa partial reconciliation of the number locihas been to compare the decoded amino acid of loci proposed in the genetic studies and thenumber sequences derived from the nucleotide sequences of demonstrated in DNA studies. For example, the dis- thethird exons ofvarious alleles of these genes crepancies betweenthe proposed mode of inheritance (LYONS,STEIN and SMITHIES1988; MAEDAet al. 1985) of the acidic PRPsand theDNA studies were resolved with protein sequences determined by other investi- by MAEDA(1985), and the extension of these ideas to gators (KAUFMANet al. 1982, 1986; KAUFFMANand the basic and glycosylated PRPswas hypothesized; she KELLER1979; SAITOH,ISEMURA and SANADA1983a). suggested that the electrophoretic data, originally This strategy is practical because the third exon con- used to support the existence of the four loci Pr, Db, tains nearly all ofthe protein coding portion of PRP Pa and PZF, could be interpreted more economically genes (AZEN et al. 1984). Where protein sequence in terms of two loci with no null alleles. This reinter- data are incomplete, we have used small peptide se- pretation has since been confirmed by DNA studies quences (SHIMOMURA,KANAI and SANADA1983) or (KIM and MAEDA 1986; AZENet al., 1987) which data on amino acidcomposition (GOODMANet al. demonstrate that PRHZ encodes the Pa, Db and PIF 1985). We have also usedelectrophoretic comparisons proteins and PRH2 encodes the Pr protein. of purified PRPs with the polymorphic PRPs (AZEN In our present study, we have reexamined the in- 1988). Finally, we compared the segregation in fami- heritance of the basic and glycosylated PRPs inorder lies of DNAlength variants detected by Southern blot to furtherresolve the discrepancy betweenthe results analysis withthe segregation of PRP variants detected of the original genetic studies and the DNA data. By by electrophoresis. comparing the amino acid sequences decoded from In order tofacilitate the presentation in the follow- the DNA sequences of the four PRB genes with the ing sections of the detailed considerations on which available protein sequences, and by comparing the our assignments are based, we first present in Figure segregation of allelic DNA length variants with the 1 a summary of our overall conclusions. For the sake segregation of PRP proteins in family members and of completeness we have included in the figure the in unrelated individuals, we have been able to assign assignments ofthe acidic proteins Pa, Pr, PIF andDb almost every basicand glycosylated polymorphic PRP as proposed by MAEDA(1985) and confirmed by AZEN to a particular locus. An essentially complete outline et al. (1987). All six PRP loci are therefore illustrated of the genetic and phenotypic complexities ofthe PRP in the figure along with specific allelesfor each locus, system is generated by this analysis. and theproteins that are theproducts of these specific alleles. MATERIALS AND METHODS Reexamination of theinheritance of thebasic proline-richproteins: Polymorphismsof the basic Electrophoretic typingof PRPs in saliva: Parotid saliva samples were collected as described by AZENand DENNISTON PRPs, PmS, PmF, Peand Po had been interpreted as (1974). Ps, PmF and PmS protein polymorphisms were being due to the autosomal inheritance of one ex- typed by electrophoresis in acid-lactate polyacrylamide gels pressed and one unexpressed allele at each of four stained withCoomassie brilliant blue R-250 (AZENand loci. The basic PRP PC was ascribed to the autosomal DENNISTON1980). The Po protein polymorphism was typed inheritance of two expressed alleles at a fifth locus on immunoblots treated with anti-Ps or anti-Pr serum as 1985). described by AZEN andYu (1984a). The GI protein poly- (KARN,GOODMAN and Yu A sixth locus with morphism was typed by electrophoresis in acid-lactate poly- two expressed alleles and one unexpressed allele was acrylamide gels stained with periodic acid-Schiff asdescribed proposed to control the expression ofone of the basic by AZEN,HURLEY and DENNISTON(1979). CON1 and PRPs, Ps (AZENand DENNISTON1980). CON2 protein polymorphisms were typed by electropho- The PRBl locus encodes the PmF and PmS pro- resis in SDS gels with a concanavalin A stain asdescribed by teins: The amino acid sequencesof the PmF and PmS AZENand Yu (1984b). Southernblot hybridizations: High molecularweight proteins were determined by KAUFFMANel al. (1 982 DNA was prepared from peripheral blood leukocytes by the and 1986, respectively). The decoded amino acid Controlling SalivaryLoci Controlling PRPs 257

LOCUS: ALLELE: PROTEINS: described in the next section indicate that the PRBI' allele is associated withthe PmF+PmS+phenotype and the PRBIM allele is associated with the PmF-PmS- phenotype. Figure 2A shows that a portion of the decoded amino acid sequence of the allele PRBl', amino acid residues 59 to 1 19, is identical to the protein sequence determined for PmF (KAUFMANet al. 1982). We there- fore conclude that, as illustrated in Figure 1, part of the allele PRBl codes for PmF. Figure 2A also shows that amino acid residues 120 to 237 coded by PRBls comprise a second decoded amino acid sequence which is identical to the amino acid sequence of PmS (KAUFMANet al. 1986)except for asingle substitution. The PmS protein sequence has a glycine at amino acid position 123 whereas the PRBZ' allele encodes an arginine at this position; this difference is unlikely to result in a PmS- phenotype. We therefore conclude ps_t- "i"' - that a different part of the allele PRBl' encodes PmS, as illustrated in Figure 1.

ICDII-I1 For these conclusionsto be valid, it is necessarythat PRE3 t GI 1 thejunction between the decoded amino acidse- IN11 - IS 1 1 2 1 1 1cI quences encoding the PmF and PmS peptides (be- GI 3 PRB3 PRE3' t 8 IN11 1 2 1 >I IC1 tween amino acid residues l 19 and 120) in PRBls

ICOll-01 include a proteolytic cleavage site for a salivary pro- MNl IP-01 PR84L !%tu? tease. Although most characterized sites of post-trans- t " t-pO" PRB4 9 IN 11 1 3 1 2 2-3 2 1 XCI lational proteolytic cleavage are a sequence of two basic amino acids (BOND and BUTLER FIGURE 1 .-Summaryof assignments of specific proline-rich 1987), proteins to the six PRP loci. The names of specific loci are shown SCHWARTZ(1986) has shown that a single arginine in boldface capitals, specific alleles in lightface capitals, encoded residue, often preceded by a proline can serve as a proteins in light type. The names of encoded proteins or small site of proteolytic processing. peptides with sequence identity to the proteins indicated (see text) SCHWARTZ pointed outthat the general are enclosed insmall square brackets. The coding sequences of (1986) specific alleles are illustrated schematically as arrays of open boxes structure basic-x-x-arginine is common to avariety of and open arrows. Small vertical arrows indicate potential proteolytic monobasic proteolytic cleavage sites. He also noted cleavage sites. The boxes labeled "S" represent a signal peptide. that charged and polar amino acids are often found The boxes labeled "NI" and "N2" represent amino-terminal se- near the site ofproteolytic cleavage. Our data are well quences. Each open arrowrepresents one proline-rich tandem repeat. The box labeled "C" represents the carboxyl-terminal re- explained by the assumption that the related struc- gion. Stippled triangles indicate N-linked glycosylation sites. The ture, arginine-serine-x-arginine-serineserves to gen- allele of PRB2 presented in the figure is an incomplete cDNA copy erate potential proteolytic cleavage sites inthe human of atranscript derived from PRB2L (MAEDA et al. 1985). The PRPs, with cleavage occurring after the last arginine localization of the coding regions for each protein are shown by residue. This structure contains a basic amino acid lines. Where partial proteolytic cleavage is known to occur, the lengths of both protein products are shown, e.g., Db fast is coded (arginine) three residues upstream of the site of pro- by the PRHl ' allele of the PRHl locus by N1, N2,and 4% repeats, teolytic cleavage, an arginine immediately upstream Db slow by N1, N2,and 6 repeats plus C. The tentative assignment of the site of cleavage, and polar amino acids (serine of the PmS protein (in parentheses) in PRB2L requires that the residues) near the potential proteolytic cleavage site, rightmost proteolytic cleavage site indicated in PRB2L is not func- tional. features discussed by SCHWARTZ(1986) as common elements of monobasic proteolytic cleavage sites. We sequences for the third exons of two alleles at the cannot determine whether all amino acid sequences PRBl locus, PRBIM and PRBl', were determined by that contain the above structure have functional pro- LYONS,STEIN and SMITHIES(1 988). These alleles were teolytic cleavage sites in PRPs in vivo. However, com- cloned from a female whose saliva was typed electro- parison of the amino acid sequences decoded from phoretically as PmS+PmF+. The segregation of the the DNA sequences with the actual amino acid se- PmS and PmF proteins in her family members indi- quences of PRP proteins determined by other inves- cates that she must carry one encoding tigators, as discussed in the following sections, shows a PmF+PmS+phenotype and one chromosome encod- that all of the proteolytic products of the PRPs char- ing a PmF-PmS- phenotype. The association studies acterized to date conform to this rule. 258 K. M. Lyons et al.

A. PRB1 B. PRB1

,l PPPGKPUGPPPUGGHKPPGPP 64 CCTCCTCCAG(rAAAGCCACAAG~ACCACCCCCACA~GGAGGCAACAAACClCA~G~~CC1CC~ LO PPGIPPGPPPOGDL SRSPii 127 ... CCICCAGGAIAGCCACAIGGACCACCCCCACAAGG~~~C~A...G~CCCGLL~IC~~C~~58dla1 )[GEII-d 59 79 SPP~~POLPPPPGGNPPOU~P 184 I~TCClCCAGGAA~AC~ACAAGGACCACCCCCICAAGGA~GIAACCAGCCCCAAGblCCCCCA 24b 60 100 PPPGKPPGPPPPGGNKPOGPP PmF- 247 CClCCICCAGGAAAGCCACAAGGACCACCCCCACAAGG~GGC~ACAAACCTCA~G~lCCCCCA309 } 101 R 119 PPGKPOGPPPPGDKSPSPQ 310 ...CCTCCAGGALIGCCACAAGG~CCACCCCCACaAGGAGAC AA... GlCCCAAAGlCCCC~Al 3bO 120 140 120 ~ . SPPGKPPGPPPOGGNPPPGPP /SPPRLPPCPPPOGGhUPOGPP 361 TClCClCCAGGAAAGCCAC~AGG~CCACCCCCACAA6GA6GlAACCAGCCCCAAGGlCCCCCA 3b7 lCTCClCCAA6AAAGCCACAAGGACCACCCCCACAAGGAGGlAACC~ACCCCAAGGlCCCCC~ IlL1 111 161

162 PPGKPU6PPAOG6SKSOS4R 493 ... CCICCAGGAAAGCCACAAGGACCACCCGCACAAGGAGGCAGCAAGlCCCA~AGTGCCC~A la2 SPPGLPPGPPPPEGNNPaGPP SI3 T~ICClCCr\66AAAGCCACIIGG~CCACCCCAACAAG~AGGCAACAAlCClCAA~ClCCCCC~b1J Id3 223

PGkPOCPPPPEbh~PObPP' 75b LClCClCCIIGGAAAGCCAC~A'G~CC~CCC'AAC~~6~AGGCA~CAAlCCTC~~~~~CCC~C~17VI 2b3 , PPIG6nP"PPaLPPI~YP~"i~ 799 CCICCAGCAGGAG(ICLl~CCCCAGCA6CClCAGGCA~ClCClGClGGACAGCCCC4~G~ICCA,Ib1 280 1::QPPPGbIPSIPPI 861 CC1C6CCClCClCAAGGG6GC~G~lCllCC~6ACClCCCC~6lCALICClCCCCAblilTCl 914 FIGURE2.-Nucleotide and decoded amino acid sequences from the third exons of (A) PRBl' and (B) PRBIMand comparison to the PmF, PmS and DEAEII-2 protein sequences determined by KAUFFMANet al. (1982 and 1986, respectively). Lower case letters represent nucleotides in the intron preceding exon 3. Upper case letters represent nucleotides in exon 3. The translated portion of exon 3 is boxed, and the decoded amino acid sequence of this region is presented above the corresponding nucleotide sequence. Gaps have been introduced into the nucleotide sequences in order topermit alignment of tandem repeats. The stop codon is indicated by an asterisk. Potential proteolytic cleavage sites are indicated by vertical arrows. Each peptide potentially produced by proteolytic cleavage is included in a separate box. The portions of the decoded amino acid sequence which are most similar to the PmF, PmS and Pe protein sequences are indicated by brackets to the right of the sequence. The Pe protein was assigned based on its electrophoretic identity to the DEAEII-2 peptide. In the positions where the decoded amino acid sequences and theprotein sequences determined by other investigators differ, the amino acid present in the protein sequence is presented circled above the corresponding decoded amino acid.

Figure 2A illustrates the relevance of this assump- The second region in PRBIM encoding a PmF-like tion. Only the two proteolytic cleavage sites indicated peptide extends from amino acid residues 120- 18 1, in PRBl in Figure 2A contain the structure arginine- but it contains three amino acids differing from the serine-x-arginine-serine. Thus the PmF andPmS pro- PmF protein sequence. The first, at residue 151 in teins are readily accounted for as being the products PRBI', is a glutamic acid in the decoded sequencein of a single locus specifying a precursor protein which place of a proline presentin the sequence determined is then cleaved by aprotease that recognizes such by KAUFFMANet al. (1982). The second, at residue monobasic sites. 156, is an arginine in place of a lysine. The third The decodedamino acid sequence of the allele difference, at amino acid position 18 1, is a glutamine PRBl is presented in Figure 2B. Although this allele in place of an arginine. Arginine 18is 1 in the position is associated with the PmF-PmS- phenotype, it also associated with proteolytic cleavage, and its absence contains two regions closely relatedto the protein in PRBIMis, we suggest, again consistent with a PmF- sequence for PmF (KAUFFMANet al. 1982). The first phenotype, which this allele produces. region occurs, as it does in the allele PRBI', from The PRBIM allele also contains aregion with a amino acid positions 59 to 1 19. However, aminoacid decoded amino acid sequence identical to that of the residue 116 in PRBl is glutamine, whereas it is PmS protein sequence determined by KAUFFMANet arginine in PRBIS and in the protein sequence deter- al. (1986) except for a single amino acid difference; mined by KAUFFMANet al. (1982). The loss of this the decoded aminoacid sequence contains an alanine arginine residue destroys the arginine-serine-x-argi- at aminoacid position 225, whereas a serine occursat nine-serine motif and, we suggest, therebygreatly thecorresponding position in the PmS protein se- decreases or abolishes the proteolytic cleavage neces- quence. This amino acid substitution would not be sary to generate the PmF+ phenotype. expected to result in a PmS- phenotype. However, Loci Controlling Salivary PRPs 259 the other substitutions in PRBI', at amino acid posi- tions 1 16 and181, alter two potential proteolytic processing sitesso that thefinal product of the PRBl allele is a peptide so much larger than PmS that it would be classified as a PmS- phenotype. The Pe proteinis encoded by the PRBZ locus: The Pe protein polymorphism has beendescribed in terms of one expressed (Pe') and one unexpressed (Pe-) autosomal allele (AZENand Yu 1984a). AZEN et al. (1987) showed that theprotein DEAEII-2 (KAUFFMAN and KELLER 1979) has identical electrophoretic prop erties to the Pe protein and proposed that DEAEII-2 and Pe are identical. BENNICK(1987) has reported that a decoded cDNA sequence derived from PRBl (MAEDAet al. 1985), contains a region identical to the partial sequence of DEAEII-2.The region of identity includes anamino terminal portion encoded by exons 1 and 2 plus residues 1-45 from exon 3 (Figure 2). A 1 LI1L 1 MI1S potential proteolytic cleavage siteoccurs at amino acid position 58, suggesting that the complete sequence of q-@ DEAEII-2 will extend to this site as illustrated in Figure 1. Thus, thePRBl locus encodes the Pe protein lLllM1LIlS lLIlM lL/lM as well as the PmS and PmF proteins. The individual from whom the PRBIM and PRBlsalleles were cloned " ++ " " FIGURE3.-Comparison of the segregation of PRBl length var- has a Pe+ phenotype, which is consistentwith this iants with PmF and PmS protein phenotypes in one family. (A) A assignment. Southern blot of EcoRldigested DNA samples hybridized to the Associations between DNA length variantsof the 980 bp Hinfl fragment from PRBl. Three alleles of PRBI, giving PRBZ locus and PmF plus PmS protein phenotypes: rise to bands labeled as 1 L, 1 M or 1 S, are segregatingin this family. Studiesof the inheritance of the basicPRPs had The unmarked bands correspond to other PRP loci. (B) Pedigree of the family in Figure 3A.The PRBI alleles present in each revealed a strongcorrelation between the presence or individual are indicated above the circle or square. The PmF and absence ofthe PmS and PmF proteins, suggesting that PmS protein phenotypes areindicated below the circle or square. thesetwo proteins are inherited as a unit, with PmF+S+ and PmF-S- being the most common phe- TABLE 1 notypes. Occasionally the phenotype PmF+S- is seen, Comparison of PmF and PmS protein phenotypes with PRBl but the PmF-S+ phenotype has not been observed length variants in 24 unrelated individuals (AZENand DENNISTON1980). The above comparisons Protein phenotype of the decoded amino acid sequences with the deter- PRBI No. of mined protein sequences for PmS and PmF suggest allele(sr PrnF PmS individuals that some alleles at the PRBl locus encode the PmF 1 L/1 L - - 3 and PmS proteins, but that others do not. We have lL/lM - - 6 therefore compared the PmF and PmS protein phe- lL/lM + - 1 notypes with the presence of specific alleles of the lL/lS + + 1 lM/lM - - 7 PRBl locus. 1 M/1 M + - 1 The segregation of three allelic length variants of lM/lS + + 5 PRBl (PRBIS,PRBIM and PRBI ") were compared a 1L = PRBIL,1M = PRBIM,1s = PRBI'. with the segregation of the PmS and PmF protein phenotypes in three families (20 individuals). The lele, and no individual who carries PRBl' was typed comparison reveals a complete correlation between as either PmF- or PmS-. No other PRP locus displays the presence of the PRBls allele and the PmF+PmS+ this absolute correlation between PmS and PmF pro- phenotype. An example is presented in Figure 3. In tein phenotype and segregation of one of its length this family, and in two others examined, we found a variants. complete correlation between the presence of the Two individuals in Table 1 were typed electropho- PRBl allele and the PmF+PmS+phenotype. retically as PmF+PmS-. Both these of individuals carry Table 1 presents the results from a similar compar- the PRBIM allele. However, 13 other individuals car- ison of 24 unrelated individuals. All six individuals rying at least one copy of the PRBIM allele have a with the PmF+PmS+ phenotype carry the PRBls al- PmF-PmS- protein phenotype. This result suggests 260 K. M. Lyons et al. that the PRBIM length variant represents at least two PRB2 alleles, one that encodes the PmF+PmS- phenotype, andone that encodes a PmF-PmS- phenotype.A comparison between protein phenotypes and the seg- regation of PRBI alleles in a family of seven segregat- ing the PmF+PmS- phenotype (data notshown) shows a complete correlation between the PmF+PmS- phe- notype and the presence of the PRBIM allele. Segre- gation analysis of PmF plus PmS protein phenotypes and PRBl length variants indicates that the PRBIM allele presented in Figure 2B encodes a PmF-PmS- phenotype in agreement with the argument already presented.A PRBIM allele encodinga PmF+PmS- phenotype could potentially be generated by a single nucleotide change leading to an arginine at position 1 16in place of the glutamine encoded by the allele in Figure 2B. This single substitution would be expected to generate a functional proteolytic cleavage site, re- sulting in a PmF+PmS- phenotype. The PRBIL allele is associated with a PmF-PmS- phenotype in these PRPPPGGRPSRPPP comparisons. The DNA sequence of this allele has not 694 CC~CGCCCICLTCAAGGGGGCAGACCI~CCAGMCCTCCCCMG GACAGCCTCCCCLGTCATCT 750 been determined, but we suspect that the PmF-PmS- FIGURE 4.-Nucleotide and decoded aminoacid sequences from the third exon of a truncated cDNA clone derived from PRB2L phenotype encoded by this allele is a result of altera- (MAEDA et al. 1985). Coding sequences and potential proteolytic tions at proteolytic cleavage sites that reduce or abol- cleavage sites are identified as described in the legends to Figures ish post-translational cleavage. There appears to be 1 and 3. Potential N-linked glycosylation sites are identified by no a priori reason for theabsence of any alleles encod- triangles. The region spanned by amino acids 196 through 251 is ing a PmF-PmS+ phenotype; a more extensive search identical to the P-H peptide (SAITOH,ISEMURA and SANADA, 1983a) (see text). might reveal their existence. For example,a single change at position 18 1 in that results in the PRBIM amino acid differences which would abolish this pro- formation of afunctional proteolytic cleavage site teolytic cleavage site, and so lead to the PmS+ phe- would generate a PmF-PmS+ phenotype. notype. The PmS protein may also be encoded by the The Ps protein is encoded by the PRB2 locus: PRB2 locus: We have examined the decoded amino Amino acid sequence data are not available for the Ps acid sequences from other PRP loci for homology to proteins, the proposed products of the Ps locus (AZEN the PmS and PmF protein sequences. The decoded and DENNISTON1980), although the aminoacid com- amino acid sequence of an incomplete cDNA copy of positions of the Psl and Ps2allelic proteins, deter- the PRB2 locus (MAEDAet al. 1985) is presented in mined by GOODMANet al. (1985), are very similar to Figure 4. The three regions in PRB2L that contain those of the PmF and PmS proteins and are distinct thesequence arginine-serine-x-arginine-serineand from those of the acidic and glycosylated PRPs. In thus serve as potential proteolytic cleavage sites are addition, AZEN andDENNISTON (1980) have demon- indicated in the figure. Residues 134-251 in PRB2 strated that the Ps protein can be stained faintly for have only two differences from the sequence for PmS carbohydrate with the periodic acid-Schiff reagent. determined by KAUFFMANet al. (1986). The first Comparison of thedecoded amino acid sequences difference, at residue 185, results in the substitution fromthe PRBl and PRB2 loci (Figures2 and4) of a proline forthe alanine presentin the PmS protein demonstrates that they encode peptides with amino sequence. The second substitution, at residue 192, acid sequences very similar to each other. However, results in replacement of a glutamine present in the the decoded amino acid sequence of PRB2 contains PmS protein sequence with an arginine residue. This several asparagine-lysine-serine tripeptides, which substitution is likely to result in production of a func- conform tothe amino acid sequence asparagine-x- tional proteolytic cleavage site within the region of serine recognized as a site for N-linked sugar addition, identity with the PmS protein because it places a basic whereas the decoded amino acid sequence of PRBl residue three positions in front of a potential proteo- does not. The decoded amino acid sequence of the lytic cleavage site. We conclude thatthe allele of PRB2 truncated cDNA clone from PRBB contains three presented in Figure 4 probably doesnot encode a potential proteolytic cleavage sites, which suggests PmS protein because of this alteration, butwe suggest that up to four peptides are encoded by PRB2. Two that other unsequenced alleles of PRB2 might have of these peptides contain N-linked glycosylation sites. Loci Controlling Salivary PRPs 26 1

Amino acids10 through 133of the decoded sequence PRB4 ofPRBZL comprise a glycosylated peptide with an amino acid composition nearly identicalto that deter- mined for thePs proteins (GOODMANet al. 1985). The individual from whom this PRB2 clone was isolated produces the Psl protein. Consequently, we suggest that this region, corresponding to amino acid residues

10 through 133,encodes the glycosylated Psl protein. .. PHPGKPESRPPPGGHOSOGPP Aminoacid residues 196 through 25 1 comprise a 316 CClCAlCCAGGAAAGCCAGAIAGCCGICCCCCACAAtiGAG~ACACCAGl~CCAAG~lCCCCCA1 37d 103 .I 123 peptide that is identical to the P-H peptide sequenced PlPGhPEGPPPPGGNOSOGlP 379 CClAClCCAGGAAAGCCGGAAGGACCACCCCCACAAGGAGGAAACCAGlCCCAAGGlA~CCCA 441 124 144 by SAITOH,ISEMURA and SANADA(1983a). The P-H PPPGKPEGR[PP~GGTP~~GPP peptide has not been correlated with any of the poly- 442 CClCClCCAGGAAAGCCAGAAGGACGICCCCCACAAGGAGGCAICCAG~CCCAAGti~CCCCCA SGC morphic PRPs described in genetic studies (AZEN 505 CClCAlCCAGGAAAGCCAGAAAGACCACCCCCACAAGGAGGCAICCAGlCCCACCG~CCCCCA 507 166 - 186 1988). PPPGKPERPPPQGGiOSPGPr 568 CClCClCCAGGAAAGCCAGAAAtiACCACCCCCACAAGGAGGC~ACCAGlCCCA~GGlCCCCCA 630 The segregation of PRB2 length variants and Ps PHPGKPEGPPPPtGNISRSARv proteins could not be compared because there were 631CTCAlC C AGGAAAG CC A G AAC~AC~ACCCCC~CAGGAAGG~CA.GrCCCGIAtirti...I 693 1 1208 no PRB2 length variants in our families. Furthermore, SPPGKPOGPPOPEGNKPPtiPP 694 1ClCClCCAGtiAAAGCCICAAGGACCACCCCAACIAGAAGGCAACIAGCClCAAGG~CCCCCA 756 only one individual out of our random sample of 25 229 PPGKPPGPPPPGGNPOUPOAP 7 carried an allelic length variant for PRBP. Thus, al- 757 CC~~C~GGAAAGCCACAIGGCCCACCCCCACCAGGAGGCAA~CCCCAGCAGCC~CAG~CACC~ 1250 270 1250 I }a PAGYPPGPPPPPOGGRPPRP1 though our data do not allow detection of potential I 882 correlations between the Ps protein phenotype and J the PRB2 gene, the nearly identical amino acid com- FIGURE5.-Nucleotide and decoded amino acid sequences from positions of Ps and the PRBB-encoded peptide, plus the third exons of PRL34L compared to the P-D protein sequence the presence of potential glycosylationsites in the determined by SAITOH,ISEMURA and SANADA(1983b). The P-D PRB2-encoded peptide provide strong support for our peptide is electrophoretically identical to the Po protein (see text). assignment. Intronic sequences,coding sequences,potential proteolytic cleavage sites, and N-linked glycosylation sites are identified as describedin The Po protein is encoded by the PRB4 locus: the legends to Figures 1 and 4. Regions of identity with the CD-IIg Previous genetic studies described one expressed (Po') peptide sequence determined by SHIMOMURA,KANAI and SANADA and oneunexpressed (Po-) allele at the Po locus (AZEN (1983) are enclosed in small square brackets. and Yu 1984b). A basic proline-rich peptide, P-D, sequenced by SAITOH,ISEMURA and SANADA(1983b), alleles are cloned have a Po+ phenotype. We were has been suggested to be a product of the Po locus unable to correlate Po protein phenotypes with the based on the identical mobilities of the P-D and Po presence of length variants at PRB4 because of the proteins in several electrophoretic gel systems (AZEN small number of individuals typedfor the Po protein. 1988). Figure 5 presents the decoded amino acid However, no other PRP locus contains a region of sequence for thethird exon of the allele and complete or even near identity to the P-D peptide. compares it to the sequence of the P-D protein. They The PC proteinis probably a proteolytic cleavage are identical from amino acid residues 208-277. Pro- product of a PRB locus: The assignments of the basic teolytic cleavage must occur at amino acid position polymorphic PRPs to the loci indicated above dem- 207 in order toresult in production of the Po protein. onstrate that all of these proteins, previously consid- This site is reasonable since it contains the structure ered to be products of separate loci, represent proteo- arginine-serine-xarginine-serine. The individual from lytic cleavage products of larger precursor proteins. whom this allele of PRB4 was cloned does produce The decoded amino acid sequences presented in Fig- the Po protein, which further supports thisassign- ures 2-5 demonstrate that thereare likely to be ment. We therefore conclude that the Po protein is additional potential proteolytic cleavage products en- encoded by PRB4 and results from the proteolytic coded by these alleles. We anticipate that the unas- cleavage of a larger precursor. signed basic protein, PC, will be encoded by one of DNAsequences have been determinedfor two these fragments. other alleles of PRB4 (LYONS,STEIN and SMITHIES Reexamination of the inheritance of the heavily 1988). One of these alleles encodes a peptide identical glycosylated PRPs: Genetic studies led to the descrip- to the Po protein. The other encodes a peptide with tion of three loci, G1, CONI and CON2, which encode a single difference: an alanine occurs at amino acid heavily glycosylated PRPs (AZEN, HURLEY andDEN- position 239 rather than the proline found in the P- NISTON 1979; AZENand YU 1984b). D protein. This substitution is unlikely to result in a The G1 protein is encoded by the PRB3 locus: Po- phenotype, and we are therefore not surprised Polymorphisms for the glycosylated protein GI were that all three individuals from whom these PRB4 described in terms of four common productive alleles 262 K. M. Lyons et al.

A. PRB3 B. PRB3'

1 63

64

127

1 PO

2S3 GI3 316 378

379 ill

442 SO4

505 167

161 630 631 -1

FIGURE6.-Nucleotide and decoded amino acid sequences from the third exon of PRB? (A) and PRB3' (B). lntronic sequences, coding sequences, and Winkedglycosylation sites are identified as described in the legends to Figures 1 and 3. Regions of identity with the CD-IIf peptide sequence determined by SHIMOMURA,KANAI and SANADA(1983) are enclosed in small square brackets.

and onenull allele. The products of the four produc- PRB3M,and PRB3' alleles were confirmed by South- tive alleles differ in their molecular weights due to ern blot analysis using other restriction enzymes (data differences in the length of their procein backbones not shown). The results of the comparison in all 24 (AZEN, HURLEY andDENNISTON 1979). Although a unrelated individuals examined are presented in Ta- complete protein sequence has not been determined ble 2. Acomplete correlation between GI protein for the GI protein, a partial amino acid sequence of phenotype and PRB3 length variants was found. No the peptide, CD-IIf,isolated from the majorglycopep- other locus displays acorrelation with G1 protein tide found in saliva and thought tobe the GI protein, phenotype. Furthermore,the relative sizesof the has been determinedby SHIMOMURA,KANAI and SAN- PRB3 DNA length variants (PRB3vL > L > M > S) ADA (1983). The decoded amino acid sequences for correspond to therelative sizes ofthe G1 proteins (GI4 the two most common alleles atthe PRB3 locus, > 1 > 2 > 3) determined by AZEN,HURLEY and PRB3Land PRB3', are presented in Figure 6 and the DENNISTON(1979). The segregation of G1 protein regions identical to CDII-f are enclosed by square phenotypes and PRB3 alleles was also examined in brackets. The CD-IIf sequence is encoded six times in three families segregating PRB3 length variants: a PRB3L and three times in PRB3'. No other PRP locus completecorrelation was found.This analysis thus encodes this peptide sequence. The decoded amino indicates that the decoded amino acid sequences pre- acid sequences of PRB3L and PRB3S differ from those sented in Figures 6A and 6B for PRB3L and PRB3' of the other PRPloci in that there are nosites within represent the GI1 and GI3 proteins, respectively. exon 3 which contain the general features described AZEN, HURLEYand DENNISTON(1 979) described a abovefor monobasic proteolytic cleavage sites.We null allele, GIo, associated with the GI protein poly- therefore conclude that PRB3 encodes the G1 protein morphism. We have examinedthe DNA froma and no other PRP. GloGlo individual. No alterations in the PRB3 or any Associations between PRB3 length variants and other PRP locus were detected in a Southern blot of GI protein phenotype: Analysis of the segregation in DNA from this individual (data notshown). Thus, the families and in unrelated individuals of alleles at the molecular basis for this null phenotype does not ap- PRB3 locus provides strong support forthe conclusion pear to involve a large DNA rearrangement such as a that GI is encoded by PRB3. Figure 7 compares PRB3 deletion. Its basis remains obscure. length variants and G1 protein phenotypes in eight The CON1 and CON2 proteins are products of unrelated individuals. The assignments of the PRB3L, the PRB4 locus: The inheritance of the remaining Controlling SalivaryLoci Controlling PRPs 263 ABC DEF GH ments of the CONl and CON2 proteins to two sepa- rate loci, each encoding one productive and one non- productive allele (AZENand Yu 1984b). The data can be reconciled if there are actuallytwo productive alleles, with gene frequencies in whites of 0.396 and 0.034, corresponding to the CONI+and CON2+ pro- "r. tein phenotypes, respectively (AZEN and YU 1984b), and one "null" allele with a gene frequency in whites of 0.570. A reexamination of the inheritance of the CONl and CON2 proteins in 24 of the families (in- cluding 132 individualsoriginally studied by AZEN

PRB3 alleles: and Yu 1984b) show no inconsistencies with this pro- GI Protein phenotype: posal. The "null" allele, which still mustbe proposed, FIGURE7.-Comparison of the segregation of PRB3 DNA may represent a functional allele(s)whose protein length variants with GI protein phenotypes in eight unrelated indi- products have not yet been identified electrophoreti- viduals. Southern blot of EcoRIdigested DNA samples hybridized cally as a consequence of loss or gain of proteolytic to the 980 bp Hinff fragment from the third exon of PRBI. The cleavage sites. positions of the bands derived from the PRB3 alleles are marked. The types of PRB3 alleles and GI protein phenotypes for each We have compared CON 1 and CON2 protein phe- individual are indicated below each lane. notypeswith length variants at the PRB4 locusin families in order to further investigate this proposal. 2 TABLE Only one family (seven members) in our sample was Comparison of GI protein phenotype withPRB3 length variants polymorphic for PRB4 length variants, and inthis in 23 unrelated individuals family, a complete correlation was found between the presence of the PRB4M allele and the presence of the PRBS GI protein No. of allele(s)" phenotype individuals CON2 protein (data not shown). We were unable to 3L/3L 1+ 12 investigate the correlation of PRB4M and the CON2 3L/3M 1 +2+ 1 protein in our collection of unrelated individuals be- 3L/3S 1+3+ 8 cause none of them carried the PRB4M allele. Five of 3L/3VL 1 +4+ 1 the 18 unrelated individuals examined did, however, 3s/3s 3+ 1 have a CON2+ phenotype, and all of them carried at a 3L = PRB3L, 3M = PRBjrM,3s = PRB3', 3VL = PRB3vL. least one copy of the PRB4' allele. This suggests that glycosylated polymorphic PRPs, CONl and CON2, the CON2 protein may also be encoded by at least have each beendescribed as beingdetermined by one some alleles of the PRB4' type. CONl protein phe- productive and one nonproductive allele (AZEN and notypes could not be compared with the presence of Yu 1984b). TheCONl and CON2 proteins stain length variants because all individuals in our sample intensely with concanavalin A, suggesting that they produced CON 1. The available evidence thus indi- are heavily glycosylated. Although no data on amino cates that the CON 1 and CON2 proteins are encoded acid composition or sequence are available for these by PRB4. proteins, SHIMOMURA,KANAI and SANADA(1983) de- Conclusions: The proposed assignmentsof the termined the amino acid sequence of a glycopeptide, basic and glycosylated PRPs to the fourPRP loci that CD-IIg, which was isolated from a heavily glycosylated we have considered in detail are summarized in Figure PRP. The decoded amino acidsequence of PRB4 1. The assignments proposed by MAEDA(1985) and (Figure 5) contains this peptide sequence (enclosedby confirmed by DNA studies (AZENet al. 1987) for the square brackets). This sequence is found only in alleles acidic PRPs are also included in the figure. Together at the PRB4 locus, indicating that CD-IIg is derived these results demonstrate that 12 of the 13 polymor- from a heavilyglycosylated PRP distinct from the phic PRPs so far described, which were assigned to other heavily glycosylated protein, G1. As discussed 12 distinct loci in the earlier genetic studies, can now above, the GI protein appears to be the only product be assigned to the six PRP genes characterized in the of the PRB3 locus, suggesting that the CONl and DNA studies. The inheritance of the PRP polymor- CON2 proteins must be encoded by a different locus. phisms described at the protein level in the earlier The only other PRP locuscapable of encoding a studieshave been correlated with the presence of heavily glycosylatedPRP is PRB4. These observations specific DNA length variants in most cases. suggest that PRB4 encodes CONl and CON2. As originallysuggested by MAEDA et al. (1985) Our present proposal that the CONl and CON2 following their characterization of cDNAs encoding proteins are in fact the products of a single locus, PRPs, many of these proteins are actually proteolytic PRB4, requires reconciliation with the earlier assign- cleavage products of larger precursors rather than the 264 K. M. Lyons et al. products of separate loci. An example of this from proteolytic cleavagesite within a sequence corre- our present work is found in the case of the PmS, sponding to a known longer product, such as occurs PmF and Pe proteins, which we suggest are derived in the PmS-like region in PRB2L (Figure 4), can also by proteolytic cleavage from a single larger precursor lead to a null phenotype. We suspect that the large encoded by PRBl. Our assignment of PmS, PmF,and differences in size, and in some cases in amino acid Pe to a single transcript from PRBl predicts that composition, among the peptides generated by pro- strong associations should be evident among all three teolytic cleavage from a single transcript prevented of these proteins. A strong association between the the detection ofassociations between some of the PmF and PmS proteins has already been noted by cleavage products in earlier genetic studies. Incom- AZENand DENNISTON(1980), but no strong associa- plete proteolyticprocessing at some or allof the tions were noted between the Pe protein and either potential proteolytic cleavagesites discussed above PmF or PmS individually(AZEN and Yu 1984a). How- would also be likely to hinder the recognition that ever, a reexamination of the data derived from 108 several proteins of very different lengths can be de- individuals whose Pe, PmF and PmS protein pheno- rived from the same allele. types are known confirms our expectation-all indi- In summary, we havebeen able to construct an viduals with a Pe- phenotype are also phenotypically essentially complete outline of the genetic and phe- PmF-PmS- even though most that en- notypic complexities of the PRP system. This system code a PmF-PmS- phenotype also encode a Pe+ phe- is remarkable in the way in which it produces a large notype. This association of the Pe- phenotype with number of proteins from only a few loci. the PmF-PmS- phenotype can be explained by the We thank LINDAM. SABATINIfor providing the Southern blot occurrence of a single amino acid substitution that in Figure 7. This work was supported by National Institutes of inhibits proteolytic cleavage after residue 58 in an Health Grants GM20069 to O.S. and DE0 3658-23 to E.A. allele otherwise identical to PRBl (Figure 2B). We have concluded that the Po and CON proteins LITERATURECITED are proteolytic cleavage products of a precursor en- coded by PRB4. However, no strong association was ANDERSON, L. C., D. KAUFFMANL. and P. J. KELLER, 1982 Identification of the Pm and PmS human parotid sali- noted between the Po+ and CON+ protein phenotypes vary proteins as basic proline-rich proteins. Biochem. Genet. in previous genetic studies (AZENand Yu 1984a). A 20 1131-1137. reexamination of the data derived from 108 individ- AZEN, E. A., 1977 Salivary peroxidase (SAPX): genetic modifica- ualswhose Po and CON protein phenotypes are tion and relationship to the proline-rich (Pr) and acidic (Pa) proteins. Biochem. Genet. 15: 9-29. knownstill revealed no clearassociations between AZEN, E. A., 1988 Genetic protein polymorphisms of human these proteins. We suspect that the low level of poly- saliva. In ClinicalChemistry of HumanSaliva, Edited by J. morphism (onlysix individuals havea Po- phenotype) TENOVUO,CRC Press, Inc., Cleveland, Ohio (in press). in the sample available to us precludes the identifica- AZEN,E. A,, and C. L. DENNISTON,1974 Genetic polymorphism tion of a strong association, and that an association of human salivary proline-rich proteins: further genetic analy- sis. Biochem. Genet. 12: 109-120. would be evident in a larger sample. AZEN, E. A., and C. L. DENNISTON,1980 Polymorphism ofPs We have concluded that the Ps proteins are proteo- (parotid size variant) and detection of a protein (PmS) related lyticcleavage products of precursors encoded by to the Pm (parotid middle band) system with genetic linkage PRB2. The PC protein could not beassigned to a of Ps and Pm to G1, Db and Pr genetic determinants. Biochem. Genet. 18: 483-50 1. specific locus, but we expect that it will also prove to AZEN,E. A,, and C. L. DENNISTON,1981 Genetic polymorphism be a proteolytic cleavageproduct of a large precursor. of PIF (parotid isoelectric focusing variant) proteins with link- The existence of null alleles at 1 1of the 13 loci was age to the PPP (parotid proline-rich protein) gene complex. an unusual feature of the original genetic description Biochem. Genet. 19475-785. of the PRP system. The sequence of the PRBl allele AZEN, E. A., and N. MAEDA,1988 Molecular genetics of human salivary proteins and their polymorphisms. In: Advancesin (Figure 2), which encodes a PmS-PmF- phenotype, Human Genetics, Plenum Press, New York, New York (inpress). appears to encode a protein that is much larger than AZEN,E. A,, and F. G. OPPENHEIM,1973 Genetic polymorphism either PmS or PmF because of amino acid substitu- of proline-rich human salivary proteins. Science 180: 1067- tions that remove proteolytic cleavage sites. Our data 1069. AZEN,E. A,, and P. L. Yu, 1984a Genetic polymorphism of Pe suggest that manyof the nullalleles proposed for and Po salivary proteins with probable linkage of their genes other PRP loci are also due to losses of proteolytic to the salivary protein gene complex (SPC). Biochem. Genet. cleavagesites accompanied by the production of 22: 1065-1080. fewer, but much larger PRPs, from a single transcript. AZEN,E. A.,and P.L. Yu, 1984b Genetic polymorphism ofCON1 These larger products would not be easily resolved in and CON2 salivary proteins detected by immunologic and concanavalin A reactions on nitrocellulose with linkage of Con the electrophoretic systems used to type PRP poly- 1 and Con 2 genes to the SPC (salivary protein complex). morphisms, thus preventing their recognition as allelic Biochem. Genet. 22: 1-19. to the much smaller PRPs. Similarly, the gain of a AZEN,E. A., C. K. HURLEYand C. L. DENNISTON,1979 Genetic Loci Controlling Salivary PRPs 265

polymorphism of the major parotid salivary glycoprotein (GI) LYONS,K. M., J. H. STEINand 0. SMITHIES,1988 Length poly- with linkage to the genes for Pr, Db and Pa. Biochem. Genet. morphisms in human proline-rich protein genes generated by 17: 257-279. intragenic unequal crossing over. Genetics 120 267-278. AZEN,E. A., K. M. LYONS,T. MCGONIGAL,N. L. BARRETT,L. S. MAEDA,N., 1985 Inheritance of human salivary proline-rich pro- CLEMENTS,N. MAEDA,E. F. VANIN,D. M. CARLSONand 0. teins: a reinterpretationin terms of six lociforming two subfam- SMITHIES,1984 Clones from the human gene complex coding ilies. Biochem. Genet. 23: 455-464. for salivary proline-rich proteins. Proc. Natl. Acad. Sci. USA. MAEDA, N.,H.-S. KIM, A.E. AZEN and 0. SMITHIES, 81: 5561-5565. 1985 Differential RNA processingand post-transitional cleav- AZEN,E. A., H.-S. KIM, P. GOODMAN,S. FLYNNand N. MAEDA, ages in the proline-rich protein gene system. J. Biol. Chem. 1987 Alleles at thePRHZ locus coding for the human salivary 260: 11123-11130. proline-rich proteins (PRPs) Pa, Db, and PIF. Am. J. Hum. O'CONNELL,P., G. M. LATHROP,M. LAW,M. LEPPERT,Y. NAKA- Genet. 41: 1035-1047. MURA,M. HOFF, E. KUMLIN,W. THOMAS, T. ELSNER, L. BENNICK,A., 1987 Structural and genetic aspects of proline-rich BALLARD,P. GOODMAN, E.AZEN, J. E. SADLER,G. Y. LAI,J.- proteins. J. Dent. Res. 66: 457-461. M. LALOUELand R. WHITE,1987 A primary genetic linkage BOND,J. S., and P. E. BUTLER,1987 Intracellular Proteases. Annu. map for human . Genomics 1: 93-102. Rev. Biochem. 56 333-364. PONCZ,M., D. SOLOWIEJCZYK,B. HARPEL, Y. MORI,E. SCHWARTZ FRIEDMAN,R. D., A. D. MERRIIT and M. L. RIVAS,1975 Genetic and S. SURREY,1982 Construction of human gene libraries studies of human acidic salivary protein (Pa). Am. J. Hum. from small amounts of peripheral blood: analysis of @-like Genet. 27: 292-303. globin genes. Hemoglobin 6: 27-36. GOODMAN, P. A., P. L. Yu, E. A. AZENand R. C. KARN, 1985 The SAITOH,E., S. ISEMURAand K. SANADA, 1983a Further fraction- human salivary complex (SPC): a large block of related genes. ation of basic proline-rich peptides from human parotid saliva Am. J. Hum. Genet. 37: 785-797. and complete amino acid sequences of basic proline-rich pep- IKEMOTO,S., K. MINAGUCHI,K. SUZUKIand K. TOMITA, tide P-H. J. Biochem. 94 1991-1995. 1977 New genetic marker in humanparotid saliva (Pm). SAITOH,E., S. ISEMURAand K. SANADA,1983b Complete amino Science 197: 378-379. acid sequence of a basic proline-rich peptide, P-D, from human KARN,R. C., P. A. GOODMANand P. L. Yu, 1985 Description of parotid saliva. J. Biochem. 93: 495-502. a genetic polymorphism of a human proline-rich salivary pro- SCHWARTZ,T. W., 1986 The processing of peptide precursors. tein, PC, and its relationship to other proteins in the salivary FEBS Lett. 200 1-10. protein complex (SPC). Biochem. Genet. 23 37-5 1. SHIMOMURA,H., Y. KANAIand K. SANADA,1983 Amino acid KAUFFMAN,D. L., and P. J. KELLER,1979 The basic proline-rich sequences of gbcopeptides obtained from basic proline-rich protein in human parotid saliva from a single subject. Arch. glycoprotein of hhman parotid saliva. J. Biochem. 93: 857- Oral Biol. 24 249-256. 863. KAUFFMAN,D., R. WONG,A. BENNICKand P. KELLER,1982 Basic SOUTHERN,E. M., 1975 Detection of specific sequences among proline-rich proteins from human parotid saliva: complete co- DNA fragments separated by gel electrophoresis. J. Mol. Biol. valent structure of protein IB-9 and partial structure of protein 98: 503-5 17. IB-6, members of a polymorphic pair. Biochemistry 21: 6558- VANIN,E. F., P. S. HENTHORN,D. KIOUSSIS, F.GROSSVELD and 0. 6562. SMITHIES,1983 Unexpected relationships between four large KAUFFMAN,D., T. HOFMANN,A. BENNICKand P. KELLER, deletions in the human @-globingene cluster. Cell 35: 701- 1986 Basic proline-rich proteins from parotidsaliva: complete 709. covalent structures of proteins IB-1 and IB-6. Biochemistry 25: WAHL,G. M., M. STERNand G. R. STARK,1979 Efficient transfer 2387-2392. of large DNA fragments from agarose gels to diazobenzyloxy- KIM, H.-S., AND N. MAEDA, 1986 Structure of two HaeIII-type methyl-paper and rapid hybridization by using dextran sulfate. genes in the human salivary proline-rich protein multigene Proc. Natl. Acad. Sci. USA 76 3683-3687. family. J. Biol. Chem. 261: 6712-6718. Communicating editor: R. E. GANSCHOW