Copyright 0 1988 by the Genetics Society of America

Length Polymorphisms in Human Proline-Rich Generatedby Intragenic Unequal Crossing Over

Karen M. Lyons, James H. Stein and Oliver Smithies

Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin 53706 Manuscript received September 25, 1987 Revised copy accepted May 16, 1988

ABSTRACT Southern blot hybridization analysis of genomic DNAs from 44 unrelated individuals revealed extensive insertion/deletion polymorphisms within the BstNI-type loci (PRBl, PRBP, PRB3and PRBI) of the human proline-rich protein (PRP) multigene family. Ten length variants were cloned, including alleles at each of the four PRB loci, and in every case the region of length difference was localized to the tandemly repetitious third exon. DNA sequences covering the region of length variation were determined for seven of the alleles. The data indicate (1) that the PRB loci can be divided into two subtypes, PRBl plus PRBP, and PRB3 plus PRBI, and (2) that the length differences result from different numbers of tandem repeats in the third exons. Variant were also identified with different numbers of PRP loci resulting from homologous but unequal exchange between the PRBl and PRBP loci. The overall data are compatible with the observed length variants having been generated via homologous but unequal intragenic exchange. The results also indicate that these crossover events are sensitive to the amount of homology shared between the interacting DNA strands. Allelic length variants have arisen independently at least 20 times at the PRB loci, but only one has been detected at PRHa locus. Comparison of the detailed structures of the repetitious regions in PRB and PRH loci shows that the repeats in PRB genes are very similar to each other in sequence and in length. The PRH genes contain fewer repeats, which differ considerably in their individual lengths. These differences suggest that the larger number of length variants in PRB genes is related to their greater ease of homologous but unequal pairing compared to PRH genes.

TUDIES of the evolution of families of repetitive The same types of recombinational events that oc- S DNAs have revealed that these sequences show cur inyeast between repeated sequences mayalso greater homogeneity within speciesthan between spe- occurwithin genes that have internally repetitious cies. SMITH(1 976) constructed computer models that structures. An increasing volume of literature docu- showed repeated unequal sister chromatid exchange ments the existence of such genes (see, for example, can generate and maintain this homogeneity. Other ALLISONet al. 1985; MUSKAVITCHand HOGNESS1982; investigators (SCHERERand DAVIS 1980; NAGYLAKI SHIN et al. 1985; WARREN,COROTTO and WOLBER and PETES 1982) have considered the role of 1986). conversions in similar processes of homogenization. The human proline-rich protein (PRP) multigene In addition to these model-building studies, recip- family (reviewed by AZEN 1988) provides a particu- rocal recombination and gene conversionbetween larly interesting opportunity to examine the types of equally and unequally paired repeated sequences have recombinational events that can occur within genes been directly demonstrated in yeast. Unequal sister with internally repetitious structures. This family con- chromatid exchange has been detected in the rDNA sists of six tandemly linked genes, which encode the repeats during mitosis and meiosis (SZOSTAKand WU complex array of PRPs found in human saliva (AZEN 1980; PETS 1980). Experiments involving dispersed et al. 1984; MAEDAet al. 1985; LYONS,AZEN, GOOD- repeats on the same have demonstrated MAN and SMITHIES 1988). that homologous but unequal recombination occurs The six PRP genes are arranged on chromosome in both meiosis (KLEINand PETES198 1; KLEIN1984; 12p (MAMULAet al. 1985) in the order5’ PRB2-PRBI- JACKSONand FINK1985; JINKS-ROBERTSON and PETES PRB4-PRH2-PRB3-PRHI 3’ (H.-S. KIM, unpublished 1986) andmitosis (JACKSON and FINK 198 1; ROEDER, data); the cluster spans a physical distance of approx- SMITHand LAMBIE1984). Gene conversions between imately 600 kbwith approximately 70 to 180 kb repeated genes have also been reported (AMSTUTZet separating adjacent genes. The six genes are of two al. 1985;JINKS-ROBERTSON and PETES1985; JACKSON types that canbe distinguished by their ability to and FINK 1981). hybridize to a probe derived from PRBl (AZENet al.

Genetics 120: 267-278 (September, 1988) 268 K. M. Lyons, J. H. Stein and 0. Smithies 1984; MAEDA1985). Four of the genes (PRBZ, PRBB, 123456 PRB3 and PRBI) hybridize strongly to the probe and T " contain multiple BstNI restriction sites (BstNI-type or PRB genes). The remaining twogenes (PRHI and PRH2) do not hybridize as strongly to the probe and contain multiple Hue111 restriction sites (HaeIII-type or PRH genes). Nucleotide sequence analyses (MAEDA et al. 1985; KIM and MAEDA1986) have demonstrated that the PRPs contain a series of proline-rich tandem repeats whichvary in length from 16 to 21 amino acids (48-63 bp). Initial molecular studies of the human PRP multi- gene family (AZENet al. 1984) demonstrated that four of the six genesshow frequent insertion/deletion FIGURE 1.-Southern blot hybridilation analysis of PRP gene length polymorphisms. We haveextended these stud- patterns in six unrelated individuals. DNAs were digested with ies by analyzing at the DNA sequence level ten alleles EcoRI. The 980-bpHinfl fragment from pPRPII2.2RP was used as from these four loci. This analysis reveals that the a probe. Assignment of length variants to specific loci was per- length differences are due to changes in the numbers formed as described in the text. Alleles of PRBl are identified by white squares. Alleles of PRBZ, PRB3 and PRB4 are identified by of the proline-rich tandem repeats. We consider sev- white arrows placed on the center, right, and left, respectively, of eral models for the generation of these length differ- the hybridizing band. Bands corresponding to PRHl and PRH2 are ences and find that intragenic homologous but un- identified by white dots (shown only in lane 2). The lengths of equal exchange provides the most simple explanation polymorphic alleles are indicated in kb. among those examined. We also find that thenumber TABLE 1 of PRP loci has been altered in some individuals by unequal exchanges between different loci. Incidence of polymorphic alleles in 44 unrelated individuals

ObservedfEx MATERIALS AND METHODS No. oFted" No. of individuals DNA preparation: High molecular weightDNA was Size (kb) of alleles in heterozygous prepared from peripheral blood leukocytes of 44 unrelated Locus EcoRI fragment population at locus

~ individuals, including three individuals examined in a pre- PRB44 1 6.2 18123.1 vious study (AZENet al. 1984) by the method of PONCZet 6.0 36 al. (1982). 7 5.8 Southernblot hybridizations: Human genomic DNA 5.6 1 (5.0 pg per lane) was digested with EcoRI, electrophoresed in 0.8% agarose gels, and transferred to nitrocellulose ac- PRBZ 4.6 2 514.8 cording to the method of SOUTHERN(1 975) with modifica- 3.9 83 tions as described by WAHL,STERN and STARK(1979). 3.7 2 Filters were hybridized to a nick-translated 980 bp Hinfl 3.3 1 fragment from the plasmid pPRPII2.2RP. which contains PRB3 4.4 1 1811 6.6 the 2.2-kbEcoRIIPstI fragment from PRPl (AZENet al. 4.3 67 1984) subcloned into PAT153. The 980 bp Hinfl fragment 4.2 2 used as the probe contains the tandemly repetitious exon 3 4.0 18 of PRBI. Hybridization conditions were as described in VANINet al. (1 983). PRB4 3.4 6 1011 1.7 Construction and screening of phage libraries: Insert 3.3 4 DNA was prepared by digesting human genomic DNA to 75 3.2 completion with BamHI or Hind111 (New England Biolabs). 3 3.0 Bacteriophage X Charon 35 (LOENENand BLATTNER1983) Calculated from observed allele frequencies, assuming Hardy- arms were prepared, and phage libraries were constructed Weinberg mating. essentiallyas described by MANIATIS,FRITSCH and SAM- BROOK (1982). The phage libraries were plated onto Esch- erichia coli strain K802recA. vided by the University of Wisconsin Genetics Computer Subcloning the polymorphicPRP fragments: DNA iso- Group (DEVEREUX,HABERLI and SMITHIES1984). lated from small recombinant bacteriophage growths was digested with EcoRI, subcloned into pAT153, and trans- formed into E. coli strain K802recA. RESULTS DNA sequencing and analysis: DNA sequence analysis Length polymorphisms: Previous studies (AZENet was performed as described by MAXAMand GILBERT(1977) using the modifications of SLIGHTOM,BLECHL and SMITHIES al. 1984) showed that thenick-translated 980-bp Hinfl (1980). Nucleotide sequences were determined for both fragment from PRBI detected polymorphisms in hu- strands. The sequences were analyzed using software pro- man genomic DNA samples from three individuals. Length Polymorphisms in PRP Genes 269

P E PV P PRBlM FIGURE2.-Maps of thefour human c c c 4 puls PRB loci illustrating their two subtypes and their allelic variants. A, PRBl; B, PRB2;C, PRB3 and D, PRB4. The re- striction sites indicated are BstEII (Bs), EcoRI (E), HpaI (H),PstI (P),PuuII (Pv), PRBZS SstI (S), Sf1 (Sf), and XbaI (X). Restric- I+ 1M bl tion sites unique to a particular gene are * PRBZL indicated by single-headed arrows. Re- 1+ P3a bl PRBZV striction sites in identical positions in PRBl and PRB2 or in PRB3 and PRB4 are indicated by double-headed arrows. The regions of length variation were - = 250 localized and are shown above or below the respective complete maps forthe shortest allele at each locus. The approx- E S PHX Ir 9M hn h PB P. PRB3L imate lengthdifference between the ii 4 i his, * 41 PRBIS shortest allele at each locus and the other C tt allele(s) is indicated in bp above the map srp PV BS of each length variant. The restriction maps of alleles at each locus are identical D pRB4S except for the PstI site indicated in pa- rentheses that is present in PRB2vLbut e' d. not inother the alleles of PRB2. Open PRB4M boxes represent the regions of hybridi- PRB4L zation to the 980-bp Hinfl probe.

We have extended these observations by examining These individuals are discussed in more detail below. EcoRI digests of genomic DNAfrom an additional 4 1 No alterations in the intensities of bands associated unrelated individuals. An illustrative set of results is with any other locus were evident. presented in Figure 1. The polymorphisms detected In total, 26 different DNA patterns were identified in these EcoRI digests were alsodetectable with other among these 44 individuals, and 33 of the individuals restriction enzymes, demonstrating that the variants were heterozygous for at least one allelic length var- result from insertions or deletions ofDNA rather iant. The expected number of heterozygotes was cal- than from point mutations. culated for each locus from the observed frequencies The bands could be assigned to their respective loci of the different alleles assumingHardy-Wienberg pro- by Southern blotanalysis using several restriction portions (Table 1). The observed and expected values enzymes (data not shown). Most of the polymorphic do not differ significantly. bands appear as doublets in Figure 1, with the length Restriction site map analysisof allelic length var- differences between the bands in a doublet ranging iants In order to investigate thenature of these from approximately 100 to 300 bp (see, for example, frequent polymorphisms, ten different length variants the doublets at 4.3/4.0, 3.9/3.7, and 3.4/3.2 kb in were cloned, including examples from each of the individual 5). Severalof the polymorphic bands in four PRB loci. The length of each cloned fragment Figure 1 are not associatedwith obvious doublets was confirmed directly by Southern blotting experi- because they result from a greater length difference. ments of genomic DNA in order to insure that the For example, individuals 1 and 4 in Figure 1 carry a cloned and uncloned genomic alleleswere identical in 4.6-kb band associated with PRBP,and individual 2 length. carries a 3.3-kb band associated withPRBP. The restriction maps of the EcoRI fragments con- Table 1 lists the sizes and frequencies in 44 unre- taining PRB loci are compared in Figure 2. PRBl and lated individuals ofthe allelic length variants detected PRB2 share many restriction enzyme recognition sites by Southern blot analysis of genomic DNA digested 5' to the region of length variation. Similarly, PRB3 with EcoRI. Four different length variants were iden- and PRB4 have sites in common 5' to the region of tified at each of the four PRB loci, but none were length variation. Thus, PRBl and PRBP form one detected at either of the PRH loci. However, one subtype, and PRB3 and PRB4 form a second subtype length variant undetectable in Southern blots of the of the PRB loci. type used in our present study has been identified at The regions of length difference among the alleles the PRHI locus and is discussed below. We alsoiden- were localized by restriction mapping (Figure 2). In tifiedindividuals with a decreased or an increased all alleles, the region of length variation hybridized intensity ofthe EcoRI fragments associated withPRBI. strongly to the 980-bp Hinfl probe, suggesting that 270 K. M. Lyons, J. H. Stein and 0. Smithies A PRBl andPRB2

PRB 1 M PRBZL (incomplete)

1 atcftgtgtgggcttttactctgtctttctcttctctctctt 41 tgtcctcta$ 1 T A c 81.1 ...... (~nconplele)...... GGGTGAC~ACAAGlCCCGAAGlfClCGA InL I 82.1 29 01 .l 167 ... G ... 63.1 82 d2.1 224 A 1 I 81 .2 145 ... AG 8S.l 287 G 82.2 205 a1 .z 350 ...... A 03.2 268 A I 82.2 407 T e1 .s 331 ... GG 1 T d3.Z 470 A G 82.3 391 T 1 81.3 83.3 454 az.3 81.4 51 7 ... G G 1 83.3 AT I 653 82.4 577 -~""__.""""""" !" -?_""""""_" ~ 31.4 71 6 GCAA G 83.4 640 ...... AGGGCC AG TCCC GC ,>C 676 GT c CG A GC ...... 2 839 ...... GC TCCCGCCAGAGG 709 ~CTCCTCAAGGGGGCAGACCTlCCAGACCTCCCCAGlGACAGCCTCCCCA~AiCiAGGAGG a75 CG GI A Gt wa CCTCCTCAAGGGGGCAGACCITCCAGACCTCCCCAGIGACAGCCTCCCCAGTCATCIAGG

PRBlS. 1 atcttgt~t~ggcttttactctgtctttctcttcfctctt 41 t-atCCfCtd T C T A 81.1 104 02.1 167 ... G ... 83.1 224 287 350 407 470 533 G GCAA 53.4 593 ...... 656 GC 692 726 Figure 3.-Nucleotide sequences from portions of eight alleles of the fourPRB loci: PRBl,PRBZ, PRB3 and PRB4. The DNA sequences shown in bold type within heavy boxes are theoverall consensus sequences for therepeats present in exon 3 from alleles of PRBl plus PRB2 (A) and PRB3 plus PRBl (B). These overall consensus sequences are also given, for alignment purposes, above each locus. The consensus sequences for the B1, B2 and B3 repeats in PRBl and PRBZ,and forthe B4 repeats in PRB3 and PRB4 are displayed next in the heavy boxes above each locus with only differences from the overall consensus sequences being indicated. The DNA sequence from each allele is lightly boxed. Each repeat has been given a separate designation, indicated to theright of the sequence, that indicates the repeat type. For example, repeat B1.4 in PRBIM is a B1-type repeat, and differs from the consensus B1 repeat for PRBl by the single nucleotide T that is indicated. The DNA sequences above the dashed line include repeats that show few differences from the consensus sequences derived for the specific locus (indicated in light type within heavy boxes). DNA sequences below the dashed line (designated "C")show substantial differences from these consensus repeat sequences, and in this case the indicated differences are from the overall consensus sequence (indicated in bold type within heavy boxes). Lower case DNA sequences represent intronic regions. Dots indicate gaps introduced to maintain alignment of repeats. the length differences occur inexon 3. The restriction DNA sequence analysis of polymorphic alleles: sites flanking theregions oflength variation were Nucleotide sequences for the regions of length varia- identical among sets of alleles except for a single PstI tionwere determinedfor seven of theten alleles site presentin the longest allele of PRB2 that is absent cloned in this study and are presentedin the appendix. in theother two alleles. Thisdifference probably These sequences are compared in Figure 3. Although reflects a simple restriction site polymorphism. The no alleles of PRBP were sequenced in our present restriction mapping experiments indicatethat most of study, thesequence of an incomplete cDNA fromthis the polymorphic alleles varied in length by a multiple locus was determined by MAEDA et al. (1985) and is of about60 bp, the approximate length ofthe repeats included in Figure 3 for comparative purposes. which comprise exon 3 (Figure 2). The DNA sequences presented in Figure 3 cover Length Polymorphisms in PRP Genes 27 1 B PR83 andPRB4

LGG 1

PRB3L PRB4L

1 gtfactgtaaaaatttgcggcaaclaltcaglgaal9cc~l~t~tCCCCC~~ 1 ccalCtagctccattactgtaaaaattlglggcaaclaltcagtgaatgccgl~lglccccca t.4 GG C CA 04.1 64 cctcctcca$ C GG cc 84.1 127 C 127 I t I 04.2 190 I 04.3 190 I G C I 3413 253 A 04.4 253 A A 84.C 316 84.5 31 6 ACG C 379 04.6 379 A G 34.b 642 84.7 442 G C I. 505 A C 505 cc I 04.8 568 568 A C 1 a4.9 631 631 GA A G A G G 82.1 a& 1 691 694 ...... "" r'ccc -A.G."~c-c..aeG"- 92.2 757 757 GC 820 ... A G f c? € XG- -GI€ -rT, - u17 GlCG; 1 ...... >C wa CG T CG 851 914 ...

PRB3S PRB4M 1 ccatltagctgcgttactglaaaaa~tlgcggcaaclal~caglgaalgccglglglccccca 1 tcattt~gct~cattactgtaaaaalllglggcaaclaltcaglga~lgccgtal~lccccca 61 cctcctccad GG C CA 84.1 64 ctcctccad GG cc 84.1 127 I CA G C 84.2 127 c d4.2 190 A 84.3 1?0 253 A a4.4 253 31 6 04.9 31 6 379 1 34.10 379 B4.8 442 G G 1 TT A 82.1 442 B4.Y 505 A AT A C a2.2 505 1 GA ."_ ~ - "_" 1 568 ... A G TCCC GCCAGTG 565 625 . CG r CG 1 .: ...... lL 631 ... C TCCC nc GCC AGG 662 CCTCCTCAAGGGGGCAGACCACACAGACClCCCCAGGGACAGCCTCCCCAGTAAlCAAGG ' 6')1 GT C G G 1 ...... 754 :CTCCTCAAGGGGI;CAGiCCACCCAGACClGCCCAGGGACAACAGCCTCCCCAGTAATClAG

PRB4S 1 ccatttagctccattaCtglaa~aatltgtggcaactaltcaglgaalgccglatglccccca tU.1 BL.2 1W G C 253 A A 04.11 I A C CC B4.12 326 389 A 1 ~16.13

628 1 GT C G G T ...... 691 CCTCCTCAAGGGGGCAGGCCACCCAGACCCIGGGPATCTAG FIGURE3. the tandemly repetitious third exons. The sequences quences that can be derived for the B1,B2, B3 and have been aligned to allow comparisons of individual B4 repeats in each locus are indicated next in heavy repeats within each allele. Note that three types of boxes and regular type, with onlythe differences from repeated element (B1 , B2 and B3) comprise most of the overallconsensus sequences for each subtype exon 3 from PRBl and PRB2 (Figure 3A) while a being shown. The actual nucleotide sequences from single type of repeated element (B4) comprises most each allele are presented next within the light boxes, of exon 3 from PRB3 and PRB4 (Figure 3B) (MAEDA again with onlydifferences from the repeat consensus et al. 1985). The overall consensus sequencesthat can sequences for the B1, B2, B3 and B4 repeats being be derived for the repeats in the two subtypes of the indicated. PRB loci are indicated in heavy boxes and bold type Comparisons of the DNA consensus sequences for at the tops of the comparisons in Figures 3A and 3B. the tandem repeats derived from each of the PRB loci This sequence is also givenabove the sequence of each (Figure 3) confirm that the PRB genes can be divided allele, for alignment purposes. The consensusse- into two subtypes. PRBl and PRB2 form one subtype 272 K. M. Lyons, J. H. Stein and 0. Smithies and share a repeating array of B 1, B2 and B3 repeats on the assumption that the three possible base substi- in exon 3. The consensus sequences ofthe B 1 andB2 tutions occur with equal probability (2.1: 1). repeats are identical, and the consensus sequences for This result implies near neutrality of amino acid the B3 repeats have only a single difference: the B3 changes. Thus, although there is some evidence for repeat in PRBl contains a 3-bp gap that is not present selection for the production of this group of in the consensus repeat from PRB2. (all gaps are 3 bp in length, and no nonsense substi- PRB3 and PRB4 form a second subtype. The third tutions have been observed), there is no evidence for exons of these genes are composed largely of a single selection of a particular amino acid sequence. In fact, repeat type, B4, and the consensus sequences for the the observation of individuals withdeletions involving B4 repeats from PRB3 and PRB4 differ at only two the PRBl and PRB2 loci (seebelow) suggests that positions. This subdivision is further supported by the there is not strong selection for the production of a observation that the 3’ portions (region “C”)of PRBl particular PRP. Manyof the repeats in each allele and PRB2 contain a gap of 27 bp that is not found in share the samebase substitutions from the repeat PRB3 or PRB4. consensus sequence suggesting that some mechanism Figure 3B shows that the two sequenced alleles of has resulted in the distribution of base substitutions PRB3 differ in the number of B4 repeats that they across more than one repeat unit. contain. PRB3’ (Short) has six copies of this repeat, Generation of allelic length variants: There are while PRB3L (Long) has ten copies. Similarly, each of several mechanisms that could generate changes in the three alleles of PRB4 contains different numbers the numbers of repeats in an array. Intergenic unequal of the B4-type repeat: PRB4’ (Short) has six copies, crossovers could produce not only alterations in the PRB4M(Medium) has seven copies,and PRB4L (Long) number of repeats within an allele, but would also has nine copies. The DNA sequences from the two change the number of loci on the chromosome. Gene alleles of PRBl presented in Figure 3A demonstrate conversion events involving unequal numbers of re- that these alleles differ in length by three repeats in peats on the interacting strands, or intragenic unequal such a way that the arrayof B1, B2 and B3 repeats is crossovers, would not change the number of loci on maintained. Although nucleotide sequences were not the chromosome. determined for the three alleles of PRB2 cloned in We can rule out single homologous but unequal this study, their restriction maps indicate that these exchanges between different loci as the mechanism alleles also vary lengthin by multiples of three repeats: responsible for thegeneration of the seven sequenced PRB2’ (Short) appears to be three repeats (3 X 63 bp length variants, since these variants were not associ- z 180 bp) shorter than PRB2L (Long) while PRB2vL ated with changes inthe intensity of the bands corre- (Very Long) appears to be 15 repeats longer than sponding to any other PRP loci. PRB2’ (15 X 63 bp 900 bp). Thus the evidence A pairwise comparison of all repeats permits infer- suggests that alleles ofPRBl and PRB2, with the three- ence of the types of events that generated the se- repeat array, differ in length only by multiplesof quenced length variants. The analysis was performed three repeats. using the COMPARE program (DEVEREUX,HABERLI Analysis of base substitutions: Inspection of the and SMITHIES1984). Gapswere introduced where individualsequences presented in Figure 3 reveals necessary to maximize alignments. All gapswere that most repeats within a locus varyfrom the consen- found to be 3 bp in length, and were scored as single sus B1, B2, B3 or B4 repeat for that locus by fewer differences. than five base substitutions. This sequence conserva- Different mechanisms for the generation of length tion might reflect selection for the maintenance of a variants predict different outcomes from the pairwise strictly repeated amino acid sequence and/or homog- comparisons. For example, an exchange of repeats enization of the sequences due to repeated unequal between different loci resulting from a gene conver- exchange. sion event could bedetected by the existence of fewer The potential contribution of selectionfor a partic- differences between repeats from a different locus ular amino acid sequence was investigated by compar- (the donor locus) than between repeats from an allele ing the DNA sequence of each repeat to the repeat of the same locus (the recipient locus). On the other consensus sequencethat it most resembles.The num- hand, unequal exchange betweenalleles would be ber of changes that result in amino acid indicated by the existence of fewer differences be- replacements and the number that are silent changes tween repeats from alleles atthe samelocus than were determined for each repeat in PRBI‘, PRB2L, between repeats from different loci. PRB3L,and PRB4L (in total, 145 replacement substi- The pairwise comparisons indicate that in all the tutions and 60 silent substitutions). This comparison sequences examined there are as few or fewer differ- reveals that theobserved ratio of replacement to silent ences between the repeats from alleles than between substitutions (2.4: 1) is nearly identicalto that expected repeats from different loci. For example, no evidence Length Polymorphisms Length in PRP Genes 273

0 A oc A cc A PRBlL

A'O AA 0 OT FIGURE 4."Specific models for PRBIS -.(81.1=(B2.1~B3.1~Bl.2)(B2.2;63.2~~~~~ the generation of allelic length var- iants at the PRB loci. Each open ar- rowrepresents one repeat. The nameswithin each arrow are the sameas those used to identify the repeats in Figure 3. Crossover zones are indicated byvertical linessur- rounding a cross, andnucleotide dif- ferences between allelic variants are identified above the repeat in which they occur. The exact locationsof these base differences can be deter- mined by reference to Figure 3.

was found forthe introduction of B1, B2,or B3 mize the number of additional base substitutions that repeats from PRBl or PRBP into PRB3 or PRB4. were also needed to account for the data. Similarly, no B4 repeats were found in any alleles of The most simple explanation for the generation of PRBl or PRBB. Thus, there is no evidence for the eachof the allelic length variants is that it is the occurrence of gene conversions between PRP genes product of a single homologousbut unequal crossover of different subtypes. event between alleles or sister chromatids. The gen- In the comparisons made between alleles of PRBl eration of PRB4M, for example, can be entirely ac- and PRB2, and between alleles of PRB3 and PRB4, counted for by a single unequal crossover between some of the repeats from the two loci were found to PRB4Land PRB4S. For the remaining pairs of alleles, have as few differences as the repeats from within a base substitutions must be assumed in order to fully single locus. Such sequence similarities between the explain the generation of one allelic length variant loci within eachsubtype may reflect either their recent from another. Thus, in order to fully account for the divergence from an ancestral gene or recent gene generation of PRB3' from PRB3L,one crossover and conversions, which canunder favorable circumstances three nucleotide substitutions were required. Simi- be distinguished (see, for example, SLICHTOMet al. larly, the generation of PRBls from PRBIM requires 1980). The sequence differences between the mem- one crossover and six nucleotide substitutions. It is, bers of the PRB subtypes are, however, so few that of course, possible that someof the unsequenced this distinction is not possible with the present data length variants would require fewer nucleotide sub- for either the PRBl and PRB2 loci or the PRB3 and stitutions in order to account for the generation of PRBI loci. Short gene conversions between the mem- the sequenced length variants. In spite of this uncer- bers of these two pairs of locithat do not result in the tainty, the models postulating unequal crossovers be- generation of length variation may have occurred. tween allelesor sister chromatids require fewer events The pairwise comparisons facilitated the construc- in order to account for thegeneration of alleliclength tion of specific models able to account for the gener- variants than do any other models we tested, including ation of each of the variants studied at the PRB loci models involving unequal gene conversions (or un- (Figure 4). The modelswere constructed so as to equal double crossovers). minimize the number of crossover events required. Unequal crossovers between loci: Examination of Attempting to account for eachof the sequenced genomic DNAs by blothybridization revealed that allelesas being derived from events involving the several individuals had bandscorresponding to PRBl other sequenced allele(s) is clearly artificial, because of greater or less than normal intensity. In one indi- the true histories are likely to involve alleles that we vidual, the EcoRI fragment containing the PRBl gene have not studied. Nevertheless, the models allow us was completely absent and was not replaced by any to delineate the types of events that may haveoccurred. alternate fragment (Figure 5A, lane 3). This individ- In constructing the models, the locations of the pos- ual appears to behomozygous fora chromosome tulated crossover events were assigned so as to mini- carrying a deletion involving the PRBl locus. We have 274 K. M. Lyons, J. H.and Stein 0. Smithies also observed two individualsin our sample of44 with A B C a band corresponding to PRBl which appears to be 1234 1234 1234 half as intense as the band found in most individuals, suggesting that they are heterozygous for a deletion- bearing chromosome (data not shown). Three individ- 14 uals had a band of greater than normal intensity at PREI the PRBl locus. The 6.2-kb EcoRI fragment in lane 2 of Figure 5A, for example, has a greater intensity than the fragments associated with in lanes 1 PRBl PR83 and 4, suggesting that this individual carries a dupli- PRE2 cationinvolving PRBl. In addition, one individual with three different length variants of PRBl was de- PRBI tected in our sample (data not shown). These obser- vations showthat duplications and deletions involving ECO RI Hlndlll Barn H I PRBl are relatively common in the studied popula- FIGURE5.-Southern blot hybridizations of restriction digests tion. of genomic DNA from four individuals (lanes 1 to 4). Individual 3 has a deletion detected by the Hinfl 980 bp probe. Individual 2 In order toinvestigate the natureof a chromosome carriesa band of increased intensity. Individuals 1 and 4 are normal. carrying a presumptive deletion, genomicDNA The locations of the missing bands in individual 3 and the bands of (kindly provided by P. O'CONNELL)from the individ- increased intensity in individual 2 are indicated respectively by dots ual homozygous for this rearrangement was digested and squares. Their sizes are indicated in kb. with three different restriction enzymes and analyzed by agarose gel electrophoresis (lane 3, Figure 5, A- A PRB2 :iYY C). The Southern blot was hybridized to the 980 bp 3- Hinff probe. With all three enzymes a band corre- B PIE1 ,QX tt tB sponding to either PRBI or PRB2 is absent. These ! I ! fl XH data and the results of digestion with several other restriction enzymes (data not shown) permit the deri- 1 vation of a map of restriction sites for the PRP gene c PRB2Il hl*on : i i i"W region on this variant chromosome and a description of its history. t ?! t Comparison of the maps of the normal and FIGURE6.-Formation of a PRB fusion gene. A, Map of the PRBl PRB2 gene region; B, map of thePRBl gene region; C, map of the PRB2 regions (Figure 6, A and B, respectively) shows PRB2/1 fusion gene region. Only sites pertinent to .mapping with that although the positions of manyrestriction enzyme the Hinfl 980-bp probe are shown. In C, sites indicated above the sites are identical in the two regions, several restriction line are identical to those in PRB2; sites below the line are identical sites or fragment lengths are unique to either the to those in PRBl. The potential crossover zone generating the or the gene region. Comparison of the fusion gene is indicated by vertical lines surrounding a cross. The PRBl PRBP open boxes represent the tandemly repetitious third exon. Restric- restriction map of the fusion gene (reminiscent ofthe tion enzymes are B, BamHI; Bs, BstEII; E, EcoRI; H, HindIII; S, Lepore 6-8gene) on thedeletion chromosome (Figure MI; X, XbaI. 6C) with the maps of the normal PRBl and PRB2 loci shows that the restriction map of the fusion gene PRBl (in the EcoRI digest) or PRB2 (in the HindIII region is identical to that of the PRB2 gene region 5' or BamHI digests)is of greater than normal intensity. to and including the tandemly repetitious third exon. These results are consistent with the presence of a The restriction maps of the regions 3' to exon 3 in PRBlIPRB2 fusion gene, which migrates with PRBl the fusion gene andthe PRBI gene are identical. in the EcoRI digest and with PRB2 in the BamHI and These results are consistent withthe generation of the HindIII digests, in addition to the normal PRBl and variant chromosome via a homologous but unequal PRB2 loci on this variant chromosome. Thus, this crossover event between PRBl and PRB2 as illustrated individual is carrying a duplication comparable to that in Figure 6, with the region of crossover being local- which would be formed as the reciprocal of the event ized to a 700-bp region defined by the BstEII and leading to the deletion illustrated in Figure 6 [remi- EcoRI sites in the repetitious third exon of PRBl niscent of the anti-Lepore B-S gene (Hemoglobin Mi- (assuming that there are nounrecognized restriction yada)l. site polymorphismsthat alter these sites). DISCUSSION Genomic DNA from the individual havingan EcoRI fragment associated withPRBI of greater than normal Previous cDNAsequence studies ofthe human PRP intensity (lane 2 in Figure 5A) was also digested with multigene family (MAEDAet al. 1985) demonstrated BamHI and HindIII (lane 2 in Figure 5, A-C). With that the six loci could be classified into two groups all three enzymes, the band corresponding to either [the HaeIII-type (PRH)and theBstNI-type (PRB)loci] Length Polymorphisms Length in PRP Genes 275 based on the sequences of the tandem repeats com- mologous. No structural features, such as Chi sites prising thethird exons of these genes. The and simple sequence DNA, are evident that might present study demonstrates that the four PRB genes predispose the PRB loci to recombine. However, the can be further subdivided; PRBl and PRB2 form one organization of the third exons of the PRH loci sug- subtype, and PRB? and PRB4 form a second subtype. gests two features that might explain the infrequent This observation suggests that sequences have been generation of length variants at these loci compared exchanged recently between the members of each of to thePRB loci. First, the third exons of the two PRH these two subclassesand/or that themembers of each genes contain only five tandem repeats, whereas the subclass share a recent common ancestor. number of repeats in the PRB genes is higher (range Although the data prsented here do not allow pre- 8-1 3, median 10-1 l),allowing a greater opportunity cise determination of the evolutionary relationships for unequal pairing. Secondly, the five individual re- among the four PRB loci, they do indicate that re- peats in the PRH loci exhibit substantial length varia- peated unequal crossing over between alleles(or sister tion, being, respectively, 48, 51, 51, 63 and 63 bp chromatids) has played a significant role in the evo- long. These length differences are very likely to affect lution of the loci. Inspection of the sequences pre- adversely the stability of misaligned DNA strands. In sented in Figure 3 reveals that the same base differ- contrast, the PRB repeats are in general all the same ence relative to therepeat consensus sequence is often length. Consequently, we suggest that the length dif- found in several repeats within a given allele. This ferences among therepeats found in PRH loci account observation is consistent withSMITH'S (1 974)proposal for the differences in the frequency of generation of that base substitutions spread to multiple copies of a length variants between the PRH and PRB loci. We tandem repeat as a consequence of repeated intra- cannot, however,rigorously exclude the possibility genic unequal exchange. Evidence supporting a pre- that the number of repeats in the protein products dominant role for repeated intragenic exchanges encoded by the PRH lociis more constrained as a rather than events involving allelesat different loci in result of some criticalstructure needed for their phys- the evolution of the PRB genes is our observation that iological function. the repeat consensus sequences derived for each of We identified intergenic unequal crossover events the PRB loci are unique. The consensus B3 repeat involving the PRBl and PRB2 loci, but no other PRP sequences from PRBl and PRB2 differ, as do the loci. The observation of both deletions and duplica- consensus B4 repeats from PRB? and PRB4. tions involving these loci suggeststhat such intergenic Our study demonstrates that length variants are events are not rare, since it is extremely unlikely that common at the PRB loci. We have identified 20 inser- the deleted and duplicated chromosomes arose during tion/deletion polymorphisms at these loci.Pairwise a single meiosis and were both found in our studied comparisons indicate that themost simpleexplanation population. One factor that may influence the relative for the generation of these length differences is via frequencies of unequal recombination between loci is homologous but unequal intragenic exchange. Al- the chromosomal organization of the PRP multigene though several different examples of length variants family. Thus, PRBl and PRBB are physically closer were detected at each of the fourPRB loci, none were (70 kb) to each other than are any of the other loci found in this study at either of the PRH loci in spite (H.4. KIM, unpublished data). PRB3 and PRB4, on of the fact that the third exons of PRHl and PRH2 the other hand, are separated from each other by are composedmainly of tandem repeats (KIM and approximately 350 kb and one of the PRH lociis MAEDA 1986). However, AZENet al. (1987) found located between them (H.-S. KIM, unpublished data). that one of the three common alleles at the PRHl Unequal crossover between the PRB3 and PRB4 loci locus is one repeat longer than the other two alleles. would therefore lead to the loss (or gain) of two loci Thus, length variants appear to arisemuch more and a total of 350 kb of DNA. This factor may be readily at thePRB loci than the PRH loci. partly or even completely responsible for the obser- The relative paucity of allelic length variants at the vation that unequal crossing over between PRB? and PRH loci could be due either to structural features PRB4 has not been detected by us or by others that interfere with unequal pairing, and/or to selec- (O'CONNELLet al. 1987). tion against size variants. The observation that the Studies on homology requirements for recombina- one variant PRHl allele is fairly common in Cauca- tion in bacteria (WATTet al. 1985; SHENand HUANG sians suggeststhat selection against size variantsis not 1986) have demonstrated that efficient homologous likely to be the major cause for this difference. Ac- recombination requires an uninterrupted stretch of at cordingly we examined the sequences of PRH and least 25-50 bp of perfect homology, and that even a PRB genes to see if we could find some features that singlemismatch dramatically lowers the efficiency. might account for thedifference. Experiments in mammalian cells (RUBNITZand SUB- The consensus sequencesfor PRH and PRB repeats RAMANI 1984; AYARESet al. 1986; LISKAY,LETSOU (KIM and MAEDA 1986) are approximately 80% ho- and STACHELEK1987) have alsodemonstrated a strict 276 K. M. Lyons, J. H. Stein and 0. Smithies homology requirement for homologous recombina- KIM,H.-S., and N. MAEDA,1986 Structures oftwo HaeIII-type tion. genes in the human salivary proline-rich multigene family. J. Taken as a whole, our evidence suggests that syn- Biol. Chem. 261: 67 12-67 18. KLEIN, H. L., 1984 Lack ofassociation between intrachromosomal apsis formation during unequal pairing within the gene conversion and reciprocal exchange. Nature 310 748- repetitious regions of the PRB loci is very sensitive to 753. sequence differences among repeats. Thus, the strict KLEIN, H. L., and T. D. PETES,1981 Intrachromosomal gene maintenance of the Bl-B2-B3 array in alleles ofPRBl conversion in yeast. Nature 289 144-148. and PRB2 can be explained as the consequence ofthe LISKAY,R. M., A. LETSOUand J. L. STACHELEK,1987 Homology requirement for efficient gene conversion between duplicated inability of the B3 repeats, which contain 3 bp gaps chromosomal sequences in mammalian cells. Genetics 115: relative to the Bl and B2 repeats, to pair with the B1 161-167. or B2 repeats. The greater length variations of the LOENEN,W. A.M., and F. R. BLATTNER,1983 Lambda Charon repeats within alleles of PRH loci could account for vectors (Ch 32, 33, 34 and 35) adapted for DNA cloning in the observation of only a single length variant at a recombination-deficient hosts. Gene 26 17 1-1 79. LYONS,K. M., E. A. AZEN,P. A. GOODMANand 0. SMITHIES, PRH locus. Our results thus provide further evidence 1988 Many protein products from a few loci: assignment of that homologous recombination is very sensitive to human salivary proline-rich proteins to specific loci. Genetics differences in the amounts of homology between in- 120 255-265. teracting strands. MAEDA,N., 1985 Inheritance of the human salivary proline-rich proteins: a reinterpretation in terms of six loci forming two We thank PETERO'CONNELL for providing several human ge- subfamilies. Biochem. Genet. 23: 455-464. nomic DNAs, and HYUNG-SUKKIM for permission to quote unpub- MAEDA, N., H.-S.KIM, E. A. AZEN and 0. SMITHIES, lished data. We also thank PATRICIAA. POWERS,PHILIP HOWLES 1985 Differential splicing and post-translational cleavages in andJAMES F. CROWfor helpful discussions on the manuscript. This the human salivary proline-rich protein gene system. J. Biol. work was supported by National Institutes of Health grant Chem. 260 11 123-1 1130. GM20069. MAMULA,P. W., N. A. HEEREMA,C. G. PALMER,K. M. LYONSand R. C. KARN,1985 Localization of the human salivary protein LITERATURECITED complex (SPC) to chromosome band 12~13.2.Cytogenet. Cell Genet. 39 279-284. ALLISON,L. A., M. MOYLE,M. SHALESand C. J. INCLFS, MANIATIS, T., E. F. FRITSCHand J. SAMBROOK,1982 Molecular 1985 Extensive homology among the largest subunits of eu- Cloning (A Laboratory Manual). Cold Spring Harbor Labora- karyotic and prokaryotic RNA polymerases. Cell 42: 559-610. tory, Cold Spring Harbor, New York. AMSTUTZ,H., P. MUNZ, W.-D. HEYER,U. LEUWLD and J.KOHLI, MAXAM,A. M., and W. GILBERT,1977 A new method for se- 1985 Concerted evolution of tRNA genes: intergenic conver- quencing DNA. Proc. Natl. Acad. Sci. USA 74 560-654. sion among three unlinked serine tRNA genes in S. pombe. Cell MUSKAVITCH,M. A. T., and D. S. HOGNEE., 1982 An expandable 40: 879-886. gene that encodes a Drosophila glue protein is not expressed AYARES,D., L. CHERKURI,K.-Y. SONGand R. KUCHERLAPATI, in variants lacking remote upstream sequences. Cell 29: 104 1- 1986 Homology requirementsfor intermolecular recombi- 1051. nation in mammalian cells. Proc. Natl. Acad. Sci.USA 83: O'CONNELL,P., G. M. LATHROP,M. LAW,M. LEPPERT,Y. NAKA- 5199-5203. MURA, M. HOFF, E. KUMLIN,W. THOMAS,T. ELSNER,L. AZEN, E. A., 1988 Genetic protein polymorphisms of human BALLARD,P. GOODMAN,E. AZEN,J. E. SADLER,G. Y. LAI, J.- saliva. In: ClinicalChemistry of HumanSaliva, Edited by J. M. LALOUELand R. WHITE,1987 A primary genetic linkage TENOVUO,CRC Press, Inc., Cleveland, Ohio (in press). map for human . Genomics 1 (in press). AZEN,E. A., P. GOODMAN,S. FLYNN,N. MAEDAand H.4. KIM, NAGYLAKI,T., and T. D.PETES, 1982 Intrachromosomal gene 1987 Alleles at thePRHl locus coding for the human salivary conversion and themaintenance of among acidic proline-rich proteins (PRPS) Pa, Db, and PIF. Am. J. repeated genes. Genetics I00 315-337. Hum. Genet. 41: 1035-1047. PETES,T. D., 1980 Unequal meiotic recombination within tandem AZEN,E., K. M. LYONS,T. MCGONIGAL,N. L. BARRETT,L. S. arrays of yeast ribosomal DNA genes. Cell 19 765-774. CLEMENTS,N. MAEDA,E. F. VAVIN,D. M. CARLSON,and 0. PONCZ,M., D. SOLOWIEJCZYK,B. HARPEL, Y. MORY,E. SCHWARTZ SMITHIES,1984 Clones from the human gene complex coding and S. SURREY,1982 Construction of human gene libraries for salivary proline-rich proteins. Proc. Natl. Acad. Sci. USA from small amounts of peripheral blood: analysisof @-like 81: 5561-5565. globin genes. Hemoglobin 6 27-36. DEVEREUX, J.,P. HABERLIand 0. SMITHIES,1984 A comprehen- RoEDER,G.S.,M.SMITH~~~E.J.LAMBIE,1984 Intrachromosomal sive set of sequence analysis programs for the VAX. Nucleic movement of genetically marked Saccharomyces cerevisiae tran- Acids Res. 12: 387-395. sposons by gene conversion. Mol. Cell. Biol. 4: 703-71 1. JACKSON, J. A., and G. R. FINK,1981 Gene conversion between RUBNITZ,J., and S. SUBRAMANI,1984 The minimum amount of duplicated genetic elements in yeast. Nature 292: 306-3 1 1. homology required for homologous recombination in monkey JACKSON,J. A., and G. R. FINK, 1985 Meiotic recombination cells. Mol. Cell. Biol. 4 2253-2258. between duplicated genetic elements in Saccharomyces cerevisiae. SCHERER,S., and R. W. DAVIS,1980 Recombination of dispersed Genetics 109 303-332. repeated DNA sequences in yeast. Science 209 1380-1 384. JINKS-ROBERTSON,S. and T. D. PETES, 1985 High-frequency SHEN,P., and H. V. HUANG,1986 Homologous recombination in meiotic gene conversiodn between repeated genes on nonho- Escherichia coli: dependence onsubstrate length and homology. mologous chromosomes in yeast. Proc. Natl. Acad. Sci. USA Genetics 112: 441-457. 82: 3350-3354. SHIN,HA., T. A. BARGIELLO,B. T. CLARK,F. R. JACKSON and M. JINKS-ROBERTSON,S., and T. D. PETFS,1986 Chromosomal trans- W. YOUNG, 1985 An unusual coding sequence from a Dro- locatio~sgenerated by high-frequency meiotic recombination sophila clock gene is conserved in vertebrates. Nature 317: between repeated yeast genes. Genetics 114: 731-752. 445-448. Length Polymorphisms in PRP Genes 277 SLIGHTOM,J. L., A.E. BLECHLand 0.SMITHIES, 1980 Human methyl-paper and rapid hybridization by using dextran sulfate. fetal cy- and *y-globin genes: complete nucleotide sequences Proc. Natl. Acad. Sci. USA 76 3682-3687. suggest that DNA can be exchanged between these duplicated WARREN,G., L. COROTTOand P. WOLBER,1986 Conserved re- genes. Cell 21: 627-638. peats in diverged ice nucleation structural genes from two SMITH,G., 1974 Unequal crossover and the evolution of multi- species of Pseudomonas. Nucleic Acids Res. 14: 8047-8060. gene families. Cold Spring Harbor Symp. Quant. Biol. 38: WATT, V. M., C. J. INGLES,M. S. URDEAand W. J. RUTTER, 507-513. 1985 Homology requirementsfor recombination in Esche- SMITH, G. P., 1976 Evolution of repeated DNA sequences by richia coli. Proc. Natl. Acad. Sci. USA 82: 4768-4772. unequal crossover. Science 191: 528-535. Communicating editor: B. W. GLICKMAN SOUTHERN,E. M., 1975 Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98: 503-5 17. SZOSTAK,J. W., and R. WU, 1980 Unequal crossing over in the APPENDIX ribosomal DNA of Saccharomyces cerevisiae. Nature 284 426- 430. Nucleotidesequences (Figure 7) of the Hinfl VANIN,E. F., P. S. HENTHORN,D. KIOUSSIS,F. GROSVELDand 0. (PRBIS,PRBIM, PRB4’, PRB4M and PRB4L) and SMITHIES,1983 Unexpected relationships between four large Hue111 plus HueIII/Hinfl fragmentscontaining the deletions in the human B-globin . Cell 35: 701- 709. tandemly repetitious third exons. Sequences from the WAHL, G.M., M. STERNand G. R. STARK,1979 Efficient transfer intron proceeding the third exon are in lower case. of large DNA fragments from agarose gels to diazobenzyco- Sequences from the third exon arein upper case.

PRBIS

1 atcttgtgtgggcttttactctgtctttctcttctctctttgtcctctagtMATCCACAAGtiACCATCCCCACAAttiAG 80 81 GCAACAAGCCCCAAttiCCCCCCACCTCCTCCAttiAAAGCCACAA66ACCACCCCCACAA~A66CAACAMCCTCM66T 160 161 CCCCCACCTCCAttiAAAGCCACAAGGACCACCCCCACAAGGGGACMGTCCCGAAGTCCCCGATCTCCTCCA66~CC 240 241 ACAAGGACCACCCCCACAAGtiAGGTAACCAGCCCCAAGGTCCCCCACCTCCTCCA66~GCCACMGtiACCACCCCCAC 320 321 AAGGAGGCAACAGACCTCAAGGTCCCCCACCTCCAGGAAAGCCACAAGGACCACCCCCACAA~~AGACMGTCCCGMGT 400 401 CCCCGATCTCCTCCAAGAAAGCCACAAGGACCACCCCCACAA66AGGTAACCAACCCC~GtiTCCCCCACCTCCTCCAGG 480 481 AAAGCCACAAGGACCACCCCCACAA~~~AGGCAACAAACCTCA~TCCCCCACCTCCA~~AAAGCCACAA~~ACCACCCG560 561 CACAAGGAGGCAGCAAGTCCCAAAGTGCCCGATCTCCTCCAGGAAAGCCACAAG6ACCACCCCAACAAGAA66CAACAAT 640 641 CCTCAAGGTCCCCCACCTCCAGCA66A66CAATCCCCAGCAGCCTCAGGCACCTCCTGCTGGACAGCCCCA~ACCACC 720 721 ACGCCCTCCTCAAGGGGGCAGACCTTCCAGACCTCCCCAGTGACAGCCTCCCCAGTCATCTAGGATTCAATGACAGG 797

PRBIM 1 atcttgtgtgggcttttactctgtctttctcttctctctttgtcctctagtAMTCCACM66ACCATCCCCACAAGGAG 80 81 GCAACAAGCCCCAA66CCCCCCACCTCCTCCAGGAAAGCCACAAGGACCACCCCCACMtibAffiCAACAAACCTCAA66T 160 161 CCCCCACCTCCAGGAAAGCCACAAGGACCACCCCCACAAGGGGACAAGTCCCGAAGTCCCCGATCTCCTCCAttiAMACC 240 241 ACAAGGACCACCCCCACAAGGAtibTAACCAGCCCCAAGtiTCCCCCACCTCCTCCAGG~GCCACAA66ACCACCCCCAC 320 321 AAGGAGGCAACAAACCTCAGtibTCCCCCACCTCCA66AAAGCCACAAGGACCACCCCCACM66AGACMGTCCCAAAGT 100 401 CCCCGATCTCCTCCAGGAAAGCCACAAGGACCACCCCCACAAttiA66TAACCAGCCCCAA66TCCCCCACCTCCTCCA66 480 481 AAAGCCACAAGGACCACCCCAACAAGGAGGCAACAGACCTCAAffiTCCCCCACCTCCA66AMGCCACAA66ACCACCCC 560 561 CACAAGGAGACAAGTCCCGAAGTCCCCAATCTCCTCCA66AAAGCCACAA66ACCACCCCCACAA66A66CAACCAGCCT 660 641 CAAGGTCCCCCACCTCCTCCA66AAAGCCACAAttiACCACCCCCACAA~A66CAACAAACCTCAA66TCCCCCACCTCC 720 721 AGGAAAGCCACAAGGACCACCCGCACAA66AGGCAGCAAGTCCCAMGTGCCCGAGCTCCTCCAffi~GCCACMGtiAC am 801 CACCCCAACAAGAAGGCAACAATCCTCAA66TCCCCCACCTCCAGCAGGA66CAATCCCCAGCAGCCTCAGtiCACCTCCT 880 881 GCT66ACAGCCCCAGGGACCACCACGCCCTCCTCAA66G66CAGACCTTCCAGACCTCCCCAGTGACAGCCTCCCCAGTC 960 961 ATCTAGG 967

PRB3S

1 agtgtccttattctgactcctccttagactgagagcccctcaactgcttccttcttctccagcattccactccagagttc 80 81 tagagcttcactgaaaatgcaaaga~attagtgtctgggtcttatttttgtgc~tttccccatttagctgcgttactgta 160 161 aaaatttg~gg~aa~tatt~agtgaatgccgtgtgtccc~~a~~t~ct~~agtAAA6CCAGMGtiACGACGCCCACAAtib240 241 AGGAAACCAGCCCCAACGTACCTCACCTCCTCCAGGAAAGCCAGAAGGACGACCCCCACAAGGAGGCAACCA6TCCCAAG 320 321 GTCCCCCACCTCGTCCAGGAAAGCCAGAAGGACCACCCCCACAA66AGGAAACCA6lCCCAAGGTCCCCCACCTCGTCCG 400 401 GGAAAGCCAGAAGGACAACCCCCACAAttiAGGAAACCAGTCCCAAGGTCCCCCACCTCGTCCGGGMAGCCAGMGGACC 480 481 ACCCCCACAAGGAGGAAACCAGTCCCAAGtiTCCCCCACCTCGTCC66GAAAGCCAGM66ACCACCCCCACAA66AffiAA 560 561 ACCAGTCCCAAGGTCCCCCACCTCGTCCAGGAAAGCCAGAAGGATCACCTTCACAAGtiAGGCAACAAACCTCAA66TCCC 640 641 CCACCTCATCCAGGAAAGCCACAA66ACCACCCCCACAAGAAGGTAACAAACCTCAACGTCCCCCTCCTCCAAGMGtiCC 720 721 ACAAGGACCACCCCCACCAGGAGGCAATCCCCAGCAGCCTCTGCCACCTCCCGCTGGAAAGCCCCAGGtiACCACCTCCAC 800 801 CTCCTCAAGGGGGCAGACCACACAGACCTCCCCAGGGACAGCCTCCCCAGTAATCAAGGTTCAATGACA66TATGA 876

FIGURE7.-Nucleotide sequences from the third exons ofPRB1, PRBP, PRB3, and PRB4 (figure is continued on p. 278). K. M. Lyons, J. H. Stein and 0. Smithies

PRB3L

1 agtgtccttattctgactcctccttagactgagagcccctcaactgcttccttcttctccagcattccactccagagttc 80 81 tagagcttcactgaaaatgcaaagaaattagtgtctgggtcttatttttgtgcatttccccatttagctgcgttactgta 160 161 aaaatttgcggcaactattcagtgaatgccgtgtgtcccccacctcctcca~AAAGCCAGAAGGACGACGCCCACAA~ 240 261 A~AAACCAGCCCCAACGTACCCCACCTCCTCTAGGAAAGCCAGAA~ACGACCCCCACAAGGAGGCAACCAGTCCCAAG 32U 321 GTCCCCCACCTCGTCCAGGAAAGCCAGAAGGACCACCCCCACAA~AGGAAACCAGTCCCAAGGTCCCCCACCTCGTCCG 400 401 GGAAAGCCAGAAGGACAACCCCCACAAGGAGGAAACCAGTCCCAAGGTCCCCCACCTCGTCC~AAAGCCAGAAGGACC 480 481 ACCCCCACAAGGAGGAAACCAGTCCCAAGGTCCCCCGCCTCGTCCGGGAGAGCCAGAAGGACCACCCCCACAA~AGGAA 560 561 ACCAGTCCCAAGGTCCCCCGCCTCATCCGGGAAAGCCAGAAGGACCACCCCCACAAGGAGGAAACCAGTCCCAA~TCCC610 6*1 CCACCTCG~CCGGGAAAGCCAGAAGGACCACCCCCACAACCACCAAACCAGTCCCAA~~TCCCCCACCTCGTCC~AAA720 721 GCCAGAAGGACCACCCCCACAAGGAGGAAACCAGTCCAAACCACCCC 800 801 CACAAGGAGGAAACCAGTCCCAAGGTCCCCCACCTCGTCCAGGAAAGCCAGAAGGATCACCTTCACAAGGA~CAACAAA 880 881 CCTCGAGGTCCCCCACCTCATCCAGGAAAGCCACAAGGACCACCCCCACAAGAAGGCAACAAACCTCAACGTCCCCCTCC 960 961 TCCAAGAAGGCCACAAGGACCACCCCCACCAGGAGGCAATCCCCAGCAGCCTCTGCCACCTCCCGClGGAAAGCCCCAGG 1040 1041 GACCACCTCCACCTCCTCAAGGGGGCAGACCACACAGACCTCCCCAGGGACAGCCTCCCCAGTAATCAA66TTCAATGAC 1120 1121 AGGTATGA 1128

PRB4S 1 ~tt~ctagactgggcctattctcctccttag8ctg~agcccctc~acttctcccttttccccca~cgttccactccaga80 81 gttctagggcttcactgaaaatgcaaagaaattagtatctgggtctccattttgtgcatttccccatttagctccattac 160 161 tgtaaaaatttgtggcaactattcagtgaatgccgtatgtcccccacctcctcca~AAAGCCAGAAG6ACGACGCCCAC 240 241 AAGGAGGAAACCAGCCCCAACGTCCCCCACClCCTCCA~AAAGCCACAA~ACCACCCCCACAAGGA~AAACCAGTCC 320 321 CAAGGTCCCCCACCTCCTCCAGGAAAGCCAGAAGGACGACCCCCACAA~AGGCAACCAGTCCCAAG6TCCCCCACCTCA 400 401 TCCAGGMAGCCAGAAAGACCACCCCCACAA6GALAC 480 481 GACCACCCCCACAAGGAGGCAACCAGTCCCACCGTCCCCCACCTCCTCCAGGAAAGCCAGAAAGACCACCCCCACAAGGA 560 561 GGTAACCAGTCCCAAGGTCCCCCACCTCATCCAGGAAAGCCAGAAGGACCACCCCCACAGGAAGGAAACAAGTCCCGAAG 640 641 TGCCCGATCTCCTCCAGGAAAGCCACAAGGACCACCCCAACAAGAAGGCAACAAGCCTCAA~TCCCCCACCTCCTGGAA 720 721 AGCCACAAGGCCCACCCCCAGCAGGAGGCAATCCCCIGCAAAGCCCCAGGGGCCACCT 8W 801 CCACCTCCTCAAGGGGGCAGGCCACCCAGACCTGCCCA~GACAACAGCCTCCCCAGTAATCTAG 865

PRB4M 1 cttcctagactgggcctattctcctccttagactgagagcccctcaacttctcccttttcccccagcgttccactccaga 80 81 gttctagg~cttcactgaaaatgcaaagaaattagtatctgggtctcatttttgtgcatttccccatttagctccattac 160 161 tgtaaaaatttgtggcaactattcagtgaatgccgtatgtcccccacctcctcca~AAAGCCAGAA~ACGACGCCCAC 240 241 AAGGAGGAAACCAGCCCCAACGTCCCCCACCTCCTCCAGGAAAGCCACAAGGACCACCCCCACAAGGAG6AAACCAGTCC 320 321 CAAGGTCCCCCACCTCCTCCAGGAAAGCCAGAAGGACGACCCCCACAAGGAGGCAACCAGTCCCAAGGTCCCCCACCTCA 400 401 TCCAGGAAAGCCAGAAAGACCACCCCCACAAGGAGGAAACCAGTCCCAAGGTACCCCACCTCCTCCAGG~AGCCAG~G 480 481 GACGACCCCCACAAGGAGGCAACCACTCCCAA~TCCCCCACCTCATCCA~AAAGCCAGAAAGACCACCCCCACAAGGA 560 561 GGCAACCAGTCCCACCGTCCCCCACCTCCTCCAGGAAAGCCAGAAAGACCACCCCCACAA~AGGCAACCAGTCCCAAGG 640 641 TCCCCCACCTCATCCAGGAAAGCCAGAAGGACCACCCCCACAGGAAGGAAACAAGTCCCGAAGTGCCCGATCTCCTCCAG 720 721 GAAAGCCACAAGGACCACCCCAACAAGAAGGCAACAAGCCTCAAGGTCCCCCACCTCCTGGAAAGCCACAA~CCCACCC 800 801 CCACCAGGAGCCAATCCCCAGCAGCCTCACCTtCTG 880 881 CAGGCCACCCAGACCTGCCCAGGGACAACAGCCTCCCCAGTAATCTAG 928

PRB4L 1 cttcctbgactgsgcctattctcctcctta~actgagagcctctcaacttctcccttttcccccbgcgttccactccaga 80 81 9ttctagggcttcactgaaaatgcaaagaaattagtatctgggtctcatttttgtgcatttcccca~tagctccattac 160 161 tgtaaaaatttgtggcaactattcagtgaatgccgtatgtcccccacctcctccagGAAAGCCACAAGGACGACGCCCAC 240 241 AAGGAGGAAACCAGCCCCAACGTCCCCCACCTCCTCCAGGAAAGCCACAAGGACCACCCCCACAAGGA~AAACCAGTCC 320 321 CAAGGTCCCCCACCTCCTCCAGGAAAGCCAGAAGGACGACCCCCACAAGGAGGCAACCAGTCCCAAGGTCCCCCACCTCA 400 401 TCCAGGAAAGCCAGAAAGACCACCCCCACAAGGAGGAAACCAGTCCCAAGGTCCCCCACCT~ArCCA~AAAGCCAGAAA 480 481 GCCGACCCCCACAAGGAGGACACCAGTCCCAAGGTCCCCCACCTACTCCAGGAAAGCC~AAGGACCACCCCCACAA~A 560 561 GGAAACCAGTCCCAAGGTACCCCACCTCCTCCAGGAAAGCCAGAAGGACGAC~CCCA~AAGGAGGCAACCAGTCCCAAGG 640 641 TCCCCCACCTCATCCAGGAAAGCCAGAAAGACCACCCCCACAAGGAG6CAACCAGTCCCACCGTCCCCCACCTCCTCCAG 720 721 GAAAGCCAG~AAGACCACCCCCACAAGGAGGCAACCAGTCCCAAGGTCCCCCACCTCATCCAGGAAAGCCAGAAGGACCA 800 801 CCCCCACAGGAAGGAAACAAGTCCCGAAGlGCCCGATCTCCTCCAGGAAAGCCACAA~ACCACCCCAACAAG~~CAA 880 881 CAAGCCTCAAGGTCCCCCACCTCCTGGAAAGCCACAAGGCCCACCCCCACCAGGAGGCAATCCCCAGCAGCCTCA~CAC 960 961 CTCCTGCTGGAAAGCCCCAGGGGCCACCTCCACCTCCTCAAGGGGGCAGGCCACCCAGACCT6CCCA~GACAACAGCCT 1040 1041 CCCCAGTAATCTAG 1os4 FIG. 7-(Continued)