
The EMBO Journal, Vol. 1 No. 2, pp.231-236, 1982

Human papillomavirus 1a complete DNA sequence: a novel type of genome organization among Papovaviridae

O.Danos*, M.Katinka, and M.Yaniv

Unité des Virus oncogènes, Département de Biologie moléculaire, Institut Pasteur, 25, rue du Dr. Roux, 75724 Paris Cedex 15, France

Communicated by M. Yaniv
Received on 29 January 1982

*To whom reprint requests should be sent.
IRL Press Limited, Oxford, England. 0261-4189/82/0102-0231$2.00/0

The complete sequence of human papillomavirus type 1a (7811 nucleotides) has been established. The overall organization of the viral genome is different from that of other related papovaviruses (SV40, BKV, polyoma). Firstly, genetic information seems to be coded by one strand. Secondly, no significant homology is found with SV40 or polyoma coding sequence for either DNA or deduced protein sequences. The relatedness of human and bovine papillomaviruses is revealed by a conserved coding sequence in the two species. Two regions can be defined on the viral genome: the putative early region contains two large open reading frames of 1446 and 966 nucleotides, together with several split ones, and corresponds to the transforming part of the bovine papillomavirus type 1 genome, and the remaining sequences, which include two open reading frames likely to encode structural polypeptide(s). The DNA sequence is analysed and putative signals for regulation of gene expression, and homologies with the Alu family of human ubiquitous repeats and the SV40 72-bp repeat are outlined.

Key words: Alu family/dideoxy sequencing/M13 shotgun cloning/papillomaviruses evolution

Introduction

Papillomaviruses infect many higher vertebrates, including man. They are very host- and tissue-specific and produce benign tumors (warts) of the epithelium they colonize. These tumors generally regress but may, in some cases (Shope papillomavirus, bovine alimentary tract papillomavirus, and human papillomavirus type 5), undergo malignant transformation (for review, see Orth et al., 1977, 1980; Zur Hausen, 1980). Moreover, the [...] of these viruses appears to be closely linked to the state of differentiation of epithelial cells (for review, see Orth et al., 1977).

Papillomaviruses and polyomaviruses (SV40, BKV, polyoma) have been classified morphologically (Wildy, 1971) as genera of the Papovaviridae since they share a common icosahedral capsid structure in which a double-stranded superhelical DNA molecule is found associated with histones (Fareed and Davoli, 1977; Favre et al., 1977a). This DNA is ~8000 bp long in the papillomaviruses compared with 5200-5300 bp in the polyomaviruses. Although a great deal of information is now available about genome organization and molecular biology of polyomaviruses (Soeda et al., 1980a), such data concerning papillomaviruses are still scarce, due mainly to the lack of in vitro cell culture systems in which they can be reproducibly propagated.

One reason for a renewed interest in papillomaviruses is the observation that their genome is propagated as a multicopy episome in the nucleus of non-producing cells, without apparent integration in the host genome (Law et al., 1981; Lancaster, 1981; Moar et al., 1981a, 1981b). Therefore, they may be used as efficient eucaryotic cloning vectors (Sarver et al., 1981).

We report here the complete nucleotide sequence of the human papillomavirus type 1 subtype a (HPV1a) DNA (Coggin and Zur Hausen, 1979). This virus is the causative agent of human deep palmo-plantar warts, and the prototype for one of the nine different types of human papillomaviruses characterized so far (Orth et al., 1980; Kremsdorf et al., in preparation). We show that, in contrast to SV40 or polyoma, the open coding frames are located on only one strand. The region coding for the viral structural proteins is identified and mapped relative to the bovine papillomavirus type 1 (BPV1) genome.

Results

Determination and general features of the nucleotide sequence of HPV1a

We have reported the cloning in Escherichia coli of the entire HPV1a genome, and its characterization (Danos et al., 1980). This homogeneous source of viral DNA has been used to prepare subgenomic fragments which have been cloned in the replicative form of M13 sequencing vectors mWJ43 or mp7 (Rothstein and Wu, 1981; Messing et al., 1981), and sequenced according to the dideoxy chain termination procedure (Sanger et al., 1977).

The HPV1a sequence is presented in Figure 1. It is 7811 nucleotides long, and begins with the T of the single BamHI site (GGATCC) chosen as origin of the physical map (Favre et al., 1977b). The size agrees well with the electron microscopic measurements (O.Croissant, personal communication) and the estimated mol. wt. of 5 x 10^6 daltons (Crawford, 1969). The G + C content is 40.3%, resembling the 40% found for the [...] (Chargaff and Davidson, 1955). The G + C content does not appear to be homogeneously distributed on the viral genome, but is organized in clusters. Dinucleotide frequency analysis reveals a rare occurrence of CG (2.0%): for SV40 and polyoma the values are even lower (0.6% and 1.8%, respectively).

Figures 2a and c show the distribution of termination codons on both strands. Only one strand has significantly large open reading frames, likely to code for proteins, and will be referred to as the sense strand. No open reading frame larger than 342 nucleotides is found on the other, anti-sense, strand. We have examined the genuineness of the coding capacities of the open reading frames on the sense strand, using the procedure developed by Shepherd (1981). In this type of analysis, the relative distance in terms of number of point mutations from an ideal coding sequence, i.e., (Pu-X-Py)n, is calculated for a group of n codons in the three reading frames. The diagram in Figure 2b shows the result of this analysis which indicates, in any particular region, the frame which is most closely related to the ideal coding sequence. These regions correlate very well with the coding regions outlined in Figure 2a.
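The two reading-frame analyses described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used for the paper. The function names are hypothetical. The group size of 10 codons and the window of 210 nucleotides are taken from the legend of Figure 2. The Pu-X-Py score simply counts first-position non-purines and third-position non-pyrimidines, which is one plausible reading of the "distance from an ideal coding sequence". Ties between frames are resolved in favour of the lowest frame index.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
PURINES = set("AG")
PYRIMIDINES = set("CT")


def stop_codon_bars(seq, frame, group=10):
    """Return one boolean per group of `group` codons read in `frame` (0, 1, 2).

    True means the group holds at least one termination codon, i.e. a
    vertical bar would be drawn at that position in the diagram.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return [
        any(codon in STOP_CODONS for codon in codons[i:i + group])
        for i in range(0, len(codons), group)
    ]


def rny_mismatch_fraction(seq, start, n_codons):
    """Fraction of checked positions that depart from the Pu-X-Py pattern."""
    mismatches = 0
    counted = 0
    for k in range(n_codons):
        i = start + 3 * k
        if i + 3 > len(seq):
            break  # incomplete codon at the end of the sequence
        mismatches += seq[i] not in PURINES          # first base: purine expected
        mismatches += seq[i + 2] not in PYRIMIDINES  # third base: pyrimidine expected
        counted += 1
    return mismatches / (2 * counted) if counted else 1.0


def best_frames(seq, window=210):
    """For each full window, return the frame closest to (Pu-X-Py)n."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, window):
        scores = [
            rny_mismatch_fraction(seq, start + frame, window // 3)
            for frame in range(3)
        ]
        result.append(scores.index(min(scores)))
    return result
```

Because the window length is a multiple of three, the frame index returned for each window refers to the same genomic frame throughout. Scores are expressed as fractions so that a frame losing its last codon at the end of the sequence is not favoured.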

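The base-composition figures quoted in this section (G + C content and the frequency of the CG dinucleotide) are simple counts. The following sketch shows one way to obtain them; it assumes that the dinucleotide frequency is expressed as a fraction of all dinucleotides and that the genome is treated as circular, neither of which is stated in the text. The function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G or C residues in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def dinucleotide_frequency(seq, dinucleotide, circular=True):
    """Fraction of overlapping dinucleotides equal to `dinucleotide`.

    With circular=True the last base is paired with the first one, so a
    genome of N bases contributes N dinucleotides.
    """
    seq = seq.upper()
    extended = seq + seq[0] if circular else seq
    total = len(extended) - 1
    hits = sum(extended[i:i + 2] == dinucleotide for i in range(total))
    return hits / total
```

For a genome with equal C and G frequencies f, the value expected for CG under independence is f squared, which gives a baseline against which the observed frequency can be judged.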
[Figure 1: nucleotide sequence listing of HPV1a DNA, numbered every 80 nucleotides up to position 7811.]

Fig. 1. Nucleotide sequence of HPV1a DNA. The sequence is 7811 nucleotides long and only the sense strand is shown here, 5' to 3'. It begins with the T of the single BamHI site (GGATCC) chosen as origin of the physical map. Two different M13 shotgun cloning strategies were used. First, fragments generated by frequently-cutting restriction enzymes such as Sau3A, AluI or HaeIII were randomly cloned and sequenced. This approach turned out to be biased because of the highly favored integration of shorter restriction fragments in the M13 vectors. The other method was to clone in M13 mp7 random fragments obtained after sonication, size fractionation, and end-repair of viral DNA (Deininger, in preparation). This allowed us to obtain a set of recombinants covering most of the genome (85%). Gaps remaining in the sequence were filled either using purified internal primers or by end labelling of specific restriction fragments and chemical degradation. Random sequences were introduced into a computer and ordered using the DBUTIL program (Staden, 1980). Over 70% of the sequence has been determined on both strands, 20% was sequenced several times in independent clones and only 10% of the sequence comes from single clones, each of which has been systematically cross-checked. We are confident that any remaining errors cannot alter the deduced genomic organization presented here. An ambiguity about the number of T residues at position 7096 remained. Two separate dideoxy-sequenced M13 clones showed once 3 and once 4 thymines, whereas 3 adenines were revealed by chemical degradation on the other strand. Sequence analysis of this region by Cald et al. (1982) also presented three adenines at this position. The three T version was adopted throughout this paper.

Recent studies by DNA cross-hybridization have shown partial homology between HPV1a and BPV1 genomes (Law et al., 1979). Heteroduplex analysis (Croissant et al., 1982) permits the alignment of the physical and functional maps of both viruses (Figure 2). These observations are further confirmed by comparison of the HPV1a sequence with the short

[Figure 2: schematic BPV1 map with BamHI, HindIII and EcoRI sites, above panels (a), (b) and (c) drawn along the HPV1a genome; reading frames are numbered 1 to 3 and the HPV1a EcoRI sites are marked.]

Fig. 2. Distribution of the termination codons of the HPV1a genome. Diagrams (a) and (c) show the distribution of termination codons on sense and anti-sense strands respectively. From left to right, the sequence is read 5' to 3'. Groups of 10 codons are observed in each of the three reading frames, and a vertical bar is drawn when at least one termination codon occurs among them. In (a), dotted vertical bars indicate the first ATG(s) present in the corresponding open coding frame. The two EcoRI sites at positions 4237 and 5240 are shown to facilitate the orientation of the diagram relative to the physical map. Diagram (b) shows, for successive groups of 210 nucleotides, the frame which is the most likely to be coding, as a result of the analysis of the Pu/Py arrangement of the sequence, on the sense strand (see text). On the top of the figure is a schematic representation of the BPV1 genome, oriented relative to HPV1a. The BamHI-HindIII 69% transforming fragment is depicted as a solid bar.

sequence of BPV1 reported below. This analogy map permits us to define, on the HPV1a genome, a potential early region that corresponds to the transforming segment of BPV1, 69% of genome length between the BamHI and HindIII sites (Lowy et al., 1980). The remaining 30%, not required for transformation, probably codes for the virion structural proteins.

Putative late region

The two coding segments found in the late region have been called L1 and L2, and can code for polypeptides of 508 and 234 amino acids respectively. There is a five-nucleotide overlap between the two frames. It is likely that the L1 coding region codes for the VP1 major protein detected on SDS-acrylamide gels after disruption of the virion (Favre et al., 1977a). This protein has an estimated mol. wt. of 54 000. The calculated ratio Glu + Gln + Asp + Asn/Arg + Lys is 0.54. Similar values (0.59, 0.46, and 0.54) are found for the VP1 protein of BPV1, SV40, and polyoma, respectively (Meinke and Meinke, 1981). The polypeptides encoded by L1 and L2 would also have a similar, highly basic, C-terminal region with the sequences Lys-Arg-Arg-Arg-Lys (see Figure 3) and Arg-Lys-Arg-Arg-Lys-Arg, respectively. Interestingly, a basic stretch at the C-terminal region is also found in SV40 and polyoma VP2/VP3 proteins. The sequence data we obtained for a short fragment of BPV1 around the HindIII site (the border of the transforming region) cover a region coding for the C-terminal part of a polypeptide similar to that encoded by L1. Homologies at both DNA and amino-acid levels between BPV1 and HPV1a in this region are depicted in Figure 3. Here again, one notes the highly basic nature of the BPV1 C-terminal. At 30 residues from the C-terminals, a sequence of eight amino acids is conserved in the two proteins. Such a common peptide would be sufficient to account for the observed immunological cross-reactivity of the two proteins (Jenson et al., 1980). The overall DNA homology in the region shown in Figure 3 is 47.6%, with a 72.3% maximum in the conserved peptide region.

Analysis of the HPV1a virion proteins by denaturing acrylamide gels reveals a minor 76 000-dalton species (Favre et al., 1977a). Such a mol. wt. would fit a protein encoded by the two open reading frames only if they were joined, for example, by splicing of a large transcript. Several putative signals for such a splice can be found in the sequence and an example is given in Table I.

Eucaryotic signals for transcriptional control are found bordering the late region (see Table I): (1) a TATAATA Hogness-Goldberg box-like sequence (Goldberg, 1979) 65 nucleotides upstream of the L2 gene ATG, together with a CCAAT sequence (Efstratiadis et al., 1980) 38 nucleotides before; and (2) two polyadenylation signals AATAAA (Proudfoot and Brownlee, 1976) located in a non-coding region, 440 and 486 nucleotides after the termination codon of the L1 gene. Note that the L1 gene ends at the very brink of an A + T cluster (88% A or T).

Putative early region

Two large open reading frames with a 19-codon overlap, extending between nucleotides 5473 and 15, are found in the early region: we refer to them as the E1 and E2 coding regions (see Table I). The potential polypeptides are 482 and 322 amino acids long respectively, starting from the first ATG. This organization is in agreement with sequence analysis of the same region by Cald et al. (1982). Several other open reading frames of coding capacities ranging from 63 to 177 amino acids are found, together with suitable splicing signals, but the number of their possible combinations is too high to allow us to present any valid hypothesis about the nature of early proteins. The E1 coding region is in an area of maximum homology between the genomes of papillomaviruses (Croissant et al., 1982; Heilman et al., 1980). Its product does not share any homology with the proteins encoded by the early region of SV40 or polyoma.

We believe that the early transcript(s) begin(s) in the middle non-coding region of the genome (see Figure 2a), between nucleotides 4050 and 4800. There is a set of sequences that may play a role in the control of early transcription. Starting from nucleotide 4050, one finds successively (see Table I): (1)

[Figure 3: aligned nucleotide and deduced amino-acid sequences of HPV1a and BPV1, shown in four blocks.]

Fig. 3. Homologies between the 3'-end region of the L1 coding frame and the corresponding region of BPV1. An M13 recombinant carrying a BPV1 BglII fragment starting ~10 nucleotides before the HindIII site (i.e., in the non-transforming region, Figure 2) and including the beginning of the transforming region, has been sequenced. Homologies at both nucleotide and amino-acid levels with the 3' end of the coding frame (position 3246-3476) are shown here. Dots are placed where two nucleotides are identical, and the conserved amino acids or peptides are boxed. Gaps are introduced to maximize homologies. Basic amino acids (Lys or Arg) occurring at the same position in the two polypeptides are underlined. The HindIII site (AAGCTT) on the BPV1 sequence is at position 7 and codes for the dipeptide Lys-Leu.
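The homology percentages quoted for this alignment (an overall value and a local maximum) can be illustrated with a short Python sketch. How gapped positions were counted is not stated in the paper, so the sketch assumes that columns containing a gap in either sequence are excluded from the comparison. The function names and the toy sequences in the usage note are hypothetical and are not taken from Figure 3.

```python
def percent_identity(a, b, gap="-"):
    """Percentage of identical residues over the ungapped aligned columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gap columns are not scored
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def best_window(a, b, width):
    """Return (start, identity) of the most similar window of `width` columns.

    `start` is a 0-based column index; the first window wins in case of a tie.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    if not 0 < width <= len(a):
        raise ValueError("width must be between 1 and the alignment length")
    scored = [
        (percent_identity(a[i:i + width], b[i:i + width]), i)
        for i in range(len(a) - width + 1)
    ]
    identity, start = max(scored, key=lambda item: item[0])
    return start, identity
```

For example, `percent_identity("ACGT-A", "ACCTGA")` compares five columns, four of which match. Sliding `best_window` over a full alignment gives the kind of local maximum reported for the conserved peptide region.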

Table I. Landmarks on HPV1a genome

Nucleotide no.      Proposed role or significance
16                  TAA stop codon for E2 gene
76-113              Inverted repeat IR2a
443(a)              Polyadenylation signal for early transcription
1070                CCAAT (putative late promoter)
1109                TATAATAG (putative late promoter)
1175                First ATG of L2 gene
1524                AAG/GTGA (possible late transcript splicing signal)
1837                ACATCCTAG/ (possible late transcript splicing signal)
1872                First ATG of L1 gene
1877                TAA stop codon for L2 gene
2122-2207           Inverted repeat IR1b
3252                Relative location of the BPV1 HindIII site deduced from Figure 3
3396                TAG stop codon for L1 gene
3399-3499           A + T-rich cluster
3839 or 3885        Polyadenylation signal for late transcription
3849-3918           Inverted repeat IR1a
4056-4081           15 nucleotide direct repeat
4232                8 nucleotide dyad symmetry around EcoRI site
4342                TATAAAAT Hogness box-like sequence (putative early promoter)
4354                AG/GTAAG splice donor
5430                PyXPyPyPyXCAG/ splice acceptor
5861                PyXPyPyPyXCAG/ splice acceptor
7052                PyXPyPyPyXCAG/ splice acceptor
7174                PyXPyPyPyXCAG/ splice acceptor
7492                PyXPyPyPyXCAG/ splice acceptor
7793                PyXPyPyPyXCAG/ splice acceptor
5473                First ATG of E1 gene
6863                First ATG of E2 gene
6919                TGA stop codon for E1 gene
6953-6982           Homology with SV40 72-nucleotide repeat(b) (nucleotides 30-1)
7017-7055           Homology with SV40 72-nucleotide repeat(b) (nucleotides 8-47)
7056-7084           Homology with SV40 72-nucleotide repeat(b) (nucleotides 48-15)
7481-7518           Homology with SV40 72-nucleotide repeat(b) (nucleotides 40-72)
7795-19             Homology with SV40 72-nucleotide repeat(b) (nucleotides 42-72 + 1-14)
7307-7340           Inverted repeat IR2b
7561-7590 and
7630-7702           Homology with the Alu consensus sequence (see Figure 4)

(a) The numbers of the first nucleotide of the sequence concerned are given. Please refer to Figure 2 for the positions of these landmarks on the physical map.
(b) The numbers give the positions in the 72-nucleotide repeat. Increasing numbers indicate homology to the SV40 late strand, decreasing numbers to the SV40 early strand.

a perfect direct repeat of 15 nucleotides (CCAAAGACGGTTTGC); (2) an eight-nucleotide dyad symmetry centered on the EcoRI site; and (3) a Hogness-Goldberg box-like sequence (TATAAAATA) at position 4342. The coordinates of possible splicing signals, i.e., AG/GTPuAG for donors and PyXPyPyPyXCAG for acceptors (Sharp, 1981), are given in Table I. These data alone are not sufficient to predict a mechanism for splicing of early transcript(s). A polyadenylation signal is located at nucleotide 443, in the region where transcription termination is likely to occur.

Other regions of interest

Large inverted repeats occurring on HPV1a DNA have been previously reported and mapped relative to the BamHI and EcoRI sites (Gissmann and Zur Hausen, 1977). We have located these sequences and their coordinates are given in

Table I. These inverted repeats are degenerate with ~30% mismatches. We have searched for sequence homology with the 50 nucleotides around the origin of replication of polyomaviruses, and found no significant match. In contrast, a homology with the consensus sequence of the human Alu family of ubiquitous repeats (Deininger et al., 1981) has been found (Figure 4). This is consistent with our observation that a fragment of HPV1a DNA is able to hybridize with a nick-translated total human DNA probe (O.Danos, unpublished result). Interestingly, the region of maximum homology (nucleotides 36-49 on the Alu consensus sequence) is adjacent to the sequence shown to be homologous to the origin of replication of polyomaviruses (nucleotides 8-21) (Jelinek et al., 1980). Figure 4 also shows that two neighbouring HPV1a sequences (nucleotides 7561-7590 and 7630-7702) are each homologous to a different unit of the Alu dimer. In addition to this homology with the Alu ubiquitous repeat, the 550 nucleotides between inverted repeats IR2a and b have a G + C content of 51%, a value closer to the 52.6% of the Alu consensus sequence than to the 40.3% of the whole genome. Interspersed human repeated sequences are known to be G + C rich (Rinehart et al., 1977). Observation of a broader region, starting at nucleotide 6950, i.e., in the E2 coding region, reveals some more puzzling features. There are five occurrences of a weak (~77%) but significant homology with parts of the SV40 72-bp repeat, known as the SV40 gene enhancer (Banerji et al., 1981), three on the sense strand, and two on the anti-sense strand (Table I). Three degenerated direct repeats are found in this region, as is the HpaII site, at position 7712, which is methylated in 40% of the viral population (Danos et al., 1980).

[Figure 4: alignment of HPV1a nucleotides 7561-7702 with parts of the Alu consensus sequence, together with the related sequences at the SV40, polyoma and BKV origins of replication and in hepatitis B virus.]

Fig. 4. Homology with the human ubiquitous repeat Alu consensus sequences. The HPV1a sequence between nucleotides 7561 and 7702 is shown together with parts of the Alu consensus sequence. The numbering system of the Alu sequence is according to Deininger et al. (1981). Gaps or looping out have been introduced to maximize homology. Homologies between the Alu consensus sequence and polyomavirus origins of replication and hepatitis B virus are taken from Jelinek et al. (1980) (dots above each sequence indicate the homology with the human repeat).

Discussion

HPV1a is the most commonly described and studied human papillomavirus (Zur Hausen, 1980). The knowledge of its complete DNA sequence will be useful for further studies of papillomavirus biology. There is as yet no transcription data on the HPV1a genome that could allow us to affirm that the genes we have described are genuine. Nevertheless, several facts lead us to think that these large open reading frames beginning with ATG are bona fide genes, or at least large exons: (1) their locations can also be deduced from the Py/Pu arrangement of the sequence (Shepherd, 1981), as shown in Figure 2b; (2) they correspond to regions of maximum homology on genomes of papillomaviruses (Croissant et al., 1982); (3) the L1 coding region and the corresponding area of the BPV1 genome encode related proteins (see Figure 3); and (4) the codon usage in these open reading frames is characteristic of Papovaviridae, i.e., avoidance of the "CG codons", favouring of A instead of G in the wobble position (Grantham et al., 1981), and a preference for NUU over NUC.

Apart from a similar codon usage, the genome of HPV1a (and most likely of all papillomaviruses) clearly appears different from that of polyomaviruses (SV40, polyoma, or BKV). No homology is found for either DNA or protein sequences, which is consistent with the observed lack of inter-genera DNA/DNA hybridization under non-stringent conditions among papovaviruses (Lowy et al., 1980). The diversity of genomes of papillomaviruses, which contrasts with the high sequence conservation between the human papovaviruses BKV and SV40, could be explained by their propensity to accumulate neutral mutations at a high rate, as a result of the latent nature of the infection. The partial BPV1 DNA sequence presented in this paper (see Figure 3) shows that a similar protein structure is coded by DNA sequences with <75% homology. The genetic diversity may also arise from some recombination process with the host genomic DNA, as suggested by the presence in HPV1a DNA of sequences related to the human Alu family of ubiquitous repeats.

The genetic information of HPV1a is probably carried on a single strand. Such a situation has been described for human hepatitis B virus (Galibert et al., 1979) and cauliflower mosaic virus (Franck et al., 1980). It is radically different from that observed for polyomaviruses, and thus suggests other strategies for gene expression, such as early/late discrimination or even replication if, as suggested by Seidmann et al. (1979), the parental histones associate preferentially with the coding strand of DNA during replication. Circular free viral genomes of papillomaviruses are found in a concatenated state in the host's infected nucleus (Law et al., 1981), but nothing is known about the processes that lead to this structure. No sequence homology with the origins of replication of polyomaviruses has been found. The location of the origin of replication on the HPV1a genome remains unknown, although the region between nucleotides 6950 and 120, which contains several homologies with the human Alu family of ubiquitous repeats and the SV40 72-bp repeat, is an attractive candidate.

Polyadenylation signals and putative promoter regions described here would fit a transcription pattern similar to the one observed on the BPV1 genome (E.Amtmann and G.Sauer, personal communication). It would now be of interest to study the expression of the HPV1a genome or of selected fragments of it in natural and experimental systems. Such data are indispensable complements to the knowledge of the DNA primary structure. The understanding of the structure of the viral genome and its potential expression will allow us to examine its interaction with the host. The question of how virus expression depends on the differentiation process of epithelial cells remains to be answered.

The differences revealed here between the genome organization of HPV1a and polyomaviruses raise the question of their evolutionary relationship. If they emerged from a common ancestor, the time when they diverged is certainly prior to the 80 million years estimated for the separation between polyoma and SV40 or BKV (Soeda et al., 1980b). From a putative common ancestor, the two genera have conserved only an identical strategy for propagating their genetic information, i.e., on a double-stranded circular DNA enclosed in a protein capsid and by colonization of cell nuclei of higher vertebrates.

Materials and methods

Molecular cloning in M13 vectors

Biohazards associated with the experiments described here have been previously examined by the French National Control Committee.

DNA from recombinant plasmids carrying the whole viral genome and replicative forms of M13 vectors were prepared as described previously (Danos et al., 1980).

Viral restriction fragments were prepared and inserted in the replicative form of M13 mWJ43 or mp7 (a generous gift of G.Winter), the host bacteria JM101 [Δ(lac, pro), supE, thi, F' traD36, proAB, lacIq ZΔM15] transfected, and single-stranded DNA from recombinant phages prepared following the procedures described by Sanger et al. (1980).

Shotgun cloning of sheared DNA was performed according to Deininger (in preparation): sonicated viral DNA was end-repaired and labeled with T4 DNA polymerase and [α-32P]dATP, then size-fractionated on a 1.5% agarose gel. Fragments ranging from 300 to 1000 nucleotides were eluted and cloned in the HindII site of M13 mp7. The vector was treated with 1 unit of calf intestine

N.J. (1980) Cell, 21, 653-668.
Fareed,G.E., and Davoli,D. (1977) Annu. Rev. Biochem., 46, 471-522.
Favre,M., Breitburd,F., Croissant,O., and Orth,G. (1977a) J. Virol., 21, 1205-1209.
Favre,M., Orth,G., Croissant,O., and Yaniv,M. (1977b) J. Virol., 21, 1210-1214.
Franck,A., Guilley,H., Jonard,G., Richards,K., and Hirth,L. (1980) Cell, 21, 285-294.
Galibert,F., Mandart,E., Fitoussi,F., Tiollais,P., and Charnay,P. (1979) Nature, 281, 646-650.
Gissmann,L., and Zur Hausen,H. (1977) Virology, 83, 271-276.
Goldberg,M. (1979) PhD. Thesis, Stanford University, Stanford, CA.
Grantham,R., Gautier,C., Gouy,M., Jacobzone,M., and Mercier,R. (1981) Nucleic Acids Res., 9, r43-r74.
Heilman,C.A., Law,M.F., Israel,M.A., and Howley,P.M. (1980) J. Virol., 36, 395-407.
Jelinek,W.R., Toomey,T.P., Leinwand,L., Duncan,C.H., Biro,P.A., Prabhakara,V.C., Weissman,S.M., Rubin,C.M., Houck,C.M., Deininger,P.L., and Schmid,C.W. (1980) Proc. Natl. Acad. Sci. USA, 77, 1398-1402.
Jenson,A.B., Rosenthal,J.R., Olson,C., Pass,F., Lancaster,W.D., and Shah,K. (1980) J. Natl. Cancer Inst., 64, 495-500.
Lancaster,W.D. (1981) Virology, 108, 251-255.
Law,M.F., Lancaster,W.D., and Howley,P.M. (1979) J. Virol., 32, 199-207.
Law,M.F., Lowy,D.R., Dvoretzky,I., and Howley,P. (1981) Proc. Natl. Acad. Sci. USA, 78, 2727-2731.
Lowy,D.R., Dvoretzky,I., Shober,R., Law,M.F., Engel,L., and Howley,P.M. (1980) Nature, 287, 72-74.
Maxam,A.M., and Gilbert,W.
(1980), Methods Enzymol., 65, 499-560. alkaline phosphatase/tg of DNA added to the Hindll restriction mix. Meinke,W., and Meinke,G.C. (1981) J. Gen. Virol., 52, 15-24. DNA sequencing Messing,J., Crea,R., and Seeburg,P.H. (1981) Nudeic Adds Res., 9, The dideoxy chain termination method (Sanger et al., 1977) was used to se- 309-321. Moar,M.H., Saviera Campo,M., Laird,H.M., and Jarrett,W.F.M. (1981a) quence DNA insertions of the various M13 recombinants. The conditions for J. Virol., 39, 945-949. the four sequencing reactions have been described by Smith (1980). Labelling Moar,M.H., and sequencing of specific fragments by partial chemical degradation was Saviera-Campo,M., Laird,H.M., and Jarrett,W.F.M. (1981b) according to Maxam and Gilbert (1980). Nature, 293, 749-751. performed Orth,G., Breitburd,F., Favre,M., and Croissant,O. (1977) in Hiatt,H.H., Computer storage and analysis of the sequence data Watson,J.D., and Winstein,J.A. (eds.), Cold Spring Harbor Conference Gel reading data were introduced and handled in a data base using the pro- on Cell Protiferation, 4, Cold Spring Harbor Laboratory Press, NY, grams BATIN, DBCOMP and DBUTIL described by Staden (1980). Analysis pp. 1043-1068. was performed using programs described by Staden (1980) or Brutlag et al. Orth,G., Favre,M., Breitburd,F., Croissant,O., Jablonska,S., Obalek,S., (1982), or derivatives designed with their subroutines. Jarzabek-Chorzelska,M., and Rzesa,G. (1980) in Essex,M., Todaro,G., Sequence data and restriction maps are available upon request. and Zur Hausen,H., (eds.), Cold Spring Harbor Conference on Cell Proliferation, 7, Cold Spring Harbor Laboratory Press, NY, pp. 259-282. Acknowledgements Proudfoot,N.J., and Brownlee,G.C. (1976) Nature, 263, 211-214. Rinehart,F.P., Wilson,V., and Schmid,C. (1977) Biochim. Biophys. Aaa, We are grateful to B.Barrell, P.Deininger, and G.Winter for advice in 474, 69-81. employing the M13 cloning and sequencing system. We are indebted to Rothstein,R., and Wu,R. 
(1981) Gene, 15, 167-176. B.Caudron, F.Schaeffer, R.Staden, and D.Brutlag for making available to us Sanger,F., Nicklen,S., and Coulson,A.R. (1977) Proc. Natl. Acad. Sd. computer programs (Unite Calcul - Institut Pasteur, Stanford MOLGEN USA, 74, 5463-5467. PROJECT, and the NIH SUMEX-AIM FACILITY). We thank O.Crois- Sanger,F., Coulson,A.R., Barrell,B.G., Smith,A.J.H., and Roe,B.A. (1980) sant, J.Jouanneau, G.Sauer, A.Clad, and H.Zur Hausen for communicating J. Mol. Biol., 143, 161-178. unpublished results, I.Duska for participating in the sequence analysis of Sarver,N., Gruss,P., Law,M.F., Khoury,G., and Howley,P. (1981) Mol. BPV, D.Brutlag and G.Orth for valuable comments, and C.Maczuka for help Cell. Biol., 1, 486-496. in the preparation ofthe manuscript. This work was supported by grants from Seidmann,M.M., Levine,A.J., and Weintraub,H. (1979) Cell, 18, 439-449. the C.N.R.S (LA 270), the I.N.S.E.R.M. (ATP 72-79-104, PRC 118011) and Sharp,P.A. (1981) Cell, 23, 643-646. the Fondation pour la Recherche Medicale Francaise. Shepherd,J.C.W. (1981) Proc. Natl. Acad. Sci. USA, 78, 1596-1600. Smith,A.J.H. (1980) Methods Enzymol., 65, 560-580. References Soeda,E., Arrand,J.R., Smolar,N., Walsh,J.E., and Griffin,B.E. (1980a) Banerji,J., Rusconi,S., and Schaffner,W. (1981) Cell, 27, 299-308. Nature, 283, 445-453. Brutlag,D.L., Clayton,J., Friedland,P., and Kedes,L.H. (1982) Nucleic Soeda,E., Maruyama,T., Arrand,J.R., and Griffin,B.E. (1980b) Nature, Acds Res., 10, 279-294. 285, 165-167. Chargaff,E., and Davidson,J.N. eds., (1955) The Nucleic Acds, Vol. 2, Staden,R. (1980) Nucleic Acids Res., 8, 3673-3694. published by Academic Press, NY. Wildy,P. (1971) in Melinek,J.L. (ed.), Monographsin Virology, 5, S.Karger, Clad,A., Gissmann,L., Meier,B., Freese,U.K., and Schwarz,E. (1982) Basel, pp. 38-39. Virology, in press. Zur Hausen,H. (1980) in Tooze,J. (ed.), DNA Tumor Vinrses, (part 2), Coggin,J.R.Jr., and Zur Hausen,H. (1979) Cancer Res., 39, 545-546. 
Cold Spring Harbor Laboratory Press, NY, pp. 371-382. Crawford,L.V. (1969) Adv. Virus Res., 14, 89-152. Croissant,O., Testaniere,V., and Orth,G. (1982) C.R. Hebd. Seances Acad. Sci., Paris, in press Danos,O., Katinka,M., and Yaniv,M. (1980) Eur. J. Biochemn., 109, 457-461. Deininger,P.L., Jolly,D.J., Rubin,C.M., Friedmann,T., and Schmid,C.W. (1981) J. Mol. Biol., 151, 17-33. Efstratiadis,A., Posakony,I.W., Maniatis,T., Lawn,R.M., O'Connell,C., Spritz,R.A., De Riel,S.K., Forget,B.G., Weissman,S.M., Slightom,J.L., Blechl,A.E., Smithies,O., Baralle,F.E., Shoulders,C.C., and Proudfoot, 236
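The G + C comparison reported in the Results (51% for the 550 nucleotides between the inverted repeats IR2a and b, against 40.3% for the whole genome) and the candidate origin region between nucleotides 6950 and 120, which runs through the end of the circular map, both rest on simple base counting. The following Python sketch illustrates one way such figures could be obtained. It is not the software used in this work (Staden, 1980; Brutlag et al., 1982); the function names `gc_content` and `circular_slice` and the toy sequence are assumptions introduced purely for illustration, and the toy sequence is not HPV1a DNA.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Only A, C, G and T are counted; any other symbol is ignored.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def circular_slice(genome: str, start: int, end: int) -> str:
    """Extract positions start..end (1-based, inclusive) from a circular genome.

    When end < start, the region wraps past the last nucleotide and
    continues from position 1, as for a region such as 6950 to 120.
    """
    n = len(genome)
    if not (1 <= start <= n and 1 <= end <= n):
        raise ValueError("coordinates must lie within the genome")
    if start <= end:
        return genome[start - 1:end]
    return genome[start - 1:] + genome[:end]


if __name__ == "__main__":
    # Hypothetical 10-nucleotide circular "genome", for demonstration only.
    toy_genome = "AAAAGGGGCC"
    region = circular_slice(toy_genome, 9, 2)  # wraps around the origin
    print("region:", region)
    print("region G+C (%):", gc_content(region))
    print("genome G+C (%):", gc_content(toy_genome))
```

With the real sequence in place of the toy string, comparing `gc_content(circular_slice(genome, a, b))` with `gc_content(genome)` would give the kind of regional versus whole-genome contrast described in the text.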