Human Papillomavirus 1a Complete DNA Sequence: a Novel Type of Genome Organization Among Papovaviridae
The EMBO Journal, Vol. 1 No. 2, pp. 231-236, 1982

O.Danos*, M.Katinka, and M.Yaniv

Unité des Virus oncogènes, Département de Biologie moléculaire, Institut Pasteur, 25, rue du Dr. Roux, 75724 Paris Cedex 15, France

Communicated by M. Yaniv
Received on 29 January 1982

*To whom reprint requests should be sent.

IRL Press Limited, Oxford, England. 0261-4189/82/0102-0231$2.00/0

The complete nucleotide sequence of human papillomavirus type 1a (7811 nucleotides) has been established. The overall organization of the viral genome is different from that of other related papovaviruses (SV40, BKV, polyoma). Firstly, genetic information seems to be coded by one strand. Secondly, no significant homology is found with SV40 or polyoma coding sequences, for either DNA or deduced protein sequences. The relatedness of human and bovine papillomaviruses is revealed by a conserved coding sequence in the two species. Two regions can be defined on the viral genome: the putative early region, which contains two large open reading frames of 1446 and 966 nucleotides, together with several split ones, and corresponds to the transforming part of the bovine papillomavirus type 1 genome; and the remaining sequences, which include two open reading frames likely to encode structural polypeptide(s). The DNA sequence is analysed, and putative signals for regulation of gene expression, as well as homologies with the Alu family of human ubiquitous repeats and the SV40 72-bp repeat, are outlined.

Key words: Alu family/dideoxy sequencing/M13 shotgun cloning/papillomaviruses evolution

Introduction

Papillomaviruses infect many higher vertebrates, including man. They are very host- and tissue-specific and produce benign tumors (warts) of the epithelium they colonize. These tumors generally regress but may, in some cases (Shope papillomavirus, bovine alimentary tract papillomavirus, and human papillomavirus type 5), undergo malignant transformation (for review, see Orth et al., 1977, 1980; Zur Hausen, 1980). Moreover, the cycle of these viruses appears to be closely linked to the state of differentiation of epithelial cells (for review, see Orth et al., 1977). Papillomaviruses and polyomaviruses (SV40, BKV, polyoma) have been classified morphologically (Wildy, 1971) as genera of the Papovaviridae since they share a common icosahedral capsid structure in which a double-stranded superhelical DNA molecule is found associated with histones (Fareed and Davoli, 1977; Favre et al., 1977a). This DNA is ~8000 bp long in the papillomaviruses compared with 5200-5300 bp in the polyomaviruses. Although a great deal of information is now available about genome organization and molecular biology of polyomaviruses (Soeda et al., 1980a), such data concerning papillomaviruses are still scarce, due mainly to the lack of in vitro cell culture systems in which they can be reproducibly propagated.

One reason for a renewed interest in papillomaviruses is the observation that their genome is propagated as a multicopy episome in the nucleus of non-producing cells, without apparent integration in the chromosomes (Law et al., 1981; Lancaster, 1981; Moar et al., 1981a, 1981b). Therefore, they may be used as efficient eucaryotic cloning vectors (Sarver et al., 1981).

We report here the complete nucleotide sequence of the human papillomavirus type 1 subtype a (HPV1a) DNA (Coggin and Zur Hausen, 1979). This virus is the causative agent of human deep palmo-plantar warts, and the prototype for one of the nine different types of human papillomaviruses characterized so far (Orth et al., 1980; Kremsdorf et al., in preparation). We show that, in contrast to SV40 or polyoma, the open coding frames are located on only one strand. The region coding for the viral structural proteins is identified and mapped relative to the bovine papillomavirus type 1 (BPV1) genome.

Results

Determination and general features of the nucleotide sequence of HPV1a

We have reported the cloning in Escherichia coli of the entire HPV1a genome, and its characterization (Danos et al., 1980). This homogeneous source of viral DNA has been used to prepare subgenomic fragments which have been cloned in the replicative form of M13 sequencing vectors mWJ43 or mp7 (Rothstein and Wu, 1981; Messing et al., 1981), and sequenced according to the dideoxy chain termination procedure (Sanger et al., 1977).

The HPV1a sequence is presented in Figure 1. It is 7811 nucleotides long, and begins with the T of the single BamHI site (GGATCC) chosen as origin of the physical map (Favre et al., 1977b). The size agrees well with the electron microscopic measurements (O.Croissant, personal communication) and the estimated mol. wt. of 5 x 10^6 daltons (Crawford, 1969). The G + C content is 40.3%, resembling the 40% found for the human genome (Chargaff and Davidson, 1955). The G + C content does not appear to be homogeneously distributed on the viral genome, but is organized in clusters. Dinucleotide frequency analysis reveals a rare occurrence of CG (2.0%): for SV40 and polyoma the values are even lower (0.6% and 1.8%, respectively).

Figures 2a and c show the distribution of termination codons on both strands. Only one strand has significantly large open reading frames, likely to code for proteins, and will be referred to as the sense strand. No open reading frame larger than 342 nucleotides is found on the other, anti-sense, strand. We have examined the genuineness of the coding capacities of the open reading frames on the sense strand, using the procedure developed by Shepherd (1981). In this type of analysis, the relative distance, in terms of number of point mutations, from an ideal coding sequence, i.e., (Pu-X-Py)n, is calculated for a group of n codons in the three reading frames. The diagram in Figure 2b shows the result of this analysis, which indicates, in any particular region, the frame which is most closely related to the ideal coding sequence. These regions correlate very well with the coding regions outlined in Figure 2a.

[Figure 1. Nucleotide sequence of HPV1a DNA (7811 nucleotides), numbered from the T of the single BamHI site.]
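The composition figures quoted in the Results (G + C content and the frequency of the CG dinucleotide) are simple counts over the sequence. The Python sketch below shows one way such values could be computed. The function names are illustrative, and two conventions are assumptions rather than details given in the paper: the genome is treated as circular, so the last base is paired with the first, and each dinucleotide frequency is expressed as a percentage of all dinucleotides counted.

```python
from collections import Counter

BASES = set("ACGT")


def gc_content(seq: str) -> float:
    """Return the G + C content of a DNA sequence as a percentage.

    Characters other than A, C, G and T (e.g. N) are ignored.
    """
    counted = [b for b in seq.upper() if b in BASES]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for b in counted if b in "GC")
    return 100.0 * gc / len(counted)


def dinucleotide_frequencies(seq: str, circular: bool = True) -> dict:
    """Return each observed dinucleotide as a percentage of all dinucleotides.

    Assumption: with circular=True the last base is paired with the first,
    which suits a closed circular genome. Pairs containing a character
    other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    if circular and len(seq) > 1:
        seq = seq + seq[0]
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if set(p) <= BASES]
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}


if __name__ == "__main__":
    toy = "GGATCCAT"  # toy input, not the HPV1a sequence
    print(gc_content(toy))
    print(dinucleotide_frequencies(toy).get("CG", 0.0))
```

Applied to the full sequence of Figure 1, `gc_content` would be expected to give a value close to the reported 40.3%, provided the same counting conventions are used.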
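The comparison of the two strands described in the Results rests on locating termination codons in all three reading frames of each strand. The sketch below is a minimal Python illustration of that idea, not the authors' program. It reports, for each frame, the longest run of codons free of TAA, TAG and TGA. The names are hypothetical, and the sequence is scanned as a linear string, so runs that would wrap around the origin of a circular genome are not joined (an assumption made for brevity).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_open_frames(seq: str) -> dict:
    """Map each frame (0, 1, 2) to its longest stop-free run, in nucleotides.

    The run is measured between termination codons only; no initiation
    codon is required.
    """
    seq = seq.upper()
    result = {}
    for frame in range(3):
        best = run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0  # a termination codon closes the current open frame
            else:
                run += 3
                best = max(best, run)
        result[frame] = best
    return result


def longest_open_frames_both_strands(seq: str) -> dict:
    """Run the scan on the given strand and on its reverse complement."""
    return {
        "given": longest_open_frames(seq),
        "complementary": longest_open_frames(reverse_complement(seq)),
    }


if __name__ == "__main__":
    toy = "ATGAAATAGCCC"  # toy input, not the HPV1a sequence
    print(longest_open_frames_both_strands(toy))
```

With the full sequence as input, the largest value on each strand could be compared directly with the figures given in the text, such as the 342-nucleotide upper limit reported for the anti-sense strand.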
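The frame comparison against the ideal coding sequence (Pu-X-Py)n can be illustrated in the same way. The sketch below is a simplified Python reading of the description in the text, not a reimplementation of Shepherd's (1981) procedure. For a group of n codons it counts, in each of the three frames, how many first positions are not purines and how many third positions are not pyrimidines, and returns the frame with the fewest such mismatches. The function names, the default window of 10 codons and the tie-breaking rule (lowest frame index wins) are all assumptions.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def rny_mismatches(seq: str, frame: int, start: int = 0, n_codons: int = 10) -> int:
    """Count departures from (Pu-X-Py)n over n_codons codons.

    Codons are read from position start + frame. The middle base (X) is
    unconstrained, so only positions 1 and 3 of each codon are scored.
    Incomplete codons at the end of the sequence are not scored.
    """
    seq = seq.upper()
    mismatches = 0
    for k in range(n_codons):
        i = start + frame + 3 * k
        codon = seq[i:i + 3]
        if len(codon) < 3:
            break
        mismatches += int(codon[0] not in PURINES)
        mismatches += int(codon[2] not in PYRIMIDINES)
    return mismatches


def best_rny_frame(seq: str, start: int = 0, n_codons: int = 10):
    """Return (frame closest to the ideal pattern, mismatch count per frame)."""
    scores = [rny_mismatches(seq, f, start, n_codons) for f in range(3)]
    return scores.index(min(scores)), scores


if __name__ == "__main__":
    toy = "GCT" * 5  # toy input that fits Pu-X-Py exactly in frame 0
    print(best_rny_frame(toy, n_codons=4))
```

Sliding `start` along the sequence and recording the best frame for each window would give a frame-by-position profile similar in spirit to the diagram described for Figure 2b.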