Proc. Natl. Acad. Sci. USA
Vol. 83, pp. 2531-2535, April 1986
Evolution

Common evolutionary origin of hepatitis B virus and retroviruses
(hepadnaviruses/DNA sequence homology)

ROGER H. MILLER* AND WILLIAM S. ROBINSON
Stanford University School of Medicine, Stanford, CA 94305
Communicated by Robert M. Chanock, December 16, 1985

ABSTRACT Hepatitis B virus (HBV), although classified as a double-stranded DNA virus, has been shown recently to replicate by reverse transcription of an RNA intermediate. Also, the putative viral polymerase has been found to share amino acid homology with reverse transcriptase of retroviruses. Using computer-assisted DNA and protein sequence analyses, we examined the genomes of 13 hepadnavirus isolates (nine human, two duck, one woodchuck, and one ground squirrel) and found that other conserved regions of the hepadnavirus genome share homology to corresponding regions of the genomes of type C retroviruses and retrovirus-like endogenous human DNA elements. Specifically, the most highly conserved sequence of the HBV genome, positioned at or near the initiation site for first-strand HBV DNA synthesis, is homologous over 67 nucleotides to the U5 region, a comparable region in retrovirus long terminal repeats. Within a highly conserved (i.e., 90%) 16-nucleotide sequence a heptanucleotide sequence CCTTGGG is 97% homologous between 27 virus isolates. Also, we found that the highly conserved HBV core, or nucleocapsid, protein shares 41% homology over 98 amino acids with the carboxyl-terminal region of the p30 gag nucleocapsid protein of type C retroviruses. In both cases, as with the previously reported polymerase homology, HBV is most homologous to the murine leukemia/sarcoma retroviruses. Further analysis revealed additional similarities between hepadnavirus and retroviral genomes. Taken together, our results suggest that HBV and retroviruses have a common evolutionary origin, with HBV arising through a process of deletion from a retrovirus, or retrovirus-like, progenitor.

Abbreviations: HBV, hepatitis B virus; LTR, long terminal repeat; bp, base pair(s); DHBV, duck hepatitis B virus; GSHV, ground squirrel hepatitis virus; WHV, woodchuck hepatitis virus; ORF, open reading frame.

*Present address: Hepatitis Section, Laboratory of Infectious Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Hepatitis B virus (HBV) is the smallest double-stranded DNA virus known to infect man and is the prototype of the hepadnavirus family, which includes similar viruses infecting ducks [duck hepatitis B virus (DHBV)], ground squirrels [ground squirrel hepatitis virus (GSHV)], and woodchucks [woodchuck hepatitis virus (WHV)]. Until recently, little was known about HBV replication due to the limited host range of the virus and its inability to be grown in tissue culture. However, experiments using virus-infected liver revealed a novel mode of DNA replication through an RNA intermediate by reverse transcription, a process similar to retrovirus replication (1). This work was soon followed by the discovery of amino acid sequence homology between the putative polymerase protein of HBV and a conserved region of the reverse transcriptase of retroviruses (2, 3) and raised the question of the genetic relatedness of the hepadnavirus and retrovirus families.

The small size of the HBV genome [~3200 base pairs (bp)] makes it especially amenable to nucleotide sequence and biological analysis. At the time of this writing, the nucleotide sequence of 13 viral genomes has been published: nine HBV isolates of surface antigen subtypes adr (4-7), adw (5, 8), ayw (9, 10), and adyw (11), two isolates of DHBV (12, 13), and one isolate each of GSHV (14) and WHV (15). Analysis of the protein-coding capacity of the virus shows that only one viral DNA strand, the minus or long strand, possesses significant protein-coding capability. Four long open reading frames (ORFs) have been assigned to genes specifying the viral core (sometimes including a short precore region), surface (always including a long presurface region), and putative polymerase protein as well as an unknown protein X. All genomes are structurally colinear, with the four ORFs approximately the same length in all isolates examined with the exception of the deletion of the carboxyl-terminal region of the X gene in DHBV. Of the hepadnavirus proteins encoded by the four viral ORFs, the core, which is the nucleocapsid protein, is the most highly conserved. We used computer-assisted DNA and protein sequence analyses to compare the genomes of hepadnaviruses and retroviruses and report here that these virus families share previously unreported homology between several well-conserved regions of their genomes, suggesting their common evolutionary origin.

Retroviral U5-like sequence in terminal repeats of the linear HBV RNA genome

In our DNA sequence analysis of hepadnavirus genomes, we found that the region located between the carboxyl-terminal end of ORF 4 (X gene) and the amino-terminal end of ORF 3 (core gene) is the most highly conserved. This region is 99% conserved over a distance of 50 nucleotides in the 11 mammalian hepadnavirus sequences examined. The homology extends into the core ORF, where a mRNA poly(A) addition signal sequence is found, with the additional 33 nucleotides sharing 95% homology. The homology decreases somewhat over the next 28 nucleotides, where the downstream GT cluster, involved in poly(A) addition (16), is found. Overall, the entire 111-nucleotide region is 95% conserved between mammalian hepadnaviruses.

We used the computer search program IFIND (17) to locate DNA sequences in the National Institutes of Health database homologous to this highly conserved region of the hepadnavirus genome. We find that this sequence is homologous to the U5 region of many type C retrovirus long terminal repeats (LTRs). Alignment of the sequences with the programs ALIGN (18) and SEQ (19) shows that the homology encompasses nucleotides 5-66 of the conserved region of HBV and covers nearly the entire retroviral U5 sequence (Fig. 1 Left). The homology is much less at the extreme 3' end of U5, which contains the short inverted terminal repeat essential for precise integration of retroviral DNA into cellular DNA (24). The most distinctive features in this homologous sequence are the presence of alternating purine and pyrimidine tracts and a well-conserved (i.e., 90% ho-

[Figure 1, Left: alignment of the consensus hepadnavirus sequence with the consensus retrovirus U5 sequence, numbered 10 to 60; the aligned sequences are not legible in this copy.]

[Figure 1, Right: the central 16-nucleotide region in each isolate.]

Hepadnaviruses
ADR (1)    GCTGTGCCTTGGGTGG
ADR (2)    GCTGTGCCTTGGGTGG
ADR (3)    GCTGTGCCTTGGGTGG
ADR (4)    GCTGTGCCTTGGGTGG
ADW (1)    GCTGTGCCTTGGGTGG
ADW (2)    GCTGTGCCTTGGGTGG
ADYW       GCTGTGCCTTGGGTGG
AYW (1)    GCTGTGCCTTGGGTGG
AYW (2)    GCTGTGCCTTGGGTGG
DHBV (1)   ACTGTACCTTTGGTAT
DHBV (2)   ACTGTACCTTTGGTAT
GSHV       GCTGTGCCTTGGATGG
WHV        GCTGTGCCTTGGATGG

Retroviruses
AKR        GCTGATCCTTGGGAGG
AMLV       GCTGTTCCTTGGGAGG
BKMV       GCTGATCCTTGGGAGG
FBJ        GCTGATCCTTGGGAGG
FESV       GGTGTTCGTTGGGTAC
FMCF       GCTGTTCCTTGGGAGG
FSFFV      GCTGATCCTTGGGAGG
GALV       GCTGTTCCTTGGGAAG
MMCF       GCTGTTCCTTGGGAGG
MMLV       GCTGTTCCTTGGGAGG
MMSV (1)   GCTGTTCCTTGGGAGG
MMSV (2)   GCTGTTCCTTGGGAGG
RMLV       GCTGTTCCTTGGGAGG
SSV        GTTGTTCCTTGGGAGG
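The 97% figure quoted for the heptanucleotide CCTTGGG can be checked against the 16-nucleotide sequences listed for Fig. 1 (Right). The sketch below is an illustration only and not the authors' program: it assumes that homology here means the fraction of matching positions across all 27 isolates, and that the heptanucleotide occupies positions 7-13 of each 16-mer (an offset read off the listed sequences). The names `SEQUENCES`, `CTG_BOX`, `BOX_START`, and `box_identity` are hypothetical.

```python
# 16-nucleotide sequences as listed for Fig. 1 (Right), grouped by sequence.
HEPADNAVIRUS = {
    "GCTGTGCCTTGGGTGG": ["ADR (1)", "ADR (2)", "ADR (3)", "ADR (4)", "ADW (1)",
                         "ADW (2)", "ADYW", "AYW (1)", "AYW (2)"],
    "ACTGTACCTTTGGTAT": ["DHBV (1)", "DHBV (2)"],
    "GCTGTGCCTTGGATGG": ["GSHV", "WHV"],
}
RETROVIRUS = {
    "GCTGATCCTTGGGAGG": ["AKR", "BKMV", "FBJ", "FSFFV"],
    "GCTGTTCCTTGGGAGG": ["AMLV", "FMCF", "MMCF", "MMLV", "MMSV (1)",
                         "MMSV (2)", "RMLV"],
    "GGTGTTCGTTGGGTAC": ["FESV"],
    "GCTGTTCCTTGGGAAG": ["GALV"],
    "GTTGTTCCTTGGGAGG": ["SSV"],
}

# Flatten to {isolate name: sequence}; 13 hepadnaviruses + 14 retroviruses.
SEQUENCES = {
    name: seq
    for group in (HEPADNAVIRUS, RETROVIRUS)
    for seq, names in group.items()
    for name in names
}

CTG_BOX = "CCTTGGG"
BOX_START = 6  # assumed 0-based offset of the heptanucleotide in each 16-mer


def box_identity(sequences, box=CTG_BOX, start=BOX_START):
    """Return (matching positions, total positions) over all isolates."""
    matches = 0
    total = 0
    for seq in sequences.values():
        window = seq[start:start + len(box)]
        matches += sum(a == b for a, b in zip(window, box))
        total += len(box)
    return matches, total


if __name__ == "__main__":
    matches, total = box_identity(SEQUENCES)
    print(f"{matches}/{total} positions match ({100 * matches / total:.1f}%)")
```

Under these assumptions the only mismatches are one position each in the two DHBV isolates, GSHV, WHV, and FESV, which is consistent with the rounded 97% reported in the text.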

FIG. 1. Comparison of the conserved region of hepadnaviruses to the U5 sequence of retroviruses. (Left) The conserved consensus mammalian hepadnavirus sequence (nucleotides 5-66) is aligned with the conserved consensus U5 sequence of the following mammalian type C retroviruses (20-23): AKR murine leukemia virus (AKR), Abelson murine leukemia virus (AMLV), BL/Ka murine nonleukemogenic virus (BKMV), FBJ murine osteosarcoma virus (FBJ), feline sarcoma virus (FESV), Friend mink focus-forming virus (FMCF), gibbon ape leukemia virus (GALV), Moloney mink cell focus-forming virus (MMCF), Moloney murine leukemia virus (MMLV), Moloney murine sarcoma virus (MMSV) isolates 1 and 2, Rauscher murine leukemia virus (RMLV), and simian sarcoma virus (SSV). Dashes represent gaps inserted into the sequence to provide maximum alignment. Homologous nucleotides are enclosed within boxes. (Right) Comparison of the homologous central region of 16 nucleotides from the U5-like sequences of all 13 sequenced hepadnavirus genomes to the U5 sequences of the 14 mammalian type C retroviruses cited above.

mologous in 14 mammalian type C retroviruses and 13 hepadnavirus isolates) centrally located region of 16 nucleotides (Fig. 1 Right). An internal heptanucleotide sequence CCTTGGG, which we term the "CTG box," is highly conserved (i.e., 97% homologous) between these 27 isolates and is found, with some modifications, in all retroviral U5 sequences examined (unpublished data). Of the 14 homologous mammalian type C retrovirus U5 sequences, the murine leukemia/sarcoma virus sequences are most homologous to those of the hepadnaviruses.

Recently, retroviral-like elements have been found in human and simian chromosomal DNA (25-31). These endogenous elements appear to be derived from a retroviral ancestor, but most are probably no longer infectious due to accumulation of termination codons in the ORFs. They are unusual because, where known, the tRNA primer binding sites (28, 30) are different from those known previously for retroviruses (32). These elements, as well as endogenous simian retroviruses, share substantial amino acid homology with murine leukemia/sarcoma C type retroviruses at the amino-terminal end of the p30 gag protein as well as the conserved region of the viral reverse transcriptase (25, 27-30, 33-36). We found that the U5 regions of the retrovirus-like elements share substantial homology to the 5' end of murine retroviral U5 sequences and, conversely, to the 3' end of the HBV U5-like sequence (unpublished data). This indicates that HBV and the endogenous human elements may both share a common ancestor with the murine type C retroviruses.

Though lacking a gene for an endonuclease, the direct ancestor of HBV may have had the coding sequences necessary for precise recombination with cellular DNA. Hepadnaviruses, however, contain a pair of 11- or 12-bp direct repeat sequences (DR-1 and DR-2) in the region adjacent to the sequence that is homologous to the retrovirus U5 sequence and these appear to be involved in viral DNA replication (37). Recent analysis of integrated HBV genomes has shown that integration sometimes occurs in the region of DR-1 and DR-2, indicating that these direct repeat sequences are preferred sites in the viral genome for recombination. Hepadnaviruses may thus be capable of transposition (as may have been the ancestral element).

In retroviruses, synthesis of DNA by reverse transcription of RNA initiates at a tRNA molecule hybridized to a complementary sequence on the RNA template, called the primer binding site, adjacent to the U5 region. Synthesis of the first strand of DNA begins with U5 copied first and proceeds through the repeat region (R) to the 5' end of genomic RNA, followed by a process called template switching and synthesis of the remainder of first-strand DNA (see ref. 38 for a review). The short sequence of U5-R DNA made prior to template switching has been termed "strong stop" DNA. U5, whose function is least understood, is recognized as the most conserved element of the retroviral LTR (20, 21, 39). Since this sequence contains numerous termination codons in all three frames, it is presumed that its function is cis acting, involved in replication or transcription termination.

Analogous to U5 in retroviruses, the U5-like sequence in HBV is located at or near the sites for initiation of minus-strand DNA synthesis and transcription termination. Duplicated at the termini of the hepadnavirus linear RNA genome (40, 41), this sequence is positioned between the site of initiation of first-strand DNA synthesis and the region containing the signals for transcription termination for all species of viral mRNA. Data obtained from DHBV replication in infected duck livers indicate that the first DNA copied from the viral RNA template is the short direct repeat (DR) adjacent to the U5-like sequence (40, 42). This indicates that the 3' U5-like sequence on viral RNA may function as a binding site for the protein that primes first-strand DNA synthesis. Proceeding 5' to 3' on the nascent DNA strand, DNA synthesis may continue with no template switching (and hence no strong stop DNA species), with the 5' U5-like sequence of HBV synthesized last. Therefore, the U5 region of retroviruses is the first RNA copied during first-strand DNA synthesis, but in hepadnaviruses this situation may be reversed.

Homology between the nucleocapsid proteins of HBV and retroviruses

We compared the amino acid sequences, predicted from the nucleotide sequences, of the hepadnavirus core, surface, and X genes to retroviral proteins using the PEP (19) and ALIGN
(18) computer programs. We found that the HBV core protein shared substantial homology to the carboxyl terminus of the p30 gag protein of type C murine leukemia/sarcoma viruses (Fig. 2). The degree of homology between these nucleocapsid proteins is comparable to that found previously between the viral polymerases (2), where 21 identical and 17 similar amino acids are shared over a region of 94 amino acids representing a homology of 40%. Similarly, the viral nucleocapsid proteins share 28 identical and 12 similar amino acids over a region of 98 amino acids representing a homology of 41% (Fig. 2).

In Moloney murine leukemia virus (Moloney MLV), the prototype mammalian type C retrovirus, the translation product of the gag gene is a polyprotein of 539 amino acids, which is cleaved proteolytically into four smaller proteins of 15, 12, 30, and 10 kDa by a virally encoded protease (43). Previous studies have shown homology between C type retroviral p30 peptides by amino acid analysis directly (34, 35, 44) and analysis of translated ORFs from DNA sequence data (36, 45, 46). It is apparent that the p30 nucleocapsid protein is the essential protein in the gag complex since mutations in p30, but not the other gag genes, are lethal for the virus (38). Like the HBV core protein, it is the carboxyl terminus of the retroviral p30 protein that is rich in basic amino acids and is possibly involved in binding.

By comparing the relatedness of the amino acid sequence of the carboxyl terminus of the nucleocapsid protein of Moloney murine sarcoma virus (Moloney MSV), HBV (subtype adw2), baboon endogenous virus (BAEV), and the endogenous human retrovirus-like element 4-1, we found the homology to be HBV to 4-1 = 24%, HBV to BAEV = 30%, HBV to Moloney MSV = 41%, Moloney MSV to 4-1 = 37%, Moloney MSV to BAEV = 74%, and BAEV to 4-1 = 43%. No substantial amino acid homology was found when the core protein of HBV was compared to the nucleocapsid proteins of other viruses. Therefore, HBV and endogenous human retrovirus-like elements share homology with the murine type C retroviruses in U5 and the nucleocapsid, indicating an evolutionary relationship between these genomes.

The X gene at the 3' end of the linear hepadnavirus genome may be of recent cellular origin

Certain retroviruses possess recently acquired cellular genes often found at the 3' end of their genomes. Termed oncogenes (onc) for the oncogenic potential they confer on the virus, they differ from other retroviral genes by the fact that they share the codon usage preference favored by eukaryotic cells and not viruses of eukaryotes (47). In codons specifying the same amino acid and differing only in the nucleotide at the third position of the triplet, there is a strong bias at the third position for cytidine rather than uridine and guanosine rather than adenosine in genes of eukaryotic cells. This situation is exactly reversed in the genes of the viruses infecting eukaryotic cells. We have determined the codon usage for hepadnavirus proteins by computer analysis and found that, like retroviral oncogenes, the X gene codon usage preference is similar to that of the genes of eukaryotic cells. Conversely, the other HBV genes clearly share the codon preference of viral genes. By comparing the codon usage in the genes of four mammalian hepadnavirus genomes, where the same preference was made in all isolates, the viral (A + U) to cell (G + C) usage, at the third position of the triplet, was 17 to 1 in the core genes, 10 to 6 in the surface protein genes, 14 to 6 in the polymerase genes, and 4 to 10 in the X genes, respectively.

Retroviral oncogenes have been shown to have distinct preferences in codons for the amino acids phenylalanine and glutamic acid. This unusual preference is also found in the HBV X gene but not other viral genes. Specifically, for phenylalanine the triplet UUC is favored 3 to 1 over the triplet UUU in the src gene (48), 2 to 1 in the mos gene (22), and 1.5 to 1 in the X gene. For glutamic acid, the triplet GAG is favored 4 to 1 over GAA in the src gene (48), 1.5 to 1 in the mos gene (22), and up to 6 to 1 in the HBV X gene. Conversely, the simian virus 40 T antigen favors the GAA triplet 2 to 1 over the GAG triplet in specifying glutamic acid (49).

Recent findings (41, 42) show that the X gene is located at the 3' end of the linear HBV RNA genome (Fig. 3). In our analysis of codon usage, we note that the eukaryotic cell codon preference extends upstream of the conventional start site of the X gene ORF to include the viral enhancer (50). This coincides with the fact that in three HBV genomes, all subtype adr, the X ORF extends farther into the polymerase ORF than do the X ORFs of the first HBV isolates sequenced (Fig. 3). In two adr isolates this adds an additional 61 amino acids (5, 7), whereas in one it adds 78 amino acids (4) to the conventional X protein of 145-154 amino acids. Assuming that errors in genome replication have resulted in the accumulation of termination codons in functionally unimportant regions of the gene, the original ORF was at least 50% larger than in the majority of viral isolates found today. Thus, the 5' boundary of cell codon usage coincides with the 5' end of the viral enhancer and includes the amino-terminal end of the longest X ORF (4), whereas the 3' boundary of cell codon usage is the carboxyl-terminal end of the X gene ORF (Fig. 3). Apparently, both the HBV enhancer and X gene were acquired recently from the host cell genome. Therefore, like BK virus (51), HBV appears to have an enhancer that is of recent cellular origin.

It seems unlikely that the X gene is an oncogene. First, the amino acid sequence comparison of the X gene protein with retroviral oncogenes and cellular proteins revealed no significant homology with any protein sequence in the National Biomedical Research Foundation database at this time (unpublished data). Also, hepatocellular carcinoma of humans,

[Figure 2: aligned amino acid sequences of the HBV core protein (HBV) and the MMSV p30 protein (MMSV), with homologous residues boxed; the alignment is not legible in this copy.]

FIG. 2. Amino acid homology between hepadnavirus and retrovirus nucleocapsid proteins. Amino acids 89-184 of the core protein of HBV subtype adw (8) are aligned to amino acids 390-476 at the carboxyl-terminal region of the p30 protein of the Moloney murine sarcoma virus (MMSV) gag gene complex (22). Homologous amino acids are enclosed in boxes. Amino acid similarities are those used by Toh et al. (2) in their alignment of hepadnavirus and retrovirus polymerases. Dashes represent gaps in the predicted amino acid sequences inserted to maximize alignment.
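The homology percentages quoted for the polymerase and nucleocapsid comparisons appear to follow from counting identical plus similar residues over the length of the aligned region. The short sketch below reproduces that arithmetic under this assumption; the function name `percent_homology` is hypothetical and the calculation is not taken from the PEP or ALIGN programs.

```python
def percent_homology(identical, similar, region_length):
    """Percentage of aligned positions that are identical or similar."""
    return 100.0 * (identical + similar) / region_length


# Counts as stated in the text.
polymerase = percent_homology(identical=21, similar=17, region_length=94)
nucleocapsid = percent_homology(identical=28, similar=12, region_length=98)

if __name__ == "__main__":
    print(f"polymerase:   {polymerase:.1f}%")
    print(f"nucleocapsid: {nucleocapsid:.1f}%")
```

Rounded to whole numbers, these values agree with the 40% and 41% given in the text.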

[Figure 3: linear map of the hepadnavirus genome on a scale of 0 to 3 kb. Labeled features: region of codon usage of eukaryotic cells; most conserved region between hepadnavirus genomes; region homologous to retroviral gag p30; region homologous to retroviral reverse transcriptase; limits of viral enhancer; direct repeats; poly(A) addition; U5-like sequence; start and stop of the core, pre-surface, surface, polymerase, and X ORFs; beginning of longest X ORF.]
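The third-position comparison described in the text can be illustrated with a small tally over an in-frame coding sequence. This is a simplified sketch and not the authors' analysis: it counts every codon rather than only synonymous codons that differ at the third position, and the example sequence is an invented toy string, not an HBV gene. The names `codon_counts`, `third_position_usage`, and `TOY_ORF` are hypothetical.

```python
from collections import Counter


def codon_counts(coding_seq):
    """Count in-frame codons; DNA input is converted to RNA letters."""
    seq = coding_seq.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def third_position_usage(coding_seq):
    """Return (codons ending in A or U, codons ending in G or C)."""
    counts = codon_counts(coding_seq)
    a_or_u = sum(n for codon, n in counts.items() if codon[2] in "AU")
    g_or_c = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return a_or_u, g_or_c


# Invented example: UUC UUU GAG GAG GAA UUC
TOY_ORF = "TTCTTTGAGGAGGAATTC"

if __name__ == "__main__":
    counts = codon_counts(TOY_ORF)
    print("A+U vs G+C at third position:", third_position_usage(TOY_ORF))
    print("UUC:UUU =", counts["UUC"], ":", counts["UUU"])
    print("GAG:GAA =", counts["GAG"], ":", counts["GAA"])
```

A fuller treatment would group codons by the amino acid they specify before comparing third positions, as the text describes.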

FIG. 3. Map of the linear hepadnavirus genome illustrating the region of eukaryotic cell codon usage. HBV subtype adw (8) is presented as a linear DNA molecule from data extrapolated from the determination of the 5' and 3' ends of the DHBV (40) and WHV (41) linear RNA pregenomes. Kb, kilobases.

though linked to HBV infection epidemiologically and by the presence of integrated viral DNA in tumor cell DNA, develops over a long period of time. In contrast, infection with oncogene-containing retroviruses results in rapid tumor formation. It is possible that the X gene of hepadnaviruses is more analogous to the pX gene of the human T-cell lymphotropic viruses (HTLV) I, II, and III and bovine leukemia virus. This gene similarly is located at the 3' end of the genome. We found that of all HTLV III genes, the pX gene has the highest usage of codons utilized by eukaryotic cells (i.e., nearly 1:1 in cell:viral codon usage, ref. 52) and utilizes the codon GAG preferentially in specifying the amino acid glutamic acid (unpublished data). Also, like the hepadnavirus X gene but in contrast to retroviral oncogenes, the pX gene sequences do not hybridize to cellular sequences under stringent hybridization conditions. Evidence is accumulating that the expression of the protein product of the pX gene correlates with the activation of transcription from the retroviral LTR. It has been proposed that the trans-activating properties of the product of this gene can result in cell transformation by the activation of cellular genes, with tumor formation after a long latent period. Whether the hepadnavirus X gene is capable of trans-activating viral or cellular genes remains to be examined.

Hepadnaviruses appear to be truncated retroviruses

Functionally important regions of genomes are conserved throughout long evolutionary periods. During the course of its evolution, it appears that the hepadnavirus family retained the essential features necessary for virion production and genome replication by reverse transcription of RNA inherited from its ancestor but gradually lost some of the features involved in transposition of DNA. The HBV core protein was apparently derived from the protein that gave rise to the essential protamine-like p30 protein of the retroviral gag polyprotein complex. Two essential features retained for replication of the genome by reverse transcription were the presumed active site of the polymerase and a U5-like cis-acting sequence at or near the initiation site of DNA synthesis. HBV can be conceptualized as a truncated version of the retrovirus genome since homologous regions are clustered in the 5' third of the MMLV genome (Fig. 4). Since the murine type C retroviruses were consistently most homologous to the hepadnavirus sequences, we believe that these viruses diverged relatively recently from a common ancestor.

Specific regions on retroviral genomes are favored sites for intramolecular recombination (53-55). Truncated retroviruses and endogenous human retrovirus-like elements have been characterized and shown to have the types of deletions necessary for the genesis of hepadnaviruses from a Moloney murine leukemia virus-like ancestor. For example, intramolecular recombinants of spleen necrosis virus have been isolated with the 3' end of U5 joined to the amino-terminal end of p30 gag (46), forming a virus genome structurally analogous to the 5' region of the hepadnavirus genome. Also, the endogenous human retrovirus-like element 51-1 has lost part of the reverse transcriptase gene and all of the envelope gene (27), a situation analogous to the configuration of the 3' end of the hepadnavirus genome. Thus, precedents exist for

[Figure 4: schematic of the MMLV genome (R, U5, gag [p15, p12, p30, p10], protease, polymerase [reverse transcriptase, endonuclease], envelope, U3, R) drawn above the hepadnavirus genome (U5-like, core, polymerase, X).]

FIG. 4. Summary of hepadnavirus and retrovirus genome homology. Shaded areas represent regions of homology between the two genomes. Arrows point to regions of homology in the retroviral genome known to be targets for intramolecular recombination (46). MMLV, Moloney murine leukemia virus.

the types of deletions necessary for the genesis of the hepadnavirus genome from a retrovirus-like genome. The size of genome generated by such a combination of deletions, ~3300 nucleotides, is in good agreement with the average size (i.e., 3200 nucleotides) for the hepadnavirus genome.

We speculate that the common ancestor must have had at least two long overlapping translation frames over part of the genome. Although long overlapping ORFs are not the rule in retroviruses, the endogenous human element THE (31) and HBV both possess ORFs with substantial overlap. The endogenous human retroviral-like element THE is a 2.3-kilobase element that is transcriptionally active in tissue culture cells and is present in the episomal state as well as integrated into chromosomal DNA. THE possesses overlapping ORFs covering portions of all three translation frames. However, an even longer region of overlap is found in HBV, where the viral polymerase ORF overlaps the presurface and surface protein ORFs spanning a distance of ~1200 bp. This arrangement was probably inherited from the ancestral element since the region of overlap coincides with the region of hepadnavirus polymerase homology to retroviral reverse transcriptase (Fig. 3). It has been demonstrated that these overlapping ORFs could not have arisen by chance but instead descended from oligomeric repeats without termination codons (56).

The hepadnaviruses appear to occupy a unique position in a spectrum of viruses and genetic elements that utilize reverse transcription. The copia and 17.6 transposable elements of Drosophila, cauliflower mosaic virus of plants, the Ty element of yeast, retrovirus-like elements of mammals, the retroviruses, and hepadnaviruses share homology over several regions of their genomes. First observed in the polymerase gene, homology is now being found in other conserved regions on the genomes, indicating that all may have evolved from a common ancestor.

Computer-assisted nucleic acid and protein analysis was performed through the BIONET National Computer Resource for Molecular Biology, which is supported by National Institutes of Health Grant U41-01685-02. We thank MaryJo Lawler for assistance with running the programs, Dr. Patricia Marion, Cam-tu Tran, and Dr. Bao Zhang Tang for assistance in entering hepadnavirus DNA sequences to the database for analysis, and Dr. Edward Mocarski for helpful comments on the manuscript. This investigation was supported by Public Health Service Research Grant AI-13526 from the National Institutes of Health. R.H.M. was supported by Public Health Service Fellowship Grant 1F 32 CA07221-01 from the National Cancer Institute.

1. Summers, J. & Mason, W. S. (1982) Cell 29, 403-415.
2. Toh, H., Hayashida, H. & Miyata, T. (1983) Nature (London) 305, 827-829.
3. Patarca, R. & Haseltine, W. A. (1984) Nature (London) 309, 728.
4. Fujiyama, A., Miyanohara, A., Nozaki, C., Yoneyama, T., Ohtomo, N. & Matsubara, K. (1983) Nucleic Acids Res. 11, 4601-4610.
5. Ono, Y., Onda, H., Sasada, R., Igarashi, K., Sugino, Y. & Nishioka, K. (1983) Nucleic Acids Res. 11, 1747-1757.
6. Kobayashi, M. & Koike, K. (1984) Gene 30, 227-232.
7. Gan, R.-B., Chu, M.-J., Shen, L.-P. & Li, Z.-P. (1984) Acta Biochim. Biophys. Sin. 16, 316-319.
8. Valenzuela, P., Quiroga, M., Zaldivar, J., Gray, P. & Rutter, W. J. (1980) in Animal Virus Genetics, eds. Fields, B. N., Jaenisch, R. & Fox, C. F. (Academic, New York), pp. 57-70.
9. Galibert, F., Mandart, E., Fitoussi, F., Tiollais, P. & Charnay, P. (1979) Nature (London) 281, 646-650.
10. Bichko, V., Pushko, P., Dreilina, D., Pumpen, P. & Gren, E. (1985) FEBS Lett. 185, 208-212.
11. Pasek, M., Goto, T., Gilbert, W., Zink, B., Schaller, H., MacKay, P., Leadbetter, G. & Murray, K. (1979) Nature (London) 282, 575-579.
12. Mandart, E., Kay, A. & Galibert, F. (1984) J. Virol. 49, 782-792.
13. Sprengel, R., Kuhn, C., Will, H. & Schaller, H. (1985) J. Med. Virol. 15, 323-333.
14. Seeger, C., Ganem, D. & Varmus, H. E. (1984) J. Virol. 51, 367-375.
15. Galibert, F., Chen, T. N. & Mandart, E. (1982) J. Virol. 41, 51-65.
16. Birnstiel, M. L., Busslinger, M. & Strub, K. (1985) Cell 41, 349-359.
17. Wilbur, W. J. & Lipman, D. J. (1983) Proc. Natl. Acad. Sci. USA 80, 726-730.
18. Needleman, S. B. & Wunsch, C. D. (1970) J. Mol. Biol. 48, 443-453.
19. Brutlag, D. L., Clayton, J., Friedland, P. & Kedes, L. H. (1982) Nucleic Acids Res. 10, 279-294.
20. Adachi, A., Sakai, K., Kitamura, N., Nakanishi, S., Niwa, O., Matsuyama, M. & Ishimoto, A. (1984) J. Virol. 50, 813-821.
21. Chen, H. R. & Barker, W. C. (1984) Nucleic Acids Res. 12, 1767-1778.
22. Van Beveren, C., van Straaten, F., Galleshaw, J. A. & Verma, I. M. (1981) Cell 27, 97-108.
23. Reddy, E. P., Smith, M. J. & Aaronson, S. A. (1981) Science 214, 445-450.
24. Panganiban, A. T. & Temin, H. M. (1983) Nature (London) 306, 155-160.
25. O'Connell, C., O'Brien, S., Nash, W. G. & Cohen, M. (1984) 138, 225-235.
26. Bonner, T. I., Birkenmeier, E. H., Gonda, M. A., Mark, G. E., Searfoss, G. H. & Todaro, G. J. (1982) J. Virol. 43, 914-924.
27. Repaske, R., O'Neill, R. R., Steele, P. E. & Martin, M. A. (1983) Proc. Natl. Acad. Sci. USA 80, 678-682.
28. Repaske, R., Steele, P. E., O'Neill, R. R., Rabson, A. B. & Martin, M. A. (1985) J. Virol. 54, 764-772.
29. Bonner, T. I., O'Connell, C. & Cohen, M. (1982) Proc. Natl. Acad. Sci. USA 79, 4709-4713.
30. O'Connell, C. D. & Cohen, M. (1984) Science 226, 1204-1206.
31. Paulson, K. E., Deka, N., Schmid, C. W., Misra, R., Schindler, C. W., Rush, M. G., Kadyk, L. & Leinwand, L. (1985) Nature (London) 316, 359-361.
32. Nikbakht, K. N., Ou, C.-Y., Boone, L. R., Glover, P. L. & Yang, W. K. (1985) J. Virol. 54, 889-893.
33. Tamura, T.-A. (1983) J. Virol. 47, 137-145.
34. Oroszlan, S., Copeland, T. D., Gilden, R. V. & Todaro, G. J. (1981) Virology 115, 262-271.
35. Oroszlan, S., Barbacid, M., Copeland, T. D., Aaronson, S. A. & Gilden, R. V. (1981) J. Virol. 39, 845-854.
36. Cohen, M., Rein, A., Stephens, R. M., O'Connell, C. O., Gilden, R. V., Shure, M., Nicolson, M. O., McAllister, R. M. & Davidson, N. (1981) Proc. Natl. Acad. Sci. USA 78, 5207-5211.
37. Tiollais, P., Pourcel, C. & Dejean, A. (1985) Nature (London) 317, 489-495.
38. Goff, S. P. (1984) Curr. Top. Microbiol. Immunol. 112, 45-71.
39. Ju, G. & Skalka, A. M. (1980) Cell 22, 379-386.
40. Buscher, M., Reiser, W., Will, H. & Schaller, H. (1985) Cell 40, 717-724.
41. Moroy, T., Etiemble, J., Trepo, C., Tiollais, P. & Buendia, M.-A. (1985) EMBO J. 4, 1507-1514.
42. Molnar-Kimber, K. L., Summers, J. W. & Mason, W. S. (1984) J. Virol. 51, 181-191.
43. Shinnick, T. M., Lerner, R. A. & Sutcliffe, J. G. (1981) Nature (London) 293, 543-548.
44. Hunter, E., Bhown, A. S. & Bennett, J. C. (1978) Proc. Natl. Acad. Sci. USA 75, 2708-2712.
45. Tamura, T. & Takano, T. (1982) Nucleic Acids Res. 10, 5333-5343.
46. O'Rear, J. J. & Temin, H. M. (1982) Proc. Natl. Acad. Sci. USA 79, 1230-1234.
47. Wain-Hobson, S., Nussinov, R., Brown, R. J. & Sussman, J. L. (1981) Gene 13, 355-364.
48. Schwartz, D. E., Tizard, R. & Gilbert, W. (1983) Cell 32, 853-869.
49. Fiers, W., Contreras, R., Haegeman, G., Rogiers, R., Van de Voorde, A., Van Heuverswyn, H., Van Herreweghe, J., Volckaert, G. & Ysebaert, M. (1978) Nature (London) 273, 113-120.
50. Shaul, Y., Rutter, W. J. & Laub, O. (1985) EMBO J. 4, 427-430.
51. Rosenthal, N., Kress, M., Gruss, P. & Khoury, G. (1983) Science 222, 749-755.
52. Wain-Hobson, S., Sonigo, P., Danos, O., Cole, S. & Alizon, M. (1985) Cell 40, 9-17.
53. Gelmann, E. P., Petri, E., Cetta, A. & Wong-Staal, F. (1982) J. Virol. 41, 593-604.
54. Shoemaker, C., Hoffmann, J., Goff, S. P. & Baltimore, D. (1981) J. Virol. 40, 164-172.
55. Shoemaker, C., Goff, S., Gilboa, E., Paskind, M., Mitra, S. W. & Baltimore, D. (1980) Proc. Natl. Acad. Sci. USA 77, 3932-3936.
56. Ohno, S. (1984) Proc. Natl. Acad. Sci. USA 81, 3781-3785.