Proc. Natl. Acad. Sci. USA
Vol. 83, pp. 2531-2535, April 1986
Evolution

Common evolutionary origin of hepatitis B virus and retroviruses
(hepadnaviruses/DNA sequence homology)

ROGER H. MILLER* AND WILLIAM S. ROBINSON
Stanford University School of Medicine, Stanford, CA 94305

Communicated by Robert M. Chanock, December 16, 1985

ABSTRACT Hepatitis B virus (HBV), although classified as a double-stranded DNA virus, has been shown recently to replicate by reverse transcription of an RNA intermediate. Also, the putative viral polymerase has been found to share amino acid homology with reverse transcriptase of retroviruses. Using computer-assisted DNA and protein sequence analyses, we examined the genomes of 13 hepadnavirus isolates (nine human, two duck, one woodchuck, and one ground squirrel) and found that other conserved regions of the hepadnavirus genome share homology to corresponding regions of the genomes of type C retroviruses and retrovirus-like endogenous human DNA elements. Specifically, the most highly conserved sequence of the HBV genome, positioned at or near the initiation site for first-strand HBV DNA synthesis, is homologous over 67 nucleotides to the U5 region, a comparable region in retrovirus long terminal repeats. Within a highly conserved (i.e., 90%) 16-nucleotide sequence a heptanucleotide sequence CCTTGGG is 97% homologous between 27 virus isolates. Also, we found that the highly conserved HBV core, or nucleocapsid, protein shares 41% homology over 98 amino acids with the carboxyl-terminal region of the p30 gag nucleocapsid protein of type C retroviruses. In both cases, as with the previously reported polymerase homology, HBV is most homologous to the murine leukemia/sarcoma retroviruses. Further analysis revealed additional similarities between hepadnavirus and retroviral genomes. Taken together, our results suggest that HBV and retroviruses have a common evolutionary origin, with HBV arising through a process of deletion from a retrovirus, or retrovirus-like, progenitor.

Abbreviations: HBV, hepatitis B virus; LTR, long terminal repeat; bp, base pair(s); DHBV, duck hepatitis B virus; GSHV, ground squirrel hepatitis virus; WHV, woodchuck hepatitis virus; ORF, open reading frame.

*Present address: Hepatitis Section, Laboratory of Infectious Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Hepatitis B virus (HBV) is the smallest double-stranded DNA virus known to infect man and is the prototype of the hepadnavirus family, which includes similar viruses infecting ducks [duck hepatitis B virus (DHBV)], ground squirrels [ground squirrel hepatitis virus (GSHV)], and woodchucks [woodchuck hepatitis virus (WHV)]. Until recently, little was known about HBV replication due to the limited host range of the virus and its inability to be grown in tissue culture. However, experiments using virus-infected liver revealed a novel mode of DNA replication through an RNA intermediate by reverse transcription, a process similar to retrovirus replication (1). This work was soon followed by the discovery of amino acid sequence homology between the putative polymerase protein of HBV and a conserved region of the reverse transcriptase of retroviruses (2, 3) and raised the question of the genetic relatedness of the hepadnavirus and retrovirus families.

The small size of the HBV genome [~3200 base pairs (bp)] makes it especially amenable to nucleotide sequence and biological analysis. At the time of this writing, the nucleotide sequence of 13 viral genomes has been published: nine HBV isolates of surface antigen subtypes adr (4-7), adw (5, 8), ayw (9, 10), and adyw (11), two isolates of DHBV (12, 13), and one isolate each of GSHV (14) and WHV (15). Analysis of the protein-coding capacity of the virus shows that only one viral DNA strand, the minus or long strand, possesses significant protein-coding capability. Four long open reading frames (ORFs) have been assigned to genes specifying the viral core (sometimes including a short precore region), surface (always including a long presurface region), and putative polymerase protein as well as an unknown protein X. All genomes are structurally colinear, with the four ORFs approximately the same length in all isolates examined with the exception of the deletion of the carboxyl-terminal region of the X ORF in DHBV. Of the hepadnavirus proteins encoded by the four viral ORFs, the core, which is the nucleocapsid protein, is the most highly conserved.

We used computer-assisted DNA and protein sequence analyses to compare the genomes of hepadnaviruses and retroviruses and report here that these virus families share previously unreported homology between several well-conserved regions of their genomes, suggesting their common evolutionary origin.

Retroviral U5-like sequence in terminal repeats of the linear HBV RNA genome

In our DNA sequence analysis of hepadnavirus genomes, we found that the region located between the carboxyl-terminal end of ORF 4 (X gene) and the amino-terminal end of ORF 3 (core gene) is the most highly conserved. This region is 99% conserved over a distance of 50 nucleotides in the 11 mammalian hepadnavirus sequences examined. The homology extends into the core ORF, where an mRNA poly(A) addition signal sequence is found, with the additional 33 nucleotides sharing 95% homology. The homology decreases somewhat over the next 28 nucleotides, where the downstream G·T cluster, involved in poly(A) addition (16), is found. Overall, the entire 111-nucleotide region is 95% conserved between mammalian hepadnaviruses.

We used the computer search program IFIND (17) to locate DNA sequences in the National Institutes of Health database homologous to this highly conserved region of the hepadnavirus genome. We find that this sequence is homologous to the U5 region of many type C retrovirus long terminal repeats (LTRs). Alignment of the sequences with the programs ALIGN (18) and SEQ (19) shows that the homology encompasses nucleotides 5-66 of the conserved region of HBV and covers nearly the entire retroviral U5 sequence (Fig. 1 Left). The homology is much less at the extreme 3' end of U5, which contains the short inverted terminal repeat essential for precise integration of retroviral DNA into cellular DNA (24). The most distinctive features in this homologous sequence are the presence of alternating purine and pyrimidine tracts and a well-conserved (i.e., 90% homologous in 14 mammalian type C retroviruses and 13 hepadnavirus isolates) centrally located region of 16 nucleotides (Fig. 1 Right). An internal heptanucleotide sequence CCTTGGG, which we term the "CTG box," is highly conserved (i.e., 97% homologous) between these 27 isolates and is found, with some modifications, in all retroviral U5 sequences examined (unpublished data). Of the 14 homologous mammalian type C retrovirus U5 sequences, the murine leukemia/sarcoma virus sequences are most homologous to those of the hepadnaviruses.

[Fig. 1 Left: alignment of the HEPADNAVIRUS and RETROVIRUS consensus sequences, nucleotide positions 10-60 marked.]

[Fig. 1 Right:]

    ADR (1)    GCTGTGCCTTGGGTGG
    ADR (2)    GCTGTGCCTTGGGTGG
    ADR (3)    GCTGTGCCTTGGGTGG
    ADR (4)    GCTGTGCCTTGGGTGG
    ADW (1)    GCTGTGCCTTGGGTGG
    ADW (2)    GCTGTGCCTTGGGTGG
    ADYW       GCTGTGCCTTGGGTGG
    AYW (1)    GCTGTGCCTTGGGTGG
    AYW (2)    GCTGTGCCTTGGGTGG
    DHBV (1)   ACTGTACCTTTGGTAT
    DHBV (2)   ACTGTACCTTTGGTAT
    GSHV       GCTGTGCCTTGGATGG
    WHV        GCTGTGCCTTGGATGG

    AKR        GCTGATCCTTGGGAGG
    AMLV       GCTGTTCCTTGGGAGG
    BKMV       GCTGATCCTTGGGAGG
    FBJ        GCTGATCCTTGGGAGG
    FESV       GGTGTTCGTTGGGTAC
    FMCF       GCTGTTCCTTGGGAGG
    FSFFV      GCTGATCCTTGGGAGG
    GALV       GCTGTTCCTTGGGAAG
    MMCF       GCTGTTCCTTGGGAGG
    MMLV       GCTGTTCCTTGGGAGG
    MMSV (1)   GCTGTTCCTTGGGAGG
    MMSV (2)   GCTGTTCCTTGGGAGG
    RMLV       GCTGTTCCTTGGGAGG
    SSV        GTTGTTCCTTGGGAGG

FIG. 1. Comparison of the conserved region of hepadnaviruses to the U5 sequence of retroviruses. (Left) The conserved consensus mammalian hepadnavirus sequence (nucleotides 5-66) is aligned with the conserved consensus U5 sequence of the following mammalian type C retroviruses (20-23): AKR murine leukemia virus (AKR), Abelson murine leukemia virus (AMLV), BL/Ka murine nonleukemogenic virus (BKMV), FBJ murine osteosarcoma virus (FBJ), feline sarcoma virus (FESV), Friend mink cell focus-forming virus (FMCF), gibbon ape leukemia virus (GALV), Moloney mink cell focus-forming virus (MMCF), Moloney murine leukemia virus (MMLV), Moloney murine sarcoma virus (MMSV) isolates 1 and 2, Rauscher murine leukemia virus (RMLV), and simian sarcoma virus (SSV). Dashes represent gaps inserted into the sequence to provide maximum alignment. Homologous nucleotides are enclosed within boxes. (Right) Comparison of the homologous central region of 16 nucleotides from the U5-like sequences of all 13 sequenced hepadnavirus genomes to the U5 sequences of the 14 mammalian type C retroviruses cited above.

Recently, retroviral-like elements have been found in

In retroviruses, synthesis of DNA by reverse transcription of RNA initiates at a tRNA molecule hybridized to a complementary sequence on the RNA template, called the primer binding site, adjacent to the U5 region. Synthesis of the first strand of DNA begins with U5 copied first and proceeds through the repeat region (R) to the 5' end of genomic RNA, followed by a process called template switching and synthesis of the remainder of first-strand DNA (see ref. 38 for a review). The short sequence of U5-R DNA made prior to template switching has been termed "strong stop" DNA.
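The 97% figure quoted for the CCTTGGG heptanucleotide can be checked against the 27 sequences listed for Fig. 1 (Right). The following is a minimal sketch, not the authors' procedure: it assumes that "homologous" here means per-position identity to CCTTGGG, and that the heptanucleotide occupies positions 7-13 of each 16-nucleotide sequence (an inference from the listed sequences, not something stated in the text). The names `SEQUENCES`, `CTG_BOX`, `BOX_START`, and `ctg_box_matches` are hypothetical.

```python
# Sequences transcribed from Fig. 1 (Right): 13 hepadnavirus and
# 14 retrovirus isolates, 16 nucleotides each.
SEQUENCES = [
    ("ADR (1)", "GCTGTGCCTTGGGTGG"), ("ADR (2)", "GCTGTGCCTTGGGTGG"),
    ("ADR (3)", "GCTGTGCCTTGGGTGG"), ("ADR (4)", "GCTGTGCCTTGGGTGG"),
    ("ADW (1)", "GCTGTGCCTTGGGTGG"), ("ADW (2)", "GCTGTGCCTTGGGTGG"),
    ("ADYW", "GCTGTGCCTTGGGTGG"), ("AYW (1)", "GCTGTGCCTTGGGTGG"),
    ("AYW (2)", "GCTGTGCCTTGGGTGG"), ("DHBV (1)", "ACTGTACCTTTGGTAT"),
    ("DHBV (2)", "ACTGTACCTTTGGTAT"), ("GSHV", "GCTGTGCCTTGGATGG"),
    ("WHV", "GCTGTGCCTTGGATGG"),
    ("AKR", "GCTGATCCTTGGGAGG"), ("AMLV", "GCTGTTCCTTGGGAGG"),
    ("BKMV", "GCTGATCCTTGGGAGG"), ("FBJ", "GCTGATCCTTGGGAGG"),
    ("FESV", "GGTGTTCGTTGGGTAC"), ("FMCF", "GCTGTTCCTTGGGAGG"),
    ("FSFFV", "GCTGATCCTTGGGAGG"), ("GALV", "GCTGTTCCTTGGGAAG"),
    ("MMCF", "GCTGTTCCTTGGGAGG"), ("MMLV", "GCTGTTCCTTGGGAGG"),
    ("MMSV (1)", "GCTGTTCCTTGGGAGG"), ("MMSV (2)", "GCTGTTCCTTGGGAGG"),
    ("RMLV", "GCTGTTCCTTGGGAGG"), ("SSV", "GTTGTTCCTTGGGAGG"),
]

CTG_BOX = "CCTTGGG"
BOX_START = 6  # assumption: 0-based offset of the box in each 16-mer


def ctg_box_matches(sequences, motif=CTG_BOX, start=BOX_START):
    """Count positions identical to the motif across all sequences.

    Returns (matching_positions, total_positions).
    """
    matches = 0
    total = 0
    for _name, seq in sequences:
        window = seq[start:start + len(motif)]
        matches += sum(a == b for a, b in zip(window, motif))
        total += len(motif)
    return matches, total


if __name__ == "__main__":
    matches, total = ctg_box_matches(SEQUENCES)
    # 184 of 189 positions match, i.e. about 97.4%
    print(f"{matches}/{total} = {100 * matches / total:.1f}%")
```

Under these assumptions the five mismatches come from the two DHBV isolates, GSHV, WHV, and FESV, one position each. A different definition of homology (for example, pairwise comparison between isolates) would give a different number.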