Identification of an Overprinting Gene in Merkel Cell Polyomavirus Provides Evolutionary Insight Into the Birth of Viral Genes
Joseph J. Carter (a,b,1,2), Matthew D. Daugherty (c,1), Xiaojie Qi (a), Anjali Bheda-Malge (a,3), Gregory C. Wipf (a), Kristin Robinson (a), Ann Roman (a), Harmit S. Malik (c,d), and Denise A. Galloway (a,b,2)

Divisions of (a) Human Biology, (b) Public Health Sciences, and (c) Basic Sciences and (d) Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA 98109

Edited by Peter M. Howley, Harvard Medical School, Boston, MA, and approved June 17, 2013 (received for review February 24, 2013)

Many viruses use overprinting (alternate reading frame utilization) as a means to increase protein diversity in genomes severely constrained by size. However, the evolutionary steps that facilitate the de novo generation of a novel protein within an ancestral ORF have remained poorly characterized. Here, we describe the identification of an overprinting gene, expressed from an Alternate frame of the Large T Open reading frame (ALTO) in the early region of Merkel cell polyomavirus (MCPyV), the causative agent of most Merkel cell carcinomas. ALTO is expressed during, but not required for, replication of the MCPyV genome. Phylogenetic analysis reveals that ALTO is evolutionarily related to the middle T antigen of murine polyomavirus despite almost no sequence similarity. ALTO/MT arose de novo by overprinting of the second exon of T antigen in the common ancestor of a large clade of mammalian polyomaviruses. Taking advantage of the low evolutionary divergence and diverse sampling of polyomaviruses, we propose evolutionary transitions that likely gave birth to this protein. We suggest that two highly constrained regions of the large T antigen ORF provided a start codon and C-terminal hydrophobic motif necessary for cellular localization of ALTO. These two key features, together with stochastic erasure of intervening stop codons, resulted in a unique protein-coding capacity that has been preserved ever since its birth. Our study not only reveals a previously undefined protein encoded by several polyomaviruses including MCPyV, but also provides insight into de novo protein evolution.

gene evolution | synonymous substitution | disordered motifs

Author contributions: J.J.C., M.D.D., H.S.M., and D.A.G. designed research; J.J.C., M.D.D., X.Q., A.B.-M., G.C.W., and K.R. performed research; A.B.-M. and G.C.W. contributed new reagents/analytic tools; J.J.C., M.D.D., X.Q., A.R., H.S.M., and D.A.G. analyzed data; and J.J.C., M.D.D., H.S.M., and D.A.G. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
(1) J.J.C. and M.D.D. contributed equally to this work.
(2) To whom correspondence should be addressed. E-mail: [email protected].
(3) Present address: Institute for Systems Biology, Seattle, WA 98109.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1303526110/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1303526110 (PNAS Early Edition)

The birth of new genes has fascinated biologists for decades. Although the steps required to generate a new gene by gene duplication or gene rearrangement have been characterized, less is known about the birth of new genes de novo. One particularly intriguing mechanism of de novo gene birth is via “overprinting,” in which a novel overprinting gene is encoded as an alternate ORF within an ancestral “overprinted” gene (1). Overprinting results in two unrelated functional proteins encoded as overlapping ORFs within the same DNA sequence. However, the origins of such a complex evolutionary solution have remained elusive.

Viruses appear to be especially adept at this form of evolutionary innovation. This frequent use of overprinting is likely the result of the severe constraints imposed on viral genome size, making gene innovation more likely to occur as overprinting rather than within a noncoding region (2). Due to the numerous examples of overprinting in single-stranded RNA viruses, a great deal of research has focused in particular on this class of viruses (3–6). However, small DNA viruses, such as adenoviruses, papillomaviruses, and polyomaviruses, have a similar requirement to maximize the coding capacity of their genomes. In this study, we have taken advantage of our identification of an overprinting gene born in the ancestor of a large clade of polyomaviruses to investigate the steps that allowed generation of this unique protein.

Polyomaviruses are nonenveloped viruses containing an ∼5-kb circular double-stranded DNA genome that infect a wide range of mammals and birds (7, 8). Polyomaviruses leverage alternative splicing of the early region (ER) of the genome to generate protein diversity, including the large and small T antigens (LT and ST, respectively) and the middle T antigen (MT) of murine polyomavirus (MPyV), which is generated by a novel splicing event and overprinting of the second exon of LT. Some polyomaviruses can drive tumorigenicity, and gene products from the ER, especially SV40 LT and MPyV MT, have been extraordinarily useful models to study the viral and host processes required for cellular transformation (9–11). More recently, a new oncogenic polyomavirus, Merkel cell polyomavirus (MCPyV), was discovered in Merkel cell carcinoma (MCC), a rare but aggressive form of skin cancer, providing the first established case of a human cancer stemming from a polyomavirus infection (12, 13).

In comparison with other human polyomaviruses, the MCPyV LT protein contains an expanded, highly divergent region of ∼200 amino acids encompassing two conserved short linear motifs (Fig. 1A and SI Appendix, Fig. S1). Given the precedent of protein diversity generated from the ER of polyomaviruses, we investigated the implications of this highly divergent MCPyV LT region. We identified an alternate T antigen ORF, hereafter referred to as ALTO, overprinting within this region in MCPyV LT (Fig. 1A) in the +1 frame relative to the second exon of LT. ALTO is expressed during, but is not required for, viral genome replication. Interestingly, despite almost no sequence similarity, ALTO most probably shares a common evolutionary origin with the other known case of overprinting in polyomaviruses, the second exon of MT from MPyV. Indeed, we find that preservation of the overprinting ALTO/MT ORF defines a large monophyletic clade of mammalian polyomaviruses. By comparing LT genes in a phylogenetic context among polyomaviruses within and outside this clade, we propose an evolutionary model for the generation of a previously undiscovered viral gene de novo by overprinting.

Results

A Previously Uncharacterized Overprinting ORF in MCPyV. We noted that the expanded, highly divergent region of MCPyV LT could encode a previously undefined protein, which we named ALTO. ALTO is the result of an overprinting ORF that is +1 frameshifted [...]

[...] are being evolutionarily preserved and protected from presumably deleterious nonsynonymous changes. We investigated whether the region of LT that overlaps ALTO displays decreased synonymous site variation. We first compared divergence at synonymous sites (dS) in LT genes in sequenced isolates of MCPyV and found that the dS within the ALTO-encoding region is 2.5 times lower than the rest of the gene (dS = 0.048 for the region of overlap vs. 0.12 for nonoverlap). We found a similar pattern when we compared LT genes between MCPyV and its closely related polyomaviruses (ChimpPyV1a, ChimpPyV2a, and GorillaPyV), which also have the capacity to encode ALTO. In all pairwise comparisons, synonymous divergence is decreased directly overlapping the ALTO-encoding region (Fig. 1D), consistent with both reading frames being maintained throughout viral evolution. Interestingly, the ALTO coding sequence was intact in all MCPyV sequences deposited in GenBank from healthy individuals (n = 31) and from nonlesional skin from MCC patients (n = 3) but was predicted to have truncating or frame shift mutations in 19 of 46 MCC tumors (SI Appendix, Table S1). These same mutations have also been previously shown to truncate LT (14).

[Fig. 1 (image not reproduced; labels recovered from the figure):
(A) Plot of % identity (0 to 100) against amino acid position (0 to 900), with a marked "Region of low similarity between MCPyV LT and other human LTs"; below it, a map of MCPyV LT (DnaJ, YGS/T, LxCxE, OBD, ATPase/Helicase) and the position of ALTO.
(B) Early region Transcripts 1 to 4 and reading Frames 1 to 3, labeled Large T antigen (LT), Small T antigen (ST), 57kD T antigen (57kT), and Alternative T antigen open reading frame (ALTO). Nucleotide coordinates shown: 196, 429, 754, 861, 880, 1622, 1624, 2778, 2785, 3078.
(C) ALTO amino acid sequence (nucleotide coordinates shown: 880, 1624, 2785):
MGPLNSKNGGDQEDSASGRHTNMGPIHTGPTQDPESLPPMHPGEPPVEAHHPTARALPLGMGPSQRPRLQ
TPSPEDPIYLPNTMRNPPHPLDPVAERRPPIQEENPAHPMEPVYLEILPERMAPGRISSAMNHFPPLSLP
RPLRSLRSPPPQEARPGSPRLPLPRRPRHLSLQMRNTDPPPSPPRRPLLHSQESENLGGPEALQALLVQQ
VLQALHQSQKRTEKLLFLLIFLLIFLIILAMLYIVIKQ(QI)
(D) Plot of synonymous site divergence (dS, 0 to 1) against amino acid window (0-100 through 700-800) for MCPyV-GorillaPyV1, MCPyV-ChimpPyV1a, and MCPyV-ChimpPyV2a, with a marked "Region of decreased dS"; below it, the same MCPyV LT map and ALTO position as in A.]

ALTO Is Expressed in Cells During Viral DNA Replication. To validate that ALTO is a bona fide MCPyV expressed protein, we transfected an intact circular MCPyV genome (SI Appendix, Fig. S3) into human HEK293 cells and monitored protein expression and viral DNA replication. Under these conditions, the MCPyV genome has been shown to replicate its DNA for several days (19, 20). Lysates from transfected cells were analyzed by immunoblot using immune rabbit serum that had been generated against a mixture of two ALTO peptides (Fig. 1C). We detected ALTO at the expected size (27.6 kDa; Fig. 2A) in cells transfected with wild-type DNA. Importantly, the ALTO protein was not detected using a MCPyV genome bearing a point mutation that eliminated the predicted ALTO initiation codon without affecting the LT encoded protein sequence.
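The logic of a point mutation that removes a start codon in the +1 frame while leaving the overlapping frame unchanged can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sequence `TOY_LT`, the position `ATG_POS`, and the helper names `translate` and `synonymous_start_knockouts` are hypothetical and are not taken from the MCPyV genome or from the constructs used in this study. The sketch assumes the standard genetic code and simply enumerates single-nucleotide substitutions within the +1 frame ATG that are synonymous in frame 0.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table applies).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, start=0):
    """Translate seq from index `start`, ignoring any trailing partial codon."""
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def synonymous_start_knockouts(seq, atg_pos):
    """List (position, old_base, new_base) substitutions inside the ATG at
    `atg_pos` that leave the frame-0 protein unchanged. Any substitution
    inside the ATG destroys it, so only the frame-0 check is needed."""
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    reference_protein = translate(seq, 0)
    hits = []
    for pos in range(atg_pos, atg_pos + 3):
        for base in "ACGT":
            if base == seq[pos]:
                continue
            mutant = seq[:pos] + base + seq[pos + 1:]
            if translate(mutant, 0) == reference_protein:
                hits.append((pos, seq[pos], base))
    return hits


# Hypothetical toy sequence (NOT MCPyV): frame 0 stands in for LT, and the
# ATG at index 4 lies in the +1 frame, standing in for the ALTO start codon.
TOY_LT = "GCTGATGGACCCTTGAACTCC"
ATG_POS = 4

if __name__ == "__main__":
    print("frame 0 protein:", translate(TOY_LT, 0))
    print("+1 frame from ATG:", translate(TOY_LT, ATG_POS))
    print("knockouts:", synonymous_start_knockouts(TOY_LT, ATG_POS))
```

In this toy case the only qualifying change is T to C at the third position of the frame-0 codon GAT, which still encodes Asp in frame 0 but converts the +1 frame ATG to ACG. Because the +1 frame ATG always spans the second and third positions of one frame-0 codon and the first position of the next, such a knockout depends on the frame-0 codon tolerating a third-position (or, more rarely, first-position) synonymous change.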