Origin, Evolution and Stability of Overlapping Genes in Viruses: a Systematic Review
Total Page:16
File Type:pdf, Size:1020Kb
G C A T T A C G G C A T genes Review Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review Angelo Pavesi Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, I-43124 Parma, Italy; [email protected]; Tel.: +39-0521905647 Abstract: During their long evolutionary history viruses generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelties, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties, most importantly the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and recognition of their critical role in the evolution of pathogenicity. Origin of overlapping genes, what factors favor their birth and retention, and how they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps. Citation: Pavesi, A. Origin, Evolution and Stability of Keywords: asymmetric evolution; codon usage; de novo protein creation; modular evolution; multi- Overlapping Genes in Viruses: A variate statistics; negative selection: phylogenetic distribution; positive selection; prediction methods; Systematic Review. Genes 2021, 12, sequence-composition features; symmetric evolution; virus evolution 809. https://doi.org/10.3390/ genes12060809 Academic Editors: Luigi Viggiano and Renè Massimiliano Marsano 1. Introduction Modification of existing genes, such as duplication followed by functional divergence, Received: 6 May 2021 fusion (two adjacent genes fuse into a single gene), fission (a single gene splits into two Accepted: 24 May 2021 genes), exon shuffling (rearrangement of protein modules), or horizontal gene transfer Published: 26 May 2021 (gene exchange between unrelated species), is a common mechanism by which new genes arose during the evolution of living organisms [1–4]. However, genes can also originate de Publisher’s Note: MDPI stays neutral novo by taking place in non-coding regions, such as intergenic regions or introns [5,6]. with regard to jurisdictional claims in During their long evolutionary history viruses generated many proteins de novo published maps and institutional affil- by a mechanism called “overprinting” [7]. Overprinting is a process in which critical iations. nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF), while preserving the function of pre-existing gene [8]. It is thought that most overlapping genes evolve by this mechanism, and that consequently each overlap contains one ancestral frame and one originated de Copyright: © 2021 by the author. novo [9]. It is also believed that overprinting is a source of genetic novelties, because the de Licensee MDPI, Basel, Switzerland. novo proteins, unlike the ancestral ones, usually lack any remote homologs in databases [10]. This article is an open access article Most of new genes originated by overprinting are expressed by the sense strand. distributed under the terms and They are classified as same-strand, or parallel, overlapping genes because of transcrip- conditions of the Creative Commons tion from the same strand of DNA. They are usually denoted as +1 overlapping genes, Attribution (CC BY) license (https:// when the de novo frame is shifted one nucleotide 30 with respect to the ancestral frame creativecommons.org/licenses/by/ (Figure1A), or as +2 overlapping genes when the de novo frame is shifted two nucleotides 4.0/). Genes 2021, 12, 809. https://doi.org/10.3390/genes12060809 https://www.mdpi.com/journal/genes Genes 2021, 12, x FOR PEER REVIEW 2 of 26 Genes 2021, 12, 809 from the same strand of DNA. They are usually denoted as +1 overlapping genes, when2 of 25 the de novo frame is shifted one nucleotide 3′ with respect to the ancestral frame (Figure 1A), or as +2 overlapping genes when the de novo frame is shifted two nucleotides 3′ (Fig- ure 1B). According to genetic code, 71.6% of substitutions in the third codon position are 30 (Figure1B). According to genetic code, 71.6% of substitutions in the third codon position synonymous, compared to only 0 and 4.6% of substitutions in the second and first codon are synonymous, compared to only 0 and 4.6% of substitutions in the second and first positions respectively. In overlapping genes, therefore, a nucleotide substitution that is codon positions respectively. In overlapping genes, therefore, a nucleotide substitution that synonymous in one frame is highly likely to be non-synonymous in the alternative frame. is synonymous in one frame is highly likely to be non-synonymous in the alternative frame. Figure 1. Orientation of same-strand overlapping genes. (A) Overlapping gene with the downstream Figureframe 1. shifted Orientation one nucleotide of same-strand 30 with overlapping respect to genes. the upstream (A) Overlapping frame. There gene arewith 3 the types down- of codon streamposition frame (cp): shifte cp13d (bold one nucleotide character), 3 in′ with which respect the first to codonthe upstream position frame. of the There upstream are 3 frame types overlaps of codon position (cp): cp13 (bold character), in which the first codon position of the upstream frame the third codon position of the downstream frame; cp21 (underlined character), in which the second overlaps the third codon position of the downstream frame; cp21 (underlined character), in which codon position of the upstream frame overlaps the first codon position of the downstream frame; the second codon position of the upstream frame overlaps the first codon position of the down- streamcp32 (italicframe; character), cp32 (italic in charac whichter), the in third which codon the position third codon of the po upstreamsition of the frame upstream overlaps frame the second overlapscodon position the second of the codon downstream position of frame. the downstream (B) Overlapping frame. gene (B) with Overlapping the downstream gene with frame the shifted 0 downstreamtwo nucleotides frame 3 shiftedwith respecttwo nucleotides to the upstream 3′ with respect frame. to There the upst are 3ream types frame. of codon There position are 3 (cp): typescp12 of (bold codon character), position in(cp): which cp12 the (bold first character), codon position in which of thethe upstreamfirst codon frame position overlaps of the the up- second streamcodon frame position overlaps of the the downstream second codon frame; positi cp23on (underlinedof the downstream character), frame; in whichcp23 (underlined the second codon character),position ofin thewhich upstream the second frame codon overlaps position the thirdof the codonupstream position frame of overlaps the downstream the third codon frame; cp31 position(italic character), of the downstream in which frame; the third cp31 codon (italic position character), of the in upstreamwhich the frame third codon overlaps position the first of codonthe upstream frame overlaps the first codon position of the downstream frame. According to the ge- position of the downstream frame. According to the genetic code, a nucleotide substitution at first netic code, a nucleotide substitution at first codon position causes an amino acid change in 95.4% codon position causes an amino acid change in 95.4% of cases, at second position in 100% of cases, of cases, at second position in 100% of cases, and at third position in 28.4% of cases. and at third position in 28.4% of cases. OneOne of of the the reasons reasons why why overlapping overlapping genes genes ha haveve long attracted attentionattention ofof researchersresearch- ersis thatis that they they represent represent an an intriguing intriguing example example of adaptiveof adaptive conflict. conflict. Indeed, Indeed, they they simultane- simul- taneouslyously encode encode two two proteins proteins whose whose freedom freedom to change to change is constrained is constrained by each by other,each other, which whichwould would be expected be expected to severely to severely reduce reduce the ability the ability of the of virus the virus to adapt. to adapt. On the On other the other hand, hand,the unusual the unusual constraints constraints under under which which alternative alternative ORFs evolveORFs evolve can give can rise give to proteinsrise to pro- with teinsunusual with sequenceunusual properties,sequence properties, most importantly most importantly the potential the for potential novel structural for novel folds struc- and turalmechanisms folds and of mechanisms action. of action. ThisThis review review deals deals with with the the origin origin of of overlapping overlapping genes, genes, what what factors factors favor favor their their birth birth andand