npg 508 Cell Research (2010) 20:508-509. npg © 2010 IBCB, SIBS, CAS All rights reserved 1001-0602/10 $ 32.00 RESEARCH HIGHLIGHT www.nature.com/cr

Decoding the dual-coding region: key factors influencing the translational potential of a two-ORF-containing transcript

Han Liang1

1Department of Bioinformatics and Computational Biology, The University of Texas M. D. Anderson Center, 1515 Holcombe Blvd. Houston, TX 77030, USA Cell Research (2010) 20:508-509. doi:10.1038/cr.2010.62; published online 3 May 2010

A dual-coding region is defined as A A a stretch of DNA that encodes amino acids in overlapping reading frames. Gene B Dual-coding regions are often found in bacteriophages and viruses (e.g., HIV) B with tiny sizes; such an arrange- ment is believed to greatly increase genetic information storage efficiency [1]. In mammals, genetic information storage is not an issue because the mam- malian genome is huge and contains C large amounts of non-coding sequences. Coding regions usually amino acids only in one reading frame, but there are some exceptions. Generally speaking, dual-coding regions can arise Figure 1 Three types of dual-coding regions: (A) from nearby overlapping from three sources: (1) from nearby (two genes can be either in the same strand or in opposite strands); (B) from alternatively spliced transcripts of the same gene, where the black boxes indicate overlapping genes (two genes can be ; and (C) from different translational initiation sites of a single transcript. either in the same strand or in opposite strands); (2) from alternatively spliced transcripts of the same gene; and (3) from different translational initiation Most intriguing among the three amount and significance of true dual- sites of a single transcript (Figure 1). sources of dual-coding regions men- coding regions will remain unclear. A Dual-coding regions have recently tioned previously is that which arises fundamental question is what factors attracted wide interest [2-4]. The iden- from different translational initiation determine the on/off function of the two tification and characterization of these sites on a single transcript. This con- products potentially encoded special coding regions will improve the struct involves the generation of two in a two-ORF-containing transcript. current gene/genome annotation, and distinct products from the same Xu and colleagues address this ques- contribute to a deeper understanding of mRNA transcript. Although current tion in a recently published article [5], the protein mechanism. bioinformatic approaches can readily in which they describe an elegant in identify hidden reading frames in anno- vitro dual-coding expression system tated frames and can define transcripts they developed. In their study, two Correspondence: Han Liang that contain two open reading frames fluorescent (green fluorescent Tel: 1-713-745-9815; Fax: 1-713-563-4242 (ORFs) [2], without experimental protein [GFP] and red [RFP]) were E-mail: [email protected] evidence at the translational level, the encoded in an artificial transcript. The

Cell Research | Vol 20 No 5 | May 2010 npg 509 GFP (longer ORF) and RFP (shorter implies that the evolutionary processes Finally, the in vitro study, in a sense, ORF) started with the first AUG codon that involve these special transcripts only examines the translational potential (the start codon) in two overlapping are quite dynamic, which is similar of a transcript in a simplified environ- ORF configurations, respectively. For to our understanding of microRNAs ment. The translation of cellular mes- each configuration, the combinations [7]. Conducting additional translation senger RNA in vivo is a more complex of strong and/or weak Kozak motifs on an already-existing transcript may process and often involves many factors [6] were respectively introduced to provide an energetically-efficient way to that are specific to a given tissue or cell flank the two AUG codons. Thus, Xu generate a pool of raw materials within type. Nevertheless, Xu et al. [5] obtain et al. [5] were able to systematically the evolutionary process. But such an a set of experimentally-determined and examine the impact of three key factors overlapping reading fame arrangement biologically-sensible rules to identify on the translational activity of a two- comes with a cost: the constraints of an dual-coding transcripts, which provides ORF-containing transcript: the position existing reading frame do not allow for a valuable starting point for further of the start codon, the ORF length and free exploration of the space investigation. the strength of the Kozak motif. Using in a second reading frame. Thus, a new fluorescence imaging techniques and protein product is only occasionally References western blots, they detected dual protein selectively favored and maintained in products in four out of the eight tran- long-term evolution. 1 Normark S, Bergstrom S, Edlund T, et script subtypes under survey. Among the The study of Xu et al. [5] provides al. Overlapping genes. Annu Rev Genet ORFs, the vast majority (7/8) of ORFs crucial insights into characterizing 1983; 17:499-525. 2 Chung WY, Wadhawan S, Szklarczyk with the first AUG and the longer ORFs dual-coding transcripts in mammalian R, Pond SK, Nekrutenko A. A first were translated. For the first time, these , and raises some interesting look at ARFome: dual-coding genes in results have established a set of explicit questions. First, to what extent can mammalian genomes. PLoS Comput rules for predicting whether a two-ORF- the observation based on the artificial Biol 2007; 3:e91. containing transcript can generate two dual-coding transcript be generalized to 3 Liang H, Landweber LF. A genome-wide protein products simultaneously. other transcripts? In the experiment, the study of dual coding regions in human Using the rules inferred from their first and second AUG codons are only alternatively spliced genes. Genome Res 2006; 16:190-196. experimental studies, Xu et al. [5] fur- a few away, and the GFP 4 Tress ML, Martelli PL, Frankish A, et al. ther performed a bioinformatic analysis ORF is about two times longer than the The implications of alternative splicing to screen the dual-coding transcripts in RFP ORF. These parameters often vary in the ENCODE protein complement. the human and mouse genomes. They greatly from transcript to transcript, Proc Natl Acad Sci USA 2007; 104:5495- found that about 170 human transcripts and so the real predictive power of the 5500. are potentially dual coding and only 18 proposed model remains to be seen. 5 Xu H, Wang P, Fu Y, et al. Length of are conserved in the mouse genome. Moreover, the rules obtained from the the ORF, position of the first AUG and When the effect of nonsense-mediated study are in a qualitative format, but the Kozak motif are important factors in potential dual-coding transcripts. Cell mRNA decay was considered, those the relative expression levels of the Res 2010; 20:445-457. numbers were reduced to 80 and 9, two protein products show significant 6 Kozak M. Context effects and inefficient respectively. Only a relatively small variations among different transcript initiation at non-AUG codons in percentage of dual-coding transcripts subtypes. Hence, it would be desirable eucaryotic cell-free translation systems. are conserved between the two species, to build a quantitative model to more Mol Cell Biol 1989; 9:5073-5080. suggesting a recent origin of most dual- accurately predict the translational 7 Liang H, Li WH. Lowly expressed coding transcripts. This observation also behavior of a dual-coding transcript. human microRNA genes evolve rapidly. Mol Biol Evol 2009; 26:1195-1198.

www.cell-research.com | Cell Research