BMC Genomics Biomed Central

BMC Genomics BioMed Central Research article Open Access Evidence for conservation and selection of upstream open reading frames suggests probable encoding of bioactive peptides Mark L Crowe*†1,2, Xue-Qing Wang†3 and Joseph A Rothnagel1,2,3 Address: 1The Australian Research Council Special Research Centre for Functional and Applied Genomics, The University of Queensland, Brisbane, Queensland 4072, Australia, 2Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia and 3School of Molecular and Microbial Sciences, The University of Queensland, Brisbane, Queensland 4072, Australia Email: Mark L Crowe* - [email protected]; Xue-Qing Wang - [email protected]; Joseph A Rothnagel - [email protected] * Corresponding author †Equal contributors Published: 26 January 2006 Received: 05 September 2005 Accepted: 26 January 2006 BMC Genomics 2006, 7:16 doi:10.1186/1471-2164-7-16 This article is available from: http://www.biomedcentral.com/1471-2164/7/16 © 2006 Crowe et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: Approximately 40% of mammalian mRNA sequences contain AUG trinucleotides upstream of the main coding sequence, with a quarter of these AUGs demarcating open reading frames of 20 or more codons. In order to investigate whether these open reading frames may encode functional peptides, we have carried out a comparative genomic analysis of human and mouse mRNA 'untranslated regions' using sequences from the RefSeq mRNA sequence database. Results: We have identified over 200 upstream open reading frames which are strongly conserved between the human and mouse genomes. Consensus sequences associated with efficient initiation of translation are overrepresented at the AUG trinucleotides of these upstream open reading frames, while comparative analysis of their DNA and putative peptide sequences shows evidence of purifying selection. Conclusion: The occurrence of a large number of conserved upstream open reading frames, in association with features consistent with protein translation, strongly suggests evolutionary maintenance of the coding sequence and indicates probable functional expression of the peptides encoded within these upstream open reading frames. Background ecule until reaching the first AUG, at which point transla- The 5' untranslated regions (5' UTRs) of vertebrate tion is initiated. Further work has since resulted in mRNAs typically vary from a few tens of bases up to sev- refinements to this model, such as the recognition of leaky eral hundred bases in length, and contain a variety of fea- scanning, where uAUGs in a sequence context which var- tures which affect the efficiency of translation of the main ies from the optimal consensus for initiation of transla- coding sequence (CDS) of the transcript. These include tion are bypassed by the ribosome complex, and of the length and secondary structure of the 5' UTR, the reinitiation, where the ribosome resumes scanning after sequence context of the initiation codon of the main CDS translation of upstream open reading frames (uORFs) and the presence of upstream AUG codons (uAUGs). The (reviewed in [2]). original scanning model of mRNA translation (reviewed in [1]) postulates that the translating ribosome enters the Both the leaky scanning and ribosome reinitiation mech- mRNA at the 5' end and processes linearly down the mol- anisms typically result in a decrease in efficiency of trans- Page 1 of 10 (page number not for citation purposes) BMC Genomics 2006, 7:16 http://www.biomedcentral.com/1471-2164/7/16 Table 1: Frequency of upstream AUGs and ORFs. Human Mouse Number of mRNA sequences in initial dataset (1) 16504 11291 Number of mRNA sequences containing > 1 uAUG (2) 9531 6352 Total number of uAUGs (3) 35599 24308 Number of mRNA sequences containing > 1 uORF (4) 4557 2820 Total number of uORFs (5) 8216 5487 Number of mRNA sequences containing > 1 uORF after removal of duplicates (6) 3924 2795 Total number of uORFs after removal of duplicates (7) 7138 5430 Number of mRNA sequences containing > 1 uORF after removal of blast matches (8) 3650 2678 Total number of uORFs after removal of blast matches (9) 6454 5089 Frequency of upstream AUGs and ORFs in human and mouse RefSeq mRNA sequences. Initial dataset refers to all sequences of > 60 nucleotides annotated as 5' UTRs. Numbers in parentheses indicate the corresponding stage in the filtering flowchart (figure 1). lation from subsequent initiation codons, in some cases Despite the examples above, most studies on the effects of virtually eliminating translation of the main CDS, uAUGs have considered the role of the uAUG as acting although more usually causing between two and 50-fold solely as an alternative site of ribosome initiation and reduction in protein levels [3-8]. It has therefore been pro- therefore to repress translation of the main CDS; the role posed that the presence of uAUGs may represent a of the associated uORF is apparently considered largely method of post-transcriptional regulation, through incidental. Even in reports describing the function of the repression of translation from the main AUG. This theory uORF itself, the role of the translated peptide is primarily is supported by the identification of transcriptional and a cis-acting one, either causing ribosome stalling at the ter- splice variants with an identical main CDS but with 5' mination codon of the uORF and a high level of repres- UTRs containing varying numbers of uAUGs [6,9,10], as sion of the main CDS [25,26] or, more occasionally, well as cases of an uAUG repressing translation in one tis- affecting mRNA stability. In this report we describe our sue type while not affecting it in another [11,12]. identification of a substantial number of uORF-encoded peptides (uPEPs) which are highly evolutionarily con- Furthermore a number of diseases have been associated served and which we propose have functions beyond that with uAUGs. These are caused either through mutations of simply reducing translation of the main CDS. introducing or eliminating uAUGs, resulting in a conse- quent decrease or increase in translation efficiency [13- Results 16], or by physiological changes which may affect the Presence of uAUGs and uORFs degree of repression by a particular uAUG [17] or may For this experiment, we restricted our analysis to uORFs of change splicing patterns of the 5' UTR to alter the number 20 to 99 codons in length. While these specific values are of uAUGs [6]. somewhat arbitrary, we chose them to maximise the prob- ability that the uORFs chosen for further analysis were In addition to repression of translation by uAUGs, addi- genuine and not affected by sequencing errors or cloning tional regulation of mRNAs containing uORFs can be artefacts. While it is possible that a uORF of less than 20 brought about by the uORF affecting transcript stability. codons could encode a functional peptide, ORFs longer The premature termination codon model for nonsense- than this are potentially easier to identify as conserved mediated decay (NMD) of mRNA (reviewed in [18-20]) between species and furthermore would be more amena- suggests that transcripts containing uORFs would effec- ble to subsequent experimental validation. We chose the tively have a premature termination codon at the end of upper limit after our initial results indicated that only a the uORF. Translation of the uORF would reduce progres- small fraction of uORFs (~5%) were more than 100 sion of the ribosome to the main CDS and ultimately to codons long, so their exclusion was unlikely to greatly its termination codon, and therefore may result in bias the analysis, while their inclusion would lead to the increased targetting of these molecules for NMD, thereby potential for contamination of our results by artefacts reducing the steady-state level of that particular transcript. such as co-ligated cDNA clones or misidentification of the Some uORFs do appear to have such a role in destabiliza- main CDS in a cDNA clone. tion of their mRNA by promoting NMD [21-23] while others have been shown to similarly destabilize mRNA, In the RefSeq release 6 database, there are 21768 human but via an NMD-independent pathway [24]. and 17106 mouse mature mRNA sequences. Of these, Page 2 of 10 (page number not for citation purposes) BMC Genomics 2006, 7:16 http://www.biomedcentral.com/1471-2164/7/16 Table 2: Frequency of downstream AUGs and ORFs. Human Mouse Number of mRNA sequences in initial dataset (1) 21597 14790 Number of mRNA sequences containing > 1 dAUG (2) 19853 14327 Total number of dAUGs (3) 352301 229945 Number of mRNA sequences containing > 1 dORF (4) 16965 11598 Total number of dORFs (5) 85876 55815 Number of mRNA sequences containing > 1 dORF after removal of duplicates (6) 14252 11377 Total number of dORFs after removal of duplicates (7) 69202 54403 Number of mRNA sequences containing > 1 dORF after removal of blast matches (8) 13899 11299 Total number of dORFs after removal of blast matches (9) 65258 52786 Frequency of downstream AUGs and ORFs (dAUGs and dORFs) in human and mouse RefSeq mRNA sequences. Initial dataset refers to all sequences of > 60 nucleotides annotated as 3' UTRs. Numbers in parentheses indicate the corresponding stage in the filtering flowchart (figure 1). approximately 75% and 66% respectively have an anno- in the 5' UTR. By using a relatively stringent filter, we tated 5' UTR of 60 bases or more in length (exact figures removed approximately 10% of the uPEPs which showed given in table 1). The difference in the proportion of long more than a small degree of similarity to proteins in the 5' UTRs is primarily because of a much higher number of NCBI non-redundant peptide sequence database, thus mouse sequences containing only the main CDS, with no minimising the possibility that the uORFs are in fact part 5' UTR given (11% of mouse sequences, compared with of a main CDS, and appear to be in the 5' UTR as a conse- only 4.5% of human ones).

Load more