Pseudogenes: Pseudo-Functional Or Key Regulators in Health and Disease?
Total Page:16
File Type:pdf, Size:1020Kb
Downloaded from rnajournal.cshlp.org on October 1, 2021 - Published by Cold Spring Harbor Laboratory Press REVIEW Pseudogenes: Pseudo-functional or key regulators in health and disease? RYAN CHARLES PINK, KATE WICKS, DANIEL PAUL CALEY, EMMA KATHLEEN PUNCH, LAURA JACOBS, and DAVID RAUL FRANCISCO CARTER School of Life Sciences, Oxford Brookes University, Headington, Oxford, OX3 0BP, United Kingdom ABSTRACT Pseudogenes have long been labeled as ‘‘junk’’ DNA, failed copies of genes that arise during the evolution of genomes. However, recent results are challenging this moniker; indeed, some pseudogenes appear to harbor the potential to regulate their protein-coding cousins. Far from being silent relics, many pseudogenes are transcribed into RNA, some exhibiting a tissue- specific pattern of activation. Pseudogene transcripts can be processed into short interfering RNAs that regulate coding genes through the RNAi pathway. In another remarkable discovery, it has been shown that pseudogenes are capable of regulating tumor suppressors and oncogenes by acting as microRNA decoys. The finding that pseudogenes are often deregulated during cancer progression warrants further investigation into the true extent of pseudogene function. In this review, we describe the ways in which pseudogenes exert their effect on coding genes and explore the role of pseudogenes in the increasingly complex web of noncoding RNA that contributes to normal cellular regulation. Keywords: pseudogenes; functional; noncoding RNA; transcription; RNA INTRODUCTION transcription or translation of the gene (Fig. 1A) result in the formation of ‘‘unitary’’ pseudogene (Zhang et al. 2010). The human genome, like that of other mammals, is littered Duplicated pseudogenes are created via tandem duplication with a variety of repetitive elements and noncoding genes. or uneven crossing-over (Mighell et al. 2000). These du- One such element is the pseudogene, a poor facsimile of an plicated genes lose their protein-coding potential due to the original protein-coding gene that has lost the ability to loss of promoters or enhancers or crippling mutations such produce a functional protein (Mighell et al. 2000). Because as frameshifts or premature stop codons (Fig. 1B). How- they do not code for proteins, pseudogenes are often ever, they do tend to retain their characteristic intron-exon assumed to be nonfunctional and labeled as ‘‘junk DNA.’’ structure. In contrast, retrotransposed or ‘‘processed’’ While some pseudogenes are transcriptionally silent, others pseudogenes (PPs) are produced when an mRNA transcript are active, raising the question of whether their noncoding is reverse-transcribed and integrated into the genome at a transcripts are a spurious use of cellular energy or instead new location (Fig. 1C), and therefore, they do not normally harnessed by the cell to regulate coding genes (Balakirev contain introns (Maestre et al. 1995; D’Errico et al. 2004). and Ayala 2003). This question is particularly pertinent Other common features of PPs are their polyA tracts and given the recent flurry of evidence suggesting that long direct repeats at either end of the pseudogene (Maestre noncoding RNAs play a critical role in regulating genomic et al. 1995; D’Errico et al. 2004). The retrotransposition of function (Mattick and Makunin 2006; Guttman et al. 2009; mRNAs into the genome appears to be mediated by long Caley et al. 2010). interspersed nuclear element 1 (L1) (Esnault et al. 2000; Pseudogenes can arise through a variety of mechanisms. Ding et al. 2006), and the transcriptional activity of the Spontaneous mutations in a coding gene that prevent either resulting PP depends on whether the integration event oc- curs close to another promoter (Zheng et al. 2007). The collection of processed pseudogenes in the human genome Reprint requests to: David Raul Francisco Carter, School of Life has been generated from just 10% of the coding genes Sciences, Oxford Brookes University, Gipsy Lane, Headington, Oxford, (Ohshima et al. 2003; Zhang et al. 2003). Highly expressed OX3 0BP, UK; email: [email protected]; fax: 44 1865 483242. Article published online ahead of print. Article and publication date are housekeeping genes are more likely to produce PPs, as are at http://www.rnajournal.org/cgi/doi/10.1261/rna.2658311. other highly transcribed shorter RNAs (Goncxalves et al. RNA (2011), 17:00–00. Published by Cold Spring Harbor Laboratory Press. Copyright Ó 2011 RNA Society. 1 Downloaded from rnajournal.cshlp.org on October 1, 2021 - Published by Cold Spring Harbor Laboratory Press Pink et al. neither selected for or against (Li et al. 1981). However, this premise relies on the assumption that pseudogenes are functionally inert. There is recent evi- dence that some pseudogenes are func- tionally active, and therefore, studying their evolution and conservation could support a functional role and give in- sight into their potential mechanism of action. The synonymous to nonsynonymous substitution rate (KA/KS) is a measure of the proportion of mutations in DNA sequence that also alter amino acid se- quence; it is often used to assess whether a sequence is under evolutionary con- FIGURE 1. Types of pseudogene. (A) Mutation of existing genes gives rise to unitary straint (Hurst 2002; Torrents et al. 2003). pseudogenes. (B) Duplicated pseudogenes are produced following mutation of copied genes. (C) Reverse transcription of mRNA into cDNA followed by retrotransposition into genomic It has been reported recently (for review, DNA leads to the generation of processed pseudogenes. Filled boxes represent exons; open boxes see Balakirev and Ayala 2003) that syn- represent introns; X represents a crippling mutation that ablates protein-coding potential. onymous point mutations occur far more frequently than do nonsynony- 2000). This is exemplified by the small number of genes mous base changes in the Drosophila Est-6 pseudogene encoding ribosomal proteins that account for z20% of (Balakirev et al. 2003). In chickens, it has been observed human PPs (Zhang et al. 2002). that, within the multiple IglV and IghV pseudogenes, the The term pseudogene was coined in 1977, when Jacq and number of stop codons contained in the ‘‘coding’’ sequence co-workers discovered a version of the gene coding for 5S is far lower than would be expected were nucleotide sub- rRNA that was truncated but retained homology with the stitutions occurring at random; in addition, the majority active gene in Xenopus laevis (Jacq et al. 1977). During the of stop codons introduced by point mutations are then following two decades, pseudogenes were discovered in ‘‘corrected’’ and eliminated by further point mutations a sporadic fashion (Mighell et al. 2000). The acceleration in within the same codon (Rothenfluh et al. 1995). This feature, next-generation sequencing technologies coupled with the which is also observed in VH pseudogenes in mice (Schiff monumental human genome project has yielded the full et al. 1985), may indicate either that some presumed genome sequences of a range of organisms, permitting pseudogenes are, in fact, protein-coding, or that the con- much more thorough analyses of pseudogene prevalence servation of open reading frames plays a role in any puta- (Ohshima et al. 2003; Torrents et al. 2003; Zhang et al. tive function of pseudogenes involved in somatic gene 2003). Remarkably, pseudogenes are almost as numerous as rearrangements (Balakirev and Ayala 2003). coding genes, with predictions ranging from 10,000 to A recent report described a region of the 39 UTR of the 20,000 human pseudogenes (Zhang and Gerstein 2004). The PTENP1 pseudogene that shared over 95% sequence ho- majority of human pseudogenes are PPs (Ohshima et al. mology with the PTEN coding gene (Poliseno et al. 2010). 2003; Torrents et al. 2003; Zhang et al. 2003), while the The functional implications of this pattern and high degree number of unitary pseudogenes in the human genome is of conservation are discussed in further detail below. <100 (Zhang et al. 2010). Pseudogenes are present in a wide Pseudogenes gradually accumulate mutations, and the range of species, including plants (Loguercio and Wilkins number of mutations can give us an estimate of their age. 1998; Benovoy and Drouin 2006), bacteria (Ochman and Fascinatingly, the appearance of Alu elements in Old World Davalos 2006) [though they are not as numerous in unicel- primates coincided with the peak of processed pseudogene lular organisms (Lawrence et al. 2001)], insects (Ramos- generation and subsequent radiation of primates z40 million Onsins and Aguade´ 1998; Harrison et al. 2003), and nema- years ago (Ohshima et al. 2003; Zhang et al. 2003). Con- tode worms (Harrison et al. 2001), but they are particularly servation of pseudogenes across different species has also numerous in mammals (Zhang and Gerstein 2004). been observed. Analysis of the rhesus macaque major his- tocompatibility complex (MHC) extended class II region re- vealed two pseudogenes that were found to be homologous EVOLUTION AND CONSERVATION to the human HIV TAT-specific factor-1-like and zinc finger- OF PSEUDOGENES like pseudogenes, which was suggestive of evolutionary Pseudogenes are sometimes considered to represent ‘‘neu- conservation (Sudbrak et al. 2003). Investigations by Podlaha tral sequence,’’ in which mutations that accumulate are and colleagues (Podlaha and Zhang 2004) demonstrated that 2 RNA, Vol. 17, No. 5 Downloaded from rnajournal.cshlp.org on October 1, 2021 - Published by Cold Spring Harbor Laboratory