Re-Recognition of Pseudogenes: from Molecular to Clinical Applications Xu Chen1, Lin Wan2, Wei Wang1, Wen-Jin Xi1, An-Gang Yang1 and Tao Wang3
Total Page:16
File Type:pdf, Size:1020Kb
Theranostics 2020, Vol. 10, Issue 4 1479 Ivyspring International Publisher Theranostics 2020; 10(4): 1479-1499. doi: 10.7150/thno.40659 Review Re-recognition of pseudogenes: From molecular to clinical applications Xu Chen1, Lin Wan2, Wei Wang1, Wen-Jin Xi1, An-Gang Yang1 and Tao Wang3 1. State Key Laboratory of Cancer Biology, Department of Immunology, Fourth Military Medical University, Xi’an, Shaanxi, 710032, P.R. China. 2. Department of Hematology and Oncology, Children’s Hospital of Soochow University, Suzhou, Jiangsu, 215025, P.R. China. 3. Department of Medical Genetics and Developmental Biology, Fourth Military Medical University, Xi’an, Shaanxi, 710032, P.R. China. Corresponding authors: Tao Wang, Department of Medical Genetics and Developmental Biology, Fourth Military Medical University, #169 Changle West Road, Xi'an, Shaanxi, 710032, P.R. China. Tel.: 86-29-84772709; Fax: 86-29-83253816; E-mail: [email protected] OR An-Gang Yang, State Key Laboratory of Cancer Biology, Department of Immunology, Fourth Military Medical University, #169 Changle West Road, Xi'an, Shaanxi, 710032, P.R. China. Tel.: 86-29-84772703; Fax: 86-29-83253816; E-mail: [email protected] © The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/). See http://ivyspring.com/terms for full terms and conditions. Received: 2019.09.25; Accepted: 2019.11.12; Published: 2020.01.01 Abstract Pseudogenes were initially regarded as “nonfunctional” genomic elements that did not have protein-coding abilities due to several endogenous inactivating mutations. Although pseudogenes are widely expressed in prokaryotes and eukaryotes, for decades, they have been largely ignored and classified as gene “junk” or “relics”. With the widespread availability of high-throughput sequencing analysis, especially omics technologies, knowledge concerning pseudogenes has substantially increased. Pseudogenes are evolutionarily conserved and derive primarily from a mutation or retrotransposon, conferring the pseudogene with a “gene repository” role to store and expand genetic information. In contrast to previous notions, pseudogenes have a variety of functions at the DNA, RNA and protein levels for broadly participating in gene regulation to influence the development and progression of certain diseases, especially cancer. Indeed, some pseudogenes have been proven to encode proteins, strongly contradicting their “trash” identification, and have been confirmed to have tissue-specific and disease subtype-specific expression, indicating their own value in disease diagnosis. Moreover, pseudogenes have been correlated with the life expectancy of patients and exhibit great potential for future use in disease treatment, suggesting that they are promising biomarkers and therapeutic targets for clinical applications. In this review, we summarize the natural properties, functions, disease involvement and clinical value of pseudogenes. Although our knowledge of pseudogenes remains nascent, this field deserves more attention and deeper exploration. Key words: pseudogene, classification, function, diagnosis, prognosis, therapeutics Introduction Since the completion of the Human Genome evolved from neutral selection without coding ability. Project, multiple genomic sequencing analyses have In that period, pseudogenes, along with other been successively accomplished in different noncoding factors, were all categorized as “trash” organisms, providing numerous clues for the sequences. thorough identification of the genome, transcriptome The first pseudogene was identified in 1977 and proteome. According to sequencing data, the when several mutations were simultaneously entire human genome possesses approximately more discovered in its DNA sequence. Due to internal than three billion bases; however, only 2% of DNA mutations, the pseudogene lost its coding ability and sequences encode “functional” proteins [1], and the served as a homologous gene copy as its counterpart other 98% are regarded as “trash” elements that in the genome [2]. Since then, pseudogenes have been http://www.thno.org Theranostics 2020, Vol. 10, Issue 4 1480 broadly identified in a series of organisms ranging base pair (bp)-deficiency and a 14 bp-mismatch from prokaryotes to eukaryotes [3]. Nevertheless, condition within the 5’-terminal of this copy. In because of the previous “nonfunctional” label, addition, its mRNA expression could barely be pseudogenes have for decades been considered as detected, suggesting that this gene possessed no “junk DNA”, “genomic fossils” and “gene relics”; a coding capability and was considered number of strategies were even developed to focus on “nonfunctional”. It was presumed that this type of eliminating a pseudogene when attempting to aberrant gene, displaying high sequence homology to determine its parental gene [4, 5]. With the wide functional genes, lost its coding ability due to application of next-generation sequencing technology, different mutations, such as a frameshift mutation or pseudogenes have gradually been found to exert premature stop codon in the genome, and was termed parental gene-dependent and parental gene- a “pseudogene” [11]. independent functions at the DNA, RNA and protein Since then, a large number of pseudogenes have levels; these sequences are thus involved in been gradually discovered from monocellular transcriptional and posttranscriptional modulation, organisms to multicellular organisms and from participating in the physiological maintenance of prokaryotes to eukaryotes [12] with the aid of endogenous homeostasis and in the pathological next-generation sequencing technologies. However, process of disease. Notably, a small fraction of because of the high homology of pseudogene pseudogenes reportedly maintain or have regained sequences to those of parental genes (termed protein-coding capacity [6], suggesting that “ancestral gene”, “cognate gene” or “counterpart”), pseudogenes also act as conspicuous elements that an emerging issue faced by pseudogene analysis is contributes to the transcriptome and proteome of how to distinguish them from their counterparts. different species. There have long been many attempts to identify Currently, discovered “functional” pseudogenes pseudogenes more accurately. First, the Ka/Ks index comprise only a small fraction of the total, whereas (rate of the nucleotide nonsynonymous to the majority of pseudogenes have an “unknown” synonymous substitution) was applied as a criterion status with no established identification or function. to identify pseudogenes [13] because during Moreover, a number of nonfunctional “dying” evolution, these sequences are under neutral pseudogenes indeed are present in the genome, e.g., selection, as opposed to positive or purifying the ancient O-acyltransferase-like pseudogene (ACYL3), selection. Therefore, the Ka/Ks index should be close increasing the complexity of pseudogene distribution to or equal to one [14]. In fact, under the guidance of [7]. Pseudogenes are evolutionally conserved [8], with the Ka/Ks index, over 8,000 processed pseudogenes properties of both predisposition in unique disease have been identified in various species [15]. In this subtypes [9] and tissue-specificity [10], further case, the Ka/Ks index serves as an initial step during highlighting the potential correlation between pseudogene identification. pseudogenes and certain diseases and the necessity to In addition to the Ka/Ks index, features of study their functions and mechanisms in these pseudogenes, such as a special category [16] and diseases. transcriptional capacity [17], became new proof for In this review, we summarize the identification, pseudogene identification, e.g., processed expression, evolution, biogenesis and function of pseudogenes should be found in the same genome pseudogenes. We also present evidence of that contains their paralogs, whereas unitary pseudogene involvement in different diseases and the pseudogenes exist alone without paralogs. Several promising correlation between pseudogene and pseudogenes were later confirmed to be transcribed, diagnosis, prognosis and therapeutics in the clinic. which is easily identified through RNA transcripts. Finally, we discuss the current detection methods, Although this method may help to increase accuracy, limitations and challenges of pseudogene exploration the approach is time-consuming and not efficient to optimize existing protocols to increase their because a large amount of manual work needs to be efficiency for further pseudogene research. performed and pseudogenes without transcripts are difficult to identify. Therefore, this strategy is better Identification: A Long-Term used together with other methods. Further trials are Investigation needed to expand the search scope without The first identification and naming of a decreasing accuracy. pseudogene in human history was in 1977, when Jacq With the rapid development of next-generation et al. [2] found a gene copy that was homologous to 5S sequencing technology, strategies focusing on rRNA in Xenopus laevis. By comparing its DNA pseudogene identification were changed to depend sequence with that of 5S rRNA, they discovered a 16 on bioinformatics, which significantly promoted our http://www.thno.org Theranostics 2020, Vol. 10, Issue 4 1481 recognition of pseudogenes in the whole genomes of 20% [24]. Compared with their parental genes, the different species. By conducting established pipelines