Nested Genes in the Human Genome
Genomics 86 (2005) 414–422
http://www.paper.edu.cn

Peng Yu a,b,c,*, Dalong Ma a,b, Mingxu Xu a,b,†

a Laboratory of Medical Immunology, School of Basic Medical Sciences, Peking University, Beijing 100083, People's Republic of China
b Center for Human Disease Genomics, Peking University, Beijing 100083, People's Republic of China
c Center for Bioinformatics, Peking University, Beijing 100083, People's Republic of China

* Corresponding author. Laboratory of Medical Immunology, School of Basic Medical Sciences, Peking University, Beijing 100083, People's Republic of China. Fax: +86 10 82801149. E-mail address: [email protected] (P. Yu).
† Deceased.

Received 29 March 2005; accepted 15 June 2005
Available online 3 August 2005

0888-7543/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2005.06.008

Abstract

Here we studied one special type of gene, the nested gene, in the human genome. We collected 373 reliably annotated nested genes. Two-thirds of them were on the strand opposite that of their host gene. About 58% of coding nested gene pairs were conserved in mouse and some were even maintained in chicken and fish, while nested pseudogenes were poorly conserved. Ka/Ks analysis revealed that nested genes were under strong selection, although they did not demonstrate greater conservation than other genes. With microarray data we observed that the two partners of a nested pair seemed to be expressed reciprocally. A significant proportion of nested genes were tissue-specifically expressed. Gene ontology analysis demonstrated that quite a number of nested genes participated in cellular signal transduction. Based on these observations, we think that nested genes are a group of genes with important physiological functions.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Nested gene; Gene-within-a-gene; Overlapping gene; Evolution; Comparative analysis; Inverse expression

A nested gene, or gene-within-a-gene, is a gene that is contained in another gene. In eukaryotes, nested genes are usually located within one intron of a host gene. The first case was reported in Drosophila, where the gene Pcp, encoding pupal cuticle protein, was found within an intron of adenosine 3 (ade3), lying on the opposite DNA strand [1]. In human, the first case reported was the gene F8A1 (coagulation factor VIII-associated intronic transcript 1), which is entirely contained in intron 22 of coagulation factor VIII (F8), also on the opposite strand [2]. Parallel to the progress of genome sequencing projects, more and more nested genes have been discovered. In Drosophila, this category of gene comprises about 7.5% of the total genes, among which about 85% encode proteins, while the remaining 15% are noncoding RNAs [3]. Sequencing of human chromosomes 21 and 22 revealed about a dozen nested genes [4,5].

In addition to coding genes, pseudogenes and snoRNA genes have also been found within introns [6,7]. In human chromosome 7, 100 processed pseudogenes were reported to be located in introns of unrelated genes [6]. SnoRNAs in introns are processed from the pre-mRNA of the host genes [7], and they are not considered independently transcribed nested genes.

Although nested genes have been known for a long time, no systematic study has been conducted on them. Some features of this type of gene have been observed in Drosophila. For example, most of the reported nested genes are on the strand opposite that of the host gene and many are intronless. In other species, however, including human, these features have been reported only in sporadic cases and need to be verified on a larger scale. In addition, no systematic comparative analysis has been conducted to study the evolution of nested genes and their conservation status between species.

In essence, nested genes represent an extreme type of overlapping gene. As suggested by Miyata and Yasunaga [8], the rate of evolution can be expected to be slower in overlapping genes. Veeramachaneni et al. [9] studied the conservation of overlapping genes between human and mouse and did not find evidence supporting the view of Miyata and Yasunaga. However, that study mixed multiple types of overlapping genes and was biased toward genes overlapping in their boundary regions (UTRs or regulatory regions). As nested genes are totally embedded in introns, we think that they may be different from other overlapping genes. Being a segment of two transcripts (the intronic region of the host gene and itself), the nested gene might be under a double transcription check, which might reduce the probability of mutation. With the availability of the genome sequences of multiple organisms, it is possible to carry out a refined comparative analysis to test this hypothesis.

The relationship between the host and the nested gene is also intriguing. Up to now, in only one case, that of neurofibromin 1 (NF1) and its nested gene oligodendrocyte myelin glycoprotein (OMG), have the two genes been reported to have similar functions of growth suppression [10]. On the other hand, it has been observed that in the loci of eukaryotic translation initiation factor (eIF)2A [11], insulin-like growth factor 2 receptor (Igf2r) [12], and α1-collagen (I) [13], intronic genes on the opposite strand interfere with the expression of the host genes. As the number of such cases is very limited, a larger scale study is needed to fully elucidate the relationship.

Here we extracted the reliably annotated nested genes in the human genome and carried out systematic studies of their gene size, strand orientation, and functional category. We also used human–mouse genome alignment data and a comparative genomics database to study the conservation of nested genes in multiple species. The Ka/Ks ratio was used to study the selection on nested genes. In addition, with public microarray data we studied the expression correlation of the host and nested genes.

Results

Identification of nested genes

According to the chromosomal localization of annotated human genes (NCBI MapViewer Build 34.3), we initially identified 804 nested gene pairs. However, by comparing with the genes' chromosomal alignment at the UCSC Genome Browser, we found that the localizations of 285 genes were greatly inconsistent between the two databases. These suspicious pairs were discarded from our dataset. We also checked the EST support for coding nested genes; 146 genes with poor EST support were excluded (see Materials and methods).

Finally we obtained 373 nested gene pairs in the human genome, comprising 340 host genes and 373 nested genes (Supplementary Table 1). Of the host genes, 27 contained multiple nested genes, so the number of host genes was less than the total number of pairs. All but 3 host genes encoded proteins. Of the nested genes, 158 were coding genes with good EST support. Of the remainder, 212 were pseudogenes, among which 189 appeared to be processed and 23 showed signs of introns. In addition there were 3 snoRNA genes. About 63% of nested genes were on the strand opposite that of the host, forming antiparallel pairs. The percentage was similar for coding genes and pseudogenes. For the remaining 37% of pairs, the two partners were on the same strand in a parallel manner (Table 1). No chromosomal distribution bias was detected for the nested genes.

Table 1
Types of nested genes

Nested gene   Parallel   Antiparallel   Total
Coding              53            105     158
Pseudogene          81            131     212
SnoRNA               3              0       3
Total              137            236     373

Gene size and overlapping pattern

The host genes were relatively large, with a mean exon number of 17 (±13.8), compared with the average for human genes of ~10.4 exons per locus [14]. The nested genes were much smaller, with a mean exon number of 2.1 (±1.9). About 41% (64/158) of the coding nested genes had only one exon.

We studied the size distribution of the introns containing nested genes and compared it with that of the other introns of the host genes (Fig. 1). The introns with nested genes were significantly larger than the others. The median length of these introns was 21.5 kb (±23.6 kb), and about 68.2% of them were >10 kb, while the median length of the other introns of the host genes was just 2.5 kb (±2.8 kb) and only 16.7% of them were >10 kb. Ten of the introns that contained nested genes were >200 kb. Based on these observations, it seemed that nested genes tended to occur in large introns.

Karlin et al. reported that nested genes in human chromosomes 21 and 22 were often located within the boundary intron (first or last) of host genes [5]. In our dataset, we observed that about 62% of coding genes and 59% of pseudogenes were in internal introns, enclosed by the coding exons of the hosts. These internal introns were usually larger than the boundary ones, which might give further support for the association between intron size and the probability of forming a nested structure (data not shown). As mentioned above, most host genes contained only one nested gene. For host genes with multiple nested genes, the nested genes were often located in one intron and were similar to each other, indicating their possible formation by duplication.

Pair-wise BLAST [15] alignment of the host and nested genes showed no association between the partners, with only 8 pairs having identity greater than 20%. We also

Fig. 1. Distribution of the length of the introns of host genes.
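
The definition used under "Identification of nested genes" (a gene lying entirely within one intron of a host gene, on either strand) can be illustrated with a small sketch. This is not the authors' pipeline, which relied on NCBI MapViewer Build 34.3 annotations cross-checked against the UCSC Genome Browser and EST support. The `Gene` class, the `find_nested_pairs` function, and all coordinates below are hypothetical and chosen only for illustration; the `TABLE1` counts are copied from Table 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; exons are sorted, inclusive (start, end) pairs."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    exons: Tuple[Tuple[int, int], ...]

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]

    def introns(self) -> List[Tuple[int, int]]:
        # An intron is the gap between two consecutive exons.
        return [(a_end + 1, b_start - 1)
                for (_, a_end), (b_start, _) in zip(self.exons, self.exons[1:])]


def find_nested_pairs(genes: List[Gene]) -> List[Tuple[str, str, int, str]]:
    """Return (host, nested, intron_index, orientation) for every gene that
    lies entirely inside a single intron of another gene on the same chromosome.
    Intron indices follow genomic order, not transcription order."""
    pairs = []
    for host in genes:
        for idx, (i_start, i_end) in enumerate(host.introns(), start=1):
            for nested in genes:
                if nested is host or nested.chrom != host.chrom:
                    continue
                if i_start <= nested.start and nested.end <= i_end:
                    orientation = ("parallel" if nested.strand == host.strand
                                   else "antiparallel")
                    pairs.append((host.name, nested.name, idx, orientation))
    return pairs


# Toy annotation (made-up coordinates).
GENES = [
    Gene("HOST_A", "chr1", "+", ((100, 200), (1000, 1100), (5000, 5200))),
    Gene("NESTED_X", "chr1", "-", ((300, 400), (600, 700))),  # inside intron 1
    Gene("NESTED_Y", "chr1", "+", ((2000, 2500),)),           # inside intron 2
    Gene("OVERLAP_Z", "chr1", "+", ((4900, 5100),)),          # overlaps an exon
    Gene("OTHER", "chr2", "-", ((300, 400),)),                # other chromosome
]
pairs = find_nested_pairs(GENES)

# Counts copied from Table 1: (parallel, antiparallel) per nested gene type.
TABLE1 = {"Coding": (53, 105), "Pseudogene": (81, 131), "SnoRNA": (3, 0)}


def antiparallel_fraction(parallel: int, antiparallel: int) -> float:
    return antiparallel / (parallel + antiparallel)


total_parallel = sum(p for p, _ in TABLE1.values())
total_antiparallel = sum(a for _, a in TABLE1.values())
overall = antiparallel_fraction(total_parallel, total_antiparallel)

if __name__ == "__main__":
    for pair in pairs:
        print(pair)
    print(f"antiparallel overall: {overall:.1%}")
```

On the Table 1 counts, the antiparallel fraction is 236/373 for all pairs, 105/158 for coding genes, and 131/212 for pseudogenes, which is consistent with the "about 63%" figure and with the statement that the percentage was similar for the two classes. A gene that only partly overlaps an intron (such as the hypothetical `OVERLAP_Z`) is deliberately not counted as nested.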