Eleven Daughters of NANOG
Genomics 84 (2004) 229–238
www.elsevier.com/locate/ygeno

H. Anne F. Booth and Peter W.H. Holland*
Department of Zoology, University of Oxford, South Parks Road, Oxford, OX1 3PS, UK
Received 4 December 2003; accepted 25 February 2004
Available online 15 April 2004

Supplementary data for this article may be found on ScienceDirect.
* Corresponding author. Fax: +44-01865-271184. E-mail address: [email protected] (P.W.H. Holland).
0888-7543/$ - see front matter. doi:10.1016/j.ygeno.2004.02.014

Abstract

Nanog is a recently discovered ANTP class homeobox gene. Mouse Nanog is expressed in the inner cell mass and in embryonic stem cells and has roles in self-renewal and maintenance of pluripotency. Here we describe the location, genomic organization, and relative ages of all human NANOG pseudogenes, comprising ten processed pseudogenes and one tandem duplicate. These are compared to the original, intact human NANOG gene. Eleven is an unusually high number of pseudogenes for a homeobox gene and must reflect expression in the human germ line. A pseudogene orthologous to NANOGP4 was found in chimpanzee and an expressed pseudogene in macaque. Examining pseudogenes of differing ages gives insight into pseudogene decay, which involves an excess of deletion mutations over insertions. The mouse genome has two processed pseudogenes, which are not clear orthologues of the primate pseudogenes.
© 2004 Elsevier Inc. All rights reserved.

Keywords: Homeobox; Pseudogene; Molecular evolution; Enk

Homeobox genes are characterized by possession of a recognizable 60- or 63-amino-acid motif within the encoded protein. The homeobox gene superfamily is extremely diverse, particularly within animal genomes, and can be subdivided into several major classes, each containing numerous gene families. Among the best known are the Hox, En, Dlx, Evx, NK, and Msx gene families within the ANTP class, and the Pax, Gsc, and Otx gene families within the PRD class. Recently, Wang and colleagues [1] described the cloning of a novel member of the ANTP class in mouse, which they named Enk (early embryo specific expression NK family gene), referring to expression in inner cell mass cells of the blastocyst and sequence similarity to several NK-related homeobox genes. The same mouse gene was also cloned by Chambers and colleagues [2] and by Mitsui and colleagues [3]; these groups showed that the encoded protein plays key roles in self-renewal and maintenance of pluripotency in inner cell mass and embryonic stem cells. Reflecting these properties, these authors designated the gene Nanog, from Tír nan Óg, the mythical Celtic land of the ever young. Chambers and colleagues [2] and Mitsui and colleagues [3] identified a single human orthologue of the mouse Nanog gene, mapping to 12p13.31.

As of February 2004, the Mouse Genome Informatics web site and National Center for Biotechnology Information (NCBI) LocusLink list Nanog as the approved symbol for the mouse gene (LocusID 71950), with Enk as an alternative symbol. For the human gene, NCBI LocusLink lists NANOG as the gene symbol (LocusID 79923), with FLJ12581 as an alternative symbol. Because the symbol Enk might cause confusion with the unrelated enkephalin gene, we refer to the mouse and human orthologues as Nanog and NANOG, respectively.

Here we report the existence of ten processed NANOG pseudogenes in the complete human genome, plus one duplication pseudogene, and present full descriptions of their locations, organization, and relative ages. We also describe one pseudogene each from chimpanzee and macaque (more are expected) and two in the complete mouse genome. We comment on the process of pseudogene decay and propose explanations for the abundance of Nanog pseudogenes and for the particular abundance of primate pseudogenes. First, however, we clarify the sequence and organization of the functional human NANOG gene, to allow comparison to the pseudogenes.

Results and discussion

Structure of human NANOG

We subjected the NCBI human genome assembly to tBLASTn searches using homeodomain sequences encoded by three human ANTP class genes: EN2, HOXA1, and NKX1.1. Among the retrieved sequences, three were homeodomains from unnamed homeobox genes. A second tBLASTn search of the human genome using one of these unidentified homeodomains revealed several other highly similar sequences. One of these was an unnamed homeobox gene mapping to chromosome 12 at 12p13.31 (chromosomal position 7.84 Mb) and located on a genomic contig with RefSeq Accession No. NT_009714. This gene is clearly the human NANOG gene. Supporting evidence includes two 2.1-kb cDNAs with GenBank Accession Nos. AK022643 (RefSeq mRNA NM_024865) deposited by T. Isogai (Chiba, Japan; unpublished) and AB093576 deposited by S. Yamanaka (Nara, Japan [3]) and noted to be the human orthologue of mouse Nanog.

To confirm exon–intron organization, we searched GenBank and the NCBI human EST database and examined the UniGene cluster for NANOG (Hs.326290). This yielded 33 EST sequences and a cDNA, in addition to the two cDNAs mentioned above. Manual comparisons indicated that 28 ESTs derive from the functional NANOG gene; of these, 20 match the known cDNA closely, 2 have intron sequences (and therefore derive from unspliced message), and 4 (from the same library) are hybrids with an unlinked gene (CDT1), probably artifactually. NANOG ESTs are primarily from human NT2 teratocarcinoma cells, germ cell tumors, testis tumors, colon tumor, and adult marrow. The remaining 5 ESTs and the cDNA derive from pseudogenes and are discussed later. The NCBI contig annotation describes four exons; our analyses confirm this, with all splice sites obeying the GT–AG donor–acceptor rule [4]. The homeobox is located in exons 2 and 3 of the four-exon gene, being interrupted between homeodomain residues 44 and 45. When an intron occurs within a homeobox, the most frequent site is between codons 44 and 45 [5]. There are numerous other examples of this intron location in the ANTP class, including several (but not all) members of the Evx, Dlx, Lbx, Not, Gsx, and Hox gene families (e.g., Drosophila lab, pb, Abd-B). The human NANOG gene structure is shown diagrammatically in Fig. 1A.

The genomic and mRNA sequences of NANOG are not identical at the DNA level. In addition to several synonymous substitutions, there is one nonsynonymous difference in exon 2, causing a difference at amino acid position 82: lysine in the genomic contig and asparagine in the deposited full cDNA sequences. Lys and Asn are not biochemically similar amino acids, so this difference was unexpected. To test if this is a naturally occurring polymorphism or a sequencing error, we used PCR to amplify human NANOG exon 2 from the genomic DNA of two individuals. We found that one individual encoded Lys at position 82; the other encoded Asn. This confirms there is naturally occurring variation in the NANOG protein sequence. Other non-

Fig. 1. (A) Human NANOG gene structure. The four exons (labeled 1–4) are represented by horizontal bars; the 5′ and 3′ UTRs are white and the protein coding region is black with the homeobox colored blue. The three introns are represented by pink lines. The length in nucleotides is written underneath each of the exons and introns (with exons 1 and 4 being split into coding and noncoding/UTR regions). Note that the 3′ UTR is 967 nucleotides in the version of genomic sequence deposited, but 983 nucleotides in the deposited cDNA sequences.
The start (ATG) and stop (TGA) codons are labeled and represented by a green and a red vertical line, respectively. (B) Nine processed pseudogenes compared to human NANOG protein. One processed pseudogene, NANOGP11, is not shown, as it does not include sequence derived from the NANOG open reading frame. Chromosomal positions, RefSeq Accession Nos., and GenBank Accession Nos. are shown. The NANOG protein is represented by a black horizontal bar with the homeodomain colored blue. The start codon is represented by a green vertical line and the stop codon by a red vertical line. Intron positions are shown by pink triangles. For each pseudogene the black horizontal bar represents sequence similarity to the NANOG open reading frame, which can move between pseudogene reading frames (rf1, rf2, rf3) as a result of insertions and deletions. Insertions are represented by turquoise vertical lines and deletions by the absence of sequence similarity to NANOG. Below each insertion is a plus sign and below each deletion is a minus sign, followed by the number of nucleotide bases inserted/deleted at that point. Substitution mutations are not shown, except where these introduce stop codons into the reading frame homologous to NANOG. Red vertical lines indicate stop codons; these are shown when they occur in the reading frame homologous to NANOG or when they are the first stop codon encountered in rf1. In four of the pseudogenes (NANOGP2, P5, P6, P9) insertions or deletions have caused reading-frame shifts before a stop codon was encountered in rf1. In these cases, the gray dashed lines indicate continuation of rf1 until the first stop codon is encountered.

Fig. 2. (A) Genes affected by tandem duplication of human NANOG. The tandem duplicates NANOG and NANOGP1 are represented by black boxes and the tandem duplicates SLC2A14 and SLC2A3 are represented by red boxes. The NANOGP1 box represents the region of homology to NANOG, not the full NANOGP1 transcript.
NANOG and NANOGP1 are on the same DNA strand, while SLC2A14 and SLC2A3 are on the opposite strand (indicated by arrows underneath the genes). The intergenic distances are shown in kilobases and a scale bar of human chromosome 12 in megabases is drawn underneath.
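
The Fig. 1B legend describes how the sequence homologous to the NANOG open reading frame moves between pseudogene reading frames (rf1, rf2, rf3) as insertions and deletions accumulate. A minimal sketch of that bookkeeping is given here for illustration only; it is not the authors' method. The function name, the sign convention (positive for insertions, negative for deletions), the assignment of a +1 shift to rf2, and the example indel lengths are all assumptions made for this sketch and are not taken from the paper.

```python
def track_reading_frame(indels, start_frame=0):
    """Return the frame index after each indel (0 = rf1, 1 = rf2, 2 = rf3).

    indels: signed indel lengths in nucleotides, in 5'-to-3' order.
            Positive values are insertions, negative values are deletions
            (an assumed convention for this sketch).
    start_frame: frame index in which the homologous sequence begins.
    """
    frames = []
    frame = start_frame
    for length in indels:
        # Only the indel length modulo 3 matters: multiples of three
        # leave the frame unchanged, anything else shifts it.
        frame = (frame + length) % 3
        frames.append(frame)
    return frames


# Hypothetical indel series, not data from any NANOG pseudogene.
example_indels = [+1, -4, +3, -2]
labels = ["rf%d" % (f + 1) for f in track_reading_frame(example_indels)]
print(labels)
```

In this sketch a 3-nucleotide insertion leaves the frame unchanged, while a 1-nucleotide insertion followed by a 4-nucleotide deletion restores the original frame, which is the kind of frame switching the legend depicts.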