577 (2016) 236–243


Gene

journal homepage: www.elsevier.com/locate/gene

Research paper

Macaca specific exon creation event generates a novel ZKSCAN5 transcript

Young-Hyun Kim a,b,1, Se-Hee Choe a,b,1, Bong-Seok Song a,1, Sang-Je Park a, Myung-Jin Kim a, Young-Ho Park a, Seung-Bin Yoon a,b, Youngjeon Lee a, Yeung Bae Jin a, Bo-Woong Sim a, Ji-Su Kim a,b, Kang-Jin Jeong a, Sun-Uk Kim a,b, Sang-Rae Lee a,b, Young-Il Park c, Jae-Won Huh a,b,⁎, Kyu-Tae Chang a,b,⁎

a National Primate Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Cheongju 363-883, Republic of Korea
b University of Science & Technology, National Primate Research Center, KRIBB, Cheongju 363-883, Republic of Korea
c Graduate School Department of Digital Media, Ewha Womans University, Seoul 120-750, Republic of Korea

Article history: Received 2 April 2015; Received in revised form 20 October 2015; Accepted 30 November 2015; Available online 2 December 2015

Keywords: Alternative splicing; Macaca; Exon creation; Lineage specific; ZKSCAN5

Abstract

ZKSCAN5 (also known as ZFP95) is a zinc-finger protein belonging to the Krüppel family. ZKSCAN5 contains a SCAN box and a KRAB A domain and is proposed to play a distinct role during spermatogenesis. In humans, alternatively spliced ZKSCAN5 transcripts with different 5′-untranslated regions (UTRs) have been identified. However, investigation of our Macaca UniGene Database revealed novel alternative ZKSCAN5 transcripts that arose due to an exon creation event. Therefore, in this study, we identified the full-length sequences of ZKSCAN5 and its alternative transcripts in Macaca spp. Additionally, we investigated different nonhuman primate sequences to elucidate the molecular mechanism underlying the exon creation event. We analyzed the evolutionary features of the ZKSCAN5 transcripts by reverse transcription polymerase chain reaction (RT-PCR) and genomic PCR, and by sequencing various nonhuman primate DNA and RNA samples. The exon-created transcript was only detected in the Macaca lineage (crab-eating monkey and rhesus monkey). Full-length sequence analysis by rapid amplification of cDNA ends (RACE) identified ten full-length transcripts and four functional isoforms of ZKSCAN5. Protein sequence analyses revealed the presence of two groups of isoforms that arose because of differences in start-codon usage. Together, our results demonstrate that there has been specific selection for a discrete set of ZKSCAN5 variants in the Macaca lineage. Furthermore, study of this locus (and perhaps others) in Macaca spp. might facilitate our understanding of the evolutionary pressures that have shaped the mechanism of exon creation in primates.

© 2015 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Alternative splicing is a major mechanism used to generate transcriptional diversity. The majority of genes in higher eukaryotes undergo alternative splicing during post-transcriptional processing. Mature mRNAs arise as a consequence of different combinations of exons in a reaction controlled by the spliceosome complex, and are translated into various protein isoforms. Thus, this mechanism is responsible for generating diversity and complexity without significantly increasing genome size in eukaryotes (Black, 2003; Matlin et al., 2005; Pan et al., 2008; Keren et al., 2010). Recently, large-scale bioinformatics analysis has revealed that human transcripts from ~95% of multi-exon genes undergo alternative splicing, which has led to the creation of numerous isoforms (Pan et al., 2008; Sultan et al., 2008; Wang et al., 2008).

Transcriptome sequencing data have led to the identification of five major types of alternative splicing events (Modrek and Lee, 2002; Matlin et al., 2005; Blencowe, 2006; Pan et al., 2008; Keren et al., 2010). The first type is exon creation or loss (skipping), whereby an exon may be included or excluded in mature mRNA. The second type is the mutually exclusive exon type, in which one of two exons is retained in the mRNA after splicing, but not both. The third and fourth types are alternative splicing of the 3′ and 5′ splice site, respectively. These four types of alternative splicing events generally occur when two or more splice sites are recognized at one end of an exon. The fifth type of alternative splicing is intron retention, in which introns are retained within the mature mRNA transcript as part of an exon (Modrek and Lee, 2002; Matlin et al., 2005; Blencowe, 2006; Pan et al., 2008; Keren et al., 2010). Of the various types, exon creation or loss (skipping) is the most frequent alternative splicing mechanism (~40%) detected in higher eukaryotes (Sugnet et al., 2004). By contrast, intron retention was calculated to be the rarest event, as it occurred at an incidence of 14.8% in a set of 21,106 known human genes (Galante et al., 2004).

Abbreviations: ZKSCAN5, Zinc finger with KRAB and SCAN domains 5; TE, Transposable element; SINE, Short interspersed element; RACE, Rapid amplification of cDNA ends; UTR, Untranslated region.
⁎ Corresponding authors at: National Primate Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Cheongju 363-883, Republic of Korea.
E-mail addresses: [email protected] (J.-W. Huh), [email protected] (K.-T. Chang).
1 These authors contributed equally to this work.

http://dx.doi.org/10.1016/j.gene.2015.11.051
0378-1119/© 2015 The Authors. Published by Elsevier B.V.
Y.-H. Kim et al. / Gene 577 (2016) 236–243

In evolutionary terms, alternative splicing is a very important mechanism used by the genome to generate novel functions via new exon creation. Recently, a number of genomic studies have reported that alternative splicing facilitates exon creation. Wang et al. (Wang et al., 2005) compared the human and mouse genomes, and found that 87% of the novel exons in mouse were created by alternative splicing; these exons were not present in the human genome. Furthermore, Zhang and Chasin (Zhang and Chasin, 2006) compared the human genome with eight other vertebrate genomes, and found that over 35% of new human exons are alternatively spliced. These results suggest that new exon creation is strongly correlated with alternative splicing mechanisms.

Recently, many of the newly created exons in the mammalian genome were defined as transposable element (TE)-related exons. Among the TEs, Alu elements are a major source of new exons in primate genomes (Sorek, 2007; Corvelo and Eyras, 2008; Lin et al., 2008). Alu elements are primate-specific TEs that belong to the short interspersed element (SINE) family, which evolved from 7SL RNA ~65 million years ago; the human genome harbors more than 1.1 million copies (11%) of these elements (Batzer and Deininger, 2002). Internal sequences of Alu elements harbor potential splicing donor (GT) and acceptor sites (AG) that could be recognized by human spliceosomes (Lev-Maor et al., 2003; Ast, 2004). Thus, Alu sequences scattered throughout the human genome could be transcribed by so-called “exonization events.” Furthermore, alternatively spliced Alu-derived exons play a major role in the enhancement of protein diversification in humans, as they could lead to the creation of novel protein isoforms (Li et al., 2001; Sorek et al., 2002; Kreahling and Graveley, 2004).

ZKSCAN5 (zinc finger with KRAB and SCAN domains 5), also known as ZFP95, is located on human chromosome 7q22 (Dreyer et al., 1999). This gene encodes a zinc finger protein of the Krüppel family. ZKSCAN5 has three distinct domains, including twelve conserved zinc fingers at the C-terminal region, a SCAN box, and a KRAB (Krüppel-associated box) A domain at the N-terminal region (Dreyer et al., 1999; Weissig et al., 2003). ZKSCAN5 is ubiquitously expressed in human adult and fetal tissues, but the gene is most strongly expressed in the testis. In addition, three spliced transcript forms have been detected by northern blot analysis. Among them, the smallest transcript form is testis-specific and is highly expressed in pachytene spermatocytes (Dreyer et al., 1999; Weissig et al., 2003). Two alternatively spliced transcript variants differing only in the 5′-untranslated region (UTR) have been identified in the human transcriptome. Although additional transcript variants have been found, their full-length sequences have not been determined (Dreyer et al., 1999; Weissig et al., 2003).

The purpose of this study was to experimentally validate the exon creation mechanisms related to ZKSCAN5 transcripts. To this end, we identified and characterized the full-length sequences of ZKSCAN5 and its transcripts in Macaca spp. Additionally, we examined alternatively spliced transcripts and exon creation events associated with ZKSCAN5 in human and nonhuman primates. We also carried out evolutionary and comparative analyses with various primate genome sequences.

2. Materials and methods

2.1. Human and nonhuman primate RNA and DNA samples

Total RNA from tissues of crab-eating monkey (Macaca fascicularis; cerebrum, cerebellum, kidney, liver, lung, spleen, and stomach), African green monkey (Cercopithecus aethiops; cerebrum, cerebellum, kidney, small intestine, lung, spleen, and stomach), and marmoset (Callithrix jacchus; cerebrum, heart, kidney, liver, lung, spleen, and stomach) was extracted using the RNeasy® Mini kit (Qiagen) with RNase-Free DNase treatment (Qiagen) according to the manufacturer's protocol. All tissues for total RNA extraction were provided by the National Primate Research Center (NPRC) of Korea. Total RNA from the tissues of rhesus monkey (Macaca mulatta; cerebrum, colon, liver, lung, kidney, pancreas, and stomach) and human (Homo sapiens) total RNA (cerebrum, heart, kidney, liver, lung, testis, and uterus) were purchased from Clontech Laboratories Inc., USA.

Individual primate DNA samples (HU: Homo sapiens, CH: Pan troglodytes, GO: Gorilla gorilla, RH: Macaca mulatta, CE: Macaca fascicularis, AGM: Cercopithecus aethiops, CO: Procolobus badius, LANG: Trachypithecus sp., SQ: Saimiri sciureus, MAR: Callithrix jacchus) were provided by the late Prof. Osamu Takenaka from the Primate Research Institute of Kyoto University of Japan and National Primate Research Center (NPRC) of Korea. All animal procedures were conducted in accordance with the guidelines of the Institutional Animal Care and Use Committee, Korea Research Institute of Bioscience and Biotechnology (KRIBB).

2.2. Reverse transcription-polymerase chain reaction (RT-PCR) and genomic PCR amplification

ZKSCAN5 transcripts were analyzed by RT-PCR. M-MLV reverse transcriptase and an annealing temperature of 42 °C were used for the reverse transcription reaction in the presence of an RNase inhibitor (Promega). We also performed control PCR amplifications on pure mRNA samples that were not subjected to reverse transcription in order to verify that the mRNA samples prepared did not contain genomic DNA. As a standard control, human glyceraldehyde-3-phosphate dehydrogenase (G3PDH) was amplified using the following primers: G3PDH_S (5′-GAA ATC CCA TCA CCA TCT TCC AGG-3′) and G3PDH_AS (5′-GAG CCC CAG CCT TCT CCA TG-3′); the primers were designed on the basis of the human G3PDH sequence (GenBank: AC068657). Alternative transcripts of ZKSCAN5 were amplified using the following primer pairs: (1) transcripts ① and ②: R1_S (5′-GAG AAC TTG AGG AAC GCA GAC-3′) and R1_AS (5′-TTC AAT TTT CAC CAA CTT CTG G-3′); (2) transcripts ③, ④, and ⑤: R2_S (5′-AGC CCT GGG TGA GAG AAC AT-3′) and R2_AS (5′-AAG TCT GGA TCC TGG GGA AC-3′) (Fig. 2). The locations of the primers used in the RT-PCR analysis are indicated in Fig. 1A. RT-PCR was conducted under the following conditions: 5 min at 94 °C followed by 30 cycles of 30 s at 94 °C, 30 s at the primer-specific annealing temperature (57–58 °C), and 40 s at 72 °C. Following this, a final elongation step was carried out for 5 min at 72 °C.

In order to determine the integration lineage of the AluY element in human and nonhuman primates, the following locus-specific primer set was used for genomic PCR: G1_S (5′-CCC TCT GAG TTG GAA TGA TAA TG-3′) and G1_AS (5′-AGA CAA TTG CGT TTT TCC AA-3′). The locations of the primers used for genomic PCR are indicated in Fig. 1A. Genomic PCR was conducted under the following conditions: 5 min at 94 °C; 30 cycles each of 30 s at 94 °C, 30 s at 56 °C, and 1 min 30 s at 72 °C; and a final elongation step for 5 min at 72 °C.

2.3. Rapid amplification of cDNA ends (RACE) for full-length ZKSCAN5 sequences

Following the manufacturer's protocol for the CapFishing™ Full-length cDNA Premix Kit (Seegene), approximately 1–3 μg of total RNA from the cerebrum of the crab-eating monkey was reverse transcribed with the CapFishing™ adaptor for 5′-RACE and the Oligo dT adaptor for 3′-RACE. To obtain full-length sequences of both transcript variants (original and exon-created transcripts), we designed transcript-specific primers for each variant. To elucidate the full-length sequence of exon-created transcripts, we performed PCR reactions with either of the following primer combinations: 5′-target site primer (5′_TSP) for 5′-RACE, RACE_S/RACE_AS and RACE_S_1/RACE_AS_1 for the internal region, and 3′_TSP for 3′-RACE. For the original transcript, we used one of the following primer combinations: 5′-target site primer_1 (5′_TSP_1) for 5′-RACE, RACE_S_2/RACE_AS_2 and RACE_S_1/RACE_AS_1 for the internal region, and 3′_TSP for 3′-RACE. The PCR products were size-fractionated by Tris-borate-EDTA (TBE) agarose gel electrophoresis, and the smear area between 100 and 3000 bp was purified by using the Wizard® SV Gel Extraction Kit (GeneAll). The PCR products

were cloned and sequenced following the molecular cloning and sequencing procedures described in the next section. The primers used in the bidirectional RACE experiments are indicated in Fig. 3 and listed in Fig. S5.

Fig. 1. Structural analysis of ZKSCAN5 in humans and crab-eating monkeys. The exons are numbered in Roman numerals, and the newly created exon is represented by ‘C’. Known motifs and transposable elements (TEs) are indicated. The small and large gray boxes represent the untranslated regions (UTRs) and the protein-coding regions, respectively. Open boxes indicate the predicted exons. Horizontal arrows indicate the reverse transcription-polymerase chain reaction (RT-PCR) primer location (R1/R2) and genomic PCR primer locations (G1). The figures are schematic and not to scale.

2.4. Molecular cloning and sequencing

To validate the amplified products and sequencing procedures, all the obtained products, including the genomic PCR, RT-PCR, and RACE products, were separated on a 1.0–1.5% TBE agarose gel, purified using an Expin™ SV Gel Extraction Kit (GeneAll), and cloned into the TA cloning vector (RBC Bioscience). The cloned DNA was isolated by the alkaline lysis method using a Hybrid-Q™ Plasmid Rapid Prep Kit (GeneAll). Sequencing was performed by a commercial sequencing company, Macrogen Inc., Korea. The unpredicted bands observed in the RT-PCR assay were also sequenced and analyzed.

2.5. Generation of full-length sequences of ZKSCAN5

In order to construct full-length sequences of ZKSCAN5 transcripts, we performed multiple comparisons and alignments or manual correction of the sequences of the 5′-RACE, internal region, and 3′-RACE products using the ClustalW and BioEdit programs (Thompson et al., 1994; Hall, 1999). The derived full-length sequences were compared with previously defined human RefSeq transcripts (NM_014569 and NM_145102). There is no RefSeq mRNA sequence in the GenBank database for crab-eating monkey; therefore, for this species, we compared the sequences with human ZKSCAN5 transcripts in order to deduce the full-length ZKSCAN5 sequences. The full-length ZKSCAN5 sequences were confirmed by examining the nucleotide sequences and poly(A) signals, and by open reading frame (ORF) prediction using the NCBI ORF Finder program (http://www.ncbi.nlm.nih.gov/projects/gorf/).

2.6. Sequence analysis and nucleotide sequence accession numbers

Multiple alignment (ClustalW) and manual correction of the determined primate sequences were performed using the BioEdit program. The nucleotide sequences of newly identified ZKSCAN5 transcripts have been deposited in the DDBJ/EMBL/GenBank nucleotide sequence databases with the following accession numbers: DDBJ accession number AB742302, from genomic DNA of Homo sapiens; DDBJ accession number AB742303, from genomic DNA of Pan troglodytes; DDBJ accession number AB742304, from genomic DNA of Gorilla gorilla; DDBJ accession number AB742305, from genomic DNA of Macaca fascicularis; DDBJ accession number AB742306, from genomic DNA of Cercopithecus aethiops; DDBJ accession number AB742307, from genomic DNA of Trachypithecus sp.; DDBJ accession number AB742308, from genomic DNA of Saimiri sciureus; DDBJ accession number AB742309, from genomic DNA of Callithrix jacchus; DDBJ accession numbers AB742310, AB742311, AB742312, AB742313, and AB742314, from mRNA samples of Macaca fascicularis; DDBJ accession numbers AB742315, AB742316, AB742317, and AB742318, from mRNA samples of Homo sapiens; DDBJ accession numbers AB742319, AB742320, AB742321, AB742322, and AB742323, from mRNA samples of Macaca mulatta; DDBJ accession numbers AB742324, AB742325, AB742326, and AB742327, from mRNA samples of Cercopithecus aethiops; DDBJ accession numbers AB742328, AB742329, and AB742330, from mRNA samples of Callithrix jacchus; DDBJ accession numbers AB742331, AB742332, AB742333, AB742334, AB742335, AB742336, AB742337, AB742338, AB742339, and AB742340, from mRNA samples of Macaca fascicularis by bidirectional RACE.
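The ORF prediction step described in Section 2.5 can be illustrated with a minimal sketch. The code below is not the NCBI ORF Finder and does not reproduce its algorithm or options; it assumes a simple forward-strand scan for ATG-to-stop reading frames. The function names (`find_orfs`, `longest_orf`) and the toy sequence are hypothetical and are used only for illustration.

```python
# Minimal forward-strand ORF scan (illustrative only; not the NCBI ORF Finder).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and frame is 0, 1, or 2. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # the first ATG opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume the search after this stop codon
    return orfs


def longest_orf(seq):
    """Return the longest ORF tuple, or None if no ORF is found."""
    orfs = find_orfs(seq)
    return max(orfs, key=lambda o: o[1] - o[0]) if orfs else None


# Hypothetical toy sequence: two in-frame ATG codons followed by one stop codon.
toy = "GGATGAAACCCATGGGTTTTTAGCC"
print(find_orfs(toy))
print(longest_orf(toy))
```

In this simplified picture the frame opens at the first ATG, so a transcript lacking the upstream ATG would yield a shorter ORF that begins at the downstream, in-frame ATG. Differences of this kind in start-codon usage are what distinguish isoform groups in an ORF-based comparison.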

3. Results

3.1. Identification and experimental validation of the exon creation event in ZKSCAN5

In our previous study, we reported de novo transcriptome sequencing using the GS FLX Titanium sequencing platform in crab-eating monkey tissues and established the Macaca Unigene Database (http://210.218.223.45/) (Huh et al., 2012). From this database, we analyzed Alu exonized transcripts and found two different transcripts of ZKSCAN5, including the original and exon-created ZKSCAN5 transcripts by Alu. Therefore, two different transcript sequences of ZKSCAN5 derived from our database were reconstructed through a Basic Local Alignment Search Tool (BLAST) search using the UCSC genome browser (http://genome.ucsc.edu/; UCSC Human Genome Browser assembly ID: hg38; UCSC Rhesus Genome Browser assembly ID: rheMac3) (Fig. 1A). The original transcript of ZKSCAN5 has seven exons. However, the exon-created transcript has eight exons because of the inclusion of a newly created exon (C) between exon II and exon III (Fig. 1A). Interestingly, analysis using the RepeatMasker program (http://www.repeatmasker.org/) revealed that the newly created exon (C) contained primate-specific AluY sequences (Fig. 1A).

To validate the existence of the two different ZKSCAN5 transcripts, we carried out RT-PCR amplification and sequencing using mRNA from seven types of tissues (cerebrum, cerebellum, kidney, liver, lung, spleen, and stomach) from crab-eating monkeys by using two transcript-specific primer pairs designed on the basis of the sequences of exon II (R1_S primer)/exon IV-V (R1_AS primer) and exon II (R2_S primer)/exon VI (R2_AS primer) (Fig. 1A). The two different transcript-specific primer pairs were designed on the basis of the sequences of regions highly conserved between humans and marmosets (nonhuman primates). The results of the RT-PCR analysis indicated that the R1 primer pair amplified 753-bp (①) and 266-bp (②) products, and that the R2 primer pair amplified 568-bp (③), 432-bp (④), and 349-bp (⑤) products (Fig. 2). Sequencing analysis of these amplified products revealed that the 753-bp band was an exon-created ZKSCAN5 transcript (①) containing 487-bp of intronic sequence (Fig. 2 and Fig. S1). In silico analysis predicted that an antisense-oriented AluY element (301-bp) was integrated in the created exon sequences (Fig. 2 and Fig. S1). The transcript structures of the identified RT-PCR products are summarized in Fig. 2.

Using RT-PCR and sequencing analyses, we experimentally confirmed that the exon-created ZKSCAN5 transcript (①) is ubiquitously expressed in all crab-eating monkey tissues tested in this study (Fig. 2). In addition, we identified four types of ZKSCAN5 transcript variants composed of different exons generated via alternative splicing, including exon-created transcripts. Type two transcript variants (②, ③) were more highly expressed in brain samples than in other tissues in the crab-eating monkey (Fig. 2).

3.2. Comparative analysis of exon-created ZKSCAN5 transcripts

In the GenBank database, human ZKSCAN5 has two alternatively spliced transcript variants (Accession Nos.: NM_014569 and

Fig. 2. RT-PCR amplification and alternatively spliced transcripts of ZKSCAN5 in the crab-eating monkey. The primer pairs used were R1S/R1AS (upper panel) and R2S/R2AS (lower panel). The exon-created transcript of ZKSCAN5 is indicated by horizontal arrows. Alternatively spliced ZKSCAN5 transcripts are given numbers, and the structure is represented with the same number in the lower panel; structurally divergent types are presented on the left. AluY elements are also indicated.
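The band sizes reported for the RT-PCR products in Fig. 2 allow a quick consistency check on the length of the created exon. The sketch below assumes that products ① and ② from the R1 primer pair differ only by the presence of the created exon; the dictionary layout and the helper name `size_differences` are illustrative and not part of the original analysis.

```python
# RT-PCR product sizes (bp) as reported in the text, keyed by primer pair
# and by transcript number (1-5 correspond to transcripts ①-⑤).
products = {
    "R1": {1: 753, 2: 266},
    "R2": {3: 568, 4: 432, 5: 349},
}


def size_differences(bands):
    """Return each band's size minus the smallest band of the same primer pair (bp)."""
    smallest = min(bands.values())
    return {label: size - smallest for label, size in bands.items()}


r1_diff = size_differences(products["R1"])
r2_diff = size_differences(products["R2"])
print(r1_diff)  # {1: 487, 2: 0}
print(r2_diff)  # {3: 219, 4: 83, 5: 0}
```

Under that assumption, the 487-bp difference between the two R1 products matches the 487 bp of intronic sequence reported for the exon-created transcript.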

NM_145102). Both transcripts are composed of seven exons and have different 5′-UTRs, and they encode the same protein (Fig. 1B). Moreover, fourteen Alu elements, belonging to three different Alu families, are inserted between exons II and III, including AluY (3), AluJ (3), and AluS (8), in human ZKSCAN5. Among the fourteen Alu elements, AluY at human chr7:99,506,692–99,506,995 was exonized in the crab-eating monkey. However, the exon-created ZKSCAN5 transcripts observed in our in silico analysis in Macaca spp. were not detected among the human ZKSCAN5 transcripts. Therefore, we compared and analyzed the genomes of various primates, including human, rhesus monkey, African green monkey, and marmoset, to determine whether the exon-created transcripts are present in these genomes.

The results of the comparative analysis indicated that four alternatively spliced transcripts (②, ③, ④, ⑤) were commonly detected in all tested species. All unpredicted bands were investigated and found to be the products of non-specific amplification. However, alternatively spliced transcript ④ was not detected in marmosets (Fig. S2). Intriguingly, the exon-created ZKSCAN5 transcript (①) was only detected in rhesus monkeys, and not in the other species (Fig. S2). Nucleotide sequence analysis revealed that the exon-created transcript of the rhesus monkey was 758-bp long, which is 5-bp longer than the transcript of the crab-eating monkey. However, the two sequences are 95% identical (data not shown). Together, these data suggest that the ZKSCAN5 transcript variant that arose due to exon creation (①) is specific to the Macaca lineage (crab-eating monkey and rhesus monkey), and that this mechanism of transcript generation may be exploited in a species-specific manner.

3.3. Full-length sequencing of ZKSCAN5

In the previous sections, we described the experimental validation of alternatively spliced ZKSCAN5 transcript variants by RT-PCR analysis and sequencing in human and nonhuman primate tissues (Fig. 2 and Fig. S2). However, this validation was based on in silico analyses of ZKSCAN5 in crab-eating monkeys. The full-length sequence of ZKSCAN5 from this species has not been registered in the GenBank Database. Therefore, we performed bidirectional RACE using transcript-specific primers to obtain full-length transcripts. The transcript-specific primers for bidirectional RACE are shown in Fig. 3. Additionally, to obtain the full-length sequences of both the original and exon-created transcripts, we designed transcript-specific primer sets (Fig. 3). Thus, bidirectional

Fig. 3. Schematic representation of experimentally amplified bidirectional rapid amplification of cDNA ends (RACE) products. The transcript-specific primer sites for bidirectional RACE of ZKSCAN5 transcripts are listed in the upper panels. The lower panel illustrates the structures of the ZKSCAN5 transcript products obtained in this study. Closed boxes indicate exon structure. Information on all identified transcript products is shown on the right. The exons are numbered with Roman numerals, and the created exon is represented as ‘C’.

RACE (5′- and 3′-RACE) and internal region amplification RT-PCRs were performed using RACE-specific primers. For the synthesis of full-length cDNA sequences, RNA from the cerebellum of crab-eating monkey was used, in which all the alternatively spliced transcript variants are highly expressed.

We obtained ten different products (T8–T17): two 5′-RACE (T8 and T9) and two internal region products (T10 and T11) for the exon-created ZKSCAN5 transcript, and three 5′-RACE products (T12, T13, and T14) and one internal region product (T15) for the original ZKSCAN5 transcript. One internal region product (T16) and one 3′-RACE product (T17) were obtained from the 3′-end region (Fig. 3). The RACE products revealed multiple alternative splicing patterns (Fig. 3). Although we performed bidirectional RACE to obtain the full-length sequences of both ZKSCAN5 transcripts, we failed to obtain the sequence of the internal region of the original transcript (Fig. 3). Thus, the full-length sequences of the ZKSCAN5 transcript variants were deduced by aligning the sequences obtained in this study, including those of the RACE, RT-PCR, and next-generation sequencing products (Fig. 3 and Fig. 4). By using this method, we identified ten transcript variant forms (TV1–TV10). One of these was exon-created while the other nine were spliced transcript variants (Fig. 4). To determine the correct ORF of the full-length transcript variants, we used the NCBI ORF Finder program. The results of this analysis are summarized in Fig. 4. Multiple transcript variants encoding six different isoforms have been identified for this gene (Fig. 4 and Fig. S3). However, isoforms 2 and 5 cause a frameshift that skips exon V and introduces a premature termination codon (TAG) at 1103-bp in exon VI (data not shown). Therefore, the identified transcripts encode four functional isoforms of ZKSCAN5. Isoforms 1 and 3 have 690 and 617 amino acids, respectively, and translation starts at ATG in exon III, whereas isoforms 4 and 6 have 839 and 766 amino acids, respectively, with translation starting at ATG in exon II (Fig. 4 and Fig. S3). A comparative analysis of the amino acid sequences of the ZKSCAN5 isoforms revealed that isoform 4 encodes a protein with 97% amino acid sequence identity to human ZKSCAN5, and has the same coding DNA sequence with conserved domains. Intriguingly, the created exon (C) does not affect ORF formation in ZKSCAN5 by the alternative acceptor site in exon II.

3.4. Evolutionary analysis of the AluY element in ZKSCAN5

In silico analysis revealed an integrated AluY element in the created exon (C) in the exon-created transcript (Fig. 1A). To elucidate the time at which the AluY elements were integrated during primate radiation, we performed genomic PCR amplification using locus-specific primer pairs designed on the basis of highly conserved regions in the genomic DNA of the following human and nonhuman primates: hominoids (human, chimpanzee, and gorilla), Old World monkeys (rhesus monkey, crab-eating monkey, African green monkey, colobus monkey, and langur monkey), and New World monkeys (squirrel monkey and marmoset) (Fig. 5 and Fig. S4). Additionally, we validated randomly selected amplified products by sequencing (Fig. S4). The results indicated that the AluY element was integrated in a common ancestor genome prior to the divergence of the Old and New World monkeys. To improve the accuracy of the analysis, we evaluated interspersed repeats using the RepeatMasker program. The results revealed that the sequences we validated contain the L1ME4 repeat of the L1 family in all tested genomes. However, the AluY element was only integrated from the Old World monkeys to humans (Fig. 5).

4. Discussion

We experimentally validated the alternatively spliced transcript variants and identified the full-length sequence of ZKSCAN5 in human and nonhuman primates. In total, we identified four ZKSCAN5 transcript variants, including an exon-created transcript (Fig. 2 and Fig. S2). Remarkably, the exon-created transcript, with a 487-nt insertion between exon II and exon III, was identified exclusively in the Macaca lineage and was not detected in the other primates tested in the study (Fig. 2, Fig. S1 and Fig. S2). In addition, the ZKSCAN5 transcript variants were ubiquitously

expressed in all tested tissues of crab-eating monkey. The exon-created transcript was expressed at low levels in the rhesus monkey in comparison with crab-eating monkey; this is probably because of sub-optimal sample condition, as limited primate resources were available (Fig. 2 and Fig. S2). However, Dreyer et al. and Weissig et al. have reported that ZKSCAN5 is ubiquitously expressed in adult and fetal tissues, with strongest expression seen in the testis (Dreyer et al., 1999; Weissig et al., 2003). Other SCAN box-containing zinc finger genes show high testicular expression (Lee et al., 1997); however, we detected higher expression in the uterus than in the testis. Therefore, we propose that ZKSCAN5 may play a distinct role in germ cells in humans (Fig. S2). In contrast, ZKSCAN5 was highly expressed in the brain of nonhuman primates (Fig. 2 and Fig. S2).

Fig. 4. Full-length analysis of ZKSCAN5 transcripts in the crab-eating monkey. We manually aligned the transcript products obtained in this study using the BioEdit program. The small and large black boxes represent the UTRs and the protein-coding region, respectively. Dot boxes represent the means of prediction sequence. The sequences of the prediction region derived from exons VI and VII of the fully identified TV1 transcript. The created exon region is represented as ‘C’. Information regarding manual combination and full-length generation is shown on the right. The exons are numbered with Roman numerals.

To investigate whether the lineage-specific transcript containing the newly created exon could be translated into protein, we obtained the full-length cDNA sequences of the original and exon-created transcripts (Fig. 3). Unfortunately, only the exon-created transcript could be identified completely. Although we could not obtain the entire sequences for the original transcripts, we reconstructed the full-length sequences by manual alignment of all the transcript products (Fig. 4). Thus, we identified ten full-length ZKSCAN5 transcript variants, including one exon-created transcript (TV1). Six different isoforms were identified by manual alignment. Four of these isoforms are predicted to be biologically functional forms, as they do not contain frameshifts or premature termination codons (Fig. 4 and Fig. S3). Significantly, the created exon (C) did not affect ORF formation via the alternative acceptor site in exon II of ZKSCAN5. Thus, in this study, we identified naturally occurring, polyadenylated full-length ZKSCAN5 transcripts in the crab-eating monkey.

Structural analysis revealed that AluY is integrated within the exon created between exons II and III of ZKSCAN5 (Fig. 1). Although exonization in relation to AluY elements is not recognized as a major mechanism of alternative splicing, the exonization mechanism is representative of alternative splicing processes that lead to the acquisition of new exons from intronic DNA sequences (Schmitz and Brosius, 2011). The generation of canonical splicing sites (splicing acceptor and splicing donor sites) by genomic insertions/deletion or mutations could also cause exonization events (Schmitz and Brosius, 2011). Furthermore, recent studies have indicated that exonization events derived from TEs such as long terminal repeat (LTR) retrotransposons (e.g., human

Fig. 5. Evolutionary investigation of TEs during primate evolution. (A) Genomic PCR analysis of the integration of AluY in humans and nonhuman primates. The primer pair, G1S and G1AS (1170-bp), was used to investigate the integration lineage of AluY in various primates. Primate samples are abbreviated as follows: (1) HU: human (Homo sapiens); (2) hominoids: CH: chimpanzee (Pan troglodytes), GO: gorilla (Gorilla gorilla); (3) Old World monkeys: RH: rhesus monkey (Macaca mulatta), CE: crab-eating monkey (Macaca fascicularis), AGM: African green monkey (Cercopithecus aethiops), CO: colobus monkey (Procolobus badius), LANG: langur monkey (Trachypithecus sp.); (4) New World monkeys: SQ: squirrel monkey (Saimiri sciureus), MAR: marmoset monkey (Callithrix jacchus). (B) Schematic representation of the integration events of TEs during primate evolution for ZKSCAN5. Each genomic event is also marked. Y.-H. Kim et al. / Gene 577 (2016) 236–243 243 endogenous retroviruses [HERVs]) and non-LTR retrotransposons References (e.g., SINE and long interspersed elements [LINEs]) have been observed Alekseyenko, A.V., Kim, N., Lee, C.J., 2007. Global analysis of exon creation versus loss and in various species (Sela et al., 2010a, 2010b). This is probably because the role of alternative splicing in 17 vertebrate genomes. RNA 13, 661–670. many TE sequences contain potential splicing sites (Makalowski et al., Ast, G., 2004. How did alternative splicing evolve? Nat. Rev. Genet. 5, 773–782. 1994). Therefore, we investigated the possibility that an integrated Batzer, M.A., Deininger, P.L., 2002. Alu repeats and human genomic diversity. Nat. Rev. Genet. 3, 370–379. AluY element provides an alternative splicing site that generates a Black, D.L., 2003. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. new Alu-derived exon via exonization in the exon-created ZKSCAN5 Biochem. 72, 291–336. Blencowe, B.J., 2006. Alternative splicing: new insights from global analyses. 
Cell 126, transcript. To determine when the AluY element was evolutionarily in- 37–47. tegrated, PCR amplification was performed using various primate geno- Corvelo, A., Eyras, E., 2008. Exon creation and establishment in human genes. Genome mic samples (Fig. 5A). The results revealed that the AluY element Biol. 9, R141. Dreyer, S.D., Zheng, Q., Zabel, B., Winterpacht, A., Lee, B., 1999. Isolation, characterization, integrated into a common ancestor genome after the divergence of and mapping of a zinc finger gene, ZFP95, containing both a SCAN box and an alter- the Old World monkeys (Fig. 5). Sequencing analysis revealed that inte- natively spliced KRAB a domain. Genomics 62, 119–122. grated AluY sequences are highly conserved from human to Old World Galante, P.A., Sakabe, N.J., Kirschbaum-Slager, N., de Souza, S.J., 2004. Detection and eval- uation of intron retention events in the human transcriptome. RNA 10, 757–765. monkey lineages. These results suggest that the AluYelementwasinte- Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis grated in the intronic region of ZKSCAN5 in the ancestral genome after program for windows 95/98/NT. Nucleic Acids Symp. Ser. 41, 95–98. Huh, J.W., Kim, Y.H., Kim, D.S., Park, S.J., Lee, S.R., Kim, S.H., Kim, E., Kim, S.U., Kim, M.S., the divergence of the Old World monkeys. However, the AluY element Kim, H.S., Chang, K.T., 2010. Alu-derived old world monkeys exonization event and did not provide canonical splicing sites for lineage-specific exon crea- experimental validation of the LEPR gene. Mol. Cells 30, 201–207. tion events in Macaca; rather, it is integrated into the middle of the cre- Huh, J.W., Kim, Y.H., Park, S.J., Kim, D.S., Lee, S.R., Kim, K.M., Jeong, K.J., Kim, J.S., Song, B.S., Sim, B.W., Kim, S.U., Kim, S.H., Chang, K.T., 2012. Large-scale transcriptome sequenc- ated exon. 
ing and gene analyses in the crab-eating macaque (Macaca fascicularis) for biomed- Intriguingly, we detected a lineage-specific new exon-created tran- ical research. BMC Genomics 13, 163. Keren, H., Lev-Maor, G., Ast, G., 2010. Alternative splicing and evolution: diversification, script in Macaca spp. Therefore, we performed comparative sequence exon definition and function. Nat. Rev. Genet. 11, 345–355. analysis to address the exon creation event in a lineage-specificmanner. Kreahling, J., Graveley, B.R., 2004. The origins and implications of aluternative splicing. A number of groups have reported that the Alu element is a predomi- Trends Genet. 20, 1–4. fi Lee, P.L., Gelbart, T., West, C., Adams, M., Blackstone, R., Beutler, E., 1997. Three genes nant source of new exon creation or lineage-speci c exon creation in encoding zinc finger on human chromosome 6p21.3: members of a new primate genomes (Sorek et al., 2002; Lev-Maor et al., 2003; Sorek, subclass of the Kruppel gene family containing the conserved SCAN box domain. Ge- – 2007; Huh et al., 2010). Previous in vitro experimental analysis of Alu- nomics 43, 191 201. Lev-Maor, G., Sorek, R., Shomron, N., Ast, G., 2003. The birth of an alternatively spliced derived exonization events have revealed that the boundary sequences exon: 3′ splice-site selection in Alu exons. Science 300, 1288–1291. of the splicing site are critically involved in the process of exon inclusion Li, W.H., Gu, Z., Wang, H., Nekrutenko, A., 2001. Evolutionary analyses of the human ge- nome. Nature 409, 847–849. (Sorek et al., 2004; Ram et al., 2008). However, the AluY element did not Lin, L., Shen, S., Jiang, P., Sato, S., Davidson, B.L., Xing, Y., 2010. Evolution of alternative affect lineage-specific exon creation in ZKSCAN5, because it was posi- splicing in primate brain transcriptomes. Hum. Mol. Genet. 19, 2958–2973. tioned in the middle of the created exon. 
Mutations can also result in Lin, L., Shen, S., Tye, A., Cai, J.J., Jiang, P., Davidson, B.L., Xing, Y., 2008. Diverse splicing pat- terns of exonized Alu elements in human tissues. PLoS Genet. 4, e1000225. exon creation by providing splicing sites that allow intronic sequences Makalowski, W., Mitchell, G.A., Labuda, D., 1994. Alu sequences in the coding regions of to be recognized by the splicing machinery (Alekseyenko et al., 2007; mRNA: a source of protein variability. Trends Genet. 10, 188–193. Matlin, A.J., Clark, F., Smith, C.W., 2005. Understanding alternative splicing: towards a cel- Rogozin et al., 2012). Therefore, we investigated splicing sites and lular code. Nat. Rev. Mol. Cell Biol. 6, 386–398. boundary sequences in multiple primate genomes (Fig. S6). Unfortu- Modrek,B.,Lee,C.,2002.A genomic view of alternative splicing. Nat. Genet. 30, 13–19. nately, our results were not sufficient to explain the mechanisms under- Pan, Q., Shai, O., Lee, L.J., Frey, B.J., Blencowe, B.J., 2008. Deep surveying of alternative splic- fi ing complexity in the human transcriptome by high-throughput sequencing. Nat. lying lineage-speci c exon creation. Our results revealed that the Genet. 40, 1413–1415. boundary sequences of lineage-specific created exons are perfectly con- Ram,O.,Schwartz,S.,Ast,G.,2008.Multifactorial interplay controls the splicing profile of – served from humans to rhesus monkeys. However, Lin et al. experimen- Alu-derived exons. Mol. Cell. Biol. 28, 3513 3525. Rogozin, I.B., Carmel, L., Csuros, M., Koonin, E.V., 2012. Origin and evolution of tally determined that cis-sequence changes and intronic sequence spliceosomal introns. Biol. Direct 7, 11. variation contribute to species-specific splicing patterns in MAGOH Schmitz, J., Brosius, J., 2011. Exonization of transposed elements: a challenge and oppor- fl tunity for evolution. Biochimie 93, 1928–1934. (Lin et al., 2010). Therefore, we investigated the exonic and anking Sela, N., Kim, E., Ast, G., 2010a. 
The role of transposable elements in the evolution of non- intronic sequences of created exons in multiple genomes, including mammalian vertebrates and invertebrates. Genome Biol. 11, R59. humans and five primates; we observed minor differences in the Sela, N., Mersch, B., Hotz-Wagenblatt, A., Ast, G., 2010b. Characteristics of transposable el- ement exonization within human and mouse. PLoS One 5, e10907. intronic sequences (data not shown). However, we could not experi- Sorek, R., 2007. The birth of new exons: mechanisms and evolutionary consequences. mentally validate this or perform in silico comparison owing to the in- RNA 13, 1603–1608. complete sequence database of primate genomes. Sorek, R., Ast, G., Graur, D., 2002. Alu-containing exons are alternatively spliced. Genome Res. 12, 1060–1067. In summary, although we could not definitively define the mecha- Sorek, R., Lev-Maor, G., Reznik, M., Dagan, T., Belinky, F., Graur, D., Ast, G., 2004. Minimal nism of lineage-specific exon creation event in ZKSCAN5, our results conditions for exonization of intronic sequences: 5′ splice site formation in alu exons. Mol. Cell 14, 221–231. suggest that a cis-regulatory change may be a major contributor to the Sugnet, C.W., Kent, W.J., Ares Jr., M., Haussler, D., 2004. Transcriptome and genome con- emergence of lineage-specific exon creation or splicing patterns. This servation of alternative splicing events in humans and mice. Pac. Symp. Biocomput. study on Macaca ZKSCAN5 transcript generation provides a valuable re- 66-77. Sultan, M., Schulz, M.H., Richard, H., Magen, A., Klingenhoff, A., Scherf, M., Seifert, M., source for future investigation of the mechanisms underlying lineage- Borodina, T., Soldatov, A., Parkhomchuk, D., Schmidt, D., O'Keeffe, S., Haas, S., specific exon creation. Vingron, M., Lehrach, H., Yaspo, M.L., 2008. A global view of gene activity and alterna- tive splicing by deep sequencing of the human transcriptome. Science 321, 956–960. 
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position- Acknowledgements specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680. Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., Burge, C.B., 2008. Alternative isoform regulation in human tissue This research was supported by a grant from the KRIBB Research Ini- transcriptomes. Nature 456, 470–476. tiative Program (KGM4241541), Republic of Korea. Wang, W., Zheng, H., Yang, S., Yu, H., Li, J., Jiang, H., Su, J., Yang, L., Zhang, J., McDermott, J., Samudrala, R., Wang, J., Yang, H., Yu, J., Kristiansen, K., Wong, G.K., Wang, J., 2005. Or- igin and evolution of new exons in rodents. Genome Res. 15, 1258–1264. Weissig, H., Narisawa, S., Sikstrom, C., Olsson, P.G., McCarrey, J.R., Tsonis, P.A., Del Rio- fi fi Appendix A. Supplementary data Tsonis, K., Millan, J.L., 2003. Three novel spermatogenesis-speci c zinc nger genes. FEBS Lett. 547, 61–68. Zhang, X.H., Chasin, L.A., 2006. Comparison of multiple vertebrate genomes reveals the Supplementary data to this article can be found online at http://dx. birth and evolution of human exons. Proc. Natl. Acad. Sci. U. S. A. 103, 13427– 13432. doi.org/10.1016/j.gene.2015.11.051.