View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

Genomics 87 (2006) 681–692 www.elsevier.com/locate/ygeno

The complexity of antisense revealed by the study of developing male germ cells ⁎ Wai-Yee Chan a,b, , Shao-Ming Wu a, Lisa Ruszczyk a, Evelyn Law a, Tin-Lap Lee a, Vanessa Baxendale a, Alan Lap-Yin Pang a, Owen M. Rennert a

a Laboratory of Clinical Genomics, National Institute of Child Health and Human Development, National Institutes of Health, Building 49, Room 2A08, 49 Convent Drive, MSC 4429, Bethesda, MD 20892-4429, USA b Department of Pediatrics, Georgetown University, Washington, DC 20007, USA Received 22 October 2005; accepted 13 December 2005 Available online 3 February 2006

Abstract

Computational analyses have identified the widespread occurrence of antisense transcripts in the human and the mouse genome. However, the structure and the origin of the majority of the antisense transcripts are unknown. The presence of antisense transcripts for 19 of 64 differentially expressed genes during mouse spermatogenesis was demonstrated with orientation-specific RT-PCR. These antisense transcripts were derived from a wide variety of origins, including processed sense transcripts, intronic and exonic sequences of a single gene or multiple genes, intergenic sequences, and . They underwent normal and alternative splicing, 5′ capping, and 3′ polyadenylation, similar to the sense transcripts. There were also antisense transcripts that were not capped and/or polyadenylated. The testicular levels of the sense transcripts were higher than those of the antisense transcripts in all cases, while the relative expression in nontesticular tissues was variable. Thus antisense transcripts have complex origins and structures and the sense and antisense transcripts can be regulated independently. Published by Elsevier Inc.

Keywords: Antisense transcription; Intron; Exon; Intergenic; ; Mouse; Spermatogonia; Spermatocytes; Spermatids

Although antisense transcription has been known to occur Furthermore, the occurrence of antisense transcription in in prokaryotes for many years, the widespread occurrence of specific biological processes has rarely been studied. A antisense transcripts in humans and mice has only recently number of biological events in spermatogenesis such as been documented. Computational analyses estimated 8 to genomic imprinting, translation repression, and stage-specific 20% of human and mouse genes form sense–antisense alternative splicing [11] are frequently associated with transcript pairs [1–4]. A more recent study of 10 human antisense transcripts. A systematic search for antisense chromosomes indicated about 61% of surveyed loci have transcripts in spermatogenic cells has not been reported. antisense transcripts [5]. The significance of the majority of Recent profiling of expressed genes in mouse germ cells at antisense transcripts is currently unknown, though a number different stages of development, including type A spermato- of biological functions have been proposed [6]. A role for gonia, pachytene spermatocytes, and round spermatids, by antisense transcription in disease processes has also been serial analysis of gene expression (SAGE) [12] offered a suggested [7–10]. Despite this apparent biological impor- unique opportunity to examine the occurrence of antisense tance of the antisense transcripts, very little is known of the transcription during this critical process in development. This mechanisms by which they are generated. Structural study documents the occurrence of antisense transcripts in information of antisense transcripts is also very limited. these three types of germ cells. Characterization of antisense transcripts cloned revealed the complex origins and structural features of these RNA molecules. Expression studies showed ⁎ Corresponding author. Fax: +1 301 480 4700. that the sense and the antisense transcripts can be regulated E-mail address: [email protected] (W.-Y. Chan). independently.

0888-7543/$ - see front matter. Published by Elsevier Inc. doi:10.1016/j.ygeno.2005.12.006 682 W.-Y. Chan et al. / Genomics 87 (2006) 681–692

Results and discussion Table 1 Genes with presence of antisense transcripts in testis confirmed by orientation- specific RT-PCR using total RNA as template Orientation-specific reverse transcription-polymerase chain reaction (RT-PCR) Symbol Gene name SAGE tag No. in Mean Δ Approximate Spga–Spcy–Sptd cycle S/AS (S/AS) No. ± SD Sixty-four genes that had been confirmed to be differentially testicular expressed in mouse germ cells at different stages of spermato- total RNA genesis by microarray [13] and/or quantitative real-time RT- Ch10 Heat shock 10- 106–2–2/ 3.85 ± 0.35 14 PCR (QPCR) and corroborated by SAGE [12] were selected for kDa 6–0–0 this investigation. The list of the genes is shown in 1 Supplementary Table S1. Thirteen differential expression (chaperone 10) – – patterns could be distinguished (results not shown). The most Calm2 Calmodulin 2 110 304 40/ 4.07 ± 0.44 17 6–20–0 common pattern of expression was 0 or very low in Ppic Peptidylprolyl 14–1–0/7– 20–11 5.02 ± 0.75 32 spermatogonia and equally high in spermatocytes and sperma- isomerase C tids (18 genes). The second most common pattern of expression Pdcl2 Phosducin-like 2 0–82–91/0–5–7 6.52 ± 0.22 92 was 0 or very low in spermatogonia, increased in spermato- Ubb Ubiquitin B 198–769–510/ 7.35 ± 0.16 163 – – cytes, and peaked in spermatids (17 genes). The copy number of 7 59 21 Sh3- Associated 1–7–0/0–7–0 7.72 ± 1.00 211 the representative SAGE tag of the majority of the genes (52 Stam molecule with the genes) was either 0 (33 genes) or very low in spermatogonia. To SH3 domain of identify the antisense transcripts, SAGE tags matching the STAM UniGene cluster of the genes were identified and multiple Tsg1 Testis-specific 0–310–419/ 8.70 ± 0.15 416 – – SAGE tags matching the same UniGene cluster were aligned gene 1 0 31 51 Gk-rs2 Glucokinase 0–54–41/0–6–5 9.00 ± 0.32 512 with all mRNAs deposited in that UniGene cluster. Orientation activity, related of the transcript represented by the SAGE tag was determined in sequence 2 reference to the known mRNA and confirmed by orientation- Uba52 Ubiquitin A-52 531–113–203/ 9.20 ± 0.15 588 specific RT-PCR [14]. residue ribosomal 20–4–13 Following alignment of the matching SAGE tags with protein fusion product 1 sequences in their respective UniGene clusters, 41 genes (64%) Tetc3 T-complex- 4–471–360/ 9.88 ± 1.09 942 were shown to have SAGE tags matching the antisense strand associated testis 6–20–0 (Supplementary Table S1). The presence of sense–antisense expressed 3 overlapping transcript was confirmed for 19 genes with Prm1 Proteamine 1 0–41–177/0–2–26 9.88 ± 0.20 942 – – – – orientation-specific RT-PCR using testicular total RNA as the Dnajb3 DnaJ (Hsp40) 0 63 84/0 4 12 10.50 ± 0.21 1448 homolog, template (Table 1). The results of analysis of the products of subfamily B, orientation-specific RT-PCR using polyacrylamide gel electro- member 3 phoresis (PAGE) are shown in Supplementary Fig. S1. Among Ppp1cc Protein 82–363–382/ 10.70 ± 0.34 1663 the rest of the 41 genes, 10 genes were considered to be not phosphatase 1, 5–18–6 confirmed of having antisense transcripts because a band was catalytic subunit, γ isoform present in the no-primer control of the RT-PCR. The remaining Prm2 Protamine 2 0–103–147/0–1–9 12.50 ± 0.12 5792 12 genes were not analyzed by orientation-specific RT-PCR Dbil5 Diazeparm 2–64–119/0–1–4 14.50 ± 0.13 23,170 because the copy number of the antisense SAGE tags was binding inhibitor- extremely low. Expression of the antisense transcript was like 5 – – – – confirmed by cloning and sequencing of the amplicons of the Ldh3C Lactate 0 282 243/0 4 2 16.10 ± 0.89 70,239 ∼ dehydrogenase 3, orientation-specific RT-PCR. Thus, 19 of 64 ( 30%) differen- C chain, sperm tially expressed genes had antisense transcripts. specific Examination of the nucleotide sequence of the antisense Adam5 A disintegrin and 3–54–92/0–12–4 16.81 ± 1.54 114,898 amplicons showed that they could be divided into three main metalloprotease groups based on the comparison with their sense genes. Fig. 1 domain 5 Sap17 Sperm 2–116–377/ 17.47 ± 0.45 181,549 shows the three groups of antisense amplicons. In group 1 the autoantigenic 1–21–30 antisense amplicons were 100% complementary to the sense protein 17 transcripts. This group could be divided into two subgroups. In Fhl4 Four and a half 0–164–272/ 19.54 ± 1.72 762,300 subgroup 1A, the amplicon was contained in a single exon. This LIM domain 4 2–19–27 subgroup included the antisense transcript of a disintegrin and The germ cell SAGE tag numbers are from Ref. [12]. SAGE tag No. in Spga– metalloprotease domain 5 (Adam5); diazepam-binding inhibi- Spcy–Sptd, number of SAGE tags in type A spermatogonia, pachytene Δ tor-like 5 (DbiL5); DnaJ (Hsp40) homolog, subfamily B, spermatocytes, and round spermatids; S, sense; AS, antisense; Mean cycle No. ± SD, means of difference in QPCR cycle number between sense and member 3 (DnajB3); four and a half LIM domains 4 (Fhl4); antisense transcripts and standard deviation of the means with testicular total glucokinase activity-related sequence 2 (Gk-rs2); phosducin- RNA as template; Approximate S/AS, approximate ratio of expression levels of like 2 (Pdcl2); peptidylprolyl isomerase C (Ppic); γ isoform of sense/antisense transcripts calculated from QPCR cycle numbers. W.-Y. Chan et al. / Genomics 87 (2006) 681–692 683

Fig. 1. Grouping of antisense amplicons based on their structure. For groups 1A, 1B, and 2, antisense amplicons are represented by yellow and brown bars and are shown on the top. Sense gene sequences are represented by red and pink bars and are shown on the bottom. For group 3, pseudogene is represented by a striped bar. The noncoding strand of the sense gene is represented by a white bar. Complementarity is represented by short black lines linking the antisense amplicon with the sense sequence. Thick black lines represent introns. protein phosphatase 1, catalytic subunit (Ppp1cc); protamine 2 splicing of two exons in accordance with the AG/GT(C) rule (Prm2); sperm autoantigenic protein 17 (Sap17); associated [18], implying a similar RNA processing mechanism in both molecule with the SH3 domain of STAM (Sh3-Stam); and sense and antisense transcription. A number of antisense testis-specific gene 1 (Tsg1). Subgroup 1B antisense amplicons transcripts had been reported to arise in a similar manner were 100% complementary to the sense transcripts, which previously [19–25]. Thus, the noncoding strand of a gene locus comprised two (T-complex-associated testis expressed 3 could be transcribed and processed independently to produce a (Tcte3)) or three (sperm-specific lactate dehydrogenase 3, C mature transcript. Both functional genes and pseudogenes could chain (Ldh3c)) spliced exons. Similar to amplicons of group 1B, give rise to antisense transcripts as shown by the group 3 the group 2 antisense amplicon consisted of more than one antisense amplicons. Pseudogene-derived antisense transcripts exon. However, different from group 1B amplicons, only part of are not unique to germ cells. The antisense transcript of neural the antisense amplicon was complementary to the exons in the nitric oxide synthase (nNOS) was also reported to be transcribed sense transcript. Part of the antisense amplicon was comple- from a pseudogene [26]. Judging from the proportion of mentary to the intron between the two exons of the sense gene. antisense transcripts derived from pseudogenes and the number The antisense amplicon of protamine 1 (Prm1-ASF1 (GenBank of such genes identified in the mouse and human genome [27], Accession No. DQ082992)) belonged to this group. Group 3 pseudogenes may be a rich source of antisense transcripts. antisense amplicons were not complementary to the sense transcripts. Instead, they were complementary to the pseudo- Molecular cloning and characterization of antisense genes. Four antisense amplicons, namely those of calmodulin 2 transcripts (Calm-2), heat shock 10-kDa protein 1 (chaperonin 10) (Ch10), ubiquitin A-52 residue ribosomal protein fusion product 1 T-complex-associated testis expressed 3 (Uba52), and ubiquitin B (Ubb), belonged to this group. The To understand further the nature of the antisense transcripts, antisense amplicons of these genes, namely, Calm2-ASF1 we cloned and characterized the antisense transcripts of one (GenBank Accession No. DQ082990), Ch10-ASF1 (GenBank member each of group 1B, group 2, and group 3. Cloning of the Accession No. DQ082993), Uba52-ASF1 (GenBank Accession antisense transcripts of Tcte3 of group 1B yielded 18 distinct No. DQ082994), and Ubb-ASF1 (GenBank Accession No. clones, which could be assigned to two groups based on the size DQ082991), were 88–94% identical to their sense transcripts, of the transcript. Fig. 3A shows a cartoon of the Tcte3 antisense but were 99% identical to their pseudogenes on chromosomes transcripts, A-Tcte3-a (GenBank Accession No. DQ072383) 12, 2, 11, and 2, respectively (Figs. 2B to 2E). and A-Tcte3-b (GenBank Accession No. DQ072384). The Total complementarity of the antisense amplicons of the presence of the longer antisense transcript, A-Tcte3-b, was genes in group 1B to the spliced exons of the sense genes confirmed by amplifying the entire transcript using the most 5′ suggested that they were synthesized after the sense transcripts and 3′ primers in one RT-PCR. The nucleotide sequences of A- were processed. Similar observations had been made with Tcte3-a and A-Tcte3-b are shown in Fig. 3B. A-Tcte3-a was antisense transcripts of cardiac troponin 1 [15], hemoglobin β identical to the first 532 nucleotides of A-Tcte3-b except for an [16], and rat urocortin [17]. It had been postulated that these additional 18 A's. Blasting A-Tcte3-b against the mouse genome antisense transcripts were transcribed from the sense mRNA in revealed that the first 704 nucleotides of A-Tcte3-b were the the cytoplasm by RNA-dependent RNA polymerase activity reverse complement of exon 4 (164 bp) of Tcte3 (NT_039641), [4,15], a biological activity hypothesized but not yet demon- including 246 and 294 bp, respectively, on the 5′ and 3′ side of strated in eukaryotic cells. Sequence of the antisense amplicon this exon (Fig. 3B). However, the 50 nucleotides upstream of of the gene in group 2 suggested that it was generated by the poly(A) tail could not be found in the Tcte3 gene. These 50 684 W.-Y. Chan et al. / Genomics 87 (2006) 681–692

nucleotides contained a polymorphic AAAAC repeat, with seven repeats in some clones and eight repeats in the others interrupted by an AAAC sequence. The SAGE tag sequences identified in the germ cell SAGE libraries were present in the antisense transcripts. Computer-assisted translation of the antisense transcripts revealed open reading frames (ORF) of 117 to 144 bp. The largest ORF encoded a putative polypeptide of 48 amino acids with no recognizable protein motif, indicating that the antisense transcript might serve as a noncoding regulatory RNA [28]. Unlike the antisense amplicon identified by orientation-specific RT-PCR, neither of the Tcte3 antisense transcripts overlapped more than one exon. The cloning approaches selected transcripts with a poly (A) sequence at the 3′ end (3′ rapid amplification of cDNA ends (RACE)) or a cap at the 5′ end (5′ RACE). Therefore the results suggested there exist more than two species of antisense transcripts of Tcte3, some of which were not polyadenylated and capped, similar to that observed in a number of noncoding regulatory RNAs [28].

Protamin 1 and 2 Three full-length antisense transcripts (A-Prm[1-2]-a (GenBank Accession No. DQ072380), A-Prm[1-2]-b (Gen- Bank Accession No. DQ072381), and A-Prm[1-2]-c (GenBank Accession No. DQ072383)) of the sole member of group 2, i.e., Prm1, were cloned; upon analysis, these turned out to overlap Prm1 as well as the neighboring Prm2. These transcripts were alternative splice variants containing different numbers of exons. The presence of these transcripts was confirmed by single RT-PCRs using the most 3′ and 5′ primers. The structure of the antisense transcripts is shown in Fig. 4A. The nucleotide sequence of A-Prm[1-2] is shown in Fig. 4B. The shortest antisense transcript was A-Prm[1-2]-a, which had 921 bp and a poly(A) tail. It had four exons, with the first exon overlapping the most 3′ 126 bp of Prm2 (BC049612), and exons 2 and 3 were the reverse complement of Prm1 (NM_013637). A-Prm[1- 2]-b was identical to A-Prm[1-2]-a with an additional exon (exon 3) of 116 bp, which overlapped part of the intergenic sequence between Prm1 and Prm2. It had a poly(A) tail. The longest antisense transcript was A-Prm[1-2]-c, which was identical to A-Prm[1-2]-b with an additional exon (exon 2) of 147 bp and a poly(A) tail. The 5′ end of exon 2 overlapped the 5′-most 112 bp of exon 1 of Prm2. Computer-assisted translation of the three antisense transcripts identified a similar ORF of 318 bp encoding a putative polypeptide containing four casein kinase II phosphorylation sites and four myristoylation sites as shown in Fig. 4B. A number of protein- encoding antisense transcripts had been reported previously [20,23,29,30].WhetherthePrm[1-2] antisense transcripts Fig. 2. Nucleotide sequence of group 2 and group 3 antisense amplicons. encode any protein awaits validation. Primers for amplification are represented by broken line with arrow. Notation of The Prm[1-2] antisense transcripts contained multiple exons the primers refers to the sense transcript. (A) Prm1-ASF1. The antisense whose splice variants followed the AG/GT(C) rule [18]. These amplicon sequence is on the top. The complementary sense sequence is on the antisense exons overlapped with exonic, intronic, and intergenic bottom. Uppercase letters represent exon sequence and lowercase letters sequences of the sense genes. Similar to the sense transcripts, represent intronic sequence. (B) Calm2-ASF1. The antisense amplicon of calmodulin 2. (C) Ch10-ASF1, antisense amplicon of chaperonin 10. (D) the antisense transcripts underwent alternative splicing. This Uba52-ASF1, antisense amplicon of ubiquitin A-52 ribosomal protein fusion phenomenon had also been demonstrated in other studies product 1. (E) Ubb-ASF1, antisense amplicon of ubiquitin B. [21,23,31,32]. Thus, processing of some antisense transcripts W.-Y. Chan et al. / Genomics 87 (2006) 681–692 685

Fig. 3. Antisense transcripts of T-complex testis expressed 3. (A) Cartoon of the Tcte3 antisense transcripts. The SAGE tags identifying these antisense transcripts are shown in the boxes with the number of SAGE tags in each cell type shown as spermatogonia/spermatocytes/spermatids [12]. The area of complementary to the sense exon is flanked by the dotted lines. The 3′ poly(A) tail and the 5′ cap site of the antisense transcripts are shown. For the sense gene, the boxes represent exons while the thick line represents introns and flanking sequence of the gene. Empty boxes represent untranslated region and filled boxes represents translated sequence. The sizes of the exons, introns, and intergenic regions are not to scale. (B) Nucleotide sequence of A-Tcte3-b. ▵ indicates the end of sequence homology between A-Tcte3-a and A- Tcte3-b. For A-Tcte3-a, there are 18 A's after that point. The polymorphic AAAAC stretch is marked with asterisks. The number of AAAAC is polymorphic, with 6 in some clones and 7 in the others. The two SAGE tag sequences identified in germ cell SAGE libraries are underlined. ⇒⇐demarcate the boundaries of the sequence that is reverse complementary to exon 4 of Tcte3. follows rules similar to those of the sense transcripts. It was derived from the pseudogene on chromosome 4 (A-Uba52-4a interesting to note that the Prm[1-2] antisense transcripts (GenBank Accession No. DQ072385), A-Uba52-4b (GenBank spanned two neighboring genes. Thus, they might interact with Accession No. DQ072386), and A-Uba52-4c (GenBank both sense transcripts. Examination of the SAGE tags in the Accession No. DQ072387)) and four groups of antisense germ cell libraries indicated two antisense tags that overlapped transcripts derived from the pseudogene on chromosome 9 (A- with the second exon of Prm2 [12]. Since SAGE tags are close Uba52-9a (GenBank Accession No. DQ072388), A-Uba52 -9b to the 3′ end of transcripts, this observation suggested the (GenBank Accession No. DQ072389), A-Uba52-9c (GenBank presence of antisense transcripts that overlapped primarily with Accession No. DQ072390), and A-Uba52-9d (GenBank the Prm2 gene. Accession No. DQ072391)). The presence of these transcripts was confirmed by successful RT-PCR with the most 5′ and Ubiquitin A-52 residue ribosomal protein fusion product 1 most 3′ primers. Fig. 5A shows the cartoon depicting the The antisense transcripts of Uba52 in group 3 were cloned. intron–exon structure of the Uba52 antisense transcripts. The 5′ RACE yielded two groups of clones, neither of which were nucleotide sequences of the antisense transcripts of Uba52 are derived from the functional Uba52 gene on chromosome shown in Fig. 5B. It is interesting to notice that the short 8(Fig. 5A). They were derived from two putative pseudo- amplicon generated by orientation-specific RT-PCR was genes, one on the tip of the short arm of chromosome 4 derived from another pseudogene of Uba52 on chromosome (NT_039258) and the other on the long arm of chromosome 9 11, indicating that only a portion of the antisense transcripts of (NT_039472). Both pseudogenes were 97–98% homologous this gene were cloned in this exercise. to Uba52 mRNA (BC014772) with no introns and a putative All the antisense transcripts derived from the chromosome 4 ancestral poly(A) stretch with 24 and 17 A's, respectively. 3′ pseudogene predict an extra exon downstream of the 3′ end of RACEusingprimersbasedonthe5′ RACE products the pseudogene identified by the putative ancestral poly(A) tail subsequently yielded three groups of antisense transcripts (Fig. 5A, b). A-Uba52-4a comprised two exons of 170 and 415 686 W.-Y. Chan et al. / Genomics 87 (2006) 681–692

Fig. 4. Antisense transcripts of protamine 1 and protamine 2. (A) Cartoon of the Prm[1-2] antisense transcripts. The putative exons of the antisense transcripts are shown. The exons are numbered starting from the 5′ end of the antisense transcript. The rest of the notations are the same as described in the legend to Fig. 3B. The intergenic sequence between the Prm1gene and the Prm2 gene is ∼4.2 kb. (B) Nucleotide sequence and computer-predicted amino acid sequence of polypeptide encoded by A-Prm[1-2]-c. A-Prm[1-2]-adoes not have exons 2 and 3; A-Prm[1-2]-b does not have exon 2. Exon–intron boundaries are indicated by ←→. The SAGE tag sequences identified in germ cell SAGE libraries are underlined. Amino acids encoded by the ORF are shown underneath the nucleotide sequence as single-letter code. The stop codon is represented by +++. Amino acids constituting the potential casein kinase II phosphorylation sites in the putative encoded polypeptide are underlined. Amino acids constituting potential myristoylation site are marked by asterisks.

bp, respectively. The first exon was complementary to a putative the pseudogene. Splicing of the exons of A-Uba52-4a followed exon 1830 bp downstream of the pseudogene. This exon was the AG/GT(C) rule [18]. A-Uba52-4b, with 1048 bp, was the spliced to an exon starting at 84 bp 5′ of the poly(A) stretch of largest antisense transcript derived from this pseudogene. It was W.-Y. Chan et al. / Genomics 87 (2006) 681–692 687 similar to A-Uba52-4a except the second exon was larger, with PCR (Supplementary Fig. S1). The discrepancy between the 854 bp. A-Uba52-4c was different from the other two transcripts QPCR and the PAGE results could be the consequence of in that the first exon was 321 bp and was 1799 bp downstream differences in the procedure. Since RT-PCR was done using the of the pseudogene. This exon was spliced to the second exon at one-tube procedure while QPCR involved a transfer of RT the same position as the other two antisense transcripts. product from one tube to another prior to performing QPCR, Computer-assisted translation of A-Uba52-4b revealed an any loss of RT product due to adsorption onto to the wall of the ORF of 393 bp with no recognizable protein motif. RT reaction tube will greatly affect the result of the QPCR but The longest antisense transcript derived from the pseudogene not that of the RT-PCR and PAGE. This effect will be especially on chromosome 9, A-Uba52-9c, had 1920 bp with 298 bp in significant for low-abundance antisense transcripts. Thus the exon 1, 161 bp in exon 2, and 1461 bp in exon 3, including a QPCR results will not accurately reflect the relative expression poly(A) tail of ∼21 A's. Exon 3 overlapped with the levels of the sense and antisense transcripts, particularly for the pseudogene of Uba52. Exon 1 was predicted to be 4060 bp low-abundance transcripts. Another cause of the discrepancy and exon 2 to be 1868 bp, downstream of the ancestral poly(A) could be the excessive number of PCR cycles (30 cycles) stretch of the pseudogene (Fig. 5A, c). A-Uba52-9b was usually performed to reveal the presence of the antisense identical to A-Uba52-9c except that exon 3 was shorter with transcripts. only 725 bp and a poly(A) tail of ∼14 A's. A-Uba52-9a did not We also examined the expression of sense and antisense have exon 2. A-Uba52-9d was identical to A-Uba52-9a except transcripts of nine genes in nontesticular tissues, including with a LINE 1 repeat of 88 bp inserted between exons 1 and 2. whole embryo, ovary, brain, thymus, kidney, spleen, heart, 3′ extension of exon 3 of both A-Uba52-9a and A-Uba52-9d lung, and liver, by semiquantitative orientation-specific RT- was not successful and we sequenced only 170 bp from the 5′ PCR. The primers for the RT-PCR were the same as those for end and stopped at the poly(T) stretch. Different from A-Uba52- studying testicular expression of the sense and the antisense 4b, no ORF of appreciable size could be identified in any of the transcripts (Supplementary Table S2). Results of electropho- chromosome 9 antisense transcripts. resis of the RT-PCR products are shown in Fig. 6. The relative The study of the antisense transcripts of Uba52 showed that expression levels of the sense and the antisense transcripts are unlike the pseudogenes, the antisense transcripts derived from summarized in Table 2. Expression of the sense and the them could be composed of multiple exons and undergo antisense transcript of Pdcl2 was testis specific. The posttranscription processing. It is likely that these transcripts expression of the antisense transcript of Tsg1 and Tcte3 was have biological activities, unlike pseudogenes that are also testis specific. Unlike the sense transcript, the antisense presumed to be nonfunctional. A-Uba52-4b had an appreciable transcript of Prm1 was ubiquitously expressed. Expression of ORF. Even though no known motif could be identified, the Prm1 and Prm2 in nontesticular tissues was reported potential for it to encode a novel protein could not be previously [33–35] and confirmed by nucleotide sequencing precluded. The other transcripts with no significant ORF of the TA-cloned sense and antisense amplicons (data not might have regulator activities similar to that proposed for shown). The sense and the antisense transcripts of Pdcl2 and noncoding RNAs [28]. The transcription of an antisense RNA Prm2 were coexpressed in selected tissues while those of from a pseudogene was reported previously for neural nitric Calm2 and Uba52 were coexpressed in all tissues examined. oxide synthase [26]. In that case, the pseudogene-derived On the other hand, the sense transcripts of Ppp1cc and Ppic antisense transcript was shown to regulate the synthesis of the were ubiquitously expressed while their antisense transcripts neural nitric oxide synthase protein. Whether the Uba52 were found only in certain tissues. antisense transcripts serve as regulatory RNAs is currently Although the sense transcripts were often expressed at higher under investigation. levels than antisense transcripts, when the two were coex- pressed the relative expression pattern of the sense and the Expression studies of antisense transcripts antisense transcripts varied for different genes. This was contrary to that observed when mouse embryonic tail genes The relative expression levels in mouse testis of sense and were studied [36]. Coexpression of the sense and the antisense antisense transcripts of the aforementioned 19 genes were transcripts in the same cell facilitates the formation of RNA examined using QPCR with primer sets described in Supple- duplexes and may result in modulation of gene expression or mental Table S2. The data indicated that the expression level in mRNA stability and processing [6,37,38]. Tissue-specific testis of the sense transcript was always significantly higher expression of the antisense transcripts implies regulation of (p b 0.001) than that of the antisense transcript. The results are the appropriate tissue- and cell-type expression of the sense summarized in Table 1. This differential expression of the sense gene similar to that observed for Nphs1 and Dnm3 [37,39,40].It and antisense transcripts in whole testis was similar to that may also indicate regulation or activity of the antisense observed in the more differentiated germ cells such as transcript independent of the sense transcript as observed in spermatocytes and spermatids as shown by the SAGE analysis, the case of FGF-2 [41]. with the exception of Ppic and Sh3-Stam ([12] and Table 1). In Differential expression of sense and antisense transcripts in most cases, the difference in expression levels between the different cells and tissues has been reported for a number of sense and the antisense transcripts as shown by QPCR was imprinted genes [6,21,42]. Antisense transcripts have been much bigger than that revealed by PAGE analysis following RT- suggested to play important roles in the regulation of 688 W.-Y. Chan et al. / Genomics 87 (2006) 681–692 monoallelic expression in X-chromosome inactivation and including Prm1, Prm2, Tcte3, and Tsg1, coincidentally were genomic imprinting [43–45]. A number of the genes in the preferentially expressed in meiotic and postmeiotic cells (Tables present study also showed differential expression of the sense– 1 and 2). It is unknown whether these antisense transcripts have antisense transcripts. The sense transcript of these genes, a particular function in transcription regulation at these stages. W.-Y. Chan et al. / Genomics 87 (2006) 681–692 689

The present study demonstrates the occurrence of antisense TX, USA). RNA integrity and concentration were checked with the transcription in germ cells. It is tempting to speculate that this Bioanalyzer 2100 (Agilent Technologies, Germantown, MD, USA). Orienta- tion-specific RT-PCR was performed as described previously by Shendure and vigorous control of gene expression is necessary to ensure the Church [14]. The One Step RT-PCR kit (Qiagen, Valencia, CA, USA) was accuracy or to facilitate the functioning of the biological used according to the manufacturer's protocol with a total volume of 25 μl. processes occurring during spermatogenesis such as genome- The primers were designed based on the published mRNA sequence such that wide methylation–demethylation, genomic imprinting, mono- they would prime the amplification of a 120- to 230-bp sequence covering the ′ – allelic gene expression, etc. It suggests that mouse spermato- most 3 sense antisense pair. Primers were designed using Primer Express version 2.0 (Applied Biosystems, Branchburg, NJ, USA) and specificity was genesis may be a good model for studying the regulation and confirmed by a BLASTN search against the nonredundant and EST mouse biological activities of antisense transcripts. sets from NCBI. The sequence and location of the primers are described in It has been argued that antisense transcripts arise from leaky Supplemental Table S2. One microgram of total RNA was used as template in transcription of the noncoding strand. Even though this all RT reactions. In general, 30 PCR cycles were performed for the orientation- possibility cannot be completely ruled out, the fact that most specific RT-PCR with the exception of cases in which the expression level of the transcript was very high or very low. All orientation-specific RT-PCR of the antisense transcripts examined are polyadenylated or experiments were performed with controls in which no strand-specific primer processed in a manner similar to the sense transcripts suggests was added to the RT reaction [14]. Five microliters of the 25 μl product of that they are intentional transcripts. This is supported by the orientation-specific RT-PCR was analyzed by PAGE. Quantitative real-time results of a recent study comparing the genomic organization of RT-PCR analyses were carried out as described previously [50]. Sense-specific genes with or without antisense transcripts between human, first-strand cDNA was synthesized as described earlier using the Qiagen One Step RT-PCR kit. Products of the RT reaction were aliquoted for QPCR. mouse, and pufferfish [46]. Previous studies on antisense Primers for QPCR were the same as those used for orientation-specific RT- transcription either used a computational approach to look for PCR as described in Supplemental Table S2. QPCR was performed in the global presence of antisense transcripts or focused on a triplicate in the 7900 HTS Sequence Detection System (Applied Biosystems, single gene [1–4,19–26,29,30,35,37]. Few reports document Foster City, CA, USA). Gene expression data were reported after normalizing the mechanisms by which the antisense transcripts are to 18S rRNA content. The mean cycle number and the standard deviations were reported. Significance of expression level difference was analyzed using generated. In the present study, similar to a recent report [47], Student's t test. we used a SAGE database for the identification of antisense transcripts. We showed that a significant percentage of 5′ and 3′ RACE and nucleotide sequencing differentially expressed genes in spermatogenic germ cells are associated with antisense transcripts. These transcripts come To clone the antisense transcripts, anchored PCR was performed using the from a wide spectrum of sources: processed sense mRNA, the GeneRacer Kit (Invitrogen, Gaithersburg, MD, USA) according to the sense gene locus, pseudogene, two neighboring genes, and manufacturer's recommendations. 5′ RACE using the GeneRacer Kit ′ intergenic sequence. Characterization of the cloned antisense specifically targets only 5 -capped mRNA. Reverse transcription of the ligated mRNA to create RACE-ready first-strand cDNA was effected using Superscript transcripts also showed that antisense transcripts can be III reverse transcriptase (Invitrogen) and orientation-specific primer. The RACE processed, alternatively spliced, capped, and polyadenylated. product was amplified using nested primers and gene-specific primers before TA There are also antisense transcripts that do not have a 5′ cap cloned into the pCR 4-TOPO vector (Invitrogen) according to the manufac- and/or 3′ poly(A) tail. Thus the origin and structure of antisense turer's protocol. The sequences of specific primers used for 5′ and 3′ RACE are transcripts are very complex. Antisense transcripts have been indicated in Figs. 3B, 4B, and 5B. The sequence of the cloned cDNA was determined by cycle sequencing using BigDye Primer Cycle Sequencing Kits proposed to have functions in transcriptional and posttranscrip- (Applied Biosystems). DNA sequences were analyzed using DNASIS software tional regulation [6,14,28,43,44,48,49]. The complex nature of version 2.5 (MiraiBio, Alameda, CA, USA). The presence of antisense the antisense transcripts suggests that they are well suited to transcripts was confirmed by repeating the RT-PCR using the most 5′ and the provide an additional layer of vigorous regulation of gene most 3′ primers that were located in different exons in the antisense transcript. If expression. both primers were contained in the sense transcript, orientation-specific RT-PCR was performed. Open reading frames in the nucleotide sequence were predicted using the Materials and methods NCBI ORF finder program (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and ORF Finder (http://www.bioinformatics.org/sms/orf_find.html). Translated Orientation-specific RT-PCR and QPCR amino acid sequences in the potential ORFs in FASTA format were input into the SMART 4.0 database [51]. The output was ranked by e values. Domains Presence of antisense transcript was confirmed by orientation-specific RT- with scores that were less significant than the required threshold or overlapped PCR. Total RNA from mouse tissues was purchased from Ambion (Austin, with some other source of annotation were rejected.

Fig. 5. Antisense transcripts of ubiquitin A 52-residue ribosomal protein fusion product 1. (A) Cartoon of the cloned Uba52 antisense transcripts. For the functional genes and the pseudogenes, the empty and hatched boxes represent exons, while the thick line represents introns and flanking sequences. The rest of the notations are the same as described in the legend to Fig. 3B. (a) Functional gene on chromosome 8. The three exons are shown. (b) Antisense transcripts A-Uba52-4a, -4b,and -4c of the pseudogene on chromosome 4. The filled box represents the putative exon predicted by the antisense transcripts located downstream of the pseudogene. (c) Antisense transcripts A-Uba52-9a, -9b, -9c, and -9d of the pseudogene on chromosome 9. The two putative exons predicted by the antisense transcripts located downstream of the pseudogene are represented by the filled boxes. The hatched box represents the LINE1 repeat insert. (B) Nucleotide sequence of antisense transcripts of Uba52. (a) A-Uba52-4b.▵ indicates the end of sequence of A-Uba52-4a. There are 21 A's after this point. ←→indicates the exon–intron boundaries. ⇐ indicates the 5′ end of the pseudogene. The SAGE tag sequences identified in germ cell SAGE libraries are underlined. (b) A-Uba52-4c; (c) A-Uba52-9c. ▵ indicates the end of A-Uba52-9a and A-Uba52-9b. There are 14 A's following ▵ in A-Uba52-9b. The SAGE tag sequences identified in germ cell SAGE libraries are underlined. Exon 2 is missing in A-Uba52-9a. A-Uba52-9dis similar to A-Uba52-9a except with the insertion of ATATATTGTT GGCAAAGATT TTTCTCCCAT TCCATATGCC GACTATTCAAT GCTGTTTTTA CTTTGCGGTG CAGAAATCTT TTAAATTC between exons 1 and 2. 690 W.-Y. Chan et al. / Genomics 87 (2006) 681–692

Fig. 6. Polyacrylamide gel electrophoresis of the product of orientation-specific RT-PCR of nine genes in different tissues. M, size marker; S, orientation-specific RT- PCR using primer specific for the sense transcript; As, orientation-specific RT-PCR using primer specific for the antisense strand. Control with no strand-specific primer in the RT reaction was done in all experiments. The representative results of triplicate experiments are shown. W.-Y. Chan et al. / Genomics 87 (2006) 681–692 691

Table 2 Tissue distribution of sense and antisense transcript pairs in different mouse tissues Gene Embryo Testis Ovary Brain Thymus Kidney Spleen Heart Lung Liver S/AS S/AS S/AS S/AS S/AS S/AS S/AS S/AS S/AS S/AS Calm2 +++/+++ +++/+ +++/+++ +++/+ ++/+ ++/+ +++/+++ +++/+ +++/+ +++/+ Pdcl2 –/– ++/+ –/––/––/––/––/––/––/––/– Ppic ++++/+ ++++/++ ++++/+ ++/– ++++/– ++++/+ ++++/– ++++/+ ++++/– ++++/– Ppp1cc ++++/– +++/+ ++/– ++/– ++/– ++/+ ++/+ ++/– ++/– ++/++ Prm1 –/+ +++/+++ –/+ –/++ ++/+ ++/+ ++/+ ++/+ +/+ +/+ Prm2 –/– ++++/+++ –/––/– ++/++ –/– +/++ –/– +/++ –/– Tcte3 –/– ++++/++ +/– +/– ++/– +/– +++/––/– ++/––/– Tsg1 +/– +++/+ +/––/––/– +/– +/––/– +/– +/– Uba52 ++++/+ ++++/++++ ++++/+ ++++/+ ++++/+++ ++++/+ ++++/++++ ++++/+ ++++/++ +++/+ S, sense transcript expression; AS, antisense transcript expression; + indicates the presence of a band and the number of +'s represents the visual intensity of the band; – indicates the absence of a band.

Acknowledgments [14] J. Shendure, G.M. Church, Computational discovery of sense–antisense transcription in the human and mouse genome, Genome Biol. 3 (2002) This research was supported by the Intramural Research research0044.1-0044.14. Program of the NIH, National Institute of Child Health and [15] H. Bartsch, S. Voigtsberger, G. Baumann, I. Morano, H.P. Luther, Detection of a novel sense–antisense RNA-hybrid structure by RACE Human Development. experiments on endogenous troponin 1 antisense RNA, RNA 10 (2004) 1215–1224. Appendix A. Supplementary data [16] B. Bonafoux, et al., Analysis of remnant reticulocyte mRNA reveals new genes and antisense transcripts expressed in the human erythroid lineage, – Supplementary data associated with this article can be found Haematologica 89 (2004) 1434 1438. [17] M. Shi, X. Yan, D.H. Ryan, R.B.S. Harris, Identification of urocortin in the online version at doi:10.1016/j.ygeno.2005.12.006. mRNA antisense transcripts in rat tissue, Brain Res. Bull. 53 (2000) 317–324. [18] M. Burset, I.A. Seledtsov, V.V. Solovyev, Analysis of canonical and non- References canonical splice sites in mammalian genomes, Nucleic Acids Res. 28 (2000) 4364–4375. [1] R. Yelin, et al., Widespread occurrence of antisense transcription in the [19] J.P. Nemes, K.A. Benzow, M.D. Koob, The SCA8 transcript is an antisense human genome, Nat. Biotechnol. 21 (2003) 379–386. RNA to a brain-specific transcript encoding a novel actin-binding protein [2] H. Kiyosawa, et al., Antisense transcripts with FANTOM2 clone set and (KLHL1), Hum. Mol. Genet. 9 (2000) 1543–1551. their implications for gene regulation, Genome Res. 12 (2003) 1324–1334. [20] M.L. Hastings, H.A. Ingle, M.A. Lazar, S.H. Munroe, Post-transcriptional [3] J. Chen, et al., Over 20% of human transcripts might form sense–antisense regulation of thyroid hormone receptor expression by cis-acting sequences pairs, Nucleic Acids Res. 32 (2004) 4812–4820. and a naturally occurring antisense RNA, J. Biol. Chem. 275 (2000) [4] O. Rosok, M. Sioud, Systematic identification of sense–antisense 11507–11513. transcripts in mammalian cells, Nat. Biotechnol. 22 (2004) 104–108. [21] T. Li, et al., An imprinted PEG1/MEST antisense expressed predominantly [5] J. Cheng, et al., Transcriptional maps of 10 human chromosomes at 5- in human testis and in mature spermatozoa, J. Biol. Chem. 277 (2002) nucleotide resolution, Science 308 (2005) 1149–1154. 13518–13527. [6] G. Lavorgna, D. Dahary, B. Lehner, R. Sorek, C.M. Sanderson, G. Casari, [22] T.H. Vu, N.V. Chuyen, T. Li, A.R. Hoffman, Loss of imprinting of IGF2 In search of antisense, Trends Biochem. Sci. 29 (2004) 88–94. sense and antisense transcripts in Wilms tumor, Cancer Res. 63 (2003) [7] C. Tufarelli, et al., Transcription of antisense RNA leading to gene 1900–1905. silencing and methylation as a novel cause of human genetic disease, Nat. [23] R. Knee, A.W. Li, P.R. Murphy, Characterization and tissue-specific Genet. 34 (2003) 157–165. expression of the rat basic fibroblast growth factor antisense mRNA and [8] E.M. Reis, et al., Antisense intronic non-coding RNA levels correlate to protein, Proc. Natl. Acad. Sci. USA 94 (1997) 4943–4947. the degree of tumor differentiation in prostate cancer, Oncogene 23 (2004) [24] G.B. Robb, et al., Post-transcriptional regulation of endothelial nitric-oxide 6684–6692. synthase by an overlapping antisense mRNA transcript, J. Biol. Chem. 279 [9] E.M. Reis, R. Louro, H.I. Nakaya, S. Verjovski-Almeida, As antisense (2004) 37982–37996. RNA gets intronic, OMICS 9 (2005) 2012. [25] A. Hernandez, M.E. Martinez, W. Croteau, D.L. St. Germain, [10] A. Seitz, et al., Sense and antisense transcripts of the apolipoprotein E gene Complex organization and structure of sense and antisense transcripts in normal and ApoE knockout mice, their expression after spinal cord expressed from the DIO3 gene imprinted locus, Genomics 83 (2004) injury and corresponding human transcripts, Hum. Mol. Genet. 14 (2005) 413–424. 2661–2670. [26] S.A. Korneev, J.-H. Park, M. O'Shea, Neuronal expression of neural [11] K.C. Kleene, Patterns, mechanisms, and functions of translation regulation nitric oxide synthase (nNOS) protein is suppressed by an antisense RNA in mammalian spermatogenic cells, Cytogenet. Genome Res. 103 (2003) transcribed from an NOS pseudogene, J. Neurosci. 19 (1999) 217–224. 7711–7720. [12] S.M. Wu, et al., Analysis of mouse germ cell at different [27] Z. Zhang, M. Gerstein, Large-scale analysis of pseudogenes in the human stages of spermatogenesis: biological significance, Genomics 84 (2004) genome, Curr. Opin. Genet. Dev. 14 (2004) 328–335. 971–981. [28] J.S. Mattick, RNA regulation: a new genetics? Nat. Rev. Genet. 5 (2004) [13] N. Schultz, F.K. Hamra, D.L. Garbers, A multitude of genes expressed 316–323. solely in meiotic or postmeiotic spermatogenic cells offers a myriad [29] F. Runkel, M. Michels, T. Franz, Fxyd2 and Lgi4 expression in the adult of contraceptive targets, Proc. Natl. Acad. Sci. USA 100 (2003) mouse: a case of endogenous antisense expression, Mamm. Genome 14 12201–12206. (2003) 665–670. 692 W.-Y. Chan et al. / Genomics 87 (2006) 681–692

[30] A.-N. Spiess, N. Walther, N. Muller, M. Balvers, C. Hansis, R. Ivell, antisense RNA encodes a nuclear protein with MutT-like antitumor SPEER—A new family of testis-specific genes from the mouse, Biol. activity, Mol. Cell. Endocrinol. 133 (1997) 177–182. Reprod. 68 (2003) 2044–2054. [42] K. Yamasaki, K. Joh, T. Ohta, H. Masuzaki, T. Ishimaru, Neurons but not [31] A.W. Li, P.R. Murphy, Expression of alternatively spliced FGF-2 antisense glial cells show reciprocal imprinting of sense and antisense transcripts of RNA transcripts in the central nervous system: regulation of FGF-2 mRNA Ube3a, Hum. Mol. Genet. 12 (2003) 837–847. translation, Mol. Cell. Endocrinol. 162 (2000) 69–78. [43] H. Kiyosawa, K. Abe, Speculations on the role of natural antisense [32] G. Alfano, et al., Natural antisense transcripts associated with genes transcripts in mammalian X chromosome evolution, Cytogenet. Genome involved in eye development, Hum. Mol. Genet. 14 (2005) 913–923. Res. 99 (2002) 151–156. [33] R. Slomski, M. Schloesser, H. Chlebowska, J. Reiss, W. Engel, Detection [44] Y. Ogawa, J.T. Lee, Antisense regulation in X inactivation and autosomal of human spermatid-specific transcripts in peripheral blood lymphocytes imprinting, Cytogenet. Genome Res. 99 (2002) 59–65. of males and females, Hum. Genet. 87 (1991) 307–310. [45] C.J. Brown, J.C. Chow, Beyond sense: the role of antisense RNA in [34] Stanford Microarray Database [http://www.genome-www5.stanford.edu/ controlling Xist expression, Semin. Cell Dev. Biol. 14 (2003) 341–347. cgi-bin/source]. [46] D. Dahary, O. Elroy-Stein, R. Sorek, Naturally occurring antisense: [35] EMBL [http://www.harvester.embl.de/harvester/ transcriptional leakage or real overlap? Genome Res. 15 (2005) 364–368. AAH6/AAH66338.htm]. [47] J. Chen, M. Sun, L.D. Hurst, G.G. Carmichael, J.D. Rowley, Human [36] M.B. Wahl, U. Heinzmann, K. Imai, LongSAGE analysis revealed the antisense genes have unusually short introns: evidence for selection for presence of a large number of novel antisense genes in the mouse genome, rapid transcription, Trends Genet. 21 (2005) 203–207. Bioinformatics 21 (2005) 1389–1392. [48] C. Blin-Wakkach, et al., Endogenous Msx1 antisense transcript: in vivo [37] G. Solda, et al., In vivo RNA–RNA duplexes from human alpha3 and and in vitro evidences, structure, and potential involvement in skeleton alpha5 nicotinic receptor subunit mRNAs, Gene 345 (2005) 155–164. development in mammals, Proc. Natl. Acad. Sci. USA 98 (2001) [38] H.P. Luther, Role of endogenous antisense RNA in cardiac gene 7336–7341. regulation, J. Mol. Med. 83 (2005) 26–32. [49] T. Imamura, S. Yamamoto, J. Ohgane, N. Hattori, S. Tanaka, K. Shiota, [39] P. Ihalmo, et al., Molecular cloning and characterization of an endogenous Non-coding RNA directed DNA demethylation of Sphk1 CpG island, antisense transcript of Nphs1, Genomics 83 (2004) 1134–1140. Biochem. Biophys. Res. Commun. 322 (2004) 593–600. [40] D.A.F. Loebel, B. Tsoi, N. Wong, P.P.L. Tam, A conserved noncoding [50] A.L.Y. Pang, et al., Identification of differentially expressed genes in intronic transcript at the mouse Dnm3 locus, Genomics 85 (2005) spermatogenesis in the mouse, J. Androl. 24 (2003) 899–911. 782–789. [51] I. Letunic, et al., SMART 4.0: towards genomic data integration, Nucleic [41] A.W. Li, C.K.L. Too, R. Knee, M. Wilkinson, P.R. Murphy, FGF-2 Acids Res. 32(database issue) (2004) D142–D144.