Vol. 22 no. 20 2006, pages 2475–2479 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btl429

Sequence analysis A large quantity of novel human antisense transcripts detected by LongSAGE Xijin Ge1, Qingfa Wu1, Yong-Chul Jung1, Jun Chen1 and San Ming Wang1,2,Ã 1Center for Functional Genomics, Division of Medical Genetics, Department of Medicine, ENH Research Institute and 2Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, 1001 University Place, Evanston, IL 60201 USA Received on April 11, 2006; revised on July 5, 2006; accepted on August 1, 2006 Advance Access publication August 7, 2006 Associate Editor: Nikolaus Rajewsky

ABSTRACT Lee et al., 2005). Therefore, the antisense transcripts identified from Motivation: Taking advantage of the high sensitivity and specificity of the currently known transcripts might not be comprehensive. LongSAGE tag for transcript detection and genome mapping, we ana- Exploring various transcript resources could identify more antisense lyzed the 632 813 unique human LongSAGE tags deposited in public transcripts present at low abundant levels. One of these transcript databases to identify novel human antisense transcripts. resources is the SAGE (serial analysis of expression; Results: Our study identified 45 321 tags that match the antisense Velculescu et al., 1995) data. SAGE collects short tags near the strand of 9804 known mRNA sequences, 6606 of which contain anti- 30 end of transcripts. The orientation of the detected transcript is sense ESTs and 3198 are mapped only by SAGE tags. Quantitative preserved in the SAGE tag sequence; therefore, SAGE tags can analysis showed that the detected antisense transcripts are present at distinguish sense and antisense transcripts. Moreover, the high sen- levels lower than their counterpart sense transcripts. Experimental sitivity of SAGE over other approaches leads to the detection of results confirmed the presence of antisense transcripts detected by more transcripts expressed at lower-abundant levels (Saha et al., the antisense tags. We also constructed an antisense tag database 2002). Therefore, exploring SAGE data might lead to the identi- that can be used to identify the antisense SAGE tags originated from fication of more antisense transcripts. Indeed, analyzing mouse the antisense strand of known mRNA sequences included in the SAGE tags has resulted in the identification of mouse antisense RefSeq database. transcripts (Quere et al., 2004; Wahl et al., 2005). Conclusions: Our study highlights the benefits of exploring SAGE data In this study, we explored the possibility of using 21 bp Long- for comprehensive identification of human antisense transcripts and SAGE tags in the public database to identify human novel antisense demonstrates the prevalence of antisense transcripts in the human transcripts. Compared with the standard 14 bp SAGE tags, the 21 bp genome. LongSAGE tags provide high specificity for mapping in the human Contact: [email protected] genome sequences (Saha et al., 2002). Thus, the genome sequences Supplementary information: Supplementary data are available at are used as the hub to compare the existing known transcripts and Bioinformatics online SAGE tags in order to identify the antisense transcripts detected by SAGE tags. Our analysis identified a substantial number of novel antisense transcripts in the . Here we report the results from this study. 1 INTRODUCTION It is well-known that the genomic loci coding for can be 2 METHODS transcribed from the opposite direction, resulting in the generation of antisense transcripts (Simons et al., 1988). Antisense transcripts 2.1 Identification of antisense SAGE tags play important roles in many biological processes including The LongSAGE tags from 29 human LongSAGE libraries were used for the imprinting (Lee et al., 1999) and regulation of gene expression study (http://www.ncbi.nlm.nih.gov/geo/). All tags were matched through (Novina and Sharp, 2994). Comprehensive identification of anti- BLAT to the human genome sequences (Kent, 2002; http://genome.ucsc. sense transcripts will enhance our understanding of biological edu/, HG17, May 2004). The NlaIII cutting site, ‘CATG’, was added to the 0 basics, and provide tools for functional interference of processes. 5 end of each 17 bp tag before mapping. For such mapping we used a tile Large-set expression sequence data have been used to identify anti- (word) size of 12 and step size of 6. No gap or mismatch was allowed. Tags uniquely mapped to the human genome were identified and used for the sense transcripts expressed in several genomes (Shendure and study. The information for each mapped tag, including the mapping location, Church, 2002; Fahey et al., 2002, Lehner et al., 2002; Yelin annotation and transcripts of mRNA and EST, was incorporated into a et al., 2003; Chen et al., 2004; Katayama et al., 2005; Siddiqui MySQL database. The uniquely mapped SAGE tags were searched against et al., 2005; Riken Group, 2005). mRNA and EST sequences in the mapped genomic loci. The orientation of Many lower abundant transcripts in the genomes analyzed so far mRNA and EST in the genome is based on the mapping information in have not been identified (Bishop et al., 1974; Chen et al., 2002; UCSC genome browser. If a tag matched to an exon or intron of a well- annotated mRNA sequence, but for the opposite direction, the tag was ÃTo whom correspondence should be addressed. classified as an antisense tag; if a tag matched to an EST aligning to the

The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] 2475 X.Ge et al.

well-annotated mRNA but in the opposite orientation, the tag was classified by the RIKEN Group (2005) was downloaded from the RIKEN website as the known antisense tag. A tag matching in the opposite direction of an (http://fantom31p.gsc.riken.jp/s_as/). Conversion of the cDNA clone IDs mRNA sequence but without matching to a known antisense EST was and mRNA IDs led to 3166 RefSeq transcript IDs. The overlap of this defined as the novel antisense tag representing a novel antisense transcript list with our list of 8459 genes was calculated. detected only by the SAGE tag.

2.2 Identification of sense and antisense pairs for the known mRNAs 3 RESULTS The presence of the same UniGene cluster indicated the presence of sense– 3.1 Identification of SAGE tags representing antisense pairs. A UniGene ID for each mRNA sequence matched by anti- antisense transcripts sense tag was collected from the UniGene database (UniGene release 185). The 3.6 million copies of LongSAGE tags were used for the study. A UniGene ID for each antisense tag was collected from the SAGEmap These tags were collected from 29 LongSAGE libraries containing database (Lal et al., 1999). The version used for this analysis was based on the UniGene release 185. Only the SAGE tag that linked to a single UniGene 632 813 unique SAGE tags (Supplementary Table 5). From these cluster was used for the analysis. The copy numbers of the sense tags and unique SAGE tags, 45 321 tags were identified to match the anti- antisense tags were used for quantitative comparisons. sense strand of the mRNA sequences. Figure 1 shows a typical example of the sense–antisense tags detected for the gene ‘WD 2.3 Confirmation of antisense transcripts detected repeat domain 48’. Of these mappings, 48% of tags mapped to by antisense tags exons and 52% mapped to introns of the mRNA sequences, indi- cating no obvious preference of antisense tags for the exon or intron Fetal brain total RNA (Clontech) was treated with RNAse-free DNase I to of the mapped genes. For the 45 321 antisense tags, 20 316 tags eliminate genomic DNA contamination (Biolabs, USA). The antisense tag matched mRNA sequences were used for primer design, and strand-specific matched ESTs that map to the mRNAs but in the opposite orienta- RT–PCR was used to detect the sense or antisense transcripts (Lee et al., tion. These tags represent the antisense transcripts already detected 1999). Primers were designed with the Primer3 software (Rozen and by the ESTs. The remaining 25 005 tags represent novel antisense Skaletsky, 2000) to amplify 100 bp sequences spanning the LongSAGE transcripts for the mRNA sequences. There were 277 642 mRNAs tag. To detect the sense transcripts, the antisense primer was used for reverse matched by the 45 321 antisense tags. These ‘sense’ transcripts are transcription; to detect the antisense transcripts, the sense primer was used distributed in 13 207 UniGene clusters, of which 8 036 clusters for reverse transcription. Reverse transcriptions were performed using contain the antisense ESTs and 5171 clusters contain the novel Superscript Reverse Transcriptase (Invitrogen). Upon reverse transcription, antisense SAGE tags (Table 1, Supplementary Tables 1 and 2). the corresponding antisense or sense primers were added to the reaction for When matched to RefSeq database, the 45 321 tags are found to PCR amplification. A positive control set for each reaction used the oligo dT map as antisense to 9804 sequences; 6606 of these contain antisense priming to generate cDNA and sense–antisense primers for PCR amplifica- tion. A negative control for each reaction was the same as the positive control ESTs and 3198 are mapped only by SAGE tags. except no oligo dT and sense–antisense primers were included for the reac- tions. PCR products were visualized on 2% agarose gels. 3.2 Expression levels of antisense transcripts are lower than that of sense transcripts 2.4 Construction of antisense SAGE tag database In a given SAGE library, each SAGE tag has a defined copy Sequences in the RefSeq database were downloaded (Pruitt and Tatusova, number representing the expression levels of the corresponding 2005; http://www.ncbi.nlm.nih.gov/RefSeq/). Each sequence was converted transcripts. Comparison of the copy number between the sense to its virtual antisense transcript by reverse complementary. A 21 bp tag including CATG was extracted after each CATG site in the virtual antisense tags and the antisense tags in the same SAGE library provides a sequence. The virtual tags were matched to known transcripts in the SAGE- quantitative measure of the expression levels of the sense transcripts map database to determine whether the tags matched known EST or whether and antisense transcripts. The fetal brain LongSAGE library was the tag was a novel antisense tag. The virtual tags were also mapped to the used for the analysis (GSM31935, http://www.ncbi.nlm.nih.gov/ genome sequences to eliminate the tags mapped to more than one location projects/geo/). This library has 2259 UniGene clusters that contain in the genome. These virtual antisense tags were then used to form a virtual 7132 ‘sense’ SAGE tags with 55 824 copies that match mRNA antisense SAGE tag database to represent the antisense transcripts corre- sequences in the same orientation, with an average of 3.2 copies sponding to each sequence included in the RefSeq database. The database per UniGene cluster or 7.8 copies per tag; in comparison, there are provides related information for each virtual antisense tag, including the 2929 ‘antisense’ SAGE tags with 4280 copies that match mRNA GenBank ID and the and genome sequence location for each sequences in the opposite orientation with an average of 1.5 copies mRNA sequence. per UniGene cluster or 1.5 copies per tag. The antisense tag accounts for 41% of total unique tags, but only contributes 8% 2.5 Homologous antisense pairs in human and mouse of the total copies (Table 2, Supplementary Table 3). This indicates The HomoloGene database was downloaded from NCBI (ftp://ftp.ncbi.nih. that the antisense transcripts are expressed at levels lower than the gov/pub/HomoloGene/, October 19, 2005 version). This database was sense transcripts. Further comparison of the copy numbers between searched for mouse homolog genes of our 9804 RefSeq mRNAs that are the antisense tags with matches to ESTs and novel antisense tags the target of antisense SAGE tags according to our analysis. This led to 8459 mouse genes identified by RefSeq IDs. The 1260 mouse antisense genes shows that the antisense tags with matches to ESTs have 1.8 copies identified by Wahl et al. (2005), given in ENSEMBL gene IDs, were con- per tag whereas the novel tags have 1.2 copies per tag. This dif- verted to 834 RefSeq IDs. These IDs were then searched against the 8459 ference reflects the fact that the novel antisense tags represent the genes whose human homologs were found to be targets of antisense tags in lower abundant antisense transcripts not detected by ESTs but our analysis. Similarly, the list of 9713 mouse antisense transcripts identified detected by SAGE.

2476 Antisense transcripts revealed by LongSAGE tags

Fig. 1. Example of sense and antisense SAGE tags detected for the same gene. In a fetal brain LongSAGE library, three sense tags and one antisense tag detected the WDR48 (WD repeat domain 48) gene. All tags matched to the 30-UTR region of this gene. The antisense tag has only a single copy comparing others with multiple copies. Fig. 2. Experimental confirmation of antisense transcripts. A total of 30 antisense transcripts were used for the confirmation. In addition, ACO2 Table 1. Classification of known and novel antisense SAGE tags gene (BC014092) was included as a positive control and CD74 as a negative control, based on a previous study (Chen et al., 2004). Each confirmation used four conditions for reverse transcription, including (1) antisense primer for Classification Numbers (%) detecting sense transcript, (2) sense primer for detecting antisense transcript, (3) oligo dT primer as positive control and (4) no primer as negative control. Total tag counts 3 618 072 After reverse transcription, the corresponding antisense and sense primers Total number of unique SAGE tags 632 813 (100) were added in each reaction for PCR amplification. SAGE tags not mapped to the human genome 407 946 (64.5) SAGE tags mapped to the human genome 224 867 (36.5) (unmasked sequence) Table 3. Homolog antisense genes in human and mouse SAGE tags mapped uniquely to the human genome 186 323 (29.4) Antisense SAGE tags matched to known mRNAs 45 321 (7.2) Included number of UniGene clusters 13 207 Items Numbers Antisense SAGE tags matched to ESTs 20 316 (3.2) Included number of UniGene clusters 8036 SAGE tag detected human genes with antisense 9804 Novel antisense SAGE tags 25 005 (4.0) Mouse homolog to human genes with antisense 8459 Included number of UniGene clusters 5171 Mouse antisene genes detected by Wahl et al 834 Included in mouse homolog gene 517 Mouse antisene genes detected by RIKEN 3166 Included in mouse homolog gene 1588 Table 2. Comparison of copy numbers between sense and antisense tagsa

transcription, as compared with amplification in all positive controls Classification Copies (%) Number of Copies per Copies unique tags (%) UniGene Cluster per tag that used cDNA from oligo dT priming. Another factor could be that the RNA sample used for RT–PCR amplification is different Sense tags 55 824 (92) 7132 (59) 3.2 7.8 from the RNA samples used for long SAGE tag collection. Antisense tags 4280 (8) 2929 (41) 1.5 1.5 ESTs matched 2442 1381 0.6 1.8 3.4 Homologous antisense genes in human and mouse Novel 1838 1548 0.7 1.2 Two previous studies identified mouse genes with antisense tran- Total 60 104 (100) 10 061 (100) 4.6 3.6 scripts (Wahl et al., 2005; Riken Group, 2005). We performed analysis to determine how many of these genes are homologous aThe 2259 UniGene clusters in fetal brain longSAGE library contain both sense and to our list of 9804 human transcripts that are found to be target of antisense tags in each cluster and were used for this analysis. antisense SAGE tags. We found that 517 (62%) of the 834 genes identified by Wahl et al. (2005) and 1588 (50%) of the 3166 genes identified by RIKEN are homologous to genes in our list (Table 3. 3.3 Experimental confirmation of antisense Supplementary Tables 1 and 2). Many functionally important genes transcripts detected by antisense tags were identified in the homolog gene list, such as the breast cancer 2, Thirty antisense tags were selected for experimental confirmation, early onset (BRCA2) gene. Germ line mutations in BRCA2 have including 20 novel antisense tags and 10 antisense tags matched been linked to an elevated risk of young onset breast cancer (Woos- to antisense ESTs. The fetal brain LongSAGE library collected ter et al., 1994). RIKEN cDNA data include two antisense tran- >300 000 copies of SAGE tags. The novel antisense tags were scripts (AK133262 and Ak138160) targeting the 20th intron and last selected from those in this library with 2–4 copies to represent exon, respectively. LongSAGE tags are identified as antisense tags the low abundant antisense transcripts. For the novel antisense (CTGACTTAAGGATGAAG and CACACACACATTGATTC) tags, 15 out of 20 were detected; for the antisense tags with EST mapped to the same regions (Fig. 3). Conservation of antisense matches, all were detected (Figure 2, Supplementary Table 4). The transcripts to the BRCA2 gene targeting the same region suggests negative detection for some reactions in both types of candidate tags that these antisense transcripts might play roles in regulating likely reflects the low-efficiency of specific primers for reverse BRCA2 gene function.

2477 X.Ge et al.

Fig. 3. Conservation of antisense transcripts in human and mouse BRCA2 gene. Antisense transcripts were identified for the 20th intron and the last exon both in the human and mouse BRCA2 gene. A. The location of two antisense SAGE tags (Tag 1 and Tag 2) in the human BRCA2 gene. One antisense EST sequence (BG562419) overlaps with Tag 2. B. Two RIKEN full-length cDNA sequences, AK133262 and AK138160, are antisense to the mouse BRCA2 gene.

3.5 Database for annotating antisense tags long genes that are transcribed from opposite strands with small In a typical SAGE library, a portion of SAGE tags may be derived overlapping in their untranslated regions (UTRs). In our analysis, a from antisense transcripts. To facilitate the annotation of the SAGE tag is classified as an ordinary sense tag if it matches the sense strand tags representing antisense transcripts, an antisense tag database of any mRNA. So our list of antisense tags should not include tags was constructed that contains 185 017 virtual SAGE tags matched that mapped to such regions. Nonetheless, we cannot rule out the to antisense of well-annotated mRNA sequences included in the possibility that some antisense tags could map just outside such RefSeq database. Each virtual tag has a single mapping location in regions. Therefore, experimental verification will be needed to con- the human genome. An experimental SAGE tag that matches a firm the existence of each antisense tag. However, considering the virtual tag in the database will indicate that this tag represents consistency with the existing antisense human transcripts, homo- an antisense transcript to a sequence in the Refseq database. The logy to mouse genes with antisense transcripts and experimental related information for the identified transcripts is also included in detection of antisense transcripts for the candidate tags, it is likely the database, including the chromosome origin and sequence loca- that most of the antisense transcripts detected by SAGE tags exist. tion. Testing the database with the 45 321 antisense tags identified in As shown in Table 1, a significant portion of total unique tags this study shows that 46% of the antisense tags are included in this (about two-thirds of 632 813) could not be mapped to the genome. database, including 31% of the 25 005 novel antisense tags and Further investigation shows that most of these non-mapped tags were 64% of the 20 316 antisense tags with EST matches. Table 4 shows a detected only once (singleton). For example, 64% of total unique tags group of representative experimental SAGE tags identified as the in the fetal brain library are singleton, of which only 30% can be antisense tags by this database. The full database is included in the mapped to the genome. In contrast, 80.5% non-singleton tags can Supplementary Table 6. be mapped to the genome. One of the causes could be related with DNA sequencing error. SAGE tags are collected by single- pass DNA sequencing reaction that is prone to errors. With the same 4 DISCUSSION error rate per , Long SAGE tags will suffer a higher error Taking advantage of the high sensitivity of SAGE for transcript rate than conventional 10 bp SAGE tags owing to its longer detection and the high specificity of LongSAGE tags mapping to the sequences. The proportion of erroneous tags further increases genome, our study identified 45 321 antisense tags for 9804 when multiple SAGE libraries are pooled together, as multiple mRNAs. Comparison with those identified in previous studies genuine tags representing the same transcripts in different libraries shows that 96 (67%) of the 144 pairs identified by Shendure and will overlap while the erroneous tags do not. In addition, the origins Church (2002), 1931 (72%) of the 2667 identified by Yelin et al. of the reference human genome sequences and the SAGE tags were (2003) and 2248 (82%) of the 2736 identified by Chen et al. (2004) from different individual genomes. The differences of individual are included in our list. The number identified in this study is twice genomes, such as structural variation and SNPs, in human popula- the number of antisense transcripts identified by previous studies, tion could also affect the mapping result (Feuk et al., 2006). indicating that antisense is far more prevalent in the human genome Most genes in the genome are expressed at low-abundant levels. than previously considered. This could have significant implications with regard to the antisense Considering the complexity of SAGE library construction, some transcripts because, as shown in our study, the antisense tags are of the antisense tags might be owing to artifacts, and short 21 bp tags largely present at low copy levels. Similar observation was also are less specific than longer sequences such as EST and cDNA made in mouse antisense SAGE tags (Wahl et al., 2005). Because of sequences. Situation may also exist that a LongSAGE tag named its high sensitivity, SAGE detects more low-abundant transcripts, as novel may not be real novel, as it could map to a different location and therefore more antisense transcripts. SAGE data provide a of a gene that already have antisense EST evidence. Currently, there useful resource for antisense transcript identification. With the is no consensus in defining the sense/antisense pairs. Some previ- increasing collection of LongSAGE tags, more potential antisense ously identified sense/antisense pairs, such as BC005542 and transcripts might be identified to provide more comprehensive cov- BC062929 identified by Katayama et al. (2005), consist of two erage of antisense transcripts expressed in the genome. Of note, the

2478 Antisense transcripts revealed by LongSAGE tags

Table 4. Examples of the virtual antisense SAGE tag database

SAGE tag RefSeq UniGene LocusLink Symbol Gene name

GCCAGTCCCTGGTTCTT NM_178816 Hs.89387 255082 CASC2 Cancer susceptibility candidate 2 GCCAGTCCGTTGGCCAA NM_003051 Hs.75231 6566 SLC16A1 AKR7 family pseudogene GCCAGTCCTTGACTGCG NM_032608 Hs.417959 84700 MYO18B Myosin XVIIIB GCCAGTCCTTTCTTCTT NM_001159 Hs.406238 316 AOX1 Aldehyde oxidase 1 GCCAGTCGGTCCCGGCA NM_153609 Hs.370885 164656 TMPRSS6 Transmembrane protease, serine 6 GCCAGTCTACACTCACT NM_005909 Hs.335079 4131 MAP1B Microtubule-associated 1B GCCAGTCTCTGCAGCTC NM_012102 Hs.463041 473 RERE Arginine-glutamic acid dipeptide (RE) repeats GCCAGTCTGCGTCCGTG NM_001003682 Hs.531492 399474 CDNA DKFZp434C184 gene GCCAGTCTGGAGAAAGG NM_152268 Hs.380169 25973 Prolyl-tRNA synthetase (mitochondrial) GCCAGTGAAATCTCCCT NM_005650 Hs.475018 6942 TCF20 Transcription factor 20 (AR1) GCCAGTGACATGCTCCA NM_138434 Hs.284135 113763 C7orf29 Chromosome 7 open reading frame 29 GCCAGTGACTCTTTTCC NM_175736 Hs.179838 91010 FMNL3 Formin-like 3 GCCAGTGACTTGGCTGC NM_000676 Hs.167046 136 ADORA2B Adenosine A2b receptor GCCAGTGAGATTAGAGC XM_166213 Hs.558464 23220 DTX4 Deltex 4 homolog (Drosophila) GCCAGTGAGGGACTGCC NM_003273 Hs.31130 7108 TM7SF2 Transmembrane 7 superfamily member 2 GCCAGTGAGGGTAAGGG NM_002205 Hs.505654 3678 ITGA5 Integrin, alpha 5 GCCAGTGAGTCCATCGT NM_017802 Hs.521328 54919 FLJ20397 Hypothetical protein FLJ20397 GCCAGTGCAAGACTCCA NM_173557 Hs.465316 220441 RNF152 Ring finger protein 152 GCCAGTGCCCAGAGCAG XM_371210 Hs.307407 83756 TAS1R3 Taste receptor, type 1, member 3 GCCAGTGCCCGGAGGCA NM_172362 Hs.527656 3756 KCNH1 Potassium voltage-gated channel, subfamily H (eag-related), member 1

quantitative differences among different transcripts vary over six Feuk,L. et al. (2006) Structural variation in the human genome. Nat. Rev. Genet., 7, 85–97. orders (Holland, 2002), and SAGE cannot detect all transcripts. Holland,M.J. (2002) Transcript abundance in yeast varies over six orders of magnitude. Many low copy SAGE tags are clearly not the collection of sequenc- J. Biol. Chem., 277, 14363–14366. Katayama,S. et al. (2005) Antisense transcriptn in mammalian transcriptome. Science, ing errors but represent the true transcripts. With the improvement 309, 1564–1566. of sequencing technologies, more low-abundant transcripts would Kent,J. (2002) BLAT--the BLAST-like alignment tool. Genome Res., 12, 656–664. be detected, and more potential antisense transcripts could be Lal,A. et al. (1999) A public database for gene expression in human cancers. Cancer detected. The identification of antisense SAGE tags provides evi- Res., 59, 5403–5407. dence for the presence of antisense transcripts. Using a combination Lehner,B. et al. (2002) Antisense transcripts in the human genome. Trends. Genet., 18, 63–65. of experimental and bioinformatics approaches, we need to identify Lee,J.T. et al. (1999) Tsix, a gene antisense to Xist at the X-inactivation centre. Nat. the original antisense transcripts in the form of 30 ESTs or full- Genet., 21, 400–404. length cDNAs for genome annotation and functional study. Lee,S. et al. (2005) Detecting novel low-abundant transcripts in Drosophila. RNA, 11, 939–946. Novina,C.D. and Sharp,P.A. (2004) The RNAi revolution. Nature, 430, 161–164. Pruitt,K.D. et al. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant ACKNOWLEDGEMENTS sequence database of genomes, transcripts and . Nucleic Acids Res., 33, The authors appreciate the availability of the LongSAGE data depos- D501–D504. ited in the Gene Expression Omnibus for this study. The authors thank Quere,R. et al. (2004) Mining SAGE data allows large-scale, sensitive screening of antisense transcript expression. Nucleic Acids Res., 32, e163. Yeong Cheol Kim and Wendy S. Rubinstein for discussions, and Riken Genome Exploration Research Group and Genome Science Group and the W. James Kent for suggestions on the usage of BLAT program. FANTOM Consortium. (2005) Antisense transcription in the mammalian transcrip- This study was supported by National Institute of Health, US tome. Science, 309, 1564–1556. Department of Defense, Daniel F. and Ada L. Rice Foundation, and Rozen,S. and Skaletsky,H. (2000) Primer3 on the www for general users and for Evanston Northwestern Healthcare Breast/Ovarian Research Program. biologist programmers. Methods Mol. Biol., 132, 365–386. Saha,S. et al. (2002) Using the transcriptome to annotate the genome. Nat. Biotechnol., Conflict of Interest: none declared. 20, 508–512. Shendure,J. and Church,G.M. (2002) Computational discovery of sense-antisense tran- scription in the human and mouse genomes. Genome Biol., 3, RESEARCH0044. REFERENCES Siddiqui,A.S. et al. (2005) A mouse atlas of gene expression: Large-scale digital gene- expression profiles from precisely defined developing C57BL/6J mouse tissues and Akmaev,V.R. and Wang,C.J. (2004) Correction of sequence-based artifacts in serial cells. Proc. Natl. Acad. Sci. USA, 102, 18485–18490. analysis of gene expression. Bioinformatics, 20, 1254–1263. Simons,R.W. and Kleckner,N. (1988) Biological regulation by antisense RNA in Bishop,J.O. et al. (1974) Three abundance classes in HeLa cell messenger RNA. prokaryotes. Annu. Rev. Genet., 22, 567–600. Nature, 250, 199–204. Velculescu,V.E. et al. (1995) Serial analysis of gene expression. Science, 270, 484–487. Chen,J. et al. (2002) Identifying novel transcripts and novel genes in the human genome Wahl,M.B. et al. (2005) LongSAGE analysis revealed the presence of a large number by using novel SAGE tags. Proc. Natl Acad. Sci. USA, 99, 12257–12262. of novel antisense genes in the mouse genome. Bioinformatics, 21, 1389–1392. Chen,J. et al. (2004) Over 20% of human transcripts might form sense-antisense pairs. Wooster,R. et al. (1994) Localization of a breast cancer susceptibility gene, BRCA2, to Nucleic Acids Res., 32, 4812–4820. chromosome 13q12–13. Science, 265, 2088–2090. Fahey,M.E. et al. (2002) Overlapping antisense in the human genome. Comput. Funct. Yelin,R. et al. (2003) Widespread occurrence of antisense transcription in the human Genomics, 3, 244–253. genome. Nat. Biotechnol., 21, 379–386.

2479