Hindawi Publishing Corporation Molecular Biology International Volume 2012, Article ID 793506, 29 pages doi:10.1155/2012/793506

Research Article

A Prevalence of Imprinted Genes within the Total Transcriptomes of Human Tissues and Cells

Sergey V. Anisimov1, 2

1 Research Unit of Cellular and Genetic Engineering, V. A. Almazov Federal Center for Heart, Blood & Endocrinology, Akkuratova Street 2, Saint-Petersburg 197341, Russia
2 Department of Intracellular Signaling and Transport, Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Prosp. 4, Saint-Petersburg 194064, Russia

Correspondence should be addressed to Sergey V. Anisimov, [email protected]

Received 14 March 2012; Revised 23 June 2012; Accepted 28 June 2012

Academic Editor: Alessandro Desideri

Copyright © 2012 Sergey V. Anisimov. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Genomic imprinting is an epigenetic phenomenon that causes differential expression of paternally and maternally inherited alleles of a subset of genes (the so-called imprinted genes). Imprinted genes are distributed throughout the genome, and it is predicted that about 1% of human genes may be imprinted. It is recognized that the allelic expression of imprinted genes varies between tissues and developmental stages. The current study represents the first attempt to estimate the prevalence of imprinted genes within the total human transcriptome. In silico analysis of the normalized expression profiles of a comprehensive panel of 173 established and candidate human imprinted genes was performed in 492 publicly available SAGE libraries. The latter represent human cell and tissue samples in a variety of physiological and pathological conditions. Variations in the prevalence of imprinted genes within the total transcriptomes (ranging from 0.08% to 4.36%) and expression profiles of the individual imprinted genes are assessed. This paper thus provides a useful reference on the size of the imprinted transcriptome and expression of the individual imprinted genes.

1. Introduction

Genomic imprinting is an epigenetic phenomenon that causes a differential expression of paternally and maternally inherited alleles of a minor subset of genes (the so-called imprinted genes). Genomic imprinting was first discovered in 1984 [1, 2], and in 1991 the first imprinted genes (IGF2, paternally expressed; IGF2R and H19, maternally expressed) were identified in the mouse [3–5]. Since then, the imprinting status has been confirmed for numerous genes in the Homo sapiens and Mus musculus genomes, and for fewer genes in Bos taurus, Rattus norvegicus, Sus scrofa, Canis lupus familiaris, and Ovis aries; many more genes are considered candidates [6]. The functional significance of genomic imprinting is not yet fully understood [7–9], while alterations in the expression of imprinted genes are linked to certain pathologies, including Angelman syndrome, Prader-Willi syndrome, and particular cancer subtypes. Genomic imprinting varies between species and tissues. Furthermore, it is a dynamic process and may vary depending on the developmental stage [10]. The goal of the study was to estimate a prevalence of imprinted genes within the total human transcriptome, in cell and tissue samples in a variety of physiological and pathological conditions.

Serial analysis of gene expression (SAGE) is a sequence-based technique to study mRNA transcripts quantitatively in cell populations [11]. Two major principles underlie SAGE: first, short (10 bp) expressed sequence tags (ESTs) are sufficient to identify individual gene products, and second, multiple tags can be concatenated and identified by sequence analysis. SAGE results are reported in either absolute or relative numbers of tags, which permits direct comparisons between tag catalogues and datasets [12–15]. Numerous technical adaptations have led to the development of similar techniques [16], yet SAGE remains an important tool of modern molecular biology. It is widely used in a number of applications, of which molecular dissection of the cancer genome is the major one [17]. In the current study, expression of established and candidate imprinted genes was evaluated in a wide array of cell and tissue samples using a comprehensive set of currently available SAGE data for Homo sapiens. Five hundred eighty-one SAGE catalogues based on the libraries generated with the most commonly used NlaIII anchoring enzyme were screened using a conservative set of criteria, and in 492 of these (accounting for nearly 36 million SAGE tags) gene expression profiles of the imprinted genes were analyzed, using a proven algorithm [18]. It was therefore possible to estimate a prevalence of imprinted genes within the total human transcriptome.

2. Methods

2.1. Imprinted Gene Subsets. The established and candidate imprinted gene subset was assembled based on the Geneimprint resource (http://www.geneimprint.com/; credits to R.L. Jirtle) and the Luedi et al. study [6]. Of the latter, high-confidence imprinted human gene candidates predicted to be imprinted by both the linear and RBF kernel classifiers learned by Equbits Foresight and by SMLR ([6], supplementary data) were utilized. Redundant entries were excluded.

2.2. SAGE. SAGE technology is based on isolation of short tags from the appropriate position within the mRNA molecule, followed by the concatemerization of the tags, sequencing, tag extraction, and gene annotation [11]. The complete set of publicly available SAGE libraries (GPL4 dataset, NlaIII anchoring enzyme) was downloaded from the Gene Expression Omnibus (GEO) database (National Center of Biotechnology Information (NCBI); http://www.ncbi.nlm.nih.gov/geo/). Following an exclusion of the duplicate entries, SAGE libraries were annotated and sorted based on the number of tags sequenced. Noninformative (A)10 sequences were extracted from SAGE libraries when detected, and tags per million (tpm) values were recalculated accordingly for all libraries as the transcript's raw tag count divided by the number of reliable tags in the library and multiplied by 1,000,000. SAGE libraries constructed by Potapova et al. [19] were subjected to a "clean-up" procedure through which all clones containing ≤4 tags were excluded [20], with the remaining tags constituting the pool of "reliable tags."

2.3. SAGE Tag Annotation. The established and candidate imprinted gene subset was matched against the CGAP (Cancer Genome Anatomy Project, NCI, NIH) SAGE Anatomic Viewer (SAV) applet [17]. For genes not matching SAV applet entries, and when unreliable/internal tags were suggested by the SAV applet (viz., for TIGD1, HOXA3, NTRI genes, etc.), reliable 3′ end tags were extracted from full-length sequences available via GenBank (NCBI, NIH).

2.4. Expression Profiling. SAGE tags were matched to the individual SAGE catalogues using the MS Access software package Query function. Individual queries (both absolute tag abundance per library and normalized tag per million (tpm) values) were merged using MS Excel software. Calculations of maximal and average expression of transcripts matching established and candidate imprinted genes were performed using normalized tpm values. Particular values could be recalculated to the fraction of the total gene expression by dividing the tpm value by 1,000,000.

2.5. Clustering Analysis. Clustering analysis was performed using EPCLUST Expression Profile data CLUSTering and analysis software (http://www.bioinf.ebc.ee/EP/EP/EPCLUST/). K-means clustering analysis was performed after transposing the data matrix with initial clusters chosen by most distant (average) transcripts. For each dataset, the number of clusters was set to the lowest value yielding one cluster containing a solitary database entry. Hierarchical clustering was performed using the correlation measure-based distance/average linkage (average distance) clustering method; hierarchical trees were built for individual datasets.

3. Results

The established and candidate human imprinted gene subset (203 entries total) was assembled based on the Geneimprint resource and Luedi et al. study data [6]. Of the candidate imprinted genes identified in the latter, high-confidence gene candidates (predicted via Equbits Foresight and SMLR means [6]) were selected. Following exclusion of the redundant entries, appropriate short (10 bp) SAGE tags matching the NlaIII anchoring enzyme were annotated to gene targets using the CGAP (Cancer Genome Anatomy Project, NCI, NIH) SAGE Anatomic Viewer (SAV) applet or manually, as described earlier [18]. For a number of the candidate imprinted genes, a complete sequence was unavailable via GenBank or alternative databases (e.g., GenBank ID: NM_016158, NM_024547, NM_181648, etc.); for that reason, the size of the human imprinted gene subset subjected to tag annotation was reduced to 174 genes. Of these genes, candidate imprinted gene Q9NYI9 (PPARL; GenBank ID: AF242527) could not be annotated with a SAGE tag, as it lacks NlaIII anchoring enzyme recognition sites completely. Therefore, a total of 173 genes (including 53 established imprinted genes and 120 candidate imprinted genes) were annotated with the appropriate SAGE tags (Table 1) and subjected to further analysis.

Table 1: SAGE tag annotation for established and candidate imprinted gene subset.
Columns: N; Gene symbol; Gene name; Aliases; Location; Status; Expressed allele; NlaIII tag (a); Notes; GenBank accession number.
Entries are sorted according to the established gene location.
(a) SAGE tags annotated for NlaIII anchoring enzyme.
(b) ∗: tag maps to other gene(s) according to CGAP (Cancer Genome Anatomy Project, NCI, NIH) SAGE Anatomic Viewer.
(c) #: highly repetitive tag according to CGAP SAGE Anatomic Viewer.
(d) R: unreliable/internal tag suggested by CGAP SAGE Anatomic Viewer is replaced with reliable 3′ end tag.

The complete set of publicly available human SAGE catalogues was downloaded from the Gene Expression Omnibus (GEO, NCBI) database. Acquired SAGE catalogues represent 581 SAGE libraries generated from a wide spectrum of cell and tissue samples in a variety of physiological and pathological conditions. Following an exclusion of the numerous duplicate GEO database entries (e.g., GSM785 = GSM383907; GSM1515 = GSM383958; GSM85612 = GSM125353, etc.), the criteria listed below were applied when selecting libraries for the analysis of gene expression. SAGE libraries were selected only if they represented (i) genetically unaltered/unmodified samples, (ii) SAGE catalogues with a total number of tags ≥20,000, and (iii) a complete dataset available. For example, samples GSM383929
Table 2: Summary of SAGE catalogs analyzed.

Clusters                                    Number of SAGE catalogs (a)    Number of SAGE tags (b)
C (cancer tissue)                           185                            13,165,432
N (normal tissue and cells)                 166                            12,953,131
IV (cells cultured in vitro)                112                            8,009,673
D (nontumorous disease tissue and cells)    29                             1,840,291
Total                                       492                            35,968,527

All SAGE catalogs screened belong to GPL4 Gene Expression Omnibus database (GEO, NCBI) platform (Homo sapiens; NlaIII-anchoring enzyme).
(a) SAGE catalogs selected for analysis (see Supplementary Table 1 available online at doi:10.1155/2012/793506).
(b) Number of tags subjected for analysis (with (A)10 tags excluded).

and GSM180669 were excluded since these did not satisfy criterion (i), representing ovary surface epithelium immortalized with SV40 and lymphocytes from Down syndrome children, respectively; samples GSM384024 (white blood cells, CD45+, isolated from a mammary gland carcinoma; 18,741 tags) and GSM1128 (breast cancer cell line; tags detected once are not available) were excluded as not satisfying criteria (ii) and (iii), respectively (Supplementary Table 1). Due to the conservative nature of the criteria listed above, the total number of SAGE catalogues satisfying these and thus selected for further analysis (i.e., for the extraction of tags matching imprinted genes) was reduced to 492. Together, these 492 SAGE catalogues representing human samples account for 35.97 million SAGE tags constructed using NlaIII-anchoring enzyme. The catalogues were assigned into one of the following Clusters: C (cancer tissue; 185 SAGE catalogues), N (normal tissue and cells; 166 SAGE catalogues), IV (cells cultured in vitro; 112 SAGE catalogues), or D (nontumorous disease tissue and cells; 29 SAGE catalogues) (Table 2 and Supplementary Table 1).

Figure 1 shows a distribution of the analyzed established and candidate imprinted genes through the human genome. Primary analysis of the normalized expression profiles of the imprinted genes demonstrated a great variability in the cumulative gene expression for 173 genes (Figure 2, Table 3, and Supplementary Table 2). Average cumulative gene expression of those genes in human tissues and cells was 0.90% of the total gene expression: specifically, 0.95% for both cancer and normal tissue and cells (clusters C and N, resp.), 0.77% for cells cultured in vitro (cluster IV), and 0.83% for nontumorous disease tissue and cells (cluster D). In the pool of the assessed SAGE catalogues, it ranged from 0.08% (total blood, GSM389907 [21]) to 4.36% of the total gene expression (bronchial epithelium, GSM125353 [22]). Of 492 human SAGE catalogues tested, the cumulative expression of the imprinted genes constituted >2% of the total gene expression in 21 and <0.2% in 7 catalogues. The SAGE libraries with the 10% highest and 10% lowest cumulative and average expression of established and candidate imprinted gene subsets are listed in Table 3.

In some samples, a major fraction of the cumulative expression of the imprinted genes was established by only a few highly abundant transcripts. For example, in the GSM125353 SAGE catalogue already mentioned above, 91.9% of the cumulative (total) gene expression of the assayed imprinted genes is represented by a single gene, namely, PTPN14 (ACTTTTTCAA tag). Similarly, in the GSM383893 SAGE catalogue (gallbladder tubular adenocarcinoma [17, 23]), the same gene constitutes 86.6% of the cumulative (total) gene expression of the assayed imprinted genes. In many other SAGE catalogues, the expression profile of the assayed imprinted genes was rather more balanced. For example, in the GSM383840 SAGE catalogue (mammary myoepithelium, CD10+ cells [24]), PTPN14 constitutes just 8.7% of the cumulative (total) gene expression of the assayed imprinted genes, equal to the GNAS gene (ATTAACAAAG tag). Some imprinted genes were expressed almost ubiquitously across the samples: for example, genes NDUFA4, RPL22, Q8NE65, PTPN14, GNAS, and RAB1B (Supplementary Table 3). Notably, in other cases, expression of particular imprinted genes either was not detected in any of the 492 SAGE catalogues screened (EVX1, ACGCCCGTGG tag), or was detected only occasionally (Supplementary Table 3). For example, gene DUX2 (AAGGGGTGGA tag) expression was detected only 3 times (on a minimum level) in 492 SAGE catalogues representing cell and tissue samples in a variety of physiological and pathological conditions: namely, in the GSM383692 SAGE catalogue (astrocytoma grade II [25]), the GSM383867 SAGE catalogue (colon carcinoma cell line [17, 23]), and the GSM383928 SAGE catalogue (ovary preneoplasia cell line [26]). Similarly rare was the expression of FAM75D1 (detected only 3 times altogether), FAM77D, ISM1, FLJ20464, and Q8NB05 (detected only 5 times, in all cases on a minimum level).

To assess variation in the expression of individual imprinted genes in the samples, the clustering analysis of the normalized expression profiles was performed using EPCLUST (Expression Profile data CLUSTering and analysis) software. For each dataset, the number of clusters was set to the lowest value yielding one cluster containing a solitary database entry: 5 for cancer tissue, 6 for normal tissues and cells, 5 for cells cultured in vitro, and 2 for nontumorous disease tissue and cells (Figures 3–7). Notable diversity was observed in the transcription profiles represented by the individual clusters, with relatively high expression levels characteristic for just 1-2 or a higher number of the individual imprinted genes (Figures 4(a), 4(b)–7(a), and 7(b)). Expectedly, in a few cases samples generated from the same tissues/cell types did fall into the same compact cluster of the distinct pattern (Figure 3, Figures 4(a), 4(b)–7(a), and 7(b)). However, in many other cases imprinted gene expression profiles of the same/similar tissue or cell

Table 3: The SAGE libraries with 10% most and 10% least cumulative and average expression of established and candidate imprinted genes subsets.

Columns: ID (a), Primary ID (b), SAGE library, Sample, Cluster (c), Sum (d), Average (e), Max (f).

Top 10% libraries

C 29,781.50 172.15 25,275.20 C 25,944.14 149.97 17,365.83 C 23,570.83 136.25 18,977.90 C 22,633.99 130.83 11,595.94 C 20,570.15 118.90 16,071.37 C 29,575.98 170.96 25,125.63 C 23,346.84 134.95 18,746.14 C 19,968.16 115.42 15,538.48 D 22,688.19 131.15 12,514.88 in situ in situ in situ Bronchial brushings, former smokerCentral retina (macula)Mammary gland, ductal carcinoma in situ N N 43,563.92 33,159.24 251.81 191.67 40,054.67 26,692.01 Breast, ductal carcinoma Bronchial brushings, former smokerMammary gland, ductal carcinomaOral biopsyNeuroblastoma, primary tumor, stage 4S N C 29,184.64 27,222.30 N 168.70 157.35 27,066.16 24,955.09 22,422.27 156.45 21,312.29 Breast carcinoma metastasis to lung C 23,941.22 138.39 20,247.08 Mammary gland, ductal carcinoma in situ Breast, ductal carcinoma Gallbladder tubular adenocarcinomaVascular endothelium, hemangioma, benign hyperplasia Neuroblastoma, primary tumor, stage 4S C 23,262.07 134.46 20,154.48 Hemangioma tumor C 22,622.41 130.77 12,500.82 Oral biopsyMammary gland, ductal carcinoma in situ N 21,279.52 123.00 11,932.33 Breast carcinoma cell lineMammary gland, ductal carcinomaWhole body, fetalMammary gland, ductal carcinomaBreast, ductal carcinoma Cortex, pooled sample C C IV 20,417.00 N 20,467.40 20,036.49 N 118.02 20,274.33 118.31 115.82 19,926.52 15,536.62 117.19 12,710.80 13,377.93 115.18 9,698.99 13,701.49

SAGE library: GSM574 GSM688 GSM1730 GSM1731 GSM1516 GSM14753 GSM125353 GSM383793 GSM125352 GSM383797 GSM194651 GSM112808 GSM383794 GSM383893 GSM384016 GSM112809 GSM194652 GSM383807 GSM383800 GSM383775 GSM383790 GSM383946 GSM383812
b ID Table Primary a ID 76 301 46 142 427 2975 104 300 1459990 430 346 273 55143 155 428 30167 105 507 2791 561 274 28100 101 347 146 440 62118 433 416 140144149521 425 537 443 Molecular Biology International 15 f Max e Average d Sum c C 15,715.04 90.84 11,679.51 C 16,572.81 95.80 12,713.33 C 18,122.08 104.75 14,601.68 C 19,126.03 110.56 14,594.68 C 18,914.78 109.33 12,349.04 D 16,221.46 93.77 4,205.56 in 3: Continued. Table ected by primary ff in situ in situ in situ carcinoma Bronchial brushings, former smokerBronchial brushings, former smokerAdrenal cortex a pigmented nodular adrenocortical disease Bronchial brushings, current N smokerNonsmall cell lung N cancer: squamous cell carcinoma N 16,456.57 16,306.85 16,090.33 95.12 94.26 93.01 11,137.34 11,835.62 11,242.21 Brain desmoplastic medulloblastoma C 18,106.48 104.66 11,244.70 Breast carcinoma cell lineBronchial brushings, former smokerBronchial brushings, current smokerEpendymomaHippocampusBronchial brushings, former smokerLung biopsy NBronchial N brushings, current smokerNonsmall cell lung cancer: squamous IVcell carcinoma N 17,662.13 N 17,598.56 17,991.89 C 17,225.16 N 102.09 16,871.13 101.73 N 104.00 17,524.27 99.57 17,289.52 12,756.35 12,699.28 97.52 17,223.67 8,886.81 101.30 13,193.26 99.94 10,737.09 99.56 10,222.49 8,268.90 13,839.55 Bronchial epitheliumBronchial epitheliumMetaplastic bronchial epitheliumBronchial brushings, never smokerGallbladderBreast carcinoma cell lineMammary gland, ductal invasive situ D N N N 19,376.31 IV 19,369.05 19,665.60 19,653.98 N 112.00 19,247.28 111.96 113.67 113.61 19,320.00 14,577.13 111.26 14,172.30 15,918.74 15,655.95 111.68 9,389.55 15,346.14 Neuroblastoma, primary tumor, stage 4 Peripheral retinaPeripheral retina N N 18,817.72 18,781.47 108.77 108.56 12,293.99 8,727.76 Nonsmall cell lung cancer: squamous cell carcinoma Sample Cluster GSM573 GSM572 GSM1733 GSM37212 GSM14781 GSM82458 GSM85616 GSM85611 GSM125351 GSM125347 GSM125336 GSM194375 GSM383806 GSM125343 
GSM125337 GSM383710 GSM125349 GSM296391 GSM125339 GSM194379 GSM194386 GSM125358 GSM383895 GSM383804 GSM112812 GSM194377 SAGE library b ID Primary a 74 299 7075994 295 216 284 329 7168 181 439 66601215172109 291 62 285 387 98 260 297 361 287 333 ID 54 263 5317811376631 262 340 306 509 437 106 92 276 35 24 96 331 16 Molecular Biology International f Max e Average d Sum c C 4,146.18 23.97 460.69 C 4,141.08 23.94 1,351.60 C 4,081.16 23.59 1,340.95 D 4,103.62 23.72 606.45 D 3,663.78 21.18 334.67 N 4,169.95 24.10 1,615.39 NN 3,794.27 3,668.81 21.93 21.21 843.17 777.29 IV 4,119.01 23.81 920.45 in 3: Continued. Table Bottom 10% libraries erentiated ff erentiated ff associated adenocarcinoma with lymphoplasmacytic infiltration Colon carcinoma, cell lineGastric epithelial tissue from the antrum Lung, poorly di Gastroesophageal junction adenocarcinoma IV 4,179.31 24.16 422.91 Kidney, embryonic cell line 293, uninduced cells AIDS-KS lesion D 4,107.85 23.74 1,245.47 Lung, tumor associated (focal fibrosis and chronic inflammation) CD4+ T cellsMedulloblastoma, cerebellumStomach, poorly di C N 4,081.98 4,088.74 23.60 23.63 804.90 978.17 Sample Cluster carcinoma Skin, melanoma C 4,072.22 23.54 2,497.12 Colon carcinoma, cell lineMedulloblastomaColon carcinoma, cell lineColon, cancer cell lineGallbladder adenocarcinomaCD15+ myeloid progenitor cellsSpermatozoaSpermatozoa, pooled sampleLymphocytes from children IV 1–4 years old (pooled samples) Colon carcinoma, cell line IVBone marrow C NCord C blood-derived activated Th1 IVcells 4,048.02Breast stroma, ductal carcinoma Nsitu 3,961.32 3,909.95 3,859.27 3,987.16 3,945.42 N IV 23.40 3,811.16 22.90 22.60 22.31 23.05 N 22.81 3,746.12 3,814.90 404.80 22.03 506.24 1,108.11 651.66 1,069.73 3,740.58 21.65 22.05 673.61 1,429.18 21.62 1,288.15 378.40 962.65 HL-60 cells IV 3,661.81 21.17 920.25 GSM668 GSM747 GSM3244 GSM3245 GSM14780 GSM14807 GSM14734 GSM14760 GSM14751 GSM82459 GSM82243 GSM14784 GSM66698 GSM383998 GSM383868 GSM383914 GSM383865 
GSM383753 GSM383867 GSM383892 GSM311354 GSM180670 GSM383866 GSM136195 GSM383787 SAGE library b ID Primary a 181 553 ID 83 488 3884 180 204 17 425 120 522 304456 121 137 162 5480 153 485 130821216611552 404 50 487 94 71 506 81 372 39 261 88 259 328 20 486 183 318 423 38 234 Molecular Biology International 17 f Max e Average d Sum c C 3,413.27 19.73 636.37 D 2,684.90 15.52D 747.69 811.75 4.69 162.35 N 3,364.85 19.45 291.33 NNN 1,673.61 1,670.43 1,091.72 9.67 9.66 6.31 435.14 510.41 327.52 IV 3,293.77 19.04IV 982.36 1,879.82 10.87 639.52 3: Continued. Table libraries (one accession number selected for redundant entries). ; D: nontumorous disease tissue and cells. in vitro erentiated (scirrhous type) ff Mammary myoepithelium, CD10+ cells HL-60 cells exposed to 2.45 GHz radiofrequency for 2 h Oral brushingLymph node, B-cell lymphomaLeukocytesBreast carcinoma cell lineBreast carcinoma cell lineGastric epithelial tissuesStomachLiver cholangiocarcinoma metastasisTotal C blood after EPO treatment, pooled sample Skin, primary malignant melanomaHL-60 cells exposed to 2.45 IV GHz radiofrequency C for N 6 h IVPancreas 3,166.97Skeletal muscle, N 5 days training young men N CSkeletal muscle, detraining young men 2,988.12 2,736.32 3,290.72Total blood, 2,977.71 pooled sampleSkeletal muscle, 18.31 pretraining young 2,970.06men N 2,991.69Total blood 2,612.99 during EPO treatment, pooled sample 17.27 15.82 19.02 17.21 555.61 17.17 2,813.83 17.29 N 15.10 N 597.62 1,575.46 996.85 297.77 16.26 673.21 1,511.12 439.34 1,813.11 653.25 654.38 8.73 10.48 436.55 278.94 Gallbladder tubular adenocarcinomaPrimary bronchial epithelial cellsRetinoblastomaCartilage chondrosarcoma cell linePrimary gastric cancer, poorly Cdi IV IV 3,643.40 3,604.30 C 3,502.94 21.06 20.83 3,561.82 20.25 624.22 804.84 20.59 515.14 384.45 Sample Cluster GSM784 GSM7800 GSM66712 GSM14775 GSM66714 GSM37337 GSM383840 GSM194649 GSM383937 GSM383915 GSM383902 GSM383803 GSM383801 GSM384002 GSM383903 GSM389908 GSM135389 GSM135392 
GSM389906 GSM135388 GSM389907 GSM383894 GSM383970 GSM383852 SAGE library b ID Primary a 13097 466 344 143 533 391731406563 235 21151 523 170 514 29 436 434 68 84 554 40 515 578 83 175 86 236 16582 308 28 311 576 307 577 ID 168 508 281777138 217 546 475 127

Particular sum, average, and maximum values can be recalculated to the fraction of the total gene expression by dividing the tpm value by 1,000,000. Entries are sorted according to the cumulative (total) tpm value. Indexes (GSM numbers) represent GEO database accession numbers for SAGE libraries (one accession number selected for redundant entries).
(a) ID: listing within each cluster (see Supplementary Table 2).
(b) Primary ID: listing within a full dataset (see Supplementary Table 1).
(c) Clusters: C: cancer tissue; N: normal tissue and cells; IV: cells cultured in vitro; D: nontumorous disease tissue and cells.
(d) Sum: cumulative (total) tag per million (tpm) value for SAGE tags matching established and candidate imprinted genes within the SAGE library.
(e) Average: tpm value for SAGE tags matching established and candidate imprinted genes within the SAGE library.
(f) Max: maximum tpm value for SAGE tags matching established and candidate imprinted genes within the SAGE library.
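The conversions described in the Table 3 notes can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the largest Sum in Table 3 (43,563.92 tpm, which matches the 4.36% reported for GSM125353) together with its Average and Max values, and the per-cluster means quoted in the text. The function names are hypothetical, and the reading of the Average column as Sum divided by the 173 assayed genes is an assumption that happens to agree with the tabulated numbers.

```python
GENES_ASSAYED = 173  # imprinted genes and candidates assayed (from the text)


def tpm_to_percent(tpm):
    """Convert a tags-per-million value to percent of the total transcriptome."""
    return tpm / 1_000_000 * 100


def share_of_sum(max_tpm, sum_tpm):
    """Percent of the cumulative imprinted-gene signal carried by the top gene."""
    return max_tpm / sum_tpm * 100


# Largest Sum in Table 3 with its Average and Max (values copied from the table).
sum_tpm, avg_tpm, max_tpm = 43_563.92, 251.81, 40_054.67

print(round(tpm_to_percent(sum_tpm), 2))         # 4.36
print(round(share_of_sum(max_tpm, sum_tpm), 1))  # 91.9
print(round(sum_tpm / GENES_ASSAYED, 2))         # 251.81 (assumed definition of Average)

# Catalogue-weighted mean of the per-cluster averages quoted in the text:
# (mean percent of total expression, number of catalogues).
cluster_means = {"C": (0.95, 185), "N": (0.95, 166), "IV": (0.77, 112), "D": (0.83, 29)}
overall = sum(m * n for m, n in cluster_means.values()) / sum(
    n for _, n in cluster_means.values()
)
print(round(overall, 2))  # 0.9
```

The last figure agrees with the 0.90% overall average only if that average is weighted by the number of catalogues per cluster, which the text does not state explicitly.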

Chromosome panels Ch1 to Ch22, each drawn as an ideogram with cytogenetic band labels (1p36.33 through 22q13.33).

Figure 1: A schematic representation of the analyzed established (53, filled arrowheads) and candidate (120, empty arrowheads) imprinted genes distribution through the human genome. Chromosome layout is via NCBI (Build 37.2). Numbers next to some of the arrowheads indicate the number of entries per locus.

types fall into different clusters. Similarly, though in many cases imprinted gene expression profiles of the same/similar tissue or cell types fell into the closely matching area of the hierarchical tree built for the individual datasets (clusters C, N, IV, and D) (Figures 4(c), 5(c), 5(d), 6(c), and 7(c)), in other cases notable variability was observed in the distribution of imprinted gene expression profiles of the same/similar tissue or cell types. For example, at the K-mean clustering, small-size cluster 3 in the cancer tissue dataset (3 entries) is composed entirely of neuroblastoma samples (Figures 4(a) and 4(b)); however, other entries representing tumors of the same histological properties [27] fell into cluster 1 (composed of 141 entries in total). Cluster 4 in the same dataset (12 entries) is composed entirely of carcinoma samples, while cluster 2 (28 entries) is composed of carcinoma samples predominantly (19 entries), with other samples representing astrocytoma (3 entries), glioblastoma multiforme (2 entries), cystadenoma (1 entry), rhabdosarcoma (1 entry), and unclassified breast cancer (2 entries).

Similarly, only one cluster in the normal tissue and cell dataset has a homogeneous composition (cluster 5, 2 entries), matching both available SAGE libraries constructed from placenta (GSM14749, also designated GSM383945; GSM14750, also designated GSM383947 [17]) (Figure 3), with all other clusters composed of the samples of diverse origins. Illustratively, this particular cluster breaks down (i.e., cluster content gets redistributed to the clusters of smaller size) only if the number of K-mean clusters for the dataset is increased from the set value of 6 to 26, while some other clusters break down more readily. In the hierarchical trees, the most densely packed areas (representing the most similar transcription profiles) are generally composed of the samples of the same/similar tissue or cell types. For example, one of the densest areas in the four hierarchical trees built is composed of 19 samples matching bronchial brushings (Figures 5(c) and 5(d)) [22], with all 5 other samples of the same origin falling into the nearest vicinity within the hierarchical tree (Figure 5(c)). At the same time, some SAGE libraries

Panels (a), (b), and (c): y-axis 0 to 300 (tpm). Panels (d), (e), and (f): y-axis 0 to 40000 (tpm).

Figure 2: Histogram of average ((a), (b), and (c)) and maximum ((d), (e), and (f)) tag per million (tpm) values of the pool of imprinted genes and gene candidates for the normalized SAGE catalogues: cancer tissue ((a), (d); 185 catalogues); normal tissues and cells ((b), (e); 166 catalogues); cells cultured in vitro ((c), (f); 112 catalogues). Corresponding histogram pairs are built following a sorting by the maximum value in the pool.

Panel (a): graph line for the cluster (size = 2), y-axis 0 to 5961 (tpm), with peaks labeled TFPI2, DLK1, and PTPN14. Panel (b): cluster contents GSM14749 and GSM14750, shading scale 0 to 5960.

Figure 3: An example of gene expression pattern recognized by K-mean clustering analysis (normal tissue and cells, cluster 5). Graph line (a) and cluster contents (b). Vertical bars denote individual genes. Exponential shades of grey code (5 colors) are based on the normalized tpm values. GSM14749: first trimester placenta; GSM14750: full-term placenta. Imprinted genes with peak expression values in the cluster are indicated. PTPN14: protein tyrosine phosphatase, nonreceptor type 14; TFPI2: tissue factor pathway inhibitor 2; DLK1: delta-like 1 homolog (Drosophila).

representing the samples of the identical origin fell into the separate K-mean clusters and into well-separated areas of the hierarchical tree. This was observed, for example, for 3 available peripheral retina samples, from which GSM572 and GSM573 [28] fell into cluster 3, and GSM383968 [29] fell into cluster 1 (Figure 4).

4. Discussion

The mechanism of genomic imprinting plays an important, yet not fully understood, role in many physiological processes: in particular, in the control of growth and development. Since the identification of the first imprinted genes (IGF2, IGF2R, and H19) in mouse in 1991, a large volume of information has been accumulated on the identity and biological function of imprinted genes both for Homo sapiens and animal species (Mus musculus in particular). Over the course of the decade, we have witnessed an expansion of the list of the established imprinted genes [6, 30]. It is most probable that novel candidate imprinted genes will be identified in the future, and features of the imprinted genes will be confirmed for some candidates. In the current study, a comprehensive list of the human imprinted genes and high-confidence gene candidates (203 entries total) became a subject for a large-scale in silico gene expression profiling. Available nucleotide sequences (174 genes and gene candidates) have been utilized for the extraction of the appropriate short SAGE tags matching NlaIII anchoring enzyme, most common in generating SAGE libraries. Notably, candidate imprinted gene Q9NYI9 (PPARL) did not bear NlaIII

Figure 4, panel contents. (a) K-mean cluster graph lines (y-axis 0 to 25276 tpm): cluster 1, size = 141; cluster 2, size = 28; cluster 3, size = 3 (GSM112808, GSM112809, GSM112812); cluster 4, size = 12 (GSM14753, GSM1730, GSM1731, GSM1733, GSM688, GSM194377, GSM383790, GSM383793, GSM383794, GSM383797, GSM383807, GSM383893); cluster 5, size = 1 (GSM14781). (b) Cluster contents listed by GSM accession number; color scale 0 to 25275. (c) Hierarchical cluster tree of the GSM accession numbers; distance scale 0 to 0.879.

Figure 4: Gene expression patterns recognized by K-mean and hierarchical clustering analysis (cancer tissue, 185 SAGE catalogues). K-mean clustering analysis, (a) graph lines and (b) cluster contents; vertical bars denote individual genes. (c) Hierarchical cluster tree. Exponential shades of red code (15 colors) are based on the normalized tpm values.
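The rule used to fix the number of K-mean clusters (the lowest value that yields one cluster containing a solitary entry) can be illustrated with a small sketch. This is not a reproduction of EPCLUST: the distance measure (Euclidean), the deterministic farthest-first seeding, and the toy two-gene profiles are all assumptions chosen to keep the example self-contained, and the function names are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then add the point farthest from all seeds."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def smallest_k_with_singleton(points, k_max):
    """Lowest k (from 2 up to k_max) whose clustering contains a one-member cluster."""
    for k in range(2, k_max + 1):
        labels = kmeans(points, k)
        sizes = [labels.count(j) for j in range(k)]
        if 1 in sizes:
            return k, sizes
    return None, []


# Hypothetical normalized profiles: four similar samples and one outlier.
profiles = [(1.0, 0.0), (1.1, 0.1), (0.9, 0.0), (1.0, 0.2), (10.0, 10.0)]
k, sizes = smallest_k_with_singleton(profiles, k_max=4)
print(k, sizes)  # 2 [4, 1]
```

With real expression profiles the result depends on the seeding and the distance measure, so the cluster counts reported in the text (5, 6, 5, and 2) should not be expected to fall out of this sketch unchanged.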

Figure 5, panel contents. (a) K-mean cluster graph lines (y-axis 0 to 40055 tpm): cluster 1, size = 111; cluster 2, size = 4 (GSM574, GSM125352, GSM125353, GSM194651); cluster 3, size = 1 (GSM383963); cluster 4, size = 46; cluster 5, size = 2 (GSM14749, GSM14750); cluster 6, size = 2 (GSM1499, GSM82458). (b) Cluster contents listed by GSM accession number; color scale 0 to 40054. (c) Hierarchical cluster tree of the GSM accession numbers; distance scale 0 to 0.856. (d) Enlarged fragment of the tree with gene columns NDUFA4, NM144654, RAB1B, TSH3, GNAS, PTPN14, and Q8NE65:

GSM676… GSM125351 bronchial brushings, former smoker GSM125358 bronchial brushings, never smoker GSM125347 bronchial brushings, former smoker GSM125337 bronchial brushings, current smoker GSM125336 bronchial brushings, current smoker GSM125342 bronchial brushings, current smoker GSM125350 bronchial brushings, former smoker GSM125345 bronchial brushings, former smoker GSM125343 bronchial brushings, former smoker GSM125357 bronchial brushings, never smoker GSM125344 bronchial brushings, former smoker GSM125341 bronchial brushings, current smoker GSM125339 bronchial brushings, current smoker GSM125340 bronchial brushings, current smoker GSM125338 bronchial brushings, current smoker GSM125354 bronchial brushings, former smoker GSM125335 bronchial brushings, current smoker GSM125355 bronchial brushings, never smoker GSM125346 bronchial brushings, former smoker GSM680…

(d)

Figure 5: Gene expression patterns recognized by K-mean and hierarchical clustering analysis (normal tissue and cells, 166 SAGE catalogues). (a) K-mean clustering analysis, graph lines and (b) cluster contents; vertical bars denote individual genes. Arrowheads point out 3 SAGE libraries generated from peripheral retinal samples. (c) Hierarchical cluster tree; the fragment enlarged in (d) is highlighted. Exponential shades of red code (15 colors) are based on the normalized tpm values.

[Figure 6, panels (a) to (c). (a) K-mean cluster graph lines (y-axis 0 to 12711, normalized tpm): Cluster 1, size = 74; Cluster 2, size = 7; Cluster 3, size = 1; Cluster 4, size = 28; Cluster 5, size = 2. (b) Cluster contents, listed by GEO sample accession (GSM). (c) Hierarchical cluster tree of the 112 libraries (scale bars 0 to 12710 and 0 to 0.914).]

Figure 6: Gene expression patterns recognized by K-mean and hierarchical clustering analysis (cells cultured in vitro, 112 SAGE catalogues). (a) K-mean clustering analysis, graph-lines and (b) cluster contents; vertical bars denote individual genes. (c) Hierarchical cluster tree. Arrowheads point out 13 SAGE libraries generated from undifferentiated embryonic stem cells (ESC). Exponential shades of red code (15 colors) are based on the normalized tpm values.
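The K-mean step named in the figure captions can be illustrated with a minimal sketch. The software, distance measure, and initialization used for the figures are not stated in this section, so the squared Euclidean distance, the choice of the first k libraries as starting centroids, and all toy values below are assumptions made purely for illustration. Each profile stands for one SAGE library, with one normalized tpm value per imprinted gene.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Gene-wise mean of a non-empty group of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def k_means(profiles: Sequence[Sequence[float]], k: int,
            max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd-style K-means; returns (cluster label per library, centroids)."""
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    # Assumption: deterministic start from the first k profiles.
    centroids = [list(map(float, p)) for p in profiles[:k]]
    labels = [-1] * len(profiles)
    for _ in range(max_iter):
        # Assign every library to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Recompute each centroid; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = mean_profile(members)
    return labels, centroids


# Hypothetical tpm profiles for five libraries over three genes (not real data).
toy_profiles = [
    [10, 0, 5],
    [12, 1, 4],
    [900, 800, 700],
    [11, 0, 6],
    [950, 780, 720],
]
labels, centroids = k_means(toy_profiles, k=2)
print(labels)
```

In this toy case the three low-expression libraries end up in one cluster and the two high-expression libraries in the other, which mirrors how a few outlying libraries form the small clusters in the figures.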

[Figure 7, panels (a) to (c). (a) K-mean cluster graph lines (y-axis 0 to 14578, normalized tpm): Cluster 1, size = 28; Cluster 2, size = 1. (b) Cluster contents, listed by GEO sample accession (GSM). (c) Hierarchical cluster tree of the 29 libraries (scale bars 0 to 14577 and 0 to 0.918).]

Figure 7: Gene expression patterns recognized by K-mean and hierarchical clustering analysis (nontumorous disease tissue and cells, 29 SAGE catalogues). (a) K-mean clustering analysis, graph lines and (b) cluster contents; vertical bars denote individual genes. (c) Hierarchical cluster tree. Exponential shades of red code (15 colors) are based on the normalized tpm values.

recognition sites. This limitation of the conventional SAGE protocol can generally be overcome by using an alternative anchoring enzyme [16]. However, gene Q9NYI9 does not bear recognition sites for the anchoring enzymes Sau3AI and RsaI (the second and third most common in generating SAGE libraries) either, though it bears one for MmeI, utilized in the LongSAGE protocol. Taken together, not 174 but 173 genes (missing Q9NYI9 (PPARL)), including 53 established imprinted genes and 120 candidate imprinted genes, were annotated with the appropriate SAGE tags. The latter were matched against the pool of 492 normalized SAGE catalogues representing libraries derived from human samples, constructed using the NlaIII anchoring enzyme and together accounting for 35.97 million SAGE tags. Collectively, these catalogues represent a comprehensive assay of tissues and cell types in physiological and a variety of pathological conditions. Gene expression of imprinted genes was assessed in the normalized SAGE catalogues representing the transcriptomes of these samples, according to the straightforward algorithm of in silico analysis.
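The quantity assessed here, the share of a normalized catalogue accounted for by tags annotated to imprinted genes, can be sketched as follows. This is a minimal illustration and not the author's actual pipeline: the function names, the toy tag sequences, and the counts are hypothetical, and normalization to one million tags (tpm) is assumed from the tpm values mentioned in the figure captions.

```python
from typing import Dict, Iterable

TPM = 1_000_000  # tags per million, the assumed normalization target


def normalize_to_tpm(tag_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw tag counts so that the whole catalogue sums to one million."""
    total = sum(tag_counts.values())
    if total <= 0:
        raise ValueError("SAGE catalogue contains no tags")
    return {tag: count * TPM / total for tag, count in tag_counts.items()}


def imprinted_share_percent(tpm: Dict[str, float],
                            imprinted_tags: Iterable[str]) -> float:
    """Percentage of the normalized catalogue accounted for by the given tags."""
    wanted = set(imprinted_tags)
    imprinted_tpm = sum(value for tag, value in tpm.items() if tag in wanted)
    return 100.0 * imprinted_tpm / TPM


# Hypothetical toy catalogue: tag sequence -> raw count (not real data).
catalogue = {"AAAAAAAAAA": 50, "CCCCCCCCCC": 30, "GGGGGGGGGG": 20}
# Hypothetical tags annotated to imprinted genes; the second one is absent
# from this catalogue and therefore contributes nothing.
imprinted = ["CCCCCCCCCC", "TTTTTTTTTT"]

share = imprinted_share_percent(normalize_to_tpm(catalogue), imprinted)
print(f"{share:.2f}% of tags map to imprinted genes")
```

Because the tag set is deduplicated before summation, a tag that matches more than one imprinted gene is counted once per catalogue in this sketch. How such ambiguous tags were handled in the study is not specified in this section.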

As with nearly any other gene, expression of imprinted genes is not a constant, but rather a dynamic function of cell type and state. In the current study, great variability was observed in both the cumulative/total expression of the studied imprinted genes and that of the individual genes. The cumulative expression of the 173 studied imprinted genes ranges from 0.08% (total blood) to 4.36% (bronchial epithelium) of the total gene expression (Table 3). In some samples (Table 3 and Supplementary Table 2), the proportion of the transcriptome associated with imprinted genes is clearly above what would be expected from such a limited group of genes, reflecting the importance of the biological roles played by the latter. At the same time, overall expression of the imprinted genes was equal in the clusters of cancer tissues and of normal tissue and cells (clusters C and N, 0.95% for both clusters) and lower for the cells cultured in vitro (cluster IV, 0.77%).

The current study apparently represents the first attempt to estimate the impact of imprinted genes on the total volume of the transcriptome. Obvious biases affect the accuracy of the algorithm applied, suggesting both underestimation (probable existence of yet unidentified imprinted genes, unavailable information on gene structure for some imprinted genes, absence of anchoring enzyme recognition sites for at least one gene) and overestimation (unconfirmed imprinting status of some of the candidate imprinted genes, SAGE tags matching more than one gene; see Table 1) of the relative size of the imprinted transcriptome. Despite this, the provided data on the estimated cumulative/total expression of the known imprinted genes (their number corresponding well to the predicted number of imprinted genes in the human genome [31, 32]) in a variety of tissues and cells are most interesting. Until now, little information was available on the overall expression of imprinted genes in cells of different types.

It is generally believed that many imprinted genes are highly expressed in the developing and adult brain tissue [33], placenta [34], and undifferentiated stem cells [35]. Discrete studies identify certain highly expressed imprinted genes as potential biomarkers of cancer subtypes [36, 37]. In contrast, imprinted genes are known to be expressed at a relatively low level in adult blood cells [38]. This information is supported by the observed values of the cumulative expression of the imprinted genes across the screened samples (Table 3 and Supplementary Table 2): cumulative expression of the imprinted genes is generally high in many of the assessed brain-derived samples and low in blood samples. It was also observed earlier that major upregulation of the expression of numerous imprinted genes is associated with early differentiation and development, rather than with the undifferentiated status of stem cells [39, 40]. Concordantly, in the current study, all of the 13 SAGE libraries generated from undifferentiated embryonic stem cells (ESCs), namely, lines HES3, HES4 [17, 23, 41], BG01, H1, H7, H9, H13, H14, and HSF6 [17, 23], uniformly demonstrate intermediate cumulative expression of the imprinted genes (Supplementary Table 2) and fit closely in the hierarchical tree built for the corresponding cluster (cluster IV; Figure 6(c)). However, many samples with high cumulative expression of the imprinted genes do not fit into any of the groups listed above. The important role of genomic imprinting in particular normal cell and cancer subtypes, suggested by high expression of these genes, thus should be a subject of follow-up studies.

Expression of individual imprinted genes varies to an even greater extent in the samples screened. Expression of the candidate imprinted gene even-skipped homeobox 1 (EVX1) was not detected in any sample submitted to the analysis, while the expression of many more (DUX2, FAM75D1, Q8NB05, FLJ20464, ISM1, FAM77D, and others) was detected only in a few samples, always at a minimal level. In contrast, other imprinted genes (NDUFA4, RPL22, Q8NE65, GNAS, PTPN14, RAB1B, and others) were expressed in the majority of the samples screened, often at a high level (Supplementary Table 3).

Illustratively, a notable variation in the cumulative expression of the imprinted genes and in the expression of individual imprinted genes is observed in the cells cultured in vitro, including cells of the same type (e.g., numerous medulloblastoma, glioblastoma multiforme, and breast carcinoma cell lines) (Supplementary Table 2 and Figure 6). This observation further supports the earlier suggestion that cell culture conditions contribute to the maintenance or alteration of imprinted gene expression [42, 43].

Taken together, a screening of the normalized expression profiles of a comprehensive panel of the established and candidate imprinted genes within the publicly available human SAGE datasets was performed in the current study, the first to estimate the prevalence of imprinted genes within the total human transcriptome on a large scale. This paper thus provides a useful reference on the relative size of the imprinted transcriptome and on the expression of the individual imprinted genes.

Acknowledgments

This study was supported by grants from the Ministry of Education and Science of the Russian Federation, Federal target program “Research and Pedagogical Cadre for Innovative Russia” for 2009–2013 (State Contract 14.740.11.0004), and by the MCB Program, Russian Academy of Sciences. The author is grateful to all GEO database contributors.

References

[1] J. McGrath and D. Solter, “Completion of mouse embryogenesis requires both the maternal and paternal genomes,” Cell, vol. 37, no. 1, pp. 179–183, 1984.
[2] M. A. H. Surani, S. C. Barton, and M. L. Norris, “Development of reconstituted mouse eggs suggests imprinting of the genome during gametogenesis,” Nature, vol. 308, no. 5959, pp. 548–550, 1984.
[3] M. S. Bartolomei, S. Zemel, and S. M. Tilghman, “Parental imprinting of the mouse H19 gene,” Nature, vol. 351, no. 6322, pp. 153–155, 1991.
[4] D. P. Barlow, R. Stoger, B. G. Herrmann, K. Saito, and N. Schweifer, “The mouse insulin-like growth factor type-2 receptor is imprinted and closely linked to the Tme locus,” Nature, vol. 349, no. 6304, pp. 84–87, 1991.
[5] T. M. DeChiara, E. J. Robertson, and A. Efstratiadis, “Parental imprinting of the mouse insulin-like growth factor II gene,” Cell, vol. 64, no. 4, pp. 849–859, 1991.

[6] P. P. Luedi, F. S. Dietrich, J. R. Weidman, J. M. Bosko, R. L. Jirtle, and A. J. Hartemink, “Computational and experimental identification of novel human imprinted genes,” Genome Research, vol. 17, no. 12, pp. 1723–1730, 2007.
[7] D. Lucifero, J. R. Chaillet, and J. M. Trasler, “Potential significance of genomic imprinting defects for reproduction and assisted reproductive technology,” Human Reproduction Update, vol. 10, no. 1, pp. 3–18, 2004.
[8] I. M. Morison, J. P. Ramsay, and H. G. Spencer, “A census of mammalian imprinting,” Trends in Genetics, vol. 21, no. 8, pp. 457–465, 2005.
[9] G. Moore and R. Oakey, “The role of imprinted genes in humans,” Genome Biology, vol. 12, no. 3, article 106, 2011.
[10] W. Reik and J. Walter, “Genomic imprinting: parental influence on the genome,” Nature Reviews Genetics, vol. 2, no. 1, pp. 21–32, 2001.
[11] V. E. Velculescu, L. Zhang, B. Vogelstein, and K. W. Kinzler, “Serial analysis of gene expression,” Science, vol. 270, no. 5235, pp. 484–487, 1995.
[12] A. Lal, A. E. Lash, S. F. Altschul et al., “A public database for gene expression in human cancers,” Cancer Research, vol. 59, no. 21, pp. 5403–5407, 1999.
[13] A. E. Lash, C. M. Tolstoshev, L. Wagner et al., “SAGEmap: a public gene expression resource,” Genome Research, vol. 10, no. 7, pp. 1051–1060, 2000.
[14] A. H. C. van Kampen, J. M. Ruijter, B. D. S. van Schaik et al., “Gene expression informatics and analysis,” in Bioinformatics for Geneticists, M. R. Barnes and I. C. Gray, Eds., pp. 319–344, John Wiley & Sons, Chichester, UK, 2003.
[15] E. A. Gibb, E. A. Vucic, K. S. Enfield et al., “Human cancer long non-coding RNA transcriptomes,” PLoS ONE, vol. 6, no. 10, Article ID e25915, 2011.
[16] S. V. Anisimov, “Serial analysis of gene expression (SAGE): 13 years of application in research,” Current Pharmaceutical Biotechnology, vol. 9, no. 5, pp. 338–350, 2008.
[17] K. Boon, E. C. Osório, S. F. Greenhut et al., “An anatomy of normal and malignant gene expression,” The Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 17, pp. 11287–11292, 2002.
[18] S. V. Anisimov, “A large-scale screening of the normalized mammalian mitochondrial gene expression profiles,” Genetical Research, vol. 86, no. 2, pp. 127–138, 2005.
[19] O. U. Potapova, S. V. Anisimov, M. Gorospe et al., “Targets of c-Jun NH2-terminal kinase 2-mediated tumor growth regulation revealed by serial analysis of gene expression,” Cancer Research, vol. 62, no. 11, pp. 3257–3263, 2002.
[20] S. V. Anisimov and A. A. Sharov, “Incidence of “quasi-ditags” in catalogs generated by serial analysis of gene expression (SAGE),” BMC Bioinformatics, vol. 5, article 152, 2004.
[21] E. Varlet-Marie, M. Audran, M. Ashenden, M. T. Sicart, and D. Piquemal, “Modification of gene expression: help to detect doping with erythropoiesis-stimulating agents,” American Journal of Hematology, vol. 84, no. 11, pp. 755–759, 2009.
[22] K. M. Lonergan, R. Chari, R. J. DeLeeuw et al., “Identification of novel lung genes in bronchial epithelium by serial analysis of gene expression,” American Journal of Respiratory Cell and Molecular Biology, vol. 35, no. 6, pp. 651–661, 2006.
[23] G. J. Riggins and R. L. Strausberg, “Genome and genetic resources from the cancer genome anatomy project,” Human Molecular Genetics, vol. 10, no. 7, pp. 663–667, 2001.
[24] M. Allinen, R. Beroukhim, L. Cai et al., “Molecular characterization of the tumor microenvironment in breast cancer,” Cancer Cell, vol. 6, no. 1, pp. 17–32, 2004.
[25] K. Boon, J. B. Edwards, C. G. Eberhart, and G. J. Riggins, “Identification of astrocytoma associated genes including cell surface markers,” BMC Cancer, vol. 4, article 39, 2004.
[26] C. D. Hough, C. A. Sherman-Baust, E. S. Pizer et al., “Large-scale serial analysis of gene expression reveals genes differentially expressed in ovarian cancer,” Cancer Research, vol. 60, no. 22, pp. 6281–6287, 2000.
[27] M. Fischer, A. Oberthuer, B. Brors et al., “Differential expression of neuronal genes defines subtypes of disseminated neuroblastoma with favorable and unfavorable outcome,” Clinical Cancer Research, vol. 12, no. 17, pp. 5118–5128, 2006.
[28] D. Sharon, S. Blackshaw, C. L. Cepko, and T. P. Dryja, “Profile of the genes expressed in the human peripheral retina, macula, and retinal pigment epithelium determined through serial analysis of gene expression (SAGE),” The Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 1, pp. 315–320, 2002.
[29] C. B. Rickman, J. N. Ebright, Z. J. Zavodni et al., “Defining the human macula transcriptome and candidate retinal disease genes using EyeSAGE,” Investigative Ophthalmology and Visual Science, vol. 47, no. 6, pp. 2305–2316, 2006.
[30] A. Zhang, D. A. Skaar, Y. Li et al., “Novel retrotransposed imprinted locus identified at human 6p25,” Nucleic Acids Research, vol. 39, no. 11, pp. 5388–5400, 2011.
[31] A. I. Diplas, L. Lambertini, M. J. Lee et al., “Differential expression of imprinted genes in normal and IUGR human placentas,” Epigenetics, vol. 4, no. 4, pp. 235–240, 2009.
[32] A. Henckel and P. Arnaud, “Genome-wide identification of new imprinted genes,” Briefings in Functional Genomics and Proteomics, vol. 9, no. 4, Article ID elq016, pp. 304–314, 2010.
[33] W. Davies, A. R. Isles, and L. S. Wilkinson, “Imprinted gene expression in the brain,” Neuroscience and Biobehavioral Reviews, vol. 29, no. 3, pp. 421–430, 2005.
[34] D. Haig, “Genomic imprinting and kinship: how good is the evidence?” Annual Review of Genetics, vol. 38, pp. 553–585, 2004.
[35] B. W. Sun, A. C. Yang, Y. Feng et al., “Temporal and parental-specific expression of imprinted genes in a newly derived Chinese human embryonic stem cell line and embryoid bodies,” Human Molecular Genetics, vol. 15, no. 1, pp. 65–75, 2006.
[36] I. Ariel, O. Lustig, T. Schneider et al., “The imprinted H19 gene as a tumor marker in bladder carcinoma,” Urology, vol. 45, no. 2, pp. 335–338, 1995.
[37] M. J. Cooper, M. Fischer, D. Komitowski et al., “Developmentally imprinted genes as markers for bladder tumor progression,” Journal of Urology, vol. 155, no. 6, pp. 2120–2127, 1996.
[38] J. M. Frost, D. Monk, T. Stojilkovic-Mikic et al., “Evaluation of allelic expression of imprinted genes in adult human blood,” PLoS ONE, vol. 5, no. 10, Article ID e13556, 2010.
[39] S. V. Anisimov, K. V. Tarasov, D. Riordon, A. M. Wobus, and K. R. Boheler, “SAGE identification of differentiation responsive genes in P19 embryonic cells induced to form cardiomyocytes in vitro,” Mechanisms of Development, vol. 117, no. 1-2, pp. 25–74, 2002.
[40] J. C. Lui, G. P. Finkielstain, K. M. Barnes, and J. Baron, “An imprinted gene network that controls mammalian somatic growth is down-regulated during postnatal growth deceleration in multiple organs,” American Journal of Physiology - Regulatory Integrative and Comparative Physiology, vol. 295, no. 1, pp. R189–R196, 2008.
[41] M. Richards, S. P. Tan, J. H. Tan, W. K. Chan, and A. Bongso, “The transcriptome profile of human embryonic stem cells as defined by SAGE,” Stem Cells, vol. 22, no. 1, pp. 51–64, 2004.
[42] K. P. Kim, A. Thurston, C. Mummery et al., “Gene-specific vulnerability to imprinting variability in human embryonic stem cell lines,” Genome Research, vol. 17, no. 12, pp. 1731–1742, 2007.
[43] J. M. Frost, D. Monk, D. Moschidou et al., “The effects of culture on genomic imprinting profiles in human embryonic and fetal mesenchymal stem cells,” Epigenetics, vol. 6, no. 1, pp. 52–62, 2011.