<<

The Journal of

Genome-wide Analysis of by Expressed Sequence Tag Profiling

Cosmas C. Giallourakis,*,†,1 Yair Benita,*,†,‡,1 Benoit Molinie,*,† Zhifang Cao,*,†,‡ Orion Despo,*,† Henry E. Pratt,*,† Lawrence R. Zukerberg,x Mark J. Daly,{,k John D. Rioux,# and Ramnik J. Xavier*,†,‡,{,k

Profiling studies of mRNA and microRNA, particularly microarray-based studies, have been extensively used to create compendia of genes that are preferentially expressed in the immune system. In some instances, functional studies have been subsequently pursued. Recent efforts such as the Encyclopedia of DNA Elements have demonstrated the benefit of coupling RNA sequencing analysis with information from expressed sequence tags (ESTs) for transcriptomic analysis. However, the full characterization and identification of transcripts that function as modulators of immune responses remains incomplete. In this study, we demonstrate that an integrated analysis of human ESTs provides a robust platform to identify the immune transcriptome. Be- yond recovering a reference set of immune-enriched genes and providing large-scale cross-validation of previous microarray stud- ies, we discovered hundreds of novel genes preferentially expressed in the immune system, including noncoding RNAs. As a result, we have established the Immunogene database, representing an integrated ESTroad map of expression in human immune cells, which can be used to further investigate the function of coding and noncoding genes in the immune system. Using this approach, we have uncovered a unique metabolic gene signature of human macrophages and identified PRDM15 as a novel overexpressed gene in human lymphomas. Thus, we demonstrate the utility of EST profiling as a basis for further deconstruction of physiologic and pathologic immune processes. The Journal of Immunology, 2013, 190: 5578–5587.

he immune system is our central defense against bacterial in health and disease, the full spectrum of genes and pathways and viral pathogens. As such, misregulation of certain capable of modulating immune responses is not known. T genes that are critical for normal immune function can re- Analyzing the function of genes that are highly enriched or sult in immunodeficiency and autoimmune disorders. Furthermore, specific for immune cell/tissue expression may aid in capturing disorders such as diabetes, obesity, and are increasingly important regulators of immune function. It is important to note viewed as having roots in immunological dysfunction (1). De- that this premise does not require that ubiquitously expressed genes spite this recognition of the immune system as a central player cannot be critically important in immune function and develop- ment. However, considering examples such as BCR/TCR signaling *Gastrointestinal Unit, Massachusetts General Hospital and Harvard Medical School, cascade components, it is striking that genes that are expressed Boston, MA 02114; †Center for the Study of Inflammatory Bowel Disease, Massa- specifically in immune cells/tissues are often central players in chusetts General Hospital and Harvard Medical School, Boston, MA 02114; immune physiology. In addition, genes responsible for Mende- ‡Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA 02114; xDepartment of , Massachusetts General Hospital and lian immunodeficiences or autoimmunity often show enriched Harvard Medical School, Boston, MA 02114; {Analytic and Translational Genetics expression patterns in the immune system. Examples include the Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114; kBroad Institute of Massachusetts Institute of Technol- causes of Omenn syndrome (RAG1/2; Online Mendelian Inheri- ogy and Harvard University, Cambridge, MA 02142; and #Laboratory in Genetics tance in Man [OMIM] identification number 603554), hyper-IgM and Genomic Medicine of Inflammation, Universite´ de Montre´al, Montreal, QC H1T syndrome (AICDA; OMIM identification number 308230), and X- 1C8, Canada linked lymphoproliferative disorder (SH2D1A; OMIM identifica- 1 C.G. and Y.B. contributed equally to this work. tion number 308240). Received for publication December 20, 2012. Accepted for publication March 29, Historically, gene discovery in the immune system has relied on 2013. unbiased searches for genes exhibiting differential expression in This work was supported by Grants DK083756, DK086502, HL088297, and DK043351 from the National Institutes of Health (to R.J.X.). cells/tissues of the immune system as compared with other cells/ Address correspondence and reprint requests to Dr. Ramnik J. Xavier, Center for tissues. These approaches have used techniques such as differential Computational and Integrative Biology, Massachusetts General Hospital, Richard B. display, microarray profiling, serial analysis of , Simches Research Center, 185 Cambridge Street, Boston, MA 02114. E-mail ad- and, most recently, RNA sequencing (RNA-Seq) (2–11). In the dress: [email protected] case of microarray profiling, experiments often reveal a large The online version of this article contains supplemental material. number of differentially expressed genes, making cross-validation Abbreviations used in this article: BGC, germinal center ; ENCODE, Encyclo- by alternative approaches a challenge. In addition, microarrays pedia of DNA Elements; EST, expressed sequence tag; FDR, false discovery rate; GNF, Genomics Institute of the Novartis Research Foundation; GO, ; suffer from difficulty in detecting low-abundance transcripts and GWAS, genome-wide association study; KEGG, Kyoto Encyclopedia of Genes and may not contain valid probes for mRNAs, noncoding RNAs Genomes; lncRNA, long noncoding RNA; MS, multiple sclerosis; NCBI, National Center for Biotechnology Information; ncRNA, noncoding RNA; ODF3B, outer (ncRNAs), or , because their design is based on human dense fiber of sperm tails 3B; OMIM, Online Mendelian Inheritance in Man; qPCR, annotation. Recent studies have used RNA-Seq data to examine quantitative PCR; RNA-Seq, RNA sequencing; SNP, single polymorphism; expression profiles of immune cells; however, this has not dimin- TSS, start site; WIP, weighted immune percentage. ished the utility of expressed sequence tag (EST) data, as recently Copyright Ó 2013 by The American Association of Immunologists, Inc. 0022-1767/13/$16.00 demonstrated by the Encyclopedia of DNA Elements (ENCODE) www.jimmunol.org/cgi/doi/10.4049/jimmunol.1203471 The Journal of Immunology 5579 consortium’s extensive use of EST data (12–14). For example, To identify potential divergent ncRNAs emanating near the TSSs of EST data are often used to aid in deciphering expression of al- immune-enriched -coding genes, we downloaded EST data from the ternative 59 untranslated region isoforms, because RNA-Seq alone University of California Santa Cruz genome browser. Using custom PERL scripts, we analyzed the data to identify sequences that met the following is insufficient to define transcription start sites (TSSs) of genes. requirements: 1) at least one EST is spliced and transcribed on the opposite With the goal of identifying genes enriched in the immune strand of a coding gene and has a start site within 5 kb of the TSS of the system, we developed a rigorous quantitative systems-level ex- RefSeq Immunogene; 2) the spliced EST does not show evidence of splicing . pression database called Immunogene, which harnesses the ∼7 to a neighboring RefSeq gene; and 3) no open reading frame of 100 aa could be identified in the longest EST associated with the coding RefSeq, million human ESTs in the UniGene database. To date, the if there was more than one EST that met our criteria. We used the data UniGene database has not been exploited in a systematic fashion feature ‘‘intronOrientation’’ downloaded from the University of California for genome-wide immune system gene discovery, although it has Santa Cruz Genome Browser to confirm EST strand orientation. successfully been used for retinal gene discovery (15, 16). Early Gene ontology and Kyoto Encyclopedia of Genes and Genomes studies employing UniGene for analysis of the tran- pathway functional enrichment analysis scriptome in zebrafish further highlighted the value of this approach (17). Notably, the UniGene database has the benefit of not being Enrichment analyses for gene ontology (GO) biological process and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were conducted biased toward human gene annotation, representing instead a re- based on various gene list categories including the EST-defined Immu- source of sequence data coupled with tissue/cell-type annotation. nogenes with IDs (n = 2,232) and the reference set of genes (n = Using this approach, we demonstrate unique transmembrane, en- 570) documented in the literature as being preferentially expressed in the zymatic, and nutrient/ion transport network signatures in the im- immune system. These gene lists were submitted to the Database for An- mune system. We also demonstrate the utility of the Immunogene notation, Visualization and Integrated Discovery (http://david.abcc.ncifcrf. gov/) for enrichment analyses of GO biological processes and KEGG database as an aid in deciphering -related , identifying pathways, with the number of genes in each category and Bonferroni- PRDM15 as a novel overexpressed gene in human B cell lympho- corrected p values displayed (19). mas. Thus, we show that EST profiling can significantly enhance our Microarray analysis understanding of the ensemble of genes and processes important for immune function. GC robust multiarray average–normalized gene expression profiles across 79 tissues were obtained from the Genomics Institute of the Novartis Research Foundation (GNF) consortium (20). Expression profiles from the Materials and Methods Affymetrix U133A platform (Affymetrix) and GNF custom probes were Data acquisition and statistics used. In accordance with Affymetrix guidelines, probes with highest ex- pression value below log2(100) across all tissues were removed. Expres- Human UniGene database build 202 and mouse build 163 were obtained sion profiles were clustered using Cluster 3 (21) and visualized using from the National Center for Biotechnology Information (NCBI; ftp://ftp. JavaTreeView (22). Immune enrichment was calculated with the R pro- ncbi.nih.gov/repository/UniGene). These databases contain mapping of gram (version 2.5.1) using the Wilcoxon -sum test for each probe, and each EST to a UniGene cluster and mapping of UniGene clusters to genes p values were corrected using the FDR method (18). In cases in which and tissue of origin for each EST. Upon analysis of library descriptors, we multiple probes were mapped to the same gene, the most significant p found that the human database contained 469 unique tissue/cell types, of value was accepted. The following tissues in the GNF dataset were clas- which 91 were associated with the immune system, allowing each EST to sified as immune and tested versus all other tissues: bone marrow, CD19 be classified as either immune or nonimmune according to its associated B cells, tonsils, lymph nodes, , CD4 T cells, CD8 T cells, CD56 library descriptor. A normalized or weighted immune percentage (WIP) T cells, , CD33 myeloid cells, CD14 , dendritic was calculated for each UniGene cluster using the following formula: cells, fetal liver, CD105 endothelial cells, leukemia cell lines, lymphoma Normalized WIP score per UniGene cluster = (A/B)/(A/B + C/D), where A cell lines, and erythroid cells. For Fig. 5E, we used the normalized log2 is the number of ESTs from immune-related libraries in UniGene Cluster median-centered expression of PRDM15 and several coregulated genes X, B is the total number of ESTs from immune-related libraries in the obtained from published microarray expression profiling (23). entire UniGene database, C is the number of ESTs from non–immune- related libraries in UniGene Cluster X, and D is the total number of ESTs Human-mouse homolog genes from non–immune-related libraries in the entire UniGene database. A p value for immune enrichment was computed for each UniGene For each human protein, the homolog mouse protein was identified using cluster using a x2 test in R (version 2.5.1). For this test, a 2 3 2 contin- NCBI HomoloGene (24). In some cases, multiple mouse matched gency table was created with ESTs derived from immune tissues versus a single human protein and vice versa. For the purpose of immune enrich- nonimmune tissues (Table I). The p value was calculated with respect to ment, at least one mouse homolog was required to be immune enriched. the number of ESTs in all clusters that had at least 10 ESTs. Cells, plasmids, and Abs The p values for all clusters with at least 10 ESTs were corrected for multiple hypotheses testing using the false discovery rate (FDR) of Human PRDM15 (GenBank identification number BC067102) was ob- Benjamini and Hochberg (18). tained from Open Biosystems and subcloned into N-terminal and C- For Fig. 5A, we identified the U133 Plus2/U133A quantile-normalized terminal FLAG-tagged pCMV-3xFLAG vectors using 59 EcoRI and 39 relative expression level of PRDM15 in various subtypes of lymphoma cell HindIII; vectors were generated by modifying ClonTech pCMV- vector lines compared with median expression levels of PRDM15 in all lym- by introduction of appropriate tags and modification of the multiple phoma cell types in the body atlas database of NextBio. These normalized cloning site. Primary Abs used were anti-PRDM15 (Santa Cruz Biotech- relative expression levels were used to generate a box plot to show the nology; sc-83314), anti-FLAG M2 (Sigma-Aldrich; F1804), and anti- expression level of PRDM15 in different lymphoma subtypes. Non- (Sigma-Aldrich; F2066). Secondary Abs for ECL development were HRP- lymphoma cell lines from and colon cancer were also used for com- conjugated anti-mouse (Amersham Biosciences; NXA931) and anti-rabbit parison. (Amersham Biosciences; NA934). Western blotting, immunofluorescence, and immunohistochemistry Table I. ESTs derived from immune tissues versus nonimmune tissues Human Cell Line Blot II (ProSci; 1502) was obtained containing 15 mg protein from various human cell lines. Anti-PRDM15 Western blotting was Individual Entire UniGene performed under standard procedures with 1:1000 primary Ab and 1:3000 Cluster Collection secondary Ab. For immunofluorescence, HEK293T cells were transfected with Lipofectamine (Invitrogen) with PRDM15 FLAG-tagged constructs ESTs derived from X 575,059 and stained with either anti-FLAG (1:400) or anti-PRDM15 (1:200) Abs. immune tissues Immunohistochemistry on human tonsil or follicular lymphomas obtained ESTs derived from Y 5,474,516 from the Pathology Core at Massachusetts General Hospital was performed nonimmune tissues as described using anti-PRDM15 Ab (1:100) (25). 5580 IDENTIFICATION OF IMMUNOGENES BY EST PROFILING

RNA-Seq analysis richment based on RT-PCR or Northern blots. To remove selection biases as much as possible, we cataloged the expression pattern of sets of genes that Reads per kilobase of model per million mapped reads calls from an would be generally regarded as preferentially expressed in the immune system RNA-Seq atlas were downloaded from Medicalgenomics (http://medical- (e.g., cluster of differentiation Ags, genes and receptors, and TLR genomics.org/rna_seq_atlas), which encompasses ribodepleted RNA-Seq signaling components) using various Web-based resources that catalog these data from 11 different tissues (colon, , hypothalamus, kidney, liver, genes. We consider this very stringent approach necessary, because not all TLR lung, skeletal muscle, , testes, ovary, and adipose tissue). Immune genes (e.g., TLR5) or CD Ags (e.g., CD56) are enriched in the immune system. enrichment was calculated for each gene using the Wilcoxon signed-rank test. In the RNA-Seq atlas, spleen was classified as an immune tissue and RT-PCR analysis tested versus all other tissues. In this dataset, 21,725 RefSeq isoforms for mRNAs or ncRNAs were designated by NM or NR accessions, encom- First-strand cDNA from human spleen was obtained from OriGene. passing 14,713 unique gene symbols (10). Of the 26,552 UniGene clusters Reactions were performed in 25 ml volumes with 40 ng sample first-strand analyzed for their WIP score, there were RNA-Seq data for 11,449 corre- cDNA and 640 nM concentrations of both forward and reverse primers. sponding genes. Furthermore, of the 3352 Immunogene clusters identified in PCR was performed under the following conditions: initial denaturation of ° ° ° the UniGene database (see above analysis), there were RNA-Seq data for 95 C for 5 min followed by 34 cycles of 95 C for 30 s and 60 C for 30 s. 1336 corresponding genes. The s, W-, and Z-score statistics were calculated To control for specificity of PCR reactions and primer-dimers, the same for all 11,449 genes versus the 1,336 Immunogenes to test for the presence of PCR reaction was also conducted with no input cDNA. All primers were a trend toward immune enrichment based on RNA-Seq data. designed to be exon spanning. PCR products were cloned and sequenced to verify amplification of expected target long ncRNA (lncRNA). A complete Human disease single nucleotide polymorphism analysis list of UniGene clusters and associated PCR primers used to amplify spliced ncRNAs is shown in Table II. For quantitative RT-PCR experi- The genome-wide association study (GWAS) single nucleotide polymor- ments, total RNA was extracted from the indicated cell lines; the RT step phism (SNP) catalog was downloaded from http://www.genome.gov/ was performed with Superscript, and the quantitative PCR (qPCR) step gwastudies/. SNPs were considered immune disease/trait-associated if they was performed with SYBR Green (Invitrogen). associated with the following diseases/traits: atopic dermatitis, systematic erythematosus, Behcet’s disease, primary biliary cirrhosis, rheuma- toid arthritis, psoriasis, psoriatic arthritis, vitiligo, sarcoidosis, inflammatory Results bowel disease, Crohn’s disease, , primary sclerosing chol- EST profiling is a robust method to measure mRNA immune angitis, type 1 diabetes, type 1 diabetes autoantibodies, multiple sclerosis enrichment (MS), ankylosing spondylitis, juvenile idiopathic arthritis, IgE levels, IgA levels, allergic rhinitis, leprosy, IgE grass sensitization, CMV Ab response, To identify immune-enriched genes using EST datasets, we de- Graves disease, or celiac disease. These diseases/traits were associated with veloped a quantitative schema outlined in Fig. 1. We use the term 1102 unique entries in the catalog, encompassing 808 unique immune ‘‘immune-enriched’’ to indicate higher mRNA transcript levels in disease/trait-associated SNPs given overlapping identification of SNPs among diseases and studies. The remaining diseases/traits were classified immune tissues compared with nonimmune tissues. as nonimmune and were associated with 8795 entries encompassing 7355 The NCBI Human UniGene database (build 202) contained unique SNPs. We next calculated the distance from each immune disease/ 6,731,038 ESTs derived from 8,648 human cDNA libraries. These trait-associated SNP, as well as each nonimmune disease/trait-associated ESTs were clustered by sequence alignment with no genomic SNP, to the nearest TSS of a coding RefSeq. We then calculated the fre- scaffold into 171,154 clusters. Of the resulting clusters, we focused quency of immune disease/trait-associated SNPs within binned distances from EST-defined Immunogenes versus the frequency of nonimmune disease/ on the 26,552 that had at least 10 ESTs, encompassing a total of trait-associated SNPs within the same binned distances from EST-defined 6,049,575 ESTs. The UniGene tissue library descriptors were Immunogenes (Fig. 4C). We computed a p value by permutation testing, annotated and standardized, resulting in 469 tissues/cell types. Of selecting 808 SNPs at random from the non–immune disease/trait-associated these tissues/cell types, 91 (∼20%) were classified as immune tis- SNP pool 10,000 times and calculating the percentage of SNPs with a nearest EST-defined Immunogene each time. sues, with the remaining defined as nonimmune tissues. Because each EST in a particular UniGene cluster has an associated library Gold-standard immune-enriched database annotation identification number, we were able to create a binary classifica- We compiled a list of genes documented in the literature to be preferentially tion scheme, categorizing each EST as to whether it derived from expressed in the immune system. We defined evidence for immune en- immune or nonimmune tissues (Supplemental Fig. 1A).

Table II. Complete list of UniGene clusters and associated PCR primers used to amplify spliced ncRNAs

UniGene Cluster (Primer Name) Sequence Product (bp) EST Accession No. NR Accession No. Hs.662020 (ncRNA4) F: 59-GGAGGATATGGCTCAGGACA-39 346 AK097922 NR_045486.1 R: 59-TGGGCCACCAAGTTGTTTAT-39 Hs.690944 (ncRNA6) F: 59-CACCTCATTCAGTGGCTCCT-39 353 DA953527 R: 59-GGCAATGTCAGGGAAAAGAA-39 Hs.439791 (ncRNA12) F: 59-TGCCTCACCACTTGTCTCAG-39 823 CR598049 NR_024464.1 R: 59-CAGGATGACTCAGTGGAGCA-39 Hs.667175 (ncRNA16) F: 59-CCAACCCAGGAGACAGTGTT-39 275 AL832284 R: 59-CGTTCTCCCCATGTCTCTGT-39 Hs.657722 (ncRNA17) F: 59-CCTCCTGCCTCCTTGGTAAT-39 399 CD367998 NR_026544.1 R: 59-TGGTATGTTCACGGCTGAAA-39 Hs.130180 (ncRNA18) F: 59-GCCATCTTGACTCACTGCAA-39 332 DA937565 R: 59-AGGATGTGTGAGCCATGACA-39 Hs.672896 (ncRNA20) F: 59-AGAGATGGAACCAGCTCGAA-39 366 DA468624 R: 59-GATTCCCTCCAGCTCTCAAA-39 (AS-LRRK2) F: 59-CTTCAACCCAGCAAATCTCC-39 158 DA673065 R: 59-ATCAGCCCTTCCAATGTCAC-39 (LRRK2) F: 59-TGACAGCACAGCTAGGAAGC-39 128 NM_198578.3 R: 59-TGGAAGATTGATGTCCCAAA-39 (PRDM15) F: 59-CGCAGTACAGAGCATTCAGC-39 186 NM_022115.3 R: 59-AGCTGTAACTGGCGTTCAGG-39 F, Forward; R, reverse. The Journal of Immunology 5581

our methods (Fig. 2C). Using a conservative WIP score threshold of 0.15 and p , 0.05, we identified a total of 2834 (10.6%) clusters matching 2232 unique NCBI Gene LocusLink identification numbers as being immune enriched. We termed these 2232 genes ‘‘Immuno- genes.’’ Although we focus on the results of human immune profiling in this report, application of our methodology to the mouse UniGene database ultimately identified 3108 unique mouse genes as immune enriched (Supplemental Fig. 1B, 1C, Supplemental Table I). Microarray, GO, KEGG, and RNA-Seq signatures of immune-enriched genes We next assessed the strength of EST profiling against microarray profiling to provide: 1) an independent method of cross-validation of each platform; and 2) increased cellular resolution within the immune system of our EST-derived dataset. We quantified immune enrichment by calculating a Wilcoxon rank-sum test and adjusted for multiple hypothesis testing (FDR corrected p , 0.05) using the publicly available BioGPS atlas (9). This atlas is a broad com- FIGURE 1. Schematic of the human EST-based profiling analysis pendium of human tissues, with 22 of 79 profiled tissues cate- pipeline. Starting with the NCBI UniGene database build 202, 26,552 gorized as immune tissues. Of the 2232 Immunogenes, 1927 had clusters with at least 10 ESTs were profiled for their expression in immune representative probes on the microarray. Strikingly, 1522 out of tissues. Based on the genome-wide distribution of WIP scores, a p value , 1927 (79%) of the genes found to be immune enriched based on was calculated for each gene. A significant p value ( 0.05) indicated that . there were enough ESTs to conclude that a particular WIP score was EST profiling WIP score (WIP 0.15) also exhibited significantly , higher than average. Clusters identified as immune enriched were cross- higher immune expression by microarray (p 0.05; FDR micro- validated using a microarray data platform and GO/KEGG annotation with array Wilcoxon test) (Fig. 3A). Hierarchical clustering and heat map subsequent experimental follow-up. representation of the 1927 EST Immunogene profiles separated by immune versus nonimmune tissue are shown in Fig. 3B, revealing enrichment of EST-derived Immunogenes in the BioGPS compendia Next, we introduced a scoring system to quantify the level of and providing further cross-validation of our methodology. immune expression for each of the 26,552 clusters. Specifically, We also reasoned that if expression is an indicator of potential we calculated a WIP score for each Unigene cluster which reflects function, then enrichment would also be found in GO biological the normalized percent of ESTs derived from immune tissues process categories and KEGG pathways relevant to the immune within each cluster. To assess whether the WIP score for each system, which was indeed the case (Fig. 3C, Supplemental Fig. UniGene cluster was significantly enriched in immune expression, 1D). In addition, analysis of publicly available RNA-Seq data we applied a x2 statistic and corrected for multiple hypothesis comparing spleen to 10 other nonimmune tissues revealed that the testing using the FDR method (see Materials and Methods and list of Immunogenes defined by EST profiling showed a signifi- Table I). The distribution of the WIP score (mean 0.095) and p cantly higher tendency to be expressed in spleen versus nonimmune values for all 26,552 UniGene clusters are shown in Fig. 2A and tissues (Z = 7.03 for all genes, Z = 151.42 for Immunogenes, Supplemental Table I. Wilcoxon signed-rank test) (Fig. 3D). KIAA0226L (WIP = 0.37; p = To evaluate whether our EST immune-based expression profiling 1.65 3 10238) is an example of a novel immune-enriched gene was sufficiently sensitive and specific to discover immune-related identified by EST profiling. Interestingly, KIAA0226L has been genes, we compiled a reference set of genes (n = 570) documented implicated via proteomics to be involved in autophagy (26), and in the literature as being preferentially expressed in the immune RNA-Seq has demonstrated expression of KIAA0226L in human system as defined by RT-PCR or Northern blot analysis (Supple- spleen and mouse C19+ B cells, but not in mouse CD4+ T cells (3, mental Table I). To cross-validate our literature annotation for this 10). As the ability of EST analysis to spotlight immune-enriched immune-enriched gold-standard set of genes, we performed GO genes was confirmed by multiple analyses, we designed a browseable biological processes, KEGG pathway, and microarray enrichment Web-based portal of Immunogenes accessible at http://xavierlab2. analyses on this gene set. GO enrichment analysis revealed en- mgh.harvard.edu/ESTProfiler/index.html. richment in categories such as immune system process, leukocyte activation, activation, and inflammatory response (Supple- EST profiling reveals immune-enriched transmembrane, mental Fig. 2A). All KEGG pathways identified as being signif- metabolic, and enzymatic signatures icantly overrepresented were immune pathways (Supplemental To fully characterize the Immunogene dataset, we took advantage Fig. 2B). Microarray expression analysis of our compiled immune- of structure-function relationships. For this approach, we analyzed enriched reference set showed that these genes are preferentially each Immunogene systematically by: 1) determining the presence expressed in human immune tissues, thus providing a high- of potential protein domains using Simple Modular Architecture confidence set of immune-enriched genes against which we could Research Tool and PFAM database criteria; 2) checking for ho- benchmark our EST profiling results (Supplemental Fig. 2C, 2D). mology to any known or predicted proteins in multiple species We identified 3352 UniGene clusters with a significant p value using BLASTP alignments; and 3) identifying predicted or known (p , 0.05) and a WIP score greater than the average (0.095). transmembrane regions using the TMHMM prediction algorithm. Importantly, the average WIP scores for the gold-standard genes Structural annotation/prediction revealed that 20% of Immuno- were significantly higher, with a mean score of 0.44 and an SD of genes (440 of 2232) have known or predicted enzymatic functions 0.235, suggesting that our EST profiling methodology was robust using an enzymology classification system or modeling. (Fig. 2B). In addition, the vast majority of our gold-standard genes These observations led us to question whether such a large fraction had significant p values (p , 0.05), indicating the specificity of of in the Immunogene gene set was a reflection of the 5582 IDENTIFICATION OF IMMUNOGENES BY EST PROFILING

unique metabolic pathways and needs of immune cells. We used a similar number of UniGenes (460 of 2611) that were significantly underrepresented in immune tissues (WIP , 0.05; p , 0.05) as a control set of enzymes. Using two independent expression maps, BioGPS (p = 1.4 3 10215) and NCBI serial analysis of gene expression tags (p =1.23 10220), we observed that our EST im- mune inventory was most enriched in monocytes/dendritic cells compared with other cell types. In contrast, the control set of enzymes was enriched in the nervous system (p =13 10213). These results strongly support our conclusion that EST profiling has revealed a genome-wide compendium of enzymes that are likely to be particularly relevant to immune system function and devel- opment. For example, one novel potential enzyme identified as being immune-enriched was ARL11 (WIP = 0.3; p =0.028),whichis predicted to an ADP-ribosylation factor. In other instances, we identified enzymes that have clear en- zymatic function, but which have only been evaluated to a limited degree in immune system function. One such enzyme is fructose 1,6-diphosphatase, which is known to be critical in gluconeogenesis in the liver, but was revealed by EST profiling to be enriched in macrophages. Fructose 1,6-diphosphatase was previously found to be a vitamin D–responsive gene in monocytes (27). Thus, our data highlight how EST profiling can also help focus attention on po- tential intriguing links between and immune function (27, 28). In addition to the enzymatic footprints of immune cells, our structure-function analysis demonstrated highly significant enrich- ment of genes encoding transmembrane proteins (n = 215; p =73 10210 ‘‘integral to membrane’’ GO annotation) with an even greater number of genes (n = 285) identified by transmembrane prediction algorithms. This result correlated with our finding that the largest PFAM and InterPro classes of protein domains in the Immunogene set were Ig domains (n = 87; p = 3.9 3 10212). These transmembrane proteins included cytokine receptors, ion channels such the ORAI2, as well as novel trans- membrane proteins (n = 69) and predicted secreted proteins (n = 22). One example of a transmembrane protein that we identified as being immune enriched is the nucleotide transporter SLC29A3, which causes H syndrome, Faisalabad histiocytosis, and Rosai- Dorfman disease (OMIM identification number 602782) when mutated. Patients with H syndrome have overlapping features of hemophagocytic syndromes, suggesting that functional character- ization of SCL29A3 is likely to yield insights into immune func- tion (29). The strong representation of known or predicted nutrient and ion transporters is also consistent with the unique metabolic needs of the immune system. Our results provide a guide to further dissect the links between metabolism and immune function. An in- tegrated portrait of immune cell transporters and enzymes generated using Immunogene is shown in Fig. 4A. Discovery and verification of lncRNAs using EST profiling Given that our EST profiling is agnostic to the type of transcripts it identifies, we also identified several immune-enriched microRNAs, such as miR-155 (BIC) (p = 2.33 3 10214). Further inspection of the human EST dataset revealed that 30% (n = 388) of the UniGene clusters with WIP . 0.20 and p , 0.05 were not annotated as FIGURE 2. Quantitative EST-based immune expression enrichment analysis utilizing 26,552 UniGene clusters. (A) The frequency distribution of the WIP score (mean 0.095) and p values (x2 metric with FDR cor- rection) for 26,552 UniGene clusters with $10 ESTs is shown. Red indi- (gray; mean 0.095) versus the WIP scores of the 570 known immune- cates clusters/genes significantly enriched in immune expression (WIP . enriched reference gene sets (red; mean 0.44). (C) The distribution of WIP 0.095; p , 0.05), whereas blue indicates clusters/genes significantly under- scores and p values for the 570 known immune-enriched reference gene expressed in the immune system (WIP , 0.05 and p , 0.05). (B)The sets showing that 85% (red) of this gold-standard gene set have significant frequency distribution (density) of WIP scores for 26,552 UniGene clusters WIP and p values. The Journal of Immunology 5583

FIGURE 3. Cross-validation of EST profiling methodology with microarray profiling, GO pathway analysis, and epigenetic signatures. (A) Of the 1927 genes found to be significantly enriched by EST profiling, 1522 genes (79%) exhibited significantly higher expression by Wilcoxon rank-sum test and FDR (p , 0.05) in the immune system based on analysis of the GNF BioGPS microarray data tissue compendium. (B) Microarray profile clustering in GNF tissue/cell compendium of 1927 immune-enriched genes defined by EST profiling segregating immune versus nonimmune tissue types. (C) Examples of GO biological process terms statistically enriched (p , 0.05, Bonferroni corrected) for all EST Immunogenes. (D) The Wilcoxon signed-rank (y-axis) for 1336 Immunogenes compared with their WIP score (x-axis) showing that most Immunogenes show a positively signed rank. hypothetical or validated reference protein-coding genes at NCBI. asked how many protein-coding Immunogenes harbored a diver- Manual curation of these clusters revealed that a large fraction was gently transcribed lncRNA based on the presence of a divergent likely composed of genomic contaminants, alternative , or spliced EST. For this analysis, we relaxed our stringency require- untranslated regions of known protein-coding genes (see Materi- ments such that a single EST from any tissue was sufficient to be als and Methods). However, 46 UniGene clusters demonstrated counted, because most lncRNAs are transcribed at significantly evidence of splicing and did not show any significant similarity to lower levels than mRNAs. Interestingly, we identified 188 Immuno- known protein-coding genes using BLASTX, suggesting that these genes with divergently transcribed lncRNAs, including multiple clusters likely represent lncRNAs (.200 bp). Using RT-PCR, we susceptibility genes for immune disorders such as LRRK2, CYLD, confirmed expression of 10 of 17 (59%) randomly selected clusters and MFHAS1 (Supplemental Fig. 3C). We had previously shown in human spleen (see Supplemental Fig. 3A, 3B, and Table II for that LRRK2 is a IFN-g–responsive gene. Strikingly, we show that examples). One such human-specific lncRNA identified as an the antisense LRRK2 lncRNA, which we term antisense (AS)- Immunogene was AK056817 (WIP = 0.34; p = 7.0 3 1029). Al- LRRK2, is coordinately upregulated with LRRK2 upon IFN-g though the functions of the ncRNAs are largely unknown, analysis stimulation (Fig. 4B). Thus, EST profiling can also be used to reveal of ENCODE ChIP-Seq data in leukemic K562 cells revealed IFN- novel lncRNAs responsive to specific immune-signaling pathways inducible binding of STAT1 to the region of AK056817, (30, 32, 33). suggesting that this lncRNA may function in IFN response path- ways (30). Thus, we suggest that integrating additional genome- Immunogenes are enriched in primary immunodeficiency genes wide datasets can be used to refine potential functional pathways and in genomic regions associated with immune diseases/traits of Immunogenes. in GWAS studies AK056817 represents a likely lncRNA that is transcribed in an We found that the Immunogene compendium is significantly intergenic region away from other known genes. Yet many coding enriched in the KEGG category ‘‘primary immunodeficiency’’ genes have recently been found to exist as pairs with lncRNAs (hsa05340; n = 28 of 34; p = 2.03 3 10214). This KEGG pathway that are divergently transcribed on the opposite strand with a TSS does not include other primary immunodeficiency genes that we within a few kilobases of the known mRNA TSS (31). We therefore identified as Immunogenes such as CARD9, CLEC7A, IL17RA 5584 IDENTIFICATION OF IMMUNOGENES BY EST PROFILING

FIGURE 4. Immune cell transporters and enzymes identified using Immunogene, lncRNA coregulated with LRRK2, and proximity of immune disease/ trait GWAS SNPs to Immunogenes. (A) Graphical representation of known or predicted locations and functions of transmembrane proteins and/or genes related to metabolism or nutrient/ion transport in the Immunogene dataset. Green circles indicate transmembrane proteins or transporters associated with complex or Mendelian diseases; red circles indicate transporters with known function; black circles indicate transmembrane proteins with predicted lo- calization and no known function. (B) A divergently transcribed lncRNA is coregulated with LRRK2. Top panel, Schematic of exon structure of LRRK2 and the divergent AS-LRRK2 transcript(s), with location of SNP rs11175593 associated with Crohn’s disease (37). Bottom panel, The THP-1 monocytic cell line was stimulated with IFN-g for the indicated time. RT-qPCR of LRRK2 and AS-LRRK2 transcript levels is shown normalized to S18 transcript and relative to expression levels of LRRK2 in resting THP-1 cells. Data shown are representative of one of two biological replicates with errors bars repre- senting SD of triplicate technical replicates. (C) The distance (2-kb bins) between SNPs associated by GWAS with immune diseases/traits (red) and the nearest Immunogene is shown in comparison with the distance between SNPs not associated with immune diseases/traits (blue) and the nearest Immu- nogene.

(familial candidiasis 2, 4, 5; OMIM identification numbers 212050, (34) (Supplemental Table I). These results demonstrate the ability 613108, and 613953), CXCR4 (WHIM syndrome; OMIM identifi- of our approach to highlight genes responsible for monogenic cation number 193670), (hyper-IgE syndrome; OMIM immune-related human diseases. identification number 243700), WAS (Wiskott-Aldrich syndrome; In terms of complex traits, GWAS have identified thousands of OMIM identification number 301000), and SH2D1A (X-linked genetic loci related to disease phenotypes or traits, although lymphoproliferative disease; OMIM identification number 308240) identification of the exact causative polymorphism and mechanism The Journal of Immunology 5585

(s) of action including relevant target gene(s) remains a challenge. confirmed that PRDM15 was not only overexpressed in follicular We postulated that EST-defined Immunogenes may be significantly lymphoma cell lines, but also in primary follicular lymphomas more enriched at loci implicated in immune-related diseases/traits compared with primary normal B cell populations (Fig. 5E). Strik- (e.g., inflammatory bowel disease) given their expression in ingly, immunohistochemistry analyses also revealed that PRDM15 disease/trait-relevant tissue/cells (see Materials and Methods for was overexpressed in five of seven independent human follicular details of immune-related diseases/traits analyzed). To test this lymphoma samples examined, as compared with normal tonsillar possibility, we plotted the frequency of SNPs associated with hu- germinal centers (Fig. 5F). Thus, our results demonstrate how man immune diseases/traits relative to the distance to the nearest initial insights gained by EST profiling can be used to identify Immunogene. The distance distribution of SNPs associated with a novel overexpressed in human B cell lym- nonimmune-related diseases/traits to the nearest Immunogene was phomas. used as a comparison. As shown in Fig. 4C, there is a higher fre- quency of SNPs associated with immune-related diseases/traits Discussion near Immunogenes, as compared with SNPs not associated with In this study, we present the application of a quantitative genome- immune-related SNPs. Strikingly, we found that 254 of 808 wide analysis of the human/mouse dbEST database composed of (31.44%) immune-related disease/trait-associated SNPs had their .6 million data points to elucidate a collection of genes enriched closest RefSeq coding gene as an EST-defined Immunogene versus in immune expression, which we term immunogenes. Using this 783 of 7355 (10.65%) nonimmune disease/trait-associated SNPs database, we were able to identify an immune transmembrane (p , 1024 based on permutation testing). and enzymatic signature. Furthermore, we implicated a novel A recent GWAS study identified the tagging SNP rs140522 transcription factor in B cell oncogenesis. Our results provide an as associated with MS. In this study, the authors cited the nearby important complement to other studies aimed at understanding the gene encoding cytochrome oxidase deficient homolog 2 (SCO2)as immune system including the RefDIC, ImmGen, and Immune the MS candidate gene within the locus (35). However, the nearest Response In Silico databases, all of which largely used microarray gene to rs140522 is not SCO2, but rather outer dense fiber of profiling (2–5). sperm tails 3B (ODF3B), which previously has not been impli- Having a genome-wide set of genes with preferential immune cated in immune function. Our approach identified ODF3B as expression, we undertook a systems-level analysis of the Immu- novel Immunogene (Hs.531314; WIP = 0.229; p = 0.005). Thus, nogene compendium, finding a unique enzymatic signature of ODF3B is an alternative candidate gene for further testing in the immune cells. The study of the large set of known or predicted pathophysiology of MS, illustrating the utility of integrating our enzymes, transporters, and corresponding metabolites is likely to EST expression profiling results with GWAS studies. yield critical insights into the metabolism of immunity. Further- more, the Immunogene database provides an enhanced view of the Identification of PRDM15 as a novel B cell gene overexpressed potential spectrum of potential transmembrane cell-surface markers in human B cell lymphomas expressed on various immune cells. These novel transmembrane Im- We identified PRDM15, a finger and SET domain-encoding munogenes represent potential candidates for targeted therapeutics gene, as one of the most highly immune-enriched (WIP = 0.71; or cell purification over the long-term via the development of Abs p = 5.1 3 10219) genes with little known function(s). We chose to directed against these Ags. We demonstrated how the Immunogene focus on PRDM15 given our interest in B cell lymphoma patho- approach can be used to identify genes relevant to the pathogenesis genesis coupled with our observation that a high fraction of the of lymphomas. Along these lines, we dissected the expression pat- PRDM15-derived ESTs were from either normal germinal center tern of PRDM15, finding it to be overexpressed at the level of mRNA B cells (BGC) or from B cell lymphoma cell line cDNA libraries and protein in human follicular lymphomas. Thus, delineation of (36). BGC are thought to be the normal counterpart of some hu- PRMD15 function may prove to be important in understanding BGC man B cell malignancies, including , Burkitt physiology and B cell malignancies. lymphoma, and diffuse large B cell lymphoma. To refine and The identification of genes involved in both Mendelian and further corroborate our EST data, we therefore analyzed the - complex traits encompassed within the dataset support the utility ative expression pattern of PRDM15 among various types of B cell of the Immunogene database as an aid in genetic studies. In this lymphomas and non–B cell cancer-derived cell lines using pub- regard, we demonstrate how the Immunogene dataset can be used lished microarray data (23). As shown in Fig. 5A, PRDM15 showed to spotlight candidate genes in which GWASstudies, genetic linkage the highest expression in cell lines derived from follicular lym- studies, quantitative trait locus analysis, or mouse mutagenesis phomas (n = 9) and the second highest expression level in Burkitt screens have identified genomic intervals containing numerous lymphomas (n = 17), compared with germinal B cell– and activated genes. In addition, the Immunogene database can be harnessed to B cell–type non-Hodgkin diffuse large B cell lymphoma (n = 11), probe the variation of immune system responses through focused Hodgkin lymphomas (n = 9), lung cancers (n =99),andcolon analysis of SNPs and copy number variations of the identified cancers (n = 52). These results supported the observation that Immunogenes. The analysis of genetic variation in the Immunogene PRDM15 is specifically overexpressed in human B cell lympho- dataset, through methods such as coding variant SNP analysis or mas. These data are consistent with RT-qPCR data showing that the resequencing, is poised to yield a pool of human genes at the cor- Burkitt lymphoma cell line, Daudi, has 8–35 times more PRDM15 nerstone of variation in human immune responses, including sus- mRNA expression than several other cell lines/types (Fig. 5B). To ceptibility to autoimmune diseases such as lupus, inflammatory examine PRDM15 expression at the protein level, we confirmed the bowel disease, and diabetes. In addition, we demonstrated how EST nuclear localization of PRDM15 in HEK293T cells (Fig. 5C) and data can be used to identify lncRNAs in autoimmune disease loci, used Western blot analysis to demonstrate that PRDM15 protein is with the identification of IFN-inducible AS-LRRK2 lncRNA in the expressed in multiple Burkitt lymphomas and some monocytic Crohn’s disease LRRK2 locus. Our data suggest that Crohn’s leukemias, but not in T cell leukemias, suggesting that PRDM15 is disease-associated SNPs such as rs11175593, which lies in an in- expressed at higher levels in Burkitt B cell lymphomas not only at tron of the AS-LRRK2 lncRNA, may modulate the function and/or the mRNA level, but also at the protein level (Fig. 5D). In terms expression of not only LRRK2, but also its coregulated partner AS- of follicular lymphoma expression, additional microarray analysis LRRK2. Furthermore, extension of the Immunogene paradigm to 5586 IDENTIFICATION OF IMMUNOGENES BY EST PROFILING

FIGURE 5. The Immunogene PRDM15 is overexpressed in human lymphomas. (A) Box plot of normalized microarray expression levels of PRDM15 across 744 cell lines from various cancer types with PRDM15 levels shown for B cell lymphoma subtypes, lung cancer, and colon cancer. (B) PRDM15 mRNA is overexpressed in Burkitt lymphoma. RT-qPCR of PRDM15 transcript levels in EBV-transformed B cell lymphoblastoid GM06990 and GM12878 cell lines, embryonic kidney HEK293T cells, Burkitt lymphoma Daudi cells, and monocytic THP-1 cells. Levels were normalized to S18 transcript and expressed relative to PRDM15 levels in GM06990 cells. (C) PRDM15 localizes to the nucleus. HEK293T cells were mock transfected or transfected with N-terminal FLAG-tagged PRDM15 and stained with anti-PRDM15 Ab and DAPI (original magnification 340). (D) Western blot analysis of PRDM15 in multiple human cell lines including Daudi and Raji cells (Burkitt lymphomas), K562 cells (myelogenous leukemia, BCR-ABL positive), U-937 cells (histiocytic lymphoma, negative IgH, negative EBV, monocytic), THP-1 cells (acute monocytic leukemia), HL-60 cells (acute promyelocytic leukemia), Jurkat cells (acute T cell leukemia), and MOLT4 cells (acute T-lymphoblastic leukemia). * indicates a likely nonspecific band as it does not correspondto any PRDM15 predicted isoform weight. (E) PRDM15 mRNA is overexpressed in primary follicular B cell lymphomas. Normalized log2 median-centered expression of PRDM15 in human follicular lymphomas (n = 7) versus normal immune cell populations including peripheral blood B cells (PBC B cell; n = 5), peripheral blood CD4+ T cells (PBC T cell; n = 5), germinal center B cells (GBC; n = 1), memory B lymphocytes (MBC; n = 1), umbilical cord B lymphocytes (UBC B cell; n = 1), and umbilical cord T lymphocytes (UBC T cell; n = 1). (F) PRDM15 protein is expressed in normal germinal centers and overexpressed in primary human follicular lymphomas. Shown are the results comparing PRDM15 levels in normal human tonsils and two of seven representative independent primary human follicular lymphomas samples that were stained with anti-PRDM15 Ab by immunohistochemistry. other tissues may aid in deciphering human genes likely to be eses to study known and novel genes in a variety of immune path- uniquely associated with other systems, such as the nervous system. ways. We anticipate that by starting with the Immunogene dataset as a robust input, accompanied by multiple new layers of annotation, Acknowledgments this resource will provide a dynamic platform to generate hypoth- We thank Natalia Nedelsky for substantive editorial assistance. The Journal of Immunology 5587

Disclosures 19. Dennis, G., Jr., B. T. Sherman, D. A. Hosack, J. Yang, W. Gao, H. C. Lane, and R. A. Lempicki. 2003. DAVID: Database for Annotation, Visualization, and The authors have no financial conflicts of interest. Integrated Discovery. Genome Biol. 4: 3. 20. Su, A. I., M. P. Cooke, K. A. Ching, Y. Hakak, J. R. Walker, T. Wiltshire, A. P. Orth, R. G. Vega, L. M. Sapinoso, A. Moqrich, et al. 2002. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. USA 99: References 4465–4470. 1. Osborn, O., and J. M. Olefsky. 2012. The cellular and signaling networks linking 21. de Hoon, M. J., S. Imoto, J. Nolan, and S. Miyano. 2004. Open source clustering the immune system and metabolism in disease. Nat. Med. 18: 363–374. software. 20: 1453–1454. 2. Hijikata, A., H. Kitamura, Y. Kimura, R. Yokoyama, Y. Aiba, Y. Bao, S. Fujita, 22. Saldanha, A. J. 2004. Java Treeview—extensible visualization of microarray K. Hase, S. Hori, Y. Ishii, et al. 2007. Construction of an open-access database data. Bioinformatics 20: 3246–3248. that integrates cross-reference information from the transcriptome and proteome 23. Rosenwald, A., A. A. Alizadeh, G. Widhopf, R. Simon, R. E. Davis, X. Yu, of immune cells. Bioinformatics 23: 2934–2941. L. Yang, O. K. Pickeral, L. Z. Rassenti, J. Powell, et al. 2001. Relation of gene 3. Heng, T. S., and M. W. Painter; Immunological Genome Project Consortium. expression phenotype to immunoglobulin mutation genotype in B cell chronic 2008. The Immunological Genome Project: networks of gene expression in lymphocytic leukemia. J. Exp. Med. 194: 1639–1647. immune cells. Nat. Immunol. 9: 1091–1094. 24. Wheeler, D. L., T. Barrett, D. A. Benson, S. H. Bryant, K. Canese, V. Chetvernin, 4. Abbas, A. R., D. Baldwin, Y. Ma, W. Ouyang, A. Gurney, F. Martin, S. Fong, D. M. Church, M. DiCuccio, R. Edgar, S. Federhen, et al. 2007. Database M. van Lookeren Campagne, P. Godowski, P. M. Williams, et al. 2005. Immune resources of the National Center for Biotechnology Information. Nucleic Acids response in silico (IRIS): immune-specific genes identified from a compendium Res. 35(Database issue): D5–D12. of microarray expression data. Genes Immun. 6: 319–331. 25. Wu, C. L., S. D. Kirley, H. Xiao, Y. Chuang, D. C. Chung, and L. R. Zukerberg. 5. Hyatt, G., R. Melamed, R. Park, R. Seguritan, C. Laplace, L. Poirot, S. Zucchelli, 2001. Cables enhances cdk2 15 by Wee1, inhibits cell R. Obst, M. Matos, E. Venanzi, et al. 2006. Gene expression microarrays: growth, and is lost in many human colon and squamous cancers. Cancer Res. 61: glimpses of the immunological genome. Nat. Immunol. 7: 686–691. 7325–7332. 6. Hoffman, B. G., K. L. Williams, A. H. Tien, V. Lu, T. R. de Algara, J. P. Ting, 26. Behrends, C., M. E. Sowa, S. P. Gygi, and J. W. Harper. 2010. Network orga- and C. D. Helgason. 2006. Identification of novel genes and transcription factors nization of the human autophagy system. Nature 466: 68–76. involved in spleen, thymus and immunological development and function. Genes 27. Solomon, D. H., M. C. Raynal, G. A. Tejwani, and Y. E. Cayre. 1988. Activation Immun. 7: 101–112. of the fructose 1,6-bisphosphatase gene by 1,25-dihydroxyvitamin D3 during 7. Hashimoto, S. I., T. Suzuki, S. Nagai, T. Yamashita, N. Toyoda, and K. Matsushima. monocytic differentiation. Proc. Natl. Acad. Sci. USA 85: 6904–6908. 2000. Identification of genes specifically expressed in human activated and mature 28. Muindi, J. R., Y. Peng, J. W. Wilson, C. S. Johnson, R. A. Branch, and dendritic cells through serial analysis of gene expression. Blood 96: 2206–2214. D. L. Trump. 2007. fructose 1,6-bisphosphatase and cytidine deami- 8. Castle, J. C., C. D. Armour, M. Lo¨wer, D. Haynor, M. Biery, H. Bouzek, nase enzyme activities: potential pharmacodynamic measures of calcitriol effects R. Chen, S. Jackson, J. M. Johnson, C. A. Rohl, and C. K. Raymond. 2010. in cancer patients. Cancer Chemother. Pharmacol. 59: 97–104. Digital genome-wide ncRNA expression, including SnoRNAs, across 11 human 29. Morgan, N. V., M. R. Morris, H. Cangul, D. Gleeson, A. Straatman-Iwanowska, tissues using polyA-neutral amplification. PLoS ONE 5: e11779. N. Davies, S. Keenan, S. Pasha, F. Rahman, D. Gentle, et al. 2010. Mutations in 9. Wu, C., C. Orozco, J. Boyer, M. Leglise, J. Goodale, S. Batalov, C. L. Hodge, SLC29A3, encoding an equilibrative nucleoside transporter ENT3, cause a fa- J. Haase, J. Janes, J. W. Huss, III, and A. I. Su. 2009. BioGPS: an extensible and milial histiocytosis syndrome (Faisalabad histiocytosis) and familial Rosai- customizable portal for querying and organizing gene annotation resources. Dorfman disease. PLoS Genet. 6: e1000833. Genome Biol. 10: R130. 30. Rozowsky, J., G. Euskirchen, R. K. Auerbach, Z. D. Zhang, T. Gibson 10. Krupp, M., J. U. Marquardt, U. Sahin, P. R. Galle, J. Castle, and A. Teufel. 2012. R. Bjornson, N. Carriero, M. Snyder, and M. B. Gerstein. 2009. PeakSeq enables RNA-Seq Atlas—a reference database for gene expression profiling in normal systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol. tissue by next-generation sequencing. Bioinformatics 28: 1184–1185. 27: 66–75. 11. Gu, J., T. He, Y. Pei, F. Li, X. Wang, J. Zhang, X. Zhang, and Y. Li. 2006. 31. Hung, T., Y. Wang, M. F. Lin, A. K. Koegel, Y. Kotake, G. D. Grant, Primary transcripts and expressions of intergenic microRNAs detected H. M. Horlings, N. Shah, C. Umbricht, P. Wang, et al. 2011. Extensive and by mapping ESTs to their flanking sequences. Mamm. Genome 17: 1033–1041. coordinated transcription of noncoding RNAs within cell-cycle promoters. Nat. 12. Dunham, I., A. Kundaje, S. F. Aldred, P. J. Collins, C. A. Davis, F. Doyle, Genet. 43: 621–629. C. B. Epstein, S. Frietze, J. Harrow, R. Kaul, et al; ENCODE Project Consor- 32. Rinn, J. L., and H. Y. Chang. 2012. Genome regulation by long noncoding tium. 2012. An integrated encyclopedia of DNA elements in the . RNAs. Annu. Rev. Biochem. 81: 145–166. Nature 489: 57–74. 33. Cabili, M. N., C. Trapnell, L. Goff, M. Koziol, B. Tazon-Vega, A. Regev, and 13. Gerstein, M. B., A. Kundaje, M. Hariharan, S. G. Landt, K. K. Yan, C. Cheng, J. L. Rinn. 2011. Integrative annotation of human large intergenic noncoding X. J. Mu, E. Khurana, J. Rozowsky, R. Alexander, et al. 2012. Architecture of RNAs reveals global properties and specific subclasses. Genes Dev. 25: 1915– the human regulatory network derived from ENCODE data. Nature 489: 91– 1927. 100. 34. McKusick, V. A. 2007. Mendelian Inheritance in Man and its online version, 14. Thurman, R. E., E. Rynes, R. Humbert, J. Vierstra, M. T. Maurano, E. Haugen, OMIM. Am. J. Hum. Genet. 80: 588–604. N. C. Sheffield, A. B. Stergachis, H. Wang, B. Vernot, et al. 2012. The accessible 35. International Multiple Sclerosis Genetics Consortium, Wellcome Trust Case landscape of the human genome. Nature 489: 75–82. Control Consortium 2, S. Sawcer, G. Hellenthal, M. Pirinen, C. C. Spencer, 15. Katsanis, N., K. C. Worley, G. Gonzalez, S. J. Ansley, and J. R. Lupski. 2002. A N. A. Patsopoulos, L. Moutsianas, A. Dilthey, Z. Su, et al. 2011. Genetic risk and computational/functional genomics approach for the enrichment of the retinal a primary role for cell-mediated immune mechanisms in multiple sclerosis. transcriptome and the identification of positional candidate retinopathy genes. Nature 476: 214–219. Proc. Natl. Acad. Sci. USA 99: 14326–14331. 36. Chiarle, R., Y. Zhang, R. L. Frock, S. M. Lewis, B. Molinie, Y. J. Ho, 16. Liang, S., S. Zhao, X. Mu, T. Thomas, and W. H. Klein. 2004. Novel retinal D. R. Myers, V. W. Choi, M. Compagno, D. J. Malkin, et al. 2011. Genome-wide genes discovered by mining the mouse embryonic RetinalExpress database. Mol. translocation sequencing reveals mechanisms of breaks and rear- Vis. 10: 773–786. rangements in B cells. Cell 147: 107–119. 17. Song, H. D., X. J. Sun, M. Deng, G. W. Zhang, Y. Zhou, X. Y. Wu, Y. Sheng, 37. Barrett, J. C., S. Hansoul, D. L. Nicolae, J. H. Cho, R. H. Duerr, J. D. Rioux, Y. Chen, Z. Ruan, C. L. Jiang, et al. 2004. Hematopoietic gene expression profile S. R. Brant, M. S. Silverberg, K. D. Taylor, M. M. Barmada, et al; NIDDK IBD in zebrafish kidney marrow. Proc. Natl. Acad. Sci. USA 101: 16240–16245. Genetics Consortium; Belgian-French IBD Consortium; Wellcome Trust Case 18. Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: Control Consortium. 2008. Genome-wide association defines more than 30 a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57: 289–300. distinct susceptibility loci for Crohn’s disease. Nat. Genet. 40: 955–962.