View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

FEBS 29359 FEBS Letters 579 (2005) 1795–1801

Minireview Study of stem cell function using microarray experiments

Carolina Perez-Iratxeta, Gareth Palidwor, Christopher J. Porter, Neal A. Sanche, Matthew R. Huska, Brian P. Suomela, Enrique M. Muro, Paul M. Krzyzanowski, Evan Hughes, Pearl A. Campbell, Michael A. Rudnicki, Miguel A. Andrade* Ontario Genomics Innovation Centre, Ottawa Health Research Institute, Molecular Medicine Program, 501 Smyth Road, Ottawa, Canada K1H 8L6 Accepted 10 February 2005

Available online 21 February 2005

Edited by Robert Russell and Giulio Superti-Furga

If we understand the variety of cells of an organism as dif- Abstract DNA Microarrays are used to simultaneously mea- sure the levels of thousands of mRNAs in a sample. We illustrate ferent expressions of the same genome, it is clear that a collec- here that a collection of such measurements in different cell types tion of cell expression snapshots can give us information and states is a sound source of functional predictions, provided about that cell and of the relations between its genomeÕs the microarray experiments are analogous and the cell samples . For example, the concerted expression of two genes are appropriately diverse. We have used this approach to study across many cellular states would indicate their functional stem cells, whose identity and mechanisms of control are not well relation, even in the total absence of any other experimental understood, generating Affymetrix microarray data from more characterization. But to make this analysis meaningful two than 200 samples, including stem cells and their derivatives, from conditions apply to the data; it must be both human and mouse. The data can be accessed online (StemBase; comparable across samples and exhaustive at the genomic http://www.scgp.ca:8080/StemBase/). scale. One technique to measure gene expression that fulfills 2005 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved. these conditions is DNA microarrays, which allow the simul- taneous measurement of mRNA levels from thousands of Keywords: Stem cell; Genomics; Microarray analysis; genes [7]. Despite some inherent limitations of this technology Microarray database [8], they offer a very complete view of the cell state in terms of gene expression. Following this, we have collected Affymetrix DNA micro- array data for more than 200 samples including stem cells and their derivatives from human and mouse. Samples were 1. Introduction provided by researchers of the Canadian Stem Cell Network (SCN; http://www.stemcellnetwork.ca/). In this work, we illus- Stem cells are key to embryonic development and are also trate how we are using these data to characterize stem cells, present in many adult tissues, where they are responsible for their genetic networks, and characteristic genes. The data are regeneration and maintenance. The main characteristic of stem available in an online database, (StemBase; http://www.scgp. cells is their capability for differentiating and self-renewing. ca:8080/StemBase/). This suggested their potential use for therapy to supplement a degenerating tissue with fresh cells or to produce new tissue [1,2]. Encouraging examples have demonstrated the use of 2. Results stem cells for cell therapy in dystrophic mice [3], and the treat- ments of human blood cancer [4] and ParkinsonÕs disease [5]. 2.1. Data production and reproducibility controls However, most of these applications are hampered by our Our goal is to generate gene expression profiles from multi- incomplete knowledge of stem cells [6]. Stem cell identity is still ple samples and to compare them. One difficulty with this ap- an issue, the control of genes known to be important in stem proach is that technical variability in microarray data can cell functions is poorly understood, and it is likely that there obscure the biological differences between samples. are many genes critical to stem cell function not yet character- Firstly, there is variability between different microarray plat- ized. Here, we present an approach to tackle the study of stem forms [9]. To avoid this, we have used only Affymetrix DNA cells at these three levels – cell identity, genetic network, and microarrays and mostly a unique type of array set per organ- gene – by collecting gene expression data from a variety of ism (MOE430 for mouse, and HGU133 for human). stem cell types and their derivatives, and by studying the data Secondly, even within the same platform and array type, at different levels using computational tools and biological there is variability that can arise from many points of the pro- databases. cess of chip and data production, including the bioinformatics of probe design [10], array manufacture, sample preparation, target labeling, hybridization, and scanning. In order to ensure data consistency between experiments and between operators, *Corresponding author. Fax: +613 737 8803. samples were analyzed in triplicate when possible, and techni- E-mail address: [email protected] (M.A. Andrade). cal staff responsible for data generation underwent rigorous

0014-5793/$30.00 2005 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.febslet.2005.02.020 1796 C. Perez-Iratxeta et al. / FEBS Letters 579 (2005) 1795–1801 periodic evaluation. The current average number of replicates The most basic and general query in StemBase is to obtain a per sample in StemBase is 2.80, with an average correlation be- selection of a sub-matrix of this matrix operating on the prop- tween replicates of R2 = 0.95. erties attached to columns (samples) and rows (probe sets). The goal of large part of our work in StemBase is to enrich the infor- 2.2. Database and data mining mation attached both to samples and probe sets, so that more The StemBase user interface is designed as an Apache/PHP biologically meaningful queries can be operated in the data. front-end application to a MySQL database. The database col- Each probe set has associated information taken from the lects data into experiments, each of which is a collection of one Affymetrix NetAffx annotation database [11] such as related or more samples. An experiment may be as simple as compar- gene names and symbols, links to DNA and protein sequence ing two samples, or may represent a more complex set of sam- databases, and (GO) terms [12]. NetAffx anno- ples, such as a timecourse. Within each sample, there are tations are periodically updated as the underlying databases typically three replicates; each replicate is associated with are updated, and these changes are reflected in StemBase. This Affymetrix hybridization data. is crucial given the current instability of probe annotation [13]. The data in StemBase are from mouse and human samples, We are working on methods to extend the GO annotations of with a preponderance of human hematopoietic stem cells and the probe sets [14]. Additionally, we associate to the probe sets mouse embryonic stem cells (Table 1). Future additions to the MEDLINE references to the scientific bibliography derived database will increase the variety of samples represented. from their links to sequence databases. Their relevance to stem Experiment types are: comparisons between undifferentiated cells is evaluated using an information extraction method [15] stem cells and their derivatives (some as a time series); related so that a probe set selection can be quickly contrasted to the types of stem cells; stem cells in mixed populations and some current knowledge on stem cells. selections of the population based on usage of markers (using Each sample in the data matrix has a number of associated multiple passages); stem cell tumours; and stem cells in a vari- free text descriptions (including experimental conditions, cell ety of conditions (culture media, knock-in of a gene, knock out type and tissue of origin, organism, purity, etc.), and informa- of a gene, etc.). tion about the experiment of which it is a member. In order to The data in StemBase can be considered to form a matrix allow database queries encompassing derivation relationships with samples represented in the columns and microarray probe between cell types, we have created a cellular ontology that de- sets in the rows. Each matrix entry represents a hybridization scribes the developmental relationships between stem cells, value observed for a probe set in a given sample (Fig. 1(a)). committed progenitors, and differentiated cell types, and asso- ciated each sample with a term from this ontology. This ontol- ogy is an expansion of the MeSH ontology category MeSH A Table 1 (Medical Subjects Headlines developed at the National Li- Number of samples deposited in StemBase (December 2004) by cell brary of Medicine, http://www.nlm.nih.gov/mesh/), which in- ontology term cludes both tissue descriptions and a subcategory of cell 293FT 1 types, customized to the samples contained in StemBase. The Adipospheres 1 latest version, currently containing 110 terms, is accessible Blastocyst 30 Bone 3 through StemBase. Bone marrow 18 Each entry in the data matrix represents the hybridization Brain 1 observed for a probe set in a given sample, which is assumed C2C12 1 to reflect the level of mRNA expression in the cell. There are Connective tissue 1 Cord blood 11 several ways to obtain this value starting from the microarray Demispheres 1 scan images and pooling several replicates (Fig. 1(b)). In the Embryonic 8 current version of StemBase, the replicate probe set hybridiza- Fetal blood 1 tion values are generated using the MAS5.0 algorithm (devel- Fetal liver 3 oped by Affymetrix; http://www.affymetrix.com/support/ FM95-14 11 FM95-14 myotube 1 technical/whitepapers/sadd_whitepaper.pdf), and the hybrid- HeLa 14 ization value of the probe set for the sample is calculated as I-6 ES 2 the mean of the hybridization values of the replicates. The J1 ES/EB 13 majority of experiments in StemBase are performed with trip- Limb bud mesenchyma 3 Mammary gland 5 licate hybridizations (currently, 55 samples have only a single Mo7-e 4 chip hybridization, 6 have duplicates, 152 have triplicates, Mobilized peripheral blood 4 two have four hybridizations, and one sample has five repli- Mouse embryonic fibroblasts 2 cates). The StemBase interface also offers the option of show- Muscle 2 ing the Affymetrix Present/Absent calls for the probe set. Muscle derived pluripotent (adult) stem cell 1 Myoblast 3 Future versions of StemBase will include newer methods for Myosphere 3 microarray data normalization, such as RMA [16] and GC- Myotube 1 RMA [17] that outperform MAS5.0 [18]. Neural stem cells 7 Data in StemBase can be browsed by experiment, or by p107-/- 2 P19 embryonic carcinoma 4 searching for terms in the text description of experiments and R1 ES/EB 21 samples. We also offer searches that will retrieve a data matrix Retinal spheres 4 of the form described above. For this query method, samples Undifferentiated mammospheres 2 can be restricted by species, keywords, or by terms or branches V6.5 ES/EB 11 of the cell ontology described above. Probe sets can be selected C. Perez-Iratxeta et al. / FEBS Letters 579 (2005) 1795–1801 1797

Fig. 1. Data organization in StemBase. (a) The hybridization data can be considered as a matrix (violet square) with rows for probe sets and columns for samples. Both probe sets and samples have properties attached that can be used for data selection of a sub-matrix (black squares within the violet square). (b) Each entry in the matrix (3 · 3 in this example) represents a hybridization value of a probe set in a sample, which could correspond to multiple replicates produced on the same sample. This value can be produced in alternative ways. (c) The values produced for the matrix (colours: e.g., red means the probe set hybridized in that sample; green means that it did not) could be used now as input by many methods of analysis, for example, hierarchical clustering. by their location of expression, function, or biological process 120 GO terms from the GO web site (http://www.geneontol- according to associated GO terms. Probe set IDs and sample ogy.org/GO.slims.html). IDs can also be used to query specific data in the database. We coded each sample in StemBase as a vector with 120 The resulting matrix of data can then be downloaded by users coordinates corresponding to the fraction of genes in each of in tabular format and fed into algorithms to derive knowledge the slimmed GO categories that were detected as being ex- about the relations between the probe set values and the sam- pressed in the sample (according to a call of present in more ples observed, for example, a two-dimensional hierarchical than 50% of their replicates, as given by MAS5.0). clustering of probe sets and samples (Fig. 1(c)) [19]. Figs. 2(a) and (b) represent a fitted two-dimensional projec- tion of the vectors obtained for the mouse and human samples, 2.3. Analysis of gene expression patterns respectively (multidimensional scaling computed with xgobi, 2.3.1. Cell function profile. The complete set of the cellular http://www.research.att.com/areas/stat/xgobi/). The proximity functions of the genes expressed in a particular cell type can be between samples corresponding to the same or similar tissues used to define that cellÕs identity and to learn about its biology. can be taken as an indication that there is a correlation be- Comparison of function profiles from multiple cells to their tween gene expression profiles and cellular types, with similar relations in tissue position and development can be used to cells giving similar gene expression profiles. This is very impor- learn about organism differentiation. tant to support subsequent analysis. The overview of the data The gene expression data for each sample in StemBase con- indicates the current status of sampling of tissues in the data- stitutes a function profile based on mRNA transcript levels. base and suggests the major groupings that can be used later to This can be obtained in a computer usable way since, as it search for genes particular of cellular sub-types. The mouse was described above, each microarray probe set is associated data (Fig. 2(a)) indicate that the major data separation could to GO terms. However, the level of detail in GO complicates be observed between embryonic and hematopoietic stem cells the analysis of these profiles. A more appropriate set of anno- with all dermal related cellular types in between; the human tations is given by the GO slims, Ôslimmed downÕ versions of data (Fig. 2(b)) also suggest a good separation of hematopoi- the GO terms. Here, we used the ‘‘Generic GO Slim’’ list of etic stem cells from the rest. 1798 C. Perez-Iratxeta et al. / FEBS Letters 579 (2005) 1795–1801

(a) ples), or almost ubiquitously (in >97% of samples) according to the A/P calls given by MAS5.0 (that is probe sets were as- signed a call of present if the majority of the replicate calls were present and otherwise were taken as absent). After this fil- tering, 16262 probe sets remained for 43 mouse samples and 17850 probe sets for 28 human samples. We calculated the Pearson correlation coefficient r for every pair of probes, using their scaled hybridization values. All pairs with |r| P 0.75 were collected: 1073206 pairs in mouse with a false discovery rate (FDR) lower than 0.00001, and 1519661 pairs in human with a FDR lower than 0.004 [20]. This represents about one hundred links per probe set. It is obvious that the evaluation of such a large amount of data re- quires an automated method. To do a global evaluation of the network, we examined the functions of connected genes. We propose that pairs of genes sharing GO annotations should be more common in pairs of positively correlated genes than in pairs of genes taken at ran- dom. This is because correlatively expressed genes will tend to participate in the same pathways, form part of the same pro- (b) tein complex, be present in the same cellular location, etc. We checked this particularly in the mouse network. Firstly, we selected all pairs of related probe sets pointing to the differ- ent genes. We used the probe set annotations with 120 generic GO slims (as previously). Of 890817 pairs, 112655 were be- tween probe sets with at least one common GO slim. In agree- ment with the correlation score computed, the ratio of pairs with the same function to the total of pairs computed increases with the strength of the correlation (see Fig. 3). An additional indication of the robustness of the positive correlations of the network was given by the fact that a randomization of the pairs (by shuffling all second members) resulted in 86185 pairs with common GO slim, which is 76% of the original number. Again, this difference was more pronounced for connections with better correlation values. We checked similarly the smaller number of negative corre- lations. For the mouse network, those were 22087, with 1915 between genes sharing at least one GO term. Now, the randomization of the pairs resulted in an increase on the number of pairs sharing one GO term to 2226. Once again, Fig. 2. Stem cell function profiles. Each symbol corresponds to the two-dimensional projection of a vector whose coordinates represent 400000 0.2 the fraction of genes expressed in the sample for each of 120 functional categories. (a) Mouse samples. (b) Human samples. The colours identify different tissue/cell types. Major ones are: green, embryonic; red, hematopoietic; and brown, muscle. 300000 0.15

2.3.2. Genetic networks. The large number of samples in diff GO StemBase provides an ideal basis for the computation of net- 200000 0.1

# of pairs same GO works of functional relationships between genes. The complete ratio same/dif set of data for all the samples of a given organism can be used to suggest a relation between pairs of transcripts according to 100000 0.05 their related expression. This correlation can be positive if they are expressed in the same set of samples or negative if their expression is mutually exclusive. 0 0 We computed a network of relationships between probe sets 0.75 0.80 0.85 0.90 0.95 for mouse (using most of the samples analyzed with the Score of association MOE430 chip) and human (using the HG-U133 chip). The hybridization values (obtained as described above) were scaled Fig. 3. Network evaluation of pairs using GO terms. Red bars: number of pairs with at least one GO term in common; blue bars: by dividing each value by the trimmed mean (98th percentile) number of pairs with no single GO term in common. Line: ratio of of all the values in the chip. We then filtered the resulting sets pairs. x-axis: score range of the pairs: 0.75 is 0.75 < x < = 0.80, 0.80 is by removing probe sets hybridizing very rarely (in <3% of sam- 0.80 < x < = 0.85, and so on until 1.00. C. Perez-Iratxeta et al. / FEBS Letters 579 (2005) 1795–1801 1799

Fig. 4. Analysis of the Elac1/Elac2. (a) Heat map of the hybridization to probes associated to Elac1 and Elac2 (rows) across mouse samples in StemBase (columns). The numbers in the labels of the columns refer to a sample identifier in StemBase; samples of embryonic stem cells are marked in blue, adult stem cells in orange, and differentiated cells have no mark. (b) Phylogenetic analysis of Elac1/2 and related sequences. Representative protein sequences were retrieved from GenBank [31] using PSIBLAST [32]. Multiple sequence alignments of Elac1 and Elac2 families were produced using ClustalW [33] and manually corrected. The alignments were used to produce phylogenetic trees using njplot [34]. The trees were drawn using the Phylodendron web site (http://iubio.bio.indiana.edu/treeapp/ D.G. Gilbert Indiana University). See below the database identifiers corresponding to the sequence names used in the trees. Numbers after the sequence names in the tree of the repeat indicate the start and the end of the fragment used for the alignment. The Elac1/RnaZ tree indicates that Elac1/RnaZ exists in bacteria, archaea (coloured in red), and in the vertebrate lineage (coloured in blue). The Elac2 tree suggest vertical transmission of Elac2 in the whole eukaryotic kingdom. A region of similarity between Elac1 and Elac2 is duplicated in Elac2. The tree of this repeat in the sequences included in the previous two trees is consistent with the early formation of Elac2 upon eukaryotic divergence from prokaryotes, followed by the internal duplication of the repeat before the divergence of yeast. Sequences and genbank ids are: Elac2_Mouse Mus musculus gij29477227, Elac2_Rat Rattus norvegicus gij26000218, Elac2_Human Homo sapiens gij12804973, Elac2_Fish Tetraodon nigroviridis gij47217945, Elac2_Fly Drosophila melanogaster gij21430922, Elac2_Pfalciparum Plasmodium falciparum 3D7 gij23497694, Elac2_Pyoelii Plasmodium yoelii yoelii gij23486012, Elac2_Yeast Saccharomyces cerevisiae gij486557, Elac2_Plant Arabidopsis thaliana gij26449703, Elac2_Celegans Caenorhabditis elegans gij47270763, Elac2_Cbriggsae Caenorhabditis briggsae gij39587637, RnaZ_Ecoli Escherichia coli gij41017568, Elac1_Human Homo sapiens gij8574425, Elac1_Mouse Mus musculus gij18044182, Elac1_Rat Rattus norvegicus gij34932428, Elac1_Fish Tetraodon nigroviridis gij47217496, RnaZ_Bsubtilis Bacillus subtilis gij1303962, RnaZ_Saureus Staphylococcus aureus subsp. aureus N315 gij13701303, RnaZ_Bhalodurans Bacillus halodurans C-125 gij10174330, RnaZ_Scoelicolor Streptomyces coelicolor A3(2) gij6714761, RnaZ_Mmazei Methanosarcina mazei Goe1 gij20904666, RnaZ_Afulgidus Archaeoglobus fulgidus DSM 4304 gij2649658, RnaZ_Sulfolobus Sulfolobus solfataricus P2 gij13814227, RnaZ_Cpneumoniae Chlamydophila pneumoniae CWL029 gij4376278, RnaZ_Synechocystis Synechocystis sp. PCC 6803 gij16331469. (c) Modular organization of human Elac1 and yeast Elac2 according to the regions of similarity delimited by converging PSIBLAST searches. Reciprocal searches with the fragments in between the coloured domains did not produce matches outside the families depicted in the phylogenetic trees. 1800 C. Perez-Iratxeta et al. / FEBS Letters 579 (2005) 1795–1801 the difference was more pronounced for the connections with 3. Conclusion better scores. The results indicate that genes negatively corre- lated will tend to have different functions. StemBase is a collection of microarray data from stem cells The networks described here are obviously dependant on the that was designed to help ongoing scientific efforts on stem cell set of samples chosen. Their utility, however, is extraordinary characterization. StemBase fills a distinct niche, being more fo- in guiding more focused analyses on particular genes because cused than larger microarray resources (such as the database at they can reveal genetic interactions specific to stem cells, and the Medical University of South Carolina [28] and the Gene the examination of the underlying data will point to the field Expression Omnibus from the National Centre for Biotechnol- of action of the selected pairs of interacting genes, as we will ogy Information [29]), yet more diverse than those with a nar- illustrate in the following section. rower focus (such as the Stem Cell database that keeps data 2.3.3. Characterization of two related genes: Elac1 and about hematopoietic stem cells [30] and the Myeloid Micro- Elac2. Here we illustrate how the database and methods pre- array Database [2]). sented in the previous sections (cell functional pattern and net- We have shown that such a collection of centralized high work analysis) may be used to add new functional data about throughput genomic data becomes useful if it is gathered with genes not yet experimentally characterized. a particular biological goal in mind. Homogeneity in both the One of the most apparent trends in the current database set is platform and the methodology is critical to allow for ease of the difference between embryonic and non-embryonic samples. comparison between samples because it is difficult to control This pattern is well represented in the gene Elac2, which we all the variables in the design of a microarray that affect the found primarily expressed in mouse embryonic stem cells. The measurements. Another key aspect is the process of data cura- three probe sets for this gene are negatively correlated with one tion and the organization of the data in samples (with multiple probe set for the homolog gene Elac1 (values 0.498, 0.575, replicates) and experiments with a rationale binding them. and 0.496) (there is another probe set for Elac1 but no hybrid- Accurate definition of the data is imperative in order to extract ization to it was detected). Accordingly, we found that Elac1 is biological conclusions from its analysis. primarily expressed in adult stem cells and absent to embryonic Our study of the data deposited in StemBase operated at stem cells. Thus Elac1 and Elac2 have near complementary three levels. The functional analysis of the genes expressed in expression patterns (see Fig. 4(a)). This is interesting because particular samples can be used for the general characterization Elac1 and Elac2 have similarities in function and sequence but of the samples. The observed correlation between the functions their possible relation to stem cells had not been discussed yet. of expressed genes and stem cell type indicates that expression Human Elac2 was associated to prostate cancer and noted to measurements reflect genuine properties of the cells analyzed, be similar in sequence to the shorter gene Elac1 [21]. Whereas and that the data can be used to find genes important in stem Elac1 seems to be ubiquitous in the three kingdoms of life (euk- cell function. The production of gene expression data on a arya, bacteria, and archaea), the related Elac2 is found only in large variety of stem cell samples allowed us to deduce a net- Eukarya. Our phylogenetic analysis of Elac2 confirms the work of genetic interactions based on the correlation between distribution of the gene in the whole eukaryotic domain. The probe sets of a chip. We used these networks to recognize gen- distribution of RnaZ/Elac1 is more complex as some of its pro- eral trends in expression patterns, to identify functionally re- tein domains seem to be present in vertebrate sequences but lated genes, and to focus on genes following interesting missing from the remaining eukaryotic organisms (see Fig. trends, like Elac1 and Elac2. Ultimately, thanks to the distri- 4(b)). Sequence analysis also shows that the similarity between bution of the data provided by StemBase, other researchers Elac1 and Elac2 sequences is restricted to three regions, one of will be able to generate similar analyses. which is duplicated in Elac2 (already noted in [21])(see Fig. In the future, we plan to update the database with new and 4(c)). The phylogenetic analysis of the repeated region indicates better methods to interpret the raw microarray data, to include that the duplication occurred after the origin of Elac2 and be- newer database annotations and more complex querying fore the divergence of the yeasts from the eukaryotic lineage. mechanisms, and to complement the data with additional stem This domain rearrangement is likely responsible of the func- cell data and other related publicly available microarray data. tional differences between these sequences and would explain We hope that the model used to develop StemBase, an exam- their different patterns of expression in relation to stem cells. ple of a small specialized microarray database, will be valuable The functional characterization of the Elac1 and Elac2 fam- for developers of databases of experimental data. ilies is currently scarce. Both RnaZ/Elac1 and Elac2 proteins were shown to produce the mature tRNA 30 end: Elac1 in ar- Acknowledgements: Thanks to the members of the Canadian Stem Cell chaea and plants [22] and Elac2 in Drosophila melanogaster Network for supporting the Stem Cell Genomics Project with stem cell [23] and human [24]. Different substrate specificities were pro- samples. StemBase and the SCGP are supported by Genome Canada, the Ontario Research and Development Challenge Fund, the Stem Cell posed for them [25]. The expression of Elac2 in adult mouse, Network, and the Canada Foundation for Innovation. We are grateful rat tissue [26], and human [21] was described as ubiquitous. to the technical staff of the Ontario Genomics Innovation Centre. Our analysis focused on stem cells adds more information showing that Elac1 and Elac2 are expressed in mouse stem cells with complementary patterns of expression. The presence References of both genes in the vertebrate lineage and their sequence sim- ilarity across multiple domains supports their involvement in [1] Prockop, D.J., Gregory, C.A. and Spees, J.L. (2003) One strategy similar yet complementary functions important for stem cells. for cell and gene therapy: harnessing the power of adult stem cells to repair tissues. Proc. Natl. Acad. Sci. USA 100 (Suppl. 1), The relation of Caenhorabditis elegans Elac2 to germline pro- 11917–11923. liferation [27] and the involvement of human Elac2 in prostate [2] Ma X, H.T., Peng, H., Lin, S., Mironenko, O., Maun, N., cancer [21] support this association. Johnson, S., Tuck, D., Berliner, N., Krause, D.S. and Perkins, C. Perez-Iratxeta et al. / FEBS Letters 579 (2005) 1795–1801 1801

A.S. (2002) Development of a murine hematopoietic progenitor [19] Eisen, M.B., et al. (1998) Cluster analysis and display of genome- complementary DNA microarray using a subtracted complemen- wide expression patterns. Proc. Natl. Acad. Sci. USA 95 (25), tary DNA library Blood 100 (3), 833–844. 14863–14868. [3] Sampaolesi, M., et al. (2003) Cell therapy of alpha-sarcoglycan [20] Benjamini Y, H.Y. (1995) Controlling the false discovery rate: a null dystrophic mice through intra-arterial delivery of mesoan- practical and powerful approach to multiple testing. J. R. Statist. gioblasts. Science 301 (5632), 487–492. Soc. B 57, 289–300. [4] Barker, J.N. and Wagner, J.E. (2003) Umbilical-cord blood [21] Tavtigian, S.V., et al. (2001) A candidate prostate cancer suscep- transplantation for the treatment of cancer. Nat. Rev. Cancer 3 tibility gene at 17p. Nat. Genet. 27 (2), 172–180. (7), 526–532. [22] Schiffer, S., Rosch, S. and Marchfelder, A. (2002) Assigning a [5] Lindvall, O., Kokaia, Z. and Martinez-Serrano, A. (2004) Stem function to a conserved group of proteins: the tRNA 30- cell therapy for human neurodegenerative disorders – how to processing . EMBO J. 21 (11), 2769–2777. make it work. Nat. Med. 10 (Suppl.), S42–S50. [23] Dubrovsky, E.B., et al. (2004) Drosophila RNase Z processes [6] Raff, M. (2003) Adult stem cell plasticity: fact or artifact?. Annu. mitochondrial and nuclear pre-tRNA 30 ends in vivo. Nucleic Rev. Cell. Dev. Biol. 19, 1–22. Acids Res. 32 (1), 255–262. [7] Schena, M., et al. (1995) Quantitative monitoring of gene [24] Takaku, H., et al. (2003) A candidate prostate cancer suscepti- expression patterns with a complementary DNA microarray. bility gene encodes tRNA 30 processing endoribonuclease. Nucleic Science 270 (5235), 467–470. Acids Res. 31 (9), 2272–2278. [8] Evertsz, E.M., et al. (2001) Hybridization cross-reactivity within [25] Takaku, H., et al. (2004) The N-terminal half-domain of the long homologous gene families on glass cDNA microarrays. Biotech- form of tRNase Z is required for the RNase 65 activity. Nucleic niques 31 (5), 1182, 1184, 1186 passim. Acids Res. 32 (15), 4429–4438. [9] Yauk, C.L., et al. (2004) Comprehensive comparison of six [26] Dumont, M., et al. (2004) Structure of primate and rodent microarray technologies. Nucleic Acids Res. 32 (15), e124. orthologs of the prostate cancer susceptibility gene ELAC2. [10] Mecham, B.H., et al. (2004) Increased measurement accuracy for Biochim. Biophys. Acta 1679 (3), 230–247. sequence- verified microarray probes. Physiol. Genom. 18 (3), [27] Smith, M.M. and Levitan, D.J. (2004) The Caenorhabditis 308–315. elegans homolog of the putative prostate cancer susceptibility [11] Liu, G., et al. (2003) NetAffx: Affymetrix probesets and annota- gene ELAC2, hoe-1, plays a role in germline proliferation. Dev. tions. Nucleic Acids Res. 31 (1), 82–86. Biol. 266 (1), 151–160. [12] Harris, M.A., et al. (2004) The gene ontology (GO) database and [28] Argraves, G.L., Barth, J.L. and Argraves, W.S. (2003) The informatics resource. Nucleic Acids Res. 32 (Database issue), MUSC DNA microarray database. Bioinformatics 19 (18), 2473– D258–D261. 2474. [13] Perez-Iratxeta, C. and Andrade, M.A. (2005) Annotation of [29] Wheeler, D.L., et al. (2004) Database resources of the National microarray probes with gene names changes much. Submitted. Center for Biotechnology Information: update. Nucleic Acids [14] Muro, E.M., Perez-Iratxeta, C. and Andrade, M.A. (2005) Res. 32 (1), D35–D40. Extending the set of Gene Ontology terms associated to Affyme- [30] Phillips, R.L., et al. (2000) The genetic program of hematopoietic trix probes. Submitted. stem cells. Science 288 (5471), 1635–1640. [15] Suomela, B.P. and Andrade, M.A. (2005) Ranking the whole [31] Wheeler, D.L., et al. (2005) Database resources of the national medline database according to a large training set using text center for biotechnology information. Nucleic Acids Res. 33 indexing. BMC Bioinformatics. In press. (Database Issue), D39–D45. [16] Irizarry, R.A., et al. (2003) Summaries of Affymetrix GeneChip [32] Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a probe level data. Nucleic Acids Res. 31 (4), e15. new generation of protein database search programs. Nucleic [17] Wu, Z. and Irizarry, R.A. (2004) Stochastic models inspired by Acids Res. 25 (17), 3389–3402. hybridization theory for short oligonucleotide arrays. RECOMB, [33] Chenna, R., et al. (2003) Multiple sequence alignment with the 98–106. clustal series of programs. Nucleic Acids Res. 31 (13), 3497–3500. [18] Porter, C.J., et al. (2005) Comparison and optimization of two [34] Perriere, G. and Gouy, M. (1996) WWW-query: an on-line measures of gene expression: serial analysis of gene expression retrieval system for biological sequence banks. Biochimie 78 (5), and Affymetrix cDNA microarrays. Submitted. 364–369.