Appendix a Details on Selected Resources
Total Page:16
File Type:pdf, Size:1020Kb
Appendix A Details on selected resources A.1 Data sets Here we describe three data sets that are used in many different examples throughout the book. A.1.1 ALL The ALL data were reported by Chiaretti et al. (2004) The data were normalized using quantile normalization, and expression estimates were computed using RMA (Irizarry et al., 2003b). We consider the comparison of the 37 samples from patients with the BCR/ABL fusion gene resulting from a chromosomal translocation (9;22) with the 42 samples from the NEG group. They are available in the R package ALL. A.1.2 Renal cell cancer The kidpack package contains gene expression data from 74 renal cell carci- noma (RCC) patient biopsy samples (Sultmann¨ et al., 2005). The samples were hybridized to two-color cDNA arrrays with PCR products of 4224 clones spotted in duplicate. These clones had been selected by their pre- sumed relevance to cancer and on the basis of a previous genome-wide array study of kidney cancer. A pool of various RCC samples was used as a reference sample, which was always labeled with the red dye. The data were normalized with the vsn package, and we base our analysis on the generalized log-ratio values quantifying the difference between the tumor samples and the reference sample. The exprSet object esetSpot contains the normalized expression values and patient data. The RCC samples belong to three different histological types, clear cell (ccRCC), papillary (pRCC), and chromophobe (chRCC). A.1.3 Estrogen receptor stimulation The data are from eight samples from an estrogen receptor positive breast cancer cell line. After serum starvation, four samples were exposed to es- trogen and then harvested for analysis with Affymetrix human genome U-95Av2 genechips after 10 hours for two samples and 48 hours for the other two. The remaining four samples were left untreated and harvested 444 AppendixA. Details onSelectedResources at corresponding times. The data are explained in more detail by Scholtens et al. (2004) and are available in the estrogen package. A.2 URLs for projects mentioned Below, we list URLs for different projects and computational resources that were discussed in the text. AmiGO http://www.godatabase.org/ BioCarta www.biocarta.com Bioconductor www.bioconductor.org BioPAX www.biopax.org BOOST www.boost.org cMAP http://cmap.nci.nih.gov CRAN http://cran.r-project.org Entrez http://www.ncbi.nlm.nih.gov/entrez GEO www.ncbi.nlm.nih.gov/geo GO www.geneontology.org The evidence codes: www.geneontology.org/GO.evidence.html Graphviz www.graphviz.org KEGG KEGG: www.kegg.org The dbget archive: www.genome.jp/dbget/dbget.links.html The SOAP API: http://www.genome.ad.jp/kegg/soap libcurl http://curl.haxx.se NAR Nucleic Acids Research on-line category listing http://www3.oup. co.uk/nar/database/c Omegahat http://www.omegahat.org OMIM http://www.ncbi.nlm.nih.gov/Omim PubMed www.ncbi.nlm.nih.gov/entrez R http://www.r-project.org Reactome www.reactome.org Resourcerer http://pga.tigr.org/tigr-scripts/magic/r1.pl W3C http://w3c.org References Affymetrix. Affymetrix Microarray Suite Users Guide. Affymetrix, Santa Clara, CA, version 4.0 edition, 1999a. 19, 20 Affymetrix. Affymetrix Microarray Suite Users Guide. Affymetrix, Santa Clara, CA, version 5.0 edition, 2001b. 14, 19, 20, 26, 37 Affymetrix. Statistical algorithms reference guide. Technical report, Affymetrix, Santa Clara, CA, 2001. 19 Affymetrix. Statistical algorithms description document. Technical report, Affymetrix, Santa Clara, CA, 2002. 19 Affymetrix. GeneChipr Expression Analysis Technical Manual. Affymetrix, Santa Clara, CA, rev 4.0 edition, 2003c. 14 D. Amaratunga and J. Cabrera. Analysis of data from viral DNA microchips. Journal of the American Statistical Association, 96:1161– 1170, 2001. 22 C. Ambroise and G. McLachlan. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl. Acad. Sci. of the U.S.A., 99:6562–6566, 2002. 423 M. Astrand. Contrast normalization of oligonucleotide arrays. Journal of Computational Biology, 10:95–102, 2003. 24 K. A. Baggerly, J. S. Morris, and K. R. Coombes. Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics, 20:777–85, 2004. 93, 100 R. Balasubramanian, T. LaFramboise, D. Scholtens, et al. A graph theoretic approach to testing associations between disparate sources of functional genomics data. Bioinformatics, 20:3353–3362, 2004. 346, 369, 370 P. Baldi and A. Long. A Bayesian framework for the analysis of mi- croarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17:509–519, June 2001. 232 446 References D. M. Bates and D. G. Watts. Nonlinear regression analysis and its applications. Wiley, New York, 1988. 277 G. D. Battista, P. Eades, R. Tamassia, et al. Graph Drawing: Algo- rithms for the visualization of graphs. Prentice Hall, Upper Saddle River, 1999. 360 Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57:289–300, 1995. 234, 238, 254, 266, 416, 439 Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple hypothesis testing under dependency. Annals of Statistics, 29:1165–88, 2001. 266 C. Berge. Graphs and Hypergraphs. North-Holland, Amsterdam, 1973. 329, 337, 342 J. Bertin. Semiologie Graphique. Walter de Gruyter, Inc., Berlin, 2 edition, 1973. 162 M. Bittner, P. Meltzer, Y. Chen, et al. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 406:536–540, 2000. 217 B. M. Bolstand, R. A. Irizarry, M. Astrand, et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19:185–193, 2003. 20, 21, 23, 64 I. Borg and P. Groenen. Modern Multidimensional Scaling. Springer- Verlag, New York, 1997. 173 S. P. Borgatti, M. G. Everett, and P. R. Shirey. LS sets, lambda sets and other cohesive subsets. Social Networks, 12:337–357, 1990. 345 M. Boutros, A. A. Kiger, S. Armknecht, et al. Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science, 303: 832–835, 2004. 72, 74 J. M. Bower and H. Bolouri, editors. Computational Modeling of Genetic and Biochemical Networks. MIT Press, Cambridge, 2001. 125 A. Brazma, P. Hingamp, J. Quackenbush, et al. Minimum informa- tion about a microarray experiment (MIAME): toward standards for microarray data. Nature Genetics, 29:365–371, 2001. 8 L. Breiman. Bagging predictors. Machine Learning, 24:123–140, 1996a. 295 References 447 L. Breiman. Out-of-bag estimation. Technical report, Statistics Department, University of California Berkeley, 1996b. URL ftp: //ftp.stat.berkeley.edu/pub/users/breiman/. 299 L. Breiman. Random forests. Machine Learning, 45:5–32, 2001. 279, 296 L. Breiman, J. H. Friedman, R. A. Olshen, et al. Classification and Regression Trees. Wadsworth, 1984. 279, 294 C. A. Brewer. Color use guidelines for mapping and visualization. In A. MacEachren and D. Taylor, editors, Visualization in Modern Cartography. Elsevier Science, Tarrytown, NY, 1994a. 162 C. A. Brewer. Guidelines for use of the perceptual dimensions of color for mapping and visualization. In J. Bares, editor, Color Hard Copy and Graphic Arts III, volume 2171, pages 54–63. Proceedings of the International Society for Optical Engineering (SPIE), Bellingham, 1994b. 162 P. Buhlmann¨ and B. Yu. Analyzing bagging. Annals of Statistics, 30:927–961, 2002. 295 C. Burges. A tutorial on support vector machines for pattern recog- nition. Knowledge Discovery and Data Mining, 2:121–167, 1998. 425 A. Butte and I. Kohane. Mutual information relevance networks: Functional genomic clustering using pairwise entropy measurements. Pacific Symposium on Biocomputing, 5:415–426, 2000. 195 E. Camon, M. Magrane, D. Barrell, et al. The Gene Ontology an- notation (GOA) database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res, 32:D262–D266, 2004. 123 A. E. Carpenter and D. M. Sabatini. Systematic genome-wide screens of gene function. Nature Reviews in Genetics, 5:11–22, 2004. 71 D. B. Carr, R. Littlefield, W. Nicholson, et al. Scatterplot ma- trix techniques for large N. Journal of the American Statistical Association, 83:424–436, 1987. 163 J. M. Chambers. Programming with Data: A Guide to the S Language. Springer-Verlag, New York, 1998. 150 D. Chessel, A. B. Dufour, and J. Thioulouse. The ade4 package — I: One-table methods. RNews, 4(1):5–10, 2004. URL http: //CRAN.R-project.org/doc/Rnews. 360 S. Chiaretti, X. Li, R. Gentleman, et al. Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of 448 References patients with different response to therapy and survival. Blood, 103: 2771–2778, 2004. 201, 232, 249, 252, 262, 300, 374, 443 R. Cho, M. Campbell, E. Winzeler, et al. A genome-wide transcrip- tional analysis of the mitotic cell cycle. Molecular Cell, 2:65–73, 1998. 370 D. Cilloni, A. Guerrasio, E. Giugliano, et al. From genes to therapy: the case of Philadelphia chromosome-positive leukemias. Annal of the New York Academy of Sciences, 963:306–312, 2002. 201 Ciphergen. ProteinChip System Users Guide. Ciphergen Biosystems, 2000. 92 W. S. Cleveland. Visualizing Data. Hobart Press, Summit, New Jersey, 1993. 162 W. S. Cleveland. The Elements of Graphing Data (Revised). Hobart Press, Summit, New Jersey, 1994. 162 W. Conover. Practical Nonparametric Statistics. John Wiley& Sons, New York, 1971. 193 R. D. Cook and S. Weisberg. Residuals and Influence in Regression. Chapman & Hall, New York, 1982. 196 L. Cope, R. Irizarry, H. Jaffee, et al. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics, 20:323–331, 2004. 31, 32 T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. McGraw-Hill, New York, 1990. 348, 357 A.