The Eurosterone DNA Array
Total Page:16
File Type:pdf, Size:1020Kb
Background documents for presentation at: IBRO NEUROSCIENCE SCHOOL: Elba Workshop and Summer School May 4 - 10 Marina di Campo, Elba , Italy Neurobiology of stress in health and disease; Org: E. Ronald de Kloet (Leiden University) ELBA SUMMER SCHOOL The Eurosterone DNA Array Richard Lathe, University of Edinburgh Background documents 1. Mirnics, K. (2001) Microarrays in brain research; the good, the bad and the ugly. Nature Reiews Neuroscience 2, 445-447 2. Lathe, R. (1999) Mathematical analysis of ligation reactions and hybridisation parameters. Hons. Neuroscience and Pharmacology, University of Edinburgh. 3. Rose, K., Mason, J.O. and Lathe, R. (2002) Hybridization parameters revisited: solutions containing SDS. BioTechniques, in press. PERSPECTIVES the hybridization is repeated (each time start- OPINION ing with a new isolation), the more reliable the gene expression measurement becomes20,21. In our system, a single microarray measure- Microarrays in brain research: ment is likely to represent a biological differ- ence between samples at the 99% confidence the good, the bad and the ugly level if it shows a >1.9-fold difference between the fluorescent intensities. However, 1.3–1.4-fold changes in gene expression Károly Mirnics might be reliably detected across repeated experiments that involve the same samples3,20. Making sense of microarray data is a microarray assay due to isolation, labelling, There are arguments against reporting the complex process, in which the interpretation hybridization printing, scanning and image- ‘most changed’ 1% or 5% of genes. Such a of findings will depend on the overall analysis variation. Identification of the exact measurement neither accounts for the assay experimental design and judgement of the source of noise in each category, unless exces- noise, nor does it take into account that the investigator performing the analysis. As a sive, is not particularly critical. However, it is biological variability between the samples result, differences in tissue harvesting, essential to identify probes corresponding to depends on the nature of the comparison. microarray types, sample labelling and data individual genes that tend to be ‘noisy’ or For example, a comparison between the same analysis procedures make post hoc sharing show preferential labelling with one of the cortical region of two adult mice might reveal of microarray data a great challenge. To fluorochromes. only a few true expression differences ensure rapid and meaningful data (<0.5%), whereas a brain–liver microarray exchange, we need to create some order Experiment replication. In repeated experi- comparison is likely to report >20% of differ- out of the existing chaos. In these ground- ments, always reverse the fluorescent labels entially expressed genes. In both of these breaking microarray standardization and between the experimental and control experiments, it would be a mistake to use a data sharing efforts, NIH agencies should sample9. This will help obviate labelling bias. measurement based on the ordering of a take a leading role Spot replicates on the array increase the relia- fixed percentage of the most changed genes. bility of the fluorescent intensity measurement For a graphical representation of a different Microarray data obtained in neuroscience within an experiment, but they do not elimi- example, see FIG. 1. experiments is very different from that nate the need of replicated experiments21,as obtained in cancer expression profiling owing they do not increase measurement reliability Assessment of complex expression differences. to its extensive overlap with experimental between the experiments. Is it necessary to This is the most complex part of data-min- noise (FIG. 1)1–3. At the tissue level, expression always repeat the same sample hybridization ing. Many excellent expression data analysis changes often do not exceed a two-fold differ- several times? If interested in changes that are methods have evolved or have been adopted ence between the experimental and control robust, this may not be necessary4,10. If assess- for use in microarray analysis, including samples4–8, resulting in a substantial overlap ment of de novo induction of gene expression hierarchical22 and K-mean clustering, expres- between real expression differences and assay with a drug treatment is required, resources sion-tree harvesting23, gene shaving24, self- noise. Because of this noise and the cellular may be better applied by comparing ten treat- organizing maps25,26, gene group analysis4, heterogeneity of brain tissue, few successful ed mice to ten matched controls than compar- factor analysis, support vector machines27 microarray expression profiling studies have ing three mouse pairs and repeating each and principal-component analysis (PCA)28. been made so far. However, despite the tech- hybridization four times. The critical issue is However, the biological question that is nical difficulties, careful experimental design, how the data will be analysed. While it is likely being asked and prior knowledge about the combined with extensive data verification and that few observations will report false-positive biological changes should serve as guides in conservative interpretation of data, have or -negative expression differences across the choosing the data-analysis method. Under- yielded a number of groundbreaking brain ten individual comparisons, it is almost cer- standing the strengths and weaknesses of microarray expression studies4,8–19. tain that a consistently changed gene expres- each method is essential — inappropriate sion pattern across the majority of the com- data analysis methods will generate ‘ordered Data-analysis considerations parisons will be due to biological differences, noise’.For example, hierarchical clustering Noise of the assay. It is essential to ¶assess and not experimental artefact. A relatively will find clusters even in randomly generated assay noise and to establish reliable cutoffs simple calculation of probability, taking into data points or in data where the samples are using sound statistical methods. This account noise cutoffs, number of expressed unrelated. Furthermore, it might cluster approach has a proven track record in science; genes, number of experiments and frequency together transcripts that are expressed in dif- there is no reason to treat microarray data dif- of changed observations will reliably separate ferent cell types. Similarly, classifying the ferently. When using complementary DNA expression differences due to a biological genes by response into 20 preset groups (cDNA) arrays, the hybridization of two phenomenon from assay noise. using K-mean clustering is questionable, aliquots of the same sample onto one unless the investigator has prior reason to microarray allows the magnitude and distrib- Reliability cutoffs. Once the noise estimate of expect a preset number of clusters. PCA ution of assay noise to be determined. Using a the system has been established, one can cal- assessments are relatively easy to perform, new sample isolation (and new reverse tran- culate reliability cutoffs for the individual data they explain the mathematical variability in scription), this hybridization should be point observations. These cutoffs will be dif- the data set very well, but relating this vari- repeated as many times as possible3,20. These ferent for single and repeated experimental ability to biological phenomena represents a controls will reveal the combined noise of the comparisons. The greater the number of times complex challenge. 444 | JUNE 2001 | VOLUME 2 www.nature.com/reviews/neuro © 2001 Macmillan Magazines Ltd PERSPECTIVES Data verification. If the project is focused on ab–2.0 –1.6 1.6 2.0 a small group of changes, microarray gene expression data should be verified, preferably –2.0 –1.6 1.6 2.0 Two aliquots of using a method that will be informative about same PFC the anatomical localization of the transcript sample (for example, in situ hybridization)29. This 0.5% = 50 genes eliminates potential harvesting artefacts and discerns a uniform transcript decrease from >1.6 = 0 genes a selective messenger RNA (mRNA) loss in a Observation frequency subpopulation of the cells. For verification of complex gene expression patterns, I recommend a three-step proce- dure29. First, in the initial microarray data- PFC control vs mining, a gene-expression pattern is defined. Observation frequency schizophrenia This is followed by the generation of a second microarray data set using a new group of sub- 0.5% = 50 genes jects with the same condition. So, the gene >1.6 = 50 genes expression pattern discovered in the initial Observation frequency data set is verified on the second data set. Finally, selected single-gene observations, criti- 1.0 cal for the interpretation of data, are verified Cy3/Cy5 ratio by an independent method (for example, PFC control vs in situ hybridization). kidney Data-sharing considerations Control < Control = Control > 0.5% = 50 genes As we are breaking new ground in sharing sample sample sample complex data sets, we have to make sure that >1.6 = 200 genes the guidelines of sharing are based on sound Observation frequency scientific and moral foundations. Sharing in- adequately controlled and unreliable data will Assay noise Cy3/Cy5 ratio generate poor results and false conclusions. Figure 1 | Distribution of microarray data in a typical dual-fluorescence cDNA array experiment. Mining, sharing and comparing microarray a | X axis represents expression difference (Cy3/Cy5