Affymetrix® Data Mining Tool
User’s Guide
Version 3.0
For Research Use Only. Not for use in diagnostic procedures. Affymetrix Confidential
700233 Rev. 3

Trademarks
Affymetrix®, GeneChip®, EASI™, HuSNP™, GenFlex™, Jaguar™, MicroDB™, 417™, 418™, 427™, 428™, Pin-and-Ring™, Flying Objective™, NetAffx™ and CustomExpress™ are trademarks owned or used by Affymetrix, Inc. Microsoft® is a registered trademark of Microsoft Corporation. Oracle® is a registered trademark of Oracle Corporation.

Limited License
PROBE ARRAYS, INSTRUMENTS, SOFTWARE AND REAGENTS ARE LICENSED FOR RESEARCH USE ONLY AND NOT FOR USE IN DIAGNOSTIC PROCEDURES. NO RIGHT TO MAKE, HAVE MADE, OFFER TO SELL, SELL, OR IMPORT OLIGONUCLEOTIDE PROBE ARRAYS OR ANY OTHER PRODUCT IN WHICH AFFYMETRIX HAS PATENT RIGHTS IS CONVEYED BY THE SALE OF PROBE ARRAYS, INSTRUMENTS, SOFTWARE, OR REAGENTS HEREUNDER. THIS LIMITED LICENSE PERMITS ONLY THE USE OF THE PARTICULAR PRODUCT(S) THAT THE USER HAS PURCHASED FROM AFFYMETRIX.

Patents
Software products may be covered by one or more of the following patents: U.S. Patent Nos. 5,733,729; 5,795,716; 5,974,164; 6,066,454; 6,090,555; 6,185,561 and 6,188,783; and other U.S. or foreign patents.

Copyright ©1999, 2001 Affymetrix, Inc. All rights reserved.

Contents
CHAPTER 1 Welcome
  Data Mining Tool User’s Guide
    What’s New in DMT 3.0
    Conventions Used
  On-line Documentation
  Technical Support
  Your Feedback is Welcome
CHAPTER 2 Installing Data Mining Tool 3.0
  Before You Begin
    Microsoft® SQL Server LIMS Users
    Oracle® LIMS Users
    MicroDB™ Users
  Installing Data Mining Tool
  Creating an Oracle® Alias
    Oracle 8.1.7 Alias Configuration
CHAPTER 3 Affymetrix® Data Mining Tool Overview
  Access Data
    Affymetrix Publish Database
    Affymetrix® Analysis Data Model
  DMT Windowpanes
  Query Data
    Building and Running a Query
  Viewing Query Results
    Tables
    Graphs
  Analyze Query Results
    Statistical Analyses
    Cluster Analysis
    Matrix Analysis
CHAPTER 4 Getting Started
  Starting DMT
  Managing Database Connections
    Registering a Database
    Unregistering a Database
    Selecting a Database
  Specifying the Default Directory
CHAPTER 5 Building and Running a Query
  Building a Query
    Starting a New Query
    Specifying the Filters
    Query Builder
    Selecting Analyses for the Query
    Specifying Analysis Filters
  Running a Query
  Normalizing GeneChip® Signal Data
    Choosing Normalization Before a Query or Pivot
    Choosing Normalization After a Query or Pivot
    Normalization Options
CHAPTER 6 Managing Queries
  Saving a Query
    Using the Save As Command
  Opening a Previously Saved Query
  Deleting a Query
CHAPTER 7 Query Results Tables
  Experiment Information Table
    GeneChip® Data Mode
    Spot Data Mode
  Query Table
  Pivot Data Table
    Selecting Results for the Pivot Table
    Running the Pivot Operation
    Including Probe Descriptions in the Pivot Table
    Including Annotations in the Pivot Table
    Sorting Pivot Table Columns
    Pivot Options
  Working with Tables
    Finding Probes
    Viewing Descriptions & Obtaining Further Gene Information
    Annotating Probes
    Adding Probes to the Filter Grid
    Copying Tables
    Exporting Data
    Expanding the Results Pane
    Clearing the Results Pane
CHAPTER 8 Annotations
  Annotating Probes
    Loading Annotations
    Querying Annotations
    Adding Probes to the Filter Grid
    Deleting Annotations
CHAPTER 9 Probe Lists
  Creating Probe Lists
    Creating a Probe List from the Query or Pivot Table
    Creating a Probe List from Cluster Analysis
    Creating a Probe List from Search Array Descriptions
    Creating a Probe List from Filter
    Creating a Probe List by Combining Existing Lists
  Loading a Probe List
    Specifying Probe List Members
    Specifying an Input File
  Using Probe Lists
    Adding a Probe List to the Filter Grid
    Displaying Selected Probe List Members
  Managing Probe Lists
    Viewing and Editing Probe List Members
    Combining Probe Lists
    Exporting a Probe List
    Deleting a Probe List
CHAPTER 10 Array Sets
  Creating an Array Set
  Working with Array Sets
    Viewing Array Sets
  Managing Array Sets
    Editing an Array Set
    Deleting an Array Set
CHAPTER 11 Graphing Results
  Scatter Graph
    Plotting the Scatter Graph
    Working with the Scatter Graph
    Scatter Graph Options
  Fold Change Graph
    Plotting the Fold Change Graph
    Working with the Fold Change Graph
    Fold Change Graph Options
  Series Graph
    Plotting the Series Graph
    Working with the Series Graph
    Series Graph Options
  Histogram
    Plotting the Histogram
    Working with the Histogram
    Histogram Options
  Other Graphing Features
    Enlarging the Graph Pane
    Changing Graph Colors
    Copying and Clearing Graphs
    Printing Graphs
CHAPTER 12 Statistical Analyses
  Selecting an Operator
  Average, Median, Standard Deviation or Inter-Quartile Range
  Fold Change
  T-Test
  Mann-Whitney Test
  Count & Percentage
CHAPTER 13 Matrix Analysis
  Overview
    Population Size
  Running a Matrix Analysis
CHAPTER 14 Cluster Analysis
  Self Organizing Map (SOM) Algorithm
    Running a SOM Cluster Analysis
    Saving a Probe List
    SOM Filters
    SOM Parameters
  Correlation Coefficient Clustering Algorithm
    Running the Correlation Coefficient Cluster
    Correlation Coefficient Clustering Options
    Effect of Changing Algorithm Parameters
    Saving and Importing Seed Patterns
  Saving a Probe List
CHAPTER 15 DMT Tutorial
  Introduction
    Step 1: Restoring the MicroDB™ Database
    Step 2: Starting DMT
    Step 3: Registering the Database
    Step 4: Selecting the Tutorial Database
    Step 5: Opening the DMT Session
  Lesson 1: Identifying Highly Expressed Genes
    Step 1: Specifying a Filter
    Step 2: Selecting Analyses for the Query
    Step 3: Pivoting on Signal & Detection Call
    Step 4: Querying and Pivoting the Data
    Step 5: Sorting the Pivot Table by Signal
    Step 6: Saving a Probe List
    Step 7: Plotting the Series Line Graph
    Lesson 1 Summary
    Suggested Exercise
  Lesson 2: Calculating Averages of Replicates
    Step 1: Specifying a Probe List for the Filter
    Step 2: Selecting Analyses for the Query
    Step 3: Pivoting on Signal
    Step 4: Query and Pivot the Data
    Step 5: Selecting Average & Standard Deviation Operators
    Step 6: Sorting the Pivot Table
    Step 7: Displaying Probe Set Descriptions
    Lesson 2 Summary
    Suggested Exercise
  Lesson 3: Summarizing Qualitative Data
    Step 1: Pivoting on Detection Call
    Step 2: Performing Count & Percentage Analysis
    Step 3: Sorting Pivot Table Results
    Step 4: Saving a Probe List
    Step 5: Annotating Probe List Members
    Lesson 3 Summary
    Suggested Exercise
  Lesson 4: Evaluating Difference Between Two Tissues
    Step 1: Pivoting on Signal
    Step 2: Mann-Whitney Test
    Step 3: Annotating Probe Sets
    Step 4: Saving a Probe List
    Lesson 4 Summary
    Suggested Exercise
  Lesson 5: Evaluating Change Call Consistency
    Step 1: Clearing the Filter Grid & Selecting Comparison Analyses
    Step 2: Pivoting on Difference Call
    Step 3: Comparison Ranking
    Step 4: Annotating Probe Sets
    Step 5: Saving a Probe List
    Lesson 5 Summary
    Suggested Exercise
  Lesson 6: Self Organizing Map (SOM) Cluster Analysis
    Step 1: Clearing the Filter Grid & Selecting Analyses
    Step 2: Pivoting on Signal
    Step 3: Computing Average Signal
    Step 4: SOM Cluster Analysis
    Step 5: Saving & Annotating a Probe List
    Lesson 6 Summary
APPENDIX A Filter Grid
  GeneChip Data Mode
    Statistical Expression Algorithm
    Empirical Expression Algorithm
  Spot Data Mode
APPENDIX B Working with Windows & Tables
  Query Windowpanes
    Expanding a Windowpane
    Resizing a Windowpane
    Clearing the Results or Graph Pane
  Tables
    Selecting the Entire Table
    Selecting Rows
    Resizing Columns
    Hiding Columns
    Reordering Columns
APPENDIX C Query Table Data
  GeneChip® Data Mode
    Statistical Expression Algorithm Metrics
    Empirical Expression Algorithm Metrics
  Spot Data Mode
APPENDIX D DMT Algorithms
  The SOM Algorithm
    Neighborhood
    Learning Rate
  The Correlation Coefficient Clustering Algorithm
  The Matrix Algorithm
APPENDIX E Toolbars & Shortcuts
  DMT Main Toolbar
  Session Toolbar
  Shortcut Descriptions

Chapter 1 Welcome
Welcome to the Affymetrix® Data Mining Tool (DMT) User’s Guide. The DMT filters, queries and analyzes publish databases of GeneChip® or spotted array expression data.
Data Mining Tool User’s Guide
This manual explains how to use DMT to: