BASIC SCIENCE SEMINARS IN NEUROLOGY

SECTION EDITOR: HASSAN M. FATHALLAH-SHAYKH, MD

Application of Microarrays to Neurological Disease

Lisa-Marie Sturla, PhD; Ana Fernandez-Teijeiro, MD, PhD; Scott L. Pomeroy, MD, PhD

Modern microarray-based functional genomics holds great promise for revealing novel molecular and cellular mechanisms of disease. First introduced commercially in 1996, microarrays have been used widely to monitor the expression of thousands of genes in biological samples, as described in the following paragraphs. Other microarray-based genomic applications are also in development, including comparative genomic hybridization, on-chip sequencing, and novel drug discovery. For example, DNA array-based comparative genomic hybridization identifies chromosomal gains and losses with greatly improved resolution compared with conventional methods that use metaphase chromosomes as hybridization targets.1 Resolution will continue to improve as the technology advances. Moreover, microarrays provide a better platform for automation than is possible with standard metaphase techniques. Where genetic mutations and aberrations are already well characterized, microarrays can be customized for effective use as diagnostic and prognostic tools.2,3 In the field of drug discovery, microarrays have the potential to dramatically enhance progress, being used at all stages from target discovery (through validation of new molecular targets and understanding modes of action) to predicting patient response.4

From the Division of Neuroscience, Department of Neurology, Children’s Hospital, Harvard Medical School, Boston, Mass (Drs Sturla, Fernandez-Teijeiro, and Pomeroy); and the Unidad de Oncologia Pediatrica, Hospital de Cruces-Baracaldo, Basque Country, Spain (Dr Fernandez-Teijeiro).

(REPRINTED) ARCH NEUROL / VOL 60, MAY 2003 WWW.ARCHNEUROL.COM 676

©2003 American Medical Association. All rights reserved. Downloaded From: https://jamanetwork.com/ on 09/29/2021

These devices are beginning to revolutionize how scientists explore the operation of normal cells in the body and the molecular aberrations that underlie medical disorders. DNA microarrays, which are based on well-established principles of nucleic acid hybridization, simultaneously interrogate thousands of genes.5-7 The actual mechanics of data capture from raw material are ever-improving and well documented, and it is the analysis and discovery of meaningful expression patterns within these data to which we now must turn our attention.

Analytical approaches to gene expression analysis using a cancer classification model are illustrated in the recent article by Pomeroy et al.8 Several important clinical questions were answered via the application of microarray technology and emerging data analysis techniques to pediatric brain tumors.8 Using microarrays that monitor the expression of more than 6800 genes, we endeavored to definitively differentiate a group of embryonal tumors whose diagnosis on the basis of morphologic features remains controversial and to predict outcome in the most common of these tumors, medulloblastoma, for which patient response to treatment is unpredictable.

There are 2 general approaches to data analysis: supervised and unsupervised. Unsupervised methods are applied to the entire gene expression data set without any previous knowledge of sample classification, allowing an impartial assessment of the underlying features within a data set. Two examples of unsupervised methods are principal component analysis and self-organizing maps (SOMs). Principal component analysis allowed us to differentiate at a molecular level between the different brain tumor types and normal cerebellum (Figure 1). The marker genes responsible for this distinction supported the conclusion that medulloblastomas are derived from cerebellar granule cell precursors and that they are molecularly distinct from supratentorial primitive neuroectodermal tumors. This argues against the hypothesis that medulloblastomas are a subset of primitive neuroectodermal tumors, differing only in their location in the cerebellum. Self-organizing maps are ideally suited for exploratory data analysis in the generally large and complex data sets generated in the study of a particular disease, in our case brain tumors. Using SOMs, we identified 2 distinct biological subtypes of medulloblastomas with low and high ribosomal protein expression (Figure 2). Electron microscopy subsequently confirmed that these differences in ribosomal gene expression were reflected at a cellular level by differences in ribosome biogenesis. Although this was not an expected result, it provided us with an interesting therapeutic target. Sirolimus and its analogues are currently under clinical investigation in tumors reliant on the PI3K signaling pathway and ribosome biogenesis.9

This approach, although useful in its ability to pull out prominent structure (eg, medulloblastoma vs primitive neuroectodermal tumors) in a data set, may miss more subtle distinctions. We found this to be true for outcome prediction. Neither principal component analysis nor SOMs identified prognostically significant subgroups of medulloblastomas, so we turned to supervised analysis. Expression profiles were obtained from 60 children with medulloblastomas who received similar treatment and whose outcome was known. Supervised methods were used to “learn” the distinction between survivors and patients who failed treatment (Figure 3). Using take-one-out cross-validation, gene expression patterns predicted survival with substantially more accuracy than current clinical risk criteria. Several supervised analysis methods showed a similar degree of accuracy, including k-nearest neighbor, support vector machines, and structural pattern localization analysis by sequential histograms.

Supervised methods were also used to successfully classify classic and desmoplastic medulloblastomas (histologically confirmed by a single neuropathologist). These algorithms allowed us not only to classify tumors and predict outcome but also to discover previously unknown relationships between coordinate gene expression and tumor characteristics. For example, we demonstrated that the genes encoding sonic hedgehog (shh)–related proteins are highly expressed in desmoplastic medulloblastomas, suggesting that they arise as a consequence of dysregulated shh signaling. Thus, microarray analysis can identify gene expression profiles that signify an activated regulatory pathway or interacting molecular processes leading to a known cellular response.

There are, of course, limitations to any approach that involves the generation of such a large amount of data for each of a relatively small group of samples. One of the most significant risks is finding statistically significant associations by chance. Consequently, identification of gene expression patterns that may underlie the pathogenesis of brain tumors requires validation. Validation of the expression of single genes can be done using well-established techniques such as Northern or Western blotting, as well as immunohistochemistry or in situ hybridization. Hypotheses that arise from the interpretation of significant patterns of gene expression can be tested in a variety of ways. For example, we used electron microscopy to demonstrate that tumors with increased coordinate expression of ribosomal genes have high numbers of free ribosomes. Our gene expression–based outcome predictions must be validated in an independent, prospective cohort of

[Figure 1 plot: panels A and B; axes Comp1, Comp2, and Comp3; legend: Medulloblastoma, Malignant Glioma, Rhabdoid Tumor (Extrarenal), Atypical Rhabdoid Tumor (CNS), Normal Cerebellum, PNETs]

Figure 1. Principal component analysis, with axes representing the 3 principal components (Comp) (linear combinations of genes) accounting for most of the data variance, using all genes exhibiting variation across the data set (A) and using the top 10 genes most highly associated with each tumor class (B). CNS indicates central nervous system; PNETs, primitive neuroectodermal tumors.
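As an illustration of the kind of projection summarized in Figure 1, the following minimal Python sketch centers a samples-by-genes matrix and extracts its top 3 principal components together with the fraction of variance each one explains. It is not the calculation used for the study: the function names, the synthetic 12-sample data set, and the use of power iteration with deflation are assumptions made only for this example.

```python
import random

def center(rows):
    """Subtract each gene's mean so that every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Gene-by-gene sample covariance of already centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(mat, rng, iters=500):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    p = len(mat)
    vec = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * mat[i][j] * vec[j] for i in range(p) for j in range(p))
    return value, vec

def principal_components(rows, n_components=3, seed=0):
    """Return per-sample scores and the variance fraction of each component."""
    rng = random.Random(seed)
    centered = center(rows)
    cov = covariance(centered)
    p = len(cov)
    total_variance = sum(cov[i][i] for i in range(p))
    components, fractions = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov, rng)
        components.append(vec)
        fractions.append(value / total_variance)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(a * b for a, b in zip(row, vec)) for vec in components]
              for row in centered]
    return scores, fractions

# Hypothetical data: 6 class A and 6 class B samples, 8 genes,
# with the first 4 genes expressed more highly in class B.
data_rng = random.Random(1)
labels = ["A"] * 6 + ["B"] * 6
expression = [[data_rng.gauss(6.0 if (lab == "B" and g < 4) else 0.0, 1.0)
               for g in range(8)] for lab in labels]

scores, fractions = principal_components(expression)
print("variance fraction per component:", [round(f, 2) for f in fractions])
print("top 3 components together:", round(sum(fractions), 2))
```

Plotting the three columns of `scores` against one another gives a scatterplot of the same kind as Figure 1, and `sum(fractions)` corresponds to the "percentage of the variance" reported for the top 3 components.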


Class 0 Class 1

40S RIBOSOMAL S19 40S RIBOSOMAL PROTEIN S17 Metallopanstimulin 1 RPL 12 Ribosomal Protein L12 Ribosomal Protein S18 Enhancer of Rudimentary Homolog mRNA RPS5 Ribosomal Protein S5 RPL31 Ribosomal Protein L31 RPL18 Ribosomal Protein L18 Ribosomal Protein S7 Type II Inosine Monophosphate Dehydrogenase Ribosomal Protein S10 MRNA RpS8 Gene for Ribosomal Protein S8 60S RIBOSOMAL PROTEIN L23 CAG-isl 7 (Trinucleotide Repeat-Containing Sequence) RPS 14 Gene (Ribosomal Protein S14) 60S RIBOSOMAL PROTEIN L13 40S RIBOSOMAL PROTEIN S15A Human Ribosomal Protein S24 LAMR1 Laminin Receptor (2H5 Epitope) Ribosomal Protein L27a mRNA Alpha-Tubulin mRNA LCR1 HOMOLOG SnRNP Core Protein Sm D2 mRNA RPL8 Ribosomal Protein L8 5-Aminoimidazole-4-Carboxamide-1-Beta-D-Ribonucleotide Transformylase/Inosinicase RPL7A Ribosomal Protein L7a RPS25 Ribosomal Protein S25 RPL35a Ribosomal Protein L35a RPS21 Ribosomal Protein S21 60S RIBOSOMAL PROTEIN L18A HSPB1 Heat Shock 27kD Protein 1 RPS28 Ribosomal Protein S28 RPS9 Ribosomal Protein S9 Alpha-Tubulin Isotype H2-Alpha Gene, Last Exon 11 Beta-Hydroxysteroid Dehydrogenase Type II mRNA RPL17 Ribosomal Protein L17 DNA Polymerase Delta Small Subunit mRNA RPL4 Ribosomal Protein L4 RPS3 Ribosomal Protein S3 Ribosomal Protein S6 Gene and Flanking Regions Ribosomal Protein L11 L44L Gene (L44-Like Ribosomal Protein) PERIPHERAL-TYPE BENZODIAZEPINE RECEPTOR DP2 (Humdp2) mRNA Ribosomal Protein S12 Polyadenylate Binding Protein II Ribosomal Protein L21 mRNA Ribosomal Protein S20 ALDH6 Aldehyde Dehydrogenase 6 SPTAN1 Spectrin, Alpha, Nonerythrocytic 1 (Alpha-Fodrin) HISTONE H3.3 Rab Geranylgeranyl Transferase, Alpha-Subunit ZNF91 Zinc Finger Protein 91 (HPF7, HTF10) ARF3 ADP-Ribosylation Factor 3 PHOSPHATIDYLINOSITOL 4-KINASE ALPHA Protein Containing SH3 Domain, SH3GL3 Clone 23745 mRNA GNB1 Guanine Nucleotide Binding Protein (G Protein) RalGDS-Like 2 (RGL2) mRNA, Partial cds Putative Ubiquitin C-Terminal Hydroplase (UHX1) mRNA Carboxylesterase (hCE-2) mRNA 
Neuroendocrine-Dlg (NE-Dlg) mRNA Gps 1 (GPS1) mRNA PDE4B Phosphodiesterase 4B Protocadherin 43 mRNA for Abbreviated PC43 Dynactin Serotonin N-Acetyltransferase Gene ATP6C Vacuolar H+ ATPase Proton Channel Subunit Unc-18 Homologue Vacuolar-Type H(+)–ATPase 115 kDa Subunit Cellular Adhesion Regulatory Molecule mRNA From TYL Gene HLA-DQB1 Gene SNRPN Small Nuclear Ribonucleoprotein Polypeptide N GNA11 Guanine Nucleotide Binding Protein (G Protein), Alpha 11 (Gq Class) SWI/SNF Complex 60 kDa Subunit (BAF60a) mRNA, Alternatively Spliced MEN1 Gene (Menin) Extracted From Human Menin (MEN1) Gene Phosphatidylinositol 4-Kinase P21-Activated Protein Kinase (Pak1) Gene Adducin, Alpha Subunit, Alt. Splice 2 DGCR6 Protein Human 16p13.1 BAC Clone CIT987SK-362G6 KIAA0275 Gene OX40L RECEPTOR PRECURSOR Steroid Receptor Coactivator (SRC-1) mRNA Striatum-Enriched Phosphatase (STEP) mRNA, Partial cds Trophinin mRNA CCAAT BOX-BINDING TRANSCRIPTION FACTOR 1 RAGE Gene THRA Thyroid Hormone Receptor, (v-erb-a) Oncongene Homolog KIAA0267 Gene, Partial cds KIAA0250 Gene RagB Protein Nip1 (NIP1) mRNA Bak Protein mRNA Rev-ErbAAlpha Protein (hRev Gene) ZNF173 Acid Finger Protein ZNF173 CLCN4 4 DLG2 Homolog 2 of Drosophila Large Discs

Figure 2. Self-organizing maps were used to discover 2 predominant classes of medulloblastoma: class 0 (high ribosome content) and class 1 (low ribosome content). The top 50 genes for each class are shown. Each column represents an individual sample and each row represents a single gene. Relative gene expression is depicted by red when high and blue when low. mRNA indicates messenger RNA.
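The idea behind the self-organizing map in Figure 2 can be sketched in a few lines of Python. This is a simplified, hypothetical illustration and not the software used for the study: it assumes a one-dimensional grid of 2 nodes, a Gaussian neighborhood, linearly decaying learning rate and radius, and a small synthetic data set. The nodes start at randomly chosen samples and are pulled iteratively toward the samples they most resemble, so that each node settles on one group of samples.

```python
import math
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_node(nodes, sample):
    """Index of the node whose profile is most similar to the sample."""
    return min(range(len(nodes)), key=lambda k: squared_distance(nodes[k], sample))

def train_som(samples, n_nodes=2, epochs=40, seed=0):
    """Fit a one-dimensional SOM and return the node positions."""
    rng = random.Random(seed)
    # Start the nodes at randomly selected samples.
    nodes = [list(s) for s in rng.sample(samples, n_nodes)]
    for epoch in range(epochs):
        remaining = 1.0 - epoch / epochs
        rate = 0.5 * remaining          # learning rate shrinks over time
        radius = max(remaining, 0.3)    # neighborhood width shrinks over time
        for sample in rng.sample(samples, len(samples)):
            best = nearest_node(nodes, sample)
            for k in range(n_nodes):
                # Nodes close to the best-matching node on the grid move more.
                influence = math.exp(-((k - best) ** 2) / (2 * radius ** 2))
                nodes[k] = [w + rate * influence * (x - w)
                            for w, x in zip(nodes[k], sample)]
    return nodes

# Hypothetical data: 6 samples with high and 6 with low expression of 5 genes.
data_rng = random.Random(1)
truth = ["high"] * 6 + ["low"] * 6
expression = [[data_rng.gauss(5.0 if lab == "high" else 0.0, 0.5) for _ in range(5)]
              for lab in truth]

nodes = train_som(expression)
clusters = [nearest_node(nodes, s) for s in expression]
print(clusters)
```

Each sample is assigned to the node it is closest to after training; with more nodes than natural groups, several nodes would be expected to settle on the same group.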


A PROBABLE UBIQUITIN CARBOXYL-TERMINAL HYDROLASE Helix-Loop-Helix Protein Delta Max, Alt. Splice 1 NTRK3 Neurotrophic Tyrosine Kinase, Receptor, Type 3 (TrkC) Non-Lens Beta Gamma-Crystallin Like Protein (AIM1) mRNA, Partial cds INS Insulin POLYPOSIS LOCUS PROTEIN 1 EDN3 Endothelin 3 KIAA0224 Gene mRNA-Associated Protein mmp41 mRNA FBN1 Fibrillin 1 (Marfan Syndrome) LTC4S Leukotriene C4 Synthase KIAA0182 Gene, Partial Cds Dynein, Heavy Chain, Cytoplasmic Carboxylesterase (hCE-2) mRNA COL6A1 Collagen, Type VI, Alpha 1 Unknown Product SPINK1 Serine Protease Inhibitor, Kazal Type 1 Clone 23908 mRNA Sequence Clone 23865 mRNA Sequence SATB1 Special AT-Rich Sequence Binding Protein 1 RagB Protein PCOLCE Procollagen C-Endopeptidase Enhancer Lim-Domain Transcription Factor Lim-1 CD47 Antigen PTK7 Protein-Tyrosine Kinase 7 Ribosomal Protein L18a Homolog Arrestin, Beta 2 2 (hBNaC2) mRNA, Alternatively Spliced Cysteine-Rich Fibroblast Growth Factor Receptor (CFR-1) mRNA AMT Glycine Cleavage System Protein T (Aminomethyltransferase) CREB Gene, Exon Y ETS-RELATED TRANSCRIPTION FACTOR ELF-1 Myosin MMP2 Matrix Metalloproteinase 2 KIAA0190 Gene, Partial cds Prostaglandin Transporter hPGT mRNA 3 COX10 Cytochrome C Oxidase Subunit X MAGE-8 Antigen (MAGE8) Gene Splicing Factor SRp40-1 (SRp40) mRNA GGCX Gamma-Glutamyl Carboxylase Putative G Protein-Coupled Receptor (AZ3B) mRNA GAMMA-GLUTAMYLTRANSPEPTIDASE 5 PRECURSOR SUR Sulfonylurea Receptor (Hyperinsulinemia) NEURAL CELL ADHESION MOLECULE Apoptotic Cysteine Protease Mch4 (Mch4) mRNA Elastin, Alt. Splice 2 HEMATOPOIETIC LINEAGE CELL–SPECIFIC PROTEIN Phospholipase C LAR-Interacting Protein 1b mRNA

B HMGIY High Mobility Group (Nonhistone Chromosomal) Protein Isoforms I and Y High Conductance Inward Rectifier Alpha Subunit mRNA Nucleoside-Diphosphate Kinase CCND1 Cyclin D1 (PRAD1; Parathyroid Adenomatosis 1) Serine Hydroxymethyltransferase, Cytosolic, Alt. Splice 2 RPL7A Ribosomal Protein L7a Proteinase-Activated Receptor-2 mRNA GIPR Gastric Inhibitory Polypepetide Receptor LTBP1 Latent Transforming Growth Factor Beta Binding Protein 1 I Kappa B Epsilon (lkBe) mRNA Mitotic Centromere-Associated Kinesin mRNA Autotaxin mRNA Ribosomal Protein S18 RPS16 Ribosomal Protein S16 Semaphorin-III (Hsema-I) mRNA HOXB5 Homeo Box B5 (2.1 Protein) Keratinocyte Lectin 14 (HKL-14) mRNA TMOD Tropomodulin GPX2 Glutathione Peroxidase 2, Gastrointestinal SnRNP Core Protein Sm D2 mRNA CMRF35 mRNA RNS1 Ribonuclease A (Pancreatic) RPL32 Ribosomal Protein L32 PTB Ribosomal Protein L26 SWI/SNF Complex 60 KDa Subunit (BAF60c) mRNA 40S RIBOSOMAL PROTEIN S19 Ribosomal Protein S10 mRNA Enhancer of Rudimentary Homolog mRNA PDGF Receptor Beta-Like Tumor Suppressor (PRLTS) AFFX-BioC-3_st (Endogenous Control) Octamer Binding Trascription Factor 1 (OTF1) mRNA 40S RIBOSOMAL PROTEIN S17 RPS11 Ribosomal Protein S11 KIAA0165 Gene RIBOSE-PHOSPHATE PYROPHOSPHOKINASE III T-COMPLEX PROTEIN 1, GAMMA SUBUNIT Cri-Du-Chat Region mRNA, Clone CSA1 40S RIBOSOMAL PROTEIN S14 CYTOCHROME C OXIDASE POLYPEPTIDE VIII-LIVER/HEART PRECURSOR MYBL2 V-myb Avian Myeloblastosis Viral Oncogene Homolog-Like 2 Nuclear Antigen H731 mRNA MANB Mannosidase Alpha-B (Lysosomal) Pancreatic Gamma-Glutamyltransferase TKT Transketolase (Wernicke-Korsakoff Syndrome) High-Mobility Group Phosphoprotein Isoform I-C (HMGIC) mRNA SOX2 SRY (Sex Determining Region Y)-Box 2 CDC2 Cell Division Cycle 2, G1 to S and G2 to M Unproductively Rearranged Lg Mu-Chain mRNA V-Region (VD) HFat Protein HSPA6 Heat Shock 70-kDa Protein 6 (HSP70B')

Figure 3. Top 50 genes associated with survival (A) and treatment failure (B). Each column represents an individual sample and each row represents a single gene. Relative gene expression is depicted by red when high and blue when low. mRNA indicates messenger RNA.
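The marker genes in Figure 3 come from the supervised outcome analysis, in which genes are ranked by the signal-to-noise statistic and classifiers are tested by removing one sample at a time. The Python sketch below illustrates that workflow with a weighted voting classifier. Several details are assumptions made for the example rather than statements about the study: the statistic is taken in its commonly used form (difference of class means divided by the sum of class standard deviations), markers are ranked by its absolute value, votes are summed with their weights rather than counted, and the function names and synthetic data are hypothetical.

```python
import random
from statistics import mean, stdev

def signal_to_noise(values_a, values_b):
    """Assumed form of the statistic: (mean_A - mean_B) / (sd_A + sd_B)."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))

def select_markers(samples, labels, n_markers):
    """Rank genes on the training set; keep (gene, weight, decision boundary)."""
    group_a = [s for s, lab in zip(samples, labels) if lab == "A"]
    group_b = [s for s, lab in zip(samples, labels) if lab == "B"]
    markers = []
    for gene in range(len(samples[0])):
        a = [s[gene] for s in group_a]
        b = [s[gene] for s in group_b]
        # The decision boundary lies halfway between the two class means.
        markers.append((gene, signal_to_noise(a, b), (mean(a) + mean(b)) / 2))
    markers.sort(key=lambda m: abs(m[1]), reverse=True)
    return markers[:n_markers]

def predict(markers, sample):
    """Each marker votes; the vote grows with distance from its boundary."""
    votes_a = votes_b = 0.0
    for gene, weight, boundary in markers:
        vote = weight * (sample[gene] - boundary)
        if vote > 0:
            votes_a += vote
        else:
            votes_b -= vote
    total = votes_a + votes_b
    margin = abs(votes_a - votes_b) / total if total else 0.0
    return ("A" if votes_a >= votes_b else "B"), margin

def leave_one_out_errors(samples, labels, n_markers):
    """Hold out each sample in turn, retrain on the rest, count misclassifications."""
    errors = 0
    for i in range(len(samples)):
        markers = select_markers(samples[:i] + samples[i + 1:],
                                 labels[:i] + labels[i + 1:], n_markers)
        predicted, _ = predict(markers, samples[i])
        errors += predicted != labels[i]
    return errors

# Hypothetical data: 10 samples per class, 12 genes, 4 of them informative.
rng = random.Random(2)
labels = ["A"] * 10 + ["B"] * 10
samples = [[rng.gauss(4.0 if (lab == "A" and g < 4) else 0.0, 1.0) for g in range(12)]
           for lab in labels]

# Build models with different numbers of markers; keep the one with fewest errors.
errors_by_size = {n: leave_one_out_errors(samples, labels, n) for n in (2, 4, 8)}
best_size = min(errors_by_size, key=errors_by_size.get)
print(errors_by_size, "chosen number of markers:", best_size)
```

Marker selection is repeated inside every cross-validation round so that the held-out sample never influences which genes are chosen; the `margin` returned by `predict` plays the role of the voting margin used as a measure of confidence.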


patients before gene expression profiling can be used for risk stratification in future clinical trials. It is evident, then, that the hypotheses generated from the analysis of complex gene expression patterns must be tested by independent measures before final conclusions can be reached.

METHODS

UNSUPERVISED

Principal Component Analysis

A simple multidimensional scaling of the data set was obtained by plotting the top principal components (combinations of genes) that account for a significant fraction of the variance in scatterplots. To study the natural clustering of the brain tumor samples, we initially considered the subset of genes with the highest variation across samples (Figure 1A). In this case, the top 3 principal components account for approximately 43% of the variance of the marker genes. We then plotted principal components based on the top 10 marker genes associated with each tumor, selected by the signal-to-noise statistic.10 The top 3 principal components of this data set accounted for approximately 61% of the variance, and the degree of separation and clustering of tumor types was significantly improved over that obtained by the analysis of genes with highest variation (Figure 1B). These calculations were performed using Mathsoft software available on the Internet at http://www.mathsoft.com.

Self-organizing Maps

We performed SOMs using the GeneCluster software package available on the Internet at http://www.genome.wi.mit.edu. As an exploratory data analysis method, SOMs identify groups of samples with common gene expression patterns within a large heterogeneous sample set. To calculate SOMs, one initially randomly selects and maps a grid of nodes onto the tumor sample set. Through an iterative series of calculations testing the similarity of gene expression between samples, the geometry of the nodes is adjusted to reflect the data structure. If the number of nodes exceeds the number of “natural” clusters in the sample set, then the nodes will converge to reflect the natural clustering. In our case, applying this unsupervised approach to the medulloblastoma data set led to the discovery that 2 is the optimum number of groups identifiable by SOMs (Figure 2).

SUPERVISED

To build supervised classifiers, we defined target classes based on morphologic features, tumor class, or treatment outcome. The method is illustrated by our analysis of treatment outcome (Figure 3). In this case, we created 2 classes of patients based on clinical outcome. Gene expression profiles from patients who died of progressive disease due to treatment failure were compared with expression profiles of patients who were still alive at the end of the study and who had been followed for at least 1 year after cessation of therapy. Genes correlated with the 2 outcome classes were identified by sorting all of the genes on the array according to the signal-to-noise statistic.10 We built a sample classifier in cross-validation by removing 1 sample, using the rest as a training set, and repeating this procedure until all samples were tested as “unknowns.” Several models were built using different numbers of marker genes, and the final chosen model was the one that minimized the total error (number of samples that were misclassified) in cross-validation. For this, k-nearest neighbor, weighted voting, and support vector machine algorithms were used (Figure 4).

[Figure 4 plot: panels A, B, and C; legend: Class A, Class B, Unknown, Mean Expression for Class; DB marks the decision boundary]

Figure 4. Pictorial representation of supervised analysis methods for sample classification. A hypothetical distribution of samples in multidimensional gene expression space is shown in 2 dimensions. The samples are identified as class A (red) or class B (green) to demonstrate how the following algorithms assign by binary decision an unclassified sample (blue). For outcome predictions, the 2 classes correspond to patients who died of tumor progression due to treatment failure vs survivors after therapy. A, k-Nearest neighbor assigns the unknown (test) sample to a class based on its proximity in gene expression space to the surrounding samples (neighbors). In this case, the unknown sample has a gene expression profile closer to that of the samples in class A. B, Weighted voting determines a decision boundary (DB) midway between the mean gene expression levels of 2 groups of samples. The closer each gene of the unknown sample is to the DB for a particular gene, the less weight that gene carries toward the assignment of the sample to an outcome class. The unknown sample is assigned to the class with the most positive voting genes, class B in this example. C, Support vector machine algorithms determine a DB based on the optimum separation of the 2 classes according to their gene expression profiles. It assigns a classification to the unknown sample according to its position in gene expression space in relation to that of the DB. In this case, a hypothetical DB is shown assigning the unknown sample to class A.

k-Nearest Neighbors

The k-nearest neighbor algorithm11 was used to predict the class of the unknown sample by calculating the distance of that sample from those surrounding it in gene expression space (class-specific marker genes, ie, those associated with survival or treatment failure). The unknown sample was predicted to be in one or the other outcome class based on the similarity of gene expression with that of most of the k-nearest (neighbor) samples (Figure 4A). Marker genes were chosen from those highly correlating with the predetermined classes using the signal-to-noise statistic.

Weighted Voting

The weighted voting algorithm10 makes a weighted linear combination of relevant “marker” or “informative” genes obtained in the training set to provide a classification scheme for new samples. The selection of marker genes for each outcome class was determined by computing the signal-to-noise statistic of each gene for the predefined classes. The algorithm determined the decision boundary (halfway) between the outcome class means for each gene (Figure 4B). To predict the class of the unknown sample, each gene in the sample expression profile casts a vote, and the unknown sample is assigned to the class with the most positive voting genes. The distance of that sample from the decision boundary determines the weight that each gene carries in this voting process. The closer each gene of the unknown sample is to the decision boundary for a particular gene, the less weight that gene carries toward the assignment of the sample to an outcome class. Confidence in the class prediction of the unknown sample was determined by the size of the voting margin responsible for putting the sample in one class vs the other.

Support Vector Machines

The basic idea behind support vector machines is to construct an optimal class-separating hyperplane (decision boundary) by mapping the gene expression data to a high-dimensional space.12,13 Linear separation in this higher dimensional space corresponds to a nonlinear decision boundary separating the 2 outcome classes (Figure 4C). This allowed us to separate outcome classes more effectively than with weighted voting, where the decision boundary is linear. As with weighted voting, samples are assigned to a class by their position in relation to the decision boundary, and, again, confidence of classification is dependent on the relative distance of the sample from that boundary into a particular class.

COMMENT

RELEVANCE TO THE STUDY OF NEUROSCIENCE AND THE PRACTICE OF NEUROLOGY

The use of microarrays provides a springboard from which we can start to examine the cellular pathogenesis underlying neurological disease and perhaps narrow down the search to a manageable group of therapeutic targets. Examples of this can be seen in neurological disorders such as multiple sclerosis, Huntington disease, Parkinson disease, and Alzheimer disease.14-17 Applying microarray technology to a transgenic animal model of Huntington disease resulted in the finding that genes encoding certain neurotransmitter, calcium, and retinoid signaling pathway components were down-regulated, whereas those encoding inflammatory components were up-regulated.16 These findings were unexpected consequences of the mutant huntingtin protein and could only have been discovered on this scale by microarray analysis.

Multiple sclerosis is a complex disorder with multiple clinical subtypes that cannot be diagnosed by clinical criteria at initial presentation.18 Immunomodulatory treatment has proved to be relatively successful in relapsing-remitting disease, but it is not as useful in primary or secondary progressive disease.18,19 To date, there are no in vivo markers that allow specific direction of treatment. Microarray analysis has begun to dissect the molecular heterogeneity of multiple sclerosis, identifying genes related to cell metabolism, structure, cytokines, and cell adhesion molecules. In addition, a gene not previously associated with multiple sclerosis, encoding the Duffy chemokine receptor, was identified using this technology.14,17 Although these results are preliminary, eventually microarray expression analysis may lead to the identification of markers that are detectable in living patients, allowing prognosis to be accurately predicted at the time of initial diagnosis and treatment to be tailored accordingly. An even greater future challenge is offered by the investigation of psychiatric disorders, which seem to result from the interplay of polygenic and epigenetic factors on multiple brain circuits.20

CONCLUSIONS

The development of array-based DNA mutation screening may, in the future, prove to be beneficial for the identification of an individual’s genetic propensity to acquire disorders such as Alzheimer or Parkinson disease. Consequently, at-risk candidates can be selected for close monitoring, intensive preventive care, and early clinical intervention. Other applications may include screening of patients for gene variants that affect the individual’s response to certain medications, allowing the physician to tailor the best treatment regimen for a given disease in an individual patient.

USEFUL WEB SITES

The following Web sites are useful in the study of microarray-based functional genomics:

• Center for Genome Research: http://www-genome.wi.mit.edu/
• National Center for Biotechnology Information: http://www.ncbi.nlm.nih.gov/
• Affymetrix: http://www.affymetrix.com/products/app_exp.html.

Accepted for publication December 11, 2002.

Author contributions: Study concept and design (Dr Pomeroy); acquisition of data (Drs Sturla, Fernandez-Teijeiro, and Pomeroy); analysis and interpretation of data (Drs Sturla and Pomeroy); drafting of the manuscript (Drs Sturla, Fernandez-Teijeiro, and Pomeroy); critical revision of the manuscript for important intellectual content (Dr Pomeroy); statistical expertise (Dr Pomeroy); obtained funding (Dr Pomeroy); administrative, technical, and material support (Dr Sturla); study supervision (Dr Pomeroy).

Corresponding author and reprints: Scott L. Pomeroy, MD, PhD, Division of Neuroscience, Department of Neurology, Children’s Hospital, 300 Longwood Ave, Boston, MA 02115 (e-mail: scott.pomeroy@tch.harvard.edu).

REFERENCES

1. Theillet C, Orsetti B, Redon R, Manoir SD. Genomic profiling: from molecular genetics to DNA arrays. Bull Cancer. 2001;88:261-268.
2. Jain AN, Chin K, Borresen-Dale AL, et al. Quantitative analysis of chromosomal CGH in human breast tumors associates copy number abnormalities with p53 status and patient survival. Proc Natl Acad Sci U S A. 2001;98:7952-7957.
3. Hui AB, Lo KW, Yin XL, Poon WS, Ng HK. Detection of multiple gene amplifications in glioblastoma multiforme using array-based comparative genomic hybridisation. Lab Invest. 2001;81:717-723.
4. Clarke PA, Poele RT, Wooster R, Workman P. Gene expression and microarray analysis in cancer biology, pharmacology, and drug development: progress and potential. Biochem Pharmacol. 2001;62:1311-1336.
5. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467-470.
6. DeRisi J, Penland L, Brown PO, et al. Use of cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet. 1996;14:457-460.
7. Lockhart DJ, Winzeler EA. Genomics, gene expression and DNA arrays. Nature. 2000;405:827-836.
8. Pomeroy SL, Tamayo P, Gaasenbeek M, et al. Prediction of central nervous system embryonal tumor outcome based on gene expression. Nature. 2002;415:436-442.
9. Hidalgo M, Rowinsky EK. The rapamycin-sensitive signal transduction pathway as a target for cancer therapy. Oncogene. 2000;19:6680-6686.
10. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863-14868.
11. Dasarathy VB. Nearest Neighbour (NN) Norms: NN Pattern Classification Techniques. Los Alamitos, Calif: IEEE Computer Society Press; 1991.
12. Mukherjee S, Tamayo P, Mesirov JP, Slonim D, Verri A, Poggio T. Support Vector Machine Classification of Microarray Data, CBCL Paper #182/AI Memo #1676. Cambridge: Massachusetts Institute of Technology; 1999.
13. Brown MP, Grundy WN, Lin D, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A. 2000;97:262-267.
14. Whitney LW, Becker KG, Tresser NJ, et al. Analysis of gene expression in mutiple sclerosis lesions using cDNA microarrays. Ann Neurol. 1999;46:425-428.
15. Ginsberg SD, Hemby SE, Lee VM, Eberwine JH, Trojanowski JQ. Expression profile of transcripts in Alzheimer’s disease tangle-bearing CA1 neurons. Ann Neurol. 2000;48:77-87.
16. Luthi-Carter R, Strand A, Peters NL, et al. Decreased expression of striatal signaling genes in a mouse model of Huntington’s disease. Hum Mol Genet. 2000;9:1259-1271.
17. Steinman L. Gene microarrays and experimental demyelinating disease: a tool to enhance serendipity. Brain. 2001;124:1897-1899.
18. Bitsch A, Bruck W. Differentiation of multiple sclerosis subtypes: implications for treatment. CNS Drugs. 2002;16:405-418.
19. Goodin DS, Frohman EM, Garmany GP, et al. Disease modifying therapies in multiple sclerosis: report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology and the MS Council for Clinical Practice Guidelines. Neurology. 2002;58:169-178.
20. Mirnics K, Middleton FA, Lewis DA, Levitt P. Analysis of complex brain disorders with gene expression microarrays: schizophrenia as a disease of the synapse. Trends Neurosci. 2001;24:479-486.

