Supplemental Materials
Methods, Figures and Tables

Analysis of blood-based gene expression in idiopathic Parkinson disease

Supplementary Methods
Figures: e-1, e-2, e-3
Tables: e-1, e-2, e-3, e-4

Ron Shamir, PhD,* Christine Klein, MD,*,† David Amar, PhD,* Eva-Juliane Vollstedt, MD, Michael Bonin, PhD, Marija Usenovic, PhD, Yvette C. Wong, PhD, Ales Maver, MD, PhD, Sven Poths, Hershel Safer, PhD, Jean-Christophe Corvol, MD, PhD, Suzanne Lesage, PhD, Ofer Lavi, MSc, Günther Deuschl, MD, PhD, Gregor Kuhlenbaeumer, MD, PhD, Heike Pawlack, BSc, Igor Ulitsky, PhD, Meike Kasten, MD, Olaf Riess, MD, Alexis Brice, MD, Borut Peterlin, MD, PhD,† Dimitri Krainc, MD, PhD,†

Supplementary Methods

Sample collection, RNA isolation and microarray processing

We collected samples of venous blood using a standardized blood withdrawal protocol. PAXgene (Qiagen) and EDTA tubes were obtained. As participants were recruited at different time points, PAXgene tubes were inverted 10 times directly after blood collection, placed at room temperature for 24 hours and subsequently frozen at -80 °C until RNA extraction. RNA was extracted after patient recruitment at all centers had been completed, and all extractions were performed by the same individual. Samples were processed according to the manufacturer's protocols. From each patient, four whole blood samples were collected in PAXgene Vacutainer tubes (BD Biosciences). RNA was isolated using the PAXgene 96 RNA purification kit (BD Biosciences). RNA quality was checked on an Agilent BioAnalyzer 2100 (Agilent, Germany), and samples were processed for Affymetrix GeneChips using the Affymetrix 3'-IVT Express labeling kit (Affymetrix, Santa Clara). For globin reduction, 1.5 μg of total RNA from each sample was treated using the GLOBINclear™ Human Kit (Ambion, Austin, TX, USA) according to the manufacturer's instructions. Fragmented and labeled cDNA was hybridized onto the GeneChip® Human Genome U133 Plus 2.0 Array (Affymetrix).
Staining of biotinylated cDNA and scanning of arrays were performed according to the manufacturer's recommendations.

Computational Preprocessing

The original data contained microarray expression profiles from 523 individuals. We tested three preprocessing methods: RMA, GC-RMA, and MAS5. Under the assumption that blood expression profiles should be highly correlated, we tested the effect of each method on the sample correlation. The distribution of correlations between sample groups, where samples were grouped by the year of RNA extraction, showed that MAS5 achieved lower correlation scores than the other methods, while RMA had a slight advantage over GC-RMA (Figure e-1). With RMA selected as the preprocessing method, the analysis identified 37 samples that had low correlation with other years (<0.8); these were removed from the dataset.

Batch Effect Reduction

A batch refers to a set of samples produced in the same laboratory on the same date. The majority of batches contained both patient and control samples. To test whether the data contained batch effects, we first applied an SVM classifier using all probes to predict the batch of each sample. We grouped batches by year and lab, producing five batches. Leave-one-out cross-validation achieved an ROC score of 0.999 and 98.6% accuracy, showing an extremely strong batch effect. Similar results were obtained even after removing thousands of genes that were individually significantly associated with the batches. We therefore used the fSVA method [1] to reduce batch effects; this produced a model that reduced confounding effects in new, independent samples from new batches. The fSVA method searched for surrogate variables that represented a significant source of variation in the data, which could be correlated with the batch or represent variance due to unknown factors.
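
To make concrete what removing a surrogate variable from expression data involves, the following minimal sketch fits, for each gene, a least-squares slope of expression on a single covariate using training samples only, and then subtracts the fitted component from both training and new samples. This is ordinary per-gene linear regression, not the fSVA algorithm of the sva package: it assumes a single surrogate variable whose value is already known for every sample, whereas fSVA has to estimate those values. The function names `fit_slopes` and `regress_out` and the toy data are hypothetical.

```python
from statistics import fmean


def fit_slopes(train_expr, train_sv):
    """Fit one least-squares slope per gene on the training samples only.

    train_expr: list of genes; each gene is a list of per-sample expression values.
    train_sv:   hypothetical surrogate-variable value for each training sample.
    Returns the training mean of the surrogate variable and the per-gene slopes.
    """
    sv_mean = fmean(train_sv)
    sv_ss = sum((s - sv_mean) ** 2 for s in train_sv)
    if sv_ss == 0:
        raise ValueError("surrogate variable is constant across training samples")
    slopes = []
    for gene in train_expr:
        gene_mean = fmean(gene)
        cov = sum((s - sv_mean) * (g - gene_mean) for s, g in zip(train_sv, gene))
        slopes.append(cov / sv_ss)
    return sv_mean, slopes


def regress_out(expr, sv, sv_mean, slopes):
    """Remove the fitted surrogate-variable component from every gene."""
    return [
        [g - slope * (s - sv_mean) for g, s in zip(gene, sv)]
        for gene, slope in zip(expr, slopes)
    ]


# Toy data (made up): two genes measured in four training samples.
train_expr = [[1.0, 2.0, 3.0, 4.0], [5.0, 5.0, 5.0, 5.0]]
train_sv = [0.0, 1.0, 2.0, 3.0]

sv_mean, slopes = fit_slopes(train_expr, train_sv)
train_adj = regress_out(train_expr, train_sv, sv_mean, slopes)

# A new sample is corrected with the coefficients learned from training data.
new_adj = regress_out([[6.0], [5.0]], [4.0], sv_mean, slopes)
```

In this toy example the first gene becomes flat across the training samples, because all of its variation is explained by the covariate, while the second gene is left unchanged. Fitting the coefficients on the training samples alone keeps information from held-out samples out of the correction.
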
The training set was used to infer surrogate variables, which were then regressed out from the training set and subsequently from the validation and test sets.

Cross validation

To test the performance of our framework on data from new batches, we repeatedly removed a complete batch from the data, learned a classifier (i.e., the complete process, including the fSVA model, feature selection and SVM) using the samples from the remaining batches, and then used it to predict the labels of samples in the excluded batch. In the first stage of our analysis, in which we analyzed the training set, we removed batches with fewer than 10 samples.

Comparison to previous studies

We compared our signature to those obtained by six other studies. We could not perform a meta-analysis since most studies measured only a few genes [2-4] and/or had very small sample sizes [5, 6]. However, we did apply the signature of Scherzer et al. 2007 to our data and vice versa.

Analysis and Statistics

Classification was performed using linear SVM. We used the CMA R package [7] for feature selection. Functional and pathway enrichment analyses were done in EXPANDER [8]. Network analysis was done using the Cytoscape [9] plug-in GeneMANIA [10]. All statistical analyses were performed in R. ROC curves were generated using the ROCR R package [11]. All datasets have been deposited in Gene Expression Omnibus (GEO; accession number GSE99039). The analysis R code is available at https://github.com/Shamir-Lab/GENEPARK.

Supplementary References

1. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 2012;28:882-883.
2. Grunblatt E, Zehetmayer S, Jacob CP, Muller T, Jost WH, Riederer P. Pilot study: peripheral biomarkers for diagnosing sporadic Parkinson's disease. Journal of neural transmission 2010;117:1387-1393.
3. Molochnikov L, Rabey JM, Dobronevsky E, et al.
A molecular signature in blood identifies early Parkinson's disease. Molecular neurodegeneration 2012;7:26.
4. Chikina MD, Gerald CP, Li X, et al. Low-variance RNAs identify Parkinson's disease molecular signature in blood. Movement disorders: official journal of the Movement Disorder Society 2015;30:813-821.
5. Shehadeh LA, Yu K, Wang L, et al. SRRM2, a potential blood biomarker revealing high alternative splicing in Parkinson's disease. PloS one 2010;5:e9104.
6. Mutez E, Larvor L, Lepretre F, et al. Transcriptional profile of Parkinson blood mononuclear cells with LRRK2 mutation. Neurobiology of aging 2011;32:1839-1848.
7. Slawski M, Daumer M, Boulesteix AL. CMA: a comprehensive Bioconductor package for supervised classification with high dimensional data. BMC bioinformatics 2008;9:439.
8. Ulitsky I, Maron-Katz A, Shavit S, et al. Expander: from expression microarrays to networks and functions. Nature protocols 2010;5:303-322.
9. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 2011;27:431-432.
10. Montojo J, Zuberi K, Rodriguez H, et al. GeneMANIA Cytoscape plugin: fast gene function predictions on the desktop. Bioinformatics 2010;26:2927-2928.
11. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics 2005;21:3940-3941.

Supplementary Figures

Figure e-1. Preprocessing of the original 523 blood expression profiles. We tested the effect of different preprocessing methods on the sample correlation under the assumption that blood expression profiles should be highly correlated. The distribution of the correlation between samples is shown after preprocessing by three different methods (RMA, GC-RMA, and MAS5), grouping the samples by the year of the RNA extraction. 37 samples had low correlation with other years (<0.8) and were removed from the data. A) Correlation between pairs of 2010 microarrays.
RMA and GC-RMA average > 0.96. B) Correlation between 2008a and 2009 microarrays. RMA average: 0.93, GC-RMA average: 0.9. For all graphs: y-axis = count (of sample pairs), x-axis = correlation.

Figure e-2. Leave-batch-out cross-validation analysis on the training set. Each point shows the AUC score or accuracy (y-axis) for a different signature size (x-axis).

Figure e-3. Gene signature performance on idiopathic PD, controls and other neurodegenerative diseases (NDD). The plots show performance using our signature on the independent test set when 1) comparing IPD to all others (Control and NDD) (purple line, AUC score 0.63, p=0.033), and 2) comparing diseases (IPD and NDD) to controls (orange line, AUC score 0.72, p=7.5E-5). Of note, our classifier was not initially trained to differentiate between IPD and NDD. For comparison, the result reported in Scherzer et al. (2007) is also shown (grey line, AUC score 0.69, p=0.047), which used fewer samples (105 individuals) and a classifier initially trained to differentiate between PD samples and both NDD samples and healthy controls.

Supplementary Tables

Table e-1. Demographic data of clinical cohorts after preprocessing

Columns: IPD; NDD (MSA, CBD, PSP or PDD); NDD (Subset of HD); Controls

Total #: 205; 21; 19; 233
Age: 62 ± 11; 66 ± 10; 51 ± 10; 58 ± 30
% Males: 95 (50%); 10 (56%); 8 (42%); 75 (35%)
Age at onset: 56 ± 11; 65 ± 8; 43 ± 16; N/A

UPDRS Scales
UPDRS I: 2.2 ± 2.1; 2.0 ± 2.3; 0.5 ± 0.9
UPDRS II: 9.6 ± 6.3; 22.7 ± 7.7; 0.2 ± 0.5
UPDRS III: 23.3 ± 9.9; 29.3 ± 5.4; 0.6 ± 1.4
UPDRS IV: 2.6 ± 3.0; 1.6 ± 1.9; 0.1 ± 0.4

Hoehn & Yahr Stages
Stage 0: 7; 0
Stage 1: 58; 0
Stage 2: 70; 1
Stage 3: 30; 3
Stage 4: 8; 2
Stage 5: 0; 3

MoCA score: 27 ± 3; 26 ± 3

UPDRS I-IV = Unified Parkinson's Disease Rating Scale part I-IV
MoCA = Montreal Cognitive Assessment

Table e-2.
Characteristics of training, validation and test cohorts after preprocessing

Columns: Training; Validation; Test

# IPDs: 140; 35; 30
# Controls: 153; 40; 40
Average Age: 62.48; 64.84; 65.35
Age SD: 10.47; 8.72; 9.14
% Female: 0.61; 0.46; 0.52

Table e-3.
