computational proteomics

Laboratory for Computational Proteomics

www.FenyoLab.org E-mail: [email protected] Facebook: NYUMC Computational Proteomics Laboratory Twitter: @CompProteomics Subscriber access provided by NEW YORK UNIV Article Integrated 19 Transcriptomic and Proteomic Datasets Derived from Glioma Cancer Stem Cell Lines Cheryl F. Lichti, Huiling Liu, Alexander S. Shavkunov, Ekaterina Mostovenko, Erik P. Sulman, Ravesanker Ezhilarasan, Qianghu Wang, Roger Kroes, Joseph R. Moskal, David Fenyö, Betül Akgöl Oksuz, Charles A Conrad, Frederick F. Lang, Frode S. Berven, Ákos Végvári, Melinda Rezeli, György Marko-Varga, Sophia Hober, and Carol L. Nilsson J. Proteome Res., Just Accepted Manuscript • Publication Date (Web): 24 Nov 2013 Downloaded from http://pubs.acs.org on November 26, 2013

Just Accepted

“Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties. Page 1 of 42 Journal of Proteome Research

1 2 3 4 5 6 Integrated Transcriptomic and Proteomic Datasets Derived from 7 8 Glioma Cancer Stem Cell Lines 9 10 11 12 13 14 15 Cheryl F. Lichti 1, Huiling Liu 1, Alexander S. Shavkunov 1, Ekaterina Mostovenko 1, Erik P. 16 17 2 2 3 4 18 Sulman , Ravesanker Ezhilarasan , Qianghu Wang , Roger A. Kroes , Joseph C. 19 20 Moskal4, David Fenyö 5, Betül Akgöl Oksuz 5, Charles A. Conrad 6, Frederick F. Lang 7, 21 22 Frode S. Berven 8, Ákos Végvári 9, Melinda Rezeli 9, György MarkoVarga 9, Sophia 23 24 10 1 25 Hober , and Carol L. Nilsson* 26 27 28 29 1Department of Pharmacology and Toxicology, UTMB Cancer Center, University of Texas 30 31 32 Medical Branch, Galveston, Texas 77555, United States 33 34 35 2 36 Department of Radiation Oncology, The University of Texas M.D. Anderson Cancer Center, 37 38 Houston, Texas, United States 39 40 41 42 3Department of Bioinformatics, The University of Texas M.D. Anderson Cancer Center, 43 44 45 Houston, Texas, United States 46 47 48 49 4The Falk Center for Molecular Therapeutics, McCormick School of Engineering and Applied 50 51 Sciences, Northwestern University, 1801 Maple Street, Evanston, Illinois 60201, United States 52 53 54 55 5Department of Biochemistry, New York University Langone Medical Center 56 57 58 New York, New York 10016, United States 59 60 ACS Paragon Plus Environment Journal of Proteome Research Page 2 of 42

1 2 3 4 5 6 6 Department of NeuroOncology, The University of Texas M.D. Anderson Cancer Center, 7 8 Houston, Texas, United States 9 10 11 12 7Department of Neurosurgery, The University of Texas M.D. Anderson Cancer Center, Houston, 13 14 Texas, United States 15 16 17 18 8 19 Department of Biomedicine, University of Bergen, 5009 Bergen, Norway 20 21 22 9 23 Clinical Science & Imaging, Biomedical Center, Department of Measurement 24 25 Technology and Industrial Engineering, Lund University, 221 84 Lund, Sweden 26 27 28 29 10 School of Biotechnology, Department of Proteomics, Royal Institute of Technology, 106 91 30 31 32 Stockholm, Sweden 33 34 35 36 *To whom correspondence should be addressed: Department of Pharmacology and Toxicology, 37 38 UTMB Cancer Center, University of Texas Medical Branch, Galveston, Texas 775551074, 39 40 United States, Telephone: +14097471840. FAX: +14097729648. Email: 41 42 [email protected]. 43 44 45 46 47 Running title: Chromosome 19 Transcripts and in Glioma Stem Cells 48 49 50 51 52 53 54 55 56 57 58 59 60 2

ACS Paragon Plus Environment Page 3 of 42 Journal of Proteome Research

1 2 3 Abstract 4 5 6 One subproject within the global Chromosome 19 Consortium is to define chromosome 7 8 19 and protein expression in gliomaderived cancer stem cells (GSCs). 9 10 11 Chromosome 19 is notoriously linked to glioma by 1p/19q codeletions and clinical tests 12 13 are established to detect that specific aberration. GSCs are tumorinitiating cells and are 14 15 hypothesized to provide a repository of cells in tumors that can selfreplicate and be 16 17 18 refractory to radiation and chemotherapeutic agents developed for the treatment of 19 20 tumors. In this pilot study, we performed RNASeq, labelfree quantitative protein 21 22 measurements in six GSC lines, and targeted transcriptomic analysis using a 23 24 25 chromosome 19specific microarray in an additional six GSC lines. The data have been 26 27 deposited to the ProteomeXchange with identifier PXD000563. Here, we present 28 29 insights into differences in GSC gene and protein expression, including the identification 30 31 32 of proteins listed as having no or low evidence at the protein level in the Human Protein 33 34 Atlas, as correlated to chromosome 19 and GSC subtype. Furthermore, the 35 36 37 upregulation of proteins downstream of adenovirusassociated viral integration site 1 38 39 (AAVS1) in GSC11 in response to oncolytic adenovirus treatment was demonstrated. 40 41 Taken together, our results may indicate new roles for chromosome 19, beyond the 42 43 44 1p/19q codeletion, in the future of personalized medicine for glioma patients. 45 46 47 48 Keywords : ChromosomeCentric Human Proteome Project, proteins, mRNA, RNA 49 50 51 Seq, mass spectrometry, bioinformatics, glioma, glioma stem cells, cancer proteomics, 52 53 chromosome 19, oncolytic virus, neurocan core protein, symplekin 54 55 56 57 58 59 60 3

ACS Paragon Plus Environment Journal of Proteome Research Page 4 of 42

1 2 3 Introduction 4 5 6 The objective of the ChromosomeCentric Human Proteome Project (CHPP) is 7 8 to map all proteins encoded by the human chromosomal complement and identify 9 10 1 11 compelling correlates of protein biological functions and their role in disease . The C 12 13 HPP has joined forces with the Encyclopedia of DNA Elements (ENCODE) project to 14 15 achieve these goals 2. Importantly, the ENCODE consortium has provided an initial 16 17 3 18 “parts list” of the , and continues to refine the first draft, published in 19 20 2012. The ENCODE project is highly synergistic with the goals of the CHPP, in which 21 22 global research efforts are focused on the identification and characterization of missing 23 24 25 proteins, those that lack any credible mass spectrometric or detection. 26 27 Because not all human proteins are expressed in all tissues, a variety of normal and 28 29 diseased tissues are currently under scrutiny by consortium members. The 30 31 32 Chromosome 19 Consortium is an international, multicenter, multiinvestigator group 33 34 that develops complementary analytical platforms and integrates results derived from 35 36 4 37 evidence of chromosome 19 activity in human tissues . Several disease studies are 38 39 underway within the Consortium, including the role of chromosome 19 in 40 41 neurodegeneration, lung cancer, prostate cancer, and glioma. 42 43 44 The inherent genomic instability in gliomas results in chromosomal duplications, 45 46 amplifications of specific , and activating mutations 5. Chromosome 19 is linked to 47 48 glioma by 1p/19q codeletions, which are a positive prognostic indicator: 123 months 49 50 6 51 mean survival versus 16 months in patients with tumors that are 1p/19q intact . 52 53 Because tumors with the codeletion respond favorably to temozolomide, clinical testing 54 55 is recommended. 7 Further, upregulation of a novel, previously uncharacterized 56 57 58 59 60 4

ACS Paragon Plus Environment Page 5 of 42 Journal of Proteome Research

1 2 3 chromosome 19 protein, ER membrane protein complex subunit 10 (EMC10), located in 4 5 6 a genomic region implicated in many cancers (19q13.33) was found to suppress glioma 7 8 growth. However, amplification and overexpression of Rhophilin2 (RHPN2), another 9 10 11 chromosome 19 protein, was recently linked to dramatically decreased survival in 12 8 13 glioma patients. 14 15 In many cancer types, small subsets of tumorinitiating and treatmentresistant 16 17 18 cells have been identified. In the cancer stem cell (CSC) hypothesis, stem cells have 19 20 the capacity to make new tumors and produce progeny cells of many different types. 21 22 Cancer stemlike cells have been described for leukemias 9 as well as solid tumors, 23 24 10 25 including glioma . Traditional chemotherapeutic regimens developed to debulk tumors 26 27 have little effect on CSCs, and radiation therapy is equally inefficient 11 . 28 29 The Cancer Genome Atlas Research Network has defined moleculargenetic 30 31 12 32 subtypes of gliomas: mesenchymal, classical, neural and proneural . Mesenchymal 33 34 subtypes are characterized by neurofibromin (NF1, chromosome 17) loss, phosphatase 35 36 37 and tensin homolog (PTEN, chromosome 10) loss or mutation, and inactivating cellular 38 39 tumor antigen p53 (TP53, chromosome 17) mutations. Classical subtypes are typified 40 41 by EGFR (chromosome 7) amplification, overexpression or mutation and PTEN loss or 42 43 44 mutation. Proneural (PN1) subtypes are characterized by plateletderived growth factor 45 46 receptor alpha (PDGRFA, chromosome 4) amplification, mutations of isocitrate 47 48 dehydrogenase (IDH1, chromosome 2) and phosphatidylinositol3kinase (PI3K, protein 49 50 51 group), and expression of proneuronal markers such as OLIG2 (chromosome 21). 52 53 Finally, neural (PN2) subtypes carry EGFR amplification or overexpression and express 54 55 neuronal markers. While tumor classifications based on this system differ in their 56 57 58 59 60 5

ACS Paragon Plus Environment Journal of Proteome Research Page 6 of 42

1 2 3 response to treatments and partly guide patient treatment plans, median survival rates 4 5 6 do not differ greatly between patients with gliomas characterized by different molecular 7 8 genetic subtypes. 9 10 11 Expanding our understanding of the biological drivers of treatment resistance in 12 13 GSCs could serve to identify new therapeutic targets. In this study, we applied a global, 14 15 integrated transcriptomicproteomic workflow (Figure 1 ) in the analysis of six GSC lines 16 17 4 18 and a targeted C19 transcription analysis of twelve GSC lines, derived from four 19 20 different GSC subtypes. We compared protein and mRNA expression in GSCs to their 21 22 current evidence status in the Human Protein Atlas 13 and neXtProt 14 ; several of the 23 24 25 proteins identified were previously listed as having no to low evidence of expression. 26 27 We report our findings derived from analysis of six GSC lines and the intercell line 28 29 differences in chromosome 19 expression at the level of transcription and protein 30 31 32 expression and discuss them in the context of GSC subtypes. Further, proteomic 33 34 studies of GSC responses to a therapeutic oncolytic adenovirus, Delta24RGD, 35 36 37 revealed differential upregulation of proteins located downstream from AAVS1 38 39 (19q13.4) in one of four GSC lines. 40 41 42 43 44 Experimental Section 45 46 Cell culture conditions 47 48 Isolation of GSCs (GSC2, 11, 13, 17, 23, 811) from patient tumors was 49 50 15 51 performed as previously described in accordance with the institutional review board of 52 53 The University of Texas M.D. Anderson Cancer Center, and are named in the order that 54 55 they were acquired. GSCs were cultured according to a published method 15,16 . All cell 56 57 58 59 60 6

ACS Paragon Plus Environment Page 7 of 42 Journal of Proteome Research

1 2 3 lines were tested to exclude the presence of Mycoplasma infection. Downstream 4 5 6 transcriptomic and proteomic analyses were performed on identical cell culture batches 7 8 in order to reduce the influence of batch variance in the comparative assays. 9 10 11 12 13 Oncolytic virus treatment 14 15 GSCs (GSC2, 11, 13, and 23) were cultured as described in the previous 16 17 4 18 section, then dissociated and plated in 12 well plates (2 x 10 cells per well) and 19 20 immediately infected with Delta24RGD at a multiplicity of infection of 10. Cells were 21 22 harvested for proteomic analysis at 24 hrs and 48 hrs after infection or control 23 24 25 treatment. 26 27 28 29 Transcriptomic Analysis of 6 Glioma Stem Cell Lines 30 31 32 Pairedend sequencing assays were performed on 37 glioma stem cell lines 33 34 (GSC) using Illumina HiSeq platform. Only data from cell lines matching validated 35 36 37 proteomic datasets (six as of the date of manuscript submission) were considered in 38 39 this study. Each GSC line generated about 50 million pairedends, each end was 75 bp 40 41 in size. The average phred quality scores (APQS) 17 of each specimen ranged from 42 43 44 35.58 to 36.10. In order to generate more stable transcriptome mapping results, a 45 46 trimming procedure was performed by using phred quality score<7. After trimming, the 47 48 APQSs ranged from 37.01 to 37.23. Then BurroughsWheeler alignment 18 , Samtools 19 , 49 50 20 51 and Genome Analysis Toolkit were used to map short reads to the human 52 53 transcriptome and RPKM values were generated for each of the 135,994 transcripts of 54 55 21,165 protein coding genes in Ensembl database (release version: Ensembl 64). 56 57 58 59 60 7

ACS Paragon Plus Environment Journal of Proteome Research Page 8 of 42

1 2 3 4 5 6 Targeted Transcriptomic Analysis of C19 Transcripts 7 8 A custom targeted oligonucleotide microarray platform was used to examine the 9 10 4,21 11 expression of 1,382 chromosome 19specific transcripts . A full description of our 12 4 13 targeted transcriptome profiling method was recently published . Importantly, the quality 14 15 of this platform has been rigorously evaluated in terms of dynamic range, discrimination 16 17 18 power, accuracy, reproducibility and specificity. The ability to reliably measure even low 19 20 levels of statistically significant differential stems from coupling (a) 21 22 stringently designed and quality controlled chip manufacturing and transcript labeling 23 24 25 protocols, (b) rigorous data analysis algorithms, and (c) flexible ontological and 26 27 interactome analyses capable of demonstrating significant correlations between the 28 29 expression of specific gene sets. 30 31 32 In these studies, the relative quantitation of individual transcript abundance in 12 33 34 GSC lines was compared to that derived from human neural stem cells (hNSCs, a kind 35 36 37 gift from Dr. Ping Wu, UTMB Dept. of Neuroscience). Three each mesenchymal (GSC2, 38 39 17, 37), classical (106, 11, 47), PN1 (811, 23, 46) and PN2 (13, 34, 35) were included 40 41 in the study. Briefly, total RNA extracted and purified from 12 defined gliomaderived 42 43 44 stem (GSC) cell lines and normal hNSCs was used as the substrate for RNA 45 46 amplification and labeling. We employed a universal reference design 22 and 47 48 comprehensive statistical analysis platforms to facilitate the acquisition of expression 49 50 51 profiles. Following hybridization and sequential highstringency washes, individual Cy3 52 53 and Cy5 fluorescence hybridization to each spot on the microarray was quantitated by a 54 55 high resolution confocal laser scanner. For each time point, RNA samples from each of 56 57 58 59 60 8

ACS Paragon Plus Environment Page 9 of 42 Journal of Proteome Research

1 2 3 lines were analyzed in triplicate. As each transcriptspecific oligonucleotide was also 4 5 6 spotted in triplicate on the array, there were a total of 27 individual expression 7 8 measurements per gene in each experimental group. Statistically significant 9 10 11 differentially expressed genes were identified using the permutationbased Significance 12 13 Analysis of Microarrays algorithm (SAM software package, v4.0, Stanford University, 14 15 Palo Alto, CA) 23 . In our analyses, appropriately normalized data were analyzed using 16 17 18 twoclass, unpaired analysis on a minimum of 5000 permutations and was performed by 19 20 comparing expression data derived from the different GSC lines versus hNSCs. In order 21 22 to maximize the information derived from the subsequent GoMinerbased ontological 23 24 25 analyses, the cutoff for significance in these experiments was set at a false discovery 26 27 rate (FDR) of approximately 10%. 28 29 Genes identified as differentially expressed by SAM analysis were examined for 30 31 32 their biological association to the (GO) categories as defined by the GO 33 34 Consortium. This provides both additional statistical stringency to the identified genes 35 36 37 and identifies groups of related genes or “gene families” on chromosome 19 which were 38 39 modulated in the various lines. Analyses were performed using the ontological mapping 40 41 software GoMiner 24 . This software calculated the enrichment or depletion of individual 42 43 44 ontological categories with genes that had changed expression and identified cellular 45 46 pathways potentially relevant to GSC development. Pathways within three independent 47 48 functional hierarchies, namely biological process, molecular function, and cellular 49 50 51 component, were queried. 52 53 54 55 Proteomic Analysis of GSCs 56 57 58 59 60 9

ACS Paragon Plus Environment Journal of Proteome Research Page 10 of 42

1 2 3 2x10 6 cells were lysed with RIPA buffer (Thermo Fisher Scientific, Rockford, IL, 4 5 6 25mM TrisHCl pH 7.6, 150 mM NaCl, 1% NP40, 1% sodium deoxycholate, 0.1% SDS) 7 8 mixed with Halt protease inhibitor EDTAfree, Halt phosphatase inhibitor cocktail and 9 10 11 Pierce universal nuclease (Thermo Fisher Scientific, Rockford, IL).The protein 12 13 concentration was determined by BCA Protein Assay Kit (Pierce), and the resulting 14 15 protein (100 g total protein) was reduced and alkylated. Five L of 200 mM tris (2 16 17 18 carboxyethyl) phosphine (TCEP) buffered with triethylammonium bicarbonate (TEAB) 19 20 was added to each sample (final TCEP concentration was 10 mM) and incubated at 55 21 22 °C for 1 h. Five L of 375 mM iodoacetamide (buffered with TEAB) was added and 23 24 25 incubated in the dark for 30 min. Proteins were precipitated in four volumes (440 L) of 26 27 ice cold acetone for 2 h at 20 °C. Samples were centrifuged at 10,000× g for 30 min (4 28 29 °C) after which the supernatants were removed and discarded. Pellets were air dried 30 31 32 and resuspended in 12.5 L of 8 M urea. Trypsin (10 g in 87.5 L of TEAB buffer) was 33 34 added, and the samples were incubated for 24 h at 37 °C. An external standard 35 36 37 comprised of proteins from all cell lines (“M37”) was used for relative quantitation. Block 38 39 randomization (random.org) was employed and M37 was included in each block. 40 41 Samples were analyzed in triplicate. 42 43 44 Chromatographic separation and mass spectrometric analysis was performed 45 46 with a nanoLC chromatography system (EasynLC 1000, Thermo Scientific), coupled 47 48 online to a hybrid linear ion trapOrbitrap mass spectrometer (Orbitrap Elite, Thermo 49 50 25 51 Scientific) through a NanoFlex II nanospray ion source (Thermo Scientific) . Mobile 52 53 phases were 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (ACN, 54 55 B). After equilibrating the column in 95% solvent A and 5% solvent B, the samples (5 µL 56 57 58 59 60 10

ACS Paragon Plus Environment Page 11 of 42 Journal of Proteome Research

1 2 3 in 5% v/v ACN/0.1% (v/v) formic acid in water, corresponding to 1 g cell protein digest) 4 5 6 were injected onto a trap column (C 18 , 100 µm ID×2 cm) and subsequently eluted (250 7 8 nL/min) by gradient elution onto a C 18 column (10 cm x 75 m ID,15 m tip, ProteoPep 9 10 11 II, 5m, 300 Å, , New Objective). The gradient was as follows: isocratic at 5% B, 08 12 13 min; 5% to 30% B, 8188 min; 30% to 95% B, 188220 min; and isocratic at 95% B, 14 15 220240 min. Total run time, including column equilibration, sample loading, and 16 17 18 analysis was 260 min. 19 20 All LCMS/MS data were acquired using XCalibur, version 2.7 SP1 (Thermo 21 22 Fisher Scientific). The survey scans ( m/z 3501650) (MS) were acquired in the Orbitrap 23 24 25 at 60,000 resolution (at m/z = 400) in profile mode, followed by top 10 Higher Energy 26 27 Collisional Dissociation (HCD) fragmentation centroid MS/MS spectra, acquired at 15K 28 29 resolution in datadependent analyses (DDA) mode. The automatic gain control targets 30 31 6 4 32 for the Orbitrap were 1 x 10 for the MS scans and 5 x 10 for MS/MS scans. The 33 34 maximum injection times for the MS1 and MS/MS scans in the Orbitrap both 200 ms. 35 36 37 For MS/MS acquisition, the following settings were used: parent threshold = 10,000; 38 39 isolation width = 4.0 Da; normalized collision energy = 30%; activation time = 100 ms. 40 41 Monoisotopic precursor selection, charge state screening, and charge state rejection 42 43 44 were enabled, with rejection of singly charged and unassigned charge states. Dynamic 45 46 exclusion was used to remove selected precursor ions (−/+10 ppm) for 90 s after 47 48 MS/MS acquisition. A repeat count of 1 and a maximum exclusion list size of 500 was 49 50 51 used. The following ion source parameters were used: capillary temperature 275 °C, 52 53 source voltage 2.2 kV, source current 100 uA, and Slens RF level 40%. 54 55 56 57 58 59 60 11

ACS Paragon Plus Environment Journal of Proteome Research Page 12 of 42

1 2 3 Proteomic Data Analysis 4 5 6 MS files (.raw) were imported into Progenesis LCMS (version 18.214.1528, Nonlinear 7 8 Dynamics) for m/z and retention time alignment. The top 5 spectra for each feature 9 10 11 were exported (charge deconvolution, top 1000 peaks) as a combined .mgf file for 12 13 database searching in PEAKS 26 (version 6, Bioinformatics Solutions Inc., Waterloo, ON) 14 15 against the UniprotKB/SwissprotHuman database (July 2013 version, 20,264 proteins), 16 17 18 appended with the cRAP contaminant database. PEAKS DB and Mascot (version 19 20 2.3.02, Matrix Science) searches were performed with a parent ion tolerance of 10 ppm, 21 22 fragment ion tolerance of 0.025 Da, fixed carbamidomethyl cysteine, and variable 23 24 25 modifications of oxidation (M), phosphorylation (STY), and deamidation (NQ). Trypsin 26 27 was specified as the enzyme, allowing for 2 missed cleavages and a maximum of 3 28 29 PTMs per peptide. An additional search for unexpected modifications was performed 30 31 32 with the entire Unimod database. Finally, homology searching was performed using the 33 34 SPIDER algorithm 27 to identify peptides resulting from nonspecific cleavages or amino 35 36 37 acid substitutions. Mascot and PEAKS SPIDER searches were combined (inChorus), 38 39 using a 1% false discovery rate cutoff for both search engines. The resulting peptide 40 41 spectrum matches (95% peptide probability) were imported into Progenesis LCMS. 42 43 44 Conflict resolution was performed manually to ensure that a single peptide sequence 45 46 was assigned to each feature by removing lower scoring peptides. The resulting 47 48 normalized peptide intensity data were exported, and the peptide list was filtered to 49 50 51 remove nonunique peptides, methioninecontaining peptides, and all modified peptides 52 53 except cysteine carbamidomethylation. For quantification, the filtered list of peptide 54 55 intensities was imported into DanteR (version 0.1.1) 28 , and intensities for peptides of the 56 57 58 59 60 12

ACS Paragon Plus Environment Page 13 of 42 Journal of Proteome Research

1 2 3 same sequence were combined to form a single entry. The resulting peptide intensities 4 5 6 were log 2 transformed and combined to protein abundances (RRollup) using the default 7 8 settings, excluding onehit wonders (50% minimum presence of at least one peptide, 9 10 11 minimum dataset presence 3, pvalue cutoff of 0.05 for Grubbs’ test, minimum of 5 12 13 peptides for Grubbs’ test). The resulting proteins were quantified by 1way ANOVA 14 15 relative to M37; pvalue adjustment for multiple testing was performed according to 16 17 29 18 Benjamini and Hochberg . Chromosome 19 proteins were then subjected to further 19 20 bioinformatic analysis as described in the following section. 21 22 For identification of chromosome 19 proteins, peptides modified by acetylation, 23 24 25 phosphorylation, methionine oxidation, and Nterminal pyroglutamate from glutamine 26 27 were considered in addition to peptides considered for quantification. BLAST searches 28 29 were conducted in those instances where protein assignments were based on one or 30 31 32 two peptides in order to confirm uniqueness. The list of all identified proteins, along with 33 34 number of peptides supporting protein assignments, can be found in S Table 2 . The 35 36 37 data have been deposited to the ProteomeXchange via the PRIDE partner repository 38 39 with identifier PXD000563. 40 41 42 43 44 Bioinformatic Analysis of Transcriptomic and Proteomic Data 45 46 For comparison of protein expression for cell lines measured in separate 47 48 analytical sets, unsupervised hierarchical clustering and PCA analysis was performed 49 50 51 using fold changes relative to M37 for all proteins measured in the six cell lines. Fold 52 53 change values were standardized to Zscores, imported into DanteR, and mean 54 55 centered to 0. Hierarchical clustering was performed for both proteomic and 56 57 58 59 60 13

ACS Paragon Plus Environment Journal of Proteome Research Page 14 of 42

1 2 3 transcriptomic data by use of a Euclidean distance metric, an average agglomeration 4 5 6 method, and no row scaling, with the results visualized as heat maps. 7 8 9 10 11 Ingenuity Pathway Analysis 12 13 Normalized quantitative data sets were analyzed by use of Ingenuity Pathway 14 15 Analysis (Ingenuity Systems, version 16542223, build 220217, 22 June 2013, 16 17 18 (www.ingenuity.com). The data set contained protein identifiers, fold changes relative to 19 20 M37 and ANOVA pvalues. The protein networks were created using proteins with p 21 22 value <0.05 and molecular interactions described in the scientific literature. Networks 23 24 25 represent a highly interconnected set of proteins derived from the input data set. 26 27 28 29 Results and Discussion 30 31 32 Chromosome 19 Targeted Transcriptomics 33 34 Of the 1,382 chromosome 19 genes analyzed, between 7075% were expressed 35 36 37 in each of the cell lines. Utilizing the reference design, a key element of our 38 39 transcriptomic platform, affords us the ability to customize our analyses to identify 40 41 statistically significantly differential chromosome 19specific gene expression patterns 42 43 44 between the phenotypically diverse stem cell subtypes. These tailored analyses allow 45 46 us potential insights into important questions such as; (i) what are the differences 47 48 between each of the GSC subtypes and normal human neural stem cells (e.g., 49 50 51 transcripts associated with either “stemness” or frank differences between neural and 52 53 glialderived cells), and (ii) what are the differences between each of the GSC subtypes 54 55 56 57 58 59 60 14

ACS Paragon Plus Environment Page 15 of 42 Journal of Proteome Research

1 2 3 (i.e., genes potentially involved in glioma stem cell progression). For this 4 5 6 communication, we will primarily focus on the former. 7 8 The identity of the differentially expressed transcripts for PN1, PN2, classical, 9 10 11 and mesenchymal GSC subtypes as compared to hNSCs (at a 10% False Discovery 12 13 Rate; FDR) are listed in Supplementary Table 1. A significant proportion (~20%) of the 14 15 transcripts were differentially expressed in the proneural (PN1 and PN2) and classical 16 17 18 subtypes when compared to those expressed in hNSCs. The identity of these 19 20 transcripts, as well as the gene families represented by these genes were largely 21 22 similar. In stark contrast, the mesenchymal GSCs were clearly differentiated from the 23 24 25 proneural and classical GSCs at the level of both proportion of differentially expressed 26 27 transcripts identified (>30%, Table 1 ), as well as the identity of those transcripts. 28 29 GoMinerbased ontological analyses of the mesenchymal (M) GSCs compared to both 30 31 32 hNSCs and the neural (PN1) GSCs ( Table 2 ) demonstrated enrichment in several 33 34 molecular pathways with established relevance to stem cell genesis and progression. Of 35 36 37 particular interest, increases in the expression of specific genes related to carbohydrate 38 39 binding were significantly associated with the mesenchymal phenotype when compared 40 41 to both hNSCs and the less invasive proneural GSCs. The expression of cell surface 42 43 44 glycoconjugates are well established to play a major role in the modulation of the 45 46 invasive phenotype 30 . There are also several unique molecular pathways identified in 47 48 this analysis. For example, despite the large number genes encoding transcription 49 50 51 factors present on chromosome 19 (the large number can skew significance values), 52 53 there is still a very significant modulation of their expression in the mesenchymal GSCs. 54 55 56 57 58 59 60 15

ACS Paragon Plus Environment Journal of Proteome Research Page 16 of 42

1 2 3 Chromosome 19 transcripts potentially encoding candidate proteins as yet 4 5 6 unidentified (open reading frames; ORFs) were also queried in these analyses. Of 7 8 particular interest, of the 43 ORFs represented on the arrays, 31 (72%) were expressed 9 10 11 in the GSC lines. Of these, 17 (55%) were differentially expressed in 2 or more of the 12 13 subtypes. The remaining transcripts were present in the only one of the subtypes. Our 14 15 integrated platform will not only facilitate the identification of these novel proteins, but 16 17 18 help elucidate their role in the development of highly malignant glial tumors. 19 20 Lastly, the expression of 12,042 genes of 481 GBM samples and 10 solid tissue 21 22 normal samples was detected using Affymetrix human genome U133A platform, 23 24 25 downloaded from TCGA data portal (https://tcgadata.nci.nih.gov/tcga/). Among these 26 27 genes, 762 genes are located on chromosome 19. 144, 155, 83 and 99 samples of the 28 29 481 GBM tumors were classified into classical, mesenchymal, neural, proneural 30 31 31 32 subtypes respectively . 30%~40% of the genes were differentially expressed in 33 34 proneural, classical, mesenchymal and classical subtypes when compared to those 35 36 37 expressed in normal samples. Mesenchymal subtypes were differentially expressed 38 39 from the proneural subtype, which is highly consistent with the finding in GSCs. 40 41 42 43 44 Identification of No- to Medium-Evidence Chromosome 19 Proteins 45 46 We detected 213 proteins and quantified (2 unique peptides/protein) 184 47 48 chromosome 19 proteins at high confidence expressed in six GSC lines 49 50 51 (Supplementary Tables 2 and 3 ) and queried the Human Protein Atlas (HPA, 52 53 proteinatlas.org) and neXtProt (www.nextprot.org) to examine evidence for existence of 54 55 these proteins. For all identified proteins, evidence of existence, according to HPA and 56 57 58 59 60 16

ACS Paragon Plus Environment Page 17 of 42 Journal of Proteome Research

1 2 3 neXtProt, and peptide counts can be found in S Table 2 and in the following discussion. 4 5 6 Several identified proteins had been seen previously only at the RNA level or were of 7 8 low confidence ( Table 3 ). Of the verified protein hits, protein tweety homolog 1 (TTYH1) 9 10 11 and neurocan (NCAN) are low confidence entries. Mucin16, ribonucleoprotein PTB 12 13 binding 1 and E3 ubiquitinprotein ligase (MUC16, RAVER1 and UHRF1, respectively) 14 15 have no record in HPA, but do show protein evidence in neXtProt. Two proteins, 16 17 18 ceramide synthase 1 (CERS1) and longchain fatty acid transport protein 1 (SLC27A1) 19 20 have only RNA evidence in HPA, but both show evidence at the protein level in 21 22 Proteomics DB (proteomicsdb.org). 23 24 25 Neurocan (NCAN, core protein of an extracellular matrix protein) is an intriguing 26 27 finding, because it is a brainspecific chondroitin sulfate proteoglycan that is believed to 28 29 mediate neuronal adhesion and neurite growth during development. At the mRNA level, 30 31 32 increased levels of NCAN have been seen in nonsmallcell lung cancer brain 33 34 metastases and in astrocytoma tissues, linking it to tumor invasivity 4. Our original list of 35 36 37 proteins included RUXGL_HUMAN. However, upon performing a BLAST search, we 38 39 found that the peptides assigned to RUXGL were also shared by RUXG_HUMAN. This 40 41 illustrates the caution that must be exercised when validating the identification of newly 42 43 44 identified proteins. 45 46 A large number of proteins (87) display medium evidence for existence 47 48 (Supplementary Table 2 ). Within the chromosome 19 proteomic data, we found 49 50 51 expression of zincfinger proteins listed as having medium levels of protein evidence. 52 53 Certain zinc finger proteins play a central role in aberrant signaling processes in 54 55 glioma 32 . ZNF428 (C19orf37) has few findings in the literature. Interestingly, this protein 56 57 58 59 60 17

ACS Paragon Plus Environment Journal of Proteome Research Page 18 of 42

1 2 3 was relatively decreased in GSC2, 11 and 13, a distribution which does not coincide 4 5 6 with the current classification of GSCs. 7 8 Five ORFs in the proteomic dataset were also detected at the transcript level 9 10 11 [C19orf10, C19orf43, C19orf1 (TOMM40), C19orf5 (MAP1S), and C19orf37 (ZNF428)]. 12 13 C19orf10, a high evidence protein, encodes a stromalderived growth factor, previously 14 15 detected in bone marrow and synovial fluid 33 . Its expression in GSCs has not been 16 17 18 described previously, and its concentration was within a twofold range across the cell 19 20 lines. It may be of interest to glioma pathology, because it promotes lymphoid cell 21 22 proliferation. C19orf43 is a mediumlevel evidence 18 kDa protein to which no biological 23 24 25 function has yet been ascribed. This protein was decreased ~20fold in GSC11 relative 26 27 to M37, whereas another protein, C19orf1 (TOMM40) was highly increased in this cell 28 29 line. It encodes an import protein integral to the outer mitochondrial membrane. 30 31 32 33 34 Quantitative Proteomics 35 36 37 For the six cell lines, 161 chromosome 19 proteins were quantified relative to 38 39 M37 in at least one cell line (Supplementary Table 3); volcano plots illustrating fold 40 41 changes are shown in Figure 2. In three of the six cell lines (GSC2, Figure 2A; GSC11, 42 43 44 Figure 2C; and GSC13, Figure 2D), the protein demonstrating the highest relative fold 45 46 change is symplekin (SYMPK), a component of the mRNA polyadenylation machinery 47 48 that has been associated with tumorigenicity in colon 34 and lung 35 cancers. Our findings 49 50 51 represent the first report of SYMPK in gliomas, making it a target for our followup 52 53 studies. GSC2, GSC11, and GSC13 also demonstrated decreased levels of ZNF428 54 55 relative to M37. Along with GSC23, these same cell lines showed significantly 56 57 58 59 60 18

ACS Paragon Plus Environment Page 19 of 42 Journal of Proteome Research

1 2 3 decreased levels of far upstream elementbinding protein 2 (KHSRP), a key regulator of 4 5 36 6 miRNAs in the DNA damage response . 7 8 In order to compare the six GSC cell lines to one another, we performed principal 9 10 11 component analysis (PCA, Figure 3) and hierarchical clustering analysis ( Figure 4B) at 12 13 the protein level. Hierarchical clustering was also performed at the transcript level 14 15 (Figure 4A ). In the 3D PCA plot, three cell lines (GSC23, 11, and 811) were the most 16 17 18 proximal, indicating a high degree of similarity in overall protein expression patterns. 19 20 The other cell lines (GSC2, 13 and 17) were distinctly different from each other and 21 22 from the three other cell lines. In the heatmaps for both the proteomic and 23 24 25 transcriptomic data ( Figure 4 ), the two mesenchymal cell lines (GSC2 and GSC17) 26 27 clustered together, as did the (pro)neural cell lines. Taken together, these results 28 29 indicate that the various GSC cell line classes show similar expression patterns for 30 31 32 chromosome 19 proteins. 33 34 35 36 37 Ingenuity Pathway Analysis 38 39 We examined putative functional roles of chromosome 19 proteins expressed in 40 41 GSC lines by studying them in the context of Ingenuity Pathway Analysis (Figure 5). 42 43 44 Notably, there are several chromosome 19 proteins linked in GSC2 to the RB tumor 45 46 suppressor (Rb) in the extended network. This pathway is universally disrupted in 47 48 glioma 37 . 49 50 51 52 53 Upregulation of Chromosome 19 Proteins in Response to Oncolytic Adenovirus 54 55 Treatment 56 57 58 59 60 19

ACS Paragon Plus Environment Journal of Proteome Research Page 20 of 42

1 2 3 We studied the effect of oncolytic adenovirus therapy on four GSC lines (GSC2, 4 5 6 GSC11, GSC13, and GSC23). Delta24RGD is a replicationcompetent adenovirus that 7 8 targets the RB pathway in gliomas 38 . Integration of adenovirus requires two trans acting 9 10 11 viral proteins (Rep68 and Rep78) and cis acting DNA elements that display Rep binding 12 39 13 sites . The integration may occur in the AAVS1 site on chromosome 19q13.4, in exon1 14 15 of the protein phosphatase 1 regulatory subunit 12C PPP1R12C) 40 . We found that 16 17 18 proteins derived from genes downstream of the AAVS1 site were upregulated in GSC11 19 20 but not three other GSC lines, at 24 hrs and 48 hrs after infection with Delta24RGD 21 22 virus (Figure 6). These results indicate that the patient from whom GSC11 was derived 23 24 25 had a previous CNS infection by adenovirus. It is not yet known if previous adenovirus 26 27 infection may influence the efficacy of Delta24RGD in larger patient groups. These 28 29 unexpected results serve to demonstrate how new biological insights can be derived by 30 31 32 examining protein expression in the context of chromosomal localization of the encoding 33 34 elements. 35 36 37 38 39 Conclusion 40 41 We have defined patterns of chromosome 19 mRNA and protein expression in several 42 43 44 GSC lines. The cells provided a source of chromosome 19 mRNA and proteins that 45 46 demonstrated expression of previously uncharacterized gene products. Our results 47 48 underscore the importance of studying a variety of healthy and diseased tissues in the 49 50 51 context of the CHPP. Furthermore, we detected differential regulation of chromosome 52 53 19 activity in the context of histological subtypes and in response to treatment with 54 55 oncolytic adenovirus. Our results have expanded the knowledge of the role of 56 57 58 59 60 20

ACS Paragon Plus Environment Page 21 of 42 Journal of Proteome Research

1 2 3 chromosome 19 beyond the wellknown impact of 1p/19q codeletion. Our findings may 4 5 6 have relevance for selection of GSC lines for testing responses to preclinical and 7 8 clinical compounds. 9 10 11 12 13 Acknowledgements 14 15 Support from the Cancer Prevention and Research Institute of Texas (CLN) and the 16 17 18 University of Texas Medical Branch (CLN) is gratefully acknowledged, as are the 19 20 Biomolecular Resource Facility Mass Spectrometry Lab (CFL) and the Department of 21 22 Pharmacology & Toxicology (CFL). The DanTE software was written by Tom Taverner 23 24 25 ([email protected]) and Ashoka Polpitiya for the U.S. Department of Energy 26 27 (PNNL, Richland, WA, USA, http://omics.pnl.gov/software ). Dr. Ping Wu (UTMB, 28 29 Department of Neuroscience & Cell Biology) is gratefully acknowledged for the gift of 30 31 32 hNSCs. 33 34 35 36 37 Figure legends 38 39 Tables. 40 41 Table 1. Numbers of differentially expressed chromosome 19 transcripts by molecular 42 43 44 histological subclasses compared to hNSCs. 45 46 Table 2. GoMinerbased ontological analyses of the mesenchymal (M) GSCs compared 47 48 to hNSCs and the neural (PN1) GSCs. 49 50 51 Table 3. Proteins identified in six GSC lines that are described as having low or no 52 53 protein evidence in the Human Protein Atlas version 11.0, with additional evidence from 54 55 neXtProt. 56 57 58 59 60 21

ACS Paragon Plus Environment Journal of Proteome Research Page 22 of 42

1 2 3 4 5 6 Figures. 7 8 Figure 1. Workflow for integrated proteomic and transcriptomic analysis of GSC cell 9 10 11 lines. Glioma stem cell lines, derived from patient tumor samples, were analyzed by 12 13 three approaches: targeted chromosome 19 microarray (1), quantitative proteomics (2), 14 15 and RNASeq (3). Identification and quantification were performed at the transcript (4) 16 17 18 and protein (5) levels, and a custom protein database was generated from the RNASeq 19 20 data (6). Comparisons were made between transcript and protein data (7), and the 21 22 custom protein database was used to search for proteins (8). Validated protein 23 24 25 identifications were queried against protein databases in order to determine levels of 26 27 protein evidence (9), and quantitative proteomic data were used to generate networks in 28 29 Ingenuity Pathway Analysis (10). 30 31 32 Figure 2. Volcano plots ( p value vs log 2 fold change) for GSC2 ( A), GSC811 ( B), 33 34 GSC11 (C), GSC13 (D), GSC17 (E), and GSC23 (F) relative to M37. Proteins that are 35 36 37 differentially expressed relative to M37 include peptidylprolyl cistrans isomerase 38 39 NIMAinteracting 1 (PIN1), nuclear factor 1 Xtype (NFIX1), E3 ubiquitinprotein ligase 40 41 UHRF1, epidermal growth factor receptor substrate 15like 1 (EPS15L1), 42 43 44 lysophospholipid acyltransferase 7 (MBOAT7), calpain small subunit 1 (CAPNS1), 40S 45 46 ribosomal protein S28 (RPS28), ribonuclease H2 subunit A( RNASEH2A), 47 48 isochorismatase domaincontaining protein 2, mitochondrial (ISOC2), immunityrelated 49 50 51 GTPase family Q protein (IRGQ), and DNA repair protein XRCC1. A complete list of 52 53 measured fold changes and pvalues can be found in Supplementary Table 3 . 54 55 56 57 58 59 60 22

ACS Paragon Plus Environment Page 23 of 42 Journal of Proteome Research

1 2 3 Figure 3. Threedimensional principal component analysis (PCA) plot, demonstrating 4 5 6 variation between GSC2 (red), GSC811 (yellow), GSC11 (green), GSC13 (cyan), 7 8 GSC17 (blue), and GSC23 (pink) based upon fold change relative to M37 for proteins 9 10 11 identified in all 6 cell lines. 12 13 Figure 4. Heat map of C19 transcripts ( A) and proteins ( B) for six GSC cell lines. 14 15 Chromosome 19 heat map for six GSC cell lines. Unsupervised hierarchical clustering 16 17 18 was performed using log 2 fold changes relative to hNSCs (A) and Zscaled fold changes 19 20 relative to M37 ( B) for proteins identified in all 6 GSC cell lines. 21 22 Figure 5. Network generated by use of Ingenuity Pathway Analysis. Gene identifiers for 23 24 25 all chromosome 19 proteins identified in GSC2 sample were uploaded and those 26 27 proteins with a pvalue< 0.05 were considered. Fifteen proteins were associated to 28 29 proteins in the retinoblastoma group (Rb), a pathway that is universally disrupted in 30 31 32 glioma. The associated proteins included retinoblastomalike protein 1 (RBL1), 33 34 retinoblastomaassociated protein (RB1), transcription factors AP1 (JUN) and activator 35 36 37 BRG1 (SMARCA4), and several proteins involved in cell cycle control. 38 39 Figure 6. Upregulation of Chromosome 19q13 proteins in GSC11 in response to 40 41 treatment with oncolytic virus. Fold changes at 24h (black bars) and 48h (red bars) are 42 43 44 calculated relative to control. The most dramatically increased protein was rRNA 2'O 45 46 methyltransferase fibrillarin (FBL), a protein involved in rRNA preprocessing. Other 47 48 proteins more highly expressed at 24h include SUMOactivating enzyme subunit 2 49 50 51 (UBA2) and 26S protease regulatory subunit 6B (PSMC4). 52 53 54 55 References 56 57 58 59 60 23

ACS Paragon Plus Environment Journal of Proteome Research Page 24 of 42

1 2 3 (1) Paik, Y.-K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Marko-Varga, G.; Aebersold, R.; Bairoch, A.; 4 Yamamoto, T.; Legrain, P.; Lee, H.-J.; Na, K.; Jeong, S.-K.; He, F.; Binz, P.-A.; Nishimura, T.; Keown, P.; 5 6 Baker, M. S.; Yoo, J. S.; Garin, J.; Archakov, A.; Bergeron, J.; Salekdeh, G. H.; Hancock, W. S. J. Proteome 7 Res. 2012 , 11 , 2005-2013. 8 (2) Paik, Y.-K.; Hancock, W. S. Nat. Biotechnol. 2012 , 30 , 1065-1067. 9 (3) Consortium, E. P. Nature 2012 , 489 , 57-74. 10 (4) Nilsson, C. L.; Berven, F.; Selheim, F.; Liu, H.; Moskal, J. R.; Kroes, R. A.; Sulman, E. P.; Conrad, C. A.; 11 Lang, F. F.; Andrén, P. E.; Nilsson, A.; Carlsohn, E.; Lilja, H.; Malm, J.; Fenyö, D.; Subramaniyam, D.; Wang, 12 13 X.; Gonzales-Gonzales, M.; Dasilva, N.; Diez, P.; Fuentes, M.; Végvári, Á.; Sjödin, K.; Welinder, C.; Laurell, 14 T.; Fehniger, T. E.; Lindberg, H.; Rezeli, M.; Edula, G.; Hober, S.; Marko-Varga, G. J. Proteome Res. 2012 , 15 12 , 135-150. 16 (5) Vranova, V.; Necesalova, E.; Kuglik, P. Oncol Rep. 2007 , 17 , 457-464; Ruano, Y.; Mollejo, M.; Ribalta, 17 T.; Fiano, C.; Camacho, F.; Gomez, E.; de Lope, A.; Hernandez-Moneo, J.-L.; Martinez, P.; Melendez, B. 18 Mol. Cancer 2006 , 5, 39; Hui, A. B.; Lo, K. W.; Yin, X. L.; Poon, W. S.; Ng, H. K. Lab. Invest. 2001 , 81 , 717- 19 723. 20 21 (6) Ino, Y.; Zlatescu, M. C.; Sasaki, H.; Macdonald, D. R.; Stemmer-Rachamimov, A. O.; Jhung, S.; Ramsay, 22 D. A.; von Deimling, A.; Louis, D. N.; Cairncross, J. G. J. Neurosurg. 2000 , 92 , 983-990. 23 (7) Yip, S.; Iafrate, A. J.; Louis, D. N. J. Neuropathol. Exp. Neurol. 2008 , 67 , 1-15 24 10.1097/nen.0b013e31815f65fb. 25 (8) Danussi, C.; Akavia, U. D.; Niola, F.; Jovic, A.; Lasorella, A.; Pe'er, D.; Iavarone, A. Cancer Res. 2013 . 26 (9) Bhatia, M.; Wang, J. C. Y.; Kapp, U.; Bonnet, D.; Dick, J. E. Proc. Natl. Acad. Sci. USA 1997 , 94 , 5320- 27 1997 28 5325; Bonnet, D.; Dick, J. E. Nat. Med. , 3, 730-737; Lapidot, T.; Sirard, C.; Vormoor, J.; Murdoch, B.; 1994 29 Hoang, T.; Caceres-Cortes, J.; Minden, M.; Paterson, B.; Caligiuri, M. A.; Dick, J. E. Nature , 367 , 645- 30 648. 31 (10) Singh, S. K.; Hawkins, C.; Clarke, I. D.; Squire, J. A.; Bayani, J.; Hide, T.; Henkelman, R. M.; Cusimano, 32 M. D.; Dirks, P. B. Nature 2004 , 432 , 396-401. 33 (11) Sulman, E.; Aldape, K.; Colman, H. Curr.t Probl. Cancer 2008 , 32 , 124-142. 34 (12) TCGA. Nature 2008 , 455 , 1061-1068; Network, T. C. G. A. R. Nature 2013 , 494 , 506-506. 35 36 (13) Fagerberg, L.; Oksvold, P.; Skogs, M.; Älgenäs, C.; Lundberg, E.; Pontén, F.; Sivertsson, Å.; Odeberg, 37 J.; Klevebring, D.; Kampf, C.; Asplund, A.; Sjöstedt, E.; Al-Khalili Szigyarto, C.; Edqvist, P.-H.; Olsson, I.; 38 Rydberg, U.; Hudson, P.; Ottosson Takanen, J.; Berling, H.; Björling, L.; Tegel, H.; Rockberg, J.; Nilsson, P.; 39 Navani, S.; Jirström, K.; Mulder, J.; Schwenk, J. M.; Zwahlen, M.; Hober, S.; Forsberg, M.; von Feilitzen, K.; 40 Uhlén, M. J. Proteome Res. 2012 , 12 , 2439-2448. 41 (14) Gaudet, P.; Argoud-Puy, G.; Cusin, I.; Duek, P.; Evalet, O.; Gateau, A.; Gleizes, A.; Pereira, M.; Zahn- 42 Zabal, M.; Zwahlen, C.; Bairoch, A.; Lane, L. Journal of proteome research 2013 , 12 , 293-8; Lane, L.; 43 44 Argoud-Puy, G.; Britan, A.; Cusin, I.; Duek, P. D.; Evalet, O.; Gateau, A.; Gaudet, P.; Gleizes, A.; Masselot, 45 A.; Zwahlen, C.; Bairoch, A. Nucleic acids research 2012 , 40 , D76-83. 46 (15) Galli, R.; Binda, E.; Orfanelli, U.; Cipelletti, B.; Gritti, A.; De Vitis, S.; Fiocco, R.; Foroni, C.; Dimeco, F.; 47 Vescovi, A. Cancer Res. 2004 , 64 , 7011-7021. 48 (16) Jiang, H.; Gomez-Manzano, C.; Aoki, H.; Alonso, M. M.; Kondo, S.; McCormick, F.; Xu, J.; Kondo, Y.; 49 Bekele, B. N.; Colman, H.; Lang, F. F.; Fueyo, J. J. Natl. Cancer Inst. 2007 , 99 , 1410-1414. 50 1998 51 (17) Ewing, B.; Hillier, L.; Wendl, M. C.; Green, P. Genome Res. , 8, 175-185. 52 (18) Li, H.; Durbin, R. Bioinformatics 2009 , 25 , 1754-1760. 53 (19) Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, 54 R.; Subgroup, G. P. D. P. Bioinformatics 2009 , 25 , 2078-2079. 55 (20) McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; 56 Altshuler, D.; Gabriel, S.; Daly, M.; DePristo, M. A. Genome Res. 2010 , 20 , 1297-1303. 57 (21) Kroes, R. A.; Dawson, G.; Moskal, J. R. J. Neurochem. 2007 , 103 , 14-24. 58 59 60 24

ACS Paragon Plus Environment Page 25 of 42 Journal of Proteome Research

1 2 3 (22) Churchill, G. A. Nat. Genet. 2002 , 32 , S490-495. 4 (23) Tusher, V. G.; Tibshirani, R.; Chu, G. Proc. Natl. Acad. Sci. USA 2001 , 98 , 5116-5121. 5 6 (24) Zeeberg, B.; Feng, W.; Wang, G.; Wang, M.; Fojo, A.; Sunshine, M.; Narasimhan, S.; Kane, D.; 7 Reinhold, W.; Lababidi, S.; Bussey, K.; Riss, J.; Barrett, J.; Weinstein, J. Genome Biol. 2003 , 4, R28. 8 (25) Michalski, A.; Damoc, E.; Lange, O.; Denisov, E.; Nolting, D.; Müller, M.; Viner, R.; Schwartz, J.; 9 Remes, P.; Belford, M.; Dunyach, J.-J.; Cox, J.; Horning, S.; Mann, M.; Makarov, A. Mol. Cell. Proteomics 10 2012 , 11 . 11 (26) Zhang, J.; Xin, L.; Shan, B.; Chen, W.; Xie, M.; Yuen, D.; Zhang, W.; Zhang, Z.; Lajoie, G. A.; Ma, B. 12 2012 2011 13 Mol. Cell. Proteomics , 11 ; Han, X.; He, L.; Xin, L.; Shan, B.; Ma, B. J. Proteome Res. , 10 , 2930- 14 2936. 15 (27) Han, Y.; Ma, B.; Zhang, K. J. Bioinform. Comput. Biol. 2005 , 3, 697-716. 16 (28) Karpievitch, Y.; Stanley, J.; Taverner, T.; Huang, J.; Adkins, J. N.; Ansong, C.; Heffron, F.; Metz, T. O.; 17 Qian, W.-J.; Yoon, H.; Smith, R. D.; Dabney, A. R. Bioinformatics 2009 , 25 , 2028-2034; Polpitiya, A. D.; 18 Qian, W.-J.; Jaitly, N.; Petyuk, V. A.; Adkins, J. N.; Camp, D. G.; Anderson, G. A.; Smith, R. D. 19 Bioinformatics 2008 , 24 , 1556-1558. 20 1995 21 (29) Benjamini, Y.; Hochberg, Y. J. Royal Stat. Soc. Series B , 57 , 289-300. 22 (30) Moskal, J. R.; Kroes, R. A.; Dawson, G. Expert Revi. Neurother. 2009 , 9, 1529-1545. 23 (31) Verhaak, R. G. W.; Hoadley, K. A.; Purdom, E.; Wang, V.; Qi, Y.; Wilkerson, M. D.; Miller, C. R.; Ding, 24 L.; Golub, T.; Mesirov, J. P.; Alexe, G.; Lawrence, M.; O'Kelly, M.; Tamayo, P.; Weir, B. A.; Gabriel, S.; 25 Winckler, W.; Gupta, S.; Jakkula, L.; Feiler, H. S.; Hodgson, J. G.; James, C. D.; Sarkaria, J. N.; Brennan, C.; 26 Kahn, A.; Spellman, P. T.; Wilson, R. K.; Speed, T. P.; Gray, J. W.; Meyerson, M.; Getz, G.; Perou, C. M.; 27 2010 28 Hayes, D. N. Cancer Cell , 17 , 98-110. 29 (32) Santoni, M.; Buerattini, L.; Nabissi, M.; Morelli, M. B.; Berardi, R.; Santoni, G.; Cascinu, S. Curr. 30 Protein Pept. Sci. 2013 , 14 , 133-140. 31 (33) Weiler, T.; Du, Q.; Krokhin, O.; Ens, W.; Standing, K.; El-Gabalawy, H.; Wilkins, J. Arthritis Res. Ther. 32 2007 , 9, R30; Tulin, E. E.; Onoda, N.; Nakata, Y.; Maeda, M.; Hasegawa, M.; Nomura, H.; Kitamura, T. J. 33 Immunol. 2001 , 167 , 6338-6347. 34 (34) Buchert, M.; Papin, M.; Bonnans, C.; Darido, C.; Raye, W. S.; Garambois, V.; Pélegrin, A.; Bourgaux, 35 2010 36 J.-F.; Pannequin, J.; Joubert, D.; Hollande, F. Proc. Natl. Acad. Sci. USA , 107 , 2628-2633. 37 (35) Cappell, K. M.; Larson, B.; Sciaky, N.; Whitehurst, A. W. Mol. Cell. Biol. 2010 , 30 , 5135-5144. 38 (36) Zhang, X.; Wan, G.; Berger, F. G.; He, X.; Lu, X. Mol. Cell 2011 , 41 , 371-383. 39 (37) Goldhoff, P.; Clarke, J.; Smirnov, I.; Berger, M. S.; Prados, M. D.; James, C. D.; Perry, A.; Phillips, J. J. 40 J. Neuropathol. Exp. Neurol. 2012 , 71 , 83-89. 41 (38) Jiang, H.; Gomez-Manzano, C.; Lang, F. F.; Alemany, R.; Fueyo, J. Curr. Gene Ther. 2009 , 9, 422-427. 42 (39) Janovitz, T.; Klein, I. A.; Oliveira, T.; Mukherjee, P.; Nussenzweig, M. C.; Sadelain, M.; Falck- 43 2013 44 Pedersen, E. J. Virol. , 87 , 8559-8568; Surosky, R. T.; Urabe, M.; Godwin, S. G.; McQuiston, S. A.; 45 Kurtzman, G. J.; Ozawa, K.; Natsoulis, G. J. Virol. 1997 , 71 , 7951-9; Urabe, M.; Kogure, K.; Kume, A.; Sato, 46 Y.; Tobita, K.; Ozawa, K. J. Gen. Virol. 2003 , 84 , 2127-2132. 47 (40) Linden, R. M.; Ward, P.; Giraud, C.; Winocour, E.; Berns, K. I. Proc. Natl. Acad. Sci. USA 1996 , 93 , 48 11288-11294; Samulski, R. J.; Zhu, X.; Xiao, X.; Brook, J. D.; Housman, D. E.; Epstein, N.; Hunter, L. A. 49 EMBO J. 1991 , 10 , 5071-5078. 50 51 52 53 54 55 56 57 58 59 60 25

ACS Paragon Plus Environment Journal of Proteome Research Page 26 of 42

1 2 3 TOC Graphic 4 5 6 7 8 9 Chromosome 19 Expression in Glioma Stem Cells 10 11 12 13 14 15 16 17 18 19 20 RNAseq 21 22 Targeted C19 Transcriptomics 23 24 25 26 27 28 Quantitative Proteomics 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 26

ACS Paragon Plus Environment Page 27 of 42 Journal of Proteome Research

1 2 3 4 Patient tumor samples 5 6 7 8 9 Glioma stem cell (GSC) lines 10 11 12 13 14 15 Transcriptomics Proteomics RNA-Seq 16 17 18 19 20 C19 Transcript C19 Protein 21 Custom protein 22 identification and identification and 23 database 24 quantification quantification 25 26 27 28 29 Check level of Ingenuity pathway 30 31 protein evidence analysis 32 33 34 35 36 37 38 39 40 ACS Paragon Plus Environment 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Proteome Research Page 28 of 42

1 Figure 2A (GSC2) 2 3 4 CALM3 5 1e-8 6 7 8 1e-7 9 10 KHSRP 11 1e-6 12 13 14 1e-5 15 16 17 1e-4 18 ZNF428 PIN1 19 SYMPK 20 1e-3 MRI1 21

22 P value 23 1e-2 24 25 P=0.05 26 1e-1 27 28 29 1e+0 30 31 32 33 1e+1 34 35 -4 -2 0 2 4 6 36 37 38 39 Log2 Fold Change 40 ACS Paragon Plus Environment 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 29 of 42 Journal of Proteome Research

1 Figure 2B (GSC8-11) 2 3 4 5 6 1e-6 7 8 9 UHRF1 10 1e-5 MBOAT7 11 NFIX 12 EPS15L1 13 14 1e-4 15 CAPNS1 16 17 1e-3 18 PLIN3 19 20 21 ILVBL 22 1e-2

23 P value 24 P=0.05 25 26 1e-1 27 28 29 30 1e+0 31 32 33 34 1e+1 35 36 -4 -2 0 2 4 6 37 38 39 40 Log2ACS Paragon Fold Plus Environment Change 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Proteome Research Page 30 of 42

1 Figure 2C (GSC11) 2 3 4 5 6 1e-6 7 8 9 10 1e-5 CALM3 11 12 13 14 1e-4 NUCB1 HSPBP1 15 16 C19orf43 17 18 1e-3 SYMPK 19 ISYNA1 DNMT1 20 21 RPS28 22 1e-2

23 P value TOMM40 24 P=0.05 25 26 1e-1 27 28 29 30 1e+0 31 32 33 34 1e+1 35 36 -8 -6 -4 -2 0 2 4 6 8 37 38 39 40 Log2ACS Paragon Fold Plus Environment Change 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 31 of 42 Journal of Proteome Research

1 Figure 2D (GSC13) 2 3 4 5 1e-8 6 CALM3 7 8 9 1e-7 10 11 12 1e-6 TPM4 13 UHRF1 14 15 1e-5 16 RNASEH2A 17 PDCD5 18 1e-4 19 ASF1B 20 SYMPK 21 1e-3 CD97 22

23 P value 24 1e-2 25 26 27 P=0.05 28 1e-1 29 30 31 1e+0 32 33 34 1e+1 35 36 37 -4 -2 0 2 4 6 8 38 39 40 Log2ACS Paragon Fold Plus Environment Change 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Proteome Research Page 32 of 42

1 Figure 2E (GSC17) 2 3 4 5 6 1e-6 7 8 9 10 1e-5 11 12 13 1e-4 SLC1A5 14 15 TUBB4A 16 17 1e-3 ISOC2 18 IRGQ 19 20 21 1e-2 VASP 22 UQCRFS1 23 P value 24 P=0.05 25 1e-1 26 27 MAP2K2 28 29 1e+0 30 31 32 33 1e+1 34 35 36 -4 -3 -2 -1 0 1 2 3 4 37 38 39 Log2 Fold Change 40 ACS Paragon Plus Environment 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 33 of 42 Journal of Proteome Research

1 Figure 2F (GSC23) 2 3 4 1e-6 5 6 CALM3 7 8 TPM4 9 1e-5 10 11 12 13 1e-4 14 15 16 PDCD5 17 1e-3 18 19 RPS28 20 21 1e-2 MAP2K2 XRCC1 22

P value 23 24 P=0.05 25 1e-1 26 27 GTF2F1 28 29 1e+0 30 31 32 33 1e+1 34 35 36 -6 -4 -2 0 2 4 37 38 39 40 Log2ACS Paragon Fold Plus Environment Change 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Proteome Research Page 34 of 42

1 2 3 Figure 3 4 5 6 7 8 13 9 10 11 12 13 14 15 16 17 2 18 19 20 21 22 23 23 24 25 26 27 11 8-11 17 28 29 30 31 32 33 34 35 36 37 38 39 40 ACS Paragon Plus Environment 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 35 of 42 Journal of Proteome Research

1 2 Figure 4A 3 4 5 6 7 8 9 10 11 12 13 14 2.1 15 16 17 18 19 20 0 21 22 23 24 25 26 -2.1 27 28 29 30 31 32 33 34 35 36 37 38 GSC 17 2 11 8-11 13 23 39 40 ACS Paragon Plus Environment 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Proteome Research Page 36 of 42

1 2 Figure 4B 3 4 5 6 7 8 9 10 11 12 13 14 2.4 15 16 17 18 19 20 0.05 21 22 23 24 25 26 -2.3 27 28 29 30 31 32 33 34 35 36 37 38 39 GSC 8-11 23 11 13 17 2 40 ACS Paragon Plus Environment 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 37 of 42 Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 ACS Paragon Plus Environment 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Proteome Research Page 38 of 42

1 2 Figure 6 ACTN4 3 24h fold 4 COX6B1 48h fold 5 ETFB 6 FBL 7 8 GPI 9 HNRNPL 10 HNRNPUL1 11 PAFAH1B3 12 13 PDCD5 14 PPP2R1A 15 PRMT1 16 PSMC4 17 18 RPL13A 19 RPL18 20 RPL28 21 RPS11 22 23 RPS16 24 RPS19 25 RPS9 26 SNRNP70 27 28 SNRPA 29 SNRPD2 30 TBCB 31 TRIM28 32 33 UBA2 34 UBE2M 35 36 37 -5 0 5 10 15 20 25 30 38 39 Fold Change 40 ACS Paragon Plus Environment 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 39 of 42 Journal of Proteome Research

1 2 P1 vs. hNSC P2 vs. hNSC C vs. hNSC M vs. hNSC M vs. P1 3 Total # differentially 361 247 272 441 287 4 expressed 5 6 7 # Upregulated 173 127 136 197 103 8 9 # Downregulated 188 120 136 244 184 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment Journal of Proteome Research Page 40 of 42

1 2 Mesenchymal vs. hNSC 3 Term 4 GO ID Total on array Downregulated Upregulated P-Value (Down) P-Value (Up) 5 30246 carbohydrate binding 43 5 16 0.905 0.0001 6 22843 voltage-gated cation channel activity 14 3 5 0.4659 0.036 7 4518 nuclease activity 10 1 4 0.8598 0.0402 8 sequence-specific DNA binding transcription 9 10 3700 factor activity 84 22 17 0.0301 0.0695 11 8135 translation factor activity, nucleic acid binding 5 4 0 0.0042 1 12 47485 protein N-terminus binding 5 3 0 0.0419 1 13 14 15 16 Mesenchymal vs. Proneural 1 17 Term 18 GO ID Total on array Downregulated Upregulated P-Value (Down) P-Value (Up) 19 5515 protein binding 433 59 42 0.4795 0.0347 20 30246 carbohydrate binding 43 4 7 0.8525 0.0413 21 5509 calcium ion binding 32 2 6 0.9435 0.0311 22 15276 ligand-gated ion channel activity 6 1 3 0.5805 0.0074 23 24 4540 ribonuclease activity 4 1 2 0.4393 0.0317 25 47485 protein N-terminus binding 5 3 0 0.0195 1 26 42393 histone binding 4 3 0 0.0086 1 27 4697 protein kinase C activity 3 2 0 0.0492 1 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 ACS Paragon Plus Environment 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 41 of 42 Journal of Proteome Research

1 2 Accession Name Gene Name Description HPA Evidence Summary neXtProt Evidence Level 3 Q8WXI7 MUC16_HUMAN MUC16 Mucin-16 No Record in HPA protein level 4 Q8IY67 RAVR1_HUMAN RAVER1 Ribonucleoprotein PTB-binding 1 No Record in HPA protein level 5 6 Q96T88 UHRF1_HUMAN UHRF1 E3 ubiquitin-protein ligase UHRF1 No Record in HPA protein level 7 Q6PCB7 S27A1_HUMAN SLC27A1 Long-chain fatty acid transport protein 1 Only RNA protein level 8 P27544 CERS1_HUMAN CERS1 Ceramide synthase 1 Only RNA protein level 9 Q9H313 TTYH1_HUMAN TTYH1 Protein tweety homolog 1 Low protein level 10 11 O14594 NCAN_HUMAN NCAN Neurocan core protein Low protein level 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 ACS Paragon Plus Environment 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Proteome Research Page 42 of 42

1 Chromosome 19 Expression in Glioma Stem Cells 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 RNAseq 24 25 26 Targeted C19 Transcriptomics 27 28 29 30 31 32 33 34 35 36 37 38 39 Quantitative Proteomics 40 ACS Paragon Plus Environment 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60