This Accepted Manuscript has not been copyedited and formatted. The final version may differ from this version. A link to any extended data will be provided when the final version is posted online.
Research Article: New Research | Disorders of the Nervous System
Brain region-specific gene signatures revealed by distinct astrocyte subpopulations unveil links to glioma and neurodegenerative diseases
Raquel Cuevas Diaz Duran1,2,3, Chih-Yen Wang4, Hui Zheng5,6,7, Benjamin Deneen8,9,10,11 and Jia Qian Wu1,2
1The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA 2Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX 77030, USA 3Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Ave. Morones Prieto 3000, Monterrey, N.L., 64710, Mexico 4Department of Life Sciences, National Cheng Kung University, Tainan City, 70101, Taiwan 5Huffington Center on Aging, Baylor College of Medicine, Houston, TX 77030, USA 6Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA 7Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA 8Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX 77030, USA 9Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA 10Neurological Research Institute at Texas’ Children’s Hospital, Baylor College of Medicine, Houston, TX 77030, USA 11Program in Developmental Biology, Baylor College of Medicine, Houston, TX 77030, USA https://doi.org/10.1523/ENEURO.0288-18.2019
Received: 19 July 2018 Revised: 16 January 2019 Accepted: 12 February 2019 Published: 7 March 2019
J.Q.W. and B.D. designed research; J.Q.W., R.C.-D.D., C.-Y.W., and B.D. wrote the paper; R.C.-D.D. and C.- Y.W. performed research; R.C.-D.D. analyzed data; H.Z. and B.D. contributed unpublished reagents/analytic tools. Funding: http://doi.org/10.13039/100000002HHS | National Institutes of Health (NIH) R01 NS088353 1RF1AG054111 R01NS071153 Funding: The Staman Ogilvie Fund-Memorial Hermann Foundation Funding: Mission Connect Funding: http://doi.org/10.13039/100000890National Multiple Sclerosis Society (National MS Society) RG-1501-02756
Authors declare no competing financial interests.
Correspondence should be addressed to Benjamin Deneen, [email protected] or Jia Qian Wu, [email protected]
Cite as: eNeuro 2019; 10.1523/ENEURO.0288-18.2019
Accepted manuscripts are peer-reviewed but have not been through the copyediting, formatting, or proofreading process.
Copyright © 2019 Cuevas- et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed. Alerts: Sign up at www.eneuro.org/alerts to receive customized email alerts when the fully formatted version of this article is published.
1 Brain region-specific gene signatures revealed by distinct astrocyte 2 subpopulations unveil links to glioma and neurodegenerative diseases 3 4 Abbreviated Title: Astrocyte signatures linked to glioma and diseases 5 6 Raquel Cuevas-Diaz Duran1-3, Chih-Yen Wang4, Hui Zheng5-7, Benjamin Deneen8-11*, and 7 Jia Qian Wu1,2* 8 9 1 The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The 10 University of Texas Health Science Center at Houston, Houston, TX 77030, USA 11 2 Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of 12 Molecular Medicine, Houston, TX 77030, USA 13 3 Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Ave. Morones 14 Prieto 3000, Monterrey, N.L., 64710, Mexico 15 4 Department of Life Sciences, National Cheng Kung University, Tainan City, 70101, 16 Taiwan 17 5 Huffington Center on Aging, Baylor College of Medicine, Houston, TX 77030, USA 18 6 Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, 19 USA 20 7 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 21 77030, USA 22 8 Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX 77030, 23 USA 24 9 Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA 25 10 Neurological Research Institute at Texas’ Children’s Hospital, Baylor College of 26 Medicine, Houston, TX 77030, USA 27 11 Program in Developmental Biology, Baylor College of Medicine, Houston, TX 77030, 28 USA 29 30 Author contributions 31 JQW and BD designed research. RCDD and CYW performed research. RCDD analyzed 32 data. HZ and BD contributed reagents/materials. RCDD, CYW, BD, and JQW participated 33 in writing the manuscript. All authors have read and approved the final manuscript.
1
34 35 Corresponding authors 36 * To whom correspondence should be addressed. Tel: +1 (713) 500-3421; Fax: +1 (713) 37 500-2424; Email: [email protected] 38 39 Number of 40 Figures: 6 41 Tables: 4 42 Multimedia: 0 43 Extended Data Figures: 11 44 Extended Data Tables: 5 45 46 Number of words in 47 Abstract: 206 48 Significance Statement: 64 49 Introduction: 748 50 Discussion: 2067 51 52 Acknowledgements 53 The authors would like to thank Shun-Fen Tzeng for supporting Chih-Yen and 54 Ms. Mary Ann Cushman for editing the manuscript. 55 56 Conflicts of Interest 57 Authors declare no competing financial interests. 58 59 Funding sources 60 JQW and RCDD were supported by grants from the National Institutes of Health R01 61 NS088353; The Staman Ogilvie Fund-Memorial Hermann Foundation; Mission Connect, a 62 program of the TIRR Foundation. BD was supported by grants from the National Institutes 63 of Health 1RF1AG054111 and R01NS071153; National MS Society RG-1501-02756. 64
2
65 Abstract
66 Currently, there are no effective treatments for glioma or for neurodegenerative diseases
67 due, in part, to our limited understanding of the pathophysiology and cellular heterogeneity
68 of these diseases. Mounting evidence suggests that astrocytes play an active role in the
69 pathogenesis of these diseases by contributing to a diverse range of pathophysiological
70 states. In a previous study, five molecularly distinct astrocyte subpopulations from three
71 different brain regions were identified. To further delineate the underlying diversity of
72 these populations, we obtained mouse brain region-specific gene signatures for both
73 protein-coding and long non-coding RNA (lncRNA) and found that these astrocyte
74 subpopulations are endowed with unique molecular signatures across diverse brain regions.
75 Additional gene set and single-sample enrichment analyses revealed that gene signatures of
76 different subpopulations are differentially correlated with glioma tumors that harbor distinct
77 genomic alterations. To the best of our knowledge, this is the first study that links
78 transcriptional profiles of astrocyte subpopulations with glioma genomic mutations.
79 Furthermore, our results demonstrated that subpopulations of astrocytes in select brain
80 regions are associated with specific neurodegenerative diseases. Overall, the present study
81 provides a new perspective for understanding the pathophysiology of glioma and
82 neurodegenerative diseases and highlights the potential contributions of diverse astrocyte
83 populations to normal, malignant, and degenerative brain functions.
84
85 Significance Statement
86 Mounting evidence suggests that astrocytes play an active role in disease
87 pathogenesis by contributing to a diverse range of pathophysiological states. The present
88 study identifies new layers of astrocyte diversity and new correlations between distinct
3
89 astrocyte subpopulations and disease states. Understanding the heterogeneity of astrocytes
90 in different physiological conditions and diseases will provide new avenues for improved
91 diagnostics and therapeutics targeting specific astrocyte subpopulations.
92
93 Introduction
94 Astrocytes make up about 40% of all cells in the human brain (Herculano-Houzel, 2014;
95 Zhang et al., 2016) and play essential roles in brain function by controlling extracellular
96 neurotransmitter and potassium ion levels, regulating the blood-brain-barrier, and
97 promoting synapse formation and function (Barres, 2008). Previously, astrocytes were
98 thought to be a homogeneous population of cells that tile the CNS and support neuronal
99 survival. However, recent genome-wide gene expression studies have shown that astrocytes
100 are highly heterogeneous (Cahoy et al., 2008; Doyle et al., 2008; Yeh et al., 2009). A recent
101 study demonstrated astrocyte heterogeneity in the brain by comparing gene expression
102 profiles of FACS-sorted Aldh1l1-GFP+ subpopulations and identified five astrocyte
103 subpopulations (A-E) across three brain regions (olfactory bulb, cortex, and brain stem).
104 Moreover, these astrocyte subpopulations exhibited functional differences related to
105 proliferation, migration, and synapse formation. (Lin et al., 2017).
106 Functional heterogeneity of astrocytes has been described under both normal
107 physiological conditions and diverse disease states (Denis-Donini et al., 1984; Hill et al.,
108 1996; Grass et al., 2004; White et al., 2010; Morel et al., 2017). During disease progression,
109 astrocyte populations respond to pathological conditions through a transformation called
110 reactive astrogliosis. Morphological and gene expression differences have been observed in
111 reactive astrocytes from different brain regions during the progression of neurodegenerative
112 diseases (Hill et al., 1996; Liddelow et al., 2017). Nervous system malignancies also have
4
113 astroglial origins. Gliomas, the most common primary malignancies in the CNS, are a
114 heterogeneous group of tumors characterized by their resemblance to glia. A parallel
115 between brain development and glioma tumorigenesis is apparent in that certain astrocyte
116 subpopulations are analogous to those that populate human glioma (Lin, et al 2017).
117 Despite the discovery of these links between diverse astrocyte populations and disease
118 states, how distinct subpopulations of astrocytes contribute to certain neurological disease
119 remains poorly defined. For example, given the plethora of genomic data available for
120 human glioma, whether key genetic mutations associated with glioma are also associated
121 with specific astrocyte/glioma subpopulation remains unknown. Thus, one of the goals of
122 this study is to identify new correlations between diverse astrocyte subpopulations and
123 distinct disease states.
124 In addition to correlations with disease states, another goal of this study is to unearth
125 additional layers of astrocyte heterogeneity. A novel study identified five distinct
126 subpopulations of astrocytes across three brain regions (Lin, et al. 2017). However,
127 astrocyte subpopulations between different brain regions were not compared, and the
128 expression of long non-coding RNAs (lncRNAs), a type of regulatory RNA that has been
129 shown to play important roles in CNS development, plasticity, and disease (Hung et al.,
130 2011; Guttman and Rinn, 2012; Batista and Chang, 2013; Zhang et al., 2014; Dong et al.,
131 2016) was not investigated. Thus, the other goal of this study is to further delineate the
132 underlying molecular diversity of astrocytes from existing datasets.
133 Towards the overarching goal of decoding the nature of astrocyte diversity, we used
134 advanced comparative bioinformatics to obtain region-specific (e.g., cortex-specific) and
135 regional subpopulation-enriched (e.g., cortex subpopulation A-enriched) astrocyte gene
136 signatures. In addition to these 15 astrocyte regional subpopulations (five subpopulations in
5
137 three brain regions), we also collected gene expression data sets for other astrocyte
138 subpopulations from different developmental stages and injury models (Rusnakova et al.,
139 2013; Zeisel et al., 2015; Gokce et al., 2016; Noristani et al., 2016; Hara et al., 2017; Wu et
140 al., 2017). Our analysis included a total of 42 astrocyte gene signatures. We implemented
141 an unsupervised single-sample enrichment method to correlate astrocyte subpopulation
142 gene signatures with The Cancer Genome Atlas (TCGA) lower-grade glioma (LGG) and
143 glioblastoma (GBM) samples including clinical sample features and key mutations. The
144 results show that specific astrocyte subpopulation gene signatures are more highly
145 correlated with certain glioma subtypes and genomic variants. For example, astrocyte
146 subpopulations B and C in all brain regions are significantly correlated with amplification
147 of the gene encoding EGFR (epidermal growth factor receptor). Additionally, astrocyte
148 subpopulations D and E were among the gene signatures highly correlated with LGG
149 samples bearing both mutation in IDH gene and 1p/19q codeletion. Furthermore,
150 correlations between astrocyte subpopulations and neurodegenerative diseases identified a
151 collection of genes that we validated in AD mouse models. Together, the present study
152 identifies new layers of astrocyte diversity, while making critical new connections between
153 these populations and neurological disease including glioma and neurodegeneration. Thus,
154 understanding astrocyte subpopulations has a broad utility that spans normal, malignant,
155 and degenerative brain functions, and will open new avenues for the development of
156 improved diagnostics and therapeutics to target specific astrocyte subpopulations.
157
158 Materials and Methods
159 Astrocyte subpopulation- and brain region-specific gene signatures
6
160 RNA-Seq datasets for five astrocyte subpopulations (A, B, C, D, E) from three brain
161 regions (olfactory bulb, brain stem, and cortex) were downloaded from the data repository
162 for a previous publication (GSE72826) (Lin et al., 2017). Data were obtained from both
163 male and female mice. For each brain region, non-astrocyte samples (Aldh1l1-GFP- cells)
164 were also downloaded. Briefly, reads were mapped to the mm10 mouse reference genome
165 downloaded from GENCODE (https://www.gencodegenes.org/) using TopHat v2.1.0
166 (Trapnell et al., 2009). Mapped reads were assembled using Cufflinks v2.2.1 (Trapnell et
167 al., 2012) and Fragments per Kilobase of transcript per Million mapped reads (FPKM)
168 values were obtained for both protein-coding and lncRNA genes using a published pipeline
169 (Dong et al., 2015; Cuevas Diaz Duran et al., 2016; Wu, 2016; Duran et al., 2017). Our
170 annotation file included 21,948 protein-coding genes (55,252 transcripts) and 29,232
171 lncRNA genes (49,189 transcripts). Any value of FPKM < 0.1 was set to 0.1 to avoid ratio
172 inflation (Quackenbush, 2002). Additionally, read counts for all annotated genes and
173 transcripts were calculated using HTSeq-count (Anders et al., 2015). FPKM and
174 normalized read count matrices are included in Extended Data Figure 2-1.
175 To obtain regional subpopulation-enriched gene signatures (e.g., for cortex
176 subpopulation A), we compared astrocyte subpopulations to non-astrocytes within each
177 region in a pairwise mode. Comparisons were performed using DESeq2 (Love et al., 2014)
178 with normalized counts. A binary results matrix was created with three rows (regions) and
179 five columns (subpopulations). Each cell in the matrix represented a comparison of an
180 astrocyte subpopulation from a specific region against the non-astrocyte sample in the same
181 region. These were referred to as regional subpopulation-enriched gene signatures. Genes
182 were considered significant if they were expressed (FPKM > 1) in at least one of the
183 replicates with an expression fold-change > 2 and FDR < 0.1%. The log-transformed fold-
7
184 changes and q-values of differentially expressed protein-coding and lncRNA genes from
185 different astrocyte subpopulations and regions were compared using a t-test (p-value <
186 0.05). (The details of all statistical analyses are listed in Table 1).
187 In order to identify region-specific gene signatures, we compared all astrocyte
188 samples from one region to astrocyte samples from the remaining two regions, yielding a
189 total of three comparisons. Comparisons were performed using DESeq2 (Love et al., 2014)
190 with normalized counts. Additionally, astrocyte samples from the same region were
191 compared to their corresponding non-astrocyte samples. A gene was differentially
192 expressed (DE) with FPKM > 1 in at least one of the replicates in each comparison, with an
193 expression fold-change > 4 at p-value < 0.05, and a fold-change against non-astrocyte
194 samples > 2 at p-value < 0.05.
195 Principal Components Analysis (PCA)
196 Principal Components Analysis (PCA) is an unsupervised, exploratory, and multivariate
197 statistical technique used to reduce the dimensionality of complex data sets while retaining
198 most of the variation (Jolliffe, 2002). PCA identifies vectors, or principal components (PC),
199 for which the variation in the data is maximal. Because each PC is a linear combination of
200 the original variables, it is possible to identify a biological interpretation of each component
201 that explains the differences between the samples. When using PCA in genome-wide
202 expression studies, the data set generally consists of a gene expression matrix with genes as
203 rows and samples as columns.
204 PCA was implemented by first generating a gene expression matrix of log2-
205 transformed FPKM values of DE genes for all the samples and their replicates, with regions
206 as rows and samples as columns, to yield a 4092 x 36 matrix. The “prcomp” R function (R
207 Core Team, 2015), which computes the singular value decomposition of the gene
8
208 expression matrix, was then implemented. Samples were considered as variables and PCs
209 that captured most of the variability in the samples were obtained.
210 To gain insight into the biological significance of the identified PCs and their
211 underlying regulatory factors, Gene Set Enrichment Analysis (GSEA) (Subramanian et al.,
212 2005) was performed. DE genes were ranked by the scores assigned according to each PC.
213 Using the MsigDB gene set database (Liberzon et al., 2011), the top enriched gene sets with
214 FDR < 0.01 and absolute normalized enrichment score (NES) > 2 were deemed significant.
215 Collection of astrocyte gene signatures
216 In addition to the 15 astrocyte regional subpopulations (five subpopulations in three brain
217 regions) previously described, we collected 25 gene expression data sets for other astrocyte
218 subpopulations from diverse studies in which astrocytes were purified using single-cell
219 microfluidics (Zeisel et al., 2015; Gokce et al., 2016; Wu et al., 2017), FACS-sorting
220 (Rusnakova et al., 2013; Noristani et al., 2016), or laser microdissection (Hara et al., 2017).
221 These diverse astrocyte subpopulations consisted of purified cells from brain (amygdala,
222 striatum, hippocampus, and cortex), different developmental stages, and from spinal cord
223 injury models. We used astrocyte gene signatures including both protein-coding and
224 lncRNA genes except those gene signatures obtained from literature where lncRNA
225 information was not reported. A brief description of each of these gene profiles is included
226 in Extended Data Figure 3-1. To determine the similarities between astrocyte gene
227 signatures, we calculated the Jaccard index. Indexes higher than 0.7 were considered
228 significant.
229 Gene set variation analysis (GSVA)
230 Glioma RNA-Seq data (counts) were downloaded from the Recount2 database (Frazee et
231 al., 2011). Clinical data from TCGA GBM dataset was collected from the Broad Institute
9
232 GDAC Firehose website (version 2016_01_28) and from the GlioVis website (Bowman et
233 al., 2017). For TCGA LGG dataset, the genomic data from the supplementary materials of
234 the PanGlioma TCGA paper were collected (Ceccarelli et al., 2016). Our analysis included
235 sample features comprising gene expression, histology, subtype, somatic mutations, copy-
236 number variations, 1p/19q codeletions (defined below), and other clinical features.
237 To compare astrocyte subpopulations and glioma samples, we adopted the GSVA approach
238 (Hänzelmann et al., 2013). GSVA is an unsupervised, non-parametric method used to
239 evaluate the degree to which genes in a gene signature are coordinately up- or
240 downregulated within a particular biological sample. The GSVA R package was used to
241 calculate an enrichment score for each astrocyte gene signature across individual glioma
242 samples and frequently encountered somatic mutations to determine whether an astrocyte
243 subpopulation is associated with those glioma subtypes or genomic variations. The
244 genomic features used for GBM samples included transcriptional subtype; mutations in
245 genes encoding TP53, EGFR, PTEN, NF1, PIK3R1, RB1, ATRX, IDH1, APOB, or
246 PDGFRA; amplifications of genes encoding EGFR, PDGFRA, or CDK4; and deletions of
247 genes encoding MTAP, PTEN, QKI, or RB1. Similarly, for LGG, the genomic features
248 queried included tumor grade, histological subtype, mutation in the IDH gene, codeletion of
249 1p/19q loci, mutations in genes encoding TP53, EGFR, PTEN, ATRX, IDH1, IDH2, CIC,
250 NOTCH1, FUBP1, PIK3CA, NF1, PIK3R1, SMARCA4, ARID1A, TCF12, ZBTB20,
251 PTPN11, PLCG1, or ZCCHC12; deletions of genes encoding PTEN, NF1, CDKN2C, and
252 CDKN2A; and amplifications of genes encoding PIK3CA, PIK3C2B, PDGFRA, MDM4,
253 MDM2, EGFR, or CDK4.
254 To evaluate whether the differences among groups of samples with different
255 genomic features (such as subtype, mutations, and copy-number variations) could be
10
256 explained using GSVA enrichment scores, ANOVA and Wilcoxon Rank-Sum tests were
257 conducted. Each genomic feature was correlated to all of the gene signatures by comparing
258 enrichment scores between groups of samples formed according to the different levels of
259 the genomic feature. A correlation matrix of transformed p-values was obtained for GBM
260 and LGG samples. Correlations were deemed significant with a p-value < 0.05 (FDR < 0.25
261 using the Benjamini & Hochberg adjustment for multiple comparisons). GSVA enrichment
262 score matrices and correlation matrices of transformed p-values are included in Extended
263 Data Figure 3-6.
264 Survival analysis
265 Correlations between astrocyte gene signatures and patient survival were calculated to
266 conduct Kaplan-Meier survival analyses. Clinical data was parsed to obtain patients’ age,
267 vital status (Dead/Alive), and months to death or months from most recent follow-up
268 depending on the patients’ vital status. Patient samples were filtered to use only those with
269 tumor purity greater than 70%. Classical GBM samples with EGFR amplification and LGG
270 samples with astrocytoma histology were used for survival analyses. For each astrocyte
271 gene signature, samples were categorized into two groups: (1) High (enrichment scores > 0)
272 and (2) Low (enrichment scores < 0). Kaplan-Meier survival plots were generated for each
273 astrocyte gene signature using the “survival” (Therneau, 2015) and “survminer”
274 (Kassambara A, Kosinski M, 2017) R libraries. The survival curves of samples with High
275 and Low enrichment scores were compared using log-rank tests. In order to determine the
276 combined survival effect of score enrichment and age, we applied multivariate Kaplan-
277 Meier survival analysis. For example, to estimate the combined effect of an astrocyte gene
278 signature and age, the samples were first divided according to whether they had High or
279 Low enrichment scores. Then within each group, samples were further stratified according
11
280 to the age at which the tumor was diagnosed. Samples were grouped by age as either
281 younger or older than 60 years (for GBM samples), or as 45 years (for LGG samples). Age
282 thresholds were obtained by creating a histogram of the “age at diagnosis” and selecting the
283 age at which the distribution is divided into two balanced sub-distributions representing the
284 younger and the older groups. The p-value obtained from the log-rank test was used to
285 indicate the statistical significance of correlations in survival between groups. Furthermore,
286 to determine the association of IDH mutations with the survival of samples correlated with
287 astrocyte gene signatures we constructed contingency tables and performed Fisher’s tests.
288 Associations with p-values < 0.05 were considered significant.
289 Gene set enrichment analysis
290 A comprehensive collection of gene sets was downloaded from the Molecular Signatures
291 Database (MsigDB) (Liberzon et al., 2011). Additional gene sets were obtained from
292 neurodegenerative disease studies in which gene expression was assessed (Extended Data
293 Figure 2-4). A hypergeometric statistical test (phyper R function) was used to determine
294 gene set enrichment using the DE gene list for each astrocyte subpopulation, region, and
295 regional subpopulation. Gene sets were considered enriched with FDR < 0.05 and gene
296 number > 5.
297 Immunofluorescence
298 Brain tissues from 5xFAD (Polito et al., 2014) and NLGF mice (Saito et al., 2014) (gifts
299 from Dr. Zheng Hui, Huffington Center on Aging, Baylor College of Medicine) were fixed
300 in 4% paraformaldehyde and sectioned at 20-μm thickness. The brain tissues were
301 permeabilized with 0.3% Triton™ X-100 (Thermo Fisher) in PBS and blocked with 2.5%
302 horse serum (Vector) for 1 h at room temperature. The sections were then incubated in PBS
303 with 0.1% Triton X-100 containing chicken anti-GFAP (1:500; Abcam) combined with
12
304 either rabbit anti-Adcy7 (1:100; Bioss), mouse anti-Serping1 (1:50; Santa Cruz), or rabbit
305 anti-Emp1 (1:50; Abcam) overnight at 4 °C. After incubation with secondary antibodies
306 conjugated with Alexa Fluor 488 and Alexa Fluor 568 or Alexa Fluor 647 (1:500;
307 Invitrogen) for 1 h at room temperature, the tissues were counterstained with DAPI (Sigma)
308 solution and mounted with mounting media (Vector).
309 Statistical analysis of immunostained images
310 Images of cortices were collected from at least three different tissues for each group, and
311 fluorescence intensity was measured using ImageJ software (NIH). Cells with positive
312 immunosignals over GFAP+ cells were counted in each image, which included
313 approximately 50 GFAP+ cells, sufficient for performing calculations. All of these data
314 were presented as mean ±SE and were analyzed to determine statistically significant
315 differences at p-value < 0.05 using the two-tailed unpaired Student’s t test.
316 Table 1. List of figures for each experiment indicating data structure, statistical tests
317 applied, and significance levels.
Figure Data structure Type of test Power Notes
Figure Gene Set Enrichment Kolmogorov- FDR < 0.01 |NES| > 2
1C, 1D, Analysis (GSEA) Smirnov
1E
Figure Differential DESeq2 FDR < 0.1 FPKM > 1,
2A, 2B expression analysis generalized normalized count
linear model fold-change > 2
(GLM)
13
Figure Gene set enrichment Hypergeometric FDR < 0.05 Gene number > 5
2C
Figure Differential DESeq2 p-value < 0.05 FPKM > 1,
2D expression analysis generalized normalized
linear model counts, fold-
(GLM) change > 4
(compared to
astrocyte
subpopulations
from other
regions),
Fold-change > 2
(compared to
non-astrocyte
sample).
Extended Normal distribution Unpaired p-value < 0.05
Data Student’s t-test
Figure 2-
2
Figure Correlations ANOVA, p-value < 0.05
3C, 4B Wilcoxon Rank- (FDR < 0.25)
Sum tests
Figure 5 Kaplan-Meier Log-rank test p-value < 0.05
14
survival plots Fisher’s test
Figure Immunostained two-tailed p-value < 0.05,
6A-C, images unpaired p-value < 0.01,
Extended Student’s t test p-value < 0.001
Data
Figure 6-
2B-D
318
319 Results
320 Transcription profiles of Aldh1l1-GFP+ astrocyte subpopulations demonstrate local
321 and regional heterogeneity
322 To determine whether identified astrocyte subpopulations demonstrate additional
323 heterogeneity across brain regions, we leveraged existing data sets (Lin, et al, 2017) and
324 adopted a pairwise comparison strategy using normalized counts. Previously, five different
325 Aldh1l1-GFP + astrocyte subpopulations were isolated from three brain regions (olfactory
326 bulb, cortex, and brain stem). In that study, transcripts from astrocyte subpopulations and
327 from the Aldh1l1-GFP- reference population were sequenced and subpopulation-specific
328 gene signatures were obtained (Lin et al., 2017). To include lncRNAs in the gene
329 signatures, we re-mapped the purified astrocyte subpopulations from different brain regions
330 and their corresponding non-astrocyte samples to the mm10 mouse reference genome using
331 a published pipeline (Duran et al., 2017). For gene quantification we used a comprehensive
332 protein-coding and lncRNA annotation file including 21,948 protein-coding genes (55,252
333 transcripts) and 29,232 lncRNA genes (49,189 transcripts).
15
334 We implemented Principal Components Analysis (PCA) using a gene expression
335 matrix consisting of differentially expressed (DE) genes between regions as rows and
336 samples with replicates as columns, yielding a 4092 x 36 matrix. We hypothesized that
337 each principal component (PC) is associated with an underlying factor regulating gene
338 expression and that such factors might explain the variability between subpopulations and
339 regions. PCA analysis resulted in 36 uncorrelated and orthogonal PCs that account for the
340 variability in the gene expression matrix. The first four components explain 82.3% of the
341 variability of these samples. To gain biological insights for the selected PCs, we plotted the
342 samples in their transformed component space. Figure 1A depicts the plotted samples in the
343 2D-space formed by PC1 and PC2. PC1 explains 68.98% of the variability and clusters the
344 samples into two main groups: Non-astrocyte (shapes without filling) and Astrocyte (color-
345 filled shapes) subpopulations. The majority of samples clustered well; only a few outliers
346 appeared in the non-astrocyte sample space. Similarly, the combined effect of PC1 and PC2
347 is helpful for discriminating among subpopulations, as shown by the elliptical subspaces
348 (Figure 1A) that enclose shapes mostly of the same color. In order to cluster samples by
349 region, we plotted PC3 and PC4, as shown in Figure 1B. The elliptical subspaces in the
350 figure demonstrate the separation of samples into the olfactory bulb, cortex, and brain stem
351 regions.
352 To validate the grouping of samples accomplished with PCA, we performed GSEA
353 using differentially expressed (DE) genes ranked by the scores assigned in each PC. For
354 PC1, the top enriched gene sets were GO_MYELIN_SHEATH and
355 LEIN_ASTROCYTE_MARKERS (Figure 1C). The top-enriched gene sets for PC2 are
356 GO_SYNAPTIC_SIGNALING and VERHAAK_GLIOBLASTOMA_MESENCHYMAL
357 (Figure 1D). The top-enriched gene sets for PC3 are
16
358 GO_MICROTUBULE_ASSOCIATED_COMPLEX and
359 OXIDATIVE_PHOSPHORYLATION (Figure 1E). Astrocyte subpopulation B (combined
360 from all three brain regions) is mostly correlated with the mesenchymal glioblastoma gene
361 signature, as the enriched genes in this gene set (e.g., Scpep1, Rac2, Blcrb, Mrc2, Cd14,
362 and C5ar, among others) are upregulated in this subpopulation.
363 Differentially expressed protein-coding genes and lncRNAs depict regional
364 subpopulation-specific gene signatures and gene sets
365 To obtain regional subpopulation enriched genes, we implemented a pairwise comparison
366 strategy using normalized counts. For each brain region, we compared the diverse astrocyte
367 subpopulations against their corresponding non-astrocyte samples. Genes with FPKM > 1,
368 fold-change > 2, and FDR < 10% were considered to be differentially expressed (DE). The
369 number of unique DE genes per subpopulation and brain region are listed in Table 2.
370 Significant numbers of DE lncRNAs contribute to the regional subpopulation gene
371 signatures. Figure 2A shows a heatmap of the log2-transformed FPKM values of cortex
372 subpopulation B-enriched genes (protein-coding and lncRNAs) compared to other
373 subpopulations. The heatmap in Figure 2B illustrates the log2-transformed FPKM values of
374 the 317 regional subpopulation upregulated lncRNA genes. After using a statistical test
375 (unpaired two-tailed t-test with p-value < 0.05), we observed that the fold-changes of DE
376 lncRNAs from cortex subpopulations B-E, and olfactory bulb subpopulations A, B, C, and
377 E had higher fold-change than their corresponding DE protein-coding genes. Extended Data
378 Figure 2-2 shows the comparison between fold-changes and q-values of DE protein-coding
379 and lncRNA genes from different astrocyte subpopulations and regions.
380 Table 2. Regional subpopulation-specific DE protein-coding and lncRNA genes.
17
Brain Region
Subpopulation Type Olfactory Brain Cortex Bulb Stem
Protein- UP 86 5 29
coding DOWN 51 0 6 PopA UP 12 1 6 lncRNA DOWN 5 0 1
Protein- UP 294 690 55
coding DOWN 207 324 4 PopB UP 42 138 19 lncRNA DOWN 18 23 0
Protein- UP 268 271 197
coding DOWN 75 94 21 PopC UP 20 15 6 lncRNA DOWN 10 11 15
Protein- UP 28 5 101
coding DOWN 16 1 18 PopD UP 16 4 20 lncRNA DOWN 1 1 6
Protein- UP 68 43 0
coding DOWN 289 18 0 PopE UP 35 13 0 lncRNA DOWN 27 2 0
381
382 A set of gene signature DE genes with highest expression in cortex subpopulation B
383 compared to cortex non-astrocytes include, for example, MER Proto-Oncogene Tyrosine
384 Kinase (Mertk), Ras Homolog Family Member B (Rhob), and Signal Regulatory Protein
18
385 Alpha (Sirpa). (see discussion). Similarly, a set of DE lncRNAs in cortex subpopulation B
386 with the highest expression compared to non-astrocyte samples include, for example,
387 Gm37524,Gm3764, and Junos. We used DE genes to find the enrichment of gene sets. The
388 top five significantly enriched gene sets found in cortex subpopulation B are listed in
389 Figure 2C (upper panel) and they include “BLALOCK_ALZHEIMERS_DISEASE_UP”,
390 “GO_CELLULAR RESPONSE_TO_CYTOKINE_STIMULUS”,
391 “GO_CELLULAR_REPONSE_TO_STRESS”, “GO_WOUND_HEALING”, and
392 “GO_BIOLOGICAL_ADHESION”.
393 A set of gene signature DE protein-coding genes with highest expression in cortex
394 subpopulation C include, for example, Glial High Affinity Glutamate Transporter (Slc2a1),
395 Carboxypeptidase E (Cpe), and Transmembrane Protein 47 (Tmem47). DE lncRNAs
396 enriched in cortex subpopulation C include Gm13872, Gm26672, and Gm37885. The top-
397 enriched gene sets with DE genes found in cortex subpopulation C are shown in Figure 2C
398 (lower panel). Enriched gene sets include
399 “GO_GENERATION_OR_PRECURSOR_METABOLITES_AND_ENERGY”,
400 “GO_MITOCHONDRIAL_ATP_SYNTHESIS_COUPLED_PROTON_TRANSPORT”,
401 “KEGG_PARKINSONS_DISEASE”, and “KEGG_HUNTINGTONS_DISEASE”.
402 Extended Data Figure 2-3 depicts examples of the top-enriched gene sets specific to each
403 regional astrocyte subpopulation. The complete list of gene sets and their enrichment scores
404 can be found in Extended Data Figure 2-4.
405 To evaluate regional differences in the various astrocyte subpopulations, we
406 compared all astrocyte subpopulations from one particular region against astrocyte samples
407 from the remaining regions (FPKM > 1, fold-change > 4, p-value < 0.05). To eliminate the
408 effect of genes also expressed in non-astrocyte subpopulations, we compared the region-
19
409 specific astrocyte subpopulations against their corresponding non-astrocyte samples
410 (FPKM > 1, fold-change > 2, p-value < 0.05). The differential expression of a gene was
411 considered significant if it met the criteria for both comparisons (region-specific and
412 astrocyte-specific). Table 3 shows the number of DE protein-coding and lncRNAs
413 identified from each brain region. Figure 2D depicts the log2-transformed FPKM values of
414 DE genes from all brain regions. Insulin-like Growth Factor 1 (Igf1), Flavin-containing
415 Monooxygenase 1 (Fmo1), and Brain-specific Angiogenesis Inhibitor 1 (Bai1 or Adgrb1).
416 are among the top DE protein-coding genes with highest expression in olfactory bulb
417 astrocyte subpopulations. Membrane Frizzled-related Protein (Mfrp), Integrin Subunit
418 Alpha 9 (Itga9), and Transmembrane Protein 218 (Tmem218) are within the top DE
419 protein-coding genes in brain stem astrocyte subpopulations. Similarly, Scavenger Receptor
420 Class A Member 3 (Scara3), Leucine Rich Repeat Containing 10B (Lrrc10b), and Cristallin
421 Mu (Crym) are included in the top DE protein-coding genes obtained from astrocyte
422 subpopulations of the cortex.
423 Table 3. Number of region-specific DE protein-coding and lncRNA genes.
Unique DE genes Total DE Brain Region Protein-coding lncRNA genes UP DOWN UP DOWN
Olfactory Bulb (ob) 35 13 15 1 64
Cortex (ctx) 6 7 3 0 16
Brain Stem (bs) 21 8 5 3 37
424
425 Upregulated astrocyte gene signatures of several subpopulations and regions correlate
426 with glioblastoma samples
20
427 To be comprehensive in studying astrocyte heterogeneity, in addition to the five astrocyte
428 subpopulations from three brain regions described, we also collected gene expression data
429 sets for other astrocyte subpopulations from diverse studies in which astrocytes were
430 purified through using single-cell microfluidics (Zeisel et al., 2015; Gokce et al., 2016; Wu
431 et al., 2017), FACS-sorting (Rusnakova et al., 2013; Noristani et al., 2016), or laser
432 microdissection (Hara et al., 2017). The gene signatures collected were derived from
433 diverse subpopulations of purified astrocytes from brain (amygdala, striatum, hippocampus,
434 and cortex), different developmental stages, and from spinal cord injury (SCI) models.
435 Only protein-coding genes were provided in these datasets. These astrocyte subpopulation
436 gene signatures were combined with the regional astrocyte subpopulation specific
437 signatures described previously, yielding a total of 42 gene signatures (Extended Data
438 Figure 3-1) of which 17 included both protein-coding and lncRNA genes. In each gene
439 signature profile, genes are significantly upregulated relative to the non-astrocyte sample,
440 the remaining astrocyte subpopulations, or both. To determine the similarity among gene
441 signatures, the Jaccard indexes were calculated and a heatmap was built (Extended Data
442 Figure 3-2). The Jaccard index indicates the percent of similarity due to the overlap of
443 genes in gene signatures. The highest similarity index between the gene signatures obtained
444 from literature and our 15 astrocyte gene signatures was 7% (olfactory bulb subpopulation
445 C and HIPPOCAMPUS_TOP240), indicating that they are quite different. The inclusion of
446 other datasets from the literature provided additional information. As observed in the
447 aforementioned heatmap, similarity indexes between the majority of the gene signatures are
448 low. Interestingly, the similarity index between cortex injury and cortex development gene
449 signatures was the highest (Jaccard index = 0.87).
21
450 We adopted the Gene Set Variation Analysis (GSVA) approach (Hänzelmann et al.,
451 2013) to determine the degree of similarity between astrocyte subpopulations and glioma
452 samples. GSVA is an unsupervised non-parametric method for evaluating the degree to
453 which genes in a gene signature are coordinately up- or downregulated within a certain
454 biological sample. We used GSVA with the 42 astrocyte subpopulation gene signatures as
455 gene sets and obtained enrichment score matrices for glioblastoma and lower-grade glioma
456 from TCGA. The enrichment score of a gene signature may be positive or negative and it
457 provides evidence of the coordinated up or downregulation of the members of that gene
458 signature in a particular sample. Enrichment scores are obtained without prior knowledge
459 of the sample phenotype. In our analysis, a positive enrichment score indicates a correlation
460 between a glioma sample and the specific astrocyte subpopulation from which the gene
461 signature was derived.
462 The heatmap in Figure 3A shows the resulting enrichment scores of the GSVA
463 between 103 TCGA GBM samples with a tumor purity greater than 70% and 21
464 representative astrocyte gene signatures due to limited space. The enrichment score
465 heatmap of TCGA GBM samples with all 42 astrocyte gene signatures is included in
466 Extended Data Figure 3-3. The 103 TCGA GBM samples represented 47, 22, and 34 cases
467 of classical, mesenchymal, and proneural transcriptional subtypes, respectively, as
468 previously defined (Wang et al., 2018). Nearly 77% (36 out of 47) of the glioblastoma
469 samples within the classical subtype exhibited EGFR gene amplification. The top five
470 positively enriched gene signatures with the highest number of classical subtype samples
471 carrying an EGFR amplification were amygdala_top50, striatum_top50,
472 hippocampus_top240, PopC_olfactory_bulb, and PopA_brain stem. For each gene
473 signature, the number of samples with positive scores are shown in Figure 3B. For viewing
22
474 clarity, Figures 3B and 3C display results using 21 representative astrocyte gene signatures.
475 Please refer to Extended Data Figure 3-4 for displays using all astrocyte gene signatures.
476 To evaluate whether the differences among groups of samples with different
477 genomic features (such as subtype, mutations, and copy-number variations) could be
478 explained with GSVA enrichment scores, ANOVA and Wilcoxon Rank-Sum tests were
479 performed. Each genomic feature was correlated to all gene signatures by comparing
480 enrichment scores between samples grouped according to the different levels of the
481 genomic features. The correlation matrix of genomic features and 21 representative
482 signature scores is depicted in Figure 3C. Correlations were deemed significant with a p-
483 value < 0.05. Significant positive correlations with the Classical GBM subtype samples
484 were found in 11 out of 42 astrocyte gene signatures including brain stem subpopulation A,
485 olfactory bulb (subpopulations A, C, and D), amygdala, striatum, and hippocampus.
486 Classical GBM subtype had the highest number of positively correlated astrocyte gene
487 signatures, consistent with a previous report (Verhaak et al., 2010). Mesenchymal and
488 Proneural GBM subtypes had either negative or no significant correlations. No significant
489 positive correlations were found for GBM samples with TP53 mutations.
490 A large proportion of astrocyte gene signatures were significantly positively
491 correlated with samples carrying amplifications of the gene encoding EGFR (62% 26 out of
492 42) and deletions of the genes encoding CDKN2A and MTAP (50% and 45%). Gene
493 signatures of subpopulations A, B, and C, brain stem (subpopulations A, C), olfactory bulb
494 (subpopulations A, B, C, E), cortex (subpopulations B, C), amygdala, striatum,
495 hippocampus, reactive astrocytes (RA), and scar-forming astrocytes (SA) in SCI had
496 significant positive correlations with GBM samples with a combined set of genomic
497 features that included amplification of the EGFR gene and deletions of the CDKN2A and
23
498 MTAP genes. Boxplots depicting the correlation of regional astrocyte subpopulation gene
499 signatures with GBM samples with EGFR amplification is included in Extended Data
500 Figure 3-5. Gene signatures of subpopulations A and C from brain stem and olfactory bulb,
501 and from cortex subpopulation B, were positively correlated with EGFR amplifications
502 with p-value < 0.001.
503 Upregulated astrocyte gene signatures in several subpopulations and regions correlate
504 with LGG samples
505 A total of 160 LGG samples (tumor purity > 70%) from TCGA were used for analysis of
506 correlation with astrocyte gene signatures. The histological classifications of the LGG
507 samples used included 44 astrocytomas, 45 oligoastrocytomas, and 70 oligodendrogliomas.
508 Figure 4A shows the enrichment scores obtained using GSVA. The upper section of the
509 heatmap shows the classification of different co-ocurring genomic alterations previously
510 shown to be related to glioma histology (Cancer Genome Atlas Research Network et al.,
511 2015). Two groups are formed corresponding mostly of oligodendroglioma (right side) and
512 a mixture enriched in astrocytoma and oligoastrocytoma histologies (left side). As
513 previously demonstrated (Cancer Genome Atlas Research Network et al., 2015), the
514 mixture of astrocytoma and oligoastrocytoma samples is characterized modestly by
515 mutations of IDH without codeletions of 1p/19q loci, TP53 mutations, and ATRX
516 mutations. The oligodendroglioma group is enriched in samples carrying mutations of IDH
517 gene combined with codeletions of 1p/19q loci, and mutations in the TERT promoter. A
518 cluster of positive high enrichment scores between LGG samples and astrocyte gene
519 signatures is observed on the left side of the heatmap corresponding mostly to the mixture
520 of astrocytoma and oligoastrocytoma histologies. Brain stem subpopulation A, olfactory
521 bulb subpopulation D, and RA SCI astrocyte gene signatures were significantly correlated
24
522 with astrocytoma LGG samples. Amygdala (AS1-AS3) and cortex development gene
523 signatures were correlated with oligoastrocytoma histology. Subpopulations D and E,
524 cortex subpopulation A, brain stem subpopulation D, cortex injury (B2), and cortex
525 development gene signatures were correlated with oligodendroglioma. The heatmap in
526 Figure 4B shows the correlation matrix between 21 astrocyte gene signatures and LGG
527 samples with diverse genomic features due to limited space. Extended Data Figure 4-1
528 includes a heatmap representing the correlation between LGG samples and all 42 astrocyte
529 gene signatures. Approximately 14% and 7% of the 42 astrocyte gene signatures were
530 positively correlated with grade 2 and grade 3 LGG samples, respectively.
531 Mutation in the gene encoding IDH1 or IDH2 and complete deletion of both the
532 short arm of chromosome 1 and the long arm of chromosome 19 (1p/19q codeletion) are
533 markers frequently used in clinical practice for classification of LGG samples. Both are
534 indicators of favorable prognosis. We used IDH status as a genomic feature with the
535 following categories: wild-type IDH, mutant IDH with codeletion of 1p/19q, and mutant
536 IDH with no codeletion of 1p/19q. Approximately 45% and 40% of the gene signatures
537 were positively correlated with wild-type IDH samples and mutant IDH samples with no
538 1p/19q codeletion, respectively. Subpopulation B, brain stem (subpopulations A, B),
539 olfactory bulb (subpopulations A, B, D, E), cortex (subpopulations B, E), hemisection,
540 transection, hippocampus (ASTRO1), and RA SCI astrocyte gene signatures correlated
541 significantly with both wild-type IDH and mutant IDH samples with no1p/19q codeletion.
542 Figure 4C shows a heatmap displaying the proportion of samples featuring each IDH status
543 variant with positive enrichment scores for 21 representative astrocyte gene signatures
544 (Extended Data Figure 4-1 displays all 42 astrocyte gene signatures). Extended Data Figure
545 4-2:A-C shows the enrichment score matrices obtained with each IDH status variant. For
25
546 wild-type IDH, olfactory bulb (subpopulations A, B, D, and E), cortex (subpopulations B
547 and D), and brain stem subpopulation A gene signatures have positive enrichment scores
548 with the highest proportion of samples. Mutant IDH samples without the 1p/19q codeletion
549 had high enrichment scores with amygdala gene signatures (AS1-AS3). Mutant IDH
550 samples carrying the 1p/19q codeletion had high enrichment scores with subpopulations D
551 and E, cortex subpopulation A, cortex injury (B2), cortex development (A1, A2), and SA
552 SCI gene signatures. Table 4 shows the distribution of histology class and IDH status of the
553 160 LGG samples used in our analysis. The wild-type IDH and mutant IDH without 1p/19q
554 codeletion groups consisted mostly of astrocytomas (44% and 45%), whereas mutant IDH
555 samples with the 1p/19q codeletion were mostly oligodendrogliomas (80%). Extended Data
556 Figure 4-3A shows the enrichment scores for LGG samples with an astrocytic histology
557 and wild-type IDH or mutant IDH with no 1p/19q codeletion. Brain stem subpopulation A
558 and olfactory bulb subpopulations D and E are among the top five gene sets with positive
559 enrichment scores that have the highest number of samples with the aforementioned
560 features.
561
562 Table 4. Histological classification of LGG samples with tumor purity greater than 70% used in this
563 study in each IDH status group.
IDH wild-type Mutant IDH + Mutant IDH +
Histological class (27 samples) no 1p/19q 1p/19q
codeletion codeletion
(71 samples) (61 samples)
Astrocytoma 12 32 0
Oligoastrocytoma 8 25 12
26
Oligodendroglioma 7 13 49
564
565 Mutations in the promoter of the gene encoding TERT were positively correlated
566 with astrocyte gene signatures (cortex development, cortex injury, SA SCI) in 12% of the
567 LGG samples. Positive correlations were found in 36% of the gene signatures for LGG
568 samples carrying mutations in the CIC gene. Astrocyte subpopulation gene signatures
569 (PopD, PopE, Cortex, and PopA cortex) clustered a subset of LGG samples into
570 oligodendrogliomas with IDH mutation and 1p/19q codeletion, TERT promoter mutations,
571 and CIC gene mutations. The heatmap in Extended Data Figure 4-3B contains the
572 enrichment scores of samples with an oligodendroglioma histology, mutant TERT
573 promoter, mutant CIC gene, and mutant IDH gene combined with the 1p/19q codeletion.
574 Kaplan-Meier survival plots are commonly used to assess treatment efficacy in
575 clinical trials. In order to determine whether astrocyte gene signatures are correlated with
576 the survival of glioma patients, we performed survival analysis using the survival R library
577 (Therneau, 2015). Kaplan-Meier estimates of overall survival among patients who had
578 LGG samples with an astrocytoma histology were obtained for each gene signature. We
579 grouped samples into negative (Low) and positive (High) enrichment scores, with and
580 without age as a factor, as described in Materials and Methods. Figure 5A-E depicts
581 significant (p-value < 0.05) survival plots with distinct astrocyte gene signatures. Better
582 prognosis (without an age effect) was observed for patients with astrocytoma LGG samples
583 correlated with brain stem subpopulation A (Figure 5A) and subpopulation D (Figure 5C)
584 gene signatures. Age had a greater effect than enrichment score on the probability of
585 survival for patients with astrocytoma LGG samples correlated with cortex subpopulation
586 A (Figure 5B), cortex subpopulation D (Figure 5D), or olfactory bulb subpopulation D
27
587 (Figure 5E). Astrocytoma LGG patients older than 45 years with profiles positively
588 associated with subpopulation D in cortex (Figure 5D) and olfactory bulb (Figure 5E) had
589 worse prognoses, as did LGG patients older than 45 years with Low enrichment scores for
590 profiles associated with cortex subpopulation A (Figure 5B).
591 To test whether the observed differences in survival (Figure 5) can be explained by
592 specific genetic mutations associated with subtypes of LGG tumors, we performed statistical
593 analysis. We used IDH mutation, a common previously known mutation of glioma samples.
594 We obtained 2x2 contingency tables with the number of LGG samples classified as “High”
595 and “Low” (correlation with gene signatures) and as “IDHwt” or “Mutated_IDH”. Then we
596 used Fisher’s test to determine whether the proportion of samples classified as “High” or
597 “Low” was different depending on the number of samples with and without IDH mutations.
598 We found that for some gene signatures (brainstem subpopulation A, cortex subpopulation A,
599 and cortex subpopulation D) the p-value of Fisher’s test was not significant (p-values=
600 0.4606, 0.09365 and 0.08331 respectively), therefore there is no statistical evidence to
601 demonstrate that there are differences in the enrichment scores between IDHwt and mutated
602 IDH. However, for subpopulation D and olfactory bulb subpopulation D, the p-value was
603 0.04211 and 0.001592 respectively, thus, the correlation between survival and enrichment
604 scores among these samples could possibly be affected by mutations of IDH.
605 Astrocyte subpopulation gene signatures correlate with neurodegenerative disease
606 gene sets
607 In addition to correlating the signatures of astrocyte subpopulations with glioma samples,
608 we also investigated their correlation with neurodegenerative diseases. We collected gene
609 sets derived from neurodegenerative disease studies (Extended Data Figure 2-4) from the
610 literature and combined them with related gene sets from the MsigDB database (Liberzon
28
611 et al., 2011). Using a hypergeometric test, the DE genes from each regional astrocyte
612 subpopulation were analyzed to identify the enrichment of gene sets related to
613 neurodegenerative diseases including AD, HD, and PD. Several neurodegenerative disease
614 gene sets were found to be enriched using regional subpopulation gene signatures at FDR <
615 0.05. As an example, “BLALOCK ALZHEIMERS DISEASE UP” was significantly
616 enriched with 97 upregulated genes from the cortex subpopulation B gene signature.
617 Extended Data Figure 6-1A shows a violin plot representing the normalized counts of the
618 97 upregulated genes from the cortex subpopulation B gene signature. The gene expression
619 of these 97 genes is also high in the subpopulation B profiles of brain stem and olfactory
620 bulb, due to their inherent similarity. The “LABADORF_HUNTINGTONS_UP” gene set
621 was significantly enriched with 89 DE genes from the cortex subpopulation B gene
622 signature (Extended Data Figure 6-1B). DE genes from cortex subpopulation C were
623 enriched for “KEGG HUNTINGTONS DISEASE” and “KEGG PARKINSONS
624 DISEASE” gene sets, with 13 and 11 genes respectively (Extended Data Figure 6-1C). The
625 complete list of significantly enriched neurodegenerative disease gene sets can be found in
626 Extended Data Figure 6-3.
627 The expression of astrocyte genes enriched in the Alzheimer’s disease gene set was
628 validated using immunostaining in mouse disease models
629 Significant enrichment of the AD gene set was found in DE genes from cortex astrocyte
630 subpopulation B. For validation, we selected three genes with gene expression fold-change
631 > 4 compared to the corresponding non-astrocyte samples. We used two mouse models of
632 AD: 5xFamilial Alzheimer’s disease (5xFAD) (Polito et al., 2014) and APPNLGF (NLGF)
633 (Saito et al., 2014). AD pathology was confirmed in the cortex as an increase in GFAP+
634 astrocytes as well as detectable hypertrophy, as shown in Extended Data Figure 6-2A.
29
635 Because subpopulation B astrocytes reside mainly in the inner cortex (Lin et al., 2017),
636 tissue sections from these brain regions were analyzed. The protein expression of Adcy7,
637 Serping1, and Emp1 genes was analyzed using immunofluorescence in inner cortex brain
638 tissue sections. Expression of Adcy7 protein was identified in 60% of WT cortical
639 astrocytes. In contrast, more than 80% of cortical astrocytes from 5xFAD mice models
640 expressed Adcy7 protein with an average fold-change of 1.5 r0.18 and a p-value < 0.05
641 (Figure 6A). Protein expression of the Serping1 gene was found in approximately 50%
642 cortical WT astrocytes and was upregulated in AD with a fold-change of 1.72 r0.19 and p-
643 value < 0.05 (Figure 6B). Similarly, nearly 50% and 80% of cortical WT and AD
644 astrocytes, respectively, expressed the Emp1 protein, which was upregulated 3.57 r0.27
645 times with a p-value < 0.001 (Figure 6C).
646 Microarray studies have reported two types of reactive astrocytes, A1 and A2
647 astrocytes, that are induced after neuroinflammation and ischemia, respectively (Zamanian
648 et al., 2012; Liddelow et al., 2017). Each type of astrocyte has its own reactive profile with
649 differences in transcriptome, morphology, and function. Since Serping1 protein is one of
650 the known markers for A1 reactive astrocytes and Emp1 protein is one of the known
651 markers for A2, we evaluated their combined expression in cortical astrocytes. Figure 6D
652 shows the single and combined fluorescence images in both WT and 5xFAD cortical
653 tissues. The immunostaining results from the NLGF AD mouse model are included in
654 Extended Data Figure 6-2:B-E and were similar to results from the 5xFAD model.
655
656 Discussion
30
657 Currently there is no effective treatment for glioma or for neurodegenerative
658 diseases such as AD, PD, and HD, because our understanding of their pathophysiology is
659 far from complete. Mounting evidence indicates that astrocytes are involved in various
660 neurological diseases. Moreover, it is now known that astrocytes are functionally diverse
661 and respond differently to pathological conditions.
662 Astrocytes are heterogeneous in morphology, developmental origin, gene
663 expression profile, and physiological properties (Zhang and Barres, 2010). Previous
664 research demonstrated the existence of at least five different astrocyte subpopulations from
665 three brain regions (olfactory bulb, cortex, and brain stem) (Lin et al., 2017). Analogous
666 astrocyte subpopulations were found in human and mouse glioma. Astrocyte subpopulation
667 C and its analog in glioma were enriched in the expression of genes associated with
668 synapse formation and epilepsy. In the present study, we expanded the genomic analyses of
669 these heterogeneous astrocyte subpopulations to include the expression of lncRNA genes
670 and the identification of gene signatures enriched in astrocyte subpopulations within each
671 brain region. We found that even though astrocyte subpopulations share similarities across
672 regions, there are interesting differences among them in gene expression. Astrocyte
673 subpopulations display distinct lncRNA gene expression patterns among brain regions
674 highlighting the contribution of lncRNAs to the different astrocyte gene signatures and to
675 potential regulatory functions.
676 We selected subsets of DE protein-coding genes from astrocyte gene signatures to
677 gain insight into the functions of astrocyte subpopulations in different brain regions. Highly
678 expressed DE protein-coding genes enriched in cortex subpopulation B included Mertk and
679 Sirpa, both known as synaptic phagocytic genes. MERTK interacts with the integrin
680 pathway regulating CdKII/DOCK180/Rac1 modules, which control the rearrangement of
31
681 the actin cytoskeleton during phagocytosis (Wu et al., 2005). Previous research has
682 demonstrated that astrocytes in both the developing and the adult CNS contribute to neural
683 circuit remodeling by phagocytosing synapses via the MEGF10 and MERTK pathways in
684 response to neural activity (Chung et al., 2013). However, even though Mertk is still
685 expressed in mature astrocytes, synaptic phagocytosis declines in adult CNS (Chung et al.,
686 2013); thus inhibitory mechanisms might be upregulated. SIRPA membrane receptor
687 recognizes CD47 and upon binding inhibits synapse phagocytosis (Barclay and van den
688 Berg, 2014). Researchers have found that the expression of Sirpa membrane receptor is also
689 upregulated in mature astrocytes (Sloan et al., 2017). In the previous research, cortex
690 subpopulation B was found at a later stage during cortical development compared to
691 subpopulations A, C, and E (Lin et al., 2017). MERTK and SIRPA potentially regulate
692 synaptic density in cortex subpopulation B during late cortical development.
693 Slc1a2 and Cpe are among the highly expressed DE protein coding genes found in
694 cortex subpopulation C samples. SLC1A2 is a glial amino acid transporter which plays a
695 major role in synaptic glutamate clearance. The dysfunction of SLC1A2 (commonly
696 observed in neurodegenerative diseases) causes elevated levels of glutamate which yield
697 neuronal damage (Lin et al., 2012). In a previous study, co-cultures with neurons
698 demonstrated that cortex subpopulation C astrocytes significantly enhanced synapse
699 formation as compared to subpopulation A or bulk astrocytes (Lin et al., 2017). Further
700 functional studies are required to determine if the upregulation of Slc1a2 is related to the
701 enhancement of synapse function. Another DE protein coding gene found with high
702 expression in cortex subpopulation C is Cpe. CPE is a peptidase which may also function
703 as a neurotrophic factor promoting neuronal survival. Interestingly, CPE has been found to
704 be secreted by cultured astrocytes (Klein et al., 1992) and its overexpression in a mouse
32
705 model of glioma was found to mitigate cell migration (Armento et al., 2017). Through
706 Transwell assays, astrocyte cortex subpopulation C was identified as having less migratory
707 potential than subpopulation A (Lin et al., 2017). Our results suggest a potential link
708 between the upregulation of CPE peptidase and cell migration. Overall, astrocyte gene
709 signatures may be used to gain insight into their region and subpopulation specific
710 functions.
711 PCA and GSEA analyses yielded significantly enriched gene sets relevant to
712 functions in specific astrocyte subpopulations and brain regions. Through PCA, we found
713 that PC1 separated samples into non-astrocytes and astrocyte samples, whereas PC2
714 clustered samples according to subpopulation. The gene set GO_MYELIN_SHEATH was
715 among the top enriched sets using PC1 scores. The enrichment of this gene set is likely due
716 to the upregulation of astrocyte-derived factors which support oligodendrocyte myelination
717 in the CNS. Together, PC1 and PC2 captured nearly 69% of the variability. PC3 and PC4
718 further separated samples according to brain region. Our results demonstrate the
719 heterogeneity in astrocyte gene expression profiles due to subpopulation and brain region.
720 Many genes become upregulated in astrocytes in response to injuries and diseases
721 through a transformation called reactive astrogliosis. As a functionally diverse cell
722 population, the responses of astrocytes to injury and disease are equally diverse. Barres and
723 colleagues (Zamanian et al., 2012) found that astrocytes exist in at least two reactive states,
724 A1 (inflammatory) and A2 (ischemic). A1 neuroinflammatory reactive astrocytes exhibit
725 upregulation of complement cascade genes that leads to loss of synapses. In contrast, A2
726 ischemic reactive astrocytes display upregulation of neurotrophic factors such as
727 thrombospondins that promote synapse recovery and repair (Zamanian et al., 2012). A1
728 reactive astrocytes have been found in postmortem brain tissue from patients with
33
729 neurodegenerative diseases (AD, HD, PD), amyotrophic lateral sclerosis (ALS), and
730 multiple sclerosis (MS) (Liddelow et al., 2017). Furthermore, reactive astrocytes have also
731 been found neighboring tumor cells and might enhance their malignancy by inducing cell
732 proliferation and migration (Le et al., 2003; Biasoli et al., 2014). A better understanding of
733 the heterogeneity of astrocytes and their potential involvement in and contributions to
734 different diseases is needed to promote the development of treatments targeting specific
735 astrocyte subpopulations.
736 Altered astrocyte function is increasingly recognized as a factor contributing to
737 glioma and a number of neurodegenerative diseases. Given the glial-like histopathology of
738 glioma and its underlying cellular diversity, certain types of glioma might be composed of
739 malignant analogues of astrocyte subpopulations (Patel et al., 2014; Laug et al., 2018).
740 Thus, we surveyed multiple previously published astrocyte subpopulation datasets and
741 correlated our collection of astrocyte subpopulation gene signatures with those of various
742 subtypes of glioma and neurodegenerative diseases. Our results indicate that distinct
743 astrocyte gene signatures are correlated to glioma samples with specific genomic features
744 and somatic mutations. We correlated astrocyte gene signatures with copy-number
745 variations that are known to play a role in tumor proliferation. For example, amplifications
746 of the EGFR gene are known to lead to tumor growth through enhancement of cell
747 proliferation. MTAP and CDKN2A genes are frequently deleted in human cancers
748 (Mavrakis et al., 2016). A total of 18 astrocyte gene signatures were significantly correlated
749 with GBM samples harboring a combination of copy-number variations including EGFR
750 amplification, CDKN2A deletion, and MTAP deletion. Subpopulation C from all regions,
751 subpopulation A (brain stem and olfactory bulb), and subpopulation B (cortex and olfactory
752 bulb) were among the aforementioned gene signatures.
34
753 We also correlated astrocyte gene signatures with LGG samples. The correlation
754 between astrocyte gene signatures and LGG samples improved when IDH-1p/19q status
755 was considered than when histological class was accounted for, consistent with a previous
756 report (Cancer Genome Atlas Research Network et al., 2015). Our results indicate that
757 more than half of the gene signatures were positively correlated with samples classified
758 either as wild-type or mutated IDH with no 1p/19q codeletion. The high positive correlation
759 of gene profiles with wild-type and mutant IDH samples without the 1p/19q codeletion is
760 most likely due to the astrocyte origin of these gene signatures. Interestingly, most of the
761 astrocyte subpopulations from the olfactory bulb (A, B, D, E) were strongly correlated with
762 wild-type IDH. It is possible that astrocytomas acquire a signature that is most similar to
763 olfactory bulb astrocytes. Our results also show that amygdala gene signatures (AS1-AS3)
764 were strongly correlated with samples with mutant IDH and no 1p/19q codeletion. The
765 gene signatures of subpopulations D and E, cortex subpopulation A, cortex development
766 (A1 and A2), and cortex injury (B2) were most highly correlated with LGG samples
767 bearing the IDH mutation and the 1p/19q codeletion; they are also positively associated
768 with oligodendroglioma histology. This is very interesting because it provides potential
769 links between key genomic mutations and the resultant cellular phenotypes. Even though
770 the gene signatures were derived from astrocyte subpopulations, the correlation with
771 oligodendroglioma samples might indicate that low-grade oligodendrogliomas contain
772 proliferating glial progenitor cells that dedifferentiated from astrocyte subpopulations,
773 consistent with a previous report (Dai, 2001).
774 Survival analysis was conducted to determine whether enrichment scores of
775 astrocyte signatures are correlated with the survival probability of patients. Kaplan-Meier
776 plots suggest that patients with an astrocytic LGG and a gene profile similar to that of
35
777 astrocyte subpopulation A in brain stem are not correlated with mutations of IDH and have
778 a better prognosis. On the other hand, the survival of LGG patients associated with
779 subpopulation D and olfactory bulb subpopulation D gene signatures could possibly be
780 affected by mutations of IDH correlated with subtypes of these LGG samples.
781 The gene expression correlations found between astrocyte gene signatures and
782 TCGA GBM/LGG samples suggest that certain subtypes of glioma could potentially
783 originate from specific astrocyte subpopulations. Future experimentation will help testing
784 the gene expression correlations identified in this work to advance the understanding and
785 treatment of brain tumors. For example, interesting work has been done to generate RNA-
786 Seq transcriptional profiles from patient-derived glioma stem cells (Jin et al., 2017; Park et
787 al., 2017; Puchalski et al., 2018). Finding the enrichment of astrocyte subpopulation and
788 region gene signatures in patient-derived glioma stem cell models may pinpoint astrocyte
789 subpopulations which might be involved in gliomagenesis.
790 In addition to glioma, increasing evidence suggests that astrocytes dysfunction is
791 also involved in neurodegenerative diseases (Sofroniew and Vinters, 2010). Through
792 GSEA, we found that cortex and olfactory bulb gene profiles of subpopulations B and C
793 were enriched in AD, PD, and HD. GO terms related to mitochondrial respiratory chain
794 complex were significantly enriched with genes in the cortex region astrocyte gene
795 signature. As a result, astrocyte subpopulations in the cortex might be more vulnerable to
796 mitochondrial dysfunction that could be associated with neurodegenerative diseases. The
797 AD gene set was enriched with DE genes from cortex subpopulation B. Immunostaining
798 assays validated the protein expression of three highly upregulated genes in cortex
799 subpopulation B in two AD mouse models. Inner cortices of AD mice have significantly
800 more GFAP + astrocytes than do wild-type mouse brains. Because subpopulation B
36
801 astrocytes reside mostly in the inner cortex, we assumed that cortical GFAP+ cells belong
802 mainly to this subpopulation. GFAP + astrocytes in AD cortices also exhibit significantly
803 increased Adcy7 protein expression. Adcy7 is a membrane-bound enzyme that catalyzes
804 the formation of cyclic AMP (cAMP) from ATP (Crown Human Genome Center, 1997;
805 Safran et al., 2003). Increased immunostaining of cAMP co-immunolocalizes with beta-
806 amyloid proteins in cerebral cortical vessels of AD patients (Martínez et al., 2001).
807 Furthermore, significantly elevated levels of cAMP have been found in cerebrospinal fluids
808 of AD patients (Martínez et al., 1999). The above evidence suggests a potential role of
809 cAMP in enhancing the progression of AD. Our results indicate that cortex subpopulation
810 B might be involved in AD development as a response to an insult or stimuli. One of the
811 possible mechanisms by which cortex subpopulation B might contribute to the progression
812 of AD is through the upregulation of Adcy7, which would subsequently enhance the
813 synthesis of cAMP from ATP. Studies involving a decrease in the expression of Adcy7 in
814 cortex subpopulation B astrocytes from AD mice are required to better understand the
815 underlying pathways and mechanisms of AD development.
816 In summary, we have demonstrated that astrocyte subpopulations from diverse brain
817 regions have unique gene signatures for both protein-coding and lncRNA genes. Regional
818 astrocyte subpopulation gene signatures are enriched in different functional gene sets,
819 indicating their heterogeneity. We also obtained from the literature a comprehensive
820 collection of gene signatures of purified astrocyte subpopulations from diverse
821 developmental stages, brain regions, and conditions. Through analyses of gene set
822 enrichment, we found that gene signatures of specific astrocyte subpopulations correlated
823 with distinct glioma subtypes with unique genomic alterations. Additionally, we
824 demonstrated the association of different astrocyte gene signatures with neurodegenerative
37
825 diseases. We validated the upregulated protein expression of a set of genes in cortical
826 astrocytes in two mouse models of AD. Thus, our analysis indicates that certain subtypes of
827 glioma and neurodegenerative diseases are correlated to specific astrocyte subpopulations.
828 Targeting specific astrocyte subpopulations could present a novel strategy for treating these
829 devastating neurological diseases.
830
831 References
832 Anders S, Pyl PT, Huber W (2015) HTSeq--a Python framework to work with high-
833 throughput sequencing data. Bioinformatics 31:166–169 Available at:
834 http://www.ncbi.nlm.nih.gov/pubmed/25260700.
835 Armento A, Ilina EI, Kaoma T, Muller A, Vallar L, Niclou SP, Krüger MA, Mittelbronn M,
836 Naumann U (2017) Carboxypeptidase E transmits its anti-migratory function in
837 glioma cells via transcriptional regulation of cell architecture and motility regulating
838 factors. Int J Oncol 51:702–714 Available at:
839 http://www.ncbi.nlm.nih.gov/pubmed/28656234.
840 Barclay AN, van den Berg TK (2014) The Interaction Between Signal Regulatory Protein
841 Alpha (SIRP
842 Annu Rev Immunol 32:25–50 Available at:
843 http://www.annualreviews.org/doi/10.1146/annurev-immunol-032713-120142.
844 Barres BA (2008) The Mystery and Magic of Glia: A Perspective on Their Roles in Health
845 and Disease. Neuron 60:430–440 Available at:
846 http://linkinghub.elsevier.com/retrieve/pii/S0896627308008866.
847 Batista PJ, Chang HY (2013) Long noncoding RNAs: cellular address codes in
38
848 development and disease. Cell 152:1298–1307 Available at:
849 http://www.ncbi.nlm.nih.gov/pubmed/23498938.
850 Biasoli D, Sobrinho MF, da Fonseca ACC, de Matos DG, Romão L, de Moraes Maciel R,
851 Rehen SK, Moura-Neto V, Borges HL, Lima FRS (2014) Glioblastoma cells inhibit
852 astrocytic p53-expression favoring cancer malignancy. Oncogenesis 3:e123 Available
853 at: http://www.ncbi.nlm.nih.gov/pubmed/25329722.
854 Bowman RL, Wang Q, Carro A, Verhaak RGW, Squatrito M (2017) GlioVis data portal for
855 visualization and analysis of brain tumor expression datasets. Neuro Oncol 19:139–
856 141 Available at: http://www.ncbi.nlm.nih.gov/pubmed/28031383.
857 Cahoy JD, Emery B, Kaushal A, Foo LC, Zamanian JL, Christopherson KS, Xing Y,
858 Lubischer JL, Krieg PA, Krupenko SA, Thompson WJ, Barres BA (2008) A
859 Transcriptome Database for Astrocytes, Neurons, and Oligodendrocytes: A New
860 Resource for Understanding Brain Development and Function. J Neurosci 28:264–278
861 Available at: http://www.jneurosci.org/cgi/doi/10.1523/JNEUROSCI.4178-07.2008.
862 Cancer Genome Atlas Research Network et al. (2015) Comprehensive, Integrative
863 Genomic Analysis of Diffuse Lower-Grade Gliomas. N Engl J Med 372:2481–2498
864 Available at: http://www.ncbi.nlm.nih.gov/pubmed/26061751.
865 Ceccarelli M et al. (2016) Molecular Profiling Reveals Biologically Discrete Subsets and
866 Pathways of Progression in Diffuse Glioma. Cell 164:550–563 Available at:
867 http://linkinghub.elsevier.com/retrieve/pii/S009286741501692X.
868 Chung W-S, Clarke LE, Wang GX, Stafford BK, Sher A, Chakraborty C, Joung J, Foo LC,
869 Thompson A, Chen C, Smith SJ, Barres BA (2013) Astrocytes mediate synapse
870 elimination through MEGF10 and MERTK pathways. Nature 504:394–400 Available
871 at: http://www.nature.com/articles/nature12776.
39
872 Crown Human Genome Center (1997) GeneCards Human Gene Database. Available at:
873 www.genecards.org [Accessed April 15, 2018].
874 Cuevas Diaz Duran R, Menon S, Wu J (2016) The Analyses of Global Gene Expression
875 and Transcription Factor Regulation. In, pp 1–35 Available at:
876 http://link.springer.com/10.1007/978-94-017-7450-5_1.
877 Dai C (2001) PDGF autocrine stimulation dedifferentiates cultured astrocytes and induces
878 oligodendrogliomas and oligoastrocytomas from neural progenitors and astrocytes in
879 vivo. Genes Dev 15:1913–1925 Available at:
880 http://www.genesdev.org/cgi/doi/10.1101/gad.903001.
881 Denis-Donini S, Glowinski J, Prochiantz A (1984) Glial heterogeneity may define the
882 three-dimensional shape of mouse mesencephalic dopaminergic neurones. Nature
883 307:641–643 Available at: http://www.ncbi.nlm.nih.gov/pubmed/6694754.
884 Dong X, Chen K, Cuevas-Diaz Duran R, You Y, Sloan SA, Zhang Y, Zong S, Cao Q,
885 Barres BA, Wu JQ (2015) Comprehensive Identification of Long Non-coding RNAs
886 in Purified Cell Types from the Brain Reveals Functional LncRNA in OPC Fate
887 Determination Wahlestedt C, ed. PLOS Genet 11:e1005669 Available at:
888 http://dx.plos.org/10.1371/journal.pgen.1005669.
889 Dong X, You Y, Wu JQ (2016) Building an RNA Sequencing Transcriptome of the Central
890 Nervous System. Neuroscientist 22:579–592 Available at:
891 http://www.ncbi.nlm.nih.gov/pubmed/26463470.
892 Doyle JP, Dougherty JD, Heiman M, Schmidt EF, Stevens TR, Ma G, Bupp S, Shrestha P,
893 Shah RD, Doughty ML, Gong S, Greengard P, Heintz N (2008) Application of a
894 Translational Profiling Approach for the Comparative Analysis of CNS Cell Types.
895 Cell 135:749–762 Available at:
40
896 http://linkinghub.elsevier.com/retrieve/pii/S0092867408013664.
897 Duran RC-D, Yan H, Zheng Y, Huang X, Grill R, Kim DH, Cao Q, Wu JQ (2017) The
898 systematic analysis of coding and long non-coding RNAs in the sub-chronic and
899 chronic stages of spinal cord injury. Sci Rep 7:41008 Available at:
900 http://www.ncbi.nlm.nih.gov/pubmed/28106101.
901 Frazee AC, Langmead B, Leek JT (2011) ReCount: a multi-experiment resource of
902 analysis-ready RNA-seq gene count datasets. BMC Bioinformatics 12:449 Available
903 at: http://www.ncbi.nlm.nih.gov/pubmed/22087737.
904 Gokce O, Stanley GM, Treutlein B, Neff NF, Camp JG, Malenka RC, Rothwell PE,
905 Fuccillo M V, Südhof TC, Quake SR (2016) Cellular Taxonomy of the Mouse
906 Striatum as Revealed by Single-Cell RNA-Seq. Cell Rep 16:1126–1137 Available at:
907 http://www.ncbi.nlm.nih.gov/pubmed/27425622.
908 Grass D, Pawlowski PG, Hirrlinger J, Papadopoulos N, Richter DW, Kirchhoff F,
909 Hülsmann S (2004) Diversity of functional astroglial properties in the respiratory
910 network. J Neurosci 24:1358–1365 Available at:
911 http://www.ncbi.nlm.nih.gov/pubmed/14960607.
912 Guttman M, Rinn JL (2012) Modular regulatory principles of large non-coding RNAs.
913 Nature 482:339–346 Available at: http://www.ncbi.nlm.nih.gov/pubmed/22337053.
914 Hänzelmann S, Castelo R, Guinney J (2013) GSVA: gene set variation analysis for
915 microarray and RNA-seq data. BMC Bioinformatics 14:7 Available at:
916 http://www.ncbi.nlm.nih.gov/pubmed/23323831.
917 Hara M, Kobayakawa K, Ohkawa Y, Kumamaru H, Yokota K, Saito T, Kijima K,
918 Yoshizaki S, Harimaya K, Nakashima Y, Okada S (2017) Interaction of reactive
919 astrocytes with type I collagen induces astrocytic scar formation through the integrin-
41
920 N-cadherin pathway after spinal cord injury. Nat Med 23:818–828 Available at:
921 http://www.ncbi.nlm.nih.gov/pubmed/28628111.
922 Herculano-Houzel S (2014) The glia/neuron ratio: How it varies uniformly across brain
923 structures and species and what that means for brain physiology and evolution. Glia
924 62:1377–1391 Available at: http://doi.wiley.com/10.1002/glia.22683.
925 Hill SJ, Barbarese E, McIntosh TK (1996) Regional heterogeneity in the response of
926 astrocytes following traumatic brain injury in the adult rat. J Neuropathol Exp Neurol
927 55:1221–1229 Available at: http://www.ncbi.nlm.nih.gov/pubmed/8957445.
928 Hung T et al. (2011) Extensive and coordinated transcription of noncoding RNAs within
929 cell-cycle promoters. Nat Genet 43:621–629 Available at:
930 http://www.ncbi.nlm.nih.gov/pubmed/21642992.
931 Jin X, Kim LJY, Wu Q, Wallace LC, Prager BC, Sanvoranart T, Gimple RC, Wang X,
932 Mack SC, Miller TE, Huang P, Valentim CL, Zhou Q-G, Barnholtz-Sloan JS, Bao S,
933 Sloan AE, Rich JN (2017) Targeting glioma stem cells through combined BMI1 and
934 EZH2 inhibition. Nat Med 23:1352–1361 Available at:
935 http://www.ncbi.nlm.nih.gov/pubmed/29035367.
936 John Lin C-C, Yu K, Hatcher A, Huang T-W, Lee HK, Carlson J, Weston MC, Chen F,
937 Zhang Y, Zhu W, Mohila CA, Ahmed N, Patel AJ, Arenkiel BR, Noebels JL,
938 Creighton CJ, Deneen B (2017) Identification of diverse astrocyte populations and
939 their malignant analogs. Nat Neurosci 20:396–405 Available at:
940 http://www.nature.com/articles/nn.4493.
941 Jolliffe IT (2002) Principal Component Analysis,. New York: Springer.
942 Kassambara A, Kosinski M BP (2017) survminer: Drawing Survival Curves using
943 “ggplot2.”
42
944 Klein RS, Das B, Fricker LD (1992) Secretion of carboxypeptidase E from cultured
945 astrocytes and from AtT-20 cells, a neuroendocrine cell line: implications for
946 neuropeptide biosynthesis. J Neurochem 58:2011–2018 Available at:
947 http://www.ncbi.nlm.nih.gov/pubmed/1573389.
948 Laug D, Glasgow SM, Deneen B (2018) A glial blueprint for gliomagenesis. Nat Rev
949 Neurosci Available at: http://www.nature.com/articles/s41583-018-0014-3.
950 Le DM, Besson A, Fogg DK, Choi K-S, Waisman DM, Goodyer CG, Rewcastle B, Yong
951 VW (2003) Exploitation of Astrocytes by Glioma Cells to Facilitate Invasiveness: A
952 Mechanism Involving Matrix Metalloproteinase-2 and the Urokinase-Type
953 Plasminogen Activator–Plasmin Cascade. J Neurosci 23:4034–4043 Available at:
954 http://www.jneurosci.org/lookup/doi/10.1523/JNEUROSCI.23-10-04034.2003.
955 Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP
956 (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics 27:1739–1740
957 Available at: http://www.ncbi.nlm.nih.gov/pubmed/21546393.
958 Liddelow SA et al. (2017) Neurotoxic reactive astrocytes are induced by activated
959 microglia. Nature 541:481–487 Available at:
960 http://www.nature.com/articles/nature21029.
961 Lin C-LG, Kong Q, Cuny GD, Glicksman MA (2012) Glutamate transporter EAAT2: a
962 new target for the treatment of neurodegenerative diseases. Future Med Chem 4:1689–
963 1700 Available at: http://www.ncbi.nlm.nih.gov/pubmed/22924507.
964 Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion
965 for RNA-seq data with DESeq2. Genome Biol 15:550 Available at:
966 http://www.ncbi.nlm.nih.gov/pubmed/25516281.
967 Martínez M, Fernández E, Frank A, Guaza C, de la Fuente M, Hernanz A (1999) Increased
43
968 cerebrospinal fluid cAMP levels in Alzheimer’s disease. Brain Res 846:265–267
969 Available at: http://www.ncbi.nlm.nih.gov/pubmed/10556645.
970 Martínez M, Hernández AI, Hernanz A (2001) Increased cAMP immunostaining in
971 cerebral vessels in Alzheimer’s disease. Brain Res 922:148–152 Available at:
972 http://www.ncbi.nlm.nih.gov/pubmed/11730714.
973 Mavrakis KJ et al. (2016) Disordered methionine metabolism in MTAP/CDKN2A-deleted
974 cancers leads to dependence on PRMT5. Science (80- ) 351:1208–1213 Available at:
975 http://www.sciencemag.org/lookup/doi/10.1126/science.aad5944.
976 Morel L, Chiang MSR, Higashimori H, Shoneye T, Iyer LK, Yelick J, Tai A, Yang Y
977 (2017) Molecular and Functional Properties of Regional Astrocytes in the Adult Brain.
978 J Neurosci 37:8706–8717 Available at:
979 http://www.ncbi.nlm.nih.gov/pubmed/28821665.
980 Noristani HN, Sabourin JC, Boukhaddaoui H, Chan-Seng E, Gerber YN, Perrin FE (2016)
981 Spinal cord injury induces astroglial conversion towards neuronal lineage. Mol
982 Neurodegener 11:68 Available at: http://www.ncbi.nlm.nih.gov/pubmed/27716282.
983 Park NI et al. (2017) ASCL1 Reorganizes Chromatin to Direct Neuronal Fate and Suppress
984 Tumorigenicity of Glioblastoma Stem Cells. Cell Stem Cell 21:209–224.e7 Available
985 at: http://www.ncbi.nlm.nih.gov/pubmed/28712938.
986 Patel AP, Tirosh I, Trombetta JJ, Shalek AK, Gillespie SM, Wakimoto H, Cahill DP,
987 Nahed B V, Curry WT, Martuza RL, Louis DN, Rozenblatt-Rosen O, Suvà ML,
988 Regev A, Bernstein BE (2014) Single-cell RNA-seq highlights intratumoral
989 heterogeneity in primary glioblastoma. Science 344:1396–1401 Available at:
990 http://www.ncbi.nlm.nih.gov/pubmed/24925914.
991 Polito VA, Li H, Martini-Stoica H, Wang B, Yang L, Xu Y, Swartzlander DB, Palmieri M,
44
992 di Ronza A, Lee VM-Y, Sardiello M, Ballabio A, Zheng H (2014) Selective clearance
993 of aberrant tau proteins and rescue of neurotoxicity by transcription factor EB. EMBO
994 Mol Med 6:1142–1160 Available at: http://www.ncbi.nlm.nih.gov/pubmed/25069841.
995 Puchalski RB et al. (2018) An anatomic transcriptional atlas of human glioblastoma.
996 Science 360:660–663 Available at: http://www.ncbi.nlm.nih.gov/pubmed/29748285.
997 Quackenbush J (2002) Microarray data normalization and transformation. Nat Genet 32
998 Suppl:496–501 Available at: http://www.ncbi.nlm.nih.gov/pubmed/12454644.
999 R Core Team (2015) A language and environment for statistical computing.
1000 Rusnakova V, Honsa P, Dzamba D, Ståhlberg A, Kubista M, Anderova M (2013)
1001 Heterogeneity of astrocytes: from development to injury - single cell gene expression.
1002 PLoS One 8:e69734 Available at: http://www.ncbi.nlm.nih.gov/pubmed/23940528.
1003 Safran M, Chalifa-Caspi V, Shmueli O, Olender T, Lapidot M, Rosen N, Shmoish M, Peter
1004 Y, Glusman G, Feldmesser E, Adato A, Peter I, Khen M, Atarot T, Groner Y, Lancet
1005 D (2003) Human Gene-Centric Databases at the Weizmann Institute of Science:
1006 GeneCards, UDB, CroW 21 and HORDE. Nucleic Acids Res 31:142–146 Available
1007 at: http://www.ncbi.nlm.nih.gov/pubmed/12519968.
1008 Saito T, Matsuba Y, Mihira N, Takano J, Nilsson P, Itohara S, Iwata N, Saido TC (2014)
1009 Single App knock-in mouse models of Alzheimer’s disease. Nat Neurosci 17:661–663
1010 Available at: http://www.ncbi.nlm.nih.gov/pubmed/24728269.
1011 Sloan SA, Darmanis S, Huber N, Khan TA, Birey F, Caneda C, Reimer R, Quake SR,
1012 Barres BA, Paşca SP (2017) Human Astrocyte Maturation Captured in 3D Cerebral
1013 Cortical Spheroids Derived from Pluripotent Stem Cells. Neuron 95:779–790.e6
1014 Available at: https://linkinghub.elsevier.com/retrieve/pii/S0896627317306839.
1015 Sofroniew M V, Vinters H V (2010) Astrocytes: biology and pathology. Acta Neuropathol
45
1016 119:7–35 Available at: http://www.ncbi.nlm.nih.gov/pubmed/20012068.
1017 Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich
1018 A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment
1019 analysis: A knowledge-based approach for interpreting genome-wide expression
1020 profiles. Proc Natl Acad Sci 102:15545–15550 Available at:
1021 http://www.pnas.org/cgi/doi/10.1073/pnas.0506580102.
1022 Therneau TM (2015) A Package for Survival Analysis in S.
1023 Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-
1024 Seq. Bioinformatics 25:1105–1111 Available at:
1025 http://www.ncbi.nlm.nih.gov/pubmed/19289445.
1026 Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL,
1027 Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of
1028 RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7:562–578 Available at:
1029 http://www.ncbi.nlm.nih.gov/pubmed/22383036.
1030 Verhaak RGW et al. (2010) Integrated genomic analysis identifies clinically relevant
1031 subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR,
1032 and NF1. Cancer Cell 17:98–110 Available at:
1033 http://www.ncbi.nlm.nih.gov/pubmed/20129251.
1034 Wang Q et al. (2018) Tumor Evolution of Glioma-Intrinsic Gene Expression Subtypes
1035 Associates with Immunological Changes in the Microenvironment. Cancer Cell
1036 33:152 Available at: http://www.ncbi.nlm.nih.gov/pubmed/29316430.
1037 White RE, McTigue DM, Jakeman LB (2010) Regional heterogeneity in astrocyte
1038 responses following contusive spinal cord injury in mice. J Comp Neurol 518:1370–
1039 1390 Available at: http://www.ncbi.nlm.nih.gov/pubmed/20151365.
46
1040 Wu J ed. (2016) Transcriptomics and Gene Regulation. Dordrecht: Springer Netherlands.
1041 Available at: http://link.springer.com/10.1007/978-94-017-7450-5.
1042 Wu Y, Singh S, Georgescu M-M, Birge RB (2005) A role for Mer tyrosine kinase in
1043 alphavbeta5 integrin-mediated phagocytosis of apoptotic cells. J Cell Sci 118:539–553
1044 Available at: http://www.ncbi.nlm.nih.gov/pubmed/15673687.
1045 Wu YE, Pan L, Zuo Y, Li X, Hong W (2017) Detecting Activated Cell Populations Using
1046 Single-Cell RNA-Seq. Neuron 96:313–329.e6 Available at:
1047 http://www.ncbi.nlm.nih.gov/pubmed/29024657.
1048 Yeh T-H, Lee DY, Gianino SM, Gutmann DH (2009) Microarray analyses reveal regional
1049 astrocyte heterogeneity with implications for neurofibromatosis type 1 (NF1)-
1050 regulated glial proliferation. Glia 57:1239–1249 Available at:
1051 http://doi.wiley.com/10.1002/glia.20845.
1052 Zamanian JL, Xu L, Foo LC, Nouri N, Zhou L, Giffard RG, Barres BA (2012) Genomic
1053 analysis of reactive astrogliosis. J Neurosci 32:6391–6410 Available at:
1054 http://www.ncbi.nlm.nih.gov/pubmed/22553043.
1055 Zeisel A, Muñoz-Manchado AB, Codeluppi S, Lönnerberg P, La Manno G, Juréus A,
1056 Marques S, Munguba H, He L, Betsholtz C, Rolny C, Castelo-Branco G, Hjerling-
1057 Leffler J, Linnarsson S (2015) Brain structure. Cell types in the mouse cortex and
1058 hippocampus revealed by single-cell RNA-seq. Science 347:1138–1142 Available at:
1059 http://www.ncbi.nlm.nih.gov/pubmed/25700174.
1060 Zhang Y, Barres BA (2010) Astrocyte heterogeneity: an underappreciated topic in
1061 neurobiology. Curr Opin Neurobiol 20:588–594 Available at:
1062 http://www.ncbi.nlm.nih.gov/pubmed/20655735.
1063 Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O’Keeffe S, Phatnani HP,
47
1064 Guarnieri P, Caneda C, Ruderisch N, Deng S, Liddelow SA, Zhang C, Daneman R,
1065 Maniatis T, Barres BA, Wu JQ (2014) An RNA-sequencing transcriptome and
1066 splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci
1067 34:11929–11947 Available at: http://www.ncbi.nlm.nih.gov/pubmed/25186741.
1068 Zhang Y, Sloan SA, Clarke LE, Caneda C, Plaza CA, Blumenthal PD, Vogel H, Steinberg
1069 GK, Edwards MSB, Li G, Duncan JA, Cheshier SH, Shuer LM, Chang EF, Grant GA,
1070 Gephart MGH, Barres BA (2016) Purification and Characterization of Progenitor and
1071 Mature Human Astrocytes Reveals Transcriptional and Functional Differences with
1072 Mouse. Neuron 89:37–53 Available at:
1073 http://linkinghub.elsevier.com/retrieve/pii/S0896627315010193.
1074
1075 Figure Legends
1076 Figure 1. Principal Component Analysis (PCA) performed using log2-transformed
1077 FPKM of 4092 genes differentially expressed (DE) between regions.
1078 (A) Scores for principal components 1 (PC1) and 2 (PC2) are depicted. PC1 separates
1079 samples into astrocyte subpopulations and non-astrocyte populations. Circular and elliptical
1080 figures enclose samples according to subpopulation. (B) The plot of principal components 3
1081 (PC3) and 4 (PC4) separates samples into brain regions, as indicated by the elliptical
1082 figures. (C, D, E) Differentially expressed genes were ranked using the scores of PC1, PC2,
1083 and PC3, respectively, and the top enriched gene sets were obtained through Gene Set
1084 Enrichment Analysis (GSEA).
1085 Figure 2. Gene signatures defined by astrocyte subpopulations and brain regions.
1086 (A) Heatmap showing the log2-transformed FPKM values of cortex subpopulation B-
1087 enriched genes (protein-coding and lncRNAs) compared to other subpopulations. DE genes
48
1088 were obtained by performing pairwise comparisons of astrocyte subpopulations in each
1089 brain region to their corresponding non-astrocyte samples. Selection criteria: FPKM > 1,
1090 fold-change > 2, and FDR < 10%. (B) Heatmap illustrating upregulated DE lncRNA genes
1091 from astrocyte regional subpopulation gene signatures. (C) The top five significantly
1092 enriched gene sets obtained using the gene signatures of cortex subpopulation B (top panel)
1093 and cortex subpopulation C (lower panel). Bar plots indicate the number of genes found in
1094 the enriched gene set. The yellow line illustrates the gene set enrichment using -log10-
1095 transformed FDR. (D) Heatmap depicting the region-specific astrocyte gene signatures. DE
1096 genes were obtained by comparing all samples from the same brain region to the remaining
1097 regions (FPKM > 1, fold-change > 4, and p-value < 0.05) and to their corresponding non-
1098 astrocyte samples (FPKM > 1, fold-change > 2, and p-value < 0.05). Colors of all heatmaps
1099 represent log2 transformed FPKM values. See also Extended Data Figures 2-1 to 2-4.
1100 Figure 3. Correlation between upregulated astrocyte gene signatures and TCGA
1101 GBM samples
1102 (A) Heatmap displaying the hierarchical clustering of enrichment scores obtained through
1103 Gene Set Variation Analysis (GSVA). High scores indicate strong positive correlation
1104 between astrocyte gene signatures and gene expression profiles of 103 TCGA GBM
1105 samples with a tumor purity > 70%. Upper bars indicate tumor subtype and the presence or
1106 absence of an EGFR gene amplification.
1107 (B) Heatmap representing the enrichment scores between astrocyte gene signatures and
1108 Classical GBM samples that carry an EGFR gene amplification. Bar plots indicate the
1109 number of samples with positive enrichment score for each astrocyte gene signature.
1110 (C) Exploratory analysis of the correlation between astrocyte gene signatures and GBM
1111 samples with different subtypes, somatic mutations, and copy-number variations. The
49
1112 heatmap depicts the -log10-transformed p-value of either an ANOVA comparison between
1113 GBM subtypes or a Wilcoxon Rank-Sum test between samples carrying the selected
1114 mutations, amplifications, and deletions. For viewing clarity, only 21 representative
1115 astrocyte gene signatures are shown in each panel. Figures depicting all astrocyte gene
1116 signatures are available in Extended Data Figures 3-3 and 3-4. See also Extended Data
1117 Figures 3-1 to 3-6.
1118 Figure 4. Correlation between upregulated astrocyte gene signatures and TCGA LGG
1119 samples
1120 (A) Heatmap displaying the hierarchical clustering of enrichment scores obtained through
1121 Gene Set Variation Analysis (GSVA). High scores indicate strong positive correlation
1122 between astrocyte gene signatures and 160 TCGA LGG samples with tumor purity > 70%.
1123 Upper bars indicate tumor histology and known co-ocurring genomic alterations: IDH
1124 mutation combined with 1p/19q codeletion subtype (IDH.codel.subtype), mutations in
1125 TERT promoter (TERT.promoter.status), TP53 mutations, and ATRX mutations. (B)
1126 Exploratory analysis of the correlation between astrocyte gene signatures (y-axis) and LGG
1127 grades, histological subtypes, somatic mutations, and copy-number variations (x-axis). The
1128 heatmap depicts the -log10-transformed p-value of either an ANOVA comparison between
1129 LGG subtypes or a Wilcoxon Rank-Sum test between samples carrying the selected
1130 mutations, amplifications, and deletions. (C) Heatmap depicting the proportion of LGG
1131 samples with specific IDH status variants that had positive enrichment scores for every
1132 astrocyte gene signature. For viewing clarity, only 21 representative astrocyte gene
1133 signatures are shown in each panels B and C. Displays depicting complete astrocyte gene
1134 signatures are available in Extended Data Figure 4-1. See also Extended Data Figures 3-1,
1135 3-2, 3-6, 4-1, 4-2, and 4-3.
50
1136 Figure 5. Astrocyte subpopulation gene signatures are associated with survival time
1137 differences in TCGA LGG astrocytoma patients.
1138 Figure shows Kaplan-Meier plots and log-rank test p-values of sample groups stratified
1139 according to High and Low enrichment scores for (A) brain stem subpopulation A, (B)
1140 cortex subpopulation A and age, (c) subpopulation D, (D) cortex subpopulation D and age,
1141 and (E) olfactory bulb subpopulation D and age.
1142 Figure 6. Protein expression of candidate genes in the brain tissues from the 5xFAD
1143 Alzheimer’s disease mouse model.
1144 Brain tissues collected from 5xFAD mouse models were prepared for immunofluorescence
1145 using anti-GFAP (green) and either (A) anti-Adcy7, (B) anti-Serping1, or (C) anti-Emp1.
1146 The regions outlined with a square are displayed at higher magnification on the right, and
1147 the arrows point to the same cells for comparison in the fluorescent images. For the
1148 astrocytes in the inner layer of cortex (the area near the corpus callosum), the proportion of
1149 GFAP+ cells displaying red fluorescence was calculated, and the intensity of red
1150 fluorescence overlapping with GFAP signals was measured. Two-tailed unpaired Student’s
1151 t-tests were used to compare AD samples to the wild-type group. (D) Co-immunostaining
1152 of cortical brain tissues was performed with anti-GFAP, anti-Serping1, and anti-Emp1.
1153 *p < 0.05, **p < 0.01, ***p < 0.001. Scale bar in panels A-D, 50 μm. See also Extended
1154 Data Figures 6-1, 6-2, and 6-3.
1155
1156 Extended Data
1157 Figure 2-1. Gene expression as FPKM and normalized read count matrices across
1158 astrocyte subpopulations.
51
1159 Figure 2-2. Comparison of log-transformed fold-changes and q-values between DE
1160 lncRNA and protein-coding genes from different astrocyte subpopulation and region
1161 gene signatures. Blue circles represent DE lncRNAs and red circles represent DE protein-
1162 coding genes. Using an unpaired two-tailed t-test we observed that the fold-changes of DE
1163 lncRNAs (blue) are statistically higher (p-value < 0.05) than those of protein-coding genes
1164 (red).
1165 Figure 2-3. Gene set enrichment using astrocyte subpopulation regional gene
1166 signatures. Bar plots indicate the number of genes found in the enriched gene set. The
1167 yellow line illustrates the gene set enrichment using -log10-transformed FDR.
1168 Figure 2-4. Enrichment scores obtained from the gene set enrichment analysis using regional
1169 astrocyte subpopulation gene signatures. Scores represent FDR transformed into -log10 scale.
1170 The file also contains additional gene sets obtained from published studies related to
1171 neurodegenerative diseases. Gene sets were added to the MsigDB collection and used for
1172 gene set enrichment analysis.
1173 Figure 3-1. Collection of 42 astrocyte gene signatures used in downstream analysis.
1174 Figure 3-2. Heatmap displaying the Jaccard index of genes shared between astrocyte
1175 gene signatures.
1176 Figure 3-3. Heatmap displaying the hierarchical clustering of enrichment scores
1177 obtained through Gene Set Variation Analysis (GSVA) and GBM samples.
1178 High scores indicate strong positive correlation between astrocyte gene signatures and gene
1179 expression profiles of 103 TCGA GBM samples with a tumor purity > 70%. Upper bars
1180 indicate tumor subtype and the presence or absence of an EGFR gene amplification.
1181 Figure 3-4. Correlation between upregulated astrocyte gene signatures and TCGA
1182 GBM samples. (A) Analysis of the correlation between astrocyte gene signatures and
52
1183 GBM samples with different subtypes, somatic mutations, and copy-number variations. The
1184 heatmap depicts the -log10-transformed p-value of either an ANOVA comparison between
1185 GBM subtypes or a Wilcoxon Rank-Sum test between samples carrying the selected
1186 mutations, amplifications, and deletions. (B) Heatmap representing the enrichment scores
1187 between astrocyte gene signatures and Classical GBM samples that carry an EGFR gene
1188 amplification. Bar plots indicate the number of samples with positive enrichment score for
1189 each astrocyte gene signature.
1190 Figure 3-5. Correlation of region-specific astrocyte subpopulation enrichment scores
1191 with 103 TCGA Glioblastoma samples with tumor purity > 70%. A Wilcoxon Rank-
1192 Sum test was used to compare the distribution of enrichment scores between samples with
1193 and without amplification of EGFR. AMP = amplification of EGFR; No AMP = no
1194 amplification of EGFR found. ***p-value < 0.001, **p-value < 0.01, * p-value < 0.05.
1195 Figure 3-6. GSVA enrichment scores obtained between the 42 astrocyte subpopulation
1196 gene signatures and TCGA GBM or LGG samples. Correlation matrices of -log10
1197 transformed p-values are also included.
1198 Figure 4-1. Correlation between upregulated astrocyte gene signatures and TCGA
1199 LGG samples. (A) Analysis of the correlation between astrocyte gene signatures (y-axis)
1200 and LGG grades, histological subtypes, somatic mutations, and copy-number variations (x-
1201 axis). The heatmap depicts the -log10-transformed p-value of either an ANOVA
1202 comparison between LGG subtypes or a Wilcoxon Rank-Sum test between samples
1203 carrying the selected mutations, amplifications, and deletions. (B) Heatmap depicting the
1204 proportion of LGG samples with specific IDH status variants that had positive enrichment
1205 scores for every astrocyte gene signature.
53
1206 Figure 4-2. Heatmaps depicting GSVA enrichment score matrices between astrocyte
1207 gene signatures and TCGA LGG samples. Samples with (A) wild-type IDH1, (B) mutant
1208 IDH1 with 1p/19q codeletion, and (C) mutant IDH1 without 1p/19q codeletion are
1209 compared.
1210 Figure 4-3. Heatmaps depicting GSVA enrichment score matrices between astrocyte
1211 gene signatures and TCGA LGG samples with different features. Samples with (A)
1212 astrocytoma histology and wild-type IDH1or mutant IDH1 without 1p/19q codeletion, and
1213 (B) oligodendroglioma histology and mutant IDH1 with 1p/19q codeletion, mutant TERT
1214 promoter and mutations in CIC gene are compared.
1215 Figure 6-1. Violin plots representing normalized count distributions. Normalized count
1216 distributions of (A) the 97 upregulated genes from the cortex subpopulation B gene
1217 signature found in the Alzheimer’s disease gene set, (B) the 87 upregulated genes from the
1218 cortex subpopulation B gene signature found in the Huntington’s disease gene set, and (C)
1219 the 11 upregulated genes from the cortex subpopulation C gene signature found in the
1220 Parkinson’s disease gene set are shown in violin plots.
1221 Figure 6-2. Protein expression of candidate genes in the brain tissues from the NLGF
1222 Alzheimer’s disease mouse model.
1223 (A) AD pathology was confirmed in the cortex as an increase in GFAP+ astrocytes as well
1224 as detectable hypertrophy in 5xFAD and NLGF AD mouse models. Brain tissues collected
1225 from NLGF mouse models were prepared for immunofluorescence using anti-GFAP
1226 (green) and either (B) anti-Adcy7, (C) anti-Serping1, or (D) anti-Emp1. The regions
1227 outlined with a square are displayed at higher magnification on the right, and the arrows
1228 point to the same cells for comparison in the fluorescent images. For the astrocytes in the
1229 inner layer of cortex (the area near the corpus callosum), the proportion of GFAP+ cells
54
1230 displaying red fluorescence was calculated, and the intensity of red fluorescence
1231 overlapping with GFAP signals was measured. Two-tailed unpaired Student’s t tests were
1232 used to compare AD samples to the wild-type group. (E) Co-immunostaining of cortical
1233 brain tissues was performed with anti-GFAP, anti-Serping1, and anti-Emp1.
1234 *p < 0.05, **p < 0.01, ***p < 0.001. Scale bar in panels A-D, 50 μm.
1235 Figure 6-3. Significantly enriched gene sets related to neurodegenerative diseases (AD, HD,
1236 and PD). Gene set enrichment was obtained with differentially expressed genes specific for
1237 each regional astrocyte subpopulation.
1238
55