This Accepted Manuscript has not been copyedited and formatted. The final version may differ from this version. A link to any extended data will be provided when the final version is posted online.

Research Article: New Research | Disorders of the Nervous System

Brain region-specific signatures revealed by distinct astrocyte subpopulations unveil links to glioma and neurodegenerative diseases

Raquel Cuevas Diaz Duran1,2,3, Chih-Yen Wang4, Hui Zheng5,6,7, Benjamin Deneen8,9,10,11 and Jia Qian Wu1,2

1The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA 2Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX 77030, USA 3Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Ave. Morones Prieto 3000, Monterrey, N.L., 64710, Mexico 4Department of Life Sciences, National Cheng Kung University, Tainan City, 70101, Taiwan 5Huffington Center on Aging, Baylor College of Medicine, Houston, TX 77030, USA 6Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA 7Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA 8Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX 77030, USA 9Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA 10Neurological Research Institute at Texas’ Children’s Hospital, Baylor College of Medicine, Houston, TX 77030, USA 11Program in Developmental Biology, Baylor College of Medicine, Houston, TX 77030, USA https://doi.org/10.1523/ENEURO.0288-18.2019

Received: 19 July 2018 Revised: 16 January 2019 Accepted: 12 February 2019 Published: 7 March 2019

J.Q.W. and B.D. designed research; J.Q.W., R.C.-D.D., C.-Y.W., and B.D. wrote the paper; R.C.-D.D. and C.- Y.W. performed research; R.C.-D.D. analyzed data; H.Z. and B.D. contributed unpublished reagents/analytic tools. Funding: http://doi.org/10.13039/100000002HHS | National Institutes of Health (NIH) R01 NS088353 1RF1AG054111 R01NS071153 Funding: The Staman Ogilvie Fund-Memorial Hermann Foundation Funding: Mission Connect Funding: http://doi.org/10.13039/100000890National Multiple Sclerosis Society (National MS Society) RG-1501-02756

Authors declare no competing financial interests.

Correspondence should be addressed to Benjamin Deneen, [email protected] or Jia Qian Wu, [email protected]

Cite as: eNeuro 2019; 10.1523/ENEURO.0288-18.2019

Accepted manuscripts are peer-reviewed but have not been through the copyediting, formatting, or proofreading process.

Copyright © 2019 Cuevas- et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed. Alerts: Sign up at www.eneuro.org/alerts to receive customized email alerts when the fully formatted version of this article is published.

1 Brain region-specific gene signatures revealed by distinct astrocyte 2 subpopulations unveil links to glioma and neurodegenerative diseases 3 4 Abbreviated Title: Astrocyte signatures linked to glioma and diseases 5 6 Raquel Cuevas-Diaz Duran1-3, Chih-Yen Wang4, Hui Zheng5-7, Benjamin Deneen8-11*, and 7 Jia Qian Wu1,2* 8 9 1 The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The 10 University of Texas Health Science Center at Houston, Houston, TX 77030, USA 11 2 Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of 12 Molecular Medicine, Houston, TX 77030, USA 13 3 Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Ave. Morones 14 Prieto 3000, Monterrey, N.L., 64710, Mexico 15 4 Department of Life Sciences, National Cheng Kung University, Tainan City, 70101, 16 Taiwan 17 5 Huffington Center on Aging, Baylor College of Medicine, Houston, TX 77030, USA 18 6 Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, 19 USA 20 7 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 21 77030, USA 22 8 Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX 77030, 23 USA 24 9 Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA 25 10 Neurological Research Institute at Texas’ Children’s Hospital, Baylor College of 26 Medicine, Houston, TX 77030, USA 27 11 Program in Developmental Biology, Baylor College of Medicine, Houston, TX 77030, 28 USA 29 30 Author contributions 31 JQW and BD designed research. RCDD and CYW performed research. RCDD analyzed 32 data. HZ and BD contributed reagents/materials. RCDD, CYW, BD, and JQW participated 33 in writing the manuscript. All authors have read and approved the final manuscript.

1

34 35 Corresponding authors 36 * To whom correspondence should be addressed. Tel: +1 (713) 500-3421; Fax: +1 (713) 37 500-2424; Email: [email protected] 38 39 Number of 40 Figures: 6 41 Tables: 4 42 Multimedia: 0 43 Extended Data Figures: 11 44 Extended Data Tables: 5 45 46 Number of words in 47 Abstract: 206 48 Significance Statement: 64 49 Introduction: 748 50 Discussion: 2067 51 52 Acknowledgements 53 The authors would like to thank Shun-Fen Tzeng for supporting Chih-Yen and 54 Ms. Mary Ann Cushman for editing the manuscript. 55 56 Conflicts of Interest 57 Authors declare no competing financial interests. 58 59 Funding sources 60 JQW and RCDD were supported by grants from the National Institutes of Health R01 61 NS088353; The Staman Ogilvie Fund-Memorial Hermann Foundation; Mission Connect, a 62 program of the TIRR Foundation. BD was supported by grants from the National Institutes 63 of Health 1RF1AG054111 and R01NS071153; National MS Society RG-1501-02756. 64

2

65 Abstract

66 Currently, there are no effective treatments for glioma or for neurodegenerative diseases

67 due, in part, to our limited understanding of the pathophysiology and cellular heterogeneity

68 of these diseases. Mounting evidence suggests that astrocytes play an active role in the

69 pathogenesis of these diseases by contributing to a diverse range of pathophysiological

70 states. In a previous study, five molecularly distinct astrocyte subpopulations from three

71 different brain regions were identified. To further delineate the underlying diversity of

72 these populations, we obtained mouse brain region-specific gene signatures for both

73 -coding and long non-coding RNA (lncRNA) and found that these astrocyte

74 subpopulations are endowed with unique molecular signatures across diverse brain regions.

75 Additional gene set and single-sample enrichment analyses revealed that gene signatures of

76 different subpopulations are differentially correlated with glioma tumors that harbor distinct

77 genomic alterations. To the best of our knowledge, this is the first study that links

78 transcriptional profiles of astrocyte subpopulations with glioma genomic mutations.

79 Furthermore, our results demonstrated that subpopulations of astrocytes in select brain

80 regions are associated with specific neurodegenerative diseases. Overall, the present study

81 provides a new perspective for understanding the pathophysiology of glioma and

82 neurodegenerative diseases and highlights the potential contributions of diverse astrocyte

83 populations to normal, malignant, and degenerative brain functions.

84

85 Significance Statement

86 Mounting evidence suggests that astrocytes play an active role in disease

87 pathogenesis by contributing to a diverse range of pathophysiological states. The present

88 study identifies new layers of astrocyte diversity and new correlations between distinct

3

89 astrocyte subpopulations and disease states. Understanding the heterogeneity of astrocytes

90 in different physiological conditions and diseases will provide new avenues for improved

91 diagnostics and therapeutics targeting specific astrocyte subpopulations.

92

93 Introduction

94 Astrocytes make up about 40% of all cells in the human brain (Herculano-Houzel, 2014;

95 Zhang et al., 2016) and play essential roles in brain function by controlling extracellular

96 neurotransmitter and potassium ion levels, regulating the blood-brain-barrier, and

97 promoting synapse formation and function (Barres, 2008). Previously, astrocytes were

98 thought to be a homogeneous population of cells that tile the CNS and support neuronal

99 survival. However, recent genome-wide gene expression studies have shown that astrocytes

100 are highly heterogeneous (Cahoy et al., 2008; Doyle et al., 2008; Yeh et al., 2009). A recent

101 study demonstrated astrocyte heterogeneity in the brain by comparing gene expression

102 profiles of FACS-sorted Aldh1l1-GFP+ subpopulations and identified five astrocyte

103 subpopulations (A-E) across three brain regions (olfactory bulb, cortex, and brain stem).

104 Moreover, these astrocyte subpopulations exhibited functional differences related to

105 proliferation, migration, and synapse formation. (Lin et al., 2017).

106 Functional heterogeneity of astrocytes has been described under both normal

107 physiological conditions and diverse disease states (Denis-Donini et al., 1984; Hill et al.,

108 1996; Grass et al., 2004; White et al., 2010; Morel et al., 2017). During disease progression,

109 astrocyte populations respond to pathological conditions through a transformation called

110 reactive astrogliosis. Morphological and gene expression differences have been observed in

111 reactive astrocytes from different brain regions during the progression of neurodegenerative

112 diseases (Hill et al., 1996; Liddelow et al., 2017). Nervous system malignancies also have

4

113 astroglial origins. Gliomas, the most common primary malignancies in the CNS, are a

114 heterogeneous group of tumors characterized by their resemblance to glia. A parallel

115 between brain development and glioma tumorigenesis is apparent in that certain astrocyte

116 subpopulations are analogous to those that populate human glioma (Lin, et al 2017).

117 Despite the discovery of these links between diverse astrocyte populations and disease

118 states, how distinct subpopulations of astrocytes contribute to certain neurological disease

119 remains poorly defined. For example, given the plethora of genomic data available for

120 human glioma, whether key genetic mutations associated with glioma are also associated

121 with specific astrocyte/glioma subpopulation remains unknown. Thus, one of the goals of

122 this study is to identify new correlations between diverse astrocyte subpopulations and

123 distinct disease states.

124 In addition to correlations with disease states, another goal of this study is to unearth

125 additional layers of astrocyte heterogeneity. A novel study identified five distinct

126 subpopulations of astrocytes across three brain regions (Lin, et al. 2017). However,

127 astrocyte subpopulations between different brain regions were not compared, and the

128 expression of long non-coding RNAs (lncRNAs), a type of regulatory RNA that has been

129 shown to play important roles in CNS development, plasticity, and disease (Hung et al.,

130 2011; Guttman and Rinn, 2012; Batista and Chang, 2013; Zhang et al., 2014; Dong et al.,

131 2016) was not investigated. Thus, the other goal of this study is to further delineate the

132 underlying molecular diversity of astrocytes from existing datasets.

133 Towards the overarching goal of decoding the nature of astrocyte diversity, we used

134 advanced comparative bioinformatics to obtain region-specific (e.g., cortex-specific) and

135 regional subpopulation-enriched (e.g., cortex subpopulation A-enriched) astrocyte gene

136 signatures. In addition to these 15 astrocyte regional subpopulations (five subpopulations in

5

137 three brain regions), we also collected gene expression data sets for other astrocyte

138 subpopulations from different developmental stages and injury models (Rusnakova et al.,

139 2013; Zeisel et al., 2015; Gokce et al., 2016; Noristani et al., 2016; Hara et al., 2017; Wu et

140 al., 2017). Our analysis included a total of 42 astrocyte gene signatures. We implemented

141 an unsupervised single-sample enrichment method to correlate astrocyte subpopulation

142 gene signatures with The Cancer Genome Atlas (TCGA) lower-grade glioma (LGG) and

143 glioblastoma (GBM) samples including clinical sample features and key mutations. The

144 results show that specific astrocyte subpopulation gene signatures are more highly

145 correlated with certain glioma subtypes and genomic variants. For example, astrocyte

146 subpopulations B and C in all brain regions are significantly correlated with amplification

147 of the gene encoding EGFR (epidermal growth factor receptor). Additionally, astrocyte

148 subpopulations D and E were among the gene signatures highly correlated with LGG

149 samples bearing both mutation in IDH gene and 1p/19q codeletion. Furthermore,

150 correlations between astrocyte subpopulations and neurodegenerative diseases identified a

151 collection of that we validated in AD mouse models. Together, the present study

152 identifies new layers of astrocyte diversity, while making critical new connections between

153 these populations and neurological disease including glioma and neurodegeneration. Thus,

154 understanding astrocyte subpopulations has a broad utility that spans normal, malignant,

155 and degenerative brain functions, and will open new avenues for the development of

156 improved diagnostics and therapeutics to target specific astrocyte subpopulations.

157

158 Materials and Methods

159 Astrocyte subpopulation- and brain region-specific gene signatures

6

160 RNA-Seq datasets for five astrocyte subpopulations (A, B, C, D, E) from three brain

161 regions (olfactory bulb, brain stem, and cortex) were downloaded from the data repository

162 for a previous publication (GSE72826) (Lin et al., 2017). Data were obtained from both

163 male and female mice. For each brain region, non-astrocyte samples (Aldh1l1-GFP- cells)

164 were also downloaded. Briefly, reads were mapped to the mm10 mouse reference genome

165 downloaded from GENCODE (https://www.gencodegenes.org/) using TopHat v2.1.0

166 (Trapnell et al., 2009). Mapped reads were assembled using Cufflinks v2.2.1 (Trapnell et

167 al., 2012) and Fragments per Kilobase of transcript per Million mapped reads (FPKM)

168 values were obtained for both protein-coding and lncRNA genes using a published pipeline

169 (Dong et al., 2015; Cuevas Diaz Duran et al., 2016; Wu, 2016; Duran et al., 2017). Our

170 annotation file included 21,948 protein-coding genes (55,252 transcripts) and 29,232

171 lncRNA genes (49,189 transcripts). Any value of FPKM < 0.1 was set to 0.1 to avoid ratio

172 inflation (Quackenbush, 2002). Additionally, read counts for all annotated genes and

173 transcripts were calculated using HTSeq-count (Anders et al., 2015). FPKM and

174 normalized read count matrices are included in Extended Data Figure 2-1.

175 To obtain regional subpopulation-enriched gene signatures (e.g., for cortex

176 subpopulation A), we compared astrocyte subpopulations to non-astrocytes within each

177 region in a pairwise mode. Comparisons were performed using DESeq2 (Love et al., 2014)

178 with normalized counts. A binary results matrix was created with three rows (regions) and

179 five columns (subpopulations). Each cell in the matrix represented a comparison of an

180 astrocyte subpopulation from a specific region against the non-astrocyte sample in the same

181 region. These were referred to as regional subpopulation-enriched gene signatures. Genes

182 were considered significant if they were expressed (FPKM > 1) in at least one of the

183 replicates with an expression fold-change > 2 and FDR < 0.1%. The log-transformed fold-

7

184 changes and q-values of differentially expressed protein-coding and lncRNA genes from

185 different astrocyte subpopulations and regions were compared using a t-test (p-value <

186 0.05). (The details of all statistical analyses are listed in Table 1).

187 In order to identify region-specific gene signatures, we compared all astrocyte

188 samples from one region to astrocyte samples from the remaining two regions, yielding a

189 total of three comparisons. Comparisons were performed using DESeq2 (Love et al., 2014)

190 with normalized counts. Additionally, astrocyte samples from the same region were

191 compared to their corresponding non-astrocyte samples. A gene was differentially

192 expressed (DE) with FPKM > 1 in at least one of the replicates in each comparison, with an

193 expression fold-change > 4 at p-value < 0.05, and a fold-change against non-astrocyte

194 samples > 2 at p-value < 0.05.

195 Principal Components Analysis (PCA)

196 Principal Components Analysis (PCA) is an unsupervised, exploratory, and multivariate

197 statistical technique used to reduce the dimensionality of complex data sets while retaining

198 most of the variation (Jolliffe, 2002). PCA identifies vectors, or principal components (PC),

199 for which the variation in the data is maximal. Because each PC is a linear combination of

200 the original variables, it is possible to identify a biological interpretation of each component

201 that explains the differences between the samples. When using PCA in genome-wide

202 expression studies, the data set generally consists of a gene expression matrix with genes as

203 rows and samples as columns.

204 PCA was implemented by first generating a gene expression matrix of log2-

205 transformed FPKM values of DE genes for all the samples and their replicates, with regions

206 as rows and samples as columns, to yield a 4092 x 36 matrix. The “prcomp” R function (R

207 Core Team, 2015), which computes the singular value decomposition of the gene

8

208 expression matrix, was then implemented. Samples were considered as variables and PCs

209 that captured most of the variability in the samples were obtained.

210 To gain insight into the biological significance of the identified PCs and their

211 underlying regulatory factors, Gene Set Enrichment Analysis (GSEA) (Subramanian et al.,

212 2005) was performed. DE genes were ranked by the scores assigned according to each PC.

213 Using the MsigDB gene set database (Liberzon et al., 2011), the top enriched gene sets with

214 FDR < 0.01 and absolute normalized enrichment score (NES) > 2 were deemed significant.

215 Collection of astrocyte gene signatures

216 In addition to the 15 astrocyte regional subpopulations (five subpopulations in three brain

217 regions) previously described, we collected 25 gene expression data sets for other astrocyte

218 subpopulations from diverse studies in which astrocytes were purified using single-cell

219 microfluidics (Zeisel et al., 2015; Gokce et al., 2016; Wu et al., 2017), FACS-sorting

220 (Rusnakova et al., 2013; Noristani et al., 2016), or laser microdissection (Hara et al., 2017).

221 These diverse astrocyte subpopulations consisted of purified cells from brain (amygdala,

222 striatum, hippocampus, and cortex), different developmental stages, and from spinal cord

223 injury models. We used astrocyte gene signatures including both protein-coding and

224 lncRNA genes except those gene signatures obtained from literature where lncRNA

225 information was not reported. A brief description of each of these gene profiles is included

226 in Extended Data Figure 3-1. To determine the similarities between astrocyte gene

227 signatures, we calculated the Jaccard index. Indexes higher than 0.7 were considered

228 significant.

229 Gene set variation analysis (GSVA)

230 Glioma RNA-Seq data (counts) were downloaded from the Recount2 database (Frazee et

231 al., 2011). Clinical data from TCGA GBM dataset was collected from the Broad Institute

9

232 GDAC Firehose website (version 2016_01_28) and from the GlioVis website (Bowman et

233 al., 2017). For TCGA LGG dataset, the genomic data from the supplementary materials of

234 the PanGlioma TCGA paper were collected (Ceccarelli et al., 2016). Our analysis included

235 sample features comprising gene expression, histology, subtype, somatic mutations, copy-

236 number variations, 1p/19q codeletions (defined below), and other clinical features.

237 To compare astrocyte subpopulations and glioma samples, we adopted the GSVA approach

238 (Hänzelmann et al., 2013). GSVA is an unsupervised, non-parametric method used to

239 evaluate the degree to which genes in a gene signature are coordinately up- or

240 downregulated within a particular biological sample. The GSVA R package was used to

241 calculate an enrichment score for each astrocyte gene signature across individual glioma

242 samples and frequently encountered somatic mutations to determine whether an astrocyte

243 subpopulation is associated with those glioma subtypes or genomic variations. The

244 genomic features used for GBM samples included transcriptional subtype; mutations in

245 genes encoding TP53, EGFR, PTEN, NF1, PIK3R1, RB1, ATRX, IDH1, APOB, or

246 PDGFRA; amplifications of genes encoding EGFR, PDGFRA, or CDK4; and deletions of

247 genes encoding MTAP, PTEN, QKI, or RB1. Similarly, for LGG, the genomic features

248 queried included tumor grade, histological subtype, mutation in the IDH gene, codeletion of

249 1p/19q loci, mutations in genes encoding TP53, EGFR, PTEN, ATRX, IDH1, IDH2, CIC,

250 NOTCH1, FUBP1, PIK3CA, NF1, PIK3R1, SMARCA4, ARID1A, TCF12, ZBTB20,

251 PTPN11, PLCG1, or ZCCHC12; deletions of genes encoding PTEN, NF1, CDKN2C, and

252 CDKN2A; and amplifications of genes encoding PIK3CA, PIK3C2B, PDGFRA, MDM4,

253 MDM2, EGFR, or CDK4.

254 To evaluate whether the differences among groups of samples with different

255 genomic features (such as subtype, mutations, and copy-number variations) could be

10

256 explained using GSVA enrichment scores, ANOVA and Wilcoxon Rank-Sum tests were

257 conducted. Each genomic feature was correlated to all of the gene signatures by comparing

258 enrichment scores between groups of samples formed according to the different levels of

259 the genomic feature. A correlation matrix of transformed p-values was obtained for GBM

260 and LGG samples. Correlations were deemed significant with a p-value < 0.05 (FDR < 0.25

261 using the Benjamini & Hochberg adjustment for multiple comparisons). GSVA enrichment

262 score matrices and correlation matrices of transformed p-values are included in Extended

263 Data Figure 3-6.

264 Survival analysis

265 Correlations between astrocyte gene signatures and patient survival were calculated to

266 conduct Kaplan-Meier survival analyses. Clinical data was parsed to obtain patients’ age,

267 vital status (Dead/Alive), and months to death or months from most recent follow-up

268 depending on the patients’ vital status. Patient samples were filtered to use only those with

269 tumor purity greater than 70%. Classical GBM samples with EGFR amplification and LGG

270 samples with astrocytoma histology were used for survival analyses. For each astrocyte

271 gene signature, samples were categorized into two groups: (1) High (enrichment scores > 0)

272 and (2) Low (enrichment scores < 0). Kaplan-Meier survival plots were generated for each

273 astrocyte gene signature using the “survival” (Therneau, 2015) and “survminer”

274 (Kassambara A, Kosinski M, 2017) R libraries. The survival curves of samples with High

275 and Low enrichment scores were compared using log-rank tests. In order to determine the

276 combined survival effect of score enrichment and age, we applied multivariate Kaplan-

277 Meier survival analysis. For example, to estimate the combined effect of an astrocyte gene

278 signature and age, the samples were first divided according to whether they had High or

279 Low enrichment scores. Then within each group, samples were further stratified according

11

280 to the age at which the tumor was diagnosed. Samples were grouped by age as either

281 younger or older than 60 years (for GBM samples), or as 45 years (for LGG samples). Age

282 thresholds were obtained by creating a histogram of the “age at diagnosis” and selecting the

283 age at which the distribution is divided into two balanced sub-distributions representing the

284 younger and the older groups. The p-value obtained from the log-rank test was used to

285 indicate the statistical significance of correlations in survival between groups. Furthermore,

286 to determine the association of IDH mutations with the survival of samples correlated with

287 astrocyte gene signatures we constructed contingency tables and performed Fisher’s tests.

288 Associations with p-values < 0.05 were considered significant.

289 Gene set enrichment analysis

290 A comprehensive collection of gene sets was downloaded from the Molecular Signatures

291 Database (MsigDB) (Liberzon et al., 2011). Additional gene sets were obtained from

292 neurodegenerative disease studies in which gene expression was assessed (Extended Data

293 Figure 2-4). A hypergeometric statistical test (phyper R function) was used to determine

294 gene set enrichment using the DE gene list for each astrocyte subpopulation, region, and

295 regional subpopulation. Gene sets were considered enriched with FDR < 0.05 and gene

296 number > 5.

297 Immunofluorescence

298 Brain tissues from 5xFAD (Polito et al., 2014) and NLGF mice (Saito et al., 2014) (gifts

299 from Dr. Zheng Hui, Huffington Center on Aging, Baylor College of Medicine) were fixed

300 in 4% paraformaldehyde and sectioned at 20-μm thickness. The brain tissues were

301 permeabilized with 0.3% Triton™ X-100 (Thermo Fisher) in PBS and blocked with 2.5%

302 horse serum (Vector) for 1 h at room temperature. The sections were then incubated in PBS

303 with 0.1% Triton X-100 containing chicken anti-GFAP (1:500; Abcam) combined with

12

304 either rabbit anti-Adcy7 (1:100; Bioss), mouse anti-Serping1 (1:50; Santa Cruz), or rabbit

305 anti-Emp1 (1:50; Abcam) overnight at 4 °C. After incubation with secondary antibodies

306 conjugated with Alexa Fluor 488 and Alexa Fluor 568 or Alexa Fluor 647 (1:500;

307 Invitrogen) for 1 h at room temperature, the tissues were counterstained with DAPI (Sigma)

308 solution and mounted with mounting media (Vector).

309 Statistical analysis of immunostained images

310 Images of cortices were collected from at least three different tissues for each group, and

311 fluorescence intensity was measured using ImageJ software (NIH). Cells with positive

312 immunosignals over GFAP+ cells were counted in each image, which included

313 approximately 50 GFAP+ cells, sufficient for performing calculations. All of these data

314 were presented as mean ±SE and were analyzed to determine statistically significant

315 differences at p-value < 0.05 using the two-tailed unpaired Student’s t test.

316 Table 1. List of figures for each experiment indicating data structure, statistical tests

317 applied, and significance levels.

Figure Data structure Type of test Power Notes

Figure Gene Set Enrichment Kolmogorov- FDR < 0.01 |NES| > 2

1C, 1D, Analysis (GSEA) Smirnov

1E

Figure Differential DESeq2 FDR < 0.1 FPKM > 1,

2A, 2B expression analysis generalized normalized count

linear model fold-change > 2

(GLM)

13

Figure Gene set enrichment Hypergeometric FDR < 0.05 Gene number > 5

2C

Figure Differential DESeq2 p-value < 0.05 FPKM > 1,

2D expression analysis generalized normalized

linear model counts, fold-

(GLM) change > 4

(compared to

astrocyte

subpopulations

from other

regions),

Fold-change > 2

(compared to

non-astrocyte

sample).

Extended Normal distribution Unpaired p-value < 0.05

Data Student’s t-test

Figure 2-

2

Figure Correlations ANOVA, p-value < 0.05

3C, 4B Wilcoxon Rank- (FDR < 0.25)

Sum tests

Figure 5 Kaplan-Meier Log-rank test p-value < 0.05

14

survival plots Fisher’s test

Figure Immunostained two-tailed p-value < 0.05,

6A-C, images unpaired p-value < 0.01,

Extended Student’s t test p-value < 0.001

Data

Figure 6-

2B-D

318

319 Results

320 Transcription profiles of Aldh1l1-GFP+ astrocyte subpopulations demonstrate local

321 and regional heterogeneity

322 To determine whether identified astrocyte subpopulations demonstrate additional

323 heterogeneity across brain regions, we leveraged existing data sets (Lin, et al, 2017) and

324 adopted a pairwise comparison strategy using normalized counts. Previously, five different

325 Aldh1l1-GFP + astrocyte subpopulations were isolated from three brain regions (olfactory

326 bulb, cortex, and brain stem). In that study, transcripts from astrocyte subpopulations and

327 from the Aldh1l1-GFP- reference population were sequenced and subpopulation-specific

328 gene signatures were obtained (Lin et al., 2017). To include lncRNAs in the gene

329 signatures, we re-mapped the purified astrocyte subpopulations from different brain regions

330 and their corresponding non-astrocyte samples to the mm10 mouse reference genome using

331 a published pipeline (Duran et al., 2017). For gene quantification we used a comprehensive

332 protein-coding and lncRNA annotation file including 21,948 protein-coding genes (55,252

333 transcripts) and 29,232 lncRNA genes (49,189 transcripts).

15

334 We implemented Principal Components Analysis (PCA) using a gene expression

335 matrix consisting of differentially expressed (DE) genes between regions as rows and

336 samples with replicates as columns, yielding a 4092 x 36 matrix. We hypothesized that

337 each principal component (PC) is associated with an underlying factor regulating gene

338 expression and that such factors might explain the variability between subpopulations and

339 regions. PCA analysis resulted in 36 uncorrelated and orthogonal PCs that account for the

340 variability in the gene expression matrix. The first four components explain 82.3% of the

341 variability of these samples. To gain biological insights for the selected PCs, we plotted the

342 samples in their transformed component space. Figure 1A depicts the plotted samples in the

343 2D-space formed by PC1 and PC2. PC1 explains 68.98% of the variability and clusters the

344 samples into two main groups: Non-astrocyte (shapes without filling) and Astrocyte (color-

345 filled shapes) subpopulations. The majority of samples clustered well; only a few outliers

346 appeared in the non-astrocyte sample space. Similarly, the combined effect of PC1 and PC2

347 is helpful for discriminating among subpopulations, as shown by the elliptical subspaces

348 (Figure 1A) that enclose shapes mostly of the same color. In order to cluster samples by

349 region, we plotted PC3 and PC4, as shown in Figure 1B. The elliptical subspaces in the

350 figure demonstrate the separation of samples into the olfactory bulb, cortex, and brain stem

351 regions.

352 To validate the grouping of samples accomplished with PCA, we performed GSEA

353 using differentially expressed (DE) genes ranked by the scores assigned in each PC. For

354 PC1, the top enriched gene sets were GO_MYELIN_SHEATH and

355 LEIN_ASTROCYTE_MARKERS (Figure 1C). The top-enriched gene sets for PC2 are

356 GO_SYNAPTIC_SIGNALING and VERHAAK_GLIOBLASTOMA_MESENCHYMAL

357 (Figure 1D). The top-enriched gene sets for PC3 are

16

358 GO_MICROTUBULE_ASSOCIATED_COMPLEX and

359 OXIDATIVE_PHOSPHORYLATION (Figure 1E). Astrocyte subpopulation B (combined

360 from all three brain regions) is mostly correlated with the mesenchymal glioblastoma gene

361 signature, as the enriched genes in this gene set (e.g., Scpep1, Rac2, Blcrb, Mrc2, Cd14,

362 and C5ar, among others) are upregulated in this subpopulation.

363 Differentially expressed protein-coding genes and lncRNAs depict regional

364 subpopulation-specific gene signatures and gene sets

365 To obtain regional subpopulation enriched genes, we implemented a pairwise comparison

366 strategy using normalized counts. For each brain region, we compared the diverse astrocyte

367 subpopulations against their corresponding non-astrocyte samples. Genes with FPKM > 1,

368 fold-change > 2, and FDR < 10% were considered to be differentially expressed (DE). The

369 number of unique DE genes per subpopulation and brain region are listed in Table 2.

370 Significant numbers of DE lncRNAs contribute to the regional subpopulation gene

371 signatures. Figure 2A shows a heatmap of the log2-transformed FPKM values of cortex

372 subpopulation B-enriched genes (protein-coding and lncRNAs) compared to other

373 subpopulations. The heatmap in Figure 2B illustrates the log2-transformed FPKM values of

374 the 317 regional subpopulation upregulated lncRNA genes. After using a statistical test

375 (unpaired two-tailed t-test with p-value < 0.05), we observed that the fold-changes of DE

376 lncRNAs from cortex subpopulations B-E, and olfactory bulb subpopulations A, B, C, and

377 E had higher fold-change than their corresponding DE protein-coding genes. Extended Data

378 Figure 2-2 shows the comparison between fold-changes and q-values of DE protein-coding

379 and lncRNA genes from different astrocyte subpopulations and regions.

380 Table 2. Regional subpopulation-specific DE protein-coding and lncRNA genes.

17

Brain Region

Subpopulation Type Olfactory Brain Cortex Bulb Stem

Protein- UP 86 5 29

coding DOWN 51 0 6 PopA UP 12 1 6 lncRNA DOWN 5 0 1

Protein- UP 294 690 55

coding DOWN 207 324 4 PopB UP 42 138 19 lncRNA DOWN 18 23 0

Protein- UP 268 271 197

coding DOWN 75 94 21 PopC UP 20 15 6 lncRNA DOWN 10 11 15

Protein- UP 28 5 101

coding DOWN 16 1 18 PopD UP 16 4 20 lncRNA DOWN 1 1 6

Protein- UP 68 43 0

coding DOWN 289 18 0 PopE UP 35 13 0 lncRNA DOWN 27 2 0

381

382 A set of gene signature DE genes with highest expression in cortex subpopulation B

383 compared to cortex non-astrocytes include, for example, MER Proto-Oncogene Tyrosine

384 Kinase (Mertk), Ras Homolog Family Member B (Rhob), and Signal Regulatory Protein

18

385 Alpha (Sirpa). (see discussion). Similarly, a set of DE lncRNAs in cortex subpopulation B

386 with the highest expression compared to non-astrocyte samples include, for example,

387 Gm37524,Gm3764, and Junos. We used DE genes to find the enrichment of gene sets. The

388 top five significantly enriched gene sets found in cortex subpopulation B are listed in

389 Figure 2C (upper panel) and they include “BLALOCK_ALZHEIMERS_DISEASE_UP”,

390 “GO_CELLULAR RESPONSE_TO_CYTOKINE_STIMULUS”,

391 “GO_CELLULAR_REPONSE_TO_STRESS”, “GO_WOUND_HEALING”, and

392 “GO_BIOLOGICAL_ADHESION”.

393 A set of gene signature DE protein-coding genes with highest expression in cortex

394 subpopulation C include, for example, Glial High Affinity Glutamate Transporter (Slc2a1),

395 Carboxypeptidase E (Cpe), and Transmembrane Protein 47 (Tmem47). DE lncRNAs

396 enriched in cortex subpopulation C include Gm13872, Gm26672, and Gm37885. The top-

397 enriched gene sets with DE genes found in cortex subpopulation C are shown in Figure 2C

398 (lower panel). Enriched gene sets include

399 “GO_GENERATION_OR_PRECURSOR_METABOLITES_AND_ENERGY”,

400 “GO_MITOCHONDRIAL_ATP_SYNTHESIS_COUPLED_PROTON_TRANSPORT”,

401 “KEGG_PARKINSONS_DISEASE”, and “KEGG_HUNTINGTONS_DISEASE”.

402 Extended Data Figure 2-3 depicts examples of the top-enriched gene sets specific to each

403 regional astrocyte subpopulation. The complete list of gene sets and their enrichment scores

404 can be found in Extended Data Figure 2-4.

405 To evaluate regional differences in the various astrocyte subpopulations, we

406 compared all astrocyte subpopulations from one particular region against astrocyte samples

407 from the remaining regions (FPKM > 1, fold-change > 4, p-value < 0.05). To eliminate the

408 effect of genes also expressed in non-astrocyte subpopulations, we compared the region-

19

409 specific astrocyte subpopulations against their corresponding non-astrocyte samples

410 (FPKM > 1, fold-change > 2, p-value < 0.05). The differential expression of a gene was

411 considered significant if it met the criteria for both comparisons (region-specific and

412 astrocyte-specific). Table 3 shows the number of DE protein-coding and lncRNAs

413 identified from each brain region. Figure 2D depicts the log2-transformed FPKM values of

414 DE genes from all brain regions. Insulin-like Growth Factor 1 (Igf1), Flavin-containing

415 Monooxygenase 1 (Fmo1), and Brain-specific Angiogenesis Inhibitor 1 (Bai1 or Adgrb1).

416 are among the top DE protein-coding genes with highest expression in olfactory bulb

417 astrocyte subpopulations. Membrane Frizzled-related Protein (Mfrp), Integrin Subunit

418 Alpha 9 (Itga9), and Transmembrane Protein 218 (Tmem218) are within the top DE

419 protein-coding genes in brain stem astrocyte subpopulations. Similarly, Scavenger Receptor

420 Class A Member 3 (Scara3), Leucine Rich Repeat Containing 10B (Lrrc10b), and Cristallin

421 Mu (Crym) are included in the top DE protein-coding genes obtained from astrocyte

422 subpopulations of the cortex.

423 Table 3. Number of region-specific DE protein-coding and lncRNA genes.

Unique DE genes Total DE Brain Region Protein-coding lncRNA genes UP DOWN UP DOWN

Olfactory Bulb (ob) 35 13 15 1 64

Cortex (ctx) 6 7 3 0 16

Brain Stem (bs) 21 8 5 3 37

424

425 Upregulated astrocyte gene signatures of several subpopulations and regions correlate

426 with glioblastoma samples

20

427 To be comprehensive in studying astrocyte heterogeneity, in addition to the five astrocyte

428 subpopulations from three brain regions described, we also collected gene expression data

429 sets for other astrocyte subpopulations from diverse studies in which astrocytes were

430 purified through using single-cell microfluidics (Zeisel et al., 2015; Gokce et al., 2016; Wu

431 et al., 2017), FACS-sorting (Rusnakova et al., 2013; Noristani et al., 2016), or laser

432 microdissection (Hara et al., 2017). The gene signatures collected were derived from

433 diverse subpopulations of purified astrocytes from brain (amygdala, striatum, hippocampus,

434 and cortex), different developmental stages, and from spinal cord injury (SCI) models.

435 Only protein-coding genes were provided in these datasets. These astrocyte subpopulation

436 gene signatures were combined with the regional astrocyte subpopulation specific

437 signatures described previously, yielding a total of 42 gene signatures (Extended Data

438 Figure 3-1) of which 17 included both protein-coding and lncRNA genes. In each gene

439 signature profile, genes are significantly upregulated relative to the non-astrocyte sample,

440 the remaining astrocyte subpopulations, or both. To determine the similarity among gene

441 signatures, the Jaccard indexes were calculated and a heatmap was built (Extended Data

442 Figure 3-2). The Jaccard index indicates the percent of similarity due to the overlap of

443 genes in gene signatures. The highest similarity index between the gene signatures obtained

444 from literature and our 15 astrocyte gene signatures was 7% (olfactory bulb subpopulation

445 C and HIPPOCAMPUS_TOP240), indicating that they are quite different. The inclusion of

446 other datasets from the literature provided additional information. As observed in the

447 aforementioned heatmap, similarity indexes between the majority of the gene signatures are

448 low. Interestingly, the similarity index between cortex injury and cortex development gene

449 signatures was the highest (Jaccard index = 0.87).

21

450 We adopted the Gene Set Variation Analysis (GSVA) approach (Hänzelmann et al.,

451 2013) to determine the degree of similarity between astrocyte subpopulations and glioma

452 samples. GSVA is an unsupervised non-parametric method for evaluating the degree to

453 which genes in a gene signature are coordinately up- or downregulated within a certain

454 biological sample. We used GSVA with the 42 astrocyte subpopulation gene signatures as

455 gene sets and obtained enrichment score matrices for glioblastoma and lower-grade glioma

456 from TCGA. The enrichment score of a gene signature may be positive or negative and it

457 provides evidence of the coordinated up or downregulation of the members of that gene

458 signature in a particular sample. Enrichment scores are obtained without prior knowledge

459 of the sample phenotype. In our analysis, a positive enrichment score indicates a correlation

460 between a glioma sample and the specific astrocyte subpopulation from which the gene

461 signature was derived.

462 The heatmap in Figure 3A shows the resulting enrichment scores of the GSVA

463 between 103 TCGA GBM samples with a tumor purity greater than 70% and 21

464 representative astrocyte gene signatures due to limited space. The enrichment score

465 heatmap of TCGA GBM samples with all 42 astrocyte gene signatures is included in

466 Extended Data Figure 3-3. The 103 TCGA GBM samples represented 47, 22, and 34 cases

467 of classical, mesenchymal, and proneural transcriptional subtypes, respectively, as

468 previously defined (Wang et al., 2018). Nearly 77% (36 out of 47) of the glioblastoma

469 samples within the classical subtype exhibited EGFR gene amplification. The top five

470 positively enriched gene signatures with the highest number of classical subtype samples

471 carrying an EGFR amplification were amygdala_top50, striatum_top50,

472 hippocampus_top240, PopC_olfactory_bulb, and PopA_brain stem. For each gene

473 signature, the number of samples with positive scores are shown in Figure 3B. For viewing

22

474 clarity, Figures 3B and 3C display results using 21 representative astrocyte gene signatures.

475 Please refer to Extended Data Figure 3-4 for displays using all astrocyte gene signatures.

476 To evaluate whether the differences among groups of samples with different

477 genomic features (such as subtype, mutations, and copy-number variations) could be

478 explained with GSVA enrichment scores, ANOVA and Wilcoxon Rank-Sum tests were

479 performed. Each genomic feature was correlated to all gene signatures by comparing

480 enrichment scores between samples grouped according to the different levels of the

481 genomic features. The correlation matrix of genomic features and 21 representative

482 signature scores is depicted in Figure 3C. Correlations were deemed significant with a p-

483 value < 0.05. Significant positive correlations with the Classical GBM subtype samples

484 were found in 11 out of 42 astrocyte gene signatures including brain stem subpopulation A,

485 olfactory bulb (subpopulations A, C, and D), amygdala, striatum, and hippocampus.

486 Classical GBM subtype had the highest number of positively correlated astrocyte gene

487 signatures, consistent with a previous report (Verhaak et al., 2010). Mesenchymal and

488 Proneural GBM subtypes had either negative or no significant correlations. No significant

489 positive correlations were found for GBM samples with TP53 mutations.

490 A large proportion of astrocyte gene signatures were significantly positively

491 correlated with samples carrying amplifications of the gene encoding EGFR (62% 26 out of

492 42) and deletions of the genes encoding CDKN2A and MTAP (50% and 45%). Gene

493 signatures of subpopulations A, B, and C, brain stem (subpopulations A, C), olfactory bulb

494 (subpopulations A, B, C, E), cortex (subpopulations B, C), amygdala, striatum,

495 hippocampus, reactive astrocytes (RA), and scar-forming astrocytes (SA) in SCI had

496 significant positive correlations with GBM samples with a combined set of genomic

497 features that included amplification of the EGFR gene and deletions of the CDKN2A and

23

498 MTAP genes. Boxplots depicting the correlation of regional astrocyte subpopulation gene

499 signatures with GBM samples with EGFR amplification is included in Extended Data

500 Figure 3-5. Gene signatures of subpopulations A and C from brain stem and olfactory bulb,

501 and from cortex subpopulation B, were positively correlated with EGFR amplifications

502 with p-value < 0.001.

503 Upregulated astrocyte gene signatures in several subpopulations and regions correlate

504 with LGG samples

505 A total of 160 LGG samples (tumor purity > 70%) from TCGA were used for analysis of

506 correlation with astrocyte gene signatures. The histological classifications of the LGG

507 samples used included 44 astrocytomas, 45 oligoastrocytomas, and 70 oligodendrogliomas.

508 Figure 4A shows the enrichment scores obtained using GSVA. The upper section of the

509 heatmap shows the classification of different co-ocurring genomic alterations previously

510 shown to be related to glioma histology (Cancer Genome Atlas Research Network et al.,

511 2015). Two groups are formed corresponding mostly of oligodendroglioma (right side) and

512 a mixture enriched in astrocytoma and oligoastrocytoma histologies (left side). As

513 previously demonstrated (Cancer Genome Atlas Research Network et al., 2015), the

514 mixture of astrocytoma and oligoastrocytoma samples is characterized modestly by

515 mutations of IDH without codeletions of 1p/19q loci, TP53 mutations, and ATRX

516 mutations. The oligodendroglioma group is enriched in samples carrying mutations of IDH

517 gene combined with codeletions of 1p/19q loci, and mutations in the TERT promoter. A

518 cluster of positive high enrichment scores between LGG samples and astrocyte gene

519 signatures is observed on the left side of the heatmap corresponding mostly to the mixture

520 of astrocytoma and oligoastrocytoma histologies. Brain stem subpopulation A, olfactory

521 bulb subpopulation D, and RA SCI astrocyte gene signatures were significantly correlated

24

522 with astrocytoma LGG samples. Amygdala (AS1-AS3) and cortex development gene

523 signatures were correlated with oligoastrocytoma histology. Subpopulations D and E,

524 cortex subpopulation A, brain stem subpopulation D, cortex injury (B2), and cortex

525 development gene signatures were correlated with oligodendroglioma. The heatmap in

526 Figure 4B shows the correlation matrix between 21 astrocyte gene signatures and LGG

527 samples with diverse genomic features due to limited space. Extended Data Figure 4-1

528 includes a heatmap representing the correlation between LGG samples and all 42 astrocyte

529 gene signatures. Approximately 14% and 7% of the 42 astrocyte gene signatures were

530 positively correlated with grade 2 and grade 3 LGG samples, respectively.

531 Mutation in the gene encoding IDH1 or IDH2 and complete deletion of both the

532 short arm of 1 and the long arm of chromosome 19 (1p/19q codeletion) are

533 markers frequently used in clinical practice for classification of LGG samples. Both are

534 indicators of favorable prognosis. We used IDH status as a genomic feature with the

535 following categories: wild-type IDH, mutant IDH with codeletion of 1p/19q, and mutant

536 IDH with no codeletion of 1p/19q. Approximately 45% and 40% of the gene signatures

537 were positively correlated with wild-type IDH samples and mutant IDH samples with no

538 1p/19q codeletion, respectively. Subpopulation B, brain stem (subpopulations A, B),

539 olfactory bulb (subpopulations A, B, D, E), cortex (subpopulations B, E), hemisection,

540 transection, hippocampus (ASTRO1), and RA SCI astrocyte gene signatures correlated

541 significantly with both wild-type IDH and mutant IDH samples with no1p/19q codeletion.

542 Figure 4C shows a heatmap displaying the proportion of samples featuring each IDH status

543 variant with positive enrichment scores for 21 representative astrocyte gene signatures

544 (Extended Data Figure 4-1 displays all 42 astrocyte gene signatures). Extended Data Figure

545 4-2:A-C shows the enrichment score matrices obtained with each IDH status variant. For

25

546 wild-type IDH, olfactory bulb (subpopulations A, B, D, and E), cortex (subpopulations B

547 and D), and brain stem subpopulation A gene signatures have positive enrichment scores

548 with the highest proportion of samples. Mutant IDH samples without the 1p/19q codeletion

549 had high enrichment scores with amygdala gene signatures (AS1-AS3). Mutant IDH

550 samples carrying the 1p/19q codeletion had high enrichment scores with subpopulations D

551 and E, cortex subpopulation A, cortex injury (B2), cortex development (A1, A2), and SA

552 SCI gene signatures. Table 4 shows the distribution of histology class and IDH status of the

553 160 LGG samples used in our analysis. The wild-type IDH and mutant IDH without 1p/19q

554 codeletion groups consisted mostly of astrocytomas (44% and 45%), whereas mutant IDH

555 samples with the 1p/19q codeletion were mostly oligodendrogliomas (80%). Extended Data

556 Figure 4-3A shows the enrichment scores for LGG samples with an astrocytic histology

557 and wild-type IDH or mutant IDH with no 1p/19q codeletion. Brain stem subpopulation A

558 and olfactory bulb subpopulations D and E are among the top five gene sets with positive

559 enrichment scores that have the highest number of samples with the aforementioned

560 features.

561

562 Table 4. Histological classification of LGG samples with tumor purity greater than 70% used in this

563 study in each IDH status group.

IDH wild-type Mutant IDH + Mutant IDH +

Histological class (27 samples) no 1p/19q 1p/19q

codeletion codeletion

(71 samples) (61 samples)

Astrocytoma 12 32 0

Oligoastrocytoma 8 25 12

26

Oligodendroglioma 7 13 49

564

565 Mutations in the promoter of the gene encoding TERT were positively correlated

566 with astrocyte gene signatures (cortex development, cortex injury, SA SCI) in 12% of the

567 LGG samples. Positive correlations were found in 36% of the gene signatures for LGG

568 samples carrying mutations in the CIC gene. Astrocyte subpopulation gene signatures

569 (PopD, PopE, Cortex, and PopA cortex) clustered a subset of LGG samples into

570 oligodendrogliomas with IDH mutation and 1p/19q codeletion, TERT promoter mutations,

571 and CIC gene mutations. The heatmap in Extended Data Figure 4-3B contains the

572 enrichment scores of samples with an oligodendroglioma histology, mutant TERT

573 promoter, mutant CIC gene, and mutant IDH gene combined with the 1p/19q codeletion.

574 Kaplan-Meier survival plots are commonly used to assess treatment efficacy in

575 clinical trials. In order to determine whether astrocyte gene signatures are correlated with

576 the survival of glioma patients, we performed survival analysis using the survival R library

577 (Therneau, 2015). Kaplan-Meier estimates of overall survival among patients who had

578 LGG samples with an astrocytoma histology were obtained for each gene signature. We

579 grouped samples into negative (Low) and positive (High) enrichment scores, with and

580 without age as a factor, as described in Materials and Methods. Figure 5A-E depicts

581 significant (p-value < 0.05) survival plots with distinct astrocyte gene signatures. Better

582 prognosis (without an age effect) was observed for patients with astrocytoma LGG samples

583 correlated with brain stem subpopulation A (Figure 5A) and subpopulation D (Figure 5C)

584 gene signatures. Age had a greater effect than enrichment score on the probability of

585 survival for patients with astrocytoma LGG samples correlated with cortex subpopulation

586 A (Figure 5B), cortex subpopulation D (Figure 5D), or olfactory bulb subpopulation D

27

587 (Figure 5E). Astrocytoma LGG patients older than 45 years with profiles positively

588 associated with subpopulation D in cortex (Figure 5D) and olfactory bulb (Figure 5E) had

589 worse prognoses, as did LGG patients older than 45 years with Low enrichment scores for

590 profiles associated with cortex subpopulation A (Figure 5B).

591 To test whether the observed differences in survival (Figure 5) can be explained by

592 specific genetic mutations associated with subtypes of LGG tumors, we performed statistical

593 analysis. We used IDH mutation, a common previously known mutation of glioma samples.

594 We obtained 2x2 contingency tables with the number of LGG samples classified as “High”

595 and “Low” (correlation with gene signatures) and as “IDHwt” or “Mutated_IDH”. Then we

596 used Fisher’s test to determine whether the proportion of samples classified as “High” or

597 “Low” was different depending on the number of samples with and without IDH mutations.

598 We found that for some gene signatures (brainstem subpopulation A, cortex subpopulation A,

599 and cortex subpopulation D) the p-value of Fisher’s test was not significant (p-values=

600 0.4606, 0.09365 and 0.08331 respectively), therefore there is no statistical evidence to

601 demonstrate that there are differences in the enrichment scores between IDHwt and mutated

602 IDH. However, for subpopulation D and olfactory bulb subpopulation D, the p-value was

603 0.04211 and 0.001592 respectively, thus, the correlation between survival and enrichment

604 scores among these samples could possibly be affected by mutations of IDH.

605 Astrocyte subpopulation gene signatures correlate with neurodegenerative disease

606 gene sets

607 In addition to correlating the signatures of astrocyte subpopulations with glioma samples,

608 we also investigated their correlation with neurodegenerative diseases. We collected gene

609 sets derived from neurodegenerative disease studies (Extended Data Figure 2-4) from the

610 literature and combined them with related gene sets from the MsigDB database (Liberzon

28

611 et al., 2011). Using a hypergeometric test, the DE genes from each regional astrocyte

612 subpopulation were analyzed to identify the enrichment of gene sets related to

613 neurodegenerative diseases including AD, HD, and PD. Several neurodegenerative disease

614 gene sets were found to be enriched using regional subpopulation gene signatures at FDR <

615 0.05. As an example, “BLALOCK ALZHEIMERS DISEASE UP” was significantly

616 enriched with 97 upregulated genes from the cortex subpopulation B gene signature.

617 Extended Data Figure 6-1A shows a violin plot representing the normalized counts of the

618 97 upregulated genes from the cortex subpopulation B gene signature. The gene expression

619 of these 97 genes is also high in the subpopulation B profiles of brain stem and olfactory

620 bulb, due to their inherent similarity. The “LABADORF_HUNTINGTONS_UP” gene set

621 was significantly enriched with 89 DE genes from the cortex subpopulation B gene

622 signature (Extended Data Figure 6-1B). DE genes from cortex subpopulation C were

623 enriched for “KEGG HUNTINGTONS DISEASE” and “KEGG PARKINSONS

624 DISEASE” gene sets, with 13 and 11 genes respectively (Extended Data Figure 6-1C). The

625 complete list of significantly enriched neurodegenerative disease gene sets can be found in

626 Extended Data Figure 6-3.

627 The expression of astrocyte genes enriched in the Alzheimer’s disease gene set was

628 validated using immunostaining in mouse disease models

629 Significant enrichment of the AD gene set was found in DE genes from cortex astrocyte

630 subpopulation B. For validation, we selected three genes with gene expression fold-change

631 > 4 compared to the corresponding non-astrocyte samples. We used two mouse models of

632 AD: 5xFamilial Alzheimer’s disease (5xFAD) (Polito et al., 2014) and APPNLGF (NLGF)

633 (Saito et al., 2014). AD pathology was confirmed in the cortex as an increase in GFAP+

634 astrocytes as well as detectable hypertrophy, as shown in Extended Data Figure 6-2A.

29

635 Because subpopulation B astrocytes reside mainly in the inner cortex (Lin et al., 2017),

636 tissue sections from these brain regions were analyzed. The protein expression of Adcy7,

637 Serping1, and Emp1 genes was analyzed using immunofluorescence in inner cortex brain

638 tissue sections. Expression of Adcy7 protein was identified in 60% of WT cortical

639 astrocytes. In contrast, more than 80% of cortical astrocytes from 5xFAD mice models

640 expressed Adcy7 protein with an average fold-change of 1.5 r0.18 and a p-value < 0.05

641 (Figure 6A). Protein expression of the Serping1 gene was found in approximately 50%

642 cortical WT astrocytes and was upregulated in AD with a fold-change of 1.72 r0.19 and p-

643 value < 0.05 (Figure 6B). Similarly, nearly 50% and 80% of cortical WT and AD

644 astrocytes, respectively, expressed the Emp1 protein, which was upregulated 3.57 r0.27

645 times with a p-value < 0.001 (Figure 6C).

646 Microarray studies have reported two types of reactive astrocytes, A1 and A2

647 astrocytes, that are induced after neuroinflammation and ischemia, respectively (Zamanian

648 et al., 2012; Liddelow et al., 2017). Each type of astrocyte has its own reactive profile with

649 differences in transcriptome, morphology, and function. Since Serping1 protein is one of

650 the known markers for A1 reactive astrocytes and Emp1 protein is one of the known

651 markers for A2, we evaluated their combined expression in cortical astrocytes. Figure 6D

652 shows the single and combined fluorescence images in both WT and 5xFAD cortical

653 tissues. The immunostaining results from the NLGF AD mouse model are included in

654 Extended Data Figure 6-2:B-E and were similar to results from the 5xFAD model.

655

656 Discussion

30

657 Currently there is no effective treatment for glioma or for neurodegenerative

658 diseases such as AD, PD, and HD, because our understanding of their pathophysiology is

659 far from complete. Mounting evidence indicates that astrocytes are involved in various

660 neurological diseases. Moreover, it is now known that astrocytes are functionally diverse

661 and respond differently to pathological conditions.

662 Astrocytes are heterogeneous in morphology, developmental origin, gene

663 expression profile, and physiological properties (Zhang and Barres, 2010). Previous

664 research demonstrated the existence of at least five different astrocyte subpopulations from

665 three brain regions (olfactory bulb, cortex, and brain stem) (Lin et al., 2017). Analogous

666 astrocyte subpopulations were found in human and mouse glioma. Astrocyte subpopulation

667 C and its analog in glioma were enriched in the expression of genes associated with

668 synapse formation and epilepsy. In the present study, we expanded the genomic analyses of

669 these heterogeneous astrocyte subpopulations to include the expression of lncRNA genes

670 and the identification of gene signatures enriched in astrocyte subpopulations within each

671 brain region. We found that even though astrocyte subpopulations share similarities across

672 regions, there are interesting differences among them in gene expression. Astrocyte

673 subpopulations display distinct lncRNA gene expression patterns among brain regions

674 highlighting the contribution of lncRNAs to the different astrocyte gene signatures and to

675 potential regulatory functions.

676 We selected subsets of DE protein-coding genes from astrocyte gene signatures to

677 gain insight into the functions of astrocyte subpopulations in different brain regions. Highly

678 expressed DE protein-coding genes enriched in cortex subpopulation B included Mertk and

679 Sirpa, both known as synaptic phagocytic genes. MERTK interacts with the integrin

680 pathway regulating CdKII/DOCK180/Rac1 modules, which control the rearrangement of

31

681 the actin cytoskeleton during phagocytosis (Wu et al., 2005). Previous research has

682 demonstrated that astrocytes in both the developing and the adult CNS contribute to neural

683 circuit remodeling by phagocytosing synapses via the MEGF10 and MERTK pathways in

684 response to neural activity (Chung et al., 2013). However, even though Mertk is still

685 expressed in mature astrocytes, synaptic phagocytosis declines in adult CNS (Chung et al.,

686 2013); thus inhibitory mechanisms might be upregulated. SIRPA membrane receptor

687 recognizes CD47 and upon binding inhibits synapse phagocytosis (Barclay and van den

688 Berg, 2014). Researchers have found that the expression of Sirpa membrane receptor is also

689 upregulated in mature astrocytes (Sloan et al., 2017). In the previous research, cortex

690 subpopulation B was found at a later stage during cortical development compared to

691 subpopulations A, C, and E (Lin et al., 2017). MERTK and SIRPA potentially regulate

692 synaptic density in cortex subpopulation B during late cortical development.

693 Slc1a2 and Cpe are among the highly expressed DE protein coding genes found in

694 cortex subpopulation C samples. SLC1A2 is a glial amino acid transporter which plays a

695 major role in synaptic glutamate clearance. The dysfunction of SLC1A2 (commonly

696 observed in neurodegenerative diseases) causes elevated levels of glutamate which yield

697 neuronal damage (Lin et al., 2012). In a previous study, co-cultures with neurons

698 demonstrated that cortex subpopulation C astrocytes significantly enhanced synapse

699 formation as compared to subpopulation A or bulk astrocytes (Lin et al., 2017). Further

700 functional studies are required to determine if the upregulation of Slc1a2 is related to the

701 enhancement of synapse function. Another DE protein coding gene found with high

702 expression in cortex subpopulation C is Cpe. CPE is a peptidase which may also function

703 as a neurotrophic factor promoting neuronal survival. Interestingly, CPE has been found to

704 be secreted by cultured astrocytes (Klein et al., 1992) and its overexpression in a mouse

32

705 model of glioma was found to mitigate cell migration (Armento et al., 2017). Through

706 Transwell assays, astrocyte cortex subpopulation C was identified as having less migratory

707 potential than subpopulation A (Lin et al., 2017). Our results suggest a potential link

708 between the upregulation of CPE peptidase and cell migration. Overall, astrocyte gene

709 signatures may be used to gain insight into their region and subpopulation specific

710 functions.

711 PCA and GSEA analyses yielded significantly enriched gene sets relevant to

712 functions in specific astrocyte subpopulations and brain regions. Through PCA, we found

713 that PC1 separated samples into non-astrocytes and astrocyte samples, whereas PC2

714 clustered samples according to subpopulation. The gene set GO_MYELIN_SHEATH was

715 among the top enriched sets using PC1 scores. The enrichment of this gene set is likely due

716 to the upregulation of astrocyte-derived factors which support oligodendrocyte myelination

717 in the CNS. Together, PC1 and PC2 captured nearly 69% of the variability. PC3 and PC4

718 further separated samples according to brain region. Our results demonstrate the

719 heterogeneity in astrocyte gene expression profiles due to subpopulation and brain region.

720 Many genes become upregulated in astrocytes in response to injuries and diseases

721 through a transformation called reactive astrogliosis. As a functionally diverse cell

722 population, the responses of astrocytes to injury and disease are equally diverse. Barres and

723 colleagues (Zamanian et al., 2012) found that astrocytes exist in at least two reactive states,

724 A1 (inflammatory) and A2 (ischemic). A1 neuroinflammatory reactive astrocytes exhibit

725 upregulation of complement cascade genes that leads to loss of synapses. In contrast, A2

726 ischemic reactive astrocytes display upregulation of neurotrophic factors such as

727 thrombospondins that promote synapse recovery and repair (Zamanian et al., 2012). A1

728 reactive astrocytes have been found in postmortem brain tissue from patients with

33

729 neurodegenerative diseases (AD, HD, PD), amyotrophic lateral sclerosis (ALS), and

730 multiple sclerosis (MS) (Liddelow et al., 2017). Furthermore, reactive astrocytes have also

731 been found neighboring tumor cells and might enhance their malignancy by inducing cell

732 proliferation and migration (Le et al., 2003; Biasoli et al., 2014). A better understanding of

733 the heterogeneity of astrocytes and their potential involvement in and contributions to

734 different diseases is needed to promote the development of treatments targeting specific

735 astrocyte subpopulations.

736 Altered astrocyte function is increasingly recognized as a factor contributing to

737 glioma and a number of neurodegenerative diseases. Given the glial-like histopathology of

738 glioma and its underlying cellular diversity, certain types of glioma might be composed of

739 malignant analogues of astrocyte subpopulations (Patel et al., 2014; Laug et al., 2018).

740 Thus, we surveyed multiple previously published astrocyte subpopulation datasets and

741 correlated our collection of astrocyte subpopulation gene signatures with those of various

742 subtypes of glioma and neurodegenerative diseases. Our results indicate that distinct

743 astrocyte gene signatures are correlated to glioma samples with specific genomic features

744 and somatic mutations. We correlated astrocyte gene signatures with copy-number

745 variations that are known to play a role in tumor proliferation. For example, amplifications

746 of the EGFR gene are known to lead to tumor growth through enhancement of cell

747 proliferation. MTAP and CDKN2A genes are frequently deleted in human cancers

748 (Mavrakis et al., 2016). A total of 18 astrocyte gene signatures were significantly correlated

749 with GBM samples harboring a combination of copy-number variations including EGFR

750 amplification, CDKN2A deletion, and MTAP deletion. Subpopulation C from all regions,

751 subpopulation A (brain stem and olfactory bulb), and subpopulation B (cortex and olfactory

752 bulb) were among the aforementioned gene signatures.

34

753 We also correlated astrocyte gene signatures with LGG samples. The correlation

754 between astrocyte gene signatures and LGG samples improved when IDH-1p/19q status

755 was considered than when histological class was accounted for, consistent with a previous

756 report (Cancer Genome Atlas Research Network et al., 2015). Our results indicate that

757 more than half of the gene signatures were positively correlated with samples classified

758 either as wild-type or mutated IDH with no 1p/19q codeletion. The high positive correlation

759 of gene profiles with wild-type and mutant IDH samples without the 1p/19q codeletion is

760 most likely due to the astrocyte origin of these gene signatures. Interestingly, most of the

761 astrocyte subpopulations from the olfactory bulb (A, B, D, E) were strongly correlated with

762 wild-type IDH. It is possible that astrocytomas acquire a signature that is most similar to

763 olfactory bulb astrocytes. Our results also show that amygdala gene signatures (AS1-AS3)

764 were strongly correlated with samples with mutant IDH and no 1p/19q codeletion. The

765 gene signatures of subpopulations D and E, cortex subpopulation A, cortex development

766 (A1 and A2), and cortex injury (B2) were most highly correlated with LGG samples

767 bearing the IDH mutation and the 1p/19q codeletion; they are also positively associated

768 with oligodendroglioma histology. This is very interesting because it provides potential

769 links between key genomic mutations and the resultant cellular phenotypes. Even though

770 the gene signatures were derived from astrocyte subpopulations, the correlation with

771 oligodendroglioma samples might indicate that low-grade oligodendrogliomas contain

772 proliferating glial progenitor cells that dedifferentiated from astrocyte subpopulations,

773 consistent with a previous report (Dai, 2001).

774 Survival analysis was conducted to determine whether enrichment scores of

775 astrocyte signatures are correlated with the survival probability of patients. Kaplan-Meier

776 plots suggest that patients with an astrocytic LGG and a gene profile similar to that of

35

777 astrocyte subpopulation A in brain stem are not correlated with mutations of IDH and have

778 a better prognosis. On the other hand, the survival of LGG patients associated with

779 subpopulation D and olfactory bulb subpopulation D gene signatures could possibly be

780 affected by mutations of IDH correlated with subtypes of these LGG samples.

781 The gene expression correlations found between astrocyte gene signatures and

782 TCGA GBM/LGG samples suggest that certain subtypes of glioma could potentially

783 originate from specific astrocyte subpopulations. Future experimentation will help testing

784 the gene expression correlations identified in this work to advance the understanding and

785 treatment of brain tumors. For example, interesting work has been done to generate RNA-

786 Seq transcriptional profiles from patient-derived glioma stem cells (Jin et al., 2017; Park et

787 al., 2017; Puchalski et al., 2018). Finding the enrichment of astrocyte subpopulation and

788 region gene signatures in patient-derived glioma stem cell models may pinpoint astrocyte

789 subpopulations which might be involved in gliomagenesis.

790 In addition to glioma, increasing evidence suggests that astrocytes dysfunction is

791 also involved in neurodegenerative diseases (Sofroniew and Vinters, 2010). Through

792 GSEA, we found that cortex and olfactory bulb gene profiles of subpopulations B and C

793 were enriched in AD, PD, and HD. GO terms related to mitochondrial respiratory chain

794 complex were significantly enriched with genes in the cortex region astrocyte gene

795 signature. As a result, astrocyte subpopulations in the cortex might be more vulnerable to

796 mitochondrial dysfunction that could be associated with neurodegenerative diseases. The

797 AD gene set was enriched with DE genes from cortex subpopulation B. Immunostaining

798 assays validated the protein expression of three highly upregulated genes in cortex

799 subpopulation B in two AD mouse models. Inner cortices of AD mice have significantly

800 more GFAP + astrocytes than do wild-type mouse brains. Because subpopulation B

36

801 astrocytes reside mostly in the inner cortex, we assumed that cortical GFAP+ cells belong

802 mainly to this subpopulation. GFAP + astrocytes in AD cortices also exhibit significantly

803 increased Adcy7 protein expression. Adcy7 is a membrane-bound enzyme that catalyzes

804 the formation of cyclic AMP (cAMP) from ATP (Crown Center, 1997;

805 Safran et al., 2003). Increased immunostaining of cAMP co-immunolocalizes with beta-

806 amyloid in cerebral cortical vessels of AD patients (Martínez et al., 2001).

807 Furthermore, significantly elevated levels of cAMP have been found in cerebrospinal fluids

808 of AD patients (Martínez et al., 1999). The above evidence suggests a potential role of

809 cAMP in enhancing the progression of AD. Our results indicate that cortex subpopulation

810 B might be involved in AD development as a response to an insult or stimuli. One of the

811 possible mechanisms by which cortex subpopulation B might contribute to the progression

812 of AD is through the upregulation of Adcy7, which would subsequently enhance the

813 synthesis of cAMP from ATP. Studies involving a decrease in the expression of Adcy7 in

814 cortex subpopulation B astrocytes from AD mice are required to better understand the

815 underlying pathways and mechanisms of AD development.

816 In summary, we have demonstrated that astrocyte subpopulations from diverse brain

817 regions have unique gene signatures for both protein-coding and lncRNA genes. Regional

818 astrocyte subpopulation gene signatures are enriched in different functional gene sets,

819 indicating their heterogeneity. We also obtained from the literature a comprehensive

820 collection of gene signatures of purified astrocyte subpopulations from diverse

821 developmental stages, brain regions, and conditions. Through analyses of gene set

822 enrichment, we found that gene signatures of specific astrocyte subpopulations correlated

823 with distinct glioma subtypes with unique genomic alterations. Additionally, we

824 demonstrated the association of different astrocyte gene signatures with neurodegenerative

37

825 diseases. We validated the upregulated protein expression of a set of genes in cortical

826 astrocytes in two mouse models of AD. Thus, our analysis indicates that certain subtypes of

827 glioma and neurodegenerative diseases are correlated to specific astrocyte subpopulations.

828 Targeting specific astrocyte subpopulations could present a novel strategy for treating these

829 devastating neurological diseases.

830

831 References

832 Anders S, Pyl PT, Huber W (2015) HTSeq--a Python framework to work with high-

833 throughput sequencing data. Bioinformatics 31:166–169 Available at:

834 http://www.ncbi.nlm.nih.gov/pubmed/25260700.

835 Armento A, Ilina EI, Kaoma T, Muller A, Vallar L, Niclou SP, Krüger MA, Mittelbronn M,

836 Naumann U (2017) Carboxypeptidase E transmits its anti-migratory function in

837 glioma cells via transcriptional regulation of cell architecture and motility regulating

838 factors. Int J Oncol 51:702–714 Available at:

839 http://www.ncbi.nlm.nih.gov/pubmed/28656234.

840 Barclay AN, van den Berg TK (2014) The Interaction Between Signal Regulatory Protein

841 Alpha (SIRP α ) and CD47: Structure, Function, and Therapeutic Target.

842 Annu Rev Immunol 32:25–50 Available at:

843 http://www.annualreviews.org/doi/10.1146/annurev-immunol-032713-120142.

844 Barres BA (2008) The Mystery and Magic of Glia: A Perspective on Their Roles in Health

845 and Disease. Neuron 60:430–440 Available at:

846 http://linkinghub.elsevier.com/retrieve/pii/S0896627308008866.

847 Batista PJ, Chang HY (2013) Long noncoding RNAs: cellular address codes in

38

848 development and disease. Cell 152:1298–1307 Available at:

849 http://www.ncbi.nlm.nih.gov/pubmed/23498938.

850 Biasoli D, Sobrinho MF, da Fonseca ACC, de Matos DG, Romão L, de Moraes Maciel R,

851 Rehen SK, Moura-Neto V, Borges HL, Lima FRS (2014) Glioblastoma cells inhibit

852 astrocytic p53-expression favoring cancer malignancy. Oncogenesis 3:e123 Available

853 at: http://www.ncbi.nlm.nih.gov/pubmed/25329722.

854 Bowman RL, Wang Q, Carro A, Verhaak RGW, Squatrito M (2017) GlioVis data portal for

855 visualization and analysis of brain tumor expression datasets. Neuro Oncol 19:139–

856 141 Available at: http://www.ncbi.nlm.nih.gov/pubmed/28031383.

857 Cahoy JD, Emery B, Kaushal A, Foo LC, Zamanian JL, Christopherson KS, Xing Y,

858 Lubischer JL, Krieg PA, Krupenko SA, Thompson WJ, Barres BA (2008) A

859 Transcriptome Database for Astrocytes, Neurons, and Oligodendrocytes: A New

860 Resource for Understanding Brain Development and Function. J Neurosci 28:264–278

861 Available at: http://www.jneurosci.org/cgi/doi/10.1523/JNEUROSCI.4178-07.2008.

862 Cancer Genome Atlas Research Network et al. (2015) Comprehensive, Integrative

863 Genomic Analysis of Diffuse Lower-Grade Gliomas. N Engl J Med 372:2481–2498

864 Available at: http://www.ncbi.nlm.nih.gov/pubmed/26061751.

865 Ceccarelli M et al. (2016) Molecular Profiling Reveals Biologically Discrete Subsets and

866 Pathways of Progression in Diffuse Glioma. Cell 164:550–563 Available at:

867 http://linkinghub.elsevier.com/retrieve/pii/S009286741501692X.

868 Chung W-S, Clarke LE, Wang GX, Stafford BK, Sher A, Chakraborty C, Joung J, Foo LC,

869 Thompson A, Chen C, Smith SJ, Barres BA (2013) Astrocytes mediate synapse

870 elimination through MEGF10 and MERTK pathways. Nature 504:394–400 Available

871 at: http://www.nature.com/articles/nature12776.

39

872 Crown Human Genome Center (1997) GeneCards Human Gene Database. Available at:

873 www..org [Accessed April 15, 2018].

874 Cuevas Diaz Duran R, Menon S, Wu J (2016) The Analyses of Global Gene Expression

875 and Transcription Factor Regulation. In, pp 1–35 Available at:

876 http://link.springer.com/10.1007/978-94-017-7450-5_1.

877 Dai C (2001) PDGF autocrine stimulation dedifferentiates cultured astrocytes and induces

878 oligodendrogliomas and oligoastrocytomas from neural progenitors and astrocytes in

879 vivo. Genes Dev 15:1913–1925 Available at:

880 http://www.genesdev.org/cgi/doi/10.1101/gad.903001.

881 Denis-Donini S, Glowinski J, Prochiantz A (1984) Glial heterogeneity may define the

882 three-dimensional shape of mouse mesencephalic dopaminergic neurones. Nature

883 307:641–643 Available at: http://www.ncbi.nlm.nih.gov/pubmed/6694754.

884 Dong X, Chen K, Cuevas-Diaz Duran R, You Y, Sloan SA, Zhang Y, Zong S, Cao Q,

885 Barres BA, Wu JQ (2015) Comprehensive Identification of Long Non-coding RNAs

886 in Purified Cell Types from the Brain Reveals Functional LncRNA in OPC Fate

887 Determination Wahlestedt C, ed. PLOS Genet 11:e1005669 Available at:

888 http://dx.plos.org/10.1371/journal.pgen.1005669.

889 Dong X, You Y, Wu JQ (2016) Building an RNA Sequencing Transcriptome of the Central

890 Nervous System. Neuroscientist 22:579–592 Available at:

891 http://www.ncbi.nlm.nih.gov/pubmed/26463470.

892 Doyle JP, Dougherty JD, Heiman M, Schmidt EF, Stevens TR, Ma G, Bupp S, Shrestha P,

893 Shah RD, Doughty ML, Gong S, Greengard P, Heintz N (2008) Application of a

894 Translational Profiling Approach for the Comparative Analysis of CNS Cell Types.

895 Cell 135:749–762 Available at:

40

896 http://linkinghub.elsevier.com/retrieve/pii/S0092867408013664.

897 Duran RC-D, Yan H, Zheng Y, Huang X, Grill R, Kim DH, Cao Q, Wu JQ (2017) The

898 systematic analysis of coding and long non-coding RNAs in the sub-chronic and

899 chronic stages of spinal cord injury. Sci Rep 7:41008 Available at:

900 http://www.ncbi.nlm.nih.gov/pubmed/28106101.

901 Frazee AC, Langmead B, Leek JT (2011) ReCount: a multi-experiment resource of

902 analysis-ready RNA-seq gene count datasets. BMC Bioinformatics 12:449 Available

903 at: http://www.ncbi.nlm.nih.gov/pubmed/22087737.

904 Gokce O, Stanley GM, Treutlein B, Neff NF, Camp JG, Malenka RC, Rothwell PE,

905 Fuccillo M V, Südhof TC, Quake SR (2016) Cellular Taxonomy of the Mouse

906 Striatum as Revealed by Single-Cell RNA-Seq. Cell Rep 16:1126–1137 Available at:

907 http://www.ncbi.nlm.nih.gov/pubmed/27425622.

908 Grass D, Pawlowski PG, Hirrlinger J, Papadopoulos N, Richter DW, Kirchhoff F,

909 Hülsmann S (2004) Diversity of functional astroglial properties in the respiratory

910 network. J Neurosci 24:1358–1365 Available at:

911 http://www.ncbi.nlm.nih.gov/pubmed/14960607.

912 Guttman M, Rinn JL (2012) Modular regulatory principles of large non-coding RNAs.

913 Nature 482:339–346 Available at: http://www.ncbi.nlm.nih.gov/pubmed/22337053.

914 Hänzelmann S, Castelo R, Guinney J (2013) GSVA: gene set variation analysis for

915 microarray and RNA-seq data. BMC Bioinformatics 14:7 Available at:

916 http://www.ncbi.nlm.nih.gov/pubmed/23323831.

917 Hara M, Kobayakawa K, Ohkawa Y, Kumamaru H, Yokota K, Saito T, Kijima K,

918 Yoshizaki S, Harimaya K, Nakashima Y, Okada S (2017) Interaction of reactive

919 astrocytes with type I collagen induces astrocytic scar formation through the integrin-

41

920 N-cadherin pathway after spinal cord injury. Nat Med 23:818–828 Available at:

921 http://www.ncbi.nlm.nih.gov/pubmed/28628111.

922 Herculano-Houzel S (2014) The glia/neuron ratio: How it varies uniformly across brain

923 structures and species and what that means for brain physiology and evolution. Glia

924 62:1377–1391 Available at: http://doi.wiley.com/10.1002/glia.22683.

925 Hill SJ, Barbarese E, McIntosh TK (1996) Regional heterogeneity in the response of

926 astrocytes following traumatic brain injury in the adult rat. J Neuropathol Exp Neurol

927 55:1221–1229 Available at: http://www.ncbi.nlm.nih.gov/pubmed/8957445.

928 Hung T et al. (2011) Extensive and coordinated transcription of noncoding RNAs within

929 cell-cycle promoters. Nat Genet 43:621–629 Available at:

930 http://www.ncbi.nlm.nih.gov/pubmed/21642992.

931 Jin X, Kim LJY, Wu Q, Wallace LC, Prager BC, Sanvoranart T, Gimple RC, Wang X,

932 Mack SC, Miller TE, Huang P, Valentim CL, Zhou Q-G, Barnholtz-Sloan JS, Bao S,

933 Sloan AE, Rich JN (2017) Targeting glioma stem cells through combined BMI1 and

934 EZH2 inhibition. Nat Med 23:1352–1361 Available at:

935 http://www.ncbi.nlm.nih.gov/pubmed/29035367.

936 John Lin C-C, Yu K, Hatcher A, Huang T-W, Lee HK, Carlson J, Weston MC, Chen F,

937 Zhang Y, Zhu W, Mohila CA, Ahmed N, Patel AJ, Arenkiel BR, Noebels JL,

938 Creighton CJ, Deneen B (2017) Identification of diverse astrocyte populations and

939 their malignant analogs. Nat Neurosci 20:396–405 Available at:

940 http://www.nature.com/articles/nn.4493.

941 Jolliffe IT (2002) Principal Component Analysis,. New York: Springer.

942 Kassambara A, Kosinski M BP (2017) survminer: Drawing Survival Curves using

943 “ggplot2.”

42

944 Klein RS, Das B, Fricker LD (1992) Secretion of carboxypeptidase E from cultured

945 astrocytes and from AtT-20 cells, a neuroendocrine cell line: implications for

946 neuropeptide biosynthesis. J Neurochem 58:2011–2018 Available at:

947 http://www.ncbi.nlm.nih.gov/pubmed/1573389.

948 Laug D, Glasgow SM, Deneen B (2018) A glial blueprint for gliomagenesis. Nat Rev

949 Neurosci Available at: http://www.nature.com/articles/s41583-018-0014-3.

950 Le DM, Besson A, Fogg DK, Choi K-S, Waisman DM, Goodyer CG, Rewcastle B, Yong

951 VW (2003) Exploitation of Astrocytes by Glioma Cells to Facilitate Invasiveness: A

952 Mechanism Involving Matrix Metalloproteinase-2 and the Urokinase-Type

953 Plasminogen Activator–Plasmin Cascade. J Neurosci 23:4034–4043 Available at:

954 http://www.jneurosci.org/lookup/doi/10.1523/JNEUROSCI.23-10-04034.2003.

955 Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP

956 (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics 27:1739–1740

957 Available at: http://www.ncbi.nlm.nih.gov/pubmed/21546393.

958 Liddelow SA et al. (2017) Neurotoxic reactive astrocytes are induced by activated

959 microglia. Nature 541:481–487 Available at:

960 http://www.nature.com/articles/nature21029.

961 Lin C-LG, Kong Q, Cuny GD, Glicksman MA (2012) Glutamate transporter EAAT2: a

962 new target for the treatment of neurodegenerative diseases. Future Med Chem 4:1689–

963 1700 Available at: http://www.ncbi.nlm.nih.gov/pubmed/22924507.

964 Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion

965 for RNA-seq data with DESeq2. Genome Biol 15:550 Available at:

966 http://www.ncbi.nlm.nih.gov/pubmed/25516281.

967 Martínez M, Fernández E, Frank A, Guaza C, de la Fuente M, Hernanz A (1999) Increased

43

968 cerebrospinal fluid cAMP levels in Alzheimer’s disease. Brain Res 846:265–267

969 Available at: http://www.ncbi.nlm.nih.gov/pubmed/10556645.

970 Martínez M, Hernández AI, Hernanz A (2001) Increased cAMP immunostaining in

971 cerebral vessels in Alzheimer’s disease. Brain Res 922:148–152 Available at:

972 http://www.ncbi.nlm.nih.gov/pubmed/11730714.

973 Mavrakis KJ et al. (2016) Disordered methionine metabolism in MTAP/CDKN2A-deleted

974 cancers leads to dependence on PRMT5. Science (80- ) 351:1208–1213 Available at:

975 http://www.sciencemag.org/lookup/doi/10.1126/science.aad5944.

976 Morel L, Chiang MSR, Higashimori H, Shoneye T, Iyer LK, Yelick J, Tai A, Yang Y

977 (2017) Molecular and Functional Properties of Regional Astrocytes in the Adult Brain.

978 J Neurosci 37:8706–8717 Available at:

979 http://www.ncbi.nlm.nih.gov/pubmed/28821665.

980 Noristani HN, Sabourin JC, Boukhaddaoui H, Chan-Seng E, Gerber YN, Perrin FE (2016)

981 Spinal cord injury induces astroglial conversion towards neuronal lineage. Mol

982 Neurodegener 11:68 Available at: http://www.ncbi.nlm.nih.gov/pubmed/27716282.

983 Park NI et al. (2017) ASCL1 Reorganizes Chromatin to Direct Neuronal Fate and Suppress

984 Tumorigenicity of Glioblastoma Stem Cells. Cell Stem Cell 21:209–224.e7 Available

985 at: http://www.ncbi.nlm.nih.gov/pubmed/28712938.

986 Patel AP, Tirosh I, Trombetta JJ, Shalek AK, Gillespie SM, Wakimoto H, Cahill DP,

987 Nahed B V, Curry WT, Martuza RL, Louis DN, Rozenblatt-Rosen O, Suvà ML,

988 Regev A, Bernstein BE (2014) Single-cell RNA-seq highlights intratumoral

989 heterogeneity in primary glioblastoma. Science 344:1396–1401 Available at:

990 http://www.ncbi.nlm.nih.gov/pubmed/24925914.

991 Polito VA, Li H, Martini-Stoica H, Wang B, Yang L, Xu Y, Swartzlander DB, Palmieri M,

44

992 di Ronza A, Lee VM-Y, Sardiello M, Ballabio A, Zheng H (2014) Selective clearance

993 of aberrant tau proteins and rescue of neurotoxicity by transcription factor EB. EMBO

994 Mol Med 6:1142–1160 Available at: http://www.ncbi.nlm.nih.gov/pubmed/25069841.

995 Puchalski RB et al. (2018) An anatomic transcriptional atlas of human glioblastoma.

996 Science 360:660–663 Available at: http://www.ncbi.nlm.nih.gov/pubmed/29748285.

997 Quackenbush J (2002) Microarray data normalization and transformation. Nat Genet 32

998 Suppl:496–501 Available at: http://www.ncbi.nlm.nih.gov/pubmed/12454644.

999 R Core Team (2015) A language and environment for statistical computing.

1000 Rusnakova V, Honsa P, Dzamba D, Ståhlberg A, Kubista M, Anderova M (2013)

1001 Heterogeneity of astrocytes: from development to injury - single cell gene expression.

1002 PLoS One 8:e69734 Available at: http://www.ncbi.nlm.nih.gov/pubmed/23940528.

1003 Safran M, Chalifa-Caspi V, Shmueli O, Olender T, Lapidot M, Rosen N, Shmoish M, Peter

1004 Y, Glusman G, Feldmesser E, Adato A, Peter I, Khen M, Atarot T, Groner Y, Lancet

1005 D (2003) Human Gene-Centric Databases at the Weizmann Institute of Science:

1006 GeneCards, UDB, CroW 21 and HORDE. Nucleic Acids Res 31:142–146 Available

1007 at: http://www.ncbi.nlm.nih.gov/pubmed/12519968.

1008 Saito T, Matsuba Y, Mihira N, Takano J, Nilsson P, Itohara S, Iwata N, Saido TC (2014)

1009 Single App knock-in mouse models of Alzheimer’s disease. Nat Neurosci 17:661–663

1010 Available at: http://www.ncbi.nlm.nih.gov/pubmed/24728269.

1011 Sloan SA, Darmanis S, Huber N, Khan TA, Birey F, Caneda C, Reimer R, Quake SR,

1012 Barres BA, Paşca SP (2017) Human Astrocyte Maturation Captured in 3D Cerebral

1013 Cortical Spheroids Derived from Pluripotent Stem Cells. Neuron 95:779–790.e6

1014 Available at: https://linkinghub.elsevier.com/retrieve/pii/S0896627317306839.

1015 Sofroniew M V, Vinters H V (2010) Astrocytes: biology and pathology. Acta Neuropathol

45

1016 119:7–35 Available at: http://www.ncbi.nlm.nih.gov/pubmed/20012068.

1017 Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich

1018 A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment

1019 analysis: A knowledge-based approach for interpreting genome-wide expression

1020 profiles. Proc Natl Acad Sci 102:15545–15550 Available at:

1021 http://www.pnas.org/cgi/doi/10.1073/pnas.0506580102.

1022 Therneau TM (2015) A Package for Survival Analysis in S.

1023 Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-

1024 Seq. Bioinformatics 25:1105–1111 Available at:

1025 http://www.ncbi.nlm.nih.gov/pubmed/19289445.

1026 Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL,

1027 Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of

1028 RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7:562–578 Available at:

1029 http://www.ncbi.nlm.nih.gov/pubmed/22383036.

1030 Verhaak RGW et al. (2010) Integrated genomic analysis identifies clinically relevant

1031 subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR,

1032 and NF1. Cancer Cell 17:98–110 Available at:

1033 http://www.ncbi.nlm.nih.gov/pubmed/20129251.

1034 Wang Q et al. (2018) Tumor Evolution of Glioma-Intrinsic Gene Expression Subtypes

1035 Associates with Immunological Changes in the Microenvironment. Cancer Cell

1036 33:152 Available at: http://www.ncbi.nlm.nih.gov/pubmed/29316430.

1037 White RE, McTigue DM, Jakeman LB (2010) Regional heterogeneity in astrocyte

1038 responses following contusive spinal cord injury in mice. J Comp Neurol 518:1370–

1039 1390 Available at: http://www.ncbi.nlm.nih.gov/pubmed/20151365.

46

1040 Wu J ed. (2016) Transcriptomics and Gene Regulation. Dordrecht: Springer Netherlands.

1041 Available at: http://link.springer.com/10.1007/978-94-017-7450-5.

1042 Wu Y, Singh S, Georgescu M-M, Birge RB (2005) A role for Mer tyrosine kinase in

1043 alphavbeta5 integrin-mediated phagocytosis of apoptotic cells. J Cell Sci 118:539–553

1044 Available at: http://www.ncbi.nlm.nih.gov/pubmed/15673687.

1045 Wu YE, Pan L, Zuo Y, Li X, Hong W (2017) Detecting Activated Cell Populations Using

1046 Single-Cell RNA-Seq. Neuron 96:313–329.e6 Available at:

1047 http://www.ncbi.nlm.nih.gov/pubmed/29024657.

1048 Yeh T-H, Lee DY, Gianino SM, Gutmann DH (2009) Microarray analyses reveal regional

1049 astrocyte heterogeneity with implications for neurofibromatosis type 1 (NF1)-

1050 regulated glial proliferation. Glia 57:1239–1249 Available at:

1051 http://doi.wiley.com/10.1002/glia.20845.

1052 Zamanian JL, Xu L, Foo LC, Nouri N, Zhou L, Giffard RG, Barres BA (2012) Genomic

1053 analysis of reactive astrogliosis. J Neurosci 32:6391–6410 Available at:

1054 http://www.ncbi.nlm.nih.gov/pubmed/22553043.

1055 Zeisel A, Muñoz-Manchado AB, Codeluppi S, Lönnerberg P, La Manno G, Juréus A,

1056 Marques S, Munguba H, He L, Betsholtz C, Rolny C, Castelo-Branco G, Hjerling-

1057 Leffler J, Linnarsson S (2015) Brain structure. Cell types in the mouse cortex and

1058 hippocampus revealed by single-cell RNA-seq. Science 347:1138–1142 Available at:

1059 http://www.ncbi.nlm.nih.gov/pubmed/25700174.

1060 Zhang Y, Barres BA (2010) Astrocyte heterogeneity: an underappreciated topic in

1061 neurobiology. Curr Opin Neurobiol 20:588–594 Available at:

1062 http://www.ncbi.nlm.nih.gov/pubmed/20655735.

1063 Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O’Keeffe S, Phatnani HP,

47

1064 Guarnieri P, Caneda C, Ruderisch N, Deng S, Liddelow SA, Zhang C, Daneman R,

1065 Maniatis T, Barres BA, Wu JQ (2014) An RNA-sequencing transcriptome and

1066 splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci

1067 34:11929–11947 Available at: http://www.ncbi.nlm.nih.gov/pubmed/25186741.

1068 Zhang Y, Sloan SA, Clarke LE, Caneda C, Plaza CA, Blumenthal PD, Vogel H, Steinberg

1069 GK, Edwards MSB, Li G, Duncan JA, Cheshier SH, Shuer LM, Chang EF, Grant GA,

1070 Gephart MGH, Barres BA (2016) Purification and Characterization of Progenitor and

1071 Mature Human Astrocytes Reveals Transcriptional and Functional Differences with

1072 Mouse. Neuron 89:37–53 Available at:

1073 http://linkinghub.elsevier.com/retrieve/pii/S0896627315010193.

1074

1075 Figure Legends

1076 Figure 1. Principal Component Analysis (PCA) performed using log2-transformed

1077 FPKM of 4092 genes differentially expressed (DE) between regions.

1078 (A) Scores for principal components 1 (PC1) and 2 (PC2) are depicted. PC1 separates

1079 samples into astrocyte subpopulations and non-astrocyte populations. Circular and elliptical

1080 figures enclose samples according to subpopulation. (B) The plot of principal components 3

1081 (PC3) and 4 (PC4) separates samples into brain regions, as indicated by the elliptical

1082 figures. (C, D, E) Differentially expressed genes were ranked using the scores of PC1, PC2,

1083 and PC3, respectively, and the top enriched gene sets were obtained through Gene Set

1084 Enrichment Analysis (GSEA).

1085 Figure 2. Gene signatures defined by astrocyte subpopulations and brain regions.

1086 (A) Heatmap showing the log2-transformed FPKM values of cortex subpopulation B-

1087 enriched genes (protein-coding and lncRNAs) compared to other subpopulations. DE genes

48

1088 were obtained by performing pairwise comparisons of astrocyte subpopulations in each

1089 brain region to their corresponding non-astrocyte samples. Selection criteria: FPKM > 1,

1090 fold-change > 2, and FDR < 10%. (B) Heatmap illustrating upregulated DE lncRNA genes

1091 from astrocyte regional subpopulation gene signatures. (C) The top five significantly

1092 enriched gene sets obtained using the gene signatures of cortex subpopulation B (top panel)

1093 and cortex subpopulation C (lower panel). Bar plots indicate the number of genes found in

1094 the enriched gene set. The yellow line illustrates the gene set enrichment using -log10-

1095 transformed FDR. (D) Heatmap depicting the region-specific astrocyte gene signatures. DE

1096 genes were obtained by comparing all samples from the same brain region to the remaining

1097 regions (FPKM > 1, fold-change > 4, and p-value < 0.05) and to their corresponding non-

1098 astrocyte samples (FPKM > 1, fold-change > 2, and p-value < 0.05). Colors of all heatmaps

1099 represent log2 transformed FPKM values. See also Extended Data Figures 2-1 to 2-4.

1100 Figure 3. Correlation between upregulated astrocyte gene signatures and TCGA

1101 GBM samples

1102 (A) Heatmap displaying the hierarchical clustering of enrichment scores obtained through

1103 Gene Set Variation Analysis (GSVA). High scores indicate strong positive correlation

1104 between astrocyte gene signatures and gene expression profiles of 103 TCGA GBM

1105 samples with a tumor purity > 70%. Upper bars indicate tumor subtype and the presence or

1106 absence of an EGFR gene amplification.

1107 (B) Heatmap representing the enrichment scores between astrocyte gene signatures and

1108 Classical GBM samples that carry an EGFR gene amplification. Bar plots indicate the

1109 number of samples with positive enrichment score for each astrocyte gene signature.

1110 (C) Exploratory analysis of the correlation between astrocyte gene signatures and GBM

1111 samples with different subtypes, somatic mutations, and copy-number variations. The

49

1112 heatmap depicts the -log10-transformed p-value of either an ANOVA comparison between

1113 GBM subtypes or a Wilcoxon Rank-Sum test between samples carrying the selected

1114 mutations, amplifications, and deletions. For viewing clarity, only 21 representative

1115 astrocyte gene signatures are shown in each panel. Figures depicting all astrocyte gene

1116 signatures are available in Extended Data Figures 3-3 and 3-4. See also Extended Data

1117 Figures 3-1 to 3-6.

1118 Figure 4. Correlation between upregulated astrocyte gene signatures and TCGA LGG

1119 samples

1120 (A) Heatmap displaying the hierarchical clustering of enrichment scores obtained through

1121 Gene Set Variation Analysis (GSVA). High scores indicate strong positive correlation

1122 between astrocyte gene signatures and 160 TCGA LGG samples with tumor purity > 70%.

1123 Upper bars indicate tumor histology and known co-ocurring genomic alterations: IDH

1124 mutation combined with 1p/19q codeletion subtype (IDH.codel.subtype), mutations in

1125 TERT promoter (TERT.promoter.status), TP53 mutations, and ATRX mutations. (B)

1126 Exploratory analysis of the correlation between astrocyte gene signatures (y-axis) and LGG

1127 grades, histological subtypes, somatic mutations, and copy-number variations (x-axis). The

1128 heatmap depicts the -log10-transformed p-value of either an ANOVA comparison between

1129 LGG subtypes or a Wilcoxon Rank-Sum test between samples carrying the selected

1130 mutations, amplifications, and deletions. (C) Heatmap depicting the proportion of LGG

1131 samples with specific IDH status variants that had positive enrichment scores for every

1132 astrocyte gene signature. For viewing clarity, only 21 representative astrocyte gene

1133 signatures are shown in each panels B and C. Displays depicting complete astrocyte gene

1134 signatures are available in Extended Data Figure 4-1. See also Extended Data Figures 3-1,

1135 3-2, 3-6, 4-1, 4-2, and 4-3.

50

1136 Figure 5. Astrocyte subpopulation gene signatures are associated with survival time

1137 differences in TCGA LGG astrocytoma patients.

1138 Figure shows Kaplan-Meier plots and log-rank test p-values of sample groups stratified

1139 according to High and Low enrichment scores for (A) brain stem subpopulation A, (B)

1140 cortex subpopulation A and age, (c) subpopulation D, (D) cortex subpopulation D and age,

1141 and (E) olfactory bulb subpopulation D and age.

1142 Figure 6. Protein expression of candidate genes in the brain tissues from the 5xFAD

1143 Alzheimer’s disease mouse model.

1144 Brain tissues collected from 5xFAD mouse models were prepared for immunofluorescence

1145 using anti-GFAP (green) and either (A) anti-Adcy7, (B) anti-Serping1, or (C) anti-Emp1.

1146 The regions outlined with a square are displayed at higher magnification on the right, and

1147 the arrows point to the same cells for comparison in the fluorescent images. For the

1148 astrocytes in the inner layer of cortex (the area near the corpus callosum), the proportion of

1149 GFAP+ cells displaying red fluorescence was calculated, and the intensity of red

1150 fluorescence overlapping with GFAP signals was measured. Two-tailed unpaired Student’s

1151 t-tests were used to compare AD samples to the wild-type group. (D) Co-immunostaining

1152 of cortical brain tissues was performed with anti-GFAP, anti-Serping1, and anti-Emp1.

1153 *p < 0.05, **p < 0.01, ***p < 0.001. Scale bar in panels A-D, 50 μm. See also Extended

1154 Data Figures 6-1, 6-2, and 6-3.

1155

1156 Extended Data

1157 Figure 2-1. Gene expression as FPKM and normalized read count matrices across

1158 astrocyte subpopulations.

51

1159 Figure 2-2. Comparison of log-transformed fold-changes and q-values between DE

1160 lncRNA and protein-coding genes from different astrocyte subpopulation and region

1161 gene signatures. Blue circles represent DE lncRNAs and red circles represent DE protein-

1162 coding genes. Using an unpaired two-tailed t-test we observed that the fold-changes of DE

1163 lncRNAs (blue) are statistically higher (p-value < 0.05) than those of protein-coding genes

1164 (red).

1165 Figure 2-3. Gene set enrichment using astrocyte subpopulation regional gene

1166 signatures. Bar plots indicate the number of genes found in the enriched gene set. The

1167 yellow line illustrates the gene set enrichment using -log10-transformed FDR.

1168 Figure 2-4. Enrichment scores obtained from the gene set enrichment analysis using regional

1169 astrocyte subpopulation gene signatures. Scores represent FDR transformed into -log10 scale.

1170 The file also contains additional gene sets obtained from published studies related to

1171 neurodegenerative diseases. Gene sets were added to the MsigDB collection and used for

1172 gene set enrichment analysis.

1173 Figure 3-1. Collection of 42 astrocyte gene signatures used in downstream analysis.

1174 Figure 3-2. Heatmap displaying the Jaccard index of genes shared between astrocyte

1175 gene signatures.

1176 Figure 3-3. Heatmap displaying the hierarchical clustering of enrichment scores

1177 obtained through Gene Set Variation Analysis (GSVA) and GBM samples.

1178 High scores indicate strong positive correlation between astrocyte gene signatures and gene

1179 expression profiles of 103 TCGA GBM samples with a tumor purity > 70%. Upper bars

1180 indicate tumor subtype and the presence or absence of an EGFR gene amplification.

1181 Figure 3-4. Correlation between upregulated astrocyte gene signatures and TCGA

1182 GBM samples. (A) Analysis of the correlation between astrocyte gene signatures and

52

1183 GBM samples with different subtypes, somatic mutations, and copy-number variations. The

1184 heatmap depicts the -log10-transformed p-value of either an ANOVA comparison between

1185 GBM subtypes or a Wilcoxon Rank-Sum test between samples carrying the selected

1186 mutations, amplifications, and deletions. (B) Heatmap representing the enrichment scores

1187 between astrocyte gene signatures and Classical GBM samples that carry an EGFR gene

1188 amplification. Bar plots indicate the number of samples with positive enrichment score for

1189 each astrocyte gene signature.

1190 Figure 3-5. Correlation of region-specific astrocyte subpopulation enrichment scores

1191 with 103 TCGA Glioblastoma samples with tumor purity > 70%. A Wilcoxon Rank-

1192 Sum test was used to compare the distribution of enrichment scores between samples with

1193 and without amplification of EGFR. AMP = amplification of EGFR; No AMP = no

1194 amplification of EGFR found. ***p-value < 0.001, **p-value < 0.01, * p-value < 0.05.

1195 Figure 3-6. GSVA enrichment scores obtained between the 42 astrocyte subpopulation

1196 gene signatures and TCGA GBM or LGG samples. Correlation matrices of -log10

1197 transformed p-values are also included.

1198 Figure 4-1. Correlation between upregulated astrocyte gene signatures and TCGA

1199 LGG samples. (A) Analysis of the correlation between astrocyte gene signatures (y-axis)

1200 and LGG grades, histological subtypes, somatic mutations, and copy-number variations (x-

1201 axis). The heatmap depicts the -log10-transformed p-value of either an ANOVA

1202 comparison between LGG subtypes or a Wilcoxon Rank-Sum test between samples

1203 carrying the selected mutations, amplifications, and deletions. (B) Heatmap depicting the

1204 proportion of LGG samples with specific IDH status variants that had positive enrichment

1205 scores for every astrocyte gene signature.

53

1206 Figure 4-2. Heatmaps depicting GSVA enrichment score matrices between astrocyte

1207 gene signatures and TCGA LGG samples. Samples with (A) wild-type IDH1, (B) mutant

1208 IDH1 with 1p/19q codeletion, and (C) mutant IDH1 without 1p/19q codeletion are

1209 compared.

1210 Figure 4-3. Heatmaps depicting GSVA enrichment score matrices between astrocyte

1211 gene signatures and TCGA LGG samples with different features. Samples with (A)

1212 astrocytoma histology and wild-type IDH1or mutant IDH1 without 1p/19q codeletion, and

1213 (B) oligodendroglioma histology and mutant IDH1 with 1p/19q codeletion, mutant TERT

1214 promoter and mutations in CIC gene are compared.

1215 Figure 6-1. Violin plots representing normalized count distributions. Normalized count

1216 distributions of (A) the 97 upregulated genes from the cortex subpopulation B gene

1217 signature found in the Alzheimer’s disease gene set, (B) the 87 upregulated genes from the

1218 cortex subpopulation B gene signature found in the Huntington’s disease gene set, and (C)

1219 the 11 upregulated genes from the cortex subpopulation C gene signature found in the

1220 Parkinson’s disease gene set are shown in violin plots.

1221 Figure 6-2. Protein expression of candidate genes in the brain tissues from the NLGF

1222 Alzheimer’s disease mouse model.

1223 (A) AD pathology was confirmed in the cortex as an increase in GFAP+ astrocytes as well

1224 as detectable hypertrophy in 5xFAD and NLGF AD mouse models. Brain tissues collected

1225 from NLGF mouse models were prepared for immunofluorescence using anti-GFAP

1226 (green) and either (B) anti-Adcy7, (C) anti-Serping1, or (D) anti-Emp1. The regions

1227 outlined with a square are displayed at higher magnification on the right, and the arrows

1228 point to the same cells for comparison in the fluorescent images. For the astrocytes in the

1229 inner layer of cortex (the area near the corpus callosum), the proportion of GFAP+ cells

54

1230 displaying red fluorescence was calculated, and the intensity of red fluorescence

1231 overlapping with GFAP signals was measured. Two-tailed unpaired Student’s t tests were

1232 used to compare AD samples to the wild-type group. (E) Co-immunostaining of cortical

1233 brain tissues was performed with anti-GFAP, anti-Serping1, and anti-Emp1.

1234 *p < 0.05, **p < 0.01, ***p < 0.001. Scale bar in panels A-D, 50 μm.

1235 Figure 6-3. Significantly enriched gene sets related to neurodegenerative diseases (AD, HD,

1236 and PD). Gene set enrichment was obtained with differentially expressed genes specific for

1237 each regional astrocyte subpopulation.

1238

55