bioRxiv preprint doi: https://doi.org/10.1101/349415; this version posted June 19, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Shared co-expression networks in autism from

2 induced pluripotent stem cell (iPSC) neurons

3

4 Dwaipayan Adhya1,2,*, Vivek Swarup3,*, Paulina Nowosiad2, Carole Shum2, Kamila Jozwik4,

5 Grainne McAlonan5, Maria Andreina Mendez5, Jamie Horder5, Declan Murphy5, Daniel H.

6 Geschwind3,7, Jack Price2,6, Jason Carroll4, Deepak P. Srivastava2,6+, & Simon Baron-Cohen1+

7

8 1Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge,

9 CB2 8AH, UK.

10 2Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience

11 Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London,

12 London SE5 9NU, UK.

13 3Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine,

14 University of California, Los Angeles, Los Angeles, CA 90095, USA.

15 4Cancer Research UK Cambridge Institute, Cambridge CB2 0RE, UK.

16 5Department of Forensic and Neurodevelopmental Sciences, Sackler Institute for Translational

17 Neurodevelopment, Institute of Psychiatry, Psychology and Neuroscience, King's College

18 London, London SE5 8AF, UK.

19 6MRC Centre for Neurodevelopmental Disorders, King's College London, London, UK.

20 7Department of Genetics, University of California, Los Angeles, Los Angeles, CA

21 90095, USA.

22 + Joint senior authors

23 * Joint first authors

24 Short title: Transcriptome analysis of iPSC-derived neurons

25 Abstract

26 Autism Spectrum Conditions (henceforth, autism) are a diverse set of neurodevelopmental

27 phenotypes with a complex genetic basis. Idiopathic autism, characterized by a diagnosis of

28 autism not caused by a known genetic variant, is associated with hundreds of rare and common

29 genetic variants each of small effect. Functional genomics analyses of post mortem tissue

30 have identified convergent atypical gene correlation networks in idiopathic autism. However,

31 post mortem tissue is difficult to obtain and is susceptible to unknown confounding factors

32 related to the cause of death and to storage conditions. To circumvent these limitations, we

33 created induced pluripotent stem cells (iPSCs) from hair follicles of idiopathic autistic

34 individuals and made iPSC-derived neurons, to investigate their usefulness as a substitute for post

35 mortem brain tissue. Plucking hair follicles is a relatively painless and ethical procedure, and

36 hair samples can be obtained from anyone. Functional genomics analyses were used as a

37 replicable analysis pipeline to assess efficacy of iPSC-derived neurons.

38 Gene networks, previously identified in adult autism post mortem brains, were atypical in the iPSC autism neural

39 cultures in this study. These included those associated with neuronal maturation, synaptic

40 maturation, immune response and inflammation, and gene regulatory mechanisms. In addition,

41 GABRA4, HTR7, ROBO1 and SLITRK5 were atypically expressed among genes previously

42 associated with autism. A drawback of this study was its small sample size, reflecting practical

43 challenges in generating iPSCs from patient cohorts. We conclude that, using rigorous

44 functional genomics analyses, atypical molecular processes seen in the adult autistic post-

45 mortem brain can be modelled in hair follicle iPSC-derived neurons. There is thus potential for

46 scaling up of autism transcriptome studies using an iPSC-based model system.

47

48 Introduction

49 Autism spectrum conditions (henceforth autism) are neurodevelopmental in nature, with a

50 heterogeneous genetic background (Berg; Geschwind, 2012; Bourgeron, 2015; O'Roak; et al.,

51 2012). Autism is diagnosed on the basis of impaired social-communication, alongside

52 unusually narrow and repetitive interests and activities (APA, 2013). Idiopathic autism is

53 characterized by a primary diagnosis of autism not associated with a known genetic variant.

54 Exome sequencing studies and analysis of copy number variation have revealed that hundreds of

55 rare and common genetic variants are associated with idiopathic autism (O'Roak; et al., 2011;

56 O'Roak; et al., 2012). Functional convergence among these genetic variants has been revealed

57 through RNA sequencing of autistic post mortem brains (Parikshak; et al., 2013; Voineagu; et

58 al., 2011). However, autistic post mortem brain tissue is a scarce resource and may be

59 susceptible to developing anoxic-ischemic changes based on cause of death and post mortem

60 interval (Lewis, 2002). Conditions and period of storage, as well as cause of death, can also

61 have a bearing on RNA quality (Kretzschmar, 2009). More confounding factors may be

62 introduced in brains of donors with a history of illnesses, seizures and substance abuse

63 (Woolfenden; et al., 2012). There are also considerable ethical issues associated with human

64 organ donation for research (Kretzschmar, 2009).

65 To tackle these challenges, there has been a shift towards development of an induced

66 pluripotent stem cell (iPSC) model of autism, through the differentiation into brain tissue of

67 iPSCs reprogrammed from somatic cells of autistic individuals (Marchetto; et al., 2010; Pasca;

68 et al., 2011). Using this strategy it is possible to create well-defined brain tissue in vitro to

69 undertake controlled experiments, something that has not been afforded by post mortem brain

70 tissue. It is hypothesized that the transcriptome of brain tissue derived in this fashion mimics

71 fundamental characteristics of post mortem brain tissue.

72 There are different methods of generating iPSC-derived brain tissue: 1) the more recent three-

73 dimensional cerebral organoid method capable of generating heterogeneous tissue

74 with neuronal cells of multiple lineages (Lancaster; et al., 2013); or 2) the classical two-

75 dimensional culture method for generating cortical neurons (Shi; Kirwan; Livesey, 2012). As

76 autism is known to affect cortical grey matter, we used the two-dimensional cortical neuron

77 differentiation method in the current proof-of-principle study. Our study aims to demonstrate

78 the efficacy of iPSC-derived neurons or brain tissue as a proxy for post mortem brain tissue,

79 and to propose a rigorous pipeline for reliable scaling up of autism transcriptome studies, without

80 the ethical issues associated with human organ donation.

81 To compare iPSC-derived neurons (iDN) with post mortem brain tissue, we used comparable

82 gene expression network analysis methods previously used to discover atypical gene

83 expression networks in autism (Parikshak; et al., 2013; Parikshak; et al., 2016; Voineagu; et

84 al., 2011). We first undertook RNA-sequencing and differential gene expression in autism and

85 control iDN, followed by weighted gene co-expression network analysis (WGCNA) – an

86 unsupervised analysis method that clusters genes based on their expression profiles

87 (Langfelder; Horvath, 2008), followed by enrichment and binding analyses.

88 Additionally, we undertook exome sequencing to identify odds of de novo variants in

89 participants being associated with gene expression networks. As we are proposing scaling up

90 of transcriptome studies in autism, we chose an iPSC source cell collection method that is

91 relatively non-invasive and painless for donors, i.e., keratinocytes from plucked hair follicles.

92 Hair follicles are also easier to extract than skin fibroblasts or blood mononuclear cells.

93 Using this method it has been possible to collect more than 300 hair samples from autistic

94 individuals as part of the European Autism Interventions - A Multicentre Study for Developing

95 New Medications (EU-AIMS) consortium. However, methodological challenges associated

96 with reprogramming iPSCs have meant that efficiency is low and costs are high (Streckfuss-

97 Bomeke; et al., 2013).

98 Nevertheless, using WGCNA and associated downstream analyses, we achieved the following:

99 1) We validated autism gene networks previously identified in post mortem brain tissue by

100 comparing these with gene networks in iDN; 2) We demonstrated the reliable nature of RNA-

101 sequencing and unsupervised clustering analysis methods used in the detection of autism

102 molecular phenotypes. We conclude that neurons and brain tissue derived from hair follicles of autistic

103 individuals have molecular phenotypes similar to those seen in autistic post mortem brains. We

104 propose that iDN minimizes dependence on post mortem brain tissue and can be scaled up in

105 transcriptome studies in autism.

106

107 Results

108 Differential gene expression analysis reveals descriptive gene expression differences between

109 autism and control day 35 iDN

110 We undertook RNA-sequencing followed by bioinformatics analysis of early cortical lineage

111 neurons differentiated using an established protocol (Shi; Kirwan; Livesey, 2012) (Fig 2(A)). We

112 focused on day 35 from start of neural induction as at this stage there is a mix of early born

113 deep layer neurons and late born upper layer neurons (Shi; Kirwan; Livesey, 2012), making it

114 highly suitable as a snapshot for studying cortical neuron development. We first investigated

115 descriptive gene expression differences by calculating FPKM (fragments per kilobase of transcript

116 per million mapped fragments) for each gene. Multiple linear regression used to study differential gene

117 expression revealed significant differences in gene expression in thousands of genes between

118 autism and control iDN. Fig 2(B) shows the top 50 differentially expressed genes in autism

119 and control participants.
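
As an illustration of the FPKM normalisation mentioned above, the following is a minimal sketch in Python. The function name, argument names and the toy numbers are hypothetical and are not taken from the study's pipeline; the sketch only restates the standard definition (fragments per kilobase of transcript per million mapped fragments).

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Return FPKM for one gene.

    FPKM = fragments / (transcript length in kb * total mapped fragments in millions),
    which is equivalent to fragments * 1e9 / (length in bp * total mapped fragments).
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Toy example (hypothetical values): 500 fragments on a 2 kb transcript
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(500, 2000, 20_000_000)
```

Because FPKM divides by both transcript length and library size, values are comparable across genes of different length and across samples sequenced to different depths.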

120 Genome-wide coexpression networks reveal biological processes essential to both neural

121 development and immune activation in autism

122 We applied signed weighted coexpression network analysis (WGCNA) and identified 11

123 coexpression modules significantly correlated to autism (labelled by colour, e.g., salmon, Fig

124 2C), and ranked according to their module eigengene values (ME, the first principal component

125 of the module). These modules represent genes that share highly similar expression patterns

126 (Fig 2D, Fig S1). We then evaluated the shared consensus function of each module by

127 enrichment for gene ontology (GO) terms. Of the 11 modules, 5 modules were positively correlated

128 with autism iPSC neurons, while 6 modules were negatively correlated with autism iPSC neurons.

129 The top 3 positively correlated modules – ‘steelblue’ (Cellular Metabolic Processes),

130 ‘lightgreen’ (Neural Development) and ‘white’ (Immune Activation), and the top 3 negatively

131 correlated modules – ‘grey60’ (Epigenetic Regulation), ‘salmon’ (Gene Regulation) and

132 ‘sienna3’ ( Organisation), were also functionally most significant with respect to

133 autism (Fig 2D). The consensus functions of each module are non-exclusive, and each module

134 will henceforth be referred to by its corresponding R package-designated module colour

135 name.
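
For readers unfamiliar with signed co-expression networks, the sketch below shows how a pairwise gene-gene correlation is typically turned into a signed network weight. It is a minimal Python illustration, not the R code used in this study; the soft-thresholding power `beta` is an assumption, because the value used here is not stated in this section.

```python
def signed_adjacency(correlation, beta=12):
    """Signed WGCNA-style adjacency: a = ((1 + cor) / 2) ** beta.

    Strong positive correlations map close to 1, while negative
    correlations map close to 0, so anti-correlated genes are not
    grouped into the same module.
    beta is an assumed soft-thresholding power (hypothetical default).
    """
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return ((1.0 + correlation) / 2.0) ** beta


# Toy values: perfectly correlated, uncorrelated and anti-correlated gene pairs.
weights = [signed_adjacency(r, beta=2) for r in (1.0, 0.0, -1.0)]
```

In a signed network of this kind, genes within a module tend to move in the same direction, which is what allows a module to be described as positively or negatively correlated with autism.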

136 MEs for ‘steelblue’, ‘lightgreen’ and ‘white’ increase in autism iPSC neurons (Fig 2D). The

137 ‘steelblue’ module is enriched for metabolic functions associated with post-mitotic cells (Fig

138 2G) suggesting terminal differentiation. The ‘lightgreen’ module is enriched for GO terms

139 including regulation of cell-cell adhesion, cognition, calcium mediated signalling and

140 regulation of dendrite maturation associated with late phase neuron development. The most

141 interconnected genes of this module (based on correlation to ME, also known as ‘hub genes’)

142 include GABRA4 – a subunit of the inhibitory GABA-A receptor (Roberts; et al., 2005) and

143 CX3CL1 – a negative modulator of excitatory glutamatergic neurotransmission (Ragozzino; et

144 al., 2006; Sheridan; et al., 2014) (Fig 2H). The ‘white’ module is enriched for cytokine binding,

145 regulation of DNA damage response, positive regulation of apoptosis and negative regulation

146 of neuronal death, and its hubs include ADAM9 – implicated in inflammatory events related

147 to neutrophil activation (Amendola; et al., 2015) and GPNMB – expressed in microglia and

148 regulator of immune/inflammatory responses in the brain (Huang; Ma; Yokoyama, 2012) (Fig

149 2I).
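
The definition of hub genes used above (genes most strongly correlated with the module eigengene) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the module eigengene is taken as already computed, and the gene names and expression values are hypothetical toy data, not data from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_hub_genes(expression, eigengene, top_n=2):
    """Rank genes by correlation to the module eigengene, highest first."""
    scores = {gene: pearson(values, eigengene) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical toy module: four samples, three genes.
toy_eigengene = [1.0, 2.0, 3.0, 4.0]
toy_expression = {
    "gene_a": [2.0, 4.0, 6.0, 8.0],
    "gene_b": [1.0, 3.0, 2.0, 4.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
}
hubs = rank_hub_genes(toy_expression, toy_eigengene)
```

Ranking by signed correlation, rather than by absolute value, matches the signed network setting, in which a hub gene is expected to follow the direction of its module.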

150 Compared to the positively correlated modules, the negatively correlated modules (Fig 2D)

151 reflect global gene regulatory functions associated with developing neurons, deficient in autism

152 iDN. The ‘salmon’ module is enriched for RNA methyltransferase activity, epigenetic

153 regulation of gene expression and S-adenosylmethionine-dependent methyltransferase activity

154 (Fig 2I). The ‘sienna3’ module is enriched for nucleic acid binding, regulation of RNA

155 metabolic process and regulation of gene expression (Fig 2J), while the ‘grey60’ module is

156 enriched for regulation of histone H3-K4 methylation, DNA binding and chromosome

157 organisation (Fig 2K). HTR7 (‘salmon’ module), ROBO1 (‘sienna3’ module) and SLITRK5

158 (‘salmon’ module) are known autism risk genes in the negatively correlated modules

159 significantly downregulated in this study, suggesting an association between dysregulated

160 expression of risk genes and autism, explored in detail in the next section.

161 To take a deeper look at potential gene regulatory function deficiencies, we analysed the

162 interaction of transcription factors with DNA recognition motifs upstream of the start site of

163 differentially expressed genes. Our aim was to study enrichment for transcription factor binding

164 sites that characterize signals regulating transcription (Frith; et al., 2004). We obtained DNA

165 motifs for TF binding from TRANSFAC (Matys; et al., 2003), scanned 1000bp upstream

166 sequences of top 200 genes from each module and calculated motif enrichment using Clover

167 (Frith; et al., 2004) and MEME (Bailey; Elkan, 1994) algorithms. We identified several TFs

168 (Fig 3A-F) amongst which STAT3 is prominent in the positively correlated modules (Fig 3G).

169 STAT3 motifs were found upstream of 66/163 genes of the ‘steelblue’ module. STAT3 is involved

170 in neuroprotection against inflammatory insults, differentiation and proliferation (Leibinger; et

171 al., 2013; Park; Nozell; Benveniste, 2012). STAT3 is also known to interact with KLF4, which

172 is enriched here for binding targets in the ‘white’ module, and anti-correlated modules

173 ‘sienna3’ and ‘grey60’ (Fig 3F). KLF4 interacts with the JAK-STAT pathway to reduce cell

174 proliferation, and induce apoptosis (Qin; Zhang, 2012). Furthermore, ARID3A, a member of the

175 ARID family of transcriptional regulators, is functionally similar to KLF4 (Lin; et al., 2014),

176 and both ARID3A and KLF4 have downstream targets in all three negatively correlated

177 modules ‘salmon’, ‘sienna3’ and ‘grey60’, and the positively correlated ‘white’ module (Fig

178 3H).
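
To make the idea of scanning upstream sequences for transcription factor binding motifs more concrete, the sketch below scores every window of a sequence against a position weight matrix (PWM) and returns the best log-odds score. It is a generic illustration only and does not reproduce the Clover or MEME algorithms; the 3 bp motif, the uniform background frequency and the example sequence are all hypothetical.

```python
import math


def best_pwm_score(sequence, pwm, background=0.25):
    """Return the highest log2-odds PWM score over all windows of a sequence.

    pwm is a list of dicts mapping each base to a non-zero probability.
    Windows containing a base that is absent from the PWM (for example N)
    are skipped. Returns None if no window could be scored.
    """
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    best = None
    for start in range(len(sequence) - width + 1):
        score = 0.0
        for offset, column in enumerate(pwm):
            probability = column.get(sequence[start + offset])
            if probability is None:
                break  # ambiguous base: this window cannot be scored
            score += math.log2(probability / background)
        else:
            if best is None or score > best:
                best = score
    return best


# Hypothetical 3 bp motif with consensus AGC.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
toy_score = best_pwm_score("TTAGCTT", toy_pwm)
```

Enrichment tools then ask whether such scores are higher in the upstream regions of a gene set than would be expected from background sequences; that statistical step is not shown here.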

179 Thus, gene co-expression modules revealed functional variations between autism and control

180 iDN in this study. This approach also helped us correlate mechanistic variation in iDN with

181 those observed in post mortem brain studies and iPSC-derived minibrain studies, as discussed

182 below.

183 Gene modules associated with autism brains are highly conserved in iPSC-derived early

184 cortical neurons

185 Our next aim was to search for autism-associated genes sharing common autism-associated

186 physiological functions. We used a set of 155 autism associated candidate genes from a

187 previous study (Parikshak; et al., 2013) using the Simons Foundation Autism Research

188 Initiative (SFARI) database. The SFARI list is a database of genes

189 associated with autism, collated according to the type of genetic variations from whole genome

190 sequencing studies, rare genetic mutations and mutations causing syndromic forms of autism.

191 It was first published in 2009 (Basu; Kollu; Banerjee-Basu, 2009) and an up-to-date reference

192 for all known risk genes can be found at: https://gene.sfari.org/autdb/HG_Home.do. We

193 mapped the genes in our gene networks with the SFARI autism risk genes, and found that the

194 SFARI autism risk genes are enriched in the negatively correlated ‘salmon’ module (p=0.002;

195 odds ratio [OR] = 1.5) (Fig 4A).
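
The enrichment statistics reported here take the form of a p-value and an odds ratio for the overlap between a gene set and a module. The sketch below shows one common way to compute such values, a one-sided Fisher's exact test on a 2×2 table. The choice of test is an assumption, as this section does not state which test was used, and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_test(overlap, module_size, set_size, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns (odds_ratio, p_value), where p_value is the hypergeometric
    probability of observing at least `overlap` shared genes.
    """
    in_both = overlap
    module_only = module_size - overlap
    set_only = set_size - overlap
    neither = background_size - module_size - set_size + overlap
    if min(in_both, module_only, set_only, neither) < 0:
        raise ValueError("counts are inconsistent with the background size")

    if module_only * set_only > 0:
        odds_ratio = (in_both * neither) / (module_only * set_only)
    else:
        odds_ratio = float("inf")

    max_overlap = min(module_size, set_size)
    favourable = sum(
        comb(module_size, k) * comb(background_size - module_size, set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    p_value = favourable / comb(background_size, set_size)
    return odds_ratio, p_value


# Hypothetical toy counts: 20 background genes, a 5-gene module,
# a 5-gene risk set, and 3 genes shared between them.
toy_or, toy_p = enrichment_test(overlap=3, module_size=5, set_size=5, background_size=20)
```

An odds ratio above 1 indicates that the gene set is found in the module more often than expected from the background, and the p-value indicates how surprising that overlap is under random sampling.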

196 First, we mapped 5 developmental gene modules dysregulated in the autism post mortem brain

197 (APMB), from Parikshak et al., (2013) (dev_asdM2, dev_asdM3, dev_asdM13, dev_asdM16,

198 dev_asdM17) (Parikshak; et al., 2013), shown in Fig 4A. Of these 5 sets, dev_asdM2 and

199 dev_asdM3 represent DNA-binding and transcriptional regulation and are downregulated in

200 autism, while dev_asdM13, dev_asdM16 and dev_asdM17 include GO terms that represent

201 nervous system development and synaptic maturation, and are upregulated in autism. The

202 dev_asdM2 set is enriched in the top downregulated genes (‘Top –ve DE’, p = 2×10⁻⁴; OR =

203 1.8), the ‘grey60’ module (p = 0.004; OR = 1.6) and the ‘sienna3’ module (p = 10⁻⁵; OR = 2).

204 The dev_asdM3 set is enriched in the top downregulated genes (‘Top –ve DE’, p = 0.008; OR

205 = 1.5), the ‘grey60’ module (p = 3×10⁻¹⁴; OR = 2.5) and the ‘sienna3’ module (p = 4×10⁻⁴; OR

206 = 1.7). The dev_asdM13 set is enriched in the top upregulated genes (‘Top +ve DE’, p = 10⁻⁶;

207 OR = 2.1), the ‘lightgreen’ module (p = 3×10⁻⁹; OR = 3.1) and the ‘white’ module (p = 10⁻⁶;

208 OR = 2.3). The dev_asdM16 set is enriched in the ‘lightgreen’ module (p = 10⁻⁴; OR = 2.6),

209 and the dev_asdM17 set is enriched in the top upregulated genes (‘Top +ve DE’, p = 0.002;

210 OR = 1.7) and the ‘lightgreen’ module (p = 0.002; OR = 1.9).

211 We then mapped two gene modules known to be highly correlated with autism in the temporal

212 and frontal cortex, APMB_asdM12 (a neuronal function module) and APMB_asdM16 (an

213 immune module) from Voineagu et al., (2011) (Voineagu; et al., 2011) (Fig 4A). Although the

214 APMB_asdM12 module has been reported to be downregulated in autism post mortem brain,

215 it shows nominal enrichment in the upregulated ‘white’ module (p = 0.04; OR = 1.8). The

216 APMB_asdM16, on the other hand, is an upregulated module in autism post mortem brains,

217 and is also enriched in the top upregulated genes (‘Top +ve DE’, p = 5×10⁻⁶; OR = 2.6) and the

218 ‘white’ module (p = 6×10⁻⁵; OR = 2.7) (Fig 4A). Gene sets associated with attenuated cortical

219 patterning or ACP (ACP_asdM5, ACP_asdM13, ACP_asdM14) (Parikshak; et al., 2016) were

220 also mapped (Fig 4A). This suggested greater prediction of ACP in autism iDN. In Fig 4A we

221 observed significant enrichment of upregulated autism gene sets in positively correlated

222 modules and of downregulated autism gene sets in negatively correlated modules.

223 This suggests recapitulation of autism post mortem brain phenotypes in iPSC-derived

224 neural cells.

225 Autism iDN gene modules enriched for genes expressed in cells from mature cerebral cortex

226 We then plotted enrichment of neuronal cell types in adult human brains (Zhang; et al., 2016)

227 in autism iDN to explore if pathway enrichments discussed above corresponded to cell type

228 differences between autism and control iDN in this study. We expected a rise in non-

229 neuronal cell types to reflect enriched immune functions in autism iDN. As expected,

230 astrocyte-associated markers were detected in the ‘white’ module (p < 0.0005; OR = 2.1), as were

231 microglial lineage markers (p < 0.0005; OR = 2.4). Additionally,

232 mature neuron markers were detected in the ‘lightgreen’ module (p < 0.0005; OR = 1.9), and

233 endothelial markers in the ‘steelblue’ module (p < 0.05; OR = 2.1) and ‘white’ module (p <

234 0.0005; OR = 2.4) (Fig 4B). Enrichment of mature neuron and glial markers exclusively in the

235 positively correlated modules is indicative of terminally differentiated cell lineages with higher

236 prevalence of immune cells in autism iDN compared to controls. Furthermore, the appearance

237 of these cell types is atypical at day 35 of cortical neuronal differentiation.

238

239 Gene networks in autism iPSC-derived early cortical neurons and developing autism iPSC

240 minibrains show moderate preservation

241 We then mapped gene networks identified in a previous autism iPSC study (Mariani; et al.,

242 2015) with gene networks in this study. We calculated module enrichment of gene modules in

243 the previous autism iPSC study with equivalent colour-assigned modules in this study (Fig 4C).

244 A module with a Zsummary > 10 corresponds to strong module preservation, and 2 < Zsummary <

245 10 is equivalent to moderate module preservation. The ‘white’, ‘sienna3’ and ‘grey60’ modules

246 are moderately well preserved in this study (2 < Zsummary < 10; p < 0.05), while the ‘lightgreen’

247 and ‘salmon’ modules are moderately preserved (2 < Zsummary < 10; p < 0.05) (Fig 4C). Module

248 preservation may have been only moderate because minibrains (or ‘cerebral organoids’) were used in the previous study,

249 whereas monolayer neuron cultures were used in this study.
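
The Zsummary thresholds described above can be expressed as a small classification rule. The sketch below is a minimal Python illustration; the handling of values of exactly 10 and of values at or below 2 is an assumption, since the text only defines the ranges Zsummary > 10 and 2 < Zsummary < 10, and the example values are hypothetical.

```python
def preservation_class(z_summary):
    """Classify module preservation from a Zsummary statistic.

    Thresholds from the text: Zsummary > 10 is strong preservation and
    2 < Zsummary < 10 is moderate preservation.
    Assumptions: a value of exactly 10 is treated as moderate, and
    values of 2 or below are treated as showing no evidence of preservation.
    """
    if z_summary > 10:
        return "strong"
    if z_summary > 2:
        return "moderate"
    return "no evidence"


# Hypothetical Zsummary values for three modules.
toy_classes = {name: preservation_class(z) for name, z in
               {"module_x": 12.3, "module_y": 5.0, "module_z": 1.1}.items()}
```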

250 Nominal genetic architecture alterations and de novo variants enrichment in exome

251 Molecular genetics studies have shown that autism-associated gene mutations are distributed

252 across the whole population with no single major responsible genetic locus, and there is

253 accumulation of thousands of low risk alleles in a subset of autistic individuals (Bourgeron,

254 2015; O'Roak; et al., 2012). Transcriptome studies have revealed common dysregulated

255 biological processes in autism in independent populations of autistic individuals, using three

256 different methods – post mortem brain studies (Parikshak; et al., 2013; Voineagu; et al., 2011),

257 minibrain studies (Mariani; et al., 2015), and in this study using monolayer neuronal cultures

258 (Fig 4A). To find out whether genetic mutations corresponded to dysregulation of gene

259 expression in our samples, we visualised genomic variants from top 50 differentially expressed

260 genes (p < 0.005) found in the individual participants (Fig 5A). Only some of the variants seem

261 to correspond to dysregulated expression, e.g., mutations in WDR49 in CTRM2 and DDIT4L

262 in CTRM3 correspond to reduced gene expression in controls, while mutations in ARMC3

263 and MAP3K19 in ASDM1 and 004ASM correspond to reduced gene dosage in autism, and a

264 mutation in TRIM14 appears to correspond to reduced gene expression in 004ASM. However,

265 these associations are only correlational and there may be additional layers of transcriptional control

266 affecting gene dosage.

267 Studies have shown rare and unique de novo variants to be major contributors to the genetic

268 signatures of autism (Betancur, 2011; Bourgeron, 2015; O'Roak; et al., 2011; O'Roak; et al.,

269 2012; Yuen; et al., 2015) although it is unclear whether they are associated with converging

270 biological processes. To investigate the correlation between biological processes and genetic

271 events in individual participants, we checked enrichment of gene expression modules in rare

272 and unique exome variants from each participant. We did not find any strong correlation of

273 exome variants with gene expression modules. Rare variants in 010ASM are enriched in top

274 downregulated genes (Top –ve DE) (p = 0.04; OR = 1.7), while those in CTRM1 are enriched

275 in the ‘salmon’ module (p = 0.03; OR = 2) (Fig 5B). Unique variants in 010ASM are enriched

276 in the ‘grey60’ module (p = 0.03; OR = 1.4), while those in CTRM3 are enriched in the top

277 downregulated genes (Top –ve DE) (p = 0.004; OR = 2.1) and the ‘sienna3’ module (p = 0.02;

278 OR = 1.9) (Fig 5C). This could further suggest additional layers of transcriptional regulation

279 controlling gene expression. We also observed only minor variations in the genetic architecture

280 in the exomes between autism and control participants (Fig S2). However, these variations

281 were not significant and given the small sample size of this study no distinction could be made

282 between the autism and control participants based on their exomes alone.

283 Discussion

284 This study aimed to explore the potential of iPSC derived neurons and brain tissue as a proxy

285 for post mortem brain in undertaking autism transcriptome studies. As autism is known to affect

286 the cerebral cortex, we generated two-dimensional cortical neuron cultures to test efficacy of

287 this method in autism transcriptome analyses, in order to propose: 1) minimizing dependence

288 on post mortem brain tissue and 2) scaling up transcriptome studies in autism. Through this study

289 we have succeeded in delivering evidence of autism associated convergent biological processes

290 and pathophysiological mechanisms in iDN from our cohort. To our knowledge, this is also the first study to

291 compare gene expression in autism iPSC neurons with post mortem brain tissue. Using similar

292 network correlation analyses as used in post mortem brain studies and an iPSC-derived

293 organoid study it was possible to identify validated autism-associated mechanisms in iDN

294 generated in this study. The enrichment of post mortem brain cortical networks – dev_asdM13,

295 dev_asdM16 and dev_asdM17 in the iDN neural development (‘lightgreen’) module is

296 evidence of autism iPSC neurons recapitulating neuronal maturation pathways associated with

297 autism, while enrichment of post mortem brain immune and inflammatory network

298 (APMB_asdM16) in the immune activation (‘white’) module is evidence of the autism-

299 associated non-neuronal pathways being recapitulated. The dysregulated gene networks also

300 point to a post-mitotic, non-proliferative state. STAT3 was identified as a significant TF and is

301 suggestive of iDN-specific inflammatory responses in autism. STAT3 is also known to interact

302 directly with KLF4 to inhibit cellular regeneration (Qin; Zhang, 2012; Qin; Zou; Zhang, 2013),

303 which has also been identified as a potential transcription factor associated with the

304 dysregulated pathways in this study.

305 We also found dysregulation in other autism-associated molecular phenotypes recently

306 discovered in autism post mortem brain. For example, we found some evidence of attenuation

307 of cortical patterning (ACP) of differentially expressed genes in autism iDN (Parikshak; et al.,

308 2016; Voineagu; et al., 2011) (Fig 4A), with two of the autistic individuals clustering together

309 based on gene expression (Fig S3). Interestingly, the two individuals clustering together,

310 ASDM1 and 010ASM, also share similar clinical diagnosis (Table S1). There is also a

311 suggestion of GABA-glutamate signalling imbalance – a prominent phenotype in autism

312 (Coghlan; et al., 2012; Mariani; et al., 2015). GABRA4 and CX3CL1 are hub genes in the

313 positively correlated ‘lightgreen’ module (Fig 2H). GABRA4 increases GABA

314 neurotransmission while CX3CL1 downregulates glutamate transmission. There

315 might also be disruption of L-glutamate-gated ion channel function, as GRIA4 is enriched in a

316 negatively correlated ‘grey60’ module (Fig 2L). Excitatory-inhibitory imbalance is a well-

317 known cellular phenotype of autism, and identification of genes related to this phenomenon in

318 autism-correlated networks further demonstrates ability of iDN to recapitulate autism brain

319 phenotypes in vitro.

320 The primary limitation of this study is its low sample size, and that is related to the low

321 reprogramming efficiency afforded by current transcriptional factor-induced reprogramming

322 methods. Collecting hair follicles, however, is a relatively painless and ethical method for obtaining source

323 cells for iPSC generation. Thus, although lacking statistical power, this study was able to inform

324 us that iPSC derived brain tissue can act as a proxy for post mortem brain for undertaking

325 transcriptome studies, and by using the combination of methods described here (Fig S4)

326 dependence on the scarce post mortem resource can be reduced while also enabling scaling up.

327 To summarize, the conclusions in Fig 6 are the result of stringent and exhaustive multiple

328 correlations taking an integrative and unbiased approach, to show how iDN from individuals

329 with idiopathic autism having heterogeneous genetic backgrounds can give rise to an

330 amalgamated pathophysiology that has previously been associated with the adult autism brain

331 through studies using post mortem brain tissue. Through these analyses we propose greater

332 application of the iPSC system in autism transcriptome studies, along with reduced dependence

333 on autism post mortem brain tissue. With technological improvements and collaboration in this

334 field, larger cohorts can be examined in the future to study the origin of biological divergence in

335 autism.

336 Study participants and methods

337 Induced pluripotent stem cells

338 iPSCs (two clones from each individual) were produced from three autistic individuals and three controls

339 (Fig 1a), using keratinocytes from plucked hair follicles (Aasen; Izpisua Belmonte, 2010) (see

340 extended experimental procedures). Individuals with idiopathic autism were chosen. For clinical

341 diagnosis details, see Table S1.

342 Neuronal differentiation

343 We differentiated iPSC lines into cortical neurons using a well-established method (Shi;

344 Kirwan; Livesey, 2012). iPSCs were differentiated until the early neuron stage, day 35 (see

345 extended experimental procedures).

346 Immunocytochemistry

347 Neuronal differentiation in the iPSC lines was characterised using immunocytochemistry.

348 iPSCs were differentiated until day 8, day 21 and day 35 and labelled with antibodies against

349 markers appropriate to each developmental stage (see extended experimental

350 procedures). Nuclei were stained using DAPI, and imaging was performed using a 40×

351 objective on a confocal microscope (Leica) (Fig 1b).

352

353 RNA-sequencing

354 Starting with 500 ng of total RNA, poly(A)-containing mRNA was purified and libraries were

355 prepared using the TruSeq Stranded mRNA kit (Illumina). Unstranded libraries with a mean

356 fragment size of 150 bp (range 100-300 bp) were constructed and underwent 50 bp single-end

357 sequencing on an Illumina HiSeq 2500 machine. Reads were mapped to the human genome

358 GRCh37.75 (UCSC version hg19) using the STAR RNA-seq aligner (Dobin; et al., 2013). Quality

359 control was performed using Picard tools (Broad Institute) and QoRTs (Hartley; Mullikin,

360 2015). Gene expression levels were quantified using a union exon model with HTSeq

361 (Anders; Pyl; Huber, 2015).

362 Differential gene expression (DGE)

363 DGE analysis was performed in R (R Core Team, 2016. R: A language

364 and environment for statistical computing. R Foundation for Statistical Computing, Vienna,

365 Austria. URL: https://www.R-project.org/), with gene expression levels adjusted for gene

366 length, library size, and G+C content (henceforth referred to as “Normalized FPKM”). A linear

367 mixed effects model framework was used to assess differential expression in log2(Normalized

368 FPKM). Autism diagnosis was treated as a fixed effect, and technical covariates

369 accounting for RNA quality, library preparation, and batch effects were also included as fixed effects in this model.
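
As a rough sketch of the model form described in this subsection, the per-gene model could be written as below. The text names only the fixed effects, so the random intercept $u_{g,d(s)}$ for the donor $d(s)$ of sample $s$ is an assumption on our part (two clones were derived per individual), and all symbol names are illustrative.

```latex
\log_2\!\bigl(\mathrm{FPKM}^{\mathrm{norm}}_{g,s}\bigr)
  = \beta_{0,g}
  + \beta_{1,g}\,\mathrm{Dx}_{s}
  + \sum_{k} \gamma_{k,g}\, c_{k,s}
  + u_{g,d(s)}
  + \varepsilon_{g,s}
```

Here $\mathrm{Dx}_{s}$ is 1 for autism samples and 0 for controls, $c_{k,s}$ are the technical covariates (RNA quality, library preparation, batch), and $\beta_{1,g}$ is the diagnosis effect tested for gene $g$.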

370 Weighted gene coexpression network analysis

371 The R package weighted gene coexpression network analysis (WGCNA) (Langfelder;

372 Horvath, 2008) was used to construct coexpression networks as previously shown (Parikshak;

373 et al., 2013). Biweight midcorrelation was used to assess correlations between log2(Normalized

374 FPKM). For module-trait analysis, the first principal component of each module (eigengene) was

375 related to an autism diagnosis in a linear mixed effects framework as above, replacing the

376 expression values of each gene with the eigengene.
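
The biweight midcorrelation named in this subsection can be sketched in a few lines. This is a minimal pure-Python illustration rather than the WGCNA implementation: the function name `bicor` mirrors the R function, but raising an error when the median absolute deviation (MAD) is zero is a simplification of ours.

```python
from math import sqrt
from statistics import median


def _biweight_deviations(values):
    """Return normalised, biweight-weighted deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # raw (unscaled) MAD
    if mad == 0:
        raise ValueError("MAD is zero; bicor is undefined for this vector")
    weighted = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Points with |u| >= 1 are treated as outliers and get zero weight.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted.append((v - med) * w)
    norm = sqrt(sum(d * d for d in weighted))
    return [d / norm for d in weighted]


def bicor(x, y):
    """Biweight midcorrelation between two equal-length expression vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = _biweight_deviations(x)
    ys = _biweight_deviations(y)
    return sum(a * b for a, b in zip(xs, ys))
```

Because deviations are taken from the median and extreme values are down-weighted, a single aberrant sample influences this measure less than it would a Pearson correlation, which matters with a cohort of this size.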

377

378 Gene sets

379 A SFARI autism risk gene set was compiled using the online SFARI gene database, AutDB,

380 using “Gene Score” as shown previously (Parikshak; et al., 2013). We obtained dev_asdM2,

381 dev_asdM3, dev_asdM13, dev_asdM16 and dev_asdM17 modules from an independent

382 transcriptome analysis study using RNA-sequencing data from post mortem early developing

383 brains (Parikshak; et al., 2013). Modules APMB_asdM12 and APMB_asdM16 were obtained

384 from an autism post mortem gene expression study (Voineagu; et al., 2011). We obtained

385 another three autism-associated modules: ACP_asdM5, ACP_asdM13 and ACP_asdM14 from

386 an independent gene expression study profiling dysregulated cortical patterning genes in autism

387 post mortem brain (Parikshak; et al., 2016). All three studies applied WGCNA to identify

388 modules of dysregulated genes in autism.

389 Gene set overrepresentation analysis

390 Enrichment analyses were performed either with logistic regression (all enrichment analyses

391 in Fig 5a, 6b, 6c, S3b) or Fisher’s exact test (cell type enrichment, Fig 5b). All GO enrichment

392 analysis to characterize gene modules was performed using GO Elite (Zambon; et al., 2012)

393 with 10,000 permutations. Molecular function and biological process terms were used for

394 display purposes.
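
The Fisher's exact test used for cell type enrichment can be illustrated with a small, self-contained sketch. The helper names (`overlap_table`, `fisher_exact_two_sided`) and the toy gene identifiers in the usage note are hypothetical; the study's own analysis was presumably run in R, so this shows the logic only.

```python
from math import comb


def overlap_table(module_genes, marker_genes, background_genes):
    """Build the 2x2 table (a, b, c, d) for module vs. marker membership."""
    background = set(background_genes)
    module = set(module_genes) & background
    markers = set(marker_genes) & background
    a = len(module & markers)          # in module and in marker set
    b = len(module - markers)          # in module only
    c = len(markers - module)          # in marker set only
    d = len(background) - a - b - c    # in neither
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Return (two-sided p-value, sample odds ratio) for a 2x2 table."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k overlapping genes.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return min(1.0, p_value), odds_ratio
```

For example, with eight background genes, a four-gene module and a four-gene marker list sharing three genes, `fisher_exact_two_sided(*overlap_table(...))` gives an odds ratio of 9 but a non-significant p-value, which shows why the size of the background matters.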

395 Transcription Factor Binding Site Enrichment

396 The top 200 genes in each module (ranked by kME) were used for transcription factor binding site

397 (TFBS) enrichment analysis.

398 Exome sequencing

399 Genomic DNA (200 ng) was purified, and libraries were prepared and exome-enriched using Illumina

400 Nextera Rapid Capture Exome (Illumina). Exome DNA was sequenced on an Illumina HiSeq

401 2500 machine. Reads were aligned to human genome GRCh37.75 (UCSC version hg19), using

402 the Burrows Wheeler Aligner (BWA) (Li; Durbin, 2009). The genome analysis toolkit (GATK)

403 (Broad Institute) was used to identify single nucleotide polymorphisms (SNPs) and insertion-

404 deletion mutations (indels) by the local de-novo assembly of haplotypes. The genetic variants

405 were evaluated for ratio of transition mutations to transversion mutations (Ti/Tv),

406 heterozygous/homozygous (het:hom) ratio, and insertion/deletion ratio. SNPs and indels were

407 annotated using ANNOVAR (Yang; Wang, 2015) and Variant Effect Predictor (VEP)

408 (McLaren; et al., 2010). Annotations used included minor allele frequency from the 1000G

409 project and exonic function.

410 For full methods, see extended experimental procedures.

411

412 Acknowledgments

413 We gratefully acknowledge the participants in this study. This study was supported by grants

414 from the European Autism Interventions (EU-AIMS), the Wellcome Trust ISSF Grant (No.

415 097819) and the King's Health Partners Research and Development Challenge Fund – a fund

416 administered on behalf of King's Health Partners by Guy's and St Thomas' Charity awarded to

417 D.P.S., and the Innovative Medicines Initiative Joint Undertaking under grant agreement no.

418 115300, resources of which are composed of financial contribution from the European Union's

419 Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in kind contribution

420 (J.P. and S.B.C), the Mortimer D Sackler Foundation, the Autism Research Trust, the Chinese

421 University of Hong Kong, and a doctoral fellowship from the Jawaharlal Nehru Memorial Trust

422 awarded to D.A. The funding organizations had no role in the design and conduct of the study,

423 in the collection, management, analysis and interpretation of the data, or in the preparation,

424 review or approval of the manuscript. We are grateful to Debbie Spain and Suzanne Coghlan

425 for participant recruitment, to Annie Kathuria, Rosy Watkins, Hema Pramod, Rupert Faraway,

426 Pooja Raval, Kate Sellers, Michael Deans and Rodrigo Rafagnin for assistance during the

427 study, and to Aicha Massrali, Arkoprovo Paul, Bhismadev Chakrabarti, Michael Lombardo,

428 Rick Livesey and Mark Kotter for valuable discussions. We thank the Wohl Cellular Imaging

429 Centre (WCIC) at the IoPPN, King's College London, for help with microscopy.

430

431

432 Figure legends

433 Fig 1. Characterisation of iPSC from individuals with and without autism. (a) Schematic of

434 iPSC generation process from keratinocytes, followed by cortical neuron differentiation.

435 Cortical neuron differentiation protocol was characterised by imaging at early neuroprogenitors

436 (day 8), late neuroprogenitors (day 21) and neurons (day 35). (b) Immunofluorescence staining

437 to show morphological changes during developmental progression of autism and control iDN.

438 Confirmation of Ki67+ and Nestin+ early progenitor (day 8) (scale bar: 10µm), Pax6+ late

439 progenitor (day 21) (scale bar: 10µm), TBR1+ and MAP2+ neurons (day 35) (scale bar: 10µm).

440

441 Fig 2. Transcriptome-wide gene co-expression network analysis in autism and control neurons.

442 (a) Schematic of RNA-seq and exome seq experiments and analyses. (b) Sample clustering

443 based on gene expression patterns. Top 50 differentially expressed genes shown here. (c)

444 Signed association of mRNA module eigengenes with autism. Modules with positive values

445 indicate increased expression in autism iPSC-neurons, while modules with negative values

446 indicate decreased expression in autism iPSC-neurons. Red dotted lines indicate Benjamini-

447 Hochberg corrected p<0.05. R package designated module colour names are used. (d)

448 Correlation network dendrogram with consensus function of significant gene modules and their

449 relationship with autism (red denotes a positive relationship). (e) Module eigengene values of

450 positively correlated gene modules. (f) Module eigengene values of negatively correlated gene

451 modules. (g) Coexpression network plot and GO term enrichment of the ‘steelblue’ module.

452 Top 25 genes are indicated. (h) Coexpression network plot and GO term enrichment of the

453 ‘lightgreen’ module. Top 25 genes are indicated. (i) Coexpression network plot and GO term

454 enrichment of the ‘white’ module. Top 25 genes are indicated. (j) Coexpression network plot

455 and GO term enrichment of the ‘salmon’ module. Top 25 genes are indicated. (k) Coexpression

456 network plot and GO term enrichment of the ‘sienna3’ module. Top 25 genes are indicated. (l)

457 Coexpression network plot and GO term enrichment of the ‘grey60’ module. Top 25 genes are

458 indicated.

459

460 Fig 3. Transcription factor binding site analysis. (a) TFs predicted to bind the 1000bp upstream

461 region of genes in ‘steelblue’ module. (b) TFs predicted to bind the 1000bp upstream region of

462 genes in ‘lightgreen’ module. (c) TFs predicted to bind the 1000bp upstream region of genes

463 in ‘white’ module. (d) TFs predicted to bind the 1000bp upstream region of genes in ‘salmon’

464 module. (e) TFs predicted to bind the 1000bp upstream region of genes in ‘sienna3’ module.

465 (f) TFs predicted to bind the 1000bp upstream region of genes in ‘grey60’ module. (g) STAT3

466 binding site motif and GO term enrichment. STAT3, involved in neuroprotection against

467 inflammatory responses, is a TF significantly upregulated in autism iPSC-neurons. (h) KLF4

468 and ARID3A are the most commonly found TFs in this study, having putative binding with

469 gene upstream regions in positively as well as negatively correlated modules.

470

471 Fig 4. Selected autism-associated gene modules and cell type enrichment analyses. (a)

472 Enrichment of selected autism post mortem gene modules and SFARI autism risk gene list in

473 iDN gene modules. Odds ratios from logistic regression are shown; only OR > 1.5 is

474 shown (p-values in parentheses). (b) Cell type enrichment in iDN gene modules (*p<0.05,

475 ***p<0.0005). (c) Module preservation of autism minibrain gene modules in iDN gene

476 modules from this study.

477

478 Fig 5. Relationship between differential gene expression and exome variants. (a) Sample-wise

479 clustered top 50 differentially expressed genes, and associated coMut plot of genetic variants

480 (mutations in genes marked in purple seem to have a causal effect on gene expression). (b)

481 Enrichment of participant rare variants in iDN gene modules. (c) Enrichment of participant

482 unique variants in iDN gene modules.

483

484 Fig 6. Study synopsis showing molecular functions common in autism post mortem brain

485 transcriptome studies and iDN transcriptome in this study.

486

487 Supplementary Info

488 1. Extended experimental procedures

489 2. Fig S1

490 3. Fig S2

491 4. Fig S3

492 5. Fig S4

493 6. Fig S5

494 7. Supplementary figure legends

495 8. Supplementary table S1

496

497 Keywords:

498 Autism, iPSC, neurons, minibrain, post mortem brain, transcriptome, functional genomics,

499 molecular pathways

500

501 Extended experimental procedures

502 Neuronal differentiation

503 Keratinocytes were collected from volunteers with and without an autism diagnosis (Ethics

504 approved, 13/LO/1218) as part of a larger European study (EU-AIMS). These were

505 reprogrammed into iPSCs using previously described methods (Aasen; Izpisua Belmonte,

506 2010; Takahashi; et al., 2007). iPSCs were cultured in E8 medium (Life Technologies) with

507 E8 supplement (Life Technologies). Induction of neurons of cortical lineage was established

508 using a modified dual SMADi protocol (Shi; Kirwan; Livesey, 2012). Once the cell culture

509 reached 95% confluence, neural induction was initiated by changing the culture medium to

510 support neural induction, neurogenesis and neuronal differentiation. A combination of N2- and

511 B27-containing media with additives was used, henceforth called ‘neuralising medium’. N2

512 medium consisted of DMEM/F12 (Sigma) and N2 (Gibco). B27 medium consisted of Neurobasal

513 (Invitrogen) and B27 (Gibco). Neuralising medium was supplemented with the ‘dual SMADi’ factors: 1 μM

514 Dorsomorphin (Tocris) and 500 ng/ml human Noggin-CF chimera (R&D Systems), inhibitors of the

515 WNT pathway, BMPs and SMAD; and 10 μM SB431542 (Tocris), an inhibitor of TGFβ

516 signaling. Noggin and dorsomorphin suppress embryonic development thereby inducing neural

517 differentiation pathways, while SB431542 mediates loss of pluripotency.

518 Immunocytochemistry

519 Cultures were fixed in 4% formaldehyde followed by ice-cold 100% methanol and processed

520 for immunofluorescence staining and confocal microscopy. Secondary antibodies used for

521 primary antibody detection were species-specific Alexa-dye conjugates (Invitrogen). We used

522 primary antibodies against Ki67 (Thermo Fisher PA5-16785), Nestin (Millipore

523 MAB5326), Pax6 (BioLegend 901301), TBR1 (Abcam ab31940) and MAP2 (Abcam ab92434).

524 RNA isolation and sequencing

525 RNA was extracted using TRIzol (Thermo Fisher) and 1-bromo-3-chloropropane (BCP;

526 Sigma). To remove genomic DNA during processing, TURBO DNase (Thermo Fisher) was used.

527 RNA concentration was quantified using Ribogreen assay (Invitrogen).

528 Starting with 500 ng of total RNA, poly(A)-containing mRNA was purified and libraries were

529 prepared using the TruSeq Stranded mRNA kit (Illumina). Unstranded libraries were constructed

530 and underwent 50 bp single-end sequencing on an Illumina HiSeq 2500 machine. To analyse

531 iPSC mRNA-seq data, the raw reads were mapped to the human genome GRCh37.75 (UCSC

532 version hg19) using the STAR RNA-seq aligner (Dobin; et al., 2013). Aligned reads were sorted

533 using samtools (Li; et al., 2009), while biases were removed using Picard tools (Broad Institute).

534 Quality control was performed using Picard tools (Broad Institute) and QoRTs (Hartley;

535 Mullikin, 2015). Gene expression levels were quantified using a union exon model with

536 HTSeq (Anders; Pyl; Huber, 2015), which uses uniquely aligned reads. Only genes with

537 >10 reads that were expressed in 80% of the samples were kept. The resulting read counts were log2

538 transformed and normalised for GC content, gene length, and library size using the cqn package

539 (Hansen; Irizarry; Wu, 2012) in R.
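
The expression filter described above can be sketched as follows. The wording admits more than one reading; this sketch assumes a gene is kept when it has more than 10 reads in at least 80% of samples. The function name `filter_expressed` and the dictionary layout are ours, and the cqn normalisation step is not reproduced.

```python
def filter_expressed(counts, min_reads=10, min_fraction=0.8):
    """Keep genes with more than `min_reads` reads in at least
    `min_fraction` of samples.

    `counts` maps gene name -> list of raw read counts, one per sample.
    """
    kept = {}
    for gene, row in counts.items():
        n_passing = sum(1 for c in row if c > min_reads)
        if n_passing >= min_fraction * len(row):
            kept[gene] = row
    return kept
```

With 12 libraries (two clones from each of six individuals), this reading requires a gene to pass the read threshold in at least 10 of them.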

540 mRNA weighted co-expression network analysis

541 Co-expression network analysis was performed using the R package WGCNA (Langfelder;

542 Horvath, 2008). We wanted to identify modules of genes co-expressed specifically in autism

543 iPSC-neuronal cultures. Biweight midcorrelations were calculated for all pairs of genes, then a

544 signed similarity matrix was created. In the signed network, the similarity between genes

545 reflects the sign of the correlation of their expression profiles. The signed similarity matrix was

546 then raised to power β to emphasize strong correlations on an exponential scale. The resulting

547 matrix (known as adjacency matrix) was then transformed into a topological overlap matrix.

548 Since we are primarily interested in exploring co-expressed genes conserved across our cohort,

549 we created consensus networks correlated to autism as previously published (Parikshak; et al.,

550 2013; Parikshak; et al., 2016). After scaling for each individual network (consensus scaling

551 quantile = 0.2), a soft thresholding power of 14 was chosen (as it was the smallest threshold

552 that resulted in a scale-free R² fit of 0.8) (Fig S5). The consensus network was created by

553 taking the component-wise minimum of the topological overlap matrix (TOM) values across the

554 individual networks. Using dissTOM = 1 – TOM as the distance measure, genes were hierarchically

555 clustered. Modules were then assigned using a dynamic tree-cutting algorithm (cutreeHybrid,

556 using default parameters except deepSplit = 4, cutHeight = 0.999, minModulesize = 100,

557 dthresh=0.1 and pamStage = FALSE).
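
The chain from correlation to clustering distance described above (signed similarity, soft-threshold power, topological overlap, consensus minimum, dissTOM) can be sketched on small matrices. This is an illustrative pure-Python version, not the WGCNA code: the function names are ours, the diagonal of the adjacency matrix is set to 0 as in the textbook form of the topological overlap, and the consensus quantile scaling and tree cutting are not reproduced.

```python
def signed_adjacency(cor, beta):
    """Signed similarity ((1 + cor) / 2) raised to the soft-threshold power."""
    n = len(cor)
    return [[0.0 if i == j else ((1.0 + cor[i][j]) / 2.0) ** beta
             for j in range(n)] for i in range(n)]


def topological_overlap(adj):
    """Topological overlap matrix computed from an adjacency matrix."""
    n = len(adj)
    # Connectivity of each gene: sum of its adjacencies to all other genes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


def consensus_tom(toms):
    """Component-wise minimum across the TOMs of the individual networks."""
    n = len(toms[0])
    return [[min(t[i][j] for t in toms) for j in range(n)] for i in range(n)]


def dissimilarity(tom):
    """dissTOM = 1 - TOM, used as the distance for hierarchical clustering."""
    return [[1.0 - v for v in row] for row in tom]
```

In the signed network a correlation of 0 maps to a similarity of 0.5, so with a power of 14 uncorrelated gene pairs receive an adjacency of about 6e-5, and only strongly positively correlated pairs retain appreciable weight.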

558 Resulting modules of co-expressed genes were used to calculate module eigengenes (MEs; or

559 1st principal component of the module). MEs were correlated to biological traits, in this case

560 autism, to find disease-specific modules. Module hubs were defined by calculating module

561 membership (kME) values, which are the Pearson correlations between each gene and the

562 corresponding ME; genes with kME < 0.7 were removed from the module. Network

563 visualisation was done using the igraph package in R (Csardi; Nepusz, 2006).
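
The module membership filter can be illustrated with a short sketch. It assumes the module eigengene has already been computed (the principal component step is not shown), and the names `pearson` and `filter_by_kme` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))


def filter_by_kme(expression, eigengene, min_kme=0.7):
    """Return {gene: kME} for genes whose kME is at least `min_kme`.

    `expression` maps gene name -> expression vector across samples;
    `eigengene` is the module eigengene across the same samples.
    """
    kme = {gene: pearson(values, eigengene)
           for gene, values in expression.items()}
    return {gene: r for gene, r in kme.items() if r >= min_kme}
```

Genes with the highest kME values are the candidates for module hubs, and the same ranking is what the TFBS analysis uses to pick the top 200 genes per module.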

564 Module preservation analysis

565 Module preservation analysis was performed to validate co-expression in a previous autism-

566 iPSC study (Mariani; et al., 2015). Module values from autism-iPSC network analysis were

567 used as reference, to calculate the Zsummary statistic for each module. This measure combines

568 module density and intramodular connectivity metrics to give a composite statistic where Z >

569 2 suggests moderate preservation and Z > 10 suggests high preservation (Langfelder; et al.,

570 2011).
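
How the composite statistic is read can be sketched as below. Following Langfelder et al. (2011), Zsummary is taken as the mean of a density score and a connectivity score, each the median of several permutation Z statistics; the input lists, function names and the label used for Z of 2 or below are illustrative assumptions, and the permutation procedure itself is not reproduced.

```python
from statistics import median


def z_summary(density_z, connectivity_z):
    """Composite preservation statistic from density and connectivity Z scores."""
    return (median(density_z) + median(connectivity_z)) / 2.0


def preservation_label(z):
    """Apply the thresholds quoted in the text (Z > 2 and Z > 10)."""
    if z > 10:
        return "high"
    if z > 2:
        return "moderate"
    return "no evidence"
```

For instance, density Z scores of 4, 6, 8 and 10 with connectivity Z scores of 2, 3 and 10 give a Zsummary of 5, which falls in the moderate range.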

571

572 Enrichment analysis for gene sets

573 Two types of gene set enrichments were performed. For autism-correlated module enrichment,

574 logistic regression was performed using already published gene modules (Parikshak; et al.,

575 2013; Parikshak; et al., 2016; Voineagu; et al., 2011) to control for gene length and gene

576 expression level. A two-sided Fisher exact test with 95% confidence interval was performed

577 for cell-type enrichment analysis using a published human brain dataset (Zhang; et al., 2016).

578 Module genes were characterised using GO Elite (version 1.2.5) (Zambon; et al., 2012) using

579 total expressed genes as background. GO Elite uses a Z-score approximation of hypergeometric

580 distribution to assess term enrichment, and removes redundant GO or KEGG terms to give a

581 concise output. We used 10,000 permutations and required at least 10 genes to be enriched

582 in a given pathway at a Z-score of at least 2. Only biological process and molecular function

583 categories are reported.
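
The Z-score approximation of the hypergeometric distribution mentioned above can be written out explicitly. This sketch uses the standard hypergeometric mean and variance; the function names and argument names are ours, and the permutation-based p-values and redundancy pruning performed by GO Elite are not reproduced.

```python
from math import sqrt


def enrichment_z(n_background, n_module, n_term, n_overlap):
    """Normal approximation to the hypergeometric over-representation test.

    n_background: all expressed genes used as background
    n_module:     module genes within the background
    n_term:       background genes annotated to the GO term
    n_overlap:    module genes annotated to the GO term
    """
    frac = n_module / n_background
    expected = n_term * frac
    variance = (n_term * frac * (1 - frac)
                * (1 - (n_term - 1) / (n_background - 1)))
    return (n_overlap - expected) / sqrt(variance)


def is_reported(n_background, n_module, n_term, n_overlap,
                min_genes=10, min_z=2.0):
    """Apply the reporting thresholds from the text: >= 10 genes and Z >= 2."""
    z = enrichment_z(n_background, n_module, n_term, n_overlap)
    return n_overlap >= min_genes and z >= min_z
```

As a worked case, a 100-gene module in a background of 1,000 genes would be expected to contain 5 of the genes in a 50-gene term by chance; observing 15 gives a Z-score of roughly 4.8.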

584 Transcription factor binding site enrichment

585 Transcription factor binding site (TFBS) enrichment analysis was performed by scanning the

586 canonical promoter region (1000bp upstream of the transcription start site) for the genes in

587 each co-expression module. For each transcription factor (TF), the top 200 connected genes in

588 each module were assessed, using the following steps: 1) putative motifs bound by the TF were

589 obtained from TRANSFAC (Matys; et al., 2003); 2) upstream sequences of these 200 genes

590 were scanned with the Clover algorithm (Frith; et al., 2004) to calculate motif enrichment; and

591 3) enrichment above background was calculated using the MEME algorithm (Bailey; Elkan,

592 1994).
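
The gene selection and promoter definition used as input to the steps above can be sketched as follows. The coordinate convention (0-based, half-open intervals, with `tss` the position of the first transcribed base) and both function names are assumptions made for illustration; motif scanning with TRANSFAC, Clover and MEME is not reproduced.

```python
def top_hub_genes(kme, n=200):
    """Return the `n` genes with the highest module membership (kME)."""
    ranked = sorted(kme.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]


def promoter_window(tss, strand, length=1000):
    """Interval covering `length` bp immediately upstream of the TSS.

    Coordinates are 0-based and half-open: (start, end) spans start..end-1.
    Upstream means lower coordinates on '+' and higher coordinates on '-'.
    """
    if strand == "+":
        return (max(0, tss - length), tss)
    if strand == "-":
        return (tss + 1, tss + 1 + length)
    raise ValueError("strand must be '+' or '-'")
```

Handling the strand explicitly matters here: taking the 1000 bp at lower coordinates for a minus-strand gene would scan the start of the gene body instead of its promoter.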

593 Genomic DNA isolation and exome sequencing

594 Genomic DNA from autism and control iPSCs was isolated using Promega ReliaPrep™ gDNA

595 Tissue Miniprep System (Promega). RNase A treatment was performed to digest contaminating

596 RNA, and proteinase K to digest protein. DNA concentration was quantified using Picogreen

597 assay (Invitrogen).

598 10ng/µl of genomic DNA was used for library preparation and exome enrichment using

599 Nextera Rapid Capture Exomes (Illumina). Paired end libraries were constructed and

600 sequenced using Illumina HiSeq 2500 machine. Sample data was aligned to human genome

601 GRCh37.75 (UCSC version hg19), using the Burrows Wheeler Aligner (BWA) (Li; Durbin,

602 2009). Aligned reads were sorted according to chromosome number. Duplicate reads usually

603 created during sequencing were identified and removed using Picard tools (Broad Institute).

604 Quality scores were assigned to individual bases, then adjusted to reduce systematic errors

605 using genome analysis toolkit (GATK, Broad Institute). SNPs (single nucleotide

606 polymorphisms) and indels (insertions-deletions) were identified using GATK, by local de-

607 novo assembly of haplotypes (haploid genotype). The SNPs and indels were then recalibrated

608 to check that they were true genetic variants and not artefacts. Next, these variants were

609 evaluated for the ratio of transition mutations to transversion mutations (Ti/Tv),

610 heterozygous/homozygous (het:hom) ratio, and insertion/deletion (indel) ratio. SNPs and

611 indels identified by GATK were annotated using ANNOVAR (Yang; Wang, 2015) and Variant

612 Effect Predictor (VEP) (McLaren; et al., 2010). Important annotations include minor allele

613 frequency from the 1000G project, SIFT score (Ng; Henikoff, 2003), PolyPhen 2 score

614 (Ramensky; Bork; Sunyaev, 2002), base change, and exonic function.
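
The three summary ratios used to evaluate the variant calls can be computed from a simple list of variants, as in this sketch. The tuple layout `(ref, alt, zygosity)` and the function name `variant_qc` are hypothetical; real pipelines would read these fields from a VCF file.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def variant_qc(variants):
    """Compute Ti/Tv, het:hom and insertion/deletion ratios.

    `variants` is an iterable of (ref, alt, zygosity) tuples, where
    zygosity is 'het' or 'hom'.
    """
    ti = tv = het = hom = ins = dele = 0
    for ref, alt, zygosity in variants:
        if zygosity == "het":
            het += 1
        elif zygosity == "hom":
            hom += 1
        if len(ref) == 1 and len(alt) == 1:
            # Single nucleotide change: purine<->purine or pyrimidine<->pyrimidine
            # substitutions are transitions, everything else is a transversion.
            if (ref, alt) in TRANSITIONS:
                ti += 1
            else:
                tv += 1
        elif len(alt) > len(ref):
            ins += 1
        elif len(ref) > len(alt):
            dele += 1
    return {
        "ti_tv": _ratio(ti, tv),
        "het_hom": _ratio(het, hom),
        "ins_del": _ratio(ins, dele),
    }
```

Comparing these ratios between groups, as in Fig S2, is a check that the overall variant profile is similar in autism and control samples before looking at individual genes.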

615

616 Supplementary figure legends:

617

618 Fig S1: More autism correlated gene expression modules, with tissue differentiation and

619 housekeeping roles. (a) Correlation network dendrogram with consensus function of significant

620 gene modules and their relationship with autism (red denotes a positive relationship). (b)

621 Module eigengene values of correlated gene modules.

622

623 Fig S2: Comparison of overall genetic architecture between autism and control participants. (a)

624 Transition vs Transversion (TiTv) ratio. (b) Single nucleotide polymorphisms (SNPs), also

625 indicating average novel mutations and known mutations in each group. (c) Insertion/deletion

626 type mutations (indels), also indicating average novel mutations and known mutations in each

627 group. (d) Heterozygous vs homozygous mutations. (e) Homozygous rare variants. (f)

628 Heterozygous rare variants.

629

630 Fig S3: Clustering of individual participant samples based on differential expression patterns

631 of top 50 genes associated with attenuated cortical patterning.

632

633 Fig S4: RNA-sequencing sample preparation and bioinformatics pipeline.

634

635 Fig S5: Soft thresholding used for exploring co-expressed genes conserved across our cohort.

636

637 References

Aasen, T.; Izpisua Belmonte, J. C. (2010): Isolation and cultivation of human keratinocytes from skin or plucked hair for the generation of induced pluripotent stem cells. Nat Protoc 5, 2, 371-382. http://dx.doi.org/10.1038/nprot.2009.241.
Amendola, R. S.; Martin, A. C.; Selistre-de-Araujo, H. S.; et al. (2015): ADAM9 disintegrin domain activates human neutrophils through an autocrine circuit involving integrins and CXCR2. J Leukoc Biol. http://dx.doi.org/10.1189/jlb.3A0914-455R.
Anders, S.; Pyl, P. T.; Huber, W. (2015): HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 2, 166-169. http://dx.doi.org/10.1093/bioinformatics/btu638.
APA (2013): Diagnostic and Statistical Manual of Mental Disorders (DSM-5®). (American Psychiatric Pub).
Bailey, T. L.; Elkan, C. (1994): Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2, 28-36, http://www.ncbi.nlm.nih.gov/pubmed/7584402.
Basu, S. N.; Kollu, R.; Banerjee-Basu, S. (2009): AutDB: a gene reference resource for autism research. Nucleic Acids Res 37, Database issue, D832-836. http://dx.doi.org/10.1093/nar/gkn835.
Berg, J. M.; Geschwind, D. H. (2012): Autism genetics: searching for specificity and convergence. Genome Biol 13, 7, 247. http://dx.doi.org/10.1186/gb4034.
Betancur, C. (2011): Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res 1380, 42-77. http://dx.doi.org/10.1016/j.brainres.2010.11.078.
Bourgeron, T. (2015): From the genetic architecture to synaptic plasticity in autism spectrum disorder. Nat Rev Neurosci 16, 9, 551-563. http://dx.doi.org/10.1038/nrn3992.
Coghlan, S.; Horder, J.; Inkster, B.; et al. (2012): GABA system dysfunction in autism and related disorders: from synapse to symptoms. Neurosci Biobehav Rev 36, 9, 2044-2055. http://dx.doi.org/10.1016/j.neubiorev.2012.07.005.
Csardi, G.; Nepusz, T. (2006): The igraph software package for complex network research. InterJournal, Complex Systems 1695, 5, 1-9.
Dobin, A.; Davis, C. A.; Schlesinger, F.; et al. (2013): STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 1, 15-21. http://dx.doi.org/10.1093/bioinformatics/bts635.
Frith, M. C.; Fu, Y.; Yu, L.; et al. (2004): Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Res 32, 4, 1372-1381. http://dx.doi.org/10.1093/nar/gkh299.
Hansen, K. D.; Irizarry, R. A.; Wu, Z. (2012): Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics 13, 2, 204-216. http://dx.doi.org/10.1093/biostatistics/kxr054.
Hartley, S. W.; Mullikin, J. C. (2015): QoRTs: a comprehensive toolset for quality control and data processing of RNA-Seq experiments. BMC Bioinformatics 16, 224. http://dx.doi.org/10.1186/s12859-015-0670-5.
Huang, J. J.; Ma, W. J.; Yokoyama, S. (2012): Expression and immunolocalization of Gpnmb, a glioma-associated glycoprotein, in normal and inflamed central nervous systems of adult rats. Brain Behav 2, 2, 85-96. http://dx.doi.org/10.1002/brb3.39.
Kretzschmar, H. (2009): Brain banking: opportunities, challenges and meaning for the future. Nat Rev Neurosci 10, 1, 70-78. http://dx.doi.org/10.1038/nrn2535.

Lancaster, M. A.; Renner, M.; Martin, C. A.; et al. (2013): Cerebral organoids model human brain development and microcephaly. Nature 501, 7467, 373-379. http://dx.doi.org/10.1038/nature12517.
Langfelder, P.; Horvath, S. (2008): WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559. http://dx.doi.org/10.1186/1471-2105-9-559.
Langfelder, P.; Luo, R.; Oldham, M. C.; et al. (2011): Is my network module preserved and reproducible? PLoS Comput Biol 7, 1, e1001057. http://dx.doi.org/10.1371/journal.pcbi.1001057.
Leibinger, M.; Andreadaki, A.; Diekmann, H.; et al. (2013): Neuronal STAT3 activation is essential for CNTF- and inflammatory stimulation-induced CNS axon regeneration. Cell Death Dis 4, e805. http://dx.doi.org/10.1038/cddis.2013.310.
Lewis, D. A. (2002): The human brain revisited: opportunities and challenges in postmortem studies of psychiatric disorders. Neuropsychopharmacology 26, 2, 143-154. http://dx.doi.org/10.1016/S0893-133X(01)00393-1.
Li, H.; Durbin, R. (2009): Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 14, 1754-1760. http://dx.doi.org/10.1093/bioinformatics/btp324.
Li, H.; Handsaker, B.; Wysoker, A.; et al. (2009): The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 16, 2078-2079. http://dx.doi.org/10.1093/bioinformatics/btp352.
Lin, C.; Song, W.; Bi, X.; et al. (2014): Recent advances in the ARID family: focusing on roles in human cancer. Onco Targets Ther 7, 315-324. http://dx.doi.org/10.2147/OTT.S57023.
Marchetto, M. C.; Carromeu, C.; Acab, A.; et al. (2010): A model for neural development and treatment of Rett syndrome using human induced pluripotent stem cells. Cell 143, 4, 527-539. http://dx.doi.org/10.1016/j.cell.2010.10.016.
Mariani, J.; Coppola, G.; Zhang, P.; et al. (2015): FOXG1-Dependent Dysregulation of GABA/Glutamate Neuron Differentiation in Autism Spectrum Disorders. Cell 162, 2, 375-390. http://dx.doi.org/10.1016/j.cell.2015.06.034.
Matys, V.; Fricke, E.; Geffers, R.; et al. (2003): TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 31, 1, 374-378, http://www.ncbi.nlm.nih.gov/pubmed/12520026.
McLaren, W.; Pritchard, B.; Rios, D.; et al. (2010): Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 16, 2069-2070. http://dx.doi.org/10.1093/bioinformatics/btq330.
Ng, P. C.; Henikoff, S. (2003): SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31, 13, 3812-3814, http://www.ncbi.nlm.nih.gov/pubmed/12824425.
O'Roak, B. J.; Deriziotis, P.; Lee, C.; et al. (2011): Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat Genet 43, 6, 585-589. http://dx.doi.org/10.1038/ng.835.
O'Roak, B. J.; Vives, L.; Girirajan, S.; et al. (2012): Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 7397, 246-250. http://dx.doi.org/10.1038/nature10989.
Parikshak, N. N.; Luo, R.; Zhang, A.; et al. (2013): Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell 155, 5, 1008-1021. http://dx.doi.org/10.1016/j.cell.2013.10.031.
Parikshak, N. N.; Swarup, V.; Belgard, T. G.; et al. (2016): Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature. http://dx.doi.org/10.1038/nature20612.


Park, K. W.; Nozell, S. E.; Benveniste, E. N. (2012): Protective role of STAT3 in NMDA and glutamate-induced neuronal death: negative regulatory effect of SOCS3. PLoS One 7, 11, e50874. http://dx.doi.org/10.1371/journal.pone.0050874.

Pasca, S. P.; Portmann, T.; Voineagu, I.; et al. (2011): Using iPSC-derived neurons to uncover cellular phenotypes associated with Timothy syndrome. Nat Med 17, 12, 1657-1662. http://dx.doi.org/10.1038/nm.2576.

Qin, S.; Zhang, C. L. (2012): Role of Kruppel-like factor 4 in neurogenesis and radial neuronal migration in the developing cerebral cortex. Mol Cell Biol 32, 21, 4297-4305. http://dx.doi.org/10.1128/MCB.00838-12.

Qin, S.; Zou, Y.; Zhang, C. L. (2013): Cross-talk between KLF4 and STAT3 regulates axon regeneration. Nat Commun 4, 2633. http://dx.doi.org/10.1038/ncomms3633.

Ragozzino, D.; Di Angelantonio, S.; Trettel, F.; et al. (2006): Chemokine fractalkine/CX3CL1 negatively modulates active glutamatergic synapses in rat hippocampal neurons. J Neurosci 26, 41, 10488-10498. http://dx.doi.org/10.1523/JNEUROSCI.3192-06.2006.

Ramensky, V.; Bork, P.; Sunyaev, S. (2002): Human non-synonymous SNPs: server and survey. Nucleic Acids Res 30, 17, 3894-3900. http://www.ncbi.nlm.nih.gov/pubmed/12202775.

Roberts, D. S.; Raol, Y. H.; Bandyopadhyay, S.; et al. (2005): Egr3 stimulation of GABRA4 promoter activity as a mechanism for seizure-induced up-regulation of GABA(A) receptor alpha4 subunit expression. Proc Natl Acad Sci U S A 102, 33, 11894-11899. http://dx.doi.org/10.1073/pnas.0501434102.

Sheridan, G. K.; Wdowicz, A.; Pickering, M.; et al. (2014): CX3CL1 is up-regulated in the rat hippocampus during memory-associated synaptic plasticity. Front Cell Neurosci 8, 233. http://dx.doi.org/10.3389/fncel.2014.00233.

Shi, Y.; Kirwan, P.; Livesey, F. J. (2012): Directed differentiation of human pluripotent stem cells to cerebral cortex neurons and neural networks. Nat Protoc 7, 10, 1836-1846. http://dx.doi.org/10.1038/nprot.2012.116.

Streckfuss-Bomeke, K.; Wolf, F.; Azizian, A.; et al. (2013): Comparative study of human-induced pluripotent stem cells derived from bone marrow cells, hair keratinocytes, and skin fibroblasts. Eur Heart J 34, 33, 2618-2629. http://dx.doi.org/10.1093/eurheartj/ehs203.

Takahashi, K.; Tanabe, K.; Ohnuki, M.; et al. (2007): Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 5, 861-872. http://dx.doi.org/10.1016/j.cell.2007.11.019.

Voineagu, I.; Wang, X.; Johnston, P.; et al. (2011): Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474, 7351, 380-384. http://dx.doi.org/10.1038/nature10110.

Woolfenden, S.; Sarkozy, V.; Ridley, G.; et al. (2012): A systematic review of two outcomes in autism spectrum disorder - epilepsy and mortality. Dev Med Child Neurol 54, 4, 306-312. http://dx.doi.org/10.1111/j.1469-8749.2012.04223.x.

Yang, H.; Wang, K. (2015): Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat Protoc 10, 10, 1556-1566. http://dx.doi.org/10.1038/nprot.2015.105.

Yuen, R. K.; Thiruvahindrapuram, B.; Merico, D.; et al. (2015): Whole-genome sequencing of quartet families with autism spectrum disorder. Nat Med 21, 2, 185-191. http://dx.doi.org/10.1038/nm.3792.

Zambon, A. C.; Gaj, S.; Ho, I.; et al. (2012): GO-Elite: a flexible solution for pathway and ontology over-representation. Bioinformatics 28, 16, 2209-2210. http://dx.doi.org/10.1093/bioinformatics/bts366.

Zhang, Y.; Sloan, S. A.; Clarke, L. E.; et al. (2016): Purification and Characterization of Progenitor and Mature Human Astrocytes Reveals Transcriptional and Functional Differences with Mouse. Neuron 89, 1, 37-53. http://dx.doi.org/10.1016/j.neuron.2015.11.013.


Figure 1

A: Study design schematic. Keratinocytes from individuals with idiopathic autism and from individuals with no known psychiatric conditions; iPSC reprogramming (Takahashi et al., 2007); cortical neuron differentiation (Shi et al., 2012). Timeline labels: Day 0, 8, 21, 35; stage labels: iPSC, Early neuroprogenitor, Late neuroprogenitor, Neurons.

B: Image panels for Control and Autism. Day 8: DAPI, Ki67, Nestin. Day 21: DAPI, Pax6. Day 35: DAPI, TBR1, MAP2.

Figure 2

a: Workflow. Day 0 to day 35 neurons; mRNA-seq and exome-seq; bioinformatics steps: differential gene expression, gene expression network analysis (WGCNA), transcription factor analysis, exome analysis, rare variant discovery.

b: Sample gene expression clustering. Heatmap with colour key and histogram (row Z-score, -2 to 2); condition bar (autism in red). Samples: CTRM1, CTRM2, CTRM3, ASDM1, 010ASM, 004ASM.

c: Gene module correlation to autism. Axis: signed correlation to autism (-1.0 to 1.0). Modules: white, lightgreen, steelblue, darkred, midnightblue, brown, skyblue3, grey60, sienna3, salmon, darkturquoise. Consensus function (non-exclusive): gene regulation, immune activation, neural development, epigenetic regulation, chromosome organisation, cellular metabolic processes.

d: Signed correlation network. Dendrogram (hclust, "average"); axis: height; 'R' designated module colours: 'steelblue', 'lightgreen', 'white', 'salmon', 'sienna3', 'grey60'.

e: Module eigengene values, Control vs. Autism, for modules with +ve correlation with autism: steelblue, lightgreen, white.

f: Module eigengene values, Control vs. Autism, for modules with -ve correlation with autism: salmon, sienna3, grey60.

g, h, i: Gene labels and Gene Ontology plots (axis: Z-score, 0 to 12) for the steelblue, lightgreen and white modules.

j, k, l: Gene labels and Gene Ontology plots (axis: Z-score, 0 to 12) for the salmon, sienna3 and grey60 modules.

Figure 3

a: Cellular metabolic processes ('steelblue'). b: Neural development ('lightgreen'). c: Immune activation ('white'). Each panel shows transcription factor binding sites (TFBS) within 1000bp upstream. Transcription factors labelled across panels a to c: SP1, SP2, KLF5, EGR1, KLF4, NFATC2, CRX, STAT5A, MZF1, EGR2, FOXI1, STAT5B, RREB1, PRDM1, STAT1, RFX2, ARID3A, FEV, STAT3, RFX1, HLTF, HOXA5, STAT4, FOSL1, INSM1, SOX17.

d: Epigenetic regulation ('salmon'). e: Gene regulation ('sienna3'). f: Chromosome organisation ('grey60'). Each panel shows TFBS within 1000bp upstream. Transcription factors labelled across panels d to f: FOXP1, KLF4, FOXD3, ARID3A, MEF2C, SRY, SOX5, MEF2A, NRF1, NFE2L2, ELK4, BACH1, GABPA, MAFK, ZBTB33, GFI1.

g, h: 66/163 steelblue genes have a STAT3 motif upstream of TSS (1000bp upstream). Gene Ontology plot of STAT3-regulated genes (axis: Z-score, 0 to 8): small GTPase mediated signal transduction; ribonucleoside triphosphate catabolic process; response to biotic stimulus. Diagram labels: KLF4, ARID3A; gene regulation, immune activation, chromosome organisation, epigenetic regulation.

Figure 4

a: Post mortem gene module enrichment in iDN. Rows: SFARI autism risk genes; dev_asdM2 and dev_asdM3 (DNA-binding and transcriptional regulation); APMB_asdM12 (neuronal function); APMB_asdM16 (immune activation); dev_asdM13 (synaptic plasticity); dev_asdM16 (synaptic structure); dev_asdM17 (synaptic maturation); ACP_asdM5, ACP_asdM13, ACP_asdM14 (attenuated cortical patterning modules). Row groups: lower gene expression in post mortem brain; higher gene expression in post mortem brain. Columns: white, grey60, salmon, sienna3, steelblue, lightgreen, Top +ve DE, Top -ve DE. Column groups: higher gene expression in autism iDN; lower gene expression in autism iDN. Cells show enrichment values with p-values in parentheses.

b: Autism iDN cell-type enrichment. Cell types: astrocyte, microglia, mature neuron, oligodendrocyte, endothelial cells. Modules grouped as higher gene expression in autism iDN (steelblue, lightgreen, white) and lower gene expression in autism iDN (salmon, sienna3, grey60).

c: Autism iDN module preservation. Two plots (Mariani et al 2015; Marchetto et al 2016). Axes: preservation Z-summary vs. module size. Modules: steelblue, lightgreen, white, salmon, sienna3, grey60.

Figure 5

a: Relationship between gene expression and exome variants. Heatmap with colour key and histogram (row Z-score, -2 to 2). Variant types: frameshift deletion, frameshift insertion, nonsynonymous SNV, synonymous SNV. Samples: CTRM1, CTRM2, CTRM3, ASDM1, 010ASM, 004ASM. Gene labels: CPLX2, PTCH1, ROBO1, KALP, KCNA2, SHD, FEZF1.AS1, FEZF1, KCNJ5, TTR, ARMC3, SPAG6, CA12, WDR49, ENSG00000259867, ENSG00000234840, RSPO3, DDIT4L, PRRX1, SOX3, SOX21.AS1, CRH, SOX21, NR4A2, ENSG00000180178, VAX1, HDC, HERC2P3, ENSG00000231240, MAP3K19, GDF5, ENSG00000259055, ENSG00000271862, ENSG00000258789, ENSG00000245025, ENSG00000269921, ENSG00000266926, ENSG00000248632, ENSG00000250130, ENSG00000271314, KCNS2, HTR7, CILP2, TRIM14, ENSG00000235376, SPDEF, POU5F1P3, ENSG00000260860, RNF128, USP17L2.

b: Autism iDN rare variants enrichment. Rows: 004ASM, 010ASM, ASDM1, CTRM1, CTRM2, CTRM3. Columns: white, steelblue, lightgreen (higher gene expression in iDN); salmon, sienna3, grey60 (lower gene expression in iDN); Up in ASM; Down in ASM. Colour scale: -2 to 2. Cells show enrichment values with p-values in parentheses.

c: Autism iDN unique variants enrichment. Same rows, columns and colour scale as panel b.

Figure 6

Summary diagram relating autism post mortem brains and autism iPSC cortical neurons.

Candidate gene evidence key: module hub gene; top 100 differentially expressed gene; transcription factor; * SFARI autism risk gene.

Functional categories: non-neuronal cells; synaptic structure; synaptic maturation; synaptic plasticity; immune and inflammatory response; vesicular transport and neuronal projection; DNA-binding and transcriptional regulation; chromosome organization; epigenetic regulation.

Candidate genes labelled: GPNMB; HTR7*; STAT3; GABRA4*, CX3CL1; ADAM9; FN1, DCN; SLITRK5*, ROBO1*; FOXO3; FOXP1*, MEF2C*; KLF4, ARID3A.