articles

Long pre-mRNA depletion and RNA missplicing contribute to neuronal vulnerability from loss of TDP-43

Magdalini Polymenidou1,2,6, Clotilde Lagier-Tourenne1,2,6, Kasey R Hutt2,3,6, Stephanie C Huelga2,3, Jacqueline Moran1,2, Tiffany Y Liang2,3, Shuo-Chien Ling1,2, Eveline Sun1,2, Edward Wancewicz4, Curt Mazur4, Holly Kordasiewicz1,2, Yalda Sedaghat4, John Paul Donohue5, Lily Shiue5, C Frank Bennett4, W Yeo2,3 & Don W Cleveland1,2

We used cross-linking and immunoprecipitation coupled with high-throughput sequencing to identify binding sites in 6,304 as the brain RNA targets for TDP-43, an RNA binding that, when mutated, causes amyotrophic lateral sclerosis. Massively parallel sequencing and splicing-sensitive junction arrays revealed that levels of 601 mRNAs were changed (including Fus (Tls), progranulin and other transcripts encoding neurodegenerative disease–associated ) and 965 altered splicing events were detected (including in sortilin, the receptor for progranulin) following depletion of TDP-43 from mouse adult brain with antisense oligonucleotides. RNAs whose levels were most depleted by reduction in TDP-43 were derived from genes with very long introns and that encode proteins involved in synaptic activity. Lastly, we found that TDP-43 autoregulates its synthesis, in part by directly binding and enhancing splicing of an intron in the 3′ untranslated region of its own transcript, thereby triggering nonsense-mediated RNA degradation.

Amyotrophic lateral sclerosis (ALS) is an adult-onset disorder in which RNA processing regulation in neuronal integrity. However, a compre- premature loss of motor neurons leads to fatal paralysis. Most cases hensive protein-RNA interaction map for TDP-43 and identification of ALS are sporadic, with only 10% of affected individuals having a of post-transcriptional events that may be crucial for neuronal survival familial history. A breakthrough in understanding ALS pathogenesis remain to be established. was the discovery that TDP-43, which in the normal setting is pri- A common approach for identifying specific RNA-binding pro- marily nuclear, mislocalizes and forms neuronal and glial cytoplasmic tein targets or aberrantly spliced isoforms related to disease has been aggregates in ALS1,2, frontotemporal lobar degeneration (FTLD), and through selection of candidate genes. However, recent advances in © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 in Alzheimer’s and Parkinson’s diseases (reviewed in ref. 3). Dominant DNA sequencing technology have provided powerful tools for explor- mutations in TDP-43 were subsequently identified as being causative in ing transcriptomes at remarkable resolution12. Moreover, cross-linking, sporadic and familial ALS cases and in rare individuals with FTLD4–7. immunoprecipitation and high-throughput sequencing (CLIP-seq or At present, it is unresolved as to whether neurodegeneration is a result HITS-CLIP) experiments have found that a single RNA-binding protein of a loss of TDP-43 function or a gain of toxic property or a combination can have previously unrecognized roles in RNA processing and affect of the two. However, a notable feature of TDP-43 pathology is TDP- many alternatively spliced transcripts13–15. We used these approaches 43 nuclear clearance in neurons containing cytoplasmic aggregates, to identify a comprehensive TDP-43 protein-RNA interaction map consistent with pathogenesis being driven, at least in part, by a loss of in the CNS. After depletion of TDP-43 in vivo, RNA sequencing and TDP-43 nuclear function1,7. splicing-sensitive microarrays were used to determine that TDP-43 is Several lines of evidence suggest an involvement of TDP-43 in crucial for maintaining normal levels and splicing patterns of >1,000 ­multiple steps of RNA metabolism, including transcription, splicing or mRNAs. The most downregulated of these TDP-43–dependent RNAs transport of mRNA3, as well as microRNA metabolism8. Misregulation have pre-mRNAs with very long introns that contain multiple TDP- of RNA processing has been described in a growing number of neuro- 43–binding sites and encoded proteins related to synaptic activity. The logical diseases9. The recognition of TDP-43’s role in ­neurodegeneration nuclear TDP-43 clearance widely reported in TDP-43 proteinopathies1,7 and the recent identification of ALS-causing mutations in FUS/TLS10,11, will lead to a disruption of this role on long RNAs that are preferentially another RNA/DNA binding protein, has reinforced a crucial role for expressed in brain, thereby contributing to neuronal vulnerability.

1Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, USA. 2Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, California, USA. 3Stem Cell Program and Institute for Genomic Medicine, University of California at San Diego, La Jolla, California, USA. 4Isis Pharmaceuticals, Carlsbad, California, USA. 5RNA Center, Department of Molecular, Cell and Developmental Biology, Sinsheimer Labs, University of California, Santa Cruz, California, USA. 6These authors contributed equally to this work. Correspondence should be addressed to D.W.C. ([email protected]) or G.W.Y. ([email protected]).

Received 20 December 2010; accepted 14 February 2011; published online 27 February 2011; doi:10.1038/nn.2779

nature neuroscience advance online publication 1 articles

RESULTS ­identify TDP-43–binding sites from clusters of sequence reads (Fig. 1b), Protein-RNA interaction map of TDP-43 in mouse brain with a conservative threshold (the number of reads mapped to a cluster We used CLIP-seq to identify in vivo RNA targets of TDP-43 in adult had to exceed the expected number by chance at P < 0.01). This stringent mouse brain (Supplementary Fig. 1a). After ultraviolet irradiation to definition will miss some true binding sites, but was intentionally chosen to stabilize in vivo protein-RNA interactions, we immunoprecipitated identify the strongest bound sites while minimizing false positives. Indeed, TDP-43 with a monoclonal antibody16 that had a higher immuno- additional probable binding sites could be identified by inspection of reads precipitation efficiency than any of the commercial antibodies tested mapped to specific RNAs (see Fig. 1c). Moreover, similarly defined clusters (Supplementary Fig. 1b). Complexes representing the expected molec- from the low mobility complexes (Supplementary Fig. 1b) showed a 92% ular weight of a single molecule of TDP-43 bound to its target RNAs overlap with those from the monomeric complexes (Fig. 1a), consistent were excised (Fig. 1a) and sequenced. We also observed lower mobility with the reduced mobility complexes comprising multiple TDP-43s (or protein-RNA complexes whose abundance was reduced by increased other RNA-binding proteins) bound to a single RNA. nuclease digestion. Immunoblotting of the same immunoprecipitated Genome-wide comparison of our replicate experiments revealed samples before radioactive labeling of the target RNAs revealed that (Supplementary Fig. 1c) that the vast majority (90%) of TDP-43– TDP-43 protein was a component of both the ~43-kDa complex and binding sites in experiment 1 overlapped with those in experiment 2 more slowly migrating complexes (Fig. 1a). (compared with an overlap of only 8% (P ≈ 0, Z = 570) when clusters We performed two independent experiments and obtained 5,341,577 were randomly distributed across the length of the pre-mRNAs contain- and 12,009,500 36-bp sequence reads. Mapped reads of both experi- ing them). Combining the mapped sequences yielded 39,961 clusters, ments were predominantly in protein-coding genes, with ~97% of them representing binding sites of TDP-43 in 6,304 annotated protein-coding­ oriented in the direction of transcription, confirming little DNA con- genes, approximately 30% of the murine transcriptome (Fig. 1d). We tamination. The positions of mapped reads from both experiments were computationally sampled reads (in 10% intervals) from the CLIP highly consistent, as exemplified by TDP-43 binding on the semaphorin sequences and found a clear logarithmic relationship (Supplementary 3F transcript (Fig. 1b). Fig. 1d), from which we calculated that our current dataset contains We used a cluster-finding algorithm with gene-specific thresholds that ~84% of all TDP-43 RNA targets in the mouse brain. Comparison with accounted for pre-mRNA length and variable expression levels15,17 to the mRNA targets identified from primary rat neuronal cells18 by RNA

Figure 1 TDP-43 binds distal introns of pre- ab1 kb mRNA transcripts through UG-rich sites in vivo. MNase MNase (a) Autoradiograph of TDP-43-RNA complexes α–TDP-43 IgG α–TDP-43 IgG CLIP reads trimmed by different concentrations of UV: +++++ ++++– experiment 1 micrococcal nuclease (MNase, left). Complexes in red box were used for library preparation and CLIP reads 80 experiment 2 sequencing. Immunoblot showing TDP-43 in 58

~46 kDa and higher molecular weight complexes 46 dependent on ultraviolet (UV) crosslinking 30 CLIP reads (right). (b) Example of a TDP-43–binding site combined (CLIP cluster) on Semaphorin 3F defined 25 by overlapping reads from two independent 17 experiments surpassing a gene-specific 7 CLIP cluster (62 reads) threshold. (c) University of California Santa Cruz RNA Protein Sema3F

© 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 (UCSC) Genome Browser screenshot of neurexin 3 intron 8 (mm8; chr12:89842000–89847000) c d displaying three examples of TDP-43–binding 2 kb CLIP experiment 1 CLIP experiment 2 modes. The right-most CLIP cluster represents 5,341,577 reads 12,009,500 reads a canonical binding site coinciding GU-rich CLIP reads Combine reads combined sequence motifs, whereas the left-most cluster 17,351,077 reads lacks GU-rich sequences and a region containing Trim and map multiple GU repeats showed no evidence of CLIP clusters 3,473,413 reads TDP-43 binding. The second CLIP cluster Direction of (middle purple-outlined box) with weak binding transcription in genes was found only when relaxing cluster-finding 97.50% sense UGUGUG algorithm parameters. (d) Flow-chart illustrating Cluster definition the number of reads analyzed from both CLIP- 39,961 clusters in seq experiments. (e) Histogram of Z scores Nrxn 3 intron 8 6,304 genes indicating the enrichment of GU-rich hexamers in CLIP-seq clusters compared with equally sized e f Distal Proximal Not annotated TDP-43 clusters Randomly distributed intron intron clusters, randomly distributed in the same pre- GUGU count clusters GUGU count 12 mRNAs. Sequences and scores of the top eight Motif Z score Z 10% 9% 5’ UTR 2 kbExon 2 kb 3’ UTR augugu 158 9% hexamers are indicated. Pie charts enumerate uguagu 156 10% guaugu 146 15% 2%1% 57% 12% 46% 4% 3% clusters containing increasing counts of (GU)2 ugugua 143 3% 1% 8 uaugug 138 11% 21% compared with randomly distributed clusters guguau 137 (P ≈ 0, χ2 = 21,662). (f) Pre-mRNAs were 30% 27% 0123>3 28% (motif count) divided into annotated regions (top). 2 4 63% Motif Z score 14% Distribution of TDP-43 (left) or previously log 24% 21 ugugug 462 published Argonaute-binding sites as a control gugugu 453 (right) showed preferential binding of TDP-43 in 0 TDP-43 CLIP Ago CLIP binding binding distal introns. 100 Z score 300

2 advance online publication nature neuroscience articles

immunoprecipitation (RIP, an approach with the serious caveat that intronic clusters being >2 kb from the nearest exon-intron boundary the absence of cross-linking allows re-association of RNAs and RNA- (Fig. 1f). This number rose to 82% for clusters >500 bases from the binding proteins after cell lysis, as previously documented19) revealed nearest exon-intron boundary (Supplementary Fig. 2b). Such distal 2,672 of the genes with CLIP-seq clusters in common. As expected from intronic binding is in sharp contrast with published RNA-binding maps our CLIP-seq analysis in whole brain, we found strong representation for tissue-specific RNA binding proteins involved in alternative splic- of neuronal (see below) and glial mRNA targets, including glutamate ing, such as Nova or Fox214,15. The same analysis on published data in transporter 1 (Glt1), myelin-associated glycoprotein (Mag) and myelin mouse brain for the Argonaute proteins17,21, which are recruited by oligodendrocyte glycoprotein (Mog). microRNAs to the 3′ ends of genes in metazoans17,21, showed a sub- stantially different pattern of binding. Only 24% (or 30%) of Argonaute TDP-43 binds GU-rich distal intronic sites clusters resided within 2 kb (or 500 bases) of the nearest exon-intron Sequence motifs enriched in TDP-43–binding sites were determined by boundary, whereas 28% were in 3′ untranslated regions (3′ UTRs) comparing sequences in clusters with randomly selected regions of similar (Fig. 1f and Supplementary Fig. 2b). This prominent concentration sizes in the same protein-coding genes. Z score statistics revealed that the of Argonaute binding near 3′ ends is in stark contrast with the uniform most significantly enriched hexamers consisted of GU-repeats (Z > 450), distribution of TDP-43–binding sites across the length of pre-mRNAs consistent with published in vitro results20, or a GU-rich motif interrupted (Supplementary Fig. 2c). by a single adenine (Z = 137–158) (Fig. 1e). The majority (57%) of clus- ters contained at least four GUGU elements, as compared with only 9% RNAs altered after in vivo TDP-43 depletion in mouse brain when equally sized clusters were randomly placed in the same pre-mRNAs To identify the contribution of TDP-43 in maintaining levels and (Fig. 1e). Furthermore, the number of GUGU ­tetramers correlated with splicing patterns of RNAs, we injected two antisense oligonucleotides the strength of binding, as estimated by the relative number of reads in each (ASOs) directed against TDP-43 and a control ASO with no target in cluster per gene compared with all clusters in other genes (Supplementary the mouse genome into the striatum of normal adult mice (Fig. 2a). Fig. 2a). Nevertheless, genome-wide analysis revealed that GU-rich Striatum is a well-defined structure that is amenable to accurate dissec- repeats were neither necessary nor sufficient to specify a TDP-43–binding tion and isolation, with TDP-43 expression levels being comparable to site. One example is the left-most binding site in neurexin 3 (Fig. 1c), other brain regions. Stereotactic injections of ASOs that target TDP-43, which does not have a GU motif, whereas a GU-rich motif 2-kb upstream control ASO or saline were performed in three groups of age- and sex- of it was not bound by TDP-43. In fact, only ~3% of all transcribed matched adult C57BL/6 mice and were tolerated with minimal effects 300 nucleotide stretches containing more than a TDP-43 mRNA three GUGU tetramers contained TDP-43 clus- Stereotactic injection of ASOs AAAAAAAAAA in the striatum of normal mice ters by CLIP-seq, indicating that TDP-43 target TDP-43–specific ASO genes cannot be identified by simply scanning RNA-seq RNA qRT-PCR extraction nucleotide sequences for GU-rich regions. RT-PCR Although the vast majority (93%) of TDP- RNase H AAAAAAAAAA 43 sites were in introns, we identified a bind- Protein ing preference of TDP-43 with most (63%) extraction Immunoblot AAA AAA Dissection of the striatum Figure 2 depletion of TDP-43 in mouse In vivo TDP-43 pre-mRNA brain with ASOs. (a) Strategy for depletion of degradation TDP-43 in mouse striatum. TDP-43–specific bc © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 or control ASOs were injected into the striatum Control oligo TDP-43 KD Saline Control oligo TDP-43 KD of adult mice. TDP-43 mRNA was degraded 342,150,462 reads 294,180,255 reads via endogenous RNase H digestion, which TDP-43 72 nt long (n = 4) 72 nt long (n = 4) specifically recognizes ASO–pre-mRNA (DNA/ Trimmed RNA) hybrids. Mice were killed after 2 weeks GAPDH and mapped

and striata were dissected for RNA and protein 120 100 Densitometric 127,725,000 reads 103,514,455 reads extraction. (b) Semi-quantitative immunoblot 80 analysis demonstrating substantial depletion of 60 Direction of 40

protein transcription in genes TDP-43 20

TDP-43 protein to 20% of controls in TDP-43 Percentage 0 ASO–treated mice compared with saline- and 120 94.56% sense 94.33% sense 100 qRT-PCR control ASO–treated mice. qRT-PCR showed 80 0 similar TDP-43 depletion at the mRNA level. 60 mRNA 40 TDP-43

KD, knockdown. (c) Flow-chart illustrating the Percentage 20 0 11,852 expressed number of reads sequenced and aligned from Control oligo TDP-43 KD Saline genes the RNA-seq experiments. (d) Differentially 10 regulated genes identified by RNA-seq analysis. d Downregulated genes (239) Scatter-plot revealed the presence of 362 and Upregulated genes (362) ef 239 genes (diamonds) that were significantly 8 140 Saline up- (red) or down- (green) regulated after Control RNA-seq qRT-PCR 6 120 TDP-43 KD TDP-43 depletion. RNA-seq analysis confirmed 120 100 RPKM) that TDP-43 levels were reduced to 20% of 2 100 4 80

(log 80 control animals. (e) Normalized expression TDP-43 60 (based on RPKM values from RNA-seq) of four 60 2 40 noncoding RNAs that were TDP-43 targets and 40

Expression TDP-43 knockdown 20 20 were downregulated after TDP-43 depletion. 0 0 Percentage Meg3/Gtl2 0 (f) Quantitative RT-PCR validation of noncoding 0 2 4 6 8 10 Percentage ncRNA expression Meg3/Gtl2Rian Malat1/Neat2 Xist Expression control (log RPKM) Saline RNA Meg3/Gtl2. 2 Controloligo TDP-43oligo

nature neuroscience advance online publication 3 articles

on the survival of the animals. Mice were killed a b 600 200 40 Upregulated Downregulated 2 after 2 weeks and total RNA and protein from Mean cluster count in introns Saline 35 Control oligo striata were isolated (Fig. 2a). Samples treated 500 150 TDP-43 KD 30 100 1.5 with TDP-43 ASO showed a significant and 400 25 reproducible reduction of TDP-43 RNA and 50 Cluster count 300 20 1 protein to approximately 20% of normal levels 0 –5 15 when compared with controls (P < 3 × 10 ; 200 10 0.5 Fig. 2b). Total intron length (kb) 100 To explore the effects of TDP-43 down- 5 regulation on its target RNAs, we converted 0 0 0 Most Not Most Relative mRNA expression (fold change) Nrxn3 Park2 Nlgn1 Fgf14 Kcnd2 CadpsCEfna5 hat poly-A–enriched RNAs from four biological upregulated regulated downregulated replicates of TDP-43 or control ASO-treated, c 1.0 Mouse exons Mouse introns Human introns as well as three saline-treated animals, to 0.9 P < 0.0872 P < 6 x 10–6 P < 5.3 x 10–6 cDNAs and sequenced them in a strand- 0.8 22 0.7 specific manner , yielding an average of >25 0.6 million 72-bp reads per library. The number 0.5 0.4 Frequency of mapped reads per kilobase of exon, per mil- 0.3 Enriched in brain Not enriched in brain lion mapped reads (RPKM) for each annotated 0.2 Random subset protein-coding gene was determined to estab- 0.1 Not in random subset 0 lish a metric of normalized gene expression12. 102 103 104 102 103 104 102 103 104 Median exon length (nt) Median intron length (nt) Median intron length (nt) Hierarchical clustering of values for the independent samples revealed Figure 3 Binding of TDP-43 on long transcripts enriched in brain sustains their normal mRNA levels. high correlation (R2 = 0.96) between biologi- (a) Correlation between RNA-seq and CLIP-seq data. Genes were ranked on their degree of regulation cal replicates of each condition (TDP-43 and after TDP-43 depletion (x axis) and the mean number of intronic CLIP clusters found in the next 100 control ASO/saline) (Supplementary Fig. 3a). genes from the ranked list were plotted (y axis, green line). Similarly, the mean total intron length for the next 100 genes was plotted (y axis, red line). The cluster count for each upregulated gene (red dots) Notably, all control ASO–treated samples were and each downregulated gene (green dots) was plotted using the same ordering (inset). (b) qRT-PCR clustered together, as were the samples from for selected downregulated genes with long introns (except for Chat) revealed a significant reduction TDP-43 ASO–treated animals, consistent with of all transcripts when compared with controls (P < 8 × 10−3). Error bars represent s.d. calculated TDP-43 reduction having an appreciable effect in each group for 3–5 biological replicates. (c) Cumulative distribution plots comparing exon length on regulation of gene expression. (left) or intron length (middle) across mouse brain tissue–enriched genes (388 genes) and non-brain Reads of each treatment group were com- tissue–enriched genes (15,153 genes). Genes enriched in brain had significantly longer median intron length compared with genes not enriched in brain (right, solid red line and black lines < 6.2 × 10−6 bined, yielding greater than 100 million , P by two-sample Kolmogorov Smirnov goodness-of-fit hypothesis test), whereas a random subset of uniquely mapped reads per condition (Fig. 2c). 388 genes showed no difference in intron length (dashed lines). Similar analysis across human brain Approximately 70% (11,852) of annotated pro- tissue–enriched genes (387 genes) and non-brain tissue–enriched genes (17,985 genes) also showed tein-coding genes in mouse satisfied at least 1 significantly longer introns in brain enriched genes (solid red and black lines, P < 5.3 × 10−6), whereas a RPKM in either condition. Statistical compari- random subset of 387 genes showed no difference in intron length (dashed lines). son revealed that 362 genes were significantly upregulated and 239 downregulated on reduction of TDP-43 protein Of the RNAs containing TDP-43 clusters, 96% were unaffected by TDP- © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 (P < 0.05; Fig. 2d and Supplementary Tables 1 and 2). We found that 43 depletion, suggesting either that other RNA-binding proteins com- TDP-43 itself was downregulated by RPKM analysis to 20% of the levels pensate for TDP-43 loss or that the remaining 20% of TDP-43 protein in control treatments, consistent with quantitative RT-PCR (qRT-PCR) suffices to regulate these transcripts. For the 239 RNAs downregulated measurements (Fig. 2b). RNAs unique to neurons (including doublecor- after TDP-43 depletion, a striking enrichment of multiple TDP-43 bind- tin, the neuron-specific beta-tubulin and choline O-acetyl transferase ing sites was observed (Fig. 3). In fact, the 100 most downregulated (Chat)) or glia (including glial fibrillary acidic protein, myelin binding genes contained an average of ~37 TDP-43–binding sites per pre-mRNA protein, Glt1 and Mag) were highly represented in the RNA-seq data, con- and 12 genes had more than 100 clusters (Fig. 3a and Supplementary firming assessment of RNA levels in multiple cell types, as expected. Table 2). We did not observe this bias for multiple TDP-43 binding Of the set of ~242 literature-curated murine noncoding RNAs23, sites if we randomized the order of genes (Supplementary Fig. 4a), expression of 4 increased and expression of 55 decreased by more than if we ordered them by their expression levels (RPKM) in either treat- twofold on TDP-43 depletion (P < 10−5; Supplementary Table 3). ment (Supplementary Fig. 4b,c), or if the Argonaute-binding sites were Malat1/Neat2, Xist, Rian and Meg3 are noncoding RNA examples that plotted on genes ranked by their expression pattern on TDP-43 deple- were both decreased (Fig. 2e,f) and bound by TDP-43, consistent with tion (Supplementary Fig. 4d). Furthermore, this trend was significant TDP-43 having a direct role in regulating their expression. for TDP-43 clusters found in introns (Fig. 3a), but not in exons, 5′ or 3′ UTRs (Supplementary Fig. 4e–g). TDP-43 binding to long pre-mRNAs sustains their levels To address whether TDP-43 binding enrichment in the down- RNA-seq and CLIP-seq datasets were integrated by first ranking all regulated genes could be attributed to intron size, we performed the 11,852 expressed genes by their degree of change on TDP-43 reduc- same analysis on the ranked list, but calculated total (Fig. 3a) or mean tion compared with control treatment. For each group of 100 consec- (Supplementary Fig. 4h) intron size instead of cluster counts. We utively ranked genes (starting from the most upregulated gene), we found that the most downregulated genes after TDP-43 reduction determined the mean number of TDP-43 clusters. No enrichment in had exceptionally long introns that were more than sixfold longer TDP-43 clusters in the upregulated genes was identified, indicating that (average of 28,707 bp; median length of 11,786 bp) than unaffected their ­upregulation was likely an indirect consequence of TDP-43 loss. or upregulated genes (average of 4,532 bp, median length of 2,273 bp,

4 advance online publication nature neuroscience articles

Table 1 The fraction of downregulated, but not upregulated TDP-43 targets increased with intron size Mean intron Expressed TDP-43 targets Downregulated Downregulated Upregulated Upregulated length (kb) genes genes TDP-43 targets genes TDP-43 targets 0–1 2,566 510 (20%) 31 8 (26%) 100 7 (7%) 1–10 8,022 4,485 (56%) 80 47 (59%) 252 56 (32%) 10–100 1,238 1,027 (83%) 109 104 (95%) 10 3 (30%) >100 26 26 (100%) 19 19 (100%) 0 0 (0%) Total 11,852 6,048 (51%) 239 178 (74%) 362 66 (18%) Number of TDP-43 targets among expressed, downregulated or upregulated genes, grouped by their mean intron length as indicated. The percentages in parentheses represent the fraction of TDP-43 targets found in each group upon TDP-43 depletion.

P < 4 × 10−18 by t test). Again, this correlation of downregulation with molecules neurexin 1 and 3 (Nrxn1, Nrxn3) and neuroligin 1 (Nlgn1). intron size was not observed for any of the control conditions men- We analyzed a compendium of expression array data from different tioned above (Supplementary Fig. 4a–g). Indeed, the enrichment of mouse organs and human tissues and found that genes preferentially TDP-43 binding can be largely attributed to intron size differences, expressed in brain have significantly longer introns (P < 6 × 10−6), but as the number of TDP-43–binding sites per kilobase of intron length not exons (Fig. 3c). The length of these genes was not correlated with (cluster density; Supplementary Fig. 4i) was only slightly increased the size of the respective proteins and the prevalence of long introns (P < 0.022) for downregulated versus unaffected or upregulated genes was largely conserved between the corresponding mouse and human (0.072 sites per kb downregulated genes, 0.059 sites per kb other genes). genes. Although binding of TDP-43 in long introns can be explained Dividing all mouse protein-coding genes into four groups on the basis by the increased likelihood to contain UG repeats, the conservation of mean intron length (<1 kb, 1–10 kb, 10–100 kb and >100 kb) con- through evolution of this particular gene structure (Supplementary firmed that the fraction of TDP-43 targets increased (from 20% to Fig. 7) suggests that these exceptionally long introns contain important 100%) with intron size (Table 1). Indeed, 83% of genes that contained regulatory elements. average intron lengths of 10–100 kb and all 26 genes that contained To validate the RNA-seq results, we analyzed a selection of brain- >100-kb-long introns were direct targets of TDP-43. enriched TDP-43 targets containing long introns. Genome browser A highly significant fraction (74%) of all downregulated genes were views of neurexin 3 (Nrxn3), Parkin 2 (Park2), neuroligin 1 (Nlgn1), direct targets of TDP-43 in comparison with genes that were unchanged fibroblast growth factor 14 (Fgf14), potassium voltage-gated channel (52%, P < 0.001) or upregulated (18%, P < 10−17) on TDP-43 deple- subfamily D member 2 (Kcnd2), calcium-dependent secretion activa- tion. Notably, all 19 downregulated genes with >100-kb-long introns tor (Cadps) and ephrin-A5 (Efna5) revealed a scattered distribution of were direct TDP-43 targets. In strong contrast, no genes in the same multiple TDP-43–binding sites across the full length of the pre-mRNA intron length category were upregulated on TDP-43 depletion and only (Supplementary Fig. 7a), consistent with the results of the global analy- 30% of upregulated genes with 10–100-kb-long introns were TDP-43 sis (Fig. 3a). qRT-PCR verified the TDP-43–dependent reduction of ­targets (Table 1). The crucial role of TDP-43 in maintaining the mRNA all the long transcripts that we tested (Fig. 3b). Chat had a median abundance of long intron-containing genes was also reflected by the intron size of <10 kb, with TDP-43 clusters restricted to a single intronic downregulation after TDP-43 depletion of ~10% of genes with >10-kb- site (Supplementary Fig. 7a). Nevertheless, qRT-PCR confirmed the long introns, the large majority (123 of 128, 96%) of which are direct RNA-seq result of a significant reduction in Chat levels after TDP-43 TDP-43 targets. depletion (P = 0.005; Fig. 3b). analysis showed that TDP-43 targets whose expres- Only 18% of the upregulated genes were direct targets of TDP- © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 sion was downregulated after TDP-43 depletion were highly enriched 43 (Table 1) and gene ontology analysis revealed an enrichment for for synaptic activity and function (Supplementary Figs. 5 and 6 and genes involved in the inflammatory response (Table 2), suggesting Supplementary Table 4). Notably, several genes with long introns tar- that their differential expression is an indirect consequence of TDP- geted by TDP-43 have crucial roles in synaptic function and have been 43 loss. However, of the 66 upregulated RNAs that contained CLIP- implicated in neurological diseases, such as subunit 2A of the NMDA seq clusters, 29% harbored TDP-43–binding site(s) in their 3′ UTR, receptor (Grin2a), the ionotropic glutamate receptor 6 (Grik2/GluR6), a percentage that is twofold higher than that of downregulated genes the calcium-activated alpha (Kcnma1), the voltage- (Supplementary Fig. 8). This suggests that TDP-43 represses gene dependent (Cacna1c), and the synaptic cell-adhesion expression when bound to 3′ UTRs.

Table 2 Main functional categories enriched within the TDP-43–regulated genes GO Downregulated genes (239) Upregulated genes (362) categories GO term Corrected P value GO term Corrected P value Molecular function GO:0006811 Ion transport 2.31 × 10–8 GO:0006952 Defense response 2.28 × 10–20 GO:0007268 Synaptic transmission 1.79 × 10–7 GO:0006954 Inflammatory response 4.97 × 10–12 Cellular component GO:0045202 Synapse 5.33 × 10–8 GO:0000323 Lytic vacuole 1.70 × 10–10 GO:0005886 Plasma membrane 2.35 × 10–13 GO:0005764 Lysosome 1.70 × 10–10 Biological process GO:0005216 activity 1.49 × 10–11 GO:0004197 Cysteine-type 6.00 × 10–4 endopeptidase activity GO:0022803 Passive transmembrane 7.06 × 10–12 GO:0003950 NAD+ ADP- 1.11 × 10–2 transporter activity ribosyltransferase activity Examples of the most significantly enriched Gene Ontology (GO) terms in the list of up- or downregulated genes on TDP-43 knockdown, as annotated. Downregulated genes, the majority of which were direct targets of TDP-43, were enriched for synaptic activity and function. In contrast, upregulated genes, most of which were not bound by TDP-43, encoded primarily inflammatory and immune response–related proteins. The P value indicated was corrected for multiple testing using the Benjamini-Hochberg method.

nature neuroscience advance online publication 5 articles

Figure 4 TDP-43 mediates alternative splicing a regulation of its RNA targets. (a) Schematic Constitutive EST/mRNA representation of different exon classes defined all exons defined P < 8 x 10–3 by EST and mRNA libraries, RNA-seq or splicing- cassette sensitive microarray data as indicated (left). Included Right, bar plot displaying the percentage of RNA-seq P < 3 x 10–6 all exons defined P < 2 x 10–3 exons that contain TDP-43 clusters within 2 kb excluded upstream and downstream of the exon-intron junctions. (b) Example of alternative splicing Included Junction array P < 10–3 array exons change on exon 18 of sortilin 1 analyzed using defined RNA-seq reads mapping to the exon body (left). excluded Arrows depict increased density of reads in the 0 10 20 30 TDP-43 knockdown samples compared with Percent of exons with TDP-43 binding within 2 kb controls. 76% of the spliced-junction reads b 2 kb in the TDP-43 knockdown samples supported Inclusion 76% (480 reads) inclusion versus only 19% in the control CLIP clusters TDP-43 KD 19% (117 reads) oligo–treated samples (right). (c) Comparison RNA-seq Control of mouse cassette exons detected by splicing- control sensitive microarrays to conserved exons in RNA-seq TDP-43 KD human orthologous genes. 85% and 57% of 24% (147 reads) human exons corresponding to the excluded Sort1 Exclusion 81% (498 reads) and included mouse exons, after TDP-43 depletion, respectively (left and middle pie c Human exons orthologous to mouse charts), contained EST and mRNA evidence for Exons excluded on Exons included on All mouse exons alternative splicing. As a control, the percentage Human EST/mRNA TDP-43 depletion in mouse TDP-43 depletion in mouse on junction array defined cassette exons 3% 7% 9% of human exons orthologous to all mouse exons Human constitutive 12% exons represented on the splicing-sensitive array 22% Human, other types 36% 57% and that have alternative splicing evidence of alternative splicing 85% 69% are shown in the right pie chart. (d) Semi- quantitative RT-PCR analyses of selected targets confirmed alternative splicing changes in TDP-43 d Exons included on TDP-43 depletion Exons excluded on TDP-43 depletion Control KD Control KD Control KD Control KD knockdown samples compared with controls. 3 1.5 12 4 2 1 8 Right, representative acrylamide gel pictures 2 1 0.5 4 of RT-PCR products from control or knockdown 0 0 0 0 adult brain samples. Quantification of splicing Sort1 Kcnd3 Ppp3ca Dst 3 3 3 15 changes from three biological replicates per 2 2 2 10 1 1 1 5 group (left); error bars represent s.d. 0 0 0 0 Sema3f Pdp1 Dclk1 Tsn

0.4 2 4 4 0.2 1 TDP-43 mediates alternative splicing of 2 2 0 0 0 0 its mRNA targets Atp2b2 Ddef2 Rbm33 Poldip3 8 2

Although TDP-43–binding sites were enriched Ratio inlcusion to exclusion 4 Ratio inlcusion to exclusion ex3 4 1 2 Control (n = 3) in distal introns (Fig. 1f), 11% (21,041 out of 0 0 0 4 ex2&3 TDP-43 KD (n = 3) 190,161) of all mouse exons, including both Eif4hMpzl1 2 0.8 © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 0 constitutive and alternative exons, contained 0.4 Kcnip2 TDP-43–binding site(s) in a 2-kb window 0 extending from the 5′ and 3′ exon-intron Dnajc5 boundaries (Fig. 4a). Compared with all exons, TDP-43 clusters were significantly enriched (P < 8 × 10−3) around exons As an independent method of identifying TDP-43–regulated exons, with transcript evidence for either alternative inclusion or exclusion RNAs from the same ASO-treated animals were analyzed on custom- (that is, cassette exons). Of the 8,637 known mouse cassette exons, designed splicing-sensitive Affymetrix microarrays27. Using a conserva- 15.1% contained TDP-43–binding sites in the exon or intron within tive statistical cutoff, we detected 779 alternatively spliced events that 2 kb of the splice sites. A splice index score for all exons, a measure significantly changed on TDP-43 depletion (Supplementary Fig. 9). similar to the ‘percent spliced in’ (or ψ) metric24, was determined by Notably, included (P < 10−3), but not excluded, exons (P < 0.3) were sig- the number of reads that mapped on exons as well as reads that mapped nificantly enriched for TDP-43 binding (~1.8-fold and ~1.3-fold, respec- at exon junctions (Fig. 4b). This analysis resulted in the identification tively) when compared with the unchanged exons on the microarray of 203 cassette exons that were differentially included (93) or excluded (Fig. 4a), similar to the trend seen by RNA-seq. The combined RNA-seq (110) (P < 0.01) when TDP-43 was depleted. Notably, sortilin 1, the gene and splicing-sensitive microarray data defined a set of 512 alternatively encoding the receptor for progranulin25,26, had the highest splice index spliced cassette exons whose splicing was affected by loss of TDP-43. The score, with exon 18 exclusion requiring TDP-43 (Fig. 4b). Included exons majority of human orthologs of these murine exons (85% of those with (P < 3 × 10−6) and, to a lesser extent, excluded exons (P < 2 × 10−3) iden- excluded and 57% with included exons) had prior EST/mRNA evidence tified by RNA-seq were significantly enriched (~2.7-fold and ~2.0-fold, for alternative splicing (Fig. 4c). respectively) for TDP-43 binding when compared to all mouse exons Semi-quantitative RT-PCR on selected RNAs validated splicing alter- (Fig. 4a). Only 33% of RNA-seq–verified TDP-43–regulated cassette ations with more inclusion or exclusion after TDP-43 reduction (Fig. 4d exons had previous expressed sequence tag (EST)/mRNA evidence for and Supplementary Fig. 10). Notably, varying the extent of TDP-43 alternative splicing, demonstrating that our approach identified previ- downregulation (between 40–80%) correlated with the magnitude of ously unknown alternative splicing events. splicing changes (Supplementary Fig. 11). However, the majority of

6 advance online publication nature neuroscience articles

abc altered splicing events observed on TDP-43 5 kb –3 WT TDP-43 Tg 120 P < 4 x 10 depletion do not have TDP-43 clusters within myc–TDP-43 2 kb of the splice sites, implicating longer- 100 end. TDP-43 range interactions or indirect effects of TDP- 80 GAPDH 43 through other splicing factors. Consistent 60 100 80 40 with this latter hypothesis, we identified TDP- 60 TDP-43 mRNA TDP-43 mRNA 20 40

43 binding on pre-mRNAs of RNA-binding Percentage endogenous 20 CLIP reads Percentage 0 endogenous 0 proteins, including Fus/Tls, Ewsr1, Taf15, combined WT TDP-43 Tg TDP-43 protein WT TDP-43 Tg Adarb1, Cugbp1, RBFox2 (Rbm9), Tia1, Nova 1 and 2, Mbnl, and neuronal Ptb (or Ptbp2). d Tet-on: 24 h 48 h After TDP-43 depletion, mRNA levels of GFP–TDP-43: ++–– GFP-TDP-43 Fus/Tls and Adarb1 were reduced and exon 5 58 in the Tia1 transcript was more included, 46 Endogenous CLIP clusters 30 TDP-43 whereas exon 10 in Ptbp2 was more excluded UGUGUG 25 (Supplementary Table 5). 1 Tardbp 2 GAPDH TDP-43 autoregulation through binding 3 on its 3′ UTR –2 e isoform 3–specific primers f P < 3 x 10 g We found TDP-43–binding sites in an alter- –3 180 P < 2 x 10 natively spliced intron in the 3′ UTR of the TDP-43 3’ UTR 3 Unrelated 3’ UTR 40 160 Short TDP-43 3’ UTR –6 P < 10 140 TDP-43 pre-mRNA (Fig. 5a). This binding Long TDP-43 3’ UTR 30 100 P < 10–12 120 does not coincide with a long stretch of UG P < 10–4 80 100 20 repeats, suggesting a lower strength of bind- –4 60 80 P < 10 ing (Supplementary Fig. 2a), consistent with a 40 60 10 isoform 3 TDP-43 mRNA TDP-43 mRNA isoform 3 Fold change endogenous 28 (normalized to RFU) recent report . TDP-43 mRNAs spliced at this 20 Percentage of control 40 isoform 3 TDP-43 mRNA TDP-43 mRNA isoform 3

Fold change endogenous 0 site (Fig. 5a) are predicted to be substrates for 0 20 Tet-on: 48 24 48 0 myc–TDP-43–HA – + – + nonsense-mediated RNA decay (NMD), a pro- GFP–TDP-43: – ++ No co-transfection RFP myc–TDP-43–HA siRNA UPF1 siRNA UPF1 ––+ + cess that targets mRNAs for degradation when exon-junction complexes (EJCs) deposited Figure 5 Autoregulation of TDP-43 through binding on the 3′ UTR of its own transcript. (a) CLIP-seq during splicing, located 3′ of the stop codon, reads and clusters on TDP-43 transcript showing binding mainly in an alternatively spliced part of the are not displaced during the pioneer round 3′ UTR, lacking long uninterrupted UG repeats. (b) qRT-PCR showing ~50% reduction of endogenous TDP-43 mRNA in transgenic mice overexpressing human myc–TDP-43 not containing introns and of translation29. In contrast, TDP-43 mRNAs 3′ UTR. (c) Immunoblots confirming the reduction of endogenous TDP-43 protein (upper panel) with an unspliced 3′ UTR would not have such to 50% of control levels (by densitometry, lower panel) in response to human myc–TDP-43 a premature termination codon and should overexpression. (d) Immunoblot showing reduction of endogenous TDP-43 protein in HeLa cells on escape NMD. This TDP-43 binding implies tetracycline induction of a transgene encoding GFP-myc–TDP-43–HA (annotated GFP–TDP-43). The autoregulatory mechanisms reminiscent to ~30-kDa product accumulating after 48 h (red arrow) was immunoreactive with four TDP-43–specific those reported for other RNA-binding pro- antibodies. (e) qRT-PCR using primers spanning the junctions of TDP-43 isoform 3 (upper panel), teins30,31. Indeed, expression in mice of a TDP- showed ~100-fold increase in response to tetracycline induction of GFP–TDP-43. TDP-43 isoform 3

© 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 was present at very low levels before tetracycline induction. (f) Luciferase assays showing significant 43–encoding transgene without the regulatory reduction of relative fluorescence units (RFUs) in cells expressing renilla luciferase under the control 3′ UTR (E.S., S.-C.L. and D.W.C., unpublished of long TDP-43 3′ UTR when co-transfected with myc–TDP-43–HA. Cells treated with siRNA against observations) led to significant reduction of UPF1 showed a significant increase of RFUs when expressing Luciferase with the long TDP-43 endogenous TDP-43 mRNA and protein 3′ UTR, but not in controls. (g) qRT-PCR scoring levels of endogenous TDP-43 isoform 3 in HeLa cells (P < 4 × 10–3; Fig. 5b,c) in the CNS. transiently transfected with myc–TDP-43–HA, UPF1 siRNA or both simultaneously. TDP-43 isoform 3 To identify the molecular basis of this mech- was increased in response to elevated TDP-43 protein levels, blocking of NMD (by UPF1 siRNA) and there was a synergistic effect in the combined condition. anism, we generated HeLa cells in which GFP- myc–TDP-43–HA mRNA lacking introns and 3′ UTR was transcribed from a single copy, tetracycline-inducible trans- sites) of the TDP-43 3′ UTR downstream of the stop codon of a renilla gene inserted at a predefined by site-directed (Flp) recombinase16. luciferase gene (Supplementary Fig. 12a). Both unspliced and spliced After 24 or 48 h of GFP-myc–TDP-43–HA induction, a substantial 3′ UTRs were determined to be present in brain RNAs from mouse and reduction of endogenous TDP-43 protein was observed, accompanied human CNSs (Supplementary Fig. 12a). Both variants, as well as an by accumulation of a shorter, ~30-kDa product (Fig. 5d) recognized unaltered luciferase reporter, were transfected into HeLa cells along with by four different TDP-43–specific antibodies. Although this ~30-kDa plasmids driving either increased TDP-43 expression or red fluorescent band could be derived from the transgene encoding TDP-43, it was protein (RFP; Supplementary Fig. 12b). Increased levels of TDP-43 pro- not recognized by antibodies to myc or HA and its size is compatible tein led to a significant reduction of luciferase produced from the gene with the endogenous TDP-43 isoform 3. Using qRT-PCR with primers carrying the long, intron-containing TDP-43 3′ UTR when compared with spanning the exon junctions of TDP-43 isoform 3, we found a ~100-fold the short or unrelated 3′ UTR (P < 10–4; Fig. 5f). Moreover, co-transfection increase of the spliced isoform 3 on overexpression of GFP-myc–TDP- of the reporters with siRNAs targeting UPF1 (Supplementary Fig. 12c), 43–HA protein (Fig. 5e). an essential component that marks an NMD substrate for degrada- To test whether TDP-43 drives splicing of its pre-mRNA through bind- tion32, enhanced luciferase produced by the intron-containing 3′ UTR ing to its 3′ UTR, we cloned a long unspliced version (containing the TDP- by ~1.5-fold, indicating that UPF1-dependent degradation of this mRNA 43–binding sites) and a short spliced version (without TDP-43–binding occurred (Fig. 5f). Lastly, the endogenous spliced isoform 3 of TDP-43

nature neuroscience advance online publication 7 articles

Figure 6 TDP-43 regulates expression of FUS/TLS and progranulin. a 5 kb (a) CLIP-seq reads and clusters on Fus/Tls transcript showing TDP-43 binding in introns 6 and 7 (also annotated as an alternative 3′ UTR) and the canonical 3′ UTR. RNA-seq reads from control or TDP-43 knockdown samples (equal scales) showed a slight reduction in Fus/Tls mRNA in the TDP-43 knockdown group and expression values from RNA-seq confirmed that Fus/Tls mRNA was reduced to 70% of control on TDP-43 reduction. (b) qRT-PCR confirmed mRNA downregulation on TDP-43 depletion. Fus/Tls CLIP reads s.d. was calculated within each group for three biological replicas. (c) Semi- combined quantitative immunoblot (upper panel) revealed a slight, but consistent, reduction of FUS/TLS protein in ASO-treated mice to ~70% of control levels, as quantified by densitometric analysis (lower panel). (d) CLIP-seq reads and clusters on progranulin transcript (Grn) showing a sharp binding in the 3′ UTR. RNA-seq reads from control or TDP-43 knockdown samples (equal scales) showed a significant increase in progranulin mRNA in the TDP-43 knockdown group compared with controls. (e) qRT-PCR confirmed −4 the statistically significant increase (P < 3 × 10 ) in progranulin mRNA in CLIP clusters samples with reduced TDP-43 levels when compared with controls. s.d. was calculated in each group for three biological replicas. RNA-seq Control

was increased, not only on elevated TDP-43 expression (by transient trans- RNA-seq KD/control = 0.7 TDP-43 KD fection), but also on blocking of NMD, with a synergistic effect in the combined conditions (Fig. 5g). Fus/Tls TDP-43 regulates expression of disease-related transcripts TDP-43 protein bound and directly regulated a variety of transcripts involved in neurological diseases (Supplementary Table 6 and Mammalian Supplementary Figs. 8 and 13), including Fus/Tls and Grn (Fig. 6), conservation which encode FUS/TLS and progranulin, mutations in which cause 10,11 33,34 ′ b c ALS or FTLD , respectively. TDP-43 bound to the 3 UTR of 120 Fus/Tls mRNA and to introns 6 and 7, all of which are highly conserved Control oligo TDP-43 KD 100 between mammalian species (Fig. 6a). Gene annotation and the pres- FUS/TLS ence of RNA-seq reads in these introns were consistent with either 80 an alternative 3′ UTR or intron retention. Fus/Tls mRNA and protein 60 GAPDH 120 were reduced to approximately 40% of their normal levels (Fig. 6b,c). Percentage 40 FUS/TLS mRNA 80 Progranulin mRNA, on the other hand, was markedly increased by 20 40 ~3–6-fold compared with controls (Fig. 6d,e). 0 Saline Control TDP-43 Percentage 0 CLIP-seq data also confirmed TDP-43 binding to two RNAs that were oligo KD FUS/TLS protein Control oligo TDP-43 KD previously reported to be associated with TDP-43: histone deacetylase 35 36 6 (Hdac6) and low–molecular weight neurofilament subunit (Nefl) . d 2 kb e © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 Our RNA-seq data indicate that HDAC6, which functions to promote the degradation of polyubiquitinated proteins, was reduced on TDP-43 CLIP reads depletion (Supplementary Fig. 14a,b), albeit to a lesser degree in vivo combined 7 6 than previously reported in cell culture35. It has been known for many CLIP clusters 5 years that Nefl mRNA levels are reduced in degenerating motor neurons 4 37 RNA-seq from individuals with ALS . We identified TDP-43 clusters in the 3′ UTR 3 Control of Nefl (Supplementary Fig. 14c). In addition, RNA-seq data confirmed 2 ′ 1 that the mouse Nefl 3 UTR was longer than annotated and Nefl mRNA RNA-seq KD/control = 3 levels were slightly reduced on TDP-43 depletion (Supplementary TDP-43 KD Fold change Grn mRNA 0 Saline Control TDP-43 Fig. 14d). Multiple TDP-43–binding sites were also present in the pre- oligo KD mRNA from the Mapt gene encoding tau, whose mutation or altered Grn splicing of exon 10 has been implicated in FTD38. However, neither the levels nor the splicing pattern of Mapt RNA were affected by TDP-43 depletion (Supplementary Table 6). Moreover, we identified multiple DISCUSSION TDP-43 intronic binding sites in the Hdh transcript (Supplementary TDP-43 is a central component in the pathogenesis of an ever-increas- Fig. 13), which encodes huntingtin, the protein whose polyglutamine ing list of neurodegenerative conditions. We created a genome-wide expansion causes Huntington’s disease in humans39, accompanied RNA map of >39,000 TDP-43–binding sites in the mouse transcrip- by cytoplasmic TDP-43 accumulations40. Moreover, Hdh levels were tome and determined that levels of 601 mRNAs and splicing patterns decreased in mouse brain on TDP-43 depletion (Supplementary Table 6). of 965 mRNAs were altered following TDP-43 reduction in the adult In contrast, we found no evidence for direct binding or TDP-43 regula- nervous system. Thus, although earlier efforts have implicated TDP- tion of the Sod1 transcript (Supplementary Table 6), whose aberrant 43 as a splicing regulator of a few candidate genes20,43,44, our RNA-seq splicing in familial ALS cases41,42 has raised the possibility that SOD1 and microarray results establish that TDP-43 regulates the largest set missplicing may be involved in the pathogenesis of sporadic ALS. (512) of cassette exons thus far reported, demonstrating its broad role in

8 advance online publication nature neuroscience articles

alternative splicing regulation. We also found that TDP-43 was required synthesis of new TDP-43 in the cytoplasm whose subsequent co-aggre- for maintenance of 42 non-overlapping, noncoding RNAs. gation into the initial complexes would drive their growth. Disrupted These findings also provide a direct test for how the nuclear loss autoregulation, by any event that lowers nuclear TDP-43, thus provides of TDP-43 widely reported in the remaining motor neurons in ALS a mechanistic explanation for what may be a critical, intermediate step autopsy samples1 may contribute to neuronal dysfunction, independent in the molecular mechanisms underlying age-dependent degeneration of potential damage from TDP-43 aggregates. The results of our experi- and death of neurons in TDP-43 proteinopathies. ments using in vivo reduction of TDP-43 coupled with RNA sequencing indicate that TDP-43 is crucial for sustaining levels of 239 mRNAs, METHODS including those encoding synaptic proteins, the choline acetyltrans- Methods and any associated references are available in the online ver- ferase, and the disease-related proteins FUS/TLS and progranulin. A sion of the paper at http://www.nature.com/natureneuroscience/. substantial proportion of these pre-mRNAs are directly bound by TDP- 43 at multiple sites in exceptionally long introns, a feature that we found Accession codes. Microarray CEL files and sequenced reads have been deposited most prominently in brain-enriched transcripts (Fig. 3), thereby identi- at the Gene Expression Omnibus database repository and the NCBI Short Read Archive, respectively. All our data (microarray, RNA-seq and CLIP-seq) are fying one component of neuronal vulnerability from TDP-43 loss. combined as a ‘SuperSeries entry’ under one accession number GSE27394. A plausible model for the role of TDP-43 in sustaining the levels of mRNAs derived from long pre-mRNAs is that TDP-43 binding in long Note: Supplementary information is available on the Nature Neuroscience website. introns prevents unproductive splicing events that would introduce premature stop codons and thereby promote RNA degradation. Our ACKNOWLEDGMENTS The authors would like to thank members of B. Ren’s laboratory, especially Z. Ye, results thus identify a previously unknown conserved role for TDP-43 in S. Kuan and L. Edsall for technical help with the Illumina sequencing and regulating a subset of these long intron-containing brain-enriched genes. U. Wagner for helpful discussions, K. Clutario and J. Boubaker for technical help, None of our evidence eliminates the possibility that TDP-43 affects RNA as well as all of the members of the Yeo and Cleveland laboratories, M. Ares Jr levels by additional mechanisms, such as through transcription regula- for generous support, and the neuro-team of ISIS Pharmaceuticals for critical comments and suggestions on this project. M.P. is the recipient of a Human tion or by facilitation of RNA polymerase elongation, similar to what has Frontier Science Program Long Term Fellowship. C.L.-T. is the recipient of the been shown for another splicing regulator, SC35 (ref. 45). Milton-Safenowitz post-doctoral fellowship from the Amyotrophic Lateral FUS/TLS is another RNA-binding protein whose mutation causes Sclerosis Association. D.W.C. receives salary support from the Ludwig Institute ALS10,11 and, in some rare cases, FTLD-U. Similar to TDP-43, FUS/ for Cancer Research. S.C.H. is funded by a National Science Foundation Graduate TLS aggregation has been observed in different neurodegenerative Research Fellowship. This work was supported by grants from the US National Institutes of Health (R37 NS27036 and an American Recovery and Reinvestment conditions, including Huntington’s disease and spinocerebellar ataxia Act Challenge grant) to D.W.C and partially by grants from the US National (reviewed in ref. 3). We have now shown that Fus/Tls mRNA is a direct Institutes of Health (HG004659 and GM084317) and the Stem Cell Program at the target of TDP-43 and its level is reduced on TDP-43 depletion (Fig. 6), University of California, San Diego to G.W.Y. thereby identifying a previously unknown FUS/TLS dependency on AUTHOR CONTRIBUTIONS TDP-43. The latter is also true for two additional disease-relevant pro- M.P., C.L.-T., J.M. and T.Y.L. performed the experiments. K.R.H., S.C.H. and T.Y.L. teins, progranulin and its proposed receptor sortilin. Progranulin levels conducted the bioinformatics analysis. S.-C.L. developed the monoclonal TDP-43– were sharply increased on TDP-43 reduction and splicing of sortilin was specific antibody used for CLIP-seq and the tetracycline-inducible GFP–TDP-43– altered. In fact, TDP-43 directly binds and regulates the levels and splic- expressing HeLa cells. S.-C.L. and E.S. generated the transgenic myc–TDP-43 mice. 46 J.P.D. and L.S. conducted the preliminary splice-junction microarray analyses. M.P., ing patterns of transcripts implicated in various neurologic diseases C.L.-T., E.W., C.M., Y.S., C.F.B. and H.K. conducted the antisense oligonucleotide (Supplementary Table 6 and Supplementary Fig. 13), consistent with experiments. M.P., C.L.-T., K.R.H., G.W.Y. and D.W.C. designed the experiments. a broad role of TDP-43 in these conditions3. M.P., C.L.-T., K.R.H., S.C.H., G.W.Y. and D.W.C. wrote the paper. © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 Finally, in contrast with a recent study that reported an ­autoregulatory COMPETING FINANCIAL INTERESTS 28 mechanism for TDP-43 that is independent of pre-mRNA splicing , The authors declare no competing financial interests. our results indicate that TDP-43 acts as a splicing regulator to reduce its own expression level by binding to the 3′ UTR of its own pre-mRNA. Published online at http://www.nature.com/natureneuroscience/. 28 Reprints and permissions information is available online at http://www.nature.com/ Although there may be additional mechanisms beyond NMD , we found reprintsandpermissions/. that TDP-43 enhanced splicing of an alternative intron in its own 3′ UTR, thereby autoregulating its levels through a mechanism that involves splic- ing-dependent RNA degradation by NMD (Fig. 5). TDP-43 autoregula- 1. Neumann, M. et al. Ubiquitinated TDP-43 in frontotemporal lobar degeneration and amyotrophic lateral sclerosis. Science 314, 130–133 (2006). tion occurs in the mammalian CNS, as seen by a significant reduction of 2. Arai, T. et al. TDP-43 is a component of ubiquitin-positive tau-negative inclusions endogenous TDP-43 mRNA and protein in response to the expression of in frontotemporal lobar degeneration and amyotrophic lateral sclerosis. Biochem. Biophys. Res. Commun. 351, 602–611 (2006). a TDP-43 transgene lacking the regulatory intron, as we found and others 3. Lagier-Tourenne, C., Polymenidou, M. & Cleveland, D.W. TDP-43 and FUS/TLS: emerg- 47,48 have reported previously . Both spliced and unspliced TDP-43 RNAs ing roles in RNA processing and neurodegeneration. Hum. Mol. Genet. 19, R46–R64 were found in human and mouse brain, consistent with autoregulation (2010). 4. Gitcho, M.A. et al. TDP-43 A315T mutation in familial motor neuron disease. Ann. at normal TDP-43 levels that substantially attenuates TDP-43 synthesis Neurol. 63, 535–538 (2008). through production of unstable, spliced RNA. 5. Kabashi, E. et al. TARDBP mutations in individuals with sporadic and familial amyo- TDP-43-dependent splicing of its 3′ UTR intron as a key component trophic lateral sclerosis. Nat. Genet. 40, 572–574 (2008). 6. Sreedharan, J. et al. TDP-43 mutations in familial and sporadic amyotrophic lateral of a TDP-43 autoregulatory loop could participate in a feedforward sclerosis. Science 319, 1668–1672 (2008). mechanism enhancing the cytoplasmic TDP-43 aggregates that are hall- 7. Van Deerlin, V.M. et al. TARDBP mutations in amyotrophic lateral sclerosis with TDP-43 neuropathology: a genetic and histopathological analysis. Lancet Neurol. 7, marks of familial and sporadic ALS. Following an initiating insult (for 409–416 (2008). example, one that traps some TDP-43 in initial cytoplasmic aggregates), 8. Buratti, E. et al. Nuclear factor TDP-43 can affect selected microRNA levels. the reduction in nuclear TDP-43 levels would decrease splicing of its 3′ FEBS J. 277, 2268–2281 (2010). 9. Cooper, T.A., Wan, L. & Dreyfuss, G. RNA and Disease. Cell 136, 777–793 (2009). UTR, which would in turn produce an elevated pool of stable TDP-43 10. Kwiatkowski, T.J. Jr. et al. Mutations in the FUS/TLS gene on 16 cause mRNA. Repeated translation of that TDP-43 mRNA would increase familial amyotrophic lateral sclerosis. Science 323, 1205–1208 (2009).

nature neuroscience advance online publication 9 articles

11. Vance, C. et al. Mutations in FUS, an RNA processing protein, cause familial amyo- 31. Sureau, A., Gattoni, R., Dooghe, Y., Stevenin, J. & Soret, J. SC35 autoregulates its trophic lateral sclerosis type 6. Science 323, 1208–1211 (2009). expression by promoting splicing events that destabilize its mRNAs. EMBO J. 20, 12. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quan- 1785–1796 (2001). tifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008). 32. Perlick, H.A., Medghalchi, S.M., Spencer, F.A., Kendzior, R.J. Jr. & Dietz, H.C. 13. Ule, J. et al. CLIP identifies Nova-regulated RNA networks in the brain. Science 302, Mammalian orthologs of a yeast regulator of nonsense transcript stability. Proc. Natl. 1212–1215 (2003). Acad. Sci. USA 93, 10928–10932 (1996). 14. Licatalosi, D.D. et al. HITS-CLIP yields genome-wide insights into brain alternative 33. Baker, M. et al. Mutations in progranulin cause tau-negative frontotemporal dementia RNA processing. Nature 456, 464–469 (2008). linked to chromosome 17. Nature 442, 916–919 (2006). 15. Yeo, G.W. et al. An RNA code for the FOX2 splicing regulator revealed by mapping 34. Cruts, M. et al. Null mutations in progranulin cause ubiquitin-positive frontotemporal RNA-protein interactions in stem cells. Nat. Struct. Mol. Biol. 16, 130–137 (2009). dementia linked to chromosome 17q21. Nature 442, 920–924 (2006). 16. Ling, S.C. et al. ALS-associated mutations in TDP-43 increase its stability and promote 35. Fiesel, F.C. et al. Knockdown of transactive response DNA-binding protein (TDP-43) TDP-43 complexes with FUS/TLS. Proc. Natl. Acad. Sci. USA 107, 13318–13323 downregulates histone deacetylase 6. EMBO J. 29, 209–221 (2010). (2010). 36. Strong, M.J. et al. TDP43 is a human low molecular weight neurofilament (hNFL) 17. Zisoulis, D.G. et al. Comprehensive discovery of endogenous Argonaute binding sites mRNA-binding protein. Mol. Cell. Neurosci. 35, 320–327 (2007). in Caenorhabditis elegans. Nat. Struct. Mol. Biol. 17, 173–179 (2010). 37. Bergeron, C. et al. Neurofilament light and polyadenylated mRNA levels are decreased 18. Sephton, C.F. et al. Identification of neuronal RNA targets of TDP-43-containing in amyotrophic lateral sclerosis motor neurons. J. Neuropathol. Exp. Neurol. 53, ribonucleoprotein complexes. J. Biol. Chem. 286, 1204–1215 (2011). 221–230 (1994). 19. Mili, S. & Steitz, J.A. Evidence for reassociation of RNA-binding proteins after cell 38. Hutton, M. et al. Association of missense and 5′-splice-site mutations in tau with the lysis: implications for the interpretation of immunoprecipitation analyses. RNA 10, inherited dementia FTDP-17. Nature 393, 702–705 (1998). 1692–1694 (2004). 39. MacDonald, M.E. et al. A novel gene containing a trinucleotide repeat that is expanded 20. Buratti, E. et al. Nuclear factor TDP-43 and SR proteins promote in vitro and in vivo and unstable on Huntington’s disease . Cell 72, 971–983 (1993). CFTR exon 9 skipping. EMBO J. 20, 1774–1784 (2001). 40. Schwab, C., Arai, T., Hasegawa, M., Yu, S. & McGeer, P.L. Colocalization of transacti- 21. Chi, S.W., Zang, J.B., Mele, A. & Darnell, R.B. Argonaute HITS-CLIP decodes vation-responsive DNA-binding protein 43 and huntingtin in inclusions of Huntington microRNA-mRNA interaction maps. Nature 460, 479–486 (2009). disease. J. Neuropathol. Exp. Neurol. 67, 1159–1165 (2008). 22. Parkhomchuk, D. et al. Transcriptome analysis by strand-specific sequencing of 41. Valdmanis, P.N. et al. A mutation that creates a pseudoexon in SOD1 causes familial complementary DNA. Nucleic Acids Res. 37, e123 (2009). ALS. Ann. Hum. Genet. 73, 652–657 (2009). 23. Pang, K.C. et al. RNAdb—a comprehensive mammalian noncoding RNA database. 42. Birve, A. et al. A novel SOD1 splice site mutation associated with familial ALS revealed Nucleic Acids Res. 33, D125–D130 (2005). by SOD activity analysis. Hum. Mol. Genet. 19, 4201–4206 (2010). 24. Wang, E.T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 43. Mercado, P.A., Ayala, Y.M., Romano, M., Buratti, E. & Baralle, F.E. Depletion of TDP 456, 470–476 (2008). 43 overrides the need for exonic and intronic splicing enhancers in the human apoA-II 25. Hu, F. et al. Sortilin-mediated endocytosis determines levels of the frontotemporal gene. Nucleic Acids Res. 33, 6000–6010 (2005). dementia protein, progranulin. Neuron 68, 654–667 (2010). 44. Dreumont, N. et al. Antagonistic factors control the unproductive splicing of SC35 26. Carrasquillo, M.M. et al. Genome-wide screen identifies rs646776 near sortilin as a regula- terminal intron. Nucleic Acids Res. 38, 1353–1366 (2009). tor of progranulin levels in human pasma. Am. J. Hum. Genet. 87, 890–897 (2010). 45. Lin, S., Coutinho-Mansfield, G., Wang, D., Pandit, S. & Fu, X.D. The splicing factor 27. Du, H. et al. Aberrant alternative splicing and extracellular matrix gene expression in SC35 has an active role in transcriptional elongation. Nat. Struct. Mol. Biol. 15, mouse models of myotonic dystrophy. Nat. Struct. Mol. Biol. 17, 187–193 (2010). 819–826 (2008). 28. Ayala, Y.M. et al. TDP-43 regulates its mRNA levels through a negative feedback loop. 46. Tollervey, J.R. et al. Characterizing the RNA targets and position-dependent splicing EMBO J. 30, 277–288 (2011). regulation by TDP-43. Nat. Neurosci. advance online publication, doi:10.1038/nn.2778 29. Le Hir, H., Moore, M.J. & Maquat, L.E. Pre-mRNA splicing alters mRNP composition: (27 February 2011). evidence for stable association of proteins at exon-exon junctions. Genes Dev. 14, 47. Xu, Y.F. et al. Wild-type human TDP-43 expression causes TDP-43 phosphorylation, 1098–1108 (2000). mitochondrial aggregation, motor deficits, and early mortality in transgenic mice. 30. Wollerton, M.C., Gooding, C., Wagner, E.J., Garcia-Blanco, M.A. & Smith, C.W. J. Neurosci. 30, 10851–10859 (2010). Autoregulation of polypyrimidine tract binding protein by alternative splicing leading 48. Igaz, L.M. et al. Dysregulation of the ALS-associated gene TDP-43 leads to neuronal to nonsense-mediated decay. Mol. Cell 13, 91–100 (2004). death and degeneration in mice. J. Clin. Invest. 121, 726–738 (2011). © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011

10 advance online publication nature neuroscience ONLINE METHODS cat #A300-302A; 1:5,000), rabbit antibody to TDP-43 (Proteintech, cat #10782; CLIP-seq library preparation and sequencing. Brains from 8-week-old female 1:2,000), rabbit antibody to TDP-43 (Aviva System Biology, cat #ARP38942_T100; C57Bl/6 mice were rapidly dissociated by forcing through a cell strainer with a 1:2,000), custom-made mouse antibody to TDP-43 (1:1,000)16, custom-made pore size of 100 µm (BD Falcon) before ultraviolet crosslinking. CLIP-seq librar- rabbit antibody to RFP raised against the full-length protein (1:7,000), mouse ies were constructed as previously described15, using a custom-made mouse DM1α antibody to tubulin (1:10,000) and mouse antibody to GAPDH (Abcam, monoclonal antibody to TDP-43 (ref. 16, 40 µl of ascites per 400 µl of beads cat #AB8245; 1:10,000). per sample). Libraries were subjected to standard Illumina GA2 sequencing protocol for 36 cycles. Cells, cloning and luciferase assays. HeLa Flp-In cells expressing GFP-myc– TDP-43–HA were generated as previously described16. Isogenic cell lines were Generation of transgenic mice. All animal procedures were conducted in accor- grown at 37 °C and 5% CO2 in Dulbecco’s modified Eagle medium (DMEM), dance with the guidelines of the Institutional Animal Care and Use Committee supplemented with 10% tetracycline-free fetal bovine serum (vol/vol) and pen- of the University of California. cDNA containing N-terminal myc-tagged full- icillin/streptomycin. Expression of GFP-myc–TDP-43–HA was induced with length human TDP-43 were amplified and digested by SalI and cloned into the 1 mg ml–1 tetracycline for 24–48 h. XhoI cloning site of the MoPrP.XhoI vector (ATCC #JHU-2). The resultant To assess the mechanism of TDP-43 autoregulation, we cloned the proximal MoPrP.XhoI-myc-hTDP-43 construct was then digested upstream of the mini- part of mouse TDP-43 3′ UTR into the psiCHECK-2 vector (Promega), which mal Prnp promoter and downstream of the Prnp exon 3 using BamHI and NotI contains a Renilla luciferase and a Firefly luciferase reporter expression cassettes. and cloned into a shuttle vector containing loxP flanking sites. The final con- The following primers were used to amplify 1.7 kb of TDP-43 3′ UTR using struct was then linearized using XhoI, injected into the pro-nuclei of fertilized cDNA from mouse brain: 5′-aaa CTC GAG cag gct ttt ggt tct gga C56Bl6/C3H hybrid eggs and implanted into pseudopregnant female mice. aa-3′ and 5′-aaa gcg gcc gcA CCA TTT TAG GTG CGG TCA C-3′. We obtained two products of 1.7 kb and 1.1 kb, corresponding, respectively, to an Stereotactic injections of antisense oligonucleotides. We anesthetized unspliced and a spliced isoform of TDP-43 3′ UTR (Supplementary Fig. 13a). 8–10-week-old female C57Bl/6 mice with isofluorane. Using stereotaxic guides, Both products were purified on 1% agarose gels and cloned independently in 3 µl of ASO solution, corresponding to a total of 75 µg or 100 µg ASOs, or saline the psiCHECK-2 vector using NotI and XhoI restriction sites located 3′ to the buffer was injected using a Hamilton syringe directly into the striatum. Mice Renilla luciferase translational stop. Given that binding sites for TDP-43 mainly were monitored for any adverse effects for the next 2 weeks until they were killed. lie in the alternative intron, the spliced isoform was used as a control to assess The striatum and adjacent cortex area were dissected and frozen at –80 °C in the effect of TDP-43 protein on its own RNA. 1 ml Trizol (Invitrogen). Trizol extraction of RNA and protein was performed Human myc-TDP-43-HA cDNA (a generous gift from C. Shaw, King’s College according to the manufacturer’s instructions. London) was cloned into mammalian expression vector, pCl-neo (Promega). RNA quality and RNA-seq library preparation. RNA quality was measured using RFP was cloned in the vector pcDNA3 (Invitrogen). ′ the Agilent Bioanalyzer system according to the manufacturer’s recommenda- 250 ng of psiCHECK-2 vector with or without TDP-43 3 UTR and 250 ng of the tions. RNA-seq libraries were constructed as described previously22. We used 8 pM vector expressing TDP-43 or RFP were co-transfected in Hela cells using Fugene 6 of amplified libraries for sequencing on the Illumina GA2 for 72 cycles. transfection reagent (Roche) in 12-wells cell culture plates. Luciferase assays were performed 48 h after transfection using the Dual-Luciferase Reporter 1000 assay RT and qRT-PCR. cDNA of total RNA extracted from striatum was generated system (Promega) according to the manufacturer’s instructions. Five independent using oligodT and Superscript III reverse transcriptase (Invitrogen) according experiments were performed and 20 µl of lysate were used in duplicate for each to the manufacturer’s instructions. To test candidate splicing targets, RT-PCR condition. RFU for Renilla luciferase were normalized to firefly luciferase to con- amplification using between 24 and 27 cycles were performed from at least three trol for transfection efficiency. Duplicates were averaged and each condition was mice treated with a control ASO and three mice with treated with a TDP-43 ASO. expressed as a percentage of the samples without transfection of TDP-43 or RFP Products were separated on 10% polyacrylamide gels followed by staining with cDNA. Inter-group differences were assessed by two-tailed Student’s t test. SYBR gold (Invitrogen). Quantification of the different isoforms was performed with ImageJ software (US National Institutes of Health). Intensity ratio between Mouse gene structure annotations. The mouse genome sequence (mm8) and annotations for protein-coding genes were obtained from the UCSC Genome © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 products with included and excluded exons were averaged for three biological replicates per group. Browser. Known mouse genes (knownGene containing 31,863 entries) and qRT-PCR for mouse Tardbp and Fus/Tls were performed using the Express known isoforms (knownIsoforms containing 31,014 entries in 19,199 unique iso- One-Step SuperScript qRT-PCR kits (Invitrogen) and the thermocycler ABI form clusters) with annotated exon alignments to the mouse genomic sequence Prism 7700 (Applied Biosystems). cDNA synthesis and amplification were per- were processed as follows. Known genes that were mapped to different isoform formed according to the manufacturer’s instruction using specific primers and clusters were discarded. All mRNAs aligned to mm8 that were greater than 300 5′ FAM, 3′ TAMRA-labeled probes. The cyclophilin gene was used to normalize nucleotides were clustered together with the known isoforms. For the purpose the expression values. of inferring alternative splicing, genes containing fewer than three exons were qRT-PCR for all other genes tested were performed with 3–5 mice for each removed from further consideration. A total of 1.9 million spliced ESTs were group (treated with saline, control ASO or ASO to TDP-43) and two technical mapped onto the 16,953 high-quality gene clusters to identify alternative splic- replicates using the iQ SYBR green supermix (Bio-Rad) on the IQ5 multicolor ing events. Final annotated gene regions were clustered together so that any real-time PCR detection system (Bio-Rad). The analysis was done using the IQ5 overlapping portion of these databases was defined by a single genomic position. optical system software (Bio-Rad; version 2.1). Expression values were normal- Promoter regions were arbitrarily defined as 1.5 kb upstream of the transcrip- ized to at least two of the following control genes: Actb, Actg1 and Rsp9. Expression tional start site of the gene and intergenic regions as unannotated regions in values were expressed as a percentage of the average expression of the saline the genome. To identify 5′ and 3′ UTRs, we relied on the coding annotation in treated samples. Inter-group differences were assessed by two-tailed Student’s UCSC Genome Browser known genes that we extended 1.5 kb downstream or t test. Primers for RT-PCR and qRT-PCR were designed using Primer3 software upstream the start and stop codons, respectively. (http://frodo.wi.mit.edu/primer3/) and sequences are available on request. Comparison to human alternatively spliced exons. To find evidence for conserved Immunoblots. Proteins were separated on custom-made 12% SDS page gels alternative splicing patterns in humans, we used the UCSC LiftOver tool to obtain and transferred to nitrocellulose membrane (Whatman) following standard the corresponding human coordinates of the mouse exons that had evidence for protocols. Membranes were blocked overnight in Tris-buffered saline Tween-20 differential splicing either from RNA-seq or splicing-sensitive microarray data. (TBST) and 5% non-fat dry milk (wt/vol) at 4 °C, incubated for 1 h at 20 °C with These coordinates were then compared with human gene structure primary antibodies and then again with horseradish peroxidise–linked second- annotations constructed analogously to mouse annotations as described above to ary antibodies to rabbit or mouse (GE Healthcare) in TBST with 1% milk. For determine whether the exon in the human ortholog was alternatively or constitu- primary antibodies, we used rabbit antibody to FUS/TLS (Bethyl Laboratories, tively spliced based on transcript evidence in human EST or mRNA databases49.

nature neuroscience doi:10.1038/nn.2779 Computational identification of CLIP-seq clusters. CLIP-seq reads were Functional annotation analysis. We used the Database for Annotation, trimmed to remove adaptor sequences and homopolymeric runs, and mapped Visualization and Integrated Discovery (DAVID Bioinformatic Resources 6.7; to the repeat-masked mouse genome (mm8) using the bowtie short-read align- http://david.abcc.ncifcrf.gov/). For all genes downregulated or upregulated on ment software (version 0.12.2) with parameters -q -p 4 -e 100 -y -a -m 10 --best TDP-43 depletion, a background corresponding to all genes expressed in brain --strata (-q for fastq file, -p for cpus, -e for mismatch threshold, -y to do a sensi- was used. tive search, -a to report all alignments, --best for ordered output, --strata report only the best group of alignments), incorporating the base-calling quality of the Transcriptome and splicing analysis. Strand-specific RNA-seq reads each from reads. To eliminate redundancies created by PCR amplification, we considered control oligonucleotide, saline- and TDP-43 oligonucleotide–treated animals all reads with identical sequence to be a single read. Significant clusters were were generated and ~50% mapped uniquely to our annotated gene structure calculated by first determining read number cutoffs using the Poisson ­distribution, database using the bowtie short-read alignment software (version 0.12.2, with λ k e−λ fk();λ = , where λ is the frequency of reads mapped over an interval of parameters -m 5 -k 5 --best --un --max -f) incorporating the base-calling quality k! of the reads. To eliminate redundancies created by PCR amplification, all reads nucleotide sequence, k is the number of reads being analyzed for significance and with identical sequence were considered single reads. The expression value of f(k;λ) return the probability that exactly k reads would be found. For any desired each gene was computed by RPKM. Each RNA-seq sample was summarized by P value, P cutoff, a read number cutoff was calculated by summing the probabil- a vector of RPKM values for every gene and pair-wise correlation coefficients ities for finding k or more tags, and determining the minimum value of k that were calculated for all replicates using a linear least-squares regression against satisfies i = k such that f(k;λ) > P cutoff. The frequency λ was calculated by divid- the log RPKM vectors. Hierarchical clustering revealed that the three treated ing the total number of mapped reads by the number of non-overlapping intervals conditions clustered into similar groups. The reads in each condition were com- present in the transcriptome. The interval size was chosen on the basis of the bined to identify genes that were significantly up- and downregulated on TDP-43 average size of the CLIP product, which includes only the selected RNA frag- depletion. Local mean and s.d. were calculated for the nearest 1,000 genes, as ments, but not any ligated adapters (150 bp). A global and a local cutoff were determined by log average RPKM values between knockdown and control and a determined using the whole transcriptome frequency or gene-specific frequency, local Z score was defined. The resulting Z scores were used to assign significantly respectively. The gene-specific frequency was the number of reads overlapping changed genes (Z > 2 upregulated, Z < –2 downregulated), as well as ranking that gene divided by the pre-mRNA length. A sliding window of 150 bp was used the entire gene list for relative expression changes. Specific parameters, such as to determine where the read numbers exceed both the global and local cutoffs. intron length or TDP-43–binding sites, were plotted for the next 100 genes. At each significant interval, we attempted to extend the region by adding in the Exons with canonical splice signals (GT-AG, AT-AC, GC-AG) were retained, next read, ensuring continued significance at the same P cutoff. resulting in a total of 190,161 exons. For each protein-coding gene, the 50 bases at the 3′ end of each exon were concatenated with the 50 bases at the 5′ end of Log curve analysis to determine target saturation of CLIP-derived clusters. We the downstream exon producing 1,827,013 splice junctions. An equal number used a curve-fitting approach with various sampling rates to estimate the number of ‘impossible’ junctions was generated by joining the 50-base exon junction of CLIP-seq reads required to discover additional CLIP clusters. The target rate was sequences in reverse order. To identify differentially regulated alternative cas- calculated by determining the new set of CLIP-derived gene targets found at each sette exons, we employed a modification of a published method24. In short, the step-wise increase in the number of sequenced reads. A scatter plot of targets found read count supporting inclusion of the exon (overlapping the cassette exon and versus sampling rate was fitted with a log curve, which was then used to extrapolate splice junctions including the exon) were compared with the read count support- the number of targets expected to be found by increasing read counts. ing exclusion of the exon (overlapping the splice junction of the upstream and downstream exon. For a splice junction read to be enumerated, we required that Splicing-sensitive microarray data analysis. Microarray data analysis was per- at least 18 nucleotides of the read aligned and 5 bases of the read extended over formed as previously described, selecting significant, high-quality events using the splice junction with no mismatches 5 bases around the splice junction. For a q value > 0.05 and an absolute separation score (sepscore) > 0.5 (ref. 27). The the TDP-43 and control ASOs comparison, we constructed a 2 × 2 contingency equation for sepscore = log [TDP-43 depleted (Skip/Include)/ Control treated 2 table using of the counts of the reads supporting the inclusion and exclusion of (Skip/Include)]. For each replicate set, the log ratio of skipping to inclusion was 2 the exon, in two conditions. Every cell in the 2 × 2 table had to contain at least estimated using robust least-squares analysis. Previously published work using 2 27 five counts for a χ statistic to be computed. At P < 0.01, 110 excluded and 93

© 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011 similar cutoffs have validated about 85% of splicing events by RT-PCR . included single cassette exons were detected to be differentially regulated by Events on the array were defined in the mm9 genome annotation, and for TDP-43. As an estimate of false discovery, we observed that ~20 single cassette proper comparison to RNA-seq and CLIP-seq data, cassette events were converted exons were detected by using the impossible junction database. to the mm8 genome annotation using the UCSC LiftOver tool. If an event did not exactly overlap an mm8 annotated exon, it was left out of further analyses. Pre-mRNA features and tissue-specificity analysis. Affymetrix microarray data Genomic analysis of CLIP clusters. Two biologically independent TDP-43 CLIP- representing 61 mouse tissues were downloaded from the Gene Expression seq libraries were generated and sequenced on the Illumina GA2. Subjecting reads Omibus repository (http://www.ncbi.nih.gov/geo) under accession number from each library to our cluster-identification pipeline described above defined GSE1133 (ref. 50). Probes on the microarrays were cross-referenced to 15,541 15,344 and 30,744 clusters for CLIP experiments 1 and 2, respectively. A gene genes in our database using files downloaded from the UCSC Genome Browser was considered to be a TDP-43 target that overlapped in both experiments if it (knownToGnf1m and knownToGnfAtlas2). The expression value for each gene contained an overlapping cluster. We generated ten randomly distributed cluster was represented by the average value of the two replicate microarray experiments sets and compared each to the original clusters. To compute the significance of for each tissue. To identify genes enriched in brain, we grouped 13 tissues as the overlap, we calculated the standard or Z-score as follows: (percent overlap in ‘brain’ (substantia nigra, hypothalamus, preoptic, frontal cortex, cerebral cortex, the two experiments – mean percent overlap in ten randomly distributed cluster amygdala, dorsal striatum, hippocampus, olfactory bulb, cerebellum, trigeminal, sets)/ s.d. of percent overlap in the randomly distributed cluster sets. A P value dorsal root ganglia, pituitary) and the remaining 44 tissues as ‘non-brain’, was computed from the standard normal distribution and assigned significance if ­excluding the five embryonic tissues (embryonic days 10.5, 9.5, 8.5, 7.5 and 6.5). ()µµ− it was lower than P < 0.01. To generate the final set of TDP-43 CLIP-seq clusters, For each gene, the t statistic was computed as t = brainnon-brain , w h e r e 22 unique reads from both experiments were combined and subjected to our cluster- ssbrainn+ on-brain identification pipeline. Overall, clusters were at most 300 bases in length, with the mbrain(sbrain) and mnon-brain(snon-brain) were the average (s.d.) of the gene expres- median of 142 bases. As a comparison to a published HITS-CLIP/CLIP-seq data- sion values in brain and non-brain tissues, respectively. At a t statistic value set of a RNA binding protein in mouse brain, we downloaded Ago-HITS-CLIP cutoff of ≥1.5, 388 genes were categorized as being brain enriched. Concurrently, reads (Brain[A–E]_130_50_fastq.txt) from http://ago.rockefeller.edu/rawdata. at a cutoff of <1.5, 15,153 genes were categorized as being non-brain enriched. php and subjected the combined 1,651,104 reads to our cluster-identification and Random sets of 388 genes were selected from the 15,541 genes as controls for generated 33,390 clusters in 7,745 genes21. To identify enriched motifs in cluster the brain-enriched set. To determine whether pre-mRNA features were signifi- regions, Z scores were computed for hexamers as previously published15. cantly different in the set of brain-enriched genes (or randomly chosen genes)

doi:10.1038/nn.2779 nature neuroscience compared with non-brain–enriched genes, we performed the two-sample ‘brain’ (temporal lobe, globus pallidus, cerebellum peduncles, cerebellum, cau- Kolmogorov-Smirnov goodness-of-fit hypothesis test, which determines date nucleus, whole brain, parietal lobe, medulla oblongata, amygdala, prefrontal whether the distribution of features were drawn from the same underlying con- cortex, occipital lobe, hypothalamus, thalamus, subthalamic nucleus, cingulated tinuous population. Cumulative distribution plots of pre-mRNA features were cortex, pons, fetal brain), and the remaining 62 tissues as ‘non-brain’. At the same generated to illustrate the differences. t statistic cutoffs, 387 and 17,985 genes were categorized as brain enriched and Affymetrix microarray data representing 79 human tissues were downloaded non-brain enriched, respectively. Random sets of 387 genes were selected from from the Gene Expression Omibus repository under the same accession number the 18,372 genes as controls for the brain-enriched set. GSE1133 (ref. 50). Probes on the human microarrays were cross-referenced to 49. Yeo, G.W., Van Nostrand, E.L. & Liang, T.Y. Discovery and analysis of evolutionarily 18,372 genes in our database using files downloaded from the UCSC Genome conserved intronic splicing regulatory elements. PLoS Genet. 3, e85 (2007). Browser (knownToGnfAtlas2 – hg18). The same analysis as done for the mouse 50. Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. array data was repeated for these human array data. We grouped 17 tissues as Proc. Natl. Acad. Sci. USA 101, 6062–6067 (2004). © 2011 Nature America, Inc. All rights reserved. All rights Inc. America, Nature © 2011

nature neuroscience doi:10.1038/nn.2779 Polymenidou et al SUPPLEMENTARY INFORMATION

Long pre‐mRNA depletion and RNA missplicing contribute to neuronal vulnerability from loss of TDP‐43

Polymenidou M*, Lagier‐Tourenne C*, Hutt KR*, Huelga SC, Moran J, Liang TY, Ling SC, Sun E, Wancewicz E, Mazur C, Kordasiewicz H, Sedaghat Y, Donohue JP, Shiue L, Bennett CF, Yeo GW and Cleveland DW

Supplementary Figure 1 TDP‐43 CLIP‐seq method, reproducibility and saturation of binding sites. (a) Schematic representation of cross‐linking, immunoprecipitation and high‐throughput sequencing protocol (CLIP‐seq). (b) Autoradiographs of CLIP‐seq experiments performed in parallel using two different antibodies, a commercially available rabbit polyclonal TDP‐43 antibody (Aviva Systems Biology, Cat. No. ARP38942_T100) left and an in‐house monoclonal antibody recognizing an epitope within amino acids 251‐414 of human TDP‐4316. Under identical conditions and exposure times, our monoclonal antibody showed a much higher potency of immunoprecipitating TDP‐ 43‐RNA complexes. The orange box depicts the excised low mobility complexes used for the cDNA libraries for comparison to the monomeric TDP‐43 complexes. (c) Reproducibility of CLIP‐seq experiments on a genome‐wide scale. Clusters identified in independent CLIP‐seq experiments were considered to overlap if 1 base of a cluster in one experiment extended to a cluster in the other experiment (left panel). A gene containing an overlapping cluster was considered to overlap (right panel). Venn diagrams revealed statistically significant overlap between clusters and genes in replicate CLIP‐seq libraries, with randomly distributed clusters shown as a comparison (lower panel). (d) Gene target saturation chart plotting gene targets found by cluster‐finding on increasing sampling rates (5% intervals). Data follows a logarithmic expansion with an R2 value of 0.98.

nature neuroscience 1

Nature Neuroscience: doi:10.1038/nn.2779 Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 2 Cluster strength and gene-location biases of TDP-43 binding. (a) Box plots of cluster strength (by χ² analysis) versus GUGU-cluster bins (counts of non-overlapping GUGU tetramer found within the cluster). By t-test comparison to all clusters, bins of 11-15 and 16+ GUGUs were significantly enriched (p < 0.01 and p < 10-37, respectively). On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually (b) Pre-mRNAs were divided into annotated 5′ and 3′ untranslated regions, exons, proximal introns (<0.5kb from a splice junction), distal introns (>0.5kb from a splice junction) and unannotated regions. Pie-charts revealed that the majority of TDP-43 binding sites were located in distal intronic regions of pre-mRNA (left panel). For comparison, previously published Argonaute (Ago) binding sites in mouse brain21 are displayed as analogous pie-charts (right panel). (c) CLIP binding sites of TDP-43 in the two replicates, Ago, and random TDP-43 clusters across genes. The fraction of CLIP clusters was plotted depending on the relative gene position from the 5’ to the 3’ ends of each target gene.

Supplementary Figure 3 RNA-seq biological replicas. Correlation between RNA-seq results obtained from mice subjected to different ASO treatments. Heatmap generated from Cluster3 and Matlab, using linear least-squares regression correlation coefficients of the pair-wise comparison between RPKM values of all genes from each experiment. “K” is TDP-43 knockdown samples, “C” is control oligo-, “S” is saline- treated samples, and numbers represent biological replicates.

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 2

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 4 Correlation between RNA-seq and CLIP-seq data using various parameters. (a) Genes were ranked randomly instead of upon their degree of regulation after TDP-43 depletion and the average TDP-43 CLIP clusters found in introns of the next 100 genes in the ranked list were plotted (green line). Similarly, the median of total intron length for the next 100 genes was plotted (blue line). (b) Genes were ranked by ascending expression (RPKM) values in the samples treated with control ASO. (c) Genes were ranked by ascending expression (RPKM) values in the samples treated with TDP-43 ASO. (d) Genes were ranked upon their degree of regulation after TDP-43 depletion and average Ago clusters found in introns of the next 100 genes (green line) or the total intron length for the next 100 genes (blue line) were plotted. (e) Genes were ranked upon their degree of regulation after TDP-43 depletion and average clusters found in 5’UTR for the next 100 genes (green line) or the total 5’UTR length for the next 100 genes (blue line) were plotted. (f) Genes were ranked upon their degree of regulation after TDP-43 depletion and average clusters found in 3’UTR for the next 100 genes (green line) or the total 3’UTR length for the next 100 genes (blue line) were plotted. (g) Genes were ranked upon their degree of regulation after TDP-43 depletion and average clusters found in exons for the next 100 genes (green line) or the total exon length for the next 100 genes (blue line) were plotted. No correlation between CLIP and RNA- seq data was observed (a-f). (h) Genes were ranked upon their degree of regulation after TDP-43 depletion and average clusters found in introns for the next 100 genes (green line) or the average intron length for the next 100 genes (blue line) were plotted. (i) Genes were ranked upon their degree of regulation after TDP-43 depletion and average cluster density (cluster count/intron length) for the next 100 genes was plotted.

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 3

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 5 The calcium signaling pathway is affected by TDP‐43 depletion. (a) The calcium‐signaling pathway annotated by KEGG (Kyoto Encyclopedia of Genes and Genomes) was found to be significant using the functional annotation tool DAVID. Red‐bordered gene groups contain TDP‐43 targets that are downregulated upon TDP‐43 depletion. (b) List of downregulated TDP‐43 targets contributing to calcium‐signaling pathway

Naturenature neuroscience Neuroscience: doi:10.1038/nn.2779 4

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 6 The long-term potentiation pathway is affected by TDP-43 depletion. (a) The long-term potentiation pathway annotated by KEGG was found to be significant using the functional annotation tool DAVID. Red- bordered gene groups contain TDP-43 targets that are downregulated upon TDP-43 depletion. (b) List of downregulated TDP-43 targets contributing to long-term potentiation pathway.

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 5

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 7 Representative examples of genes with long introns and multiple TDP-43 binding sites. (a) Most transcripts in this group (mean intron length >10kb) displayed multiple intronic TDP-43 binding sites as determined by CLIP-seq (purple bars above each gene). In contrast, Chat is an example of a downregulated gene with short intron length (mean intron length <10kb) that shows a single TDP-43 binding site. The number of clusters per gene (n) is in purple on the left. Note that each cluster represents a binding cluster defined by a collection of overlapping reads as shown for Park2 (inset). (b) Examples of conservation of gene structures in human neurexin 3 and parkin 2 showing similar genic architectures as their mouse orthologues.

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 6

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 8 TDP-43 binding on the 3′UTR of RNAs represses gene expression. (a) Normalized expression (based on RPKM values from RNA-seq) of selected up-regulated genes upon TDP-43 depletion that also contain TDP-43 binding regions in their 3’UTR. (b) Enrichment for binding within the 3’UTR among upregulated TDP-43 targets, compared to unchanged genes. *Upregulated with intronic binding versus unchanged with intronic binding (p<1.8x10-6), **Upregulated with 3’UTR binding versus unchanged with 3’UTR binding (p<8.2x10-3), ***Upregulated with 3’UTR binding versus downregulated with 3’UTR binding (p < 1.4 x 10-2). P-values calculated by chi-square.

Supplementary Figure 9 Splicing changes upon TDP-43 depletion identified by exon- junction arrays. Distribution of alternative splicing events upon TDP-43 depletion detected using splicing-sensitive microarrays. Of the 326 cassette events defined by Affymetrix on mm9 annotation, 287 were annotated by our stringent gene structure annotations in mm8, and were used for comparisons to RNA-seq and CLIP- seq data.

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 7

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 10 Examples of alternative splicing regulation by TDP-43. RNA targets with exons included (a) or excluded (b) upon TDP-43 knockdown (same as in Fig. 4d). Semi-quantitative RT-PCR of selected targets showing splice changes in samples with TDP-43 knockdown compared to controls. Schematic representation of assessed exons with the changed exon(s) colored orange (left). The exon number of the gene is indicated below each box. The red arrows depict the position of the primers used for RT-PCR. Purple boxes indicate TDP-43 binding sites as defined by CLIP-seq. Representative acrylamide gel pictures of RT-PCR products from control or knockdown adult brain samples, as indicated (right panel). Quantification of splicing changes from three biological replicas per group is shown in the middle panel.

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 8

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 11 Splicing changes correlate with the level of TDP-43 knockdown. (a) Acrylamide gels with semi- quantitative RT-PCR samples showing sortilin 1 (Sort1) splicing changes in mice with various levels of TDP-43 reduction (shown as percentages above each bar). (b) Significant correlation of TDP-43 levels to Sort1 splicing changes (correlation value = -0.905, n=27). (c) Acrylamide gels with semi-quantitative RT-PCR samples showing Kcnip splicing changes in mice with various levels of TDP-43 reduction (shown as percentages above each bar on the lowest panel). (D-E) Significant correlation of TDP-43 levels to Kcnip splicing changes (correlation value = 0.91 for exon 3 and 0.83 for exon 2 and 3, n=12).

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 9

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 12 TDP-43 3’UTRs used in luciferase assays and control immunoblots. (a) Two alternative 3’UTRs of TDP-43 were amplified from mouse or human cDNA using primers whose approximate locations are shown by the small red arrows below the schemes on the right. The first 1.7kb or 1.1kb of each long (unspliced) or short (spliced) 3’UTR respectively were then cloned into a renilla luciferase reporter vector. The latter also contains the firefly luciferase gene that is expressed independently from renilla luciferase and serves as a transfection normalization control. (b) Representative immunoblots of HeLa cell lysates used for luciferase assays in Fig. 5F showing increased expression of TDP-43 in samples transfected with a myc-hTDP-43-HA expressing vector. For control, cells were transfected with a vector expressing an unrelated protein, namely red fluorescence protein (RFP), whose levels were confirmed by immunoblot (second panel). (c) Representative immunoblots of HeLa cell lysates used for Fig. 5F,G showing decreased levels of UPF1 in samples transfected with UPF1 siRNA.

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 10

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 13 TDP-43 binding and regulation of Hdac6 and Nefl transcripts. (a) CLIP-seq reads and clusters on Hdac6 transcript showing intronic TDP-43 binding. RNA-seq reads from control or TDP-43 knockdown samples (equal scales) show a slight decrease in Hdac6 mRNA in the TDP-43 knockdown group. (b) Quantitative RT-PCR confirmed the slight decrease in Hdac6 mRNA upon TDP-43 depletion. Standard deviation was calculated within each group for 3 biological replicas. (c) CLIP-seq reads and clusters on Nefl mRNA showing two distinct TDP-43 binding sites within the 3’UTR. Our RNA-seq reads indicate that the 3’UTR of mouse Nefl is longer than reported in UCSC database (light blue bar indicates the extention). RNA-seq reads from control or TDP-43 knockdown samples show a slight decrease in Nefl mRNA in the TDP-43 knockdown group, in accordance with previous report.

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 11

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Figure 14 TDP-43 binding patterns on transcripts encoding for selected disease-related proteins. The number of clusters per gene (n) is in purple on the left. Note that each cluster represents a binding site defined by a collection of overlapping reads as shown for ataxin 1 (inset).

Naturenature Neuroscience: neuroscience doi:10.1038/nn.2779 12

Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Table 1 Genes upregulated upon TDP-43 depletion in adult mouse brain

Number of CLIP-seq clusters RNA-seq

Refseq/mRNA in in in RPKM in RPKM in Ratio Gene symbol Protein Total in 3'UTR Z score identifier 5'UTR introns exons TDP-43 KD control KD/control Oasl2 Oasl2 protein BC034361 0 0 0 0 0 29.97 2.86 10.49 11.66 Interferon-induced protein with Ifit1 NM_008331 0 0 0 0 0 23.26 2.11 11.03 11.23 tetratricopeptide repeats 1 (IFIT-1) Interferon-induced protein with Ifit3 NM_010501 0 0 0 0 0 20.03 3.07 6.53 8.15 tetratricopeptide repeats 3 (IFIT-3) Bst2 DAMP-1 protein NM 198095 0 0 0 0 0 16.05 3.21 5.00 6.96 5830458K16Rik Receptor transporting protein 4 NM 023386 0 0 0 0 0 10.03 1.44 6.98 6.86 Signal transducer and activator of Stat1 AK046517 0 0 0 0 0 20.59 4.52 4.56 6.81 transcription 1 Gpnmb Glycoprotein (transmembrane) nmb AK079220 0 0 0 0 0 70.72 16.86 4.20 6.62 Lectin, galactoside-binding, soluble, 3 Lgals3bp NM_011150 0 0 0 0 0 73.81 17.96 4.11 6.61 binding Interferon induced transmembrane protein Ifitm3 NM_025378 0 0 0 0 0 30.74 8.52 3.61 6.58 3 Tagln2 Transgelin 2 BC049861 0 0 0 0 0 15.17 3.62 4.19 6.18 Interferon-induced protein with Ifit3 AK077243 1 0 0 0 1 9.91 1.72 5.75 6.15 tetratricopeptide repeats 3 (IFIT-3) Irgm Interferon inducible protein 1 AK002545 0 0 0 0 0 14.53 3.68 3.95 6.02 Vim Vimentin NM 011701 0 0 0 0 0 111.40 31.33 3.56 5.92 C3 Complement component 3 NM 009778 0 0 0 0 0 8.71 1.57 5.56 5.91 Gbp2 Guanylate nucleotide binding protein 2 NM 010260 0 0 0 0 0 7.49 1.33 5.61 5.89 Cd5l CD5 antigen-like NM 009690 0 0 0 0 0 6.39 0.98 6.50 5.76 Cd44 CD44 antigen isoform b NM 001039150 0 0 0 0 0 12.36 2.87 4.30 5.69 Serine (or cysteine) proteinase inhibitor, Serping1 NM_009776 0 0 0 0 0 14.77 3.96 3.73 5.65 clade C3ar1 Complement component 3a receptor 1 NM 009779 0 0 0 0 0 16.68 4.72 3.53 5.40 Tripartite motif protein 30 (Down regulatory Trim30 AF220014 0 0 0 0 0 8.96 1.91 4.69 5.39 protein of interleukin 2 receptor) Cybb Cytochrome b-245, beta polypeptide NM 007807 0 0 0 0 0 5.49 0.86 6.36 5.32 Cd84 CD84 antigen AK171067 0 0 0 0 0 22.69 7.54 3.01 5.29 A2m Alpha-2-macroglobulin NM 175628 0 0 0 0 0 11.32 2.76 4.10 5.25 Ifi27 Interferon stimulated gene 12(b1) NM 029803 0 0 0 0 0 4.29 0.55 7.76 5.22 Grn Granulin NM 008175 1 0 0 0 1 70.17 23.12 3.04 5.22 Fbxo39 Novel TRAF-type finger protein AK046736 0 0 0 0 0 11.26 2.78 4.05 5.20 Ccnd1 Cyclin D1 NM 007631 0 0 0 0 0 33.11 11.12 2.98 5.19 Lgals3 Galectin 3 NM 010705 0 0 0 0 0 12.07 3.31 3.65 5.11 Cdkn1a Cyclin-dependent kinase inhibitor 1A NM 007669 1 0 1 0 0 26.03 9.17 2.84 5.05 Usp18 Ubiquitin specific protease 18 NM 011909 0 0 0 0 0 4.11 0.57 7.25 5.05 Gbp1 Guanylate nucleotide binding protein 1 NM 010259 0 0 0 0 0 6.55 1.40 4.67 5.05 Eif2ak2 Protein kinase, interferon-inducible double NM 011163 0 0 0 0 0 11.99 3.26 3.68 5.05 Apobec1 Apolipoprotein B mRNA editing enzyme BC003792 0 0 0 0 0 11.28 2.94 3.84 5.00 Spp1 Secreted phosphoprotein 1 NM 009263 0 0 0 0 0 35.34 12.35 2.86 4.93 TYRO protein tyrosine kinase binding Tyrobp NM_011662 0 0 0 0 0 37.90 13.18 2.87 4.90 protein Lgals9 Lectin, galactose binding, soluble 9 NM 010708 0 0 0 0 0 16.47 5.34 3.09 4.90 Slfn8 Schlafen 8 NM 181545 0 0 0 0 0 5.31 0.97 5.45 4.84 Cst7 Cystatin F (leukocystatin) NM 009977 0 0 0 0 0 12.31 3.60 3.42 4.83 Insulin-like growth factor I precursor (IGF-I) Igf1 AY878193 0 0 0 0 0 7.26 1.73 4.20 4.80 (Somatomedin) Tgm1 Transglutaminase 1, K polypeptide NM 019984 0 0 0 0 0 4.50 0.69 6.48 4.80 Fcer1g Fc receptor, IgE, high affinity I, gamma NM 010185 0 0 0 0 0 33.93 12.36 2.75 4.72 DEAD (Asp-Glu-Ala-Asp) box polypeptide Ddx58 AK155930 0 0 0 0 0 6.74 1.66 4.07 4.70 58 Isgf3g Interferon dependent positive acting NM 008394 0 0 0 0 0 15.30 5.00 3.06 4.68 Complement component 1, q C1qa NM_007572 0 0 0 0 0 137.39 50.81 2.70 4.66 subcomponent, A chain H2-K1 H2-K1 protein BC080756 0 0 0 0 0 31.08 11.60 2.68 4.66 Parp9 B aggressive lymphoma NM 030253 0 0 0 0 0 6.64 1.68 3.94 4.56 Cd52 CD52 antigen NM 013706 0 0 0 0 0 10.40 2.87 3.62 4.53 H-2 class I histocompatibility antigen, D-B H2-D1 AK159572 0 0 0 0 0 44.59 16.67 2.67 4.53 alpha chain precursor (H- 2D(B)) Cd68 Macrosialin precursor (CD68 antigen) BC021637 0 0 0 0 0 38.54 14.73 2.62 4.51 S100a6 S100 calcium binding protein A6 (calcyclin) NM_011313 0 0 0 0 0 11.39 3.44 3.31 4.50 Viral hemorrhagic septicemia virus(VHSV) Rsad2 NM_021384 0 0 0 0 0 4.28 0.70 6.12 4.49 induced B2m Beta-2-microglobulin NM 009735 0 0 0 0 0 139.63 54.00 2.59 4.45 Low affinity immunoglobulin gamma Fc Fcgr2b NM_010187 0 0 0 0 0 11.96 3.85 3.11 4.42 region receptor II precursor Slfn2 Schlafen 2 NM 011408 0 0 0 0 0 7.60 2.09 3.64 4.41 Actin related protein 2/3 complex, subunit Arpc1b AK171364 0 0 0 0 0 11.69 3.81 3.07 4.32 1B Highly similar to Mus musculus interferon- Iigp2 AK128991 0 0 0 0 0 8.11 2.26 3.59 4.32 gamma induced GTPase Ly9 Ly9 protein BC095921 0 0 0 0 0 9.30 2.77 3.37 4.20 Sn Sialoadhesin NM 011426 0 0 0 0 0 3.25 0.55 5.88 4.18 Slfn9 Schlafen 9 NM 172796 0 0 0 0 0 3.48 0.60 5.81 4.15 Weakly similar to B-lymphoma- and BAL- Dtx3l BC085093 0 0 0 0 0 8.50 2.56 3.32 4.11 associated protein (Rhysin 2) NM 205820 Toll-like receptor 13 NM 205820 0 0 0 0 0 7.47 2.23 3.36 4.11 Lyzs Lysozyme C type M precursor AK159276 0 0 0 0 0 111.97 46.71 2.40 4.11 Cxcr4 C-X-C chemokine receptor type 4 NM 009911 0 0 0 0 0 6.74 1.94 3.47 4.09 Slfn5 Schlafen 5 NM 183201 0 0 0 0 0 8.63 2.60 3.32 4.08 Msn Moesin NM 010833 0 0 0 0 0 33.37 14.04 2.38 4.08 Lilrb4 Leukocyte immunoglobulin-like receptor, NM 013532 0 0 0 0 0 5.25 1.29 4.07 4.07 Poly (ADP-ribose) polymerase family, Parp14 NM_001039530 0 0 0 0 0 6.08 1.65 3.70 4.05 member 14 Cd48 CD48 antigen NM 007649 0 0 0 0 0 8.11 2.49 3.26 4.04

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 13 Polymenidou et al SUPPLEMENTARY INFORMATION

Edg3 Endothelial differentiation, sphingolipid NM 010101 0 0 0 0 0 10.24 3.29 3.11 4.03 Lysosomal-associated protein Laptm5 NM_010686 0 0 0 0 0 53.57 23.32 2.30 4.03 transmembrane 5 Msr2 Macrophage scavenger receptor 2 NM 030707 0 0 0 0 0 60.70 25.61 2.37 4.00 Cellular repressor of E1A-stimulated genes Creg1 NM_011804 0 0 0 0 0 25.09 11.02 2.28 3.99 1 Emp1 Epithelial 1 NM 010128 0 0 0 0 0 5.99 1.59 3.76 3.99 Hmox1 Heme oxygenase (decycling) 1 NM 010442 0 0 0 0 0 12.84 4.95 2.60 3.96 Zbtb20 Zinc finger protein 288 BC056446 27 0 27 0 0 10.94 3.79 2.89 3.95 Triggering receptor expressed on myeloid Trem2 NM_031254 0 0 0 0 0 46.13 19.85 2.32 3.92 cells Cd53 CD53 antigen NM 007651 0 0 0 0 0 20.62 9.15 2.25 3.84 Angiogenin, ribonuclease A family, member Ang1 NM_007447 0 0 0 0 0 21.94 9.78 2.24 3.79 1 Ms4a7 MS4A7 protein homolog AK079888 0 0 0 0 0 5.29 1.41 3.74 3.78 ATP-binding cassette sub-family A member Abca1 NM_013454 6 0 6 0 0 32.69 14.78 2.21 3.73 1 Ctsz Cathepsin Z preproprotein NM 022325 0 0 0 0 0 55.45 25.30 2.19 3.73 E030016B05 product:embryonic epithelial Slc43a3 AK086962 0 0 0 0 0 4.91 1.29 3.82 3.73 gene 1 AI451617 Hypothetical protein LOC209387 NM 199146 0 0 0 0 0 4.37 1.02 4.28 3.72 Transporter 1, ATP-binding cassette, sub- Tap1 NM_013683 0 0 0 0 0 4.36 1.03 4.22 3.68 family Ctsd Cathepsin D AK152638 2 0 2 0 0 333.96 152.95 2.18 3.68 Osmr Oncostatin M receptor beta AB015978 0 0 0 0 0 8.11 2.73 2.97 3.67 Cd300lf Polymeric immunoglobulin receptor 3 AF251703 0 0 0 0 0 3.16 0.65 4.86 3.67 Cyba Cytochrome b-245, alpha polypeptide NM 007806 0 0 0 0 0 12.34 4.92 2.51 3.67 Tapbp TAP binding protein isoform 1 NM 001025313 0 0 0 0 0 18.91 8.34 2.27 3.66 Complement component 1, q C1qb NM_009777 2 0 2 0 0 110.40 50.84 2.17 3.65 subcomponent, B chain Cd180 CD180 antigen NM 008533 0 0 0 0 0 9.19 3.20 2.87 3.63 Ifit2 Interferon-induced protein with NM 008332 0 0 0 0 0 9.25 3.26 2.84 3.61 Lgals1 Lectin, galactose binding, soluble 1 NM 008495 0 0 0 0 0 9.55 3.44 2.77 3.56 Gfap Glial fibrillary acidic protein AK140151 0 0 0 0 0 98.80 46.64 2.12 3.54 Melanoma differentiation associated Ifih1 NM_027835 0 0 0 0 0 5.25 1.57 3.35 3.53 protein-5 Tmem106a Hypothetical protein LOC217203 NM 144830 0 0 0 0 0 4.01 0.94 4.25 3.52 Cxcl5 Chemokine (C-X-C motif) ligand 5 NM 009141 0 0 0 0 0 2.79 0.60 4.63 3.49 Slc15a3 15, member 3 NM 023044 0 0 0 0 0 6.74 2.34 2.88 3.49 Complement component 1, q C1qg NM_007574 0 0 0 0 0 82.06 39.22 2.09 3.48 subcomponent, gamma Ly86 Lymphocyte antigen 86 NM 010745 0 0 0 0 0 16.67 7.32 2.28 3.47 AK029112 Cybrd1 protein AK029112 0 0 0 0 0 4.58 1.24 3.70 3.47 Ftl1 Ferritin light chain 1 NM 010240 0 0 0 0 0 33.68 16.07 2.10 3.45 AK141670 Lysosomal acid lipase 1 AK141670 1 0 1 0 0 18.56 8.58 2.16 3.45 Plscr2 Phospholipid scramblase 2 NM 008880 0 0 0 0 0 3.30 0.75 4.41 3.44 Ctss Cathepsin S AK028366 0 0 0 0 0 107.38 52.02 2.06 3.42 Lcn2 Lipocalin 2 NM 008491 0 0 0 0 0 4.10 1.03 3.99 3.42 Tlr2 Toll-like receptor 2 NM 011905 0 0 0 0 0 5.61 1.75 3.21 3.40 Unc93b1 Unc93 homolog B NM 019449 1 0 1 0 0 20.40 9.91 2.06 3.38 Cd274 CD274 antigen NM 021893 0 0 0 0 0 4.75 1.37 3.47 3.38 Cd9 CD9 antigen NM 007657 0 0 0 0 0 44.66 21.83 2.05 3.33 Apoc1 Apolipoprotein C-I NM 007469 0 0 0 0 0 3.18 0.74 4.29 3.32 Hexa Hexosaminidase A AK159091 3 0 3 0 0 34.64 16.94 2.04 3.32 Igfbp5 Insulin-like growth factor binding protein 5 NM 010518 0 0 0 0 0 128.27 63.50 2.02 3.32 Anxa2 Anxa2 protein BC004659 0 0 0 0 0 7.60 2.86 2.65 3.29 Glipr1 GLI pathogenesis-related 1 (glioma) NM 028608 0 0 0 0 0 3.45 0.84 4.09 3.29 Pdpn Glycoprotein 38 precursor (GP38) BC026551 0 0 0 0 0 15.04 6.93 2.17 3.29 Psmb8 Proteasome subunit beta type 8 precursor BC013785 0 0 0 0 0 3.36 0.81 4.14 3.29 Hspb6 Heat shock protein, alpha-crystallin-related NM_001012401 0 0 0 0 0 7.11 2.68 2.65 3.23 5430435G22Rik Ras-related protein Rab-7b AK030688 0 0 0 0 0 4.86 1.51 3.22 3.23 Tmem10 Transmembrane protein 10 NM 153520 1 0 1 0 0 14.00 6.33 2.21 3.23 Trim25 Tripartite motif protein 25 AK169562 0 0 0 0 0 6.39 2.35 2.72 3.22 H2-T23 Histocompatibility 2, T region locus 23 NM 010398 0 0 0 0 0 12.19 5.55 2.20 3.21 Ms4a6c Ms4a6c protein BC062247 0 0 0 0 0 2.10 0.52 4.08 3.20 Axl AXL receptor tyrosine kinase NM 009465 1 0 1 0 0 20.86 10.40 2.01 3.20 MHC class I Q4 beta-2-microglobulin (Qb- H2-Q1 AK037574 0 0 0 0 0 7.52 2.85 2.64 3.19 1) Tubulin beta-5 chain (Beta-tubulin class-V) Tubb6 BC008225 0 0 0 0 0 2.19 0.54 4.03 3.16 homolog Tnfrsf1a Tumor necrosis factor receptor superfamily NM_011609 0 0 0 0 0 10.35 4.37 2.37 3.15 Lcp1 Lymphocyte cytosolic protein 1 NM 008879 0 0 0 0 0 10.43 4.40 2.37 3.14 Glycoprotein galactosyltransferase alpha Ggta1 AK088605 0 0 0 0 0 8.84 3.48 2.54 3.14 1, 3 Lrrc47 Leucine rich repeat containing 47 NM 201226 2 0 0 1 1 35.00 17.95 1.95 3.13 Dehydrogenase/reductase (SDR family) Dhrs1 NM_026819 0 0 0 0 0 42.45 21.35 1.99 3.13 member 1 Cd22 CD22 antigen AK171589 0 0 0 0 0 4.48 1.36 3.29 3.12 Arhgdib Rho, GDP dissociation inhibitor (GDI) beta NM_007486 0 0 0 0 0 19.72 9.76 2.02 3.09 , erythrocyte (Urea Slc14a1 AK153891 0 0 0 0 0 12.80 6.01 2.13 3.09 transporter B) (UT-B) Trim21 Tripartite motif protein 21 AK141294 0 0 0 0 0 3.70 1.03 3.59 3.08 Nfe2l2 Nuclear factor erythroid 2 related factor 2 U20532 0 0 0 0 0 17.09 8.45 2.02 3.08 Fabp7 Fatty acid binding protein 7, brain NM 021272 0 0 0 0 0 10.95 4.83 2.27 3.08 Novel protein similar to extracellular BC089618 BC089618 0 0 0 0 0 2.71 0.69 3.93 3.06 proteinase inhibitor Expi Secreted frizzled-related sequence protein Sfrp5 NM_018780 0 0 0 0 0 3.18 0.83 3.81 3.05 5 D12Ertd647e Hypothetical protein LOC52668 isoform 2 NM 194066 0 0 0 0 0 16.90 8.51 1.99 3.04 Capg Gelsolin-like capping protein NM 007599 0 0 0 0 0 3.46 0.93 3.70 3.03 Phosphoinositide-3-kinase adaptor protein Pik3ap1 NM_031376 0 0 0 0 0 5.62 1.99 2.83 3.03 1 Timp1 Metalloproteinase inhibitor 1 precursor BC008107 0 0 0 0 0 1.86 0.50 3.70 3.01

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 14 Polymenidou et al SUPPLEMENTARY INFORMATION

Cd63 Cd63 antigen NM 007653 0 0 0 0 0 26.25 14.29 1.84 2.99 1810029B16Rik Hypothetical protein LOC66282 NM 025465 0 0 0 0 0 6.39 2.51 2.55 2.99 Ctsc Cathepsin C preproprotein NM 009982 0 0 0 0 0 10.99 4.96 2.21 2.98 Gusb Beta-glucuronidase structural BC004616 0 0 0 0 0 11.23 5.15 2.18 2.98 Stat2 Stat2 protein AF206162 0 0 0 0 0 14.11 6.79 2.08 2.97 Poly (ADP-ribose) polymerase family, Parp12 NM_172893 1 0 1 0 0 7.20 2.87 2.51 2.97 member 12 Tcn2 Transcobalamin 2 NM 015749 0 0 0 0 0 19.96 10.54 1.89 2.96 Fcgr1 Fc receptor, IgG, high affinity I NM 010186 0 0 0 0 0 6.63 2.64 2.52 2.96 Colony stimulating factor 2 receptor, beta Csf2rb2 AK155626 0 0 0 0 0 2.04 0.55 3.70 2.95 2, low-affinity (granulocyte-macrophage) C5r1 Complement component 5, receptor 1 NM 007577 0 0 0 0 0 3.23 0.88 3.67 2.95 Cd151 Cd151 protein BC012236 0 0 0 0 0 13.80 6.75 2.05 2.93 CKLF-like MARVEL transmembrane Cmtm3 domain-containing protein 3 (Chemokine- BC003230 0 0 0 0 0 7.46 3.07 2.43 2.92 like factor superfamily member 3) Epsti1 Epithelial stromal interaction 1 isoform b NM 178825 0 0 0 0 0 1.90 0.52 3.62 2.91 Bzrp Peripheral benzodiazepine receptor NM 009775 0 0 0 0 0 4.33 1.42 3.06 2.88 0610039P13Rik Hypothetical protein LOC74096 NM 028752 0 0 0 0 0 9.16 3.95 2.32 2.88 Rpl39 Ribosomal protein L39 NM 026055 0 0 0 0 0 11.50 5.32 2.16 2.87 Scotin Scotin isoform 1 NM 025858 1 0 0 0 1 22.93 12.42 1.85 2.87 Bcl3 B-cell leukemia/lymphoma 3 NM 033601 0 0 0 0 0 2.60 0.71 3.68 2.85 Cnn3 Calponin 3, acidic NM 028044 2 0 1 0 1 26.55 14.74 1.80 2.85 Ifi47 Interferon gamma inducible protein 47 NM 008330 0 0 0 0 0 1.77 0.52 3.40 2.83 Adfp Adipose differentiation related protein NM 007408 0 0 0 0 0 5.59 2.19 2.55 2.83 Tumor necrosis factor receptor Tnfrsf1b NM_011610 0 0 0 0 0 7.31 3.11 2.35 2.83 superfamily, Lhfpl2 Lipoma HMGIC fusion partner-like 2 NM 172589 1 0 1 0 0 11.88 5.89 2.02 2.82 Ifi35 Interferon-induced protein 35 NM 027320 0 0 0 0 0 3.02 0.84 3.57 2.82 Emp3 Epithelial membrane protein 3 NM 010129 0 0 0 0 0 2.29 0.63 3.66 2.82 Tcf7l2 Transcription factor 7-like 2 AJ223070 9 0 9 0 0 5.23 1.95 2.69 2.82 Hypothetical Peptidase family AK128969 AK128969 0 0 0 0 0 4.50 1.50 3.00 2.81 M20/M25/M40 containing protein Tgm2 Transglutaminase 2, C polypeptide NM 009373 0 0 0 0 0 7.86 3.34 2.35 2.81 Lgmn Legumain NM 011175 4 0 4 0 0 49.09 27.55 1.78 2.81 Ucp2 Uncoupling protein 2 NM 011671 0 0 0 0 0 11.71 5.81 2.01 2.79 Ela1 Elastase 1, pancreatic NM 033612 0 0 0 0 0 2.64 0.74 3.57 2.79 DNA segment, Chr 11, Lothar D11Lgp2e NM_030150 0 0 0 0 0 2.49 0.69 3.61 2.76 Hennighausen 2, Cd74 Ia-associated invariant chain NM 010545 0 0 0 0 0 7.68 3.34 2.30 2.73 Tgfbr2 Transforming growth factor, beta receptor II NM_009371 0 0 0 0 0 10.18 4.75 2.14 2.73 Rac2 RAS-related C3 botulinum substrate 2 NM 009008 0 0 0 0 0 4.47 1.58 2.83 2.72 Apoe Apolipoprotein E NM 009696 0 0 0 0 0 556.59 314.21 1.77 2.71 Olig2 Oligodendrocyte transcription factor 2 NM 016967 0 0 0 0 0 13.58 6.98 1.94 2.71 Anxa3 Annexin A3 AK169423 0 0 0 0 0 9.95 4.68 2.13 2.71 Apod Apolipoprotein D NM 007470 0 0 0 0 0 84.02 47.53 1.77 2.70 Slc11a1 Solute carrier family 11 NM 013612 0 0 0 0 0 6.06 2.55 2.38 2.70 Cxcl16 Chemokine (C-X-C motif) ligand 16 NM 023158 0 0 0 0 0 4.46 1.58 2.83 2.70 Rab31 Rab31 protein BC013063 9 0 9 0 0 36.69 20.69 1.77 2.70 Fcgr3 Fcgr3 protein BC052819 0 0 0 0 0 26.35 15.06 1.75 2.69 BC016112 Col20a1 protein BC016112 0 0 0 0 0 2.62 0.76 3.43 2.68 P2ry13 Purinergic receptor P2Y13 NM 028808 0 0 0 0 0 11.97 6.17 1.94 2.68 Ctsh Cathepsin H NM 007801 1 0 1 0 0 13.22 6.76 1.96 2.67 1810009M01Rik LR8 protein NM_023056 0 0 0 0 0 23.57 13.38 1.76 2.66 Mcam Cell surface glycoprotein MUC18 precursor AB035509 0 0 0 0 0 9.44 4.35 2.17 2.66 Rps27l Ribosomal protein S27-like NM 026467 0 0 0 0 0 7.61 3.38 2.25 2.65 Abhd4 Abhydrolase domain containing 4 NM 134076 0 0 0 0 0 38.83 21.78 1.78 2.65 Zfp36l1 Zinc finger protein 36, C3H type-like 1 NM 007564 0 0 0 0 0 17.26 9.32 1.85 2.64 Ctsl Cathepsin L precursor AK159511 0 0 0 0 0 62.27 35.74 1.74 2.64 B-cell leukemia/lymphoma 2 related Bcl2a1b NM_007534 0 0 0 0 0 2.30 0.67 3.44 2.63 protein A1b Ctsb Cathepsin B preproprotein NM 007798 1 0 0 0 1 174.71 100.36 1.74 2.63 Hepatocellular carcinoma-associated 0610011I04Rik NM_025326 0 0 0 0 0 10.51 5.16 2.04 2.63 antigen 112 Dbi Diazepam binding inhibitor isoform 1 NM 001037999 0 0 0 0 0 55.28 31.49 1.76 2.62 Prg1 Proteoglycan 1, secretory granule NM 011157 0 0 0 0 0 8.49 3.85 2.20 2.61 Icam1 Intercellular adhesion molecule NM 010493 0 0 0 0 0 4.49 1.70 2.65 2.60 Hypothetical Alpha-macroglobulin receptor domain/Terpenoid cylases/Protein Cd109 BC052443 0 0 0 0 0 2.38 0.70 3.39 2.59 prenyltransferases structure containing protein Ccl3 Chemokine (C-C motif) ligand 3 NM 011337 0 0 0 0 0 3.94 1.33 2.96 2.59 Wiskott-Aldrich syndrome protein Waspip NM_153138 2 0 2 0 0 12.07 6.37 1.90 2.59 interacting Npc2 Niemann Pick type C2 NM 023409 0 0 0 0 0 43.30 24.85 1.74 2.59 -4 (AQP-4) (WCH4) (Mercurial- Aqp4 BC024526 1 0 1 0 0 35.71 20.55 1.74 2.59 insensitive water channel) (MIWC) Rps14 Ribosomal protein S14 NM 020600 1 0 1 0 0 21.63 12.39 1.75 2.57 Poly (ADP-ribose) polymerase family, Parp3 NM_145619 0 0 0 0 0 5.57 2.37 2.35 2.57 member 3 Trim14 Tripartite motif-containing 14 AK148837 0 0 0 0 0 2.07 0.64 3.23 2.57 Hypothetical CD9/CD37/CD63 antigens Tspan2 AK020982 1 0 0 0 1 44.17 25.55 1.73 2.57 containing protein Cyr61 Cysteine rich protein 61 NM 010516 0 0 0 0 0 5.91 2.59 2.28 2.57 Aebp1 AEBP1 protein BC082577 0 0 0 0 0 8.49 3.94 2.16 2.56 Tumor necrosis factor, alpha-induced Tnfaip2 AK154405 0 0 0 0 0 2.85 0.87 3.27 2.56 protein 2 Dnase1l1 Deoxyribonuclease I-like 1 precursor BC023246 0 0 0 0 0 1.97 0.62 3.19 2.56 Tnc Tenascin C (Hexabrachion) AK085667 0 0 0 0 0 2.68 0.82 3.28 2.55 Anxa5 Annexin A5 NM 009673 1 0 1 0 0 21.03 12.04 1.75 2.55 Ephx1 Ephx1 protein BC057857 1 0 1 0 0 18.49 10.10 1.83 2.54 AK153119 Hypothetical protein AK153119 0 0 0 0 0 2.37 0.71 3.33 2.54 Tlr7 Toll-like receptor 7 precursor AK154906 0 0 0 0 0 6.81 3.09 2.20 2.53 Mmp2 Matrix metalloproteinase 2 NM 008610 0 0 0 0 0 2.85 0.88 3.24 2.53

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 15 Polymenidou et al SUPPLEMENTARY INFORMATION

Proteosome (prosome, macropain) Psmb9 NM_013585 0 0 0 0 0 2.14 0.66 3.22 2.53 subunit, beta Similar to mannose-P-dolichol utilization Pqlc3 AK138680 0 0 0 0 0 2.42 0.73 3.30 2.53 defect 1 protein homolog Mgst1 Microsomal glutathione S-transferase 1 NM 019946 0 0 0 0 0 10.11 4.95 2.04 2.52 Ms4a6d MS4A6D protein NM 026835 0 0 0 0 0 4.62 1.80 2.57 2.52 Zc3hav1 Hypothetical protein AK143568 0 0 0 0 0 3.61 1.22 2.95 2.51 Zfp488 Zinc finger protein 488 NM 001013777 0 0 0 0 0 4.02 1.46 2.76 2.51 Tmem86a Hypothetical protein LOC67893 NM 026436 0 0 0 0 0 10.94 5.60 1.95 2.50 V-maf musculoaponeurotic fibrosarcoma Maff NM_010755 0 0 0 0 0 1.67 0.56 2.96 2.50 oncogene Hexb Hexosaminidase B NM 010422 0 0 0 0 0 75.76 44.83 1.69 2.50 1110007F12Rik Hypothetical protein LOC68487 NM 197986 0 0 0 0 0 1.49 0.53 2.81 2.48 AK051656 Protein KIAA1949 homolog AK051656 0 0 0 0 0 7.93 3.68 2.16 2.48 Rps26 40S ribosomal protein S26 BC100456 0 0 0 0 0 27.29 16.03 1.70 2.48 Plek Pleckstrin NM 019549 0 0 0 0 0 29.57 17.32 1.71 2.48 Prkcd Protein kinase C, delta NM 011103 3 0 3 0 0 9.26 4.51 2.05 2.47 Similar to Chromodomain-helicase-DNA- AK048742 AK048742 3 0 3 0 0 4.75 1.90 2.49 2.47 binding protein Pllp Transmembrane 4 superfamily member 11 NM_026385 0 0 0 0 0 32.01 18.77 1.71 2.47 Ppgb Protective protein for beta-galactosidase NM 001038492 0 0 0 0 0 43.83 26.11 1.68 2.46 Mpzl1 Myelin protein zero-like 1 AK090278 5 0 5 0 0 13.64 7.37 1.85 2.46 Aldehyde dehydrogenase family 1, Aldh1a1 NM_013467 0 0 0 0 0 18.51 10.33 1.79 2.45 subfamily A1 Ninj1 Ninjurin 1 NM 013610 0 0 0 0 0 8.57 4.14 2.07 2.44 NIMA (never in mitosis gene a)-related Nek6 NM_021606 0 0 0 0 0 10.89 5.66 1.92 2.43 expressed BC013712 Hypothetical protein LOC230787 NM 001033308 0 0 0 0 0 2.82 0.90 3.12 2.43 Olfml3 Olfactomedin-like 3 NM 133859 0 0 0 0 0 15.03 8.31 1.81 2.43 Atf3 Activating transcription factor ATF3b BC019946 0 0 0 0 0 2.40 0.76 3.18 2.43 Anxa4 Annexin A4 AK032703 0 0 0 0 0 4.72 1.92 2.46 2.42 Tspan4 Tetraspanin 4 NM 053082 0 0 0 0 0 5.11 2.16 2.37 2.42 Rpl32 Ribosomal protein L32 NM 172086 1 0 1 0 0 23.83 14.34 1.66 2.42 Dbp D site albumin promoter binding protein NM 016974 0 0 0 0 0 23.96 14.46 1.66 2.42 Clic1 Chloride intracellular channel 1 NM 033444 0 0 0 0 0 8.39 4.02 2.09 2.41 Col5a3 Adipocyte-specific protein 6 AB040491 0 0 0 0 0 1.45 0.53 2.74 2.41 Vat1 Vesicle amine transport protein 1 homolog NM 012037 0 0 0 0 0 26.33 15.78 1.67 2.41 Lyn Tyrosine-protein kinase Lyn (EC 2.7.1.112) BC031547 0 0 0 0 0 6.83 3.17 2.15 2.39 S100a1 S100 calcium binding protein A1 NM 011309 0 0 0 0 0 40.59 24.20 1.68 2.39 Plau Plasminogen activator, urokinase NM 008873 0 0 0 0 0 4.84 2.04 2.37 2.38 Prdx1 Peroxiredoxin 1 NM 011034 1 0 1 0 0 40.08 23.94 1.67 2.38 Csf1 Colony stimulating factor 1 NM 007778 0 0 0 0 0 12.14 6.65 1.82 2.38 Tlr3 Toll-like receptor 3 NM 126166 0 0 0 0 0 11.10 5.84 1.90 2.36 Gpr17 G protein-coupled receptor 17 NM 001025381 0 0 0 0 0 24.29 14.61 1.66 2.36 Itgb5 Integrin beta-5 precursor BC058246 1 0 1 0 0 21.94 13.15 1.67 2.35 Klk6 Protease, serine, 18 NM 011177 0 0 0 0 0 8.26 4.00 2.07 2.35 Trf Transferrin NM 133977 1 0 1 0 0 135.42 82.88 1.63 2.34 Lipopolysaccharide inducible C-C Ccrl2 NM_017466 0 0 0 0 0 1.39 0.52 2.67 2.34 chemokine P4ha3 P4ha3 protein BC082538 0 0 0 0 0 4.46 1.83 2.44 2.33 Apobec3 Apolipoprotein B editing complex 3 AK153719 0 0 0 0 0 5.37 2.44 2.20 2.32 Gns Glucosamine (N-acetyl)-6-sulfatase NM 029364 0 0 0 0 0 45.72 28.30 1.62 2.32 Tcirg1 T-cell, immune regulator 1 NM 016921 0 0 0 0 0 8.29 4.07 2.04 2.31 Parvg Parvin, gamma NM 022321 1 0 1 0 0 5.93 2.79 2.13 2.31 AK042457 Flavin containing monooxygenase 1 AK042457 0 0 0 0 0 6.68 3.22 2.07 2.31 Serinc5 Serine incorporator 5 NM 172588 5 0 4 1 0 37.22 22.46 1.66 2.31 epsilon-1 protein (- Gje1 AK147913 2 0 2 0 0 96.07 59.23 1.62 2.31 29) Lxn Latexin NM 016753 0 0 0 0 0 20.88 12.57 1.66 2.31 A930021H16Rik Weakly similar to MICAL-like 2 AK151086 0 0 0 0 0 1.97 0.67 2.93 2.30 Scpep1 Serine caroboxypeptidase 1 AK153485 0 0 0 0 0 15.18 8.64 1.76 2.30 Sox9 SRY-box containing gene 9 NM 011448 1 0 0 0 1 20.07 12.18 1.65 2.30 Nupr1 P8 protein NM 019738 0 0 0 0 0 8.30 4.12 2.01 2.30 Pros1 Protein S (alpha) AK145501 0 0 0 0 0 11.06 5.90 1.87 2.30 Sfpi1 Transcription factor PU.1 L03215 0 0 0 0 0 5.42 2.49 2.18 2.29 CASP8 and FADD-like apoptosis regulator Cflar NM_207653 1 0 0 0 1 13.76 7.83 1.76 2.29 isoform Interleukin-10 receptor beta chain Il10rb AK160991 0 0 0 0 0 7.84 3.84 2.04 2.28 precursor Nes Nes protein (Fragment) AF076623 0 0 0 0 0 7.21 3.52 2.05 2.28 S100a13 S100 calcium binding protein A13 NM 009113 0 0 0 0 0 12.91 7.24 1.78 2.28 Proteasome (prosome, macropain) 28 Psme1 NM_011189 0 0 0 0 0 9.19 4.67 1.97 2.27 subunit Evi2b Ecotropic viral integration site 2b NM 146023 0 0 0 0 0 14.44 8.24 1.75 2.27 Naglu Alpha-N-acetylglucosaminidase NM 013792 0 0 0 0 0 8.57 4.29 2.00 2.27 Glipr2 GLI pathogenesis-related 2 NM 027450 0 0 0 0 0 3.49 1.30 2.68 2.27 Tyrosine-protein phosphatase non-receptor Ptpn6 AK132509 0 0 0 0 0 5.80 2.76 2.10 2.26 type 6 BC060276 Filamin C (Gamma-filamin) (Filamin 2) BC060276 0 0 0 0 0 2.56 0.87 2.94 2.26 BC046309 Novel protein BC046309 0 0 0 0 0 4.31 1.78 2.42 2.26 Sparc Secreted acidic cysteine rich glycoprotein AK148670 3 0 1 0 2 119.54 74.49 1.60 2.26 Apob48r Apolipoprotein B48 receptor NM 138310 0 0 0 0 0 1.61 0.59 2.70 2.25 Cd300d MAIR-Iib AB091768 0 0 0 0 0 9.78 5.12 1.91 2.25 Tgif TG-interacting factor NM 009372 0 0 0 0 0 2.94 1.03 2.86 2.25 Id3 Inhibitor of DNA binding 3 NM 008321 0 0 0 0 0 16.04 9.47 1.69 2.25 Litaf LPS-induced TN factor NM 019980 1 0 1 0 0 18.08 10.54 1.72 2.24 Hypothetical SAND/Sp100 containing 5830484A20Rik AK150373 0 0 0 0 0 1.42 0.55 2.58 2.24 protein Hcls1 Hematopoietic cell specific Lyn substrate 1 NM_008225 0 0 0 0 0 5.13 2.27 2.26 2.23 Pabpc1 Poly A binding protein, cytoplasmic 1 NM 008774 0 0 0 0 0 55.02 34.31 1.60 2.22 Ubiquitin-specific protease 2 isoform Usp2- Usp2 NM_198091 1 0 1 0 0 22.60 13.99 1.62 2.22 45

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 16 Polymenidou et al SUPPLEMENTARY INFORMATION

Protein tyrosine phosphatase, receptor Ptprc NM_011210 2 0 1 0 1 4.38 1.86 2.36 2.21 type, C Sdc4 Sdc4 protein BC002312 3 0 0 0 3 46.40 29.47 1.57 2.21 Clu Clusterin NM 013492 1 0 0 0 1 259.47 163.59 1.59 2.20 H2-Eb1 Histocompatibility 2, class II antigen E beta NM_010382 0 0 0 0 0 2.45 0.85 2.87 2.20 Ikbke Inhibitor of kappaB kinase epsilon NM 019777 0 0 0 0 0 1.67 0.62 2.70 2.20 Slc7a2 Solute carrier family 7 (cationic amino acid NM_007514 4 0 0 0 4 18.08 10.65 1.70 2.20 Lysosomal membrane glycoprotein 2 Lamp2 NM_001017959 0 0 0 0 0 29.58 18.32 1.61 2.20 isoform 1 Histocompatibility 2, class II antigen A, H2-Aa NM_010378 0 0 0 0 0 3.69 1.40 2.64 2.20 alpha Tor3a Torsin family 3, member A BC071216 0 0 0 0 0 5.58 2.69 2.08 2.19 Arl11 ADP-ribosylation factor-like 11 NM 177337 0 0 0 0 0 2.92 1.05 2.80 2.19 Hspb8 Heat shock protein 20-like protein NM 030704 0 0 0 0 0 10.56 5.80 1.82 2.19 Signal transducer and activator of Stat5a NM_011488 0 0 0 0 0 1.38 0.54 2.54 2.19 transcription Hypothetical RING finger containing Rnf122 AK171823 0 0 0 0 0 5.33 2.52 2.12 2.19 protein Transforming growth factor-beta-induced Tgfbi AK155828 0 0 0 0 0 3.85 1.52 2.54 2.19 protein ig-h3 precursor (Beta ig-h3) Fgl2 Fibrinogen-like protein 2 NM 008013 0 0 0 0 0 2.20 0.76 2.88 2.18 Erbb2ip Erbb2 interacting protein isoform 2 NM 021563 15 0 14 1 0 52.87 33.12 1.60 2.18 Solute carrier family 6 (neurotransmitter Slc6a6 AK162776 10 0 7 0 3 65.13 41.37 1.57 2.17 transporter, taurine), member 6 Prostaglandin D2 synthase 2, Ptgds2 NM_019455 0 0 0 0 0 8.10 4.14 1.95 2.17 hematopoietic Rgma RGM domain family, member A NM 177740 0 0 0 0 0 21.16 13.15 1.61 2.17 Ehd4 EH domain containing protein MPAST2 NM 133838 0 0 0 0 0 8.51 4.41 1.93 2.16 Lats2 LATS2B AY015061 3 0 2 0 1 8.99 4.72 1.90 2.16 Add3 Adducin 3 (gamma) AK144174 14 0 14 0 0 38.18 23.73 1.61 2.15 Colony stimulating factor 2 receptor, beta Csf2rb1 AK170199 0 0 0 0 0 2.66 0.94 2.84 2.15 1, low-affinity (granulocyte-macrophage) 9230117N10Rik Interleukin 33 NM 133775 0 0 0 0 0 35.00 22.20 1.58 2.14 Hypothetical ATP-dependent DNA ligase AK046834 AK046834 1 0 1 0 0 6.80 3.39 2.01 2.14 containing protein CTD (carboxy-terminal domain, RNA Ctdsp1 NM_153088 0 0 0 0 0 18.40 11.18 1.65 2.13 polymerase II Transcription factor SOX-10 (SOX-21) BC023356 BC023356 0 0 0 0 0 19.60 12.35 1.59 2.10 (Transcription factor SOX-M) Gm2a GM2 ganglioside activator protein NM 010299 0 0 0 0 0 44.63 28.75 1.55 2.10 Rhoc Ras homolog gene family, member C NM 007484 0 0 0 0 0 8.55 4.50 1.90 2.10 Chi3l1 Chitinase 3-like 1 NM 007695 0 0 0 0 0 7.43 3.80 1.95 2.10 Vcam1 Vascular cell adhesion molecule 1 AK030195 0 0 0 0 0 12.23 7.14 1.71 2.10 Rpl14 Ribosomal protein L14 NM 025974 0 0 0 0 0 13.16 7.69 1.71 2.09 Anpep Alanyl (membrane) aminopeptidase NM 008486 1 0 1 0 0 3.27 1.25 2.63 2.09 Ly6e Lymphocyte antigen Ly-6E precursor BC005684 0 0 0 0 0 66.62 43.05 1.55 2.09 Pbxip1 Pre-B-cell leukemia transcription factor NM 146131 0 0 0 0 0 19.28 12.04 1.60 2.09 Cldn11 Claudin 11 NM 008770 1 0 1 0 0 121.77 78.83 1.54 2.08 Cpne3 Copine III NM 027769 4 0 3 0 1 12.09 7.11 1.70 2.07 Sema domain, transmembrane domain Sema6a (TM), and cytoplasmic domain, AK042751 4 0 4 0 0 10.85 6.10 1.78 2.07 (semaphorin) 6A Hpse Heparanase NM 152803 0 0 0 0 0 2.46 0.91 2.72 2.07 Il1a Interleukin 1 alpha NM 010554 0 0 0 0 0 2.04 0.76 2.69 2.06 Ltbr Lymphotoxin B receptor NM 010736 0 0 0 0 0 5.40 2.68 2.01 2.05 S100a4 S100 calcium binding protein A4 NM 011311 0 0 0 0 0 3.08 1.19 2.60 2.05 Syngr2 Synaptogyrin 2 NM 009304 0 0 0 0 0 6.68 3.44 1.94 2.05 Cd83 CD83 antigen BC107344 0 0 0 0 0 8.06 4.24 1.90 2.05 Pycard PYD and CARD domain containing NM 023258 0 0 0 0 0 3.35 1.35 2.48 2.04 Dap Death-associated protein NM 146057 0 0 0 0 0 5.44 2.73 1.99 2.04 Trp53inp1 Transformation related protein 53 inducible NM_021897 0 0 0 0 0 9.11 4.94 1.84 2.04 Cotl1 Coactosin-like protein BC010249 3 0 3 0 0 42.36 27.47 1.54 2.04 Stom Stom protein BC003789 0 0 0 0 0 9.75 5.41 1.80 2.04 Pdlim4 Reversion induced LIM gene NM 019417 0 0 0 0 0 5.47 2.73 2.00 2.04 Npl N-acetylneuraminate pyruvate lyase NM 028749 0 0 0 0 0 5.27 2.60 2.03 2.04 Sgpl1 Sphingosine phosphate lyase 1 NM 009163 1 0 1 0 0 22.27 14.29 1.56 2.04 Voltage-dependent calcium channel Cacng5 NM_080644 3 0 3 0 0 7.26 3.80 1.91 2.03 gamma-5 Hypothetical Actin-crosslinking proteins 4930429M06Rik AK090002 1 0 1 0 0 4.32 1.94 2.22 2.03 containing protein Tal1 T-cell acute lymphocytic leukemia 1 NM 011527 0 0 0 0 0 1.36 0.57 2.39 2.02 Adaptin-ear-binding coat-associated Necap2 NM_025383 1 0 1 0 0 18.09 11.08 1.63 2.02 protein 2 Bag3 Bcl2-associated athanogene 3 NM 013863 0 0 0 0 0 6.88 3.62 1.90 2.02 Man2b2 Mannosidase alpha class 2B member 2 NM 008550 1 0 1 0 0 16.04 9.90 1.62 2.01 Vamp8 Vesicle-associated membrane protein 8 NM 016794 0 0 0 0 0 9.46 5.26 1.80 2.01 Interferon consensus sequence binding Irf8 BC005450 1 0 1 0 0 5.86 3.00 1.95 2.01 protein 1 Signal transducer and activator of Stat3 NM_011486 3 0 3 0 0 20.13 12.93 1.56 2.01 transcription Stress-associated endoplasmic reticulum D3Ucla1 NM_030685 1 0 0 0 1 29.42 18.99 1.55 2.01 protein Ccng1 Cyclin G1 NM 009831 0 0 0 0 0 35.70 23.34 1.53 2.00 Loss of heterozygosity, 11, chromosomal Loh11cr2a NM_172767 1 0 0 0 1 11.14 6.60 1.69 2.00 region

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 17 Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Table 2 Genes downregulated upon TDP-43 depletion in adult mouse brain

CLIP-seq number of clusters RNA-seq Refseq/mRNA in in RPKM in RPKM in Ratio Gene symbol Protein Total in 5'UTR in 3'UTR Z score identifier introns exons TDP-43 KD control KD/control T-cell lymphoma breakpoint associated Tcba1 NM_001025286 58 0 57 1 0 3.16 18.06 0.18 -8.17 target 1 Tardbp TAR DNA binding protein isoform 5NM_001003898 5 0 0 0 5 10.17 43.61 0.23 -6.91 Cadps Calcium-dependent secretion activator 1 BC079679 56 0 56 0 0 7.62 23.80 0.32 -5.90 Csmd1 CUB and Sushi multiple domains 1NM_053171 154 0 152 1 1 2.51 9.43 0.27 -5.39 Ctnna2 Ctnna2 protein BC079648 115 0 115 0 0 8.80 22.44 0.39 -4.88 Hypothetical carbonic anhydrase structure Car10 AK012302 52 1 51 0 0 6.05 15.81 0.38 -4.67 containing protein Clstn2 Calsyntenin 2NM_022319 38 0 38 0 0 7.70 19.32 0.40 -4.59 Nrg3 Neuregulin 3NM_008734 132 0 131 0 1 2.75 8.15 0.34 -4.47 2900006F19Rik Hypothetical protein LOC72899 NM_028387 36 0 36 0 0 2.94 8.52 0.34 -4.35 Adcy2 Adenylate cyclase 2NM_153534 15 0 15 0 0 8.57 20.60 0.42 -4.25 Potassium voltage-gated channel, Shal- Kcnd2 NM_019697 78 0 77 0 1 14.60 34.39 0.42 -4.18 related Low density lipoprotein-related protein 1B Lrp1b AK136118 87 0 87 0 0 2.59 6.87 0.38 -4.08 (deleted in tumors) Calcium channel, voltage-dependent, Cacna2d3 NM_009785 63 0 63 0 0 12.00 25.88 0.46 -4.05 alpha2/delta Nrxn3 Neurexin III BC060719 178 0 178 0 0 16.25 37.23 0.44 -3.93 Hyaluronan and proteoglycan link protein 4 Hapln4 AK034300 0 0 0 0 0 9.71 20.10 0.48 -3.86 precursor Kl Klotho NM_013823 0 0 0 0 0 2.53 6.33 0.40 -3.84 Park2 Parkin AF250294 39 0 39 0 0 0.63 2.18 0.29 -3.78 Fgf14 Fibroblast growth factor 14 isoform bNM_207667 85 0 85 0 0 6.99 14.99 0.47 -3.76 Sdk1 Sidekick homolog 1NM_177879 29 0 29 0 0 0.51 1.54 0.33 -3.69 Efna5 Ephrin A5 isoform 2NM_010109 21 0 21 0 0 3.90 9.33 0.42 -3.69 Kcnip4 Kv channel interacting protein 4 AK148685 109 0 109 0 0 3.98 9.30 0.43 -3.58 Cntnap2 Contactin associated protein-like 2 isoform b NM_025771 116 0 116 0 0 11.00 21.03 0.52 -3.52 Tmem28 Transmembrane protein 28 NM_173446 79 0 79 0 0 6.21 12.27 0.51 -3.49 Sntg1 Syntrophin, gamma 1NM_027671 23 0 23 0 0 2.01 4.88 0.41 -3.43 Hs6st3 Heparan sulfate 6-O-sulfotransferase 3NM_015820 80 0 79 0 1 4.14 8.95 0.46 -3.35 AK134882 Hypothetical protein AK134882 2 0 2 0 0 2.47 5.55 0.44 -3.33 Kcnj3 Potassium inwardly-rectifying channel J3 NM_008426 21 0 21 0 0 6.57 12.70 0.52 -3.32 Hdh Huntington disease gene homolog NM_010414 10 0 9 1 0 8.52 16.55 0.51 -3.29 Arc Activity regulated cytoskeletal-associated NM_018790 0 0 0 0 0 25.40 50.51 0.50 -3.29 Rims1 Rims1 protein BC055294 6 0 6 0 0 15.32 29.50 0.52 -3.25 Klf10 Kruppel-like factor 10 NM_013692 0 0 0 0 0 6.55 12.13 0.54 -3.19 Nlgn1 Neuroligin 1NM_138666 102 0 101 1 0 8.55 16.28 0.53 -3.19 Kcnk4 TRAAK NM_008431 0 0 0 0 0 0.76 2.07 0.37 -3.16 Folr1 Folate receptor 1 NM_008034 0 0 0 0 0 0.67 1.81 0.37 -3.16 Slc17a8 Vesicular -3 NM_182959 0 0 0 0 0 1.08 2.84 0.38 -3.16 Similar to potassium channel protein erg Kcnh7 AK034003 23 0 23 0 0 2.90 5.92 0.49 -3.12 long isoform AK171114 weakly similar to Keratin 8 AK171114 0 0 0 0 0 0.56 1.39 0.40 -3.11 Chat Choline acetyltransferase NM_009891 2 0 2 0 0 0.97 2.59 0.37 -3.11 Rgs6 Regulator of G-protein signaling 6NM_015812 37 0 37 0 0 1.68 4.03 0.42 -3.10 Slit3 Slit homolog 3NM_011412 40 0 40 0 0 4.10 8.39 0.49 -3.10 Grm7 Glutamate receptor, metabotropic 7NM_177328 78 0 77 1 0 8.82 16.33 0.54 -3.09 Protein tyrosine phosphatase, receptor type, Ptprn2 AK138655 64 0 63 0 1 18.06 34.41 0.52 -3.08 N polypeptide 2 Lsamp Limbic system-associated membrane protein AK140043 71 0 70 0 1 15.54 28.93 0.54 -3.07 Opcml Opioid binding protein/cell adhesion NM_177906 116 0 115 0 1 18.55 35.28 0.53 -3.06 Hypothetical Glycosyl transferase, family 2 4930431L04Rik BC111102 23 0 23 0 0 0.63 1.54 0.41 -3.00 containing protein Ica1 Islet cell autoantigen 1 NM_010492 16 0 16 0 0 4.83 9.23 0.52 -2.99 Col8a2 Procollagen, type VIII, alpha 2NM_199473 0 0 0 0 0 0.61 1.46 0.42 -2.98 Kcnq5 KCNQ5 potassium channel AK147264 36 0 36 0 0 14.96 26.18 0.57 -2.97 Mapk11 Mitogen-activated protein kinase 11 NM_011161 0 0 0 0 0 5.36 9.86 0.54 -2.97 Gpc5 Similar to BA93M14.1 (Glypican 5) AK084223 24 0 24 0 0 3.87 7.69 0.50 -2.96 Sgcz Zeta-sarcoglycan (Zeta-SG) (ZSG1) AK136779 56 1 55 0 0 0.74 1.85 0.40 -2.95 Astn2 Astrotactin 2 isoform aNM_019514 72 0 72 0 0 5.08 9.45 0.54 -2.95 Otud7 Hypothetical protein LOC170711 NM_130880 30 0 30 0 0 4.47 8.80 0.51 -2.95 Potassium inwardly-rectifying channel, Kcnj6 AK044218 28 0 27 0 1 3.28 6.37 0.52 -2.92 subfamily J, member 6 Myl4 Myosin, light polypeptide 4NM_010858 0 0 0 0 0 2.83 5.55 0.51 -2.91 Sostdc1 Cystine knot-containing secreted protein NM_025312 0 0 0 0 0 0.75 1.82 0.41 -2.90 Npr3 Natriuretic peptide receptor 3 isoform bNM_001039181 2 0 2 0 0 0.64 1.52 0.43 -2.89 Carboxyl-terminal PDZ ligand of neuronal AK138471 AK138471 38 0 38 0 0 5.60 9.97 0.56 -2.88 nitric oxide synthase homolog Kirrel2 Kin of IRRE-like 2NM_172898 0 0 0 0 0 0.86 2.12 0.41 -2.85 Vamp1 Vesicle-associated membrane protein 1 BC049902 1 1 0 0 1 19.15 34.73 0.55 -2.83 Trhde Thyrotropin-releasing hormone degrading NM_146241 17 0 17 0 0 2.82 5.43 0.52 -2.81 Gfra2 Glial cell line derived neurotrophic factor NM_008115 6 0 6 0 0 9.38 16.06 0.58 -2.80 C630028F04Rik Hypothetical protein LOC243274 NM_172885 43 0 43 0 0 3.15 5.93 0.53 -2.79 Adra1d Adrenergic receptor, alpha 1d NM_013460 0 0 0 0 0 0.99 2.30 0.43 -2.77 Upp2 Weakly similar to uridine phosphorylase BC027189 23 0 23 0 0 0.77 1.77 0.43 -2.76 C630007B19Rik TAFA1 protein NM_182808 56 0 56 0 0 1.93 4.10 0.47 -2.76 Neurexin-1-alpha precursor (Neurexin I- Nrxn1 AB093249 162 0 162 0 0 18.66 33.28 0.56 -2.76 alpha) Hypothetical Proline-rich region AB125594 profile/Serine-rich region profile containing AB125594 45 0 43 0 2 9.06 15.71 0.58 -2.76 protein

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 18 Polymenidou et al SUPPLEMENTARY INFORMATION

Protein tyrosine phosphatase, receptor type, Ptprt NM_021464 74 0 74 0 0 8.13 14.29 0.57 -2.75 T Grm8 Glutamate receptor, metabotropic 8NM_008174 21 0 21 0 0 1.18 2.67 0.44 -2.75 Siglech SIGLEC-like protein NM_178706 0 0 0 0 0 6.14 10.50 0.59 -2.74 Ryr2 Ryr2 protein BC043140 41 0 41 0 0 1.82 3.83 0.47 -2.73 Nrn1 Neuritin 1NM_153529 2 0 2 0 0 17.81 31.03 0.57 -2.73 A830018L16Rik VEST-1 protein homolog AK135047 8 0 8 0 0 15.04 24.75 0.61 -2.73 Smyd3 SET and MYND domain containing 3NM_027188 39 0 39 0 0 4.41 8.20 0.54 -2.72 Tumor suppressing subtransferable Tssc1 NM_201357 6 0 6 0 0 4.51 8.29 0.54 -2.72 candidate 1 AK161480 Hypothetical protein AK161480 0 0 0 0 0 0.55 1.18 0.46 -2.71 Trpc3 Transient receptor potential cation channel NM_019510 3 0 3 0 0 3.69 6.83 0.54 -2.71 Gamma-aminobutyric acid (GABA-A) Gabra3 NM_008067 2 0 2 0 0 15.39 25.51 0.60 -2.71 receptor Aqp1 Aquaporin 1NM_007472 0 0 0 0 0 0.67 1.44 0.46 -2.68 Plcb1 Plcb1 protein BC058710 59 0 59 0 0 24.54 44.03 0.56 -2.68 Kcnk4 Potassium channel subfamily K member 4 BC061075 0 0 0 0 0 0.57 1.21 0.47 -2.67 Calcium-activated potassium channel alpha Kcnma1 U09383 96 0 96 0 0 8.34 14.36 0.58 -2.66 subunit 1 Cacna1c Calcium channel, voltage-dependent, L type, NM_009781 77 0 77 0 0 5.57 9.56 0.58 -2.65 Slc10a4 Solute carrier family 10 NM_173403 0 0 0 0 0 1.24 2.70 0.46 -2.65 E030025D05Rik E030025D05Rik protein AK173298 20 1 18 0 1 6.06 10.10 0.60 -2.62 Ttr Transthyretin NM_013697 0 0 0 0 0 44.88 80.56 0.56 -2.62 Zinc transporter 3 (ZnT-3) (Solute carrier Slc30a3 BC066199 0 0 0 0 0 5.66 9.55 0.59 -2.60 family 30 member 3) Kcns3 Potassium voltage-gated channel, NM_173417 0 0 0 0 0 0.58 1.20 0.48 -2.60 4933439F18Rik Hypothetical protein LOC66771 NM_025757 1 1 0 0 0 29.33 51.30 0.57 -2.59 Two cut domains-containing homeodomain Satb2 NM_139146 5 0 5 0 0 6.79 10.98 0.62 -2.58 protein Kin of IRRE-like protein 3 precursor (Kin of Kirrel3 AK045373 74 0 74 0 0 6.02 9.99 0.60 -2.58 irregular chiasm-like protein 3) F5 Coagulation factor VNM_007976 0 0 0 0 0 0.94 2.08 0.45 -2.57 Similar to neurula-specific ferrodoxin 2810401C16Rik AK158809 3 0 2 0 1 10.73 17.93 0.60 -2.57 reductase-like protein homolog Adcy1 Adenylate cyclase 1NM_009622 17 0 12 0 5 38.58 68.45 0.56 -2.56 A130090K04Rik Hypothetical protein LOC320495 NM_001033391 3 0 3 0 0 2.68 4.95 0.54 -2.56 Glutamate receptor, ionotropic, NMDA2A Grin2a NM_008170 83 0 83 0 0 2.19 4.23 0.52 -2.56 (epsilon) Cacna1e Calcium channel, voltage-dependent, R type NM_009782 40 0 34 6 0 8.43 14.18 0.59 -2.55 Fgf12 Fibroblast growth factor 12 isoform bNM_010199 81 0 81 0 0 16.54 27.53 0.60 -2.54 Cpne4 Weakly similar to copine VII AK014396 8 0 8 0 0 7.73 12.71 0.61 -2.53 Weakly similar to heterogeneous nuclear 0710005M24Rik BC052358 59 1 57 0 1 7.07 11.27 0.63 -2.53 ribonucleoproteins C1/C2 (hnRNPC1/C2) Col19a1 Procollagen, type XIX, alpha 1NM_007733 3 0 3 0 0 0.61 1.24 0.49 -2.53 Hypothetical calcium-binding EGF-like AK018073 AK018073 39 0 39 0 0 0.57 1.16 0.49 -2.52 domain containing protein homolog Gpr26 G protein-coupled receptor 26 NM_173410 2 0 2 0 0 10.21 16.56 0.62 -2.51 Cpne9 Similar to copine-like protein KIAA1599 AK044080 3 0 3 0 0 5.59 9.29 0.60 -2.51 Hypothetical EF-hand/Phorbol esters/diacylglycerol binding Dgkb AK083222 28 0 28 0 0 35.79 62.69 0.57 -2.51 domain/Cytochrome c family heme-binding site containing protein Leucine-rich repeat-containing protein 17 BC030317 BC030317 0 0 0 0 0 0.52 1.06 0.50 -2.50 precursor Calcium/calmodulin-dependent protein Camk4 NM_009793 28 0 28 0 0 20.31 34.26 0.59 -2.48 kinase IV Odz1 Odd Oz/ten-m homolog 1NM_011855 26 0 26 0 0 2.37 4.38 0.54 -2.48 Wwox WW-domain oxidoreductase NM_019573 62 0 62 0 0 1.14 2.38 0.48 -2.47 Igsf4d Weakly similar to BK134P22.1 AK046800 110 0 109 1 0 18.54 30.84 0.60 -2.47 AK138164 NB-2 homolog AK138164 34 0 34 0 0 2.34 4.32 0.54 -2.47 Glutamate receptor, ionotropic, kainate 2 Grik2 NM_010349 52 0 52 0 0 17.00 27.85 0.61 -2.46 (beta) Gfod1 Hypothetical protein LOC328232 NM_001033399 20 0 20 0 0 15.37 23.88 0.64 -2.46 Glutamate receptor, ionotropic, NMDA2C Grin2c NM_010350 3 0 3 0 0 7.38 11.76 0.63 -2.45 (epsilon) Klc2 Klc2 protein BC014845 0 0 0 0 0 15.86 24.95 0.64 -2.45 Ttc15 Weakly similar to CGI-87 PROTEIN AK031763 2 0 2 0 0 10.60 16.93 0.63 -2.45 L42339 3 L42339 2 0 2 0 0 8.94 14.67 0.61 -2.42 Rtn4r Nogo receptor NM_022982 4 0 4 0 0 7.34 11.59 0.63 -2.41 Syt13 Synaptotagmin XIII NM_030725 4 0 3 0 1 18.66 30.56 0.61 -2.41 Weakly similar to Hypothetical band 4.1 Pdzd10 AK147370 63 0 63 0 0 9.67 15.47 0.62 -2.41 family containing protein Ecel1 Damage-induced neuronal endopeptidase NM_021306 0 0 0 0 0 2.49 4.46 0.56 -2.40 Dpp10 Dipeptidyl peptidase 10 NM_199021 75 0 74 0 1 10.39 16.43 0.63 -2.40 Fbxw8 F-box protein FBX29 AK015338 0 0 0 0 0 1.45 2.83 0.51 -2.39 Collagen and calcium binding EGF domains Ccbe1 NM_178793 4 0 3 0 1 0.61 1.18 0.52 -2.37 1 Slc24a3 Solute carrier family 24 NM_053195 17 0 17 0 0 13.17 20.23 0.65 -2.36 Snrp70 U1 small nuclear ribonucleoprotein 70 kDa NM_009224 2 0 1 1 0 69.49 117.85 0.59 -2.36 Podxl2 Podocalyxin-like 2NM_176973 1 0 1 0 0 12.22 18.69 0.65 -2.35 A930001M12Rik Hypothetical protein LOC320500 NM_177175 0 0 0 0 0 0.85 1.68 0.51 -2.34 Gpr158 G protein-coupled receptor 158 isoform bNM_175706 20 0 20 0 0 29.70 48.92 0.61 -2.34 Zfp180 Zfp180 protein BC063741 3 1 2 0 0 16.63 26.55 0.63 -2.34 Cbln2 Cerebellin 2 precursor protein NM_172633 0 0 0 0 0 3.66 6.17 0.59 -2.33 Stx1a Syntaxin 1A NM_016801 1 0 1 0 0 24.86 41.33 0.60 -2.33 Neural cell adhesion molecule 2 precursor Ncam2 (N-CAM 2) (RB-8 neural cell adhesion AF001287 37 0 37 0 0 7.79 12.23 0.64 -2.33 molecule) Fosl2 Fos-like antigen 2 AK145822 1 0 1 0 0 6.83 10.53 0.65 -2.32

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 19 Polymenidou et al SUPPLEMENTARY INFORMATION

Pcdh7 Protocadherin 7NM_018764 0 0 0 0 0 8.97 14.37 0.62 -2.32 Phosphodiesterase 1A, calmodulin- Pde1a NM_001009978 28 0 28 0 0 11.79 17.86 0.66 -2.32 dependent Tnfrsf25 Tnfrsf25 protein BC017526 1 0 1 0 0 1.64 3.11 0.53 -2.32 Amyloid beta A4 protein-binding family A Apba2bp member 2-binding protein (X11L-binding AK013520 0 0 0 0 0 6.64 10.29 0.65 -2.31 protein 51) (mXB51) Kcns2 K+ voltage-gated channel, subfamily S, 2NM_181317 0 0 0 0 0 6.22 9.68 0.64 -2.30 BC052055 Hypothetical protein LOC328643 NM_182636 0 0 0 0 0 4.33 7.20 0.60 -2.30 Alas2 Aminolevulinic acid synthase 2, erythroid NM_009653 0 0 0 0 0 0.53 1.00 0.53 -2.30 D14Ertd171e CAZ-associated structural protein NM_177814 62 0 61 0 1 17.51 27.80 0.63 -2.30 Magi2 Activin receptor interacting protein 1 AK147530 62 0 62 0 0 15.73 23.87 0.66 -2.30 UDP-N-acetyl-alpha-D- Galnt9 NM_198306 9 0 9 0 0 5.48 8.80 0.62 -2.30 galactosamine:polypeptide Orphan nuclear receptor NR4A3 (Orphan Nr4a3 nuclear receptor TEC) (Translocated in BC068150 0 0 0 0 0 4.32 7.17 0.60 -2.30 extraskeletal chondrosarcoma) Cntn4 Contactin 4 AK163520 37 0 37 0 0 6.24 9.63 0.65 -2.29 Atp2b3 Plasma membrane calcium ATPase 3NM_177236 2 0 2 0 0 15.15 23.00 0.66 -2.28 G protein-activated inward rectifier potassium channel 3 (GIRK3) (Potassium Kcnj9 channel, inwardly rectifying subfamily J BC065161 0 0 0 0 0 14.13 21.28 0.66 -2.28 member 9) (Inwardly rectifier K(+) channel Kir3.3) Large like-glycosyltransferase NM_010687 38 0 38 0 0 17.42 27.47 0.63 -2.28 Accn1 Accn1 protein BC038551 110 0 110 0 0 5.29 8.55 0.62 -2.28 RAS protein-specific guanine nucleotide- Rasgrf2 AK158472 1 0 1 0 0 5.30 8.57 0.62 -2.27 releasing factor 2 Xkr4 XK-related protein 4NM_001011874 42 0 42 0 0 3.75 6.24 0.60 -2.27 Cabp1 calcium binding protein 1NM_013879 2 0 2 0 0 11.29 17.56 0.64 -2.26 Scube1 Scube1 protein BC066066 4 0 4 0 0 3.63 5.88 0.62 -2.26 Ephb6 Eph receptor B6 NM_007680 1 0 1 0 0 13.80 20.81 0.66 -2.26 AK147314 Delta kalirin-7 homolog AK147314 33 0 33 0 0 38.00 63.06 0.60 -2.25 Wnt9a Wnt9a protein BC066165 1 0 1 0 0 1.75 3.26 0.54 -2.25 2900046G09Rik Hypothetical protein LOC78408 NM_133778 0 0 0 0 0 24.10 39.57 0.61 -2.25 Voltage-gated potassium channel beta-3 Kcnab3 subunit (K(+) channel beta-3 subunit) (Kv- NM_010599 0 0 0 0 0 4.85 7.87 0.62 -2.25 beta-3) Atrnl1 Attractin-like 1NM_181415 37 0 37 0 0 11.96 17.88 0.67 -2.24 Kcnmb4 Calcium activated potassium channel beta 4NM_021452 4 0 4 0 0 8.43 12.95 0.65 -2.23 Nuclear receptor subfamily 4, group A, Nr4a1 NM_010444 0 0 0 0 0 26.01 42.41 0.61 -2.23 member 1 Phospho1 Phosphatase, orphan 1NM_153104 0 0 0 0 0 1.90 3.41 0.56 -2.22 Bai3 Brain-specific angiogenesis inhibitor 3 BC032251 91 0 91 0 0 13.83 20.65 0.67 -2.21 RalBP1 associated Eps domain containing Reps2 AK147418 4 0 4 0 0 25.57 41.37 0.62 -2.21 protein Hypothetical Phenylalanine-rich region Mamdc1 AK163637 74 1 73 0 0 5.78 8.96 0.65 -2.21 profile containing protein Emx1 Empty spiracles homolog 1NM_010131 0 0 0 0 0 0.99 1.89 0.53 -2.20 Slit2 Slit homolog 2 protein precursor (Slit-2) AK220505 10 0 10 0 0 2.05 3.63 0.56 -2.20 Rgs7 Regulator of G protein signaling 7NM_011880 55 1 54 0 0 9.76 15.02 0.65 -2.20 Rims4 Regulating synaptic membrane exocytosis 4NM_183023 5 0 5 0 0 3.62 5.80 0.62 -2.20 Osbpl10 Oxysterol-binding protein-like protein 10 NM_148958 27 0 27 0 0 3.10 5.03 0.62 -2.20 Slc5a7 Solute carrier family 5 () NM_022025 0 0 0 0 0 3.50 5.60 0.62 -2.20 Tbr1 T-box brain gene 1 AK158835 2 0 0 0 2 15.44 23.26 0.66 -2.19 Rfx3 Regulatory factor X, 3 NM_011265 30 0 30 0 0 3.30 5.27 0.63 -2.19 Lynx1 Ly6/neurotoxin 1NM_011838 0 0 0 0 0 62.35 101.97 0.61 -2.19 Itpka Inositol 1,4,5-trisphosphate 3-kinase ANM_146125 0 0 0 0 0 17.25 26.73 0.65 -2.19 Irxl1 Hypothetical protein LOC210719 NM_177595 5 0 5 0 0 1.79 3.27 0.55 -2.19 Sema5b Semaphorin-5B precursor (Semaphorin G) BC052397 1 0 1 0 0 3.97 6.42 0.62 -2.18 Hnt Neurotrimin precursor BC023307 127 0 127 0 0 21.16 33.56 0.63 -2.18 AMP-activated protein kinase subunit Prkag1 AK157795 0 0 0 0 0 2.04 3.59 0.57 -2.18 gamma 1 Chrm3 Cholinergic receptor, muscarinic 3, cardiac NM_033269 42 17 42 0 0 18.85 29.40 0.64 -2.17 3110035E14Rik Hypothetical protein LOC76982 NM_178399 6 0 5 0 1 40.66 66.20 0.61 -2.17 2610044O15Rik Hypothetical protein AK137318 1 0 1 0 0 9.02 13.90 0.65 -2.16 Rit2 Ras-like without CAAX 2NM_009065 10 0 10 0 0 12.49 18.40 0.68 -2.16 AK142702 Hypothetical protein AK142702 0 0 0 0 0 0.84 1.54 0.55 -2.16 Egr4 Early growth response 4NM_020596 0 0 0 0 0 6.65 9.92 0.67 -2.15 Mical2 Flavoprotein oxidoreductase MICAL2 NM_177282 14 0 14 0 0 20.36 31.81 0.64 -2.15 Sstr3 Somatostatin receptor 3NM_009218 0 0 0 0 0 2.43 4.08 0.60 -2.15 Ldlr Low density lipoprotein receptor AK170264 1 0 1 0 0 3.78 6.03 0.63 -2.14 Hcn1 Hyperpolarization-activated, cyclic NM_010408 23 0 23 0 0 4.69 7.45 0.63 -2.13 Hypothetical homeodomain-like structure E330039K12Rik AK220342 19 0 19 0 0 10.34 15.48 0.67 -2.13 containing protein Lrrn6c Leucine rich repeat neuronal 6C NM_175516 1 0 1 0 0 2.24 3.84 0.58 -2.13 BC107398 Hypothetical protein BC107398 1 0 1 0 0 1.04 1.93 0.54 -2.13 Sidt1 SID1 transmembrane family, member 1NM_198034 13 0 13 0 0 3.38 5.31 0.64 -2.12 Oprd1 Opioid receptor, delta 1NM_013622 0 0 0 0 0 7.33 10.66 0.69 -2.12 Insig1 Insulin induced gene 1NM_153526 0 0 0 0 0 9.29 14.22 0.65 -2.12 Mib1 Ubiquitin ligase mind bomb NM_144860 15 0 12 2 1 14.45 21.16 0.68 -2.12 B230343H07Rik RIKEN cDNA B230343H07 NM_001037906 19 0 19 0 0 3.48 5.48 0.64 -2.11 FXYD domain-containing ion transport Fxyd7 NM_022007 1 0 1 0 0 4.44 7.04 0.63 -2.11 regulator Unc5d Netrin receptor Unc5h4 NM_153135 45 0 45 0 0 1.48 2.62 0.57 -2.11 Hypothetical protein FLJ34862 (Malic Me3 AK133593 2 0 2 0 0 5.74 8.73 0.66 -2.10 enzyme) homolog Calcium channel, voltage-dependent, P/Q Cacna1a AK052625 35 0 35 0 0 20.73 32.14 0.65 -2.10 type, alpha 1A subunit Potassium voltage-gated channel subfamily S69381 S69381 0 0 0 0 0 19.96 30.98 0.64 -2.10 C member 3

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 20 Polymenidou et al SUPPLEMENTARY INFORMATION

Negr1 Neurotractin isoform aNM_001039094 92 0 91 1 0 28.02 44.28 0.63 -2.09 Similar to CUB and sushi multiple domains AK144913 AK144913 2 0 2 0 0 2.04 3.48 0.59 -2.09 protein 2 St6galnac3 ST6 NM_011372 60 0 60 0 0 1.31 2.32 0.56 -2.09 Serpinb8 Serine protease inhibitor 8 AK048230 2 0 2 0 0 1.45 2.56 0.57 -2.09 Rnf152 RING finger protein 152 AK035832 10 0 10 0 0 0.64 1.11 0.58 -2.08 AI894139 Weakly similar to zinc finger protein ZNF65 AK133549 1 0 1 0 0 3.61 5.65 0.64 -2.08 Unc13a Weakly similar to Munc- 13 BC058348 15 2 13 0 1 40.18 64.15 0.63 -2.08 Dlgap1 Disks large-associated protein 1 (DAP-1) AK163689 83 1 82 0 0 25.05 39.32 0.64 -2.06 Klf16 Kruppel-like factor 16 NM_078477 0 0 0 0 0 14.71 21.28 0.69 -2.06 Sodium/calcium exchanger 1 precursor Slc8a1 NM_011406 36 0 36 0 0 8.61 12.73 0.68 -2.06 (Na(+)/Ca(2+)-exchange protein 1) Egr2 Early growth response 2NM_010118 0 0 0 0 0 3.64 5.59 0.65 -2.06 Dhcr24 24-dehydrocholesterol reductase NM_053272 0 0 0 0 0 6.85 10.00 0.69 -2.06 UDP-N-acetyl-alpha-D- Galnt13 NM_173030 26 0 26 0 0 9.46 14.28 0.66 -2.06 galactosamine:polypeptide Mesenchymal stem cell protein DSC54 AK046456 AK046456 20 0 20 0 0 1.63 2.86 0.57 -2.05 homolog Dock4 Dedicator of cytokinesis 4NM_172803 47 0 46 0 1 14.46 20.85 0.69 -2.05 BC025575 Hypothetical protein LOC217219 NM_199200 0 0 0 0 0 17.20 25.89 0.66 -2.05 Ncald Neurocalcin delta NM_134094 34 0 34 0 0 35.45 55.75 0.64 -2.03 Neurofilament triplet M protein (160 kDa Nef3 AK051696 0 0 0 0 0 43.91 69.36 0.63 -2.03 neurofilament protein) Neurod6 Neurogenic differentiation 6NM_009717 0 0 0 0 0 9.52 14.29 0.67 -2.02 Sema7a Semaphorin 7A NM_011352 1 0 1 0 0 26.85 41.81 0.64 -2.02 Syt16 Synaptotagmin 14-like AK137129 17 0 17 0 0 10.45 15.29 0.68 -2.02 Rims3 Hypothetical protein AK122226 7 0 7 0 0 22.53 34.57 0.65 -2.02 Sodium channel, voltage-gated, type I, beta AK028238 AK028238 3 0 3 0 0 35.01 55.05 0.64 -2.02 polypeptide Arfl4 ADP-ribosylation factor 4-like NM_025404 0 0 0 0 0 14.88 21.39 0.70 -2.02 Hypothetical P-loop containing nucleotide AK034089 triphosphate hydrolases structure containing AK034089 6 0 6 0 0 1.05 1.86 0.57 -2.02 protein Cckbr Gastrin/cholecystokinin type B receptor BC103530 0 0 0 0 0 7.29 10.40 0.70 -2.02 Itpr1 Inositol 1,4,5-triphosphate receptor 1 AK049015 31 1 28 0 2 69.56 109.54 0.63 -2.01 BC089626 Protein C6orf142 homolog BC089626 9 0 8 1 0 3.45 5.26 0.66 -2.01 Eda Ectodysplasin A (EDA protein homolog) AJ243657 1 0 1 0 0 0.71 1.20 0.59 -2.01 Protein kinase, cGMP-dependent, type I Prkg1 NM_001013833 46 0 46 0 0 0.77 1.31 0.59 -2.01 alpha U7 snRNA-associated Sm-like protein Lsm11 NM_028185 0 0 0 0 0 0.99 1.72 0.57 -2.01 Lsm11 3-hydroxy-3-methylglutaryl-Coenzyme A Hmgcr NM_008255 2 0 1 1 0 13.27 18.85 0.70 -2.01 reductase A230078I05Rik Hypothetical protein LOC319998 NM_177056 0 0 0 0 0 7.82 11.29 0.69 -2.00 5730507A09Rik Hypothetical protein LOC70638 NM_183087 2 0 2 0 0 4.64 7.12 0.65 -2.00 Potassium voltage gated channel, Shaw- Kcnc2 NM_001025581 27 0 26 1 0 8.75 12.79 0.68 -2.00 related

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 21 Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Table 3 ncRNAs with changes in expression levels upon TDP-43 depletion in adult mouse brain

RNA-seq RNAdb RPKM in RPKM in RPKM in name Chr Start Stop strand Change index TDP-43 KD control saline reverse delta5-desaturase RNA LIT3360 chr19 10249682 10250370 - 3.00 0.12 0.12 up (Revdelta5ase) Kruppel-type zinc-finger protein LIT1822 ZIM3-like mRNA, complete chr7 6559771 6580748 - 0.62 0.01 0.01 up sequence LIT1602 Tsix gene chrX 99634235 99687675 + 4.45 0.08 0.11 up domesticus antisense RNA LIT1604 from the Xist locus, complete chrX 99634235 99687675 + 4.45 0.08 0.11 up sequence LIT3442 C030002C11Rik gene chr1 196724832 196738628 + 2.96 8.63 6.81 down LIT3440 D630033A02Rik gene chr10 122388136 122388891 - 0.64 1.72 1.95 down LIT3441 2210403K04Rik gene chr11 75277842 75282884 + 1.29 3.58 3.11 down clone MBI-109 miscellaneous LIT1799 chr12 90196061 90196315 + 0.19 2.00 1.95 down RNA, partial sequence LIT1623 Rian mRNA, imprinted gene chr12 110093968 110100034 + 6.43 20.01 17.72 down LIT1749 Mirg mRNA chr12 110182787 110197261 + 2.12 7.39 6.70 down Otx2 opposite strand (Otx2OS) LIT3409 chr14 47591238 47715298 + 0.01 0.02 0.01 down RNA, variant isoform Otx2 opposite strand (Otx2OS) LIT3410 chr14 47592109 47641765 + 0.01 0.02 0.01 down RNA, variant isoform Otx2 opposite strand (Otx2OS) LIT3411 chr14 47595527 47783406 + 0.01 0.02 0.01 down RNA, variant isoform RIKEN cDNA 2810037O22 LIT3368 chr14 59340093 59342302 + 4.89 15.56 18.00 down gene Dleu2 mRNA, partial sequence, LIT3422 chr14 60557141 60636460 - 0.39 0.88 1.02 down alternatively spliced retinal noncoding RNA 3 LIT3369 chr14 63542224 63547029 + 11.49 42.49 35.15 down (RNCR3) RIKEN cDNA 3010002C02 LIT3366 chr16 8766673 8767567 + 6.62 13.75 18.15 down gene retinal noncoding RNA 1 LIT3371 chr17 85515685 85526691 - 0.61 1.41 2.15 down (RNCR1) LIT2029 vault RNA, complete sequence chr18 36927839 36927979 + 2.96 8.92 3.67 down vault-associated RNA LIT2028 chr18 36927840 36927980 + 2.96 8.92 3.67 down sequence metastasis associated lung adenocarcinoma transcript 1 LIT3352 chr19 5537030 5801973 - 3.61 13.25 10.70 down (non-coding RNA), mRNA (cDNA clone IMAGE:3582796) metastasis associated lung adenocarcinoma transcript 1 LIT3351 chr19 5625873 5801971 - 5.29 19.76 15.91 down (non-coding RNA), mRNA (cDNA clone IMAGE:3601939) Hepcarcin RNA, complete LIT3362 chr19 5795689 5802671 - 123.89 522.39 428.33 down sequence clone MBI-32 miscellaneous LIT1787 chr19 5796492 5796672 - 97.17 415.23 430.62 down RNA, partial sequence clone MBII-386 miscellaneous LIT1651 chr19 5801468 5801548 - 53.60 219.41 222.85 down RNA, partial sequence U22 snoRNA host gene (UHG) LIT1387 chr19 8791184 8793485 + 2.73 7.76 7.07 down gene, complete sequence empty spiracles 2 opposite LIT1111 chr19 59478353 59511885 - 0.08 0.21 0.21 down strand (EMX2OS) LIT1671 Nespas mRNA chr2 173909368 173909996 - 0.46 1.10 1.68 down RIKEN cDNA 2610100L16 LIT3367 chr3 17982103 17991742 + 2.07 4.94 7.09 down gene RNA component of LIT3323 mitochondrial RNAase P (Rmrp) chr4 43513885 43514159 - 11.64 40.17 29.24 down on chromosome 4. clone MBII-109 miscellaneous LIT1806 chr4 108974503 108974582 - 15.05 38.10 3.63 down RNA, partial sequence clone MBII-367 miscellaneous LIT1647 chr4 131581604 131581673 + 2.12 8.30 8.88 down RNA, partial sequence

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 22 Polymenidou et al SUPPLEMENTARY INFORMATION

RNA transcript from U17 small LIT1313 chr4 131624009 131625709 - 1.42 3.89 3.02 down nucleolar RNA host gene LIT1314 U17HG gene chr4 131624010 131625709 - 1.42 3.89 3.02 down clone MBII-123 miscellaneous LIT1808 chr4 148151418 148151511 - 17.86 42.33 13.45 down RNA, partial sequence makorin 1 pseudogene mRNA, LIT1345 chr5 89913208 89913907 - 0.48 1.69 1.18 down partial sequence retinal noncoding RNA 2 LIT3370 chr5 112453533 112456585 - 9.34 37.31 18.35 down (RNCR2) LIT3450 Evf1 RNA, full length chr6 6771701 6811661 - 0.35 1.27 1.82 down Mit1/Lb9 mRNA, partial LIT1225 chr6 30736982 30738860 + 12.42 34.37 39.93 down sequence molossinus Mit1/Lb9 gene, LIT1226 chr6 30738334 30738754 + 13.80 39.68 46.32 down partial sequence Y3 scRNA gene, partial LIT2020 chr6 47711224 47711289 + 741.35 2636.11 1178.13 down sequence. Y1 scRNA gene, partial LIT2015 chr6 47717676 47717749 - 703.99 2681.21 1067.28 down sequence ROSA 26 transcription 1 mRNA LIT3416 chr6 113036182 113042974 - 1.12 2.43 2.52 down sequence UBE3A-ATS transcript LIT3432 chr7 59091716 59094070 - 1.32 4.60 3.67 down containing exon L Ipw gene chr7 59483753 59544353 - 0.03 0.10 0.08 down LIT1675 Ipw mRNA, partial sequence chr7 59487027 59544354 - 0.39 1.50 1.23 down Pwcr1 mRNA, complete LIT1355 chr7 59739300 59744327 - 0.07 0.26 0.12 down sequence LIT1629 KvLQT1-AS allele (LIT1) chr7 143103111 143107841 - 0.23 0.70 0.62 down LIT1703 7SK class III RNA gene chr9 77960986 77961317 - 37.46 124.99 80.77 down inactive X specific transcripts LIT3324 (Xist) on chromosome X, chrX 99663092 99685952 - 12.17 37.86 38.29 down transcript variant 1 inactive X specific transcripts LIT3329 (Xist) on chromosome X, chrX 99663092 99685952 - 12.17 37.86 38.29 down transcript variant 2 LIT1010 Jpx mRNA, partial sequence chrX 99696298 99696574 + 1.64 5.92 4.94 down LIT1761 Jpx mRNA, partial sequence chrX 99696350 99705674 + 0.42 1.04 0.28 down LIT1009 Jpx mRNA, partial sequence chrX 99696350 99699246 + 0.33 0.98 3.13 down LIT1760 Jpx mRNA, partial sequence chrX 99696383 99705680 + 0.39 0.94 0.90 down LIT1008 Jpx mRNA, partial sequence chrX 99696384 99705680 + 0.39 0.94 0.90 down LIT1006 Ftx noncoding RNA, exons 6-7 chrX 99809719 99817912 - 0.19 0.42 0.63 down LIT1003 Ftx noncoding RNA, exons 6-7 chrX 99810272 99826413 - 0.19 0.38 0.50 down LIT1001 Ftx noncoding RNA, exons 6-7 chrX 99810325 99812065 - 0.17 0.47 4.06 down

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 23 Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Table 4 Enriched Gene Ontology terms in TDP-43 target RNAs that are downregulated upon TDP-43 depletion in adult brain

Gene Number of Total number % of listed Corrected Ontology Enriched Gene Ontology Term listed genes of genes in genes in p-value* Category in term term term

GO:0005216 ion channel activity 21 164 12.8 6.64E-12 GO:0022838 substrate specific channel activity 21 165 12.7 3.74E-12 GO:0015267 channel activity 21 169 12.4 3.97E-12 GO:0022803 passive transmembrane transporter activity 21 169 12.4 3.97E-12 GO:0005261 cation channel activity 18 115 15.7 7.19E-12 GO:0022836 gated channel activity 19 138 13.8 9.42E-12 GO:0046873 metal ion transmembrane transporter activity 19 146 13.0 2.13E-11 GO:0005509 calcium ion binding 26 391 6.6 1.03E-09 GO:0022843 voltage-gated cation channel activity 11 67 16.4 5.31E-07 GO:0005244 voltage-gated ion channel activity 12 94 12.8 1.20E-06 GO:0022832 voltage-gated channel activity 12 94 12.8 1.20E-06 Molecular GO:0005262 calcium channel activity 8 34 23.5 6.74E-06 GO:0022834 ligand-gated channel activity 8 46 17.4 5.34E-05 Function GO:0015276 ligand-gated ion channel activity 8 46 17.4 5.34E-05 GO:0031420 alkali metal ion binding 10 103 9.7 1.94E-04 GO:0030955 potassium ion binding 8 63 12.7 3.87E-04 GO:0005267 potassium channel activity 8 65 12.3 4.42E-04 GO:0005249 voltage-gated potassium channel activity 7 52 13.5 1.09E-03 GO:0005246 calcium channel regulator activity 4 8 50.0 2.00E-03 GO:0005245 voltage-gated calcium channel activity 4 12 33.3 7.09E-03 GO:0016247 channel regulator activity 4 13 30.8 8.62E-03 GO:0043167 ion binding 46 2204 2.1 8.83E-03 GO:0046872 metal ion binding 45 2157 2.1 1.06E-02 GO:0043169 cation binding 45 2177 2.1 1.25E-02 GO:0008066 glutamate receptor activity 4 17 23.5 1.61E-02 GO:0005886 plasma membrane 61 1375 4.4 1.32E-11 GO:0031224 intrinsic to membrane 77 2367 3.3 2.20E-09 GO:0045202 synapse 21 220 9.5 5.97E-08 GO:0044456 synapse part 15 143 10.5 7.11E-06 GO:0016021 integral to membrane 67 2262 3.0 1.10E-05 GO:0031225 anchored to membrane 12 106 11.3 6.04E-05 GO:0044459 plasma membrane part 33 831 4.0 2.46E-04 Cellular GO:0030054 cell junction 17 272 6.3 4.13E-04 Component GO:0042734 presynaptic membrane 6 21 28.6 4.61E-04 GO:0045211 postsynaptic membrane 9 70 12.9 4.62E-04 GO:0034703 cation channel complex 7 48 14.6 2.64E-03 GO:0005887 integral to plasma membrane 13 205 6.3 3.37E-03 GO:0043005 neuron projection 12 182 6.6 4.33E-03 GO:0034702 ion channel complex 8 76 10.5 4.16E-03 GO:0031226 intrinsic to plasma membrane 13 214 6.1 4.01E-03 GO:0014069 postsynaptic density 5 37 13.5 3.85E-02 GO:0006811 ion transport 24 360 6.7 1.23E-06 GO:0007268 synaptic transmission 15 124 12.1 1.12E-06 GO:0030001 metal ion transport 19 228 8.3 1.19E-06 GO:0019226 transmission of nerve impulse 16 157 10.2 1.50E-06 GO:0007267 cell-cell signaling 16 165 9.7 2.38E-06 GO:0050877 neurological system process 20 284 7.0 3.25E-06 GO:0006812 cation transport 19 275 6.9 9.59E-06 GO:0044057 regulation of system process 13 118 11.0 1.34E-05 GO:0006816 calcium ion transport 10 72 13.9 8.28E-05 GO:0015674 di-, tri-valent inorganic cation transport 10 92 10.9 5.95E-04 GO:0050804 regulation of synaptic transmission 9 70 12.9 5.56E-04 GO:0046903 secretion 11 118 9.3 5.54E-04 GO:0048878 chemical homeostasis 14 205 6.8 5.46E-04 GO:0051969 regulation of transmission of nerve impulse 9 73 12.3 6.00E-04 GO:0006873 cellular ion homeostasis 12 152 7.9 7.05E-04 GO:0031644 regulation of neurological system process 9 76 11.8 7.09E-04 GO:0055082 cellular chemical homeostasis 12 156 7.7 7.96E-04 GO:0032940 secretion by cell 10 103 9.7 8.34E-04 GO:0006813 potassium ion transport 9 85 10.6 1.37E-03 GO:0050801 ion homeostasis 12 168 7.1 1.35E-03 GO:0006887 exocytosis 8 66 12.1 1.81E-03 Biological GO:0019725 cellular homeostasis 13 209 6.2 1.96E-03 GO:0030182 neuron differentiation 13 221 5.9 3.21E-03 Process GO:0015672 monovalent inorganic cation transport 11 158 7.0 3.47E-03 GO:0006836 neurotransmitter transport 7 52 13.5 3.34E-03 GO:0042391 regulation of membrane potential 8 79 10.1 4.63E-03 GO:0007610 behavior 12 200 6.0 4.86E-03 GO:0048666 neuron development 11 170 6.5 5.44E-03 GO:0031175 neuron projection development 10 141 7.1 6.04E-03 GO:0007186 G-protein coupled receptor protein signaling pathway 12 207 5.8 5.91E-03

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 24 Polymenidou et al SUPPLEMENTARY INFORMATION

GO:0042592 homeostatic process 15 329 4.6 8.13E-03 GO:0051899 membrane depolarization 5 26 19.2 1.27E-02 GO:0003001 generation of a signal involved in cell-cell signaling 6 48 12.5 1.56E-02 GO:0006874 cellular calcium ion homeostasis 6 52 11.5 2.19E-02 GO:0055074 calcium ion homeostasis 6 54 11.1 2.53E-02 GO:0055066 di-, tri-valent inorganic cation homeostasis 7 82 8.5 2.75E-02 GO:0050905 neuromuscular process 5 33 15.2 2.76E-02 GO:0007626 locomotory behavior 8 113 7.1 2.77E-02 GO:0048812 neuron projection morphogenesis 8 114 7.0 2.84E-02 GO:0006875 cellular metal ion homeostasis 6 57 10.5 2.82E-02 GO:0048667 cell morphogenesis involved in neuron differentiation 8 115 7.0 2.84E-02 GO:0055065 metal ion homeostasis 6 59 10.2 3.14E-02 GO:0007611 learning or memory 6 60 10.0 3.30E-02 GO:0007612 learning 5 38 13.2 3.94E-02 GO:0007628 adult walking behavior 4 19 21.1 4.33E-02 *The p-value indicated is corrected for multiple testing using the Benjamini-Hochberg method.

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 25 Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Table 5 TDP-43 binding and regulation on transcripts encoding for proteins involved in RNA metabolism

Number of CLIP-seq clusters RNA-seq Splicing changes Change upon TDP-43 Refseq/mRNA in in in RPKM in RPKM in Ratio Z Gene symbol Protein Total in 3'UTR depletion (exon identifier 5'UTR introns exons TDP-43 KD control KD/control score coordinates) More exclusion A2bp1/Fox1 Ataxin-2 binding protein 1 AY659958 309 1 308 0 0 52.07 53.16 0.98 -0.02 (chr16:7307308-7307361) Adarb1 Adenosine deaminase, RNA-specific, B1 NM_001024839 14 0 14 0 0 21.21 27.16 0.78 -1.24 No (ADAR2) Als4 Senataxin NM 198033 1 0 1 0 0 13.72 12.97 1.06 0.04 No Atxn2 Ataxin-2 NM 009125 18 0 18 0 0 18.71 15.96 1.17 0.58 No CUG triplet repeat, RNA-binding protein Cugbp1 NM_017368 17 0 17 0 0 34.87 37.92 0.92 -0.35 No 1 Elavl1 ELAV NM 010485 2 0 2 0 0 17.20 16.79 1.02 -0.09 No Elavl2 ELAV-like protein 2 (Hu-antigen B) BC049125 17 0 17 0 0 9.71 11.13 0.87 -0.91 No Elavl3 ELAV-like protein 3 NM 010487 7 0 7 0 0 35.32 35.20 1.00 0.05 No Elavl4 ELAV-like 4 isoform a NM 010488 8 0 8 0 0 14.14 14.16 1.00 -0.23 No Elp3 Elongation protein 3 homolog AK088457 3 0 3 0 0 15.61 17.54 0.89 -0.81 No Ewsr1 Ewing sarcoma homolog AK147909 8 0 6 0 2 44.86 48.66 0.92 -0.31 No Fus Fus/TLS protein BC011078 4 0 0 3 1 92.88 129.56 0.72 -1.46 No Fusip1 FUS interacting protein AK169071 3 0 1 0 2 16.64 17.47 0.95 -0.46 No Mbnl1 Muscleblind-like 1 NM 020007 19 0 19 0 0 39.54 34.50 1.15 0.69 No Mbnl2 Muscleblind-like 2 AK164372 14 0 14 0 0 68.83 66.40 1.04 0.24 No Nova1 Neuro-oncological ventral antigen 1 NM 021361.1 15 0 13 0 2 7.62 16.97 0.45 No Nova2 Neuro-oncological ventral antigen 2 NM 001029877. 7 0 6 0 1 15.04 36.85 0.41 No More exclusion Ptbp2 Polypyrimidine tract binding protein 2 NM_019550 3 0 3 0 0 21.64 21.98 0.98 -0.17 (chr3:119716615-119716649) More exclusion Raly hnRNP-associated with lethal yellow NM_023130 9 0 9 0 0 17.42 19.52 0.89 -0.73 (chr2:154551351-154551399) Rbm9/Fox2 Fox-1 homolog AK044929 32 0 32 0 0 45.73 54.32 0.84 -0.72 No Sfrs9 Splicing factor, arginine/serine rich 9 NM 025573 1 0 1 0 0 13.12 12.28 1.07 0.08 No TAF15 RNA polymerase II, TATA box Taf15 binding protein (TBP)-associated factor, AK011843 5 0 5 0 0 26.82 28.03 0.96 -0.22 No 68 kDa Cytotoxic granule-associated RNA More inclusion Tia1 NM_011585 6 0 6 0 0 26.46 21.48 1.23 0.92 binding protein (chr6:86384734-86384767) Coordinates on mm8 mouse genome. *Human orthologue gene.

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 26 Polymenidou et al SUPPLEMENTARY INFORMATION

Supplementary Table 6 TDP-43 binding and regulation on transcripts associated with disease

Number of CLIP-seq clusters RNA-seq Splicing changes Refseq/mRNA RPKM in RPKM in Ratio Change upon TDP-43 Gene symbol Protein Total in 5'UTR in introns in exons in 3'UTR Z score identifier TDP-43 KD control KD/control depletion (exon coordinates) Adenosine deaminase, RNA- Adarb1 (ADAR2)* NM_001024839 14 0 14 0 0 21.21 27.16 0.78 -1.24 No specific, B1 Als2 Alsin AK020484 8 0 8 0 0 6.67 6.94 0.96 -0.55 No Als4 Senataxin NM 198033 1 0 1 0 0 13.72 12.97 1.06 0.04 No Ang1 Angiogenin NM 007447 0 0 0 0 0 21.94 9.78 2.24 3.79 No App Amyloid beta precursor protein AK148439 34 0 29 3 2 219.61 263.72 0.83 -0.77 No Ar Androgen receptor NM_013476 0 0 0 0 0 2.90 2.73 1.06 -0.33 No Atxn1 Ataxin-1 NM 009124 64 0 64 0 0 16.02 19.97 0.80 -1.31 No Atxn10 Ataxin-10 NM 016843 9 0 9 0 0 43.25 40.78 1.06 0.30 No Atxn2 Ataxin-2 NM 009125 18 0 18 0 0 18.71 15.96 1.17 0.58 No Atxn7 Ataxin-7 NM 139227 4 0 3 0 1 6.55 6.70 0.98 -0.45 No Calcium channel, voltage- Cacna1c NM_009781 77 0 77 0 0 5.57 9.56 0.58 -2.65 No dependent, L type Chmp2b Chromatin modifying protein 2B NM 026879 1 0 1 0 0 16.51 12.82 1.29 0.96 No Dctn1 Dynactin 1 AK157867 8 0 8 0 0 40.37 50.11 0.81 -0.93 No Elp3 Elongation protein 3 homolog AK088457 3 0 3 0 0 15.61 17.54 0.89 -0.81 No Fbxo7 F-box only protein 7 BC032153 5 0 5 0 0 9.99 7.35 1.36 0.95 No Fibroblast growth factor 14 Fgf14 NM_207667 85 0 85 0 0 6.99 14.99 0.47 -3.76 No isoform b Sac domain-containing inositol Fig4 NM_133999 3 0 3 0 0 10.36 8.58 1.21 0.50 No phosphatase 3 Fragile X mental retardation Fmr1 NM_008031 0 0 0 0 0 17.03 17.38 0.98 -0.31 No protein 1 Fus/Tls Fus/Tls protein BC011078 4 0 0 3 1 92.88 129.56 0.72 -1.46 No Fxn Frataxin BC058533 0 0 0 0 0 1.57 1.10 1.43 0.41 No Glutamate receptor, ionotropic, Gria2 NM_001039195 18 1 14 0 3 105.96 110.65 0.96 -0.12 No AMPA2 (alpha 2) Glutamate receptor, ionotropic, More exclusion Gria3 NM_016886 18 0 17 0 1 33.82 45.41 0.74 -1.35 AMPA3 (alpha) (chrX:37898880-37914127) Glutamate receptor, ionotropic, Grik2 NM_010349 52 0 52 0 0 17.00 27.85 0.61 -2.46 No kainate 2 (beta) Glutamate receptor, ionotropic, More exclusion Grin1 BC039157 3 0 3 0 0 62.95 83.99 0.75 -1.25 NMDA1 (zeta 1) (chr2:25133351-25133414) Glutamate receptor, ionotropic, Grin2a NM_008170 83 0 83 0 0 2.19 4.23 0.52 -2.56 No NMDA2A (epsilon) Glutamate receptor, ionotropic, Grin2c NM_010350 3 0 3 0 0 7.38 11.76 0.63 -2.45 No NMDA2C (epsilon) Grn Progranulin NM 008175 1 0 0 0 1 70.17 23.12 3.04 5.22 no Hdac6 Histone deacetylase 6 NM 010413 3 0 3 0 0 8.98 10.54 0.85 -1.05 No Hdh Huntingtin NM 010414 10 0 9 1 0 8.52 16.55 0.51 -3.29 No Immunoglobulin mu binding Ighmbp2 NM_009212 1 0 1 0 0 3.26 2.98 1.09 -0.19 No protein 2 Calcium-activated potassium channel alpha subunit 1 Kcnma1 (Calcium-activated potassium U09383 96 0 96 0 0 8.34 14.36 0.58 -2.66 No channel, subfamily M, alpha subunit 1) Kif5a Kinesin family member 5A NM_001039000 5 0 2 0 3 261.61 364.81 0.72 -1.45 No Kifap3 Kinesin-associated protein 3 NM 010629 5 0 5 0 0 43.38 50.66 0.86 -0.65 No

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 27 Polymenidou et al SUPPLEMENTARY INFORMATION

Lrrk2 Leucine-rich repeat kinase 2 NM 025730 4 0 4 0 0 11.28 14.49 0.78 -1.41 No Mapt Tau protein NM 001038609 10 0 10 0 0 80.18 97.03 0.83 -0.80 No Mecp2 Methyl CpG binding protein 2 AK158664 4 0 4 0 0 29.52 30.32 0.97 -0.15 No Mjd/Atxn3 Mjd/Ataxin-3 protein BC087880 3 0 3 0 0 9.65 8.49 1.14 0.21 No Nefl Neurofilament, light polypeptide NM 010910 1 0 0 0 1 40.04 61.29 0.65 -1.88 No Nlgn1 Neuroligin 1 NM 138666 102 0 101 1 0 8.55 16.28 0.53 -3.19 No Neurexin-I-alpha precursor Nrxn1 AB093249 162 0 162 0 0 18.66 33.28 0.56 -2.76 No (Neurexin I-alpha) More exclusion Nrxn2 Hypothetical Laminin G AK163904 28 0 27 0 1 41.94 45.65 0.92 -0.34 (chr19:6513715-6513805) More inclusion Nrxn3 Neurexin III BC060719 178 0 178 0 0 16.25 37.23 0.44 -3.93 (chr12:90604171-90604261) Optn Optineurin NM 181848 1 0 1 0 0 6.07 5.06 1.20 0.29 No Park2 Parkin AF250294 39 0 39 0 0 0.63 2.18 0.29 -3.78 No Park7 DJ-1 protein NM 020569 0 0 0 0 0 15.89 14.92 1.06 0.07 No Pink1 PTEN induced putative kinase 1 NM 026880 0 0 0 0 0 81.90 82.18 1.00 0.06 No Pla2g6 Phospholipase A2, group VI NM 016915 0 0 0 0 0 2.70 3.00 0.90 -0.81 No Prnp Prion protein NM 011170 4 0 2 1 1 179.48 153.18 1.17 0.81 No Psen1 Presenilin 1 NM 008943 4 1 3 0 0 18.45 16.55 1.11 0.34 No Psen2 Presenilin 2 AF038935 1 0 1 0 0 4.09 3.29 1.24 0.23 No Slc1a2/Glt1 (EAAT2)* Solute carrier family 1 NM 011393 28 0 26 1 1 139.06 156.13 0.89 -0.46 No Slc1a3 (EAAT1)* Solute carrier family 1 NM 148938 2 0 2 0 0 134.38 97.73 1.37 1.55 No Slc1a6 (EAAT4)* Solute carrier family 1 NM 009200 3 0 3 0 0 1.27 1.32 0.96 -0.63 No Smn1 Survival motor neuron protein BC045158 0 0 0 0 0 6.40 5.76 1.11 0.01 No Snca Synuclein, alpha AK014472 3 0 2 0 1 37.26 42.97 0.87 -0.63 No Sncb Synuclein, beta NM 033610 3 1 1 0 1 54.68 68.33 0.80 -0.95 No Sod1 Superoxyde dismutase 1 BC057074 0 0 0 0 0 49.95 45.48 1.10 0.50 No More inclusion Sort1 Sortilin 1 NM_019972 8 0 8 0 0 75.67 86.65 0.87 -0.55 (chr3:108483527-108483626) Spg11 Spatacsin BC019404 4 0 4 0 0 4.75 4.26 1.12 -0.04 No Tbp TATA binding protein BC016476 0 0 0 0 0 7.33 6.06 1.21 0.35 No Tnrc15/Gigyf2 Tnrc15 protein BC027137 15 1 14 0 1 14.87 13.80 1.08 0.12 No Ubiquitin protein ligase E3A Ube3a NM_173010 3 0 3 0 0 23.19 19.27 1.20 0.80 No isoform 1 Ubiquitin carboxy-terminal Uchl1 NM_011670 0 0 0 0 0 93.43 90.90 1.03 0.20 No hydrolase L1 Vesicle-associated membrane Vapb NM_019806 7 2 5 0 2 23.29 26.90 0.87 -0.73 No protein, associated Vcp Valosin containing protein NM_009503 1 0 1 0 0 11.57 9.91 1.17 0.43 No Z-scores between -1.96 and 1.96 represent p-values greater than 0.05. Coordinates on mm8 mouse genome. *Human orthologue gene.

Nature Neuroscience:nature neuroscience doi:10.1038/nn.2779 28