bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 2 Functions of Gtf2i and Gtf2ird1 in the developing brain: transcription, DNA-binding, 3 and long term behavioral consequences. 4 5 6 Authors: 7 Nathan D. Kopp1,2, Kayla R. Nygaard1,2, Katherine B. McCullough1,2, Susan E. Maloney2,3, 8 Harrison W. Gabel4, Joseph D. Dougherty1,2,3,* 9 10 11 Affiliations: 12 13 1Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA 14 2Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA 15 3Intellectual and Developmental Disabilities Research Center, Washington University School 16 of Medicine, St. Louis, MO, USA 17 4Department of Neuroscience, Washington University School of Medicine, St Louis MO, USA 18 19 Contact Information: 20 Dr. Joseph Dougherty 21 Department of Genetics 22 Campus Box 8232 23 4566 Scott Ave. 24 St. Louis, MO. 63110-1093 25 P: 314-286-0752 26 F: 314-362-7855 27 E: [email protected] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48

1 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 2 Abstract 3 4 Gtf2ird1 and Gtf2i may mediate aspects of the cognitive and behavioral phenotypes

5 of (WS) – a microdeletion syndrome encompassing these

6 transcription factors (TFs). Knockout mouse models of each TF show behavioral

7 phenotypes. Here we identify their genomic binding sites in the developing brain,

8 and test for additive effects of their mutation on transcription and behavior. Both

9 TFs target constrained chromatin modifier and synaptic , including a

10 significant number of ASD genes. They bind promoters, strongly overlap CTCF

11 binding and TAD boundaries, and moderately overlap each other, suggesting

12 epistatic effects. We used single and double mutants to test whether mutating both

13 TFs will modify transcriptional and behavioral phenotypes of single Gtf2ird1

14 mutants. Despite little difference in DNA-binding and transcriptome-wide

15 expression, Gtf2ird1 mutation caused balance, marble burying, and conditioned fear

16 phenotypes. However, mutating Gtf2i in addition to Gtf2ird1 did not further modify

17 transcriptomic or most behavioral phenotypes, suggesting Gtf2ird1 mutation alone

18 is sufficient.

19

20

21

22

23

24

25

2 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1

2 Introduction

3

4 The Williams syndrome critical region (WSCR) contains 28 genes that are

5 typically deleted in Williams syndrome (WS) (OMIM#194050). The genes in this

6 region are of interest for their potential to contribute to the unique physical,

7 cognitive, and behavioral phenotypes of WS, which include craniofacial

8 dysmorphology, mild to severe intellectual disability, poor visuospatial cognition,

9 balance and coordination problems, and a characteristic hypersocial personality (1–

10 3). Single knockout mouse models exist for many of the genes in the region,

11 with differing degrees of face validity for WS phenotypes (4–9). Two genes have

12 been highlighted in the human and mouse literature as playing a large role in the

13 social and cognitive tasks: Gtf2i and Gtf2ird1. Mouse models of each gene have

14 shown social phenotypes as well as balance and anxiety phenotypes (4,8–12). Since

15 evidence shows that each gene affects similar behaviors, we set out to test the

16 hypothesis that simultaneous knockdown of both genes would cause more severe

17 phenotypes. Investigating the genes together, rather than individually, could

18 provide a more complete understanding of how the genes in the WSCR contribute to

19 the phenotypes of WS.

20 Gtf2i and Gtf2ird1 are part of the General 2i family of

21 genes. A third member, Gtf2ird2, is located in the WSCR that is variably deleted in

22 WS patients with larger deletions (13). This gene family arose from gene duplication

23 events, resulting in high between the genes (14). The defining

3 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 feature of this gene family is the presence of the helix-loop-helix I repeats, which are

2 involved in DNA and protein binding (15). Gtf2i’s roles include regulating

3 transcriptional activity in the nucleus. However, this multifunctional transcription

4 factor also exists in the cytoplasm where it conveys messages from extracellular

5 stimuli and regulates calcium entry into the cell (16,17). So far, Gtf2ird1 has only

6 been described in the nucleus of cells and is thought to regulate transcription and

7 associate with chromatin modifiers (18). The DNA binding of these two

8 transcription factors (TFs) has only been studied in ES cells and embryonic

9 craniofacial tissue. They recognize similar and disparate genomic loci, suggesting

10 the interact to regulate specific regions of the genome (19,20). However,

11 binding of these TFs has not been studied in the developing brain, which could

12 provide more relevant insight on how the General Transcription Factor 2i family

13 contributes to cognitive and behavioral phenotypes.

14 We performed ChIP-seq on GTF2I and GTF2IRD1 in the developing mouse

15 brain to define where these TFs bind and then tested the downstream consequences

16 of disrupting this binding. We used the CRISPR/Cas9 system to make a mouse model

17 with a mutation in Gtf2ird1 alone and a mouse model with mutations in both Gtf2i

18 and Gtf2ird1 to test how adding a Gtf2i mutation modifies the effects of Gtf2ird1

19 mutation. We showed the mutation in Gtf2ird1 resulted in the production of an N-

20 truncated protein that disrupts the binding of GTF2IRD1 at the Gtf2ird1 promoter

21 and deregulates the transcription of Gtf2ird1, moderately decreasing protein levels

22 so homozygous mutants approximate levels expected from hemizygosity of the

23 WSCR. While there are mild consequences of the mutation on genome-wide

4 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 transcription, the mutant mouse exhibited clear balance, marble burying, and

2 conditioned fear differences. Comparing the single gene mutant to the double

3 mutant did not reveal more severe transcriptional changes and behavioral

4 phenotypes, however adding Gtf2i on top of a background of two Gtf2ird1 mutations

5 resulted in abnormal pre-pulse inhibition and cued fear conditioning. This suggests

6 Gtf2ird1 drives the majority of the phenotypes observed in current studies, and total

7 protein level, N-terminal truncation, or both have functional consequences on DNA-

8 binding and behavior.

9

10 Results

11

12 Gtf2i and Gtf2ird1 bind at active promoters and conserved sites

13

14 The paralogous transcription factors, GTF2I and GTF2IRD1, have been

15 implicated in the behavioral phenotypes seen in humans with WS as well as mouse

16 models (4,8,11,12,21,22). However, the underlying mechanisms by which the

17 General Transcription Factor 2i family acts are not well understood. One approach

18 to begin to identify how these TFs can regulate phenotypes is by identifying where

19 they bind in the genome. This has been done in ES cells and embryonic facial tissue

20 and revealed that both of these transcription factors bind to genes involved in

21 craniofacial development (19). However, these are not relevant tissues when

22 considering phenotypes related to brain development and subsequent behavior. To

23 address this, we performed ChIP-seq for GTF2IRD1 and GTF2I in the developing

5 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 brain at embryonic day 13.5 (E13.5), a time point when both of these proteins are

2 highly expressed (23).

3 We identified 1,410 peaks that were enriched in the GTF2IRD1 IP samples

4 compared to the input (Supplemental Table 1). The GTF2IRD1-bound regions

5 were strikingly enriched in the promoters of genes and along the gene body, more

6 so than would be expected by randomly sampling the genome (Figure 1A) (χ2 =

7 1537.8, d.f. =7, p < 2.2x10-16). The bound peaks were found mostly in H3K4me3

8 bound regions (OR=779.5, p<2.2x10-16 Fisher’s exact test (FET)), suggesting they are

9 in active sites in the genome. While GTF2IRD1-bound regions were also enriched in

10 repressed regions of the genome as defined by H3K27me3 marks (OR=4.090, p<

11 2.2x10-16 FET), only 11% of GTF2IRD1 peaks were found in H3K27me3 regions as

12 opposed to 94% in H3K4me3 regions (Figure 1B), suggesting GTF2IRD1 may have

13 more of a role in activation than repression.

14 To understand the common functions of the genes that have GTF2IRD1

15 bound at the promoter we performed a GO analysis. The top ten results were

16 consistent with the functions previously described for GTF2IRD1, specifically

17 regulation of transcription and chromatin organization, but also identified new

18 categories, such as protein ubiquitination (Figure 1C). To further test these regions

19 for functional consequences, we compared the conservation of GTF2IRD1-bound

20 peaks to a random sample of the genome and found the GTF2IRD1 peaks are more

21 conserved (t=18.131, d.f.=2403, p < 2x10-16) (Figure 1D). We conducted motif

22 enrichment analysis using HOMER to identify other factors that share binding sites

23 with GTF2IRD1 (Figure 1E). The GSC motif, which is similar to the core RGATTR

6 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 motif for GTF2I and GTF2IRD1, was identified in 4.64% of the targets (24).

2 Interestingly, the CTCF motif was found at 11% of the GTF2IRD1 targets. These

3 findings suggest the GTF2i may modulate chromatin organization both through its

4 direct binding at CTCF sites and through regulation of chromatin modifier genes.

5 GTF2I ChIP-seq showed similar results to those of GTF2IRD1. We identified

6 1,755 (Supplemental Table 2) WT GTF2I peaks that had significantly higher

7 coverage in the WT IP compared to the KO IP (Supplemental Figure 1A). These

8 peaks were significantly enriched for promoter regions as well as the gene body

9 when compared to random genomic targets (Figure 2A)(χ2 = 911.63, d.f.=7, p <

10 2.2x10-16). Like GTF2IRD1, the majority of the GTF2I peaks (78.7%) overlapped

11 H3K4me3 peaks (OR=160.98, p< 2.2x10-16 FET), with a smaller subset of peaks

12 (20.7%) overlapping with the H3K27me3 mark (OR 7.022, p<2.2x10-16 FET). This

13 suggests that these peaks are located mainly in active regions of the genome (Figure

14 2B). Summarizing the common functions of these target genes by GO analysis

15 showed enrichment for biological processes such as intracellular signal

16 transduction and phosphorylation (Figure 2C). For example, GTF2I binds within the

17 gene body of the Src gene (Figure 2D), which has been shown to phosphorylate

18 GTF2I in order to activate its transcriptional activity as well as regulate calcium

19 entry into the cell (16,17). The GTF2I binding sites are also significantly more

20 conserved than random sampling of the genome, further suggesting important

21 functional roles of these regions (Figure 2E). Motif enrichment of the GTF2I peaks

22 revealed GC rich binding motifs, such as for the KLF and SP families of transcription

23 factors, as well as Lhx family members. Finally, we see an enrichment of the CTCF

7 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 motif, which is fitting as GTF2I has been shown to help target CTCF to specific

2 genomic regions (25) (Figure 2F).

3

4 Gtf2i and Gtf2ird1 binding sites have distinct features, yet overlap at a subset of

5 promoters

6

7 One way in which GTF2I and GTF2IRD1 can interact is by binding the same

8 sites in the genome. We therefore directly compared the regions bound by these

9 proteins. First, we compared the GTF2I and GTF2IRD1 ChIP peaks and found the

10 pattern of their binding sites is significantly different (χ2 = 282.84, d.f.=7, p < 2.2x10-

11 16) (Figure 3A); while both transcription factors mainly bind in promoters and the

12 gene body, GTF2IRD1 has a higher proportion of peaks at the promoter compared to

13 GTF2I, whereas GTF2I has more peaks at intergenic regions. Interestingly, when we

14 compared them directly to each other, the GTF2IRD1-bound peaks were

15 significantly more conserved than the GTF2I-bound peaks (t=7.81, d.f.=2736.5,

16 p=8.2x10-15) (Figure 3B). Next, to identify common targets, we identified the genes

17 with both transcription factors at their promoter and found a significant overlap of

18 148 genes (p < 1x10-38 FET) (Figure 3C). Motif analysis on the shared peaks showed

19 further enrichment of both CTCF and GSC motifs (Figure 3D).

20

21 The GO functions of the overlapped genes highlight specific roles in synaptic

22 functioning and signal transduction (Figure 3E). Mapk14 is an example of a gene

23 involved in signal transduction that has both GTF2I and GTF2IRD1 bound at its

8 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 promoter (Figure 3F). Shared targets such as this suggest that there are points of

2 convergence where deleting both genes, such as in WS, might result in synergistic

3 downstream impacts.

4 Finally, given the consistent enrichment of CTCF binding sites in both GTF2I

5 and GTF2IRD1 bound regions, we also compared the targets for each transcription

6 factor to CTCF targets in E14.5 whole brain (26). We found a highly significant

7 overlap between GTF2IRD1 and CTCF peaks, with roughly two-thirds of GTF2IRD1

8 binding overlapping with CTCF bound sites (939 shared peaks, OR=89.60, p <

9 2.2x10-16 FET). Similarly, GTF2I shared more peaks in common with CTCF (43%)

10 than we would expect by chance (756 shared peaks, OR=28.16, p < 2.2x10-16 FET).

11 Next, since CTCF has been shown to be present at topological associating domain

12 (TAD) boundaries, we compared the GTF2IRD1 and GTF2I peaks with TAD

13 boundaries determined in E14.5 cortical neurons and found significant overlaps

14 (557 shared peaks, OR =0.16, p < 2.2x10-16 FET and 451 shared peaks, OR=5.19, p <

15 2.2x10-16 FET, respectively).

16 Thus, this TF family is remarkably enriched at promoters of neural genes,

17 overlaps substantially at regions binding the chromatin looping protein CTCF, and is

18 highly enriched a subset of TAD boundaries detected in the developing mouse brain.

19 Overall, these TFs are poised to be important regulators of neural development and

20 thus might regulate other genes associated with developmental diseases.

21

22 Gtf2i and Gtf2ird1 bind promoters of known autism genes.

23

9 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 Copy number variations in the WS locus are clearly associated with autism

2 spectrum disorder (27) and recent exome and genome sequencing analyses have

3 identified over 100 more associated loci, mostly containing single genes where loss

4 of function mutations can cause ASD (28). Interestingly, these genes are

5 substantially enriched in chromatin modifiers and transcriptional regulators, just

6 like GTF2IRD1 targets. However, it is unclear how mutations in this wide variety of

7 genes all lead to a common cognitive phenotype. One widely proposed possibility is

8 these genes are part of a functional network and mutation of any one gene might

9 disrupt some core facet of transcriptional regulation, though most of the evidence to

10 support this idea comes from analyses of co-expression across development (29),

11 rather than any measure of functional interaction or targeted regulation. Therefore,

12 we next examined whether GTF2IRD1 and GTF2I are positioned to regulate other

13 ASD genes by binding within them or nearby. We found that GTF2IRD1 targets were

14 highly enriched for ASD genes (OR >3.5, p-value <1.5x10-5 FET) – targeting 19 of the

15 102 genes, including a variety of epigenetic regulators such as DNMT3A, SETD5,

16 NSD1, ADNP and SIN3A (Figure 3G). GTF2IRD1 was generally bound to their

17 promoters or nearby CpG islands, suggesting direct regulation of these genes. GTF2I

18 was likewise enriched at ASD genes, albeit more moderately (OR=2.1, p<.02 FET),

19 targeting 14 genes. Five genes were targeted by both (Figure 3G). Thus our DNA

20 binding data supports prior coexpression data that suggests a subset of ASD genes

21 may be part of an interrelated module of chromatin modifying genes, and is

22 consistent with prior suggestions that this convergence indicates these genes may

23 have similar downstream pathways leading to disease (30).

10 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 It has also been found that genes mutated in ASD tend to be highly

2 constrained in the general human population, with almost no loss of function

3 mutations observed across >140,000 exome sequencing samples (summarized as a

4 pLI score of >.9)(31). Strong constraint, even of heterozygous loss of function

5 mutations, suggests that these genes do not tolerate decreased expression and thus

6 require a very precise amount of transcription for normal human survival or

7 reproductive fitness. We find >30% of GTF2IRD1-bound genes meet this criterion, a

8 result highly unlikely to be due to chance (OR=2.48, p-value<2.2x10-16 FET). GTF2I

9 targets also show a significant, but more modest, enrichment (OR=1.27, p-

10 value<1.1x10-7 FET) (Figure 3G). Together, these results show that these TFs,

11 especially GTF2IRD1, tend to bind genes requiring tight regulation of expression, as

12 indicated by severe phenotypes and intolerance to loss of function mutations.

13

14 Frameshift mutation in Gtf2ird1 results in truncated protein and affects DNA binding

15 at the Gtf2ird1 promoter

16

17 To investigate the functional role of Gtf2ird1 and Gtf2i at these bound sites

18 and understand how these genes interact, we made loss of function models of

19 Gtf2ird1 individually and a double mutant with mutations in both Gtf2i and Gtf2ird1.

20 We designed one gRNA for each gene and injected them simultaneously into FVB/NJ

21 mouse embryos to obtain single and double gene mutations. We first characterized

22 the consequences of a one adenine insertion in exon three of Gtf2ird1. This

23 frameshift mutation introduces a premature stop codon in exon three, an early

11 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 constitutively expressed exon, which we expected to trigger nonsense-mediated

2 decay (Figure 4A). We crossed heterozygous mutant animals to analyze Gtf2i and

3 Gtf2ird1 transcript and protein abundance in heterozygous and homozygous

4 mutants compared to WT littermates (Figure 4B). The western blots and qPCR

5 were performed using the whole brain at E13.5. As expected, the Gtf2ird1 mutation

6 did not affect Gtf2i transcript or protein levels (Figure 4C,D). Contrary to our

7 prediction of nonsense-mediated decay, we observed a 1.74-fold increase in the

8 Gtf2ird1 transcript with each copy of the mutation and a 40% reduction of the

9 protein in homozygous mutants compared to WT with no significant difference

10 between the WT and heterozygous mutants (Figure 4E, F). This suggests the

11 mutation did have an effect on protein abundance and disrupted the normal

12 transcriptional regulation of the gene, with the homozygous mutant modeling

13 protein levels that might be expected in WS.

14 We noticed a slight shift in the homozygous mutant band, which may

15 correspond to loss of the N-terminal end of the protein. Similar results were

16 reported in another mouse model that deleted exon two of Gtf2ird1, in which lower

17 levels of an N-terminally truncated protein was caused by a translation re-initiation

18 event at methionine-65 in exon three (32). The N-terminal end codes for a

19 conserved , which participates in dimerization as well as DNA-binding

20 (32,33). Mutating the leucine zipper was shown to affect binding of the protein to

21 the Gtf2ird1 upstream regulatory (GUR) element located at the promoter of Gtf2ird1.

22 Given the previous findings that GTF2IRD1 negatively autoregulates its own

23 transcription and mutating the leucine zipper affects binding to the GUR, we

12 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 hypothesized that the frameshift mutation diminished the ability of GTF2IRD1 to

2 bind its promoter resulting in increased transcript abundance. We tested this by

3 performing ChIP-qPCR in the E13.5 brain in WT and Gtf2ird1-/- mutants. In the WT

4 brain, GTF2IRD1 IP enriched for the GUR 13-20 times over off-target sequences,

5 which was significantly higher than the GTF2IRD1 IP in the Gtf2ird1-/- brain (Figure

6 4G,H,I). Taken together, nonsense transcripts of Gtf2ird1 with a stop codon in exon

7 3 can reinitiate at a lower level to produce a N-truncated protein with diminished

8 binding capacity at the GUR element.

9

10 Truncated GTF2IRD1 does not affect binding genome wide

11 Given that the one base pair insertion did not result in a full knockout of the

12 protein but did affect its DNA binding capacity at the GUR of Gtf2ird1, we tested

13 whether the mutant was a loss of function for all DNA binding. We performed ChIP-

14 seq in the E13.5 Gtf2ird1-/- mutants and compared it to WT ChIP-seq data. This

15 comparison confirmed the decrease in binding at the TSS of Gtf2ird1, suggesting the

16 mutation has greatly decreased binding at this locus (Figure 4J). Surprisingly, the

17 only peak identified as having differential coverage (FDR < 0.1) between the two

18 genotypes was the peak at the Gtf2ird1 TSS (Figure 4K). This suggests that the

19 frameshift mutation has a very specific consequence on how GTF2IRD1 binds to its

20 own promoter that does not robustly affect its binding elsewhere in the genome.

21 The Gtf2ird1 promoter has two instances of the R4 core motif in the sense direction

22 and one instance of the motif in the antisense orientation. We searched the

23 sequences under the identified peaks for similar orientations of this binding motif

13 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 and found three other peaks, none of which showed any difference in binding

2 coverage between genotypes. However, these three other peaks did not match the

3 exact spacing of the R4 motifs found in the Gtf2ird1 promoter. This suggests that the

4 leucine zipper is important for a specific configuration of binding sites that is only

5 present in this one instance in the genome. It has been shown that the three R4

6 motifs are present in the same orientation and spacing across great evolutionary

7 distances (32).

8

9 Gtf2ird1 frameshift mutation shows mild transcriptional differences

10

11 The N-truncation of GTF2IRD1 clearly affected its binding at the Gtf2ird1

12 promoter and affected expression levels. Although we didn’t see genome-wide

13 binding perturbed, it is possible losing the N terminus, or the decreased protein

14 level, altered the protein’s ability to recruit other transcriptional co-regulators to

15 impact gene expression. Therefore, we tested the effects of this mutation on

16 genome-wide transcription in the E13.5 brain. We compared the whole brain

17 transcriptome of WT littermates to heterozygous and homozygous mutants.

18 Strikingly similar to ChIP-seq data, the only transcript with an FDR < 0.1 was

19 Gtf2ird1, which was affected in the same direction seen in the qPCR results (Figure

20 4L and Supplemental Figure 2A). We leveraged WT ChIP-seq data to test if

21 GTF2IRD1 presence at a promoter correlates with gene expression. Binning the

22 genes according to expression level showed that the distribution of GTF2IRD1

23 targets was different than expected by chance (χ2 = 48.83, d.f.=3 p < 1.42x10-10),

14 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 suggesting highly expressed genes are more likely to have GTF2IRD1 bound at their

2 promoters (Figure 4M). The majority (1262 peaks) of the GTF2IRD1 peaks at a TSS

3 were at expressed genes, with only 410 next to genes not expressed at detectable

4 levels in the E13.5 brain. To see if there was a more subtle general effect below our

5 sensitivity to detect by analysis of single genes, we tested the bound GTF2IRD1

6 targets expressed as a population for a shift in expression. We saw a trend towards

7 significance between the bound and unbound genes, but with a small effect size: a

8 mean increase of 0.014 fold change in GTF2IRD1 targets (Kolgmogorov-Smirnov

9 test D=0.038, p=0.079). While this is perhaps unsurprising because the frameshift

10 mutation did not disturb binding genome wide (Figure 4N), the homozygous

11 mutants do have an overall decrease in protein level of ~50%, which should mimic a

12 WS deletion. Thus, transcriptional consequences of hemizygosity of this gene might

13 be similarly small.

14

15 Frameshift mutation is sufficient to affect behavior

16

17 Although we observed only small differences in DNA binding and overall

18 brain transcription, another Gtf2ird1 model also reported little to no effects of

19 mutation on transcriptome-wide expression in the brain, yet the model still showed

20 behavioral phenotypes (8,34). Therefore, we tested our mutation for downstream

21 consequences on adult mouse behavior (Table 1). There are many single gene

22 knockout models of Gtf2ird1 and each shows distinct behavioral differences which

23 are, in some instances, contradictory (8,9,21,35). One consistent phenotype across

15 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 Table 1: Behaviors and sample sizes for Gtf2ird1+/- x Gtf2ird1+/- cross 2 Cohort male female Behavior Experimenter WT Gtf2ird1+/- Gtf2ird1-/- WT Gtf2ird1+/- Gtf2ird1-/- 1-hour locomotor Female 10 21 12 15 16 17 activity Ledge Female 9 21 13 15 17 17 Marble burying Female 8 20 10 14 17 14 Pre-pulse inhibition Female 8 20 11 14 17 15 Conditioned fear Female 9 18 10 15 17 17 Shock sensitivity Female 9 10 11 15 17 17 3

4 models is a deficit in motor coordination, which is also affected in individuals with

5 WS. Similarly, we observed a significant effect of genotype (H2=16.35, p=0.0003), on

6 balance. Heterozygous and homozygous animals fell off a ledge sooner than WT

7 littermates (p=0.0038, p=0.0007, respectively) (Figure 5A). Marble burying has not

8 been reported in other Gtf2ird1 models, but in larger WS models that delete the

9 entire syntenic WSCR or the proximal half of the region containing Gtf2ird1 have

10 shown decreased marble burying in mutants (5,36). We observed a similar

11 significant effect of genotype on the number of marbles buried (F2,80=6.17, p

12 =0.0033), with Gtf2ird1-/- mutants burying fewer marbles than WT mice (p=0.002)

13 (Figure 5B). Reports of overall activity levels in Gtf2ird1 mouse models have been

14 discrepant (9,21). Here we show there is no main effect of genotype (F2,88=1.36,

-8 15 p=0.263) but a time by genotype interaction (F10,440=5.791, p=3.3x10 ) on total

16 distance traveled in a one-hour locomotor task. Though no single time point showed

17 a difference between genotypes when performing post hoc tests, this may reflect an

18 overall difference in habituation to the chamber (Figure 5C). When taking sex into

19 consideration, there was a main effect of sex (F1,85=5.23, p=0.025) and the genotype

-8 20 by time interaction persists (F10,425=5.82, p=3.06x10 ) (Supplemental Figure 3

21 A,B). Time spent in the center of an open field is used as a measure of anxiety-like

16 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 behavior in mice. Anxiety-like behaviors in Gtf2ird1 models have also been

2 discrepant in the literature (8). Here we show that there was no main effect of

3 genotype on center variables in this task (F2,88=0.88, p=0.42) (Figure 5D).

4 Finally, as individuals with WBS also show a high prevalence of phobias,

5 sensitivity to sounds, and learning deficits (3,37), we tested sensory motor gating

6 and learning and memory. Pre-pulse inhibition (PPI) results from a decreased

7 startle to an auditory stimulus when the startle stimulus is preceded by a smaller

8 stimulus. PPI was reduced in animals with the proximal WSCR region deleted, but

9 mice with the distal deletion or full deletion did not have any abnormal phenotype

10 (6). In our study, there was no main effect of genotype on PPI (F2,87=0.24, p=0.79),

11 but a pre-pulse by genotype interaction (F4,174=2.66, p=0.034), which suggests that

12 for some pre-pulse stimuli there is a difference between mutants and WT

13 littermates, however no comparisons survived multiple testing corrections in the

14 post hoc test (Figure 5E).

15 In our assessment of learning and memory with the conditioned fear

16 paradigm, there was a main effect of genotype (F2,83=4.82, p=0.010) and minute

17 (F1,83=9.75, p=0.002) on baseline freezing. Post hoc tests on baseline data showed

18 the heterozygous mutants froze more than homozygous mutants during minute one

19 (p=0.0065). After the baseline, when animals were trained to associate a tone with a

20 footshock, we observed all mice had increased freezing over time (F2,122=26.77,

21 p=2.28x10-10) as expected. (Figure 5E). On the second day, which tested contextual

22 fear memory, all genotypes exhibited a fear memory response as indicated by the

-16 23 significant effect of the context compared to baseline (F1,83=173.20, p<2x10 ). Each

17 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 group froze more during the first two minutes of day two than on day one (WT:

2 p=1.98x10-8, Gtf2ird1+/-: p=4.97x10-11, Gtf2ird1-/-: p<2x10-16) (Supplemental Figure

3 3C). When we analyzed the entire time of the experiment of contextual fear we

4 similarly saw no main effect of genotype (F2,83=2.8946, p=0.061), but a significant

-16 5 effect of time (F7,581=15.05, p<2x10 ) and a time by genotype interaction

6 (F14,581=2.01, p=0.016). Post hoc analysis showed that during minute two the

7 homozygous animals froze significantly more than the Gtf2ird1+/- mutants

8 (p=0.026). Similarly, the homozygous mutants froze more than WT littermates

9 during minutes two (p=0.021) and three (p=0.012), suggesting an increased

10 contextual fear memory response (Figure 5G). On day three of the experiment, we

11 tested cued fear. During day three baseline we saw a difference in freezing between

12 genotypes (F2,83=4.13, p=0.02) as well as a time by genotype interaction (F2,83=4.47,

13 p=0.014), with homozygous mutants freezing more in minute two than WT

14 littermates (p=0.002). All genotypes had a similar response to the tone (F2,83=0.36,

15 p=0.70) (Figure 5H). These differences could not be explained by differences in

16 shock sensitivity (flinch: H2=2.52, p=0.28, escape: H2=3.13, p=0.21, vocalization:

17 H2=2.20, p=0.33) (Supplemental Figure 3D). Thus, mutation of Gtf2ird1 appears to

18 enhance contextual fear learning.

19 Overall, these behavioral analyses show the N-terminal truncation and/or

20 decreased protein levels of the Gtf2ird1 mutant still result in adult behavioral

21 phenotypes, specifically in the domains of balance, marble burying, and fear

22 conditioning. The most severe phenotypes were observed in the homozygous

23 mutants, which may model the haploinsufficiency of WS deletions.

18 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1

2 Generation of a Gtf2i and Gtf2ird1 double mutant

3

4 The evidence of functional consequences from the one base pair Gtf2ird1

5 frameshift mutation led us to characterize a double mutant that was generated

6 during the dual gRNA CRISPR/Cas9 injections. This mutant allowed us to test the

7 effects of knocking out Gtf2i combined with a Gtf2ird1 mutation, and test different

8 Gtf2ird1 mutations for consistency in phenotypes. The double mutant described

9 here has a two base pair deletion in exon five of Gtf2i and a 589bp deletion that

10 encompasses most of exon three of Gtf2ird1 (Figure 6A). We carried out a

11 heterozygous cross of the double mutants to test the protein and transcript

12 abundance of each gene in the heterozygous and homozygous states. The

13 homozygous double mutant is embryonic lethal due to the lack of Gtf2i, which has

14 been described in other Gtf2i mutants (Figure 6B) (4,38). We were, however, able

15 to detect homozygous embryos up to E15.5. Thus, we focused molecular analyses on

16 E13.5 brains. The two base pair deletion in exon five of Gtf2i leads to a premature

17 stop codon resulting in a full protein knockout, and decreases the transcript

18 abundance consistent with degradation of the mRNA due to nonsense-mediated

19 decay (Figure 6C,D). The 589bp deletion in Gtf2ird1 removes all of exon three

20 except the first 14bp. We observed the same increase in transcript abundance that

21 was detected in the one base pair insertion mutation but this mutation had a larger

22

19 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 effect on protein levels across genotype, with homozygous mutants producing a

2 truncated protein at about 10% of WT levels (Figure 6E,F).

3

4 Knocking down both Gtf2i and Gtf2ird1 produces mild transcriptomic changes

5

6 To test if combined mutation of Gtf2i and Gtf2ird1 had a larger effect on the

7 transcriptome, we performed whole brain RNA-seq analysis on WT E13.5 brains

8 and compared them to Gtf2i+/-/Gtf2ird1+/- littermates. Similar to what was seen in

9 the previous Gtf2ird1-/- mutants, there were only mild differences between the

10 transcriptomes (Figure 6G). We also compared WT transcriptomes to the

11 homozygous double mutants, which showed greater differences. However, since

12 these mutants have a very severe phenotype, including neural tube closure defects,

13 any direct transcriptional consequences are probably masked by a large number of

14 indirect effects. Indeed, GO term analysis suggested that overall nervous system

15 development and glial cell differentiation is disrupted (Supplemental Figure 4A,B).

16 We also analyzed GTF2I ChIP-seq data with RNA-seq data. Unlike the enriched

17 binding at highly expressed genes we saw with GTF2IRD1 alone, gene expression

18 levels were not significantly related to GTF2I binding (χ2 = 6.58, d.f.=3 p=0.086)

19 (Figure 6H). This is consistent with a previous report of GTF2I ChIP-seq data.

20 Again, the majority (963 peaks) of the TSS GTF2I peaks were nearby expressed

21 genes, with 458 next to genes not expressed at detectable levels in the E13.5 brain.

22 There is a slight but significant increase in gene expression of genes bound by GTF2I

23 compared to genes that are not (0.0213 logFC, Kolgmogorov-Smirnov test D=0.075,

20 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 p=9.50x10-5) (Figure 6I). Thus, heterozygous Gtf2i and Gtf2ird1 mutation , like

2 Gt2ird1 mutation alone, results in very subtle transcriptional changes.

3 Double mutants show behavioral consequences similar to single Gtf2ird1 mutants

4

5 To test the effects of mutating both Gtf2i and Gtf2ird1 on behavior, we

6 crossed the heterozygous double mutant to the single Gtf2ird1 heterozygous mouse

7 (Figure 7A). This breeding strategy produced four littermate genotypes, WT,

8 Gtf2ird1+/-, Gtf2i+/-/Gtf2ird1+/-, and Gtf2i+/-/Gtf2ird1-/- for direct and well-controlled

9 comparisons. To test for additive effects, the primary comparison was contrasting

10 Gtf2ird1+/- mice to Gtf2i+/-/Gtf2ird1+/- littermates. The remaining genotype, Gtf2i+/-

11 /Gtf2ird1-/-, further tested the effects of heterozygous Gtf2i mutation in the presence

12 of both Gtf2ird1 mutations. To be thorough, we tested protein and transcript

13 abundance of each gene in all four genotypes. As expected, all genotypes with the

14 Gtf2i mutation showed decreased protein and transcript levels. The Gtf2ird1 results

15 reflected what was previously shown for each mutation, however, Gtf2i+/-/Gtf2ird1-/-

16 did not show any further detectable decrease in protein abundance compared to the

17 Gtf2i+/-/Gtf2ird1+/- genotype (Supplemental Figure 5A-D), with both at about 50%

18 of WT levels.

19 We repeated the same behaviors performed on the one base pair Gtf2ird1

20 mutants (Table 2). We saw a similar significant effect of genotype on balance

+/- -/- 21 (H3=10.68, p=0.014), with Gtf2i /Gtf2ird1 mice falling off sooner compared to WT

22 littermates (p=0.025) (Figure 7B). There was no significant difference between the

23 Gtf2ird1+/- and Gtf2i+/-/Gtf2ird1+/- genotypes, suggesting that decreasing the dosage

21 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 Table 2: Behaviors and sample sizes for Gtf2ird1+/- x Gtf2i+/-/Gtf2ird1+/- cross 2 Cohort1 male female Behavior Experimenter WT Gtf2ird1+/- Gtf2i+/- Gtf2i+/- WT Gtf2ird1+/- Gtf2i+/- Gtf2i+/- /Gtf2ird1+/- /Gtf2ird1-/- /Gtf2ird1+/- /Gtf2ird1-/- 1-hour Male 8 8 11 7 14 9 9 11 locomotor activity Ledge Male 8 8 12 7 14 9 11 11

Marble Male 8 8 12 7 14 9 11 11 burying Cohort2 Pre-pulse Male 13 6 12 6 11 15 13 12 inhibition Conditioned Male 11 4 9 5 7 13 12 11 fear Shock Male 12 6 10 6 10 15 13 12 sensitivity 3

4 of GTF2I does not strongly modify the Gtf2ird1+/- phenotype. There was a significant

5 effect of genotype on the number of marbles buried (F3,76=2.93, p=0.039). Post hoc

6 analysis showed a significant difference between only Gtf2ird1+/- and Gtf2i+/-

7 /Gtf2ird1-/- littermates (p=0.050) (Figure 7C), with a trend in the same direction as

8 was previously seen in the Gtf2ird1-/- mutants. We saw a main effect of genotype on

9 activity levels in the one-hour locomotor task (F3,69=3.22, p=0.028), but we did not

10 see the same main effect of sex (F1,69=2.29, p=0.14), or a sex by genotype interaction

11 (F3,69=1.82, p=0.15); however we did see a three-way sex by time by genotype

+/- - 12 interaction (F15,345=1.95, p=0.018). The combined sex data showed Gtf2i /Gtf2ird1

13 /- mice travel a greater distance than WT and Gtf2ird1+/- mice at time point 40

14 (Figure 7D). When we looked at the data by sex, we saw a larger effect in females

15 with Gtf2ird1+/- and Gtf2i+/-/Gtf2ird1+/- genotypes intermediate to Gtf2i+/-/Gtf2ird1-/-

16 (Supplemental Figure 5E,F). There was also a main effect of genotype on the time

17 spent in the center of the apparatus (F3,69=3.60, p=0.018) that was not seen in the

22 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 previous Gtf2ird1 cross. Gtf2i+/-/Gtf2ird1-/- mice spent less time in the center during

2 the first ten minutes of the task compared to WTs (p=0.0019) (Figure 7E).

3 Finally, we repeated the PPI and conditioned fear memory tasks using this

4 breeding strategy. In contrast to what was observed for PPI in the Gtf2ird1 cross, we

+/- 5 saw a significant main effect of genotype (F3,84=4.59, p=0.0051). The Gtf2i

6 /Gtf2ird1-/- mice showed an attenuated PPI response especially at the louder pre-

7 pulse stimuli compared to WT littermates (PPI8:p=0.02, PPI16:p=0.018) and

8 Gtf2ird1+/- mice (PPI16:p=0.02) (Figure 7F). On day one of the conditioned fear

9 task, all genotypes showed increased freezing with subsequent foot shocks as

10 expected. WT animals exhibited higher freezing during minute one of baseline, but

11 this difference diminished during minute two (Figure 7G). All animals showed a

12 contextual fear memory response when they were re-introduced to the chamber on

-13 13 day two (F1,68=81.21, p=3.21x10 ) (Supplemental Figure 5G). While there was no

+/- 14 main effect of genotype (F3,68=1.61, p=0.19) (Figure 7H), the Gtf2ird1 and double

15 mutants showed a trend towards increased freezing that was seen in the previous

16 behavior cohort. On day three, when cued fear was tested, there was a significant

17 effect of genotype on the freezing behavior (F3,68=3.17, p=0.030) and a time by

18 genotype interaction (F21,476=1.63, p=0.040). During minute five of the task the

19 Gtf2i+/-/Gtf2ird1-/- mutants froze significantly more than WTs (p=0.030) as did the

20 Gtf2ird1+/- (p=0.024) (Figure 7I). The cued fear phenotype could not be explained

21 by differences in shock sensitivity (Supplemental Figure 5H).

22 By crossing these mutant lines, we tested the hypothesis that the double

23 heterozygous mutant would be more severe than a mutation only affecting Gtf2ird1.

23 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 Gtf2ird1+/- and Gtf2i+/-/Gtf2ird1+/- genotypes resulted in mild deficits compared to

2 WTs that, in some cases, were intermediate to the Gtf2i+/-/Gtf2ird1-/- phenotype.

3 There were no instances where the Gtf2ird1+/- or Gtf2i+/-/Gtf2ird1+/- genotypes were

4 significantly different from each other, suggesting that in the behaviors we have

5 tested, the Gtf2i mutation does not modify the effects of a Gtf2ird1 mutation. This

6 unique cross also allowed us to characterize a new mouse line Gtf2i+/-/Gtf2ird1-/-,

7 which had the largest impact on behaviors. The phenotypes of Gtf2i+/-/Gtf2ird1-/-

8 were always in the same direction as the phenotypes in the Gtf2ird1-/- mouse model,

9 but we also saw a significant PPI deficit and cued fear difference when the Gtf2i

10 mutation was added. This further supports that the behaviors tested here, such as,

11 balance, marble burying, and learning and memory are largely affected by

12 homozygous mutations in Gtf2ird1.

13

14 Discussion

15 We have described the in vivo DNA binding sites of GTF2IRD1 and GTF2I in

16 the developing mouse brain. This is the first description of these two transcription

17 factors in a tissue that is relevant for the behavioral phenotypes seen in mouse

18 models of WS. GTF2IRD1 showed a preference for active sites and promoter regions.

19 The conservation of GTF2IRD1 targets was higher on average than would be

20 expected by chance, which provides evidence that these are functionally important

21 regions of the genome. The functions of genes bound by GTF2IRD1 include

22 transcriptional regulation, such as chromatin modifiers, as well as post-translational

23 regulation including protein ubiquitination. A role for GTF2IRD1 in regulating genes

24 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 involved in protein ubiquitination has not been described before. This supports the

2 role of GTF2IRD1 in regulating chromatin by transcriptionally controlling other

3 chromatin modifiers. These data, along with the localization pattern of GTF2IRD1 in

4 the nucleus and its direct association with other chromatin modifiers such as

5 ZMYM5 (18,39), suggests that GTF2IRD1 can exert its regulation of chromatin at

6 several different levels of biological organization. Motif enrichment analysis of

7 GTF2IRD1 peaks indicated that CTCF cobind with GTF2IRD1. Consistent with this,

8 GTF2IRD1 is also often present at TAD boundaries where CTCF is known to be

9 enriched. This further suggests GTF2IRD1 may have a role in defingin chromatin

10 topology. GTF2I has been shown to interact with and target CTCF to specific sites in

11 the genome (25) so it would be interesting to test if GTF2IRD1 has a similar

12 relationship with CTCF.

13 Overall, GTF2I showed a similar preference for promoters and active regions,

14 although it had more intergenic targets than GTF2IRD1, and the conservation of

15 GTF2I peaks was significantly lower than GTF2IRD1 peaks. The genes bound by

16 GTF2I were enriched for signal transduction and phosphorylation. Interestingly,

17 GTF2I was bound to the Src gene body . SRC is known to phosphorylate GTF2I to

18 induce its transcriptional activity (16). Phosphorylation of GTF2I by SRC also

19 antagonizes calcium entry into the cell (17). While knocking out Gtf2i did not affect

20 the expression of Src, it would be interesting to understand the functional

21 consequence of GTF2I binding Src, especially since Src knockout mice exhibit similar

22 behaviors as Gtf2i mouse models (40).

25 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 The overlap of GTF2I and GTF2IRD1 targets was significant, and the target

2 genes were enriched for synaptic activity and signal transduction. This is evidence

3 that these genes can interact via their binding targets to produce cognitive and

4 behavioral phenotypes. To test the difference between combined Gtf2i and Gtf2ird1

5 mutation and mutation of Gtf2ird1 alone, we characterized two new mouse models.

6 We used the CRISPR/Cas9 system to generate multiple mutations in the two genes

7 individually as well as together from one embryo injection. The ease and

8 combinatorial possibilities of this technology will be amenable to testing many

9 unique combinations of genetic mutations in copy number variant regions, which

10 will be important to fully understand the complex relationships of genes in these

11 disorders.

12 We found a frameshift mutation expected to trigger non-sense mediated

13 decay in Gtf2ird1 did not degrade the mRNA but did result in an N-terminal

14 truncation and protein level reduction in the homozygous mutant (23). Even a

15 larger, 589bp deletion of exon three in Gtf2ird1 did not result in mRNA degradation,

16 but did have a larger effect on protein level. This phenomenon of increased Gtf2ird1

17 RNA levels has been seen in at least three other mouse models of Gtf2ird1 (8,23,32).

18 Two of these were made using classic homologous recombination removing either

19 exon two alone or exon two through part of exon five. In both of these models,

20 Gtf2ird1 transcript was still made, but no in vivo protein analysis was done due to

21 poor quality antibodies and the undetectably low protein expression in WT mice.

22 The third model also saw the N-terminal truncation. The presence of an aberrant

23 protein that can still bind the genome, such as the mutant described here, could

26 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 explain the lack of transcriptomic differences in the brain shown here and by others

2 (34). The mutant protein may also still interact with other binding partners and be

3 trafficked to the appropriate genomic loci. This mutation did disrupt the binding of

4 GTF2IRD1 to its own promoter, which resulted in an increase in transcript levels.

5 The property that specifies GTF2IRD1 binding to its own promoter must be unique,

6 as DNA binding genome-wide was not robustly perturbed in the mutant.

7 In the end, GTF2IRD1 has proven to be a remarkably difficult protein to

8 disrupt in a targeted manner – a finding that may modify interpretation of prior

9 studies using a variety of mutant lines, including ours (23). Indeed, it took years to

10 establish a sufficiently sensitive immunoblotting protocol for GTF2IRD1, much less a

11 ChIP-seq protocol to study its binding genome wide. Thus, even in the current study,

12 much of the transcriptional and behavioral characterization was complete prior to

13 discovering that substantial DNA binding remained. Presumably the resiliency of

14 this binding also extends to the mutants we used in our recent test of the sufficiency

15 of Gtf2i family mutants to recapitulate the deletion of the entire locus (23).

16 Therefore, it remains challenging to determine to what extent existing Gtf2ird1

17 exonic mutants model the loss of this gene in WS, where the whole genic locus is

18 deleted. Interestingly, when the whole locus is deleted in the “CD” complete

19 deletion mice, the elevation of Gtf2ird1 mRNA seen in exonic mutants does not occur

20 (23). However, protein levels appear to stay above 50%, albeit with substantial

21 mouse to mouse variation. Post-mortem patient brain samples, if available, may

22 help to resolve the consequences of the human mutation on GTF2IRD1 protein

23 levels. Further, it may be worth revisiting the consequences of Gtf2ird1 mutation

27 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 following generation of full genic deletions in mice. In the meantime, it may be that

2 some of the homozygous point mutations, though different from the heterozygous

3 mutations of WS, may better model WS protein levels as they can result in a

4 reduction of GTF2IRD1 protein (Figure 4). Indeed, most studies of Gtf2ird1 mutant

5 behavioral consequences have shown atypical phenotypes in homozygous mutant

6 mice (8,9,21).

7 It is worth noting that even in those mutants with a 50% reduction in

8 protein, transcriptional changes are very subtle, at least at steady state. This is

9 similar to findings in other chromatin modifying knockouts, where final changes in

10 transcription are highly subtle (41–45), even when behavioral consequences can be

11 severe or even lethal, in the case of Mecp2 mutants Guy et al. Nat Gen. 2001 (46).

12 Thus, a second remaining puzzle about these genes is what their role is in regulating

13 transcription at the majority of their binding sites. One possibility is these genes are

14 essential for regulating the proper dynamics of gene expression, something not

15 captured when assessing a population at steady state. Another possibility is they

16 affect phenotypes via actions in rare cell types not easily detected in whole brain

17 RNA-seq. Both hypotheses await further experimentation. Nonetheless, the strong

18 enrichment of ASD and constrained genes among GTF2I and especially GTF2IRD1

19 targets suggest that these factors may be key regulators. Such functional

20 interactions suggest common pathways across these chromatin-related forms of

21 ASD and intellectual disability.

22 Regardless of the resiliency at the protein level, we show heterozygous and

23 homozygous mutations of Gtf2ird1 were sufficient to cause adult behavioral

28 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 abnormalities. This supports the hypothesis that the N-terminal end of the protein

2 has other important functions beyond DNA binding. Similarly, the N-truncation of

3 GTF2I did not affect DNA-binding genome-wide, but still resulted in behavioral

4 deficits (47). The single Gtf2ird1 homozygous mutant showed balance deficits,

5 which is consistent across many mouse models of WS. We also observed decreased

6 marble burying. This task is thought to be mediated at least in part by hippocampal

7 function, suggesting a possible disruption of the hippocampus caused by this

8 mutation (48). We also observed an increase in contextual fear response, another

9 cognitive task that is thought to be under hippocampal and amygdalar regulation.

10 Increase in contextual fear was also seen in another Gtf2ird1 mouse model (49).

11 Given the prior evidence that these two transcription factors are both

12 involved in cognitive and behavioral phenotypes of WS (7,50), and the evidence that

13 their shared binding targets regulate synaptic genes, we tested if having both Gtf2i

14 and Gtf2ird1 mutated could modify the phenotype seen when just Gtf2ird1 was

15 mutated. Contrary to our prediction, we did not see a large effect of adding a Gtf2i

16 mutation to differences in transcriptome wide expression or behavioral phenotypes.

17 This was also surprising given that we successfully reduced GTF2I protein and it has

18 been described in the literature as regulating transcription (51). Again, whole E13.5

19 brain analysis could diminish any effects of transcriptional differences in specific

20 cell types. This potential confound could be overcome using single cell sequencing

21 technologies in the future when those technologies mature and become more

22 reliable for detecting within cell type differences of expression.

29 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 When Gtf2i was knocked down in the presence of two Gtf2ird1 mutations, we

2 saw phenotypes in the same direction as the homozygous one base pair insertion

3 Gtf2ird1 mutant, as well as significant increases in the cued fear memory task. Thus,

4 the behaviors tested in this study seem to be mainly driven by Gtf2ird1 mutant

5 homozygosity, which is consistent across the different mutations. The one exception

6 seemed to be PPI, in which the knockdown of Gtf2i in the presence of two Gtf2ird1

7 mutations attenuated the effect of the pre-pulse. There was no effect of homozygous

8 Gtf2ird1 mutation alone on this phenotype, suggesting that Gtf2i and not Gtf2ird1 is

9 playing a larger role. However, the more severe phenotype was seen in the Gtf2i+/-

10 /Gtf2ird1-/- mutant compared to Gtf2i+/-/Gtf2ird1+/-, suggesting a contribution from

11 both genes. This does not exclude the possibility that Gtf2i can modify the

12 phenotype of Gtf2ird1 knockdown in other behavioral domains. For example, it

13 would be interesting to see the effect of adding a Gtf2i mutation on top of a Gtf2ird1

14 mutation on social behaviors. However, on the FVB/AntJ background used here and

15 in F1 FVB/AntJ x C57BL/6J crosses used previously, we have not seen any social

16 behavior disruptions when these genes are mutated or the entire WS locus is

17 deleted (23).

18 Likewise, the fear conditioning effects of even heterozygous Gtf2ird1

19 mutation are very clear on the FVB/AntJ background used here, but appear to be

20 masked in F1 FVB x C57BL/6J crosses used in our prior study (23). This indicates

21 that while we did not find evidence for epistasis between Gtf2i and Gft2ird1, there is

22 epistasis between these genes and other loci in the genome. Thus, mouse mapping

23 studies using large cohorts of F2 hybrids might provide an opportunity to leverage

30 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 this strain difference to find genes that interact with Gtf2ird1 to contribute to these

2 phenotypes. Such studies could help define novel interaction partners for this

3 relatively understudied gene.

4 Overall, our study has provided the first description of the DNA-binding of

5 both GTF2I and GTF2IRD1 in the developing mouse brain and showed they have

6 unique and overlapping targets. These data will be used to inform downstream

7 studies to understand how these transcription factors interact with the genome. We

8 generated two new mouse models that tested the importance of the N-terminal end

9 of GTF2IRD1 and the effect of mutating both Gtf2i and Gtf2ird1 together. We

10 provided evidence that despite neither gene having much effect on transcription, the

11 Gtf2ird1 mutation affects balance, marble burying, activity levels and fear memory

12 while adding a Gtf2i mutation leads to a larger effect on PPI.

13

14 Materials and Methods

15 Generating genome edited mice

16

17 We generated Gtf2i family mutants as described in (23). To generate unique

18 combinations of gene knockouts, we designed gRNAs targeting early constitutive

19 exons of the mouse Gtf2i and Gtf2ird1 genes. The gRNAs were separately cloned into

20 the pX330 Cas9 expression plasmid (a gift from F. Zhang) and transfected in N2a

21 cells to test for cutting efficiency. DNA was harvested from the cells and cutting was

22 detected using the T7 endonuclease assay. The gRNAs were transcribed in vitro

23 using the MEGAShortScript kit (Ambion, Austin, TX) and the Cas9 mRNA was in vitro

31 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 transcribed using the mMessageMachine kit (Ambion). The two gRNAs and Cas9

2 mRNA were injected into FVB/NJ mouse embryos and implanted into donor

3 females. The resulting offspring were genotyped for mutations with gene specific

4 primers designed with the Illumina adapter sequences concatenated to their 3’

5 prime end to allow for deep sequencing of the amplicons surrounding the expected

6 cut sites. In one line, a large 589 bp deletion in Gtf2ird1 was detected by amplifying

7 3.5kb that included exon two, exon three and part of intron three, then using a

8 Nextera library prep (Illumina, San Diego, CA) to deep sequence the amplicon. Here

9 we focus on two founder mice obtained from these injections. Founder lines were

10 bred to FVB/AntJ mice to ensure the mutations existed in the germline and, for

11 double mutant founders, on the same . The mice were further back-

12 crossed until the mutations were on a complete FVB/AntJ background, which differs

13 from the FVB/NJ background at two loci: Tyrc-ch, which gives FVB/AntJ a chinchilla

14 coat color, and the 129P2/OlaHSd wildtype (WT) Pde6b allele, which prevents

15 FVB/AntJ from becoming blind in adulthood. Coat color was identified by eye, and

16 the Pde6b gene was genotyped by PCR. These mouse lines are available through the

17 MMRRC (accession numbers pending).

18

19 Western blotting

20

21 Embryos were harvested on embryonic day 13.5 (E13.5) and the whole brain was

22 dissected in cold PBS then flash frozen in liquid nitrogen. The brains were stored at -

23 80°C until they were lysed. The frozen brain was homogenized in 500ul of 1xRIPA

32 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 buffer (10mM Tris HCl pH 7.5, 140mM NaCl, 1mM EDTA, 1% Triton X-100, 0.1%

2 DOC, 0.1% SDS, 10mM Na3V04, 10mM NaF, 1x protease inhibitor (Roche, Basil,

3 Switzerland)) along with 1:1000 dilution of RNase inhibitors (RNasin (Promega,

4 Madison, WI) and SUPERase In (Thermo Fisher Scientific, Waltham, MA). The

5 homogenate was incubated on ice for 20 minutes then spun at 10,000g for 10

6 minutes at 4°C to clear the lysate. The lysate was stored in two aliquots of 100μl at -

7 80°C for later protein analysis and 250μl of the lysate was added to 750μl of TRIzol

8 LS and stored at -80°C for later RNA extraction and qPCR. Total protein was

9 quantified using the BCA assay and 25-50μg of protein in 1x Lamelli Buffer with β-

10 mercaptoethanol was loaded onto 4-15% TGX protean gels (Bio-Rad, Hercules, CA).

11 The protein was transferred to a .2μm PVDF membrane by wet transfer. The

12 membrane was blocked with 5% milk in TBST for one hour at room temperature.

13 The membrane was cut at the 75KDa protein marker; the bottom was probed with a

14 GAPDH antibody as an endogenous loading control, while the top was probed with

15 an antibody for either GTF2I or GTF2IRD1. The primary incubation was performed

16 overnight at 4°C. The membrane was washed three times in TBST for five minutes,

17 then incubated with an HRP conjugated secondary antibody diluted in 5% milk in

18 TBST for one hour at room temperature. The blot was washed three times with

19 TBST for five minutes then incubated with Clarity Western ECL substrate (Bio-Rad)

20 for five minutes. The blot was imaged in a MyECL Imager (Thermo Fisher Scientific).

21 Relative protein abundance was quantified using Fiji (NIH) and normalized to

22 GAPDH levels in a reference WT sample. The antibodies used were: Rabbit anti-

23 GTF2IRD1 (1:500, Novus, NBP1-91973), Mouse anti-GTF2I (1:1000 BD

33 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 Transduction Laboratories, Lexington, KY, BAP-135), and Mouse anti-GAPDH

2 (1:10,000, Sigma Aldrich, St. Louis, MO G8795), HRP-conjugated Goat anti Rabbit-

3 IgG (1:2000, Sigma Aldrich, AP307P) and HRP-conjugated Goat anti-Mouse IgG

4 (1:2000, Bio-Rad, 1706516).

5

6 Transcript abundance using RT-qPCR

7

8 RNA was extracted from TRIzol LS using the Zymo Clean and Concentrator-5 kit

9 with on column DNase-I digestion following the manufacturer’s instructions. The

10 RNA was eluted in 30μl of RNase free water and quantified using a Nanodrop 2000

11 (Thermo Fisher Scientific). One microgram of RNA was transcribed into cDNA using

12 the qScript cDNA synthesis kit (Quanta Biosciences, Beverly, MA). Half a microliter

13 of cDNA was used in a 10μl PCR reaction with 500nM of target specific primers and

14 the PowerUP SYBR green master mix (Applied Biosystems, Foster City, CA). The

15 primers were designed to amplify exons that were constitutively expressed in both

16 Gtf2i (exons 25 and 27) and Gtf2ird1 (exons 8 and 9) and span an intron (23). The

17 RT-qPCR was carried out in a QuantStudio6Flex machine (Applied Biosystems)

18 using the following cycling conditions: 1) 95°C for 20 seconds, 2) 95°C for 1 second,

19 3) 60°C for 20 seconds, then repeat steps 2 and 3 40 times. Each target and sample

20 was run in triplicate technical replicates, with three biological replicates for each

21 genotype. The relative transcript abundance was determined using the delta CT

22 method normalizing to Gapdh.

23

34 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 ChIP

2

3 Chromatin was prepared as described previously (23). Frozen brains were

4 homogenized in 10mL of cross-linking buffer (10mM HEPES pH7.5, 100mM NaCl,

5 1mM EDTA, 1mM EGTA, 1% Formaldehyde (Sigma Aldrich)). The homogenate was

6 spun down and resuspended in 5mL of 1x L1 buffer (50mM HEPES pH 7.5, 140 mM

7 NaCl, 1mM EDTA, 1mM EGTA, 0.25% Triton X-100, 0.5% NP40, 10.0% glycerol,

8 1mM BGP (Sigma Aldrich), 1x Na Butyrate (Millipore, Burlington, MA), 20mM NaF,

9 1x protease inhibitor (Roche)) to release the nuclei. The nuclei were spun down and

10 resuspended in 5mL of L2 buffer (10mM Tris-HCl pH 8.0, 200mM NaCl, 1mM BGP,

11 1x Na Butyrate, 20mM NaF, 1x protease inhibitor) and rocked at room temperature

12 for five minutes. The nuclei were spun down and resuspended in 950μl of buffer L3

13 (10mM Tris-HCl pH 8.0, 1mM EDTA, 1mM EGTA, 0.3% SDS, 1mM BGP, 1x Na

14 Butyrate, 20mM NaF, 1x protease inhibitor) and sonicated to a fragment size of 100-

15 500bp in a Covaris E220 focused-ultrasonicator with 5% duty factor, 140 PIP, and

16 200cbp. The sonicated chromatin was diluted with 950ul of L3 buffer and 950μl of

17 3x Covaris buffer (20mM Tris-HCl pH 8.0, 3.0% Triton X-100, 450mM NaCl, 3mM

18 EDTA). The diluted chromatin was pre-cleared using 15μl of protein G coated

19 streptavidin magnetic beads (Thermo Fisher Scientific) for two hours at 4°C. For IP,

20 15μl of protein G coated streptavidin beads were conjugated to either 10μl of

21 GTF2IRD1 antibody (Rb anti-GTF2IRD1, NBP1-91973 LOT:R40410) or 10μl of

22 GTF2I antibody (Rb anti-GTF2I, Bethyl Laboratories, Montgomery, TX, A301-330A)

23 for one hour at room temperature. 80μl of the pre-cleared lysate was saved for an

35 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 input sample. 400μl of the pre-cleared lysate was added to the beads and incubated

2 overnight at 4°C. The IP sample was then washed twice with low salt wash buffer

3 (10mM Tris-HCl pH 8.0, 2mM EDTA, 150mM NaCl, 1.0% Triton X-100, 0.1% SDS),

4 twice with a high salt buffer (10mM Trish-HCl pH 8.0, 2mM EDTA, 500mM NaCl,

5 1.0% Triton X-100, 0.1% SDS), twice with LiCl wash buffer (10mM Tris-HCl pH 8.0,

6 1mM EDTA, 250mM LiCl (Sigma Aldrich), 0.5% NaDeoxycholate, 1.0% NP40), and

7 once with TE buffer (10mM Tris-HCl pH 8.0, 1mM EDTA). The DNA was eluted from

8 the beads with 200μl of 1x TE and 1% SDS by incubating at 65°C in an Eppendorf R

9 thermomixer shaking at 1400rpm. The DNA was de-crosslinked by incubating at

10 65°C for 15 hours in a thermocycler. RNA was removed by incubating with 10ug of

11 RNase A (Invitrogen, Carlsbad, CA) at 37°C for 30 minutes and then treated with

12 140ug of Proteinase K (NEB, Ipswhich, MA) incubating at 55°C in a thermomixer

13 mixing at 900rpm for two hours. The DNA was extracted with 200μl of

14 phenol/chloroform/isoamyl alcohol (Ambion) and cleaned up using the Qiagen PCR

15 purification kit then eluted in 60ul of elution buffer. Concentration was assessed

16 using the high sensitivity DNA kit for Qubit quantification (Thermo Fisher

17 Scientific).

18

19 ChIP-qPCR

20

21 Primers were designed to amplify the upstream regulatory element of Gtf2ird1. Two

22 off target primers were designed: one 10kb upstream of the transcription start site

23 of Bdnf and the other 7kb upstream of the Pcbp3 transcription start site. The input

36 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 sample was diluted 1:3, 1:30, and 1:300 to create a standard curve for each primer

2 set and sample. Each standard, input, and IP sample for each primer set was

3 performed in triplicate in 10μl reactions using the PowerUP SYBR green master mix

4 (Applied Biosystems) and 250nM of forward and reverse primers. The reactions

5 were performed in a QuantStudio6Flex machine (Applied Biosystems) with the

6 following cycling conditions: 1) 50°C for 2 minutes, 2) 95°C for 10 minutes, 3) 95°C

7 15 seconds, 4) 60°C for 1 minute, then repeat steps 3-4 40 times. The relative

8 concentration of the input and IP samples were determined from the standard curve

9 for each primer set. Enrichment of the IP samples was determined by dividing the

10 on target upstream regulatory element relative concentration by the off target

11 relative concentration.

12

13 ChIP-seq

14

15 ChIP seq libraries were prepared using the Swift Accel-NGS 2S plus DNA library

16 prep kits with dual indexing (Swift Biosciences, Ann Arbor, MI). The final libraries

17 were enriched by 13 cycles of PCR. The libraries were sequenced by the Genome

18 Technology Access Center at Washington University School of Medicine on a

19 HiSeq3000 producing 1x50 reads.

20 Raw reads were trimmed of adapter sequences and bases with a quality

21 score below 25 using the Trimmomatic Software (52). The trimmed reads were

22 aligned to the mm10 genome using the default settings of bowtie2 (53). Reads with

23 a mapping quality of less than 10 were removed. Picard tools was used to remove

37 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 duplicates from the filtered reads (http://broadinstitute.github.io/picard). Macs2

2 was used to call peaks on the WT IP, Gtf2ird1-/- IP, and Gtf2i-/-/Gtf2ird1-/- IP samples

3 with the corresponding sample’s input as the control for each biological replicate

4 (54). Macs2 used an FDR of 0.01 as the threshold to call a significant peak. High

5 confidence peaks were those peaks that had some overlap within each biological

6 replicate for each genotype using bedtools intersect (55). The read coverage for the

7 WT high confidence peaks was determined using bedtools coverage for all

8 genotypes. To identify peaks with differential coverage, we used EdgeR to compare

9 the WT peak coverage files to the corresponding mutant peak coverage; differential

10 peaks were defined as having an FDR < 0.1 (56). To determine GTF2I high

11 confidence peaks, we used peaks that had overlap between all four biological

12 replicates and WT peaks with an FDR < 0.1 and log2FC > 0 when compared to the

13 Gtf2i-/-/Gtf2ird1-/- IP coverage, since this mutation represents a full knockout of the

14 protein.

15 Annotations of peaks and motif analysis was performed using the HOMER

16 software on the high confidence peaks (57). Peaks were annotated at the

17 transcription start (TSS) of genes if the peak overlapped the +2.5kbp or -1kbp of the

18 TSS using a custom R script. GO analysis on the ChIP target genes was performed

19 using the goseq R package. Comparison to ASD genes entailed testing for overlap

20 between ChIP target genes and the genes identified by (28), their table S4, using

21 Fisher’s exact test and assuming 22,007 protein coding genes in the genome based

22 on current NCBI annotation. Slightly more significant results were obtained when

23 replicating this analysis using the SFARI gene database (58), accessed 8/4/2019,

38 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 and testing for enrichment of score 1 and 2 ASD genes. We also analyzed pLI scores,

2 downloaded from Gnomad (31) on 9/16/2019, of ChIP peaks. Similar results were

3 obtained using pLI cutoffs of .9 and .99. For epigenetic overlaps, we used E13.5

4 H3K4me3 and E13.5 H3K27me3 forebrain narrow bed peak files from the mouse

5 ENCODE project to overlap with our peak datasets (59). Deeptools was used to

6 generate bigwig files normalized to the library size for each sample by splitting the

7 genome into 50bp overlapping bins (60). Deeptools was used to visualize the ChIP-

8 seq coverage within the H3K4me3 and H3K27me3 peak regions. The LICR TFBS

9 E14.5 whole brain CTCF peak data set was downloaded from the UCSC genome

10 browser and lifted over to mm10 coordinates. TAD analysis of E14.5 cortical

11 neuron HiC data (61) was carried out using domains previously called by arrowhead

12 (Rao et al. Cell 2014, Clemens et al. Mol Cell in press). Domain boundaries were

13 defined as 10kb regions centered on the start and end of each domain and lifted

14 over to mm10 coordinates. Overlaps between GTF2IRD1 and GTF2I peaks and

15 other peak datasets was performed using the bedtools fisher function. PhyloP scores

16 for the region underneath the WT ChIP-seq peaks and random genomic regions of

17 the same length were retrieved using the UCSC table browser 60 Vertebrate

18 Conservation PhyloP table. The Epigenome browser was used to visualize the ChIP-

19 seq data as tracks (62).

20

21

22 RNA-seq

23

39 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 One microgram of E13.5 whole brain total RNA extracted from TRIzol LS was used

2 as input for rRNA depletion using the NEBNext rRNA Depletion Kit

3 (Human/Mouse/Rat). The rRNA-depleted RNA was used as input for library

4 construction using the NEBNext Ultra II RNA library prep kit for Illumina. The final

5 libraries were indexed and enriched by PCR using the following thermocycler

6 conditions: 1) 98°C for 30 seconds, 2) 98°C 10 seconds, 3) 65°C 75 seconds, 4) 65°C

7 5 minutes, 5) hold at 4°C, repeating steps 2-3 six times. The libraries were

8 sequenced by the Genome Technology Access Center at Washington University

9 School of Medicine on a HiSeq3000 producing 1x50 reads.

10

11 RNA-seq analysis

12

13 The raw RNA-seq reads were trimmed of Illumina adapters and bases with quality

14 scores less than 25 using Trimmomatic Software. The trimmed reads were aligned

15 to the mm10 mouse genome using the default parameters of STARv2.6.1b (63). We

16 used HTSeq-count to determine the read counts for features using the Ensembl

17 GRCm38 version 93 gtf file (64). Differential gene expression analysis was done

18 using EdgeR. We compared the expression of genes that are targets of either

19 GTF2IRD1 or GTF2I to non-bound genes by generating a cumulative distribution

20 plot of the average log CPM of the genes between genotypes. GO analysis was

21 performed using the goseq R package.

22

23 Data Availability

40 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1

2 All RNA-seq and ChIP-seq data is available at GEO accession GSE138234.

3

4

5 Behavioral tasks

6

7 All animal testing was done with approval from the Washington University in St.

8 Louis Institutional Animal Care and Use Committee. Mice were group housed in

9 same-sex, mixed-genotype cages with two to five mice per cage in standard mouse

10 cages (dimensions 28.5 x 17.5 x 12 cm) on corn cob bedding. Mice had ad libitum

11 access to food and water and followed a 12-hour light-dark cycle (light from

12 6:00am-6:00pm). Animal housing rooms were kept at 20-22°C with a relative

13 humidity of 50%. All mice were maintained on the FVB/AntJ (65) background from

14 Jackson Labs. All behaviors were done in adulthood between ages P58-P133. A week

15 prior to behavioral testing mice were handled by the experimenter for habituation.

16 On testing days the mice were moved to habituate to the testing room and the

17 experimenter if male for 30 minutes before testing started.

18

19 Ledge

20

21 To test balance, we measured how long a mouse could balance on an acrylic ledge

22 with a width of 0.5cm and a height of 38cm as described (65). The time when the

23 mouse fell off the ledge was recorded up to 60 seconds when the trial was stopped.

41 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 If the mouse fell off within five seconds, the time was restarted and the mouse

2 received another attempt. If the mouse fell off within the first five seconds on the

3 third attempt, that time was recorded. We tested all mice on the ledge twice, with a

4 rest period of at least 20 minutes between trials. Trial average was used for analysis.

5

6 One-hour locomotor activity

7

8 We assessed activity levels in a one-hour locomotor task, as previously described

9 (65). Mice were placed in the center of a standard rat-sized cage (dimensions 47.6 x

10 25.4 x 20.6cm). The rat-sized cage was located inside a sound-attenuating box with

11 white light at 24 lux. The mice could freely explore the cage for one hour. An acrylic

12 lid with air holes was placed on top of the cage to prevent mice from jumping out.

13 The position and horizontal movement of the mice was tracked using ANY-maze

14 software (Stoelting Co., Wood dale, IL: RRID: SCR_014289). The apparatus was

15 divided into two zones: a 33 x 11cm center zone and an edge zone of 5.5cm that

16 bordered the cage. The animal was considered in a zone if 80% of the mouse was

17 detected in that zone. ANY-maze recorded the time, distance, and number of entries

18 into each zone. After the task, the mouse was returned to its home cage and the

19 apparatus was thoroughly cleaned with 70% ethanol.

20

21 Marble burying

22

42 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 Marble burying is a species-specific task that measures the compulsive digging

2 behavior of mice. Normal hippocampal function is thought to be required for normal

3 phenotypes in this task. We tested marble burying as described previously (65). A

4 rat cage was filled with aspen bedding to a depth of 3cm and placed in a sound-

5 attenuating box with white light set to 24 lux. Twenty marbles were placed on top of

6 the bedding in a 5 x 4 evenly spaced grid. The experimental mouse was placed in the

7 center of the chamber and allowed to freely explore and dig for 30 minutes. An

8 acrylic lid with air holes was placed on top of the cage to prevent mice from

9 escaping. After 30 minutes, the animal was returned to its home cage. Two scorers

10 counted the number of marbles not buried (less than two-thirds of the marble was

11 covered with bedding). The number of marbles buried was then determined, and

12 the average of the two scores was used in the analysis. After the marbles were

13 counted, the bedding was disposed of and the cage and marbles were cleaned with

14 70% ethanol.

15

16 Pre-pulse inhibition

17

18 Pre-pulse inhibition (PPI) measures the suppression of the acoustic startle reflex

19 when an animal is presented with a smaller stimulus (pre-pulse) before a more

20 intense stimulus (startle). Acoustic startle response and pre-pulse inhibition were

21 measured as previously described (23). Startle Monitor II software was used to run

22 the protocol in a sound-attenuating chamber (Kinder Scientific, LLC, Poway, CA).

23 Mice were secured in a small enclosure and placed on a force-sensitive plate inside

43 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 the chamber. Animals were acclimated to the test chamber for 5 minutes before

2 trials began. Startle amplitude was recorded with 1ms force readings during 65

3 pseudo-randomized trials that alternated between various stimulus conditions. An

4 auditory stimulus of 120dB was presented for 40ms to illicit the startle response.

5 Pre-pulse inhibition was measured in three different types of trials where the

6 120dB stimulus was preceded by pre-pulses of 4, 8, and 16 dB above the

7 background sound level (65dB). PPI was calculated by using the average startle

8 response (measured in Newtons) of the 120dB stimulus trials minus the average

9 startle response to the stimulus after each respective pre-pulse level. This was

10 divided by the 120dB stimulus startle response and multiplied by 100 to get a

11 percent inhibition of startle.

12

13 Contextual and Cued Fear Conditioning

14

15 Learning and memory were tested using the contextual and cued fear conditioning

16 paradigm as previously described (66). Contextual fear memory is thought to be

17 driven by hippocampal functioning whereas cued fear is thought to be driven by

18 amygdala functioning. On day one of the experiment, animals were placed in an

19 acrylic chamber (26cm x 18cm x 18cm; Med Associates Inc., Fairfax, VT) with a

20 metal grid floor and a peppermint odor from an unobtainable source. The chamber

21 light was on for the duration of the five-minute task. During the first two minutes,

22 the animal freely explored the apparatus to measure baseline freezing. At 100

23 seconds, 160 seconds, and 220 seconds, an 80dB white noise tone (conditioned

44 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 stimulus, CS) was played for a duration of 20 seconds. During the last two seconds of

2 this tone, the mice received a 1.0mA foot shock (unconditioned stimulus, UCS). The

3 animal’s freezing behavior was monitored by FreezeFrame software (Actimetrics,

4 Evanston, IL) in 0.75s intervals. Freezing was defined as no movement besides

5 respiration and was used as a measure of the fear response in mice. After the five-

6 minute task, the mice were returned to their home cage. On day two, we tested

7 contextual fear memory. The mice were placed in the same chamber as day one with

8 the peppermint odor, and freezing behavior was measured over eight minutes. The

9 first two minutes of day two were compared to the first two minutes of day one to

10 test for the acquisition of fear memory. The mice were then returned to their home

11 cage. On day three, to test cued fear, the mice were placed in a new black and white

12 chamber that was partitioned into a triangle shape and had a new coconut scent.

13 The mice were allowed to explore the chamber and the first two minutes were

14 considered baseline. After minute two the 80dB tone (CS) was played for the

15 remaining eight minutes. Freezing behavior was monitored during the entire ten-

16 minute task.

17

18 Shock sensitivity

19

20 We tested the shock sensitivity of the mice to ensure that differences in conditioned

21 fear were not due to differing responses to the shock itself, following a previously

22 established protocol (23). Mice were placed in the apparatus used for day one and

23 day two of the conditioned fear task. The mice were delivered a two second 0.05mA

45 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 shock and their behavior was observed. The shock was increased by 0.05mA up to

2 1mA or until the mice exhibited flinching, escape, and vocalization behaviors. Once

3 the mouse had shown each behavior, the test was ended and the final mA used was

4 recorded.

5

6 Statistical Analysis

7

8 All statistical analyses were performed in R v3.4.2 and are reported in Supplemental

9 Table 3. The ANOVA assumption of normality was assessed using the Shapiro-

10 Wilkes test and manual inspection of qqPlots, and the assumption of equal variances

11 was assessed with Levene’s Test. When appropriate, ANOVA was used to test for

12 main effects and interaction terms. Post hoc analyses were done to compare

13 between genotypes. If the data violated the assumptions of ANOVA, non-parametric

14 tests were performed. If the experiment was performed over time, linear mixed

15 models were used to account for the repeated measures of an animal using the lme4

16 R package. Post hoc analyses were then conducted to compare between genotypes

17 within time bins. Post hoc analyses were done using the multcomp R package (67).

18 Animals were removed from analysis if they had a value that was 3.29 standard

19 deviations above the mean or had poor video tracking and could not be analyzed.

20 21 Acknowledgements 22 This work was supported by 1R01MH107515 JDD, and the Autism Science Foundation, and 23 the National Science Foundation Graduate Research Fellowship DGE-1745038 to NDK and 24 DGE-1745038 to KRN. We would like to thank the Genome Technology Access Center for 25 technical support, as well as Dr. Beth Kozel for critical advice on this project. We would also 26 like to thank Dr. David Wozniak and the Animal Behavior Core at the Washington University 27 School of Medicine for their support.

46 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 2 Conflict of Interest Statement 3 None of the authors have any conflict of interest that could bias the work presented here. 4 5 References 6 7 1. Merla, G., Brunetti-Pierri, N., Micale, L., et al. (2010) Copy number variants at 8 Williams–Beuren syndrome 7q11.23 region. Hum Genet, 128, 3–26.

9 2. Meyer-Lindenberg, A., Mervis, C. B. and Faith Berman, K. (2006) Neural 10 mechanisms in Williams syndrome: a unique window to genetic influences 11 on cognition and behaviour. Nat Rev Neurosci, 7, 380–393.

12 3. Korenberg, J. R., Chen, X.-N., Hirota, H., et al. (2000) VI. Genome Structure and 13 Cognitive Map of Williams Syndrome. Journal of Cognitive Neuroscience, 12, 14 89–107.

15 4. Sakurai, T., Dorr, N. P., Takahashi, N., et al. (2011) Haploinsufficiency of Gtf2i, 16 a gene deleted in Williams Syndrome, leads to increases in social 17 interactions. Autism Res, 4, 28–39.

18 5. Segura-Puimedon, M., Sahún, I., Velot, E., et al. (2014) Heterozygous deletion 19 of the Williams–Beuren syndrome critical interval in mice recapitulates most 20 features of the human disorder. Hum. Mol. Genet., 23, 6481–6494.

21 6. Li, H. H., Roy, M., Kuscuoglu, U., et al. (2009) Induced chromosome deletions 22 cause hypersociability and other features of Williams–Beuren syndrome in 23 mice. EMBO Molecular Medicine, 1, 50–65.

24 7. Osborne, L. R. (2010) Animal models of Williams syndrome. Am. J. Med. 25 Genet., 154C, 209–219.

26 8. Young, E. J., Lipina, T., Tam, E., et al. (2008) Reduced fear and aggression and 27 altered serotonin metabolism in Gtf2ird1-targeted mice. Genes, Brain and 28 Behavior, 7, 224–234.

29 9. Howard, M. L., Palmer, S. J., Taylor, K. M., et al. (2012) Mutation of Gtf2ird1 30 from the Williams–Beuren syndrome critical region results in facial 31 dysplasia, motor dysfunction, and altered vocalisations. Neurobiology of 32 Disease, 45, 913–922.

33 10. Schaefer, M. L., Wong, S. T., Wozniak, D. F., et al. (2000) Altered stress- 34 induced anxiety in adenylyl cyclase type VIII-deficient mice. J. Neurosci., 20, 35 4809–4820.

47 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 11. Barak, B., Zhang, Z., Liu, Y., et al. (2019) Neuronal deletion of Gtf2i , 2 associated with Williams syndrome, causes behavioral and myelin 3 alterations rescuable by a remyelinating drug. Neuroscience, 22, 700.

4 12. Martin, L. A., Iceberg, E. and Allaf, G. (2018) Consistent hypersocial behavior 5 in mice carrying a deletion of Gtf2i but no evidence of hyposocial behavior 6 with Gtf2i duplication: Implications for Williams-Beuren syndrome and 7 autism spectrum disorder. Brain Behav, 8, e00895.

8 13. Porter, M. A., Dobson-Stone, C., Kwok, J. B. J., et al. (2012) A Role for 9 Transcription Factor GTF2IRD2 in Executive Function in Williams-Beuren 10 Syndrome. PLOS ONE, 7, e47457.

11 14. Gunbin, K. V. and Ruvinsky, A. (2012) Evolution of General Transcription 12 Factors. J Mol Evol, 76, 28–47.

13 15. Bayarsaihan, D. and Ruddle, F. H. (2000) Isolation and characterization of 14 BEN, a member of the TFII-I family of DNA-binding proteins containing 15 distinct helix–loop–helix domains. PNAS, 97, 7342–7347.

16 16. Cheriyath, V., Desgranges, Z. P. and Roy, A. L. (2002) c-Src-dependent 17 Transcriptional Activation of TFII-I. J. Biol. Chem., 277, 22798–22805.

18 17. Caraveo, G., Rossum, D. B. van, Patterson, R. L., et al. (2006) Action of TFII-I 19 Outside the Nucleus as an Inhibitor of Agonist-Induced Calcium Entry. 20 Science, 314, 122–125.

21 18. Carmona-Mora, P., Widagdo, J., Tomasetig, F., et al. (2015) The nuclear 22 localization pattern and interaction partners of GTF2IRD1 demonstrate a 23 role in chromatin regulation. Hum Genet, 134, 1099–1115.

24 19. Makeyev, A. V., Enkhmandakh, B., Hong, S.-H., et al. (2012) Diversity and 25 Complexity in Chromatin Recognition by TFII-I Transcription Factors in 26 Pluripotent Embryonic Stem Cells and Embryonic Tissues. PLOS ONE, 7, 27 e44443.

28 20. Bayarsaihan, D., Makeyev, A. V. and Enkhmandakh, B. (2012) Epigenetic 29 modulation by TFII-I during embryonic stem cell differentiation. Journal of 30 Cellular Biochemistry, 113, 3056–3060.

31 21. Schneider, T., Skitt, Z., Liu, Y., et al. (2012) Anxious, hypoactive phenotype 32 combined with motor deficits in Gtf2ird1 null mouse model relevant to 33 Williams syndrome. Behavioural Brain Research, 233, 458–473.

34 22. Dai, L., Bellugi, U., Chen, X.-N., et al. (2009) Is it Williams syndrome? 35 GTF2IRD1 implicated in visual–spatial construction and GTF2I in sociability 36 revealed by high resolution arrays. Am. J. Med. Genet., 149A, 302–314.

48 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 23. Kopp, N., McCullough, K., Maloney, S. E., et al. (2019) Gtf2i and Gtf2ird1 2 mutation do not account for the full phenotypic effect of the Williams 3 syndrome critical region in mouse models. Hum Mol Genet.

4 24. Lazebnik, M. B., Tussie-Luna, M. I. and Roy, A. L. (2008) Determination and 5 Functional Analysis of the Consensus Binding Site for TFII-I Family Member 6 BEN, Implicated in Williams-Beuren Syndrome. J. Biol. Chem., 283, 11078– 7 11082.

8 25. Peña-Hernández, R., Marques, M., Hilmi, K., et al. (2015) Genome-wide 9 targeting of the epigenetic regulatory protein CTCF to gene promoters by the 10 transcription factor TFII-I. PNAS, 112, E677–E686.

11 26. Yue, F., Cheng, Y., Breschi, A., et al. (2014) A comparative encyclopedia of 12 DNA elements in the mouse genome. Nature, 515, 355–364.

13 27. Sanders, S. J., Ercan-Sencicek, A. G., Hus, V., et al. (2011) Multiple recurrent de 14 novo CNVs, including duplications of the 7q11.23 Williams syndrome region, 15 are strongly associated with autism. Neuron, 70, 863–885.

16 28. Satterstrom, F. K., Kosmicki, J. A., Wang, J., et al. (2018) Novel genes for 17 autism implicate both excitatory and inhibitory cell lineages in risk. bioRxiv, 18 484113.

19 29. Kopp, N., Climer, S. and Dougherty, J. D. (2015) Moving from capstones 20 toward cornerstones: successes and challenges in applying systems biology 21 to identify mechanisms of autism spectrum disorders. Front Genet, 6.

22 30. Parikshak, N. N., Luo, R., Zhang, A., et al. (2013) Integrative Functional 23 Genomic Analyses Implicate Specific Molecular Pathways and Circuits in 24 Autism. Cell, 155, 1008–1021.

25 31. Karczewski, K. J., Francioli, L. C., Tiao, G., et al. (2019) Variation across 26 141,456 human exomes and genomes reveals the spectrum of loss-of- 27 function intolerance across human protein-coding genes. bioRxiv, 531210.

28 32. Palmer, S. J., Santucci, N., Widagdo, J., et al. (2010) Negative Autoregulation of 29 GTF2IRD1 in Williams-Beuren Syndrome via a Novel DNA Binding 30 Mechanism. J. Biol. Chem., 285, 4715–4724.

31 33. Hinsley, T. A., Cunliffe, P., Tipney, H. J., et al. (2004) Comparison of TFII-I gene 32 family members deleted in Williams-Beuren syndrome. Protein Science, 13, 33 2588–2599.

34 34. O’Leary, J. and Osborne, L. R. (2011) Global Analysis of Gene Expression in 35 the Developing Brain of Gtf2ird1 Knockout Mice. PLoS ONE, 6, e23868.

49 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 35. Tassabehji, M., Hammond, P., Karmiloff-Smith, A., et al. (2005) GTF2IRD1 in 2 Craniofacial Development of Humans and Mice. Science, 310, 1184–1187.

3 36. Borralleras, C., Sahun, I., Pérez-Jurado, L. A., et al. (2015) Intracisternal Gtf2i 4 Gene Therapy Ameliorates Deficits in Cognition and Synaptic Plasticity of a 5 Mouse Model of Williams–Beuren Syndrome. Mol Ther.

6 37. Dykens, E. M. (2003) Anxiety, Fears, and Phobias in Persons With Williams 7 Syndrome. Developmental Neuropsychology, 23, 291–316.

8 38. Enkhmandakh, B., Makeyev, A. V., Erdenechimeg, L., et al. (2009) Essential 9 functions of the Williams-Beuren syndrome-associated TFII-I genes in 10 embryonic development. PNAS, 106, 181–186.

11 39. Widagdo, J., Taylor, K. M., Gunning, P. W., et al. (2012) SUMOylation of 12 GTF2IRD1 Regulates Protein Partner Interactions and Ubiquitin-Mediated 13 Degradation. PLoS ONE, 7, e49283.

14 40. Sinai, L., Ivakine, E. A., Lam, E., et al. (2015) Disruption of Src Is Associated 15 with Phenotypes Related to Williams-Beuren Syndrome and Altered Cellular 16 Localization of TFII-I. eNeuro, 2, ENEURO.0016-14.2015.

17 41. Wade, A. A., Lim, K., Catta-Preta, R., et al. (2019) Common CHD8 Genomic 18 Targets Contrast With Model-Specific Transcriptional Impacts of CHD8 19 Haploinsufficiency. Front. Mol. Neurosci., 11.

20 42. Tudor, M., Akbarian, S., Chen, R. Z., et al. (2002) Transcriptional profiling of a 21 mouse model for Rett syndrome reveals subtle transcriptional changes in the 22 brain. PNAS, 99, 15536–15541.

23 43. Chahrour, M., Jung, S. Y., Shaw, C., et al. (2008) MeCP2, a key contributor to 24 neurological disease, activates and represses transcription. Science, 320, 25 1224–1229.

26 44. Stroud, H., Su, S. C., Hrvatin, S., et al. (2017) Early-Life Gene Expression in 27 Neurons Modulates Lasting Epigenetic States. Cell, 171, 1151-1164.e16.

28 45. Fazel Darbandi, S., Robinson Schwartz, S. E., Qi, Q., et al. (2018) Neonatal 29 Tbr1 Dosage Controls Cortical Layer 6 Connectivity. Neuron, 100, 831- 30 845.e7.

31 46. Guy, J., Hendrich, B., Holmes, M., et al. (2001) A mouse Mecp2 -null mutation 32 causes neurological symptoms that mimic Rett syndrome. Nature Genetics, 33 27, 322–326.

34 47. Lucena, J., Pezzi, S., Aso, E., et al. (2010) Essential role of the N-terminal 35 region of TFII-I in viability and behavior. BMC Medical Genetics, 11, 61.

50 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 48. Deacon, R. M. J., Croucher, A. and Rawlins, J. N. P. (2002) Hippocampal 2 cytotoxic lesion effects on species-typical behaviours in mice. Behavioural 3 Brain Research, 132, 203–213.

4 49. van Hagen, J. M., van der Geest, J. N., van der Giessen, R. S., et al. (2007) 5 Contribution of CYLN2 and GTF2IRD1 to neurological and cognitive 6 symptoms in Williams Syndrome. Neurobiology of Disease, 26, 112–124.

7 50. Botta, A., Novelli, G., Mari, A., et al. (1999) Detection of an atypical 7q11.23 8 deletion in Williams syndrome patients which does not include the STX1A 9 and FZD3 genes. Journal of Medical Genetics, 36, 478–480.

10 51. Grueneberg, D. A., Henry, R. W., Brauer, A., et al. (1997) A multifunctional 11 DNA-binding protein that promotes the formation of serum response 12 factor/homeodomain complexes: identity to TFII-I. Genes Dev., 11, 2482– 13 2493.

14 52. Bolger, A. M., Lohse, M. and Usadel, B. (2014) Trimmomatic: a flexible 15 trimmer for Illumina sequence data. Bioinformatics, 30, 2114–2120.

16 53. Langmead, B., Wilks, C., Antonescu, V., et al. (2019) Scaling read aligners to 17 hundreds of threads on general-purpose processors. Bioinformatics, 35, 421– 18 432.

19 54. Zhang, Y., Liu, T., Meyer, C. A., et al. (2008) Model-based Analysis of ChIP-Seq 20 (MACS). Genome Biology, 9, R137.

21 55. Quinlan, A. R. and Hall, I. M. (2010) BEDTools: a flexible suite of utilities for 22 comparing genomic features. Bioinformatics, 26, 841–842.

23 56. Robinson, M. D., McCarthy, D. J. and Smyth, G. K. (2010) edgeR: a 24 Bioconductor package for differential expression analysis of digital gene 25 expression data. Bioinformatics, 26, 139–140.

26 57. Heinz, S., Benner, C., Spann, N., et al. (2010) Simple Combinations of Lineage- 27 Determining Transcription Factors Prime cis-Regulatory Elements Required 28 for Macrophage and B Cell Identities. Molecular Cell, 38, 576–589.

29 58. Abrahams, B. S., Arking, D. E., Campbell, D. B., et al. (2013) SFARI Gene 2.0: a 30 community-driven knowledgebase for the autism spectrum disorders 31 (ASDs). Mol Autism, 4, 36.

32 59. Stamatoyannopoulos, J. A., Snyder, M., Hardison, R., et al. (2012) An 33 encyclopedia of mouse DNA elements (Mouse ENCODE). Genome Biology, 13, 34 418.

51 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 60. Ramírez, F., Ryan, D. P., Grüning, B., et al. (2016) deepTools2: a next 2 generation web server for deep-sequencing data analysis. Nucleic Acids Res, 3 44, W160–W165.

4 61. Bonev, B., Cohen, N. M., Szabo, Q., et al. (2017) Multiscale 3D Genome 5 Rewiring during Mouse Neural Development. Cell, 171, 557-572.e24.

6 62. Li, D., Hsu, S., Purushotham, D., et al. (2019) WashU Epigenome Browser 7 update 2019. Nucleic Acids Res, 47, W158–W165.

8 63. Dobin, A., Davis, C. A., Schlesinger, F., et al. (2013) STAR: ultrafast universal 9 RNA-seq aligner. Bioinformatics, 29, 15–21.

10 64. Anders, S., Pyl, P. T. and Huber, W. (2015) HTSeq—a Python framework to 11 work with high-throughput sequencing data. Bioinformatics, 31, 166–169.

12 65. Maloney, S. E., Akula, S., Rieger, M. A., et al. (2018) Examining the 13 Reversibility of Long-Term Behavioral Disruptions in Progeny of Maternal 14 SSRI Exposure. eNeuro, 5.

15 66. Maloney, S. E., Yuede, C. M., Creeley, C. E., et al. (2019) Repeated neonatal 16 isoflurane exposures in the mouse induce apoptotic degenerative changes in 17 the brain and relatively mild long-term behavioral deficits. Scientific Reports, 18 9, 2779.

19 67. Hothorn, T., Bretz, F. and Westfall, P. (2008) Simultaneous Inference in 20 General Parametric Models. Biometrical Journal, 50, 346–363.

21 22 23 24 25 26 Table 1: Behaviors and sample sizes for Gtf2ird1+/- x Gtf2ird1+/- cross 27 Cohort male female Behavior Experimenter WT Gtf2ird1+/- Gtf2ird1-/- WT Gtf2ird1+/- Gtf2ird1-/- 1-hour locomotor Female 10 21 12 15 16 17 activity Ledge Female 9 21 13 15 17 17 Marble burying Female 8 20 10 14 17 14 Pre-pulse inhibition Female 8 20 11 14 17 15 Conditioned fear Female 9 18 10 15 17 17 Shock sensitivity Female 9 10 11 15 17 17 28 29 30 31 32

52 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 Table 2: Behaviors and sample sizes for Gtf2ird1+/- x Gtf2i+/-/Gtf2ird1+/- cross 2 Cohort1 male female Behavior Experimenter WT Gtf2ird1+/- Gtf2i+/- Gtf2i+/- WT Gtf2ird1+/- Gtf2i+/- Gtf2i+/- /Gtf2ird1+/- /Gtf2ird1-/- /Gtf2ird1+/- /Gtf2ird1-/- 1-hour Male 8 8 11 7 14 9 9 11 locomotor activity Ledge Male 8 8 12 7 14 9 11 11

Marble Male 8 8 12 7 14 9 11 11 burying Cohort2 Pre-pulse Male 13 6 12 6 11 15 13 12 inhibition Conditioned Male 11 4 9 5 7 13 12 11 fear Shock Male 12 6 10 6 10 15 13 12 sensitivity 3 4 5 Figure Legends 6 7 Figure 1: GTF2IRD1 binds preferentially to promoters in conserved, active

8 sites in the genome. A GTF2IRD1 binding peaks are annotated primarily in

9 promoters and gene bodies. The distribution of peak annotations is significantly

10 different from random sampling of the genome. B GTF2IRD1 peaks were enriched in

11 H3K4me3 sites marking active regions of the genome and to a lesser extent in

12 H3K27me3 sites marking repressed regions. C GO analysis of genes that have

13 GTF2IRD1 bound to the promoter. D The conservation of sequence in GTF2IRD1-

14 bound peaks is significantly higher than expected by chance. E Motifs of

15 transcription factors enriched under Gtf2ird1 bound peaks.

16

17 Figure 2: GTF2I binds at promoters in conserved, active sites in the genome. A

18 GTF2I binding sites are annotated mostly in gene promoters and the gene body. The

19 distribution of peaks is significantly different than would be expected by chance. B

20 GTF2I peaks overlap with H3K4me3 peaks marking active regions and to a lesser

53 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 extent GTF2I peaks fall within H3K27me3 peaks marking inactive regions. C GO

2 analysis of genes that have GTF2I bound at the promoter. D Epigenome browser

3 shot of GTF2I peak bound within the Src gene. E Genomic sequence under GTF2I

4 peaks are more conserved than we would expect by chance. F Motifs of

5 transcription factors that are enriched in GTF2I bound sequences.

6

7 Figure 3: Comparison of GTF2IRD1 and GTF2I binding sites. A GTF2I and

8 GTF2IRD1 have different distributions of annotated binding sites. B GTF2IRD1

9 bound sequences are more conserved than GTF2I bound sequences. C The overlap

10 of genes that have GTF2I and GTF2IRD1 bound at their promoters. D Motifs of

11 transcription factors that are enriched in regions bound by both GTF2I and

12 GTF2IRD1. E GO analysis of genes with both GTF2I and GTF2IRD1 bound at their

13 promoters. F Epigenome browser shot of Mapk14 showing peaks for both GTF2I and

14 GTF2IRD1. G Enrichment of GTF2IRD1 and GTF2I bound genes in ASD and

15 conserved gene sets.

16

17 Figure 4: Frameshift mutation in Gtf2ird1 exon three results in a decreased

18 amount of an N-truncated protein with diminished binding at Gtf2ird1

19 promoter and has little effect on transcription in the brain. A The sequence of

20 exon three of Gtf2ird1 targeted by the underlined gRNA with the PAM sequence in

21 blue. The mutant allele contains a one base pair insertion of an adenine nucleotide

22 that results in a premature stop codon. B Breeding scheme of the intercross of

23 Gtf2ird1+/- to produce genotypes used in the experiments. C, D Mutation in Gtf2ird1

54 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 does not affect the protein or transcript levels of Gtf2i. E Frameshift mutation

2 decreases the amount of protein in Gtf2ird1-/- and causes a slight shift to lower

3 molecular weight. F The abundance of Gtf2ird1 transcript increases with increasing

4 dose of the mutation. G Schematic of Gtf2ird1 upstream regulatory element (GUR)

5 that shows the three GTF2IRD1 binding motifs. The arrows indicate the location of

6 the primers for amplifying the GUR in the ChIP-qPCR assay. H,I WT ChIP of

7 GTF2IRD1 shows enrichment of the GUR over off target regions. There is more

8 enrichment in the WT genotype compared to the Gtf2ird1-/- genotype. J Profile plots

9 of GTF2IRD1 ChIP-seq data confirms diminished binding at the Gtf2ird1 promoter. K

10 Differential peak analysis comparing WT and Gtf2ird1-/- ChIP-seq data showed only

11 the peak at Gtf2ird1 is changed between genotypes with an FDR <0.1. L Differential

12 expression analysis in the E13.5 brain comparing WT and Gtf2ird1-/- showed only

13 Gtf2ird1 as changed with FDR <0.1. M The presence of GTF2IRD1 at gene promoters

14 is not evenly distributed across expression levels. N The expression of genes bound

15 by GTF2IRD1 is not different compared to all other genes between WT and Gtf2ird1-

16 /- mutants.

17

18 Figure 5: Homozygous frameshift mutation in Gtf2ird1 is sufficient to cause

19 behavioral phenotypes. A Homozygous mutants have worse balance than WT

20 littermates in ledge task. B Homozygous mutants bury fewer marbles than WT and

21 heterozygous littermates. C Overall activity levels are not affected but a time by

22 genotype interaction shows the mutant animals are slower to habituate to the novel

23 environment. D There is no difference in time spent in the center of the apparatus

55 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 between genotypes. E All animals show an increase in startle inhibition when given

2 a pre-pulse of increasing intensity. There is no difference between genotypes. F

3 Acquisition phase of fear conditioning paradigm. All animals show the expected

4 increase in freezing to additional foot shocks. G Gtf2ird1-/- animals show an early

5 increased contextual fear memory response compared to WT and heterozygous

6 littermates. H There were no significant differences between genotypes in cued fear.

7

8 Figure 6: Mutating both Gtf2i and Gtf2ird1 does not result in larger differences

9 in brain transcriptomes. A Generation of double mutant. gRNA target is

10 underlined in exon five of Gtf2i with the PAM sequence in blue. The two base pair

11 deletion results in a premature stop codon within exon five. The Gtf2ird1 mutation

12 is a large 589bp deletion covering most of exon three as shown in the IGV browser

13 shot. B Heterozygous intercross to generate genotypes for ChIP and RNA-seq

14 experiments. The homozygous double mutants are embryonic lethal but are present

15 up to E15.5. C The two base pair deletion in Gtf2i decreases the protein by 50% in

16 heterozygous mutants and no protein is detected in the homozygous E13.5 brain. D

17 The mutation decreases the abundance of Gtf2i transcript consistent with nonsense-

18 mediated decay. E The 589bp deletion in Gtf2ird1 leads to decreased protein levels

19 in heterozygous and homozygous mutants. There is still a small amount of protein

20 made in the homozygous mutant. F The 589

21 bp deletion increases the amount of Gtf2ird1 transcript. G Volcano plot comparing

22 the expression in the E13.5 brain of WT and heterozygous double mutants. The

23 highlighted genes represent an FDR <0.1. H The presence of Gtf2i at the promoters

56 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 does not correlate with the expression of a gene. I The fold change of genes between

2 WT and heterozygous double mutants that have GTF2I bound at their promoters

3 were slightly upregulated when compared to the fold change of genes that did not

4 have GTF2I bound.

5

6 Figure 7: Gtf2i does not modify most of the phenotypes of Gtf2ird1 mutation. A

7 Breeding scheme for behavior experiments. B The Gtf2i+/-/Gtf2ird1-/- animals fell off

8 ledge sooner than WT littermates. C There was a main effect of genotype on marbles

9 buried. Post hoc analysis showed that Gtf2i+/-/Gtf2ird1-/- buried fewer marbles than

10 the Gtf2ird1-/- genotype. D Gtf2i+/-/Gtf2ird1-/- had increased overall activity levels in

11 a one-hour activity task. E Gtf2i+/-/Gtf2ird1-/- showed decreased time in the center of

12 the apparatus compared to WT. F All animals show an increased startle inhibition

13 when given a pre-pulse of increasing intensity. The double mutants show less of an

14 inhibition at higher pre-pulse levels compared to WT and Gtf2ird1+/- animals. G All

15 genotypes showed increased freezing with subsequent foot shocks. H All genotypes

16 showed a similar contextual fear response. I There was a main effect of genotype on

17 cued fear with the Gtf2ird1+/- and Gtf2i+/-/Gtf2ird1-/- genotypes showing an increased

18 fear response compared to WT.

19 20

57 bioRxiv preprint not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailable in thegenome. Figure 1:GTF2IRD1bindspreferentiallytopromotersinconserved,activesites E C B A

CPM 0.5 1.0 1.5 2.0 2.5 -0.5Kbp -0.5Kbp 0 WT GTF2IRD1IP negative regulationoftranscr Rep3 peaks Rep2 peaks Rep1 peaks proportion of targets Motif Name GTF2IRD1 H3K4me3 regulation oftranscr ZBTB33 GABPA H3K4me3 region H3K4me3 region OI CNNBRGCGCCCCCTGSTGGC BORIS doi: CTCF ETS1 GSC Otx2 GFX ETS

cellular responsetoDNAdamagestimulus 0.0 0.2 0.4 0.6 0.8 1.0 positive regulationoftranscr regulation of transcription byregulation oftranscription RNApolymerase II

Targets https://doi.org/10.1101/854851

protein ubiquitination Random 5 GGCYTGGCA10E7 0 1.00E-79 AYAGTGCCMYCTRGTGGCCA rep3 rep2 rep1 *** 0.5Kbp 0.5Kbp GTTGGGA .0-60 1.00E-46 GGVTCTCGCGAGAAC Targets TCCCAA10E6 0 1.00E-67 ATTCTCGCGAGA negative by regulationoftranscription RNApolymerase II GGGAH10E0 0.0186 1.00E-03 GGCGGGAAAH ACAGGAAGTG ACGAT10E0 0.0268 1.00E-03 RACCGGAAGT ACGAT10E0 0.0185 1.00E-03 AACCGGAAGT iption, DNA−templated NYTAATCCYB CPM RGGATTAR Consensus 0.5 1.0 1.5 2.0 2.5 -0.5Kbp -0.5Kbp cell cycle iption, DNA−templated Biological ProcessGO:WTGTF2IRD1IP positive by regulationoftranscription RNApolymerase II H3K4me3 10 WT input H3K4me3 region H3K4me3 region TTS promoter−TSS non−coding intron Intergenic exon 5' UTR 3' UTR chromatin organization iption, DNA−templated .0-60 1.00E-56 1.00E-02 .0-40.0017 1.00E-04 .0-50.0001 1.00E-05 -au q-value p-value 0.5Kbp 0.5Kbp −log10(pvalue) 15 D

0.0487 CPM 0.5 1.0 1.5 2.0 2.5 0.5 1.0 1.5 2.0 2.5 WT GTF2IRD1IP -0.5Kbp -0.5Kbp

PhyloP score under a

H3K27me3 −1 0 1 2 3 4 5 Number ofPeaks ; H3K27me3 region H3K27me3 region with consensus GTF2IRD1 Targets this versionpostedDecember3,2019. 227 101 209 132 161 203 274 133 47 64 20 CC-BY-NC 4.0Internationallicense Percent ofPeakswith 0.5Kbp 0.5Kbp

consensus CPM ***

0.5 1.0 1.5 2.0 2.5 Random Targets 16.46% 15.16% 14.72% 19.87% 11.68% -0.5Kbp -0.5Kbp 7.32% 9.57% 3.41% 9.64% 4.64% 25 H3K27me3 WT input H3K27me3 region H3K27me3 region Percent ofbackground peaks withmotif 12.24% 12.02% 16.45% 4.87% 0.67% 1.70% 2.04% 7.18% 1.79% 2.45% 30 0.5Kbp 0.5Kbp

0.5 1.0 1.5 2.0 2.5 GTF2IRD1 target CPM target GTF2IRD1 . The copyrightholderforthispreprint(whichwas Src 0 F D C A CPM Figure 2:GTF2Ibindsatpromotersinconserved,activesitesthegenome. 60 60 60 60 0 0 0 0 WT_GTF2I_ChIP_rep2 Motif Name WT_GTF2I_ChIP_rep4 WT_GTF2I_ChIP_rep3 WT_GTF2I_ChIP_rep1 Nkx6.1 KLF14 CTCF KLF5 Lhx2 Lhx3 Lhx1 Oct6 Maz Sp1 proportion of targets bioRxiv preprint not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailable positiv

positiv GTF2I Targets

2 0.0 0.2 0.4 0.6 0.8 1.0 regulation oftranscr e regulationoftranscr ranscription b e regulationoftranscription positiv AYAGTGCCMYCTRGTGGCCA Random Wnt signalingpathway GGGGGC10E0 .01696 0.0001 1.00E-06 RGKGGGCGKGGC GCCCCC10E0 .04239 0.0004 1.00E-05 GGCCCCGCCCCC e regulationofcellmigration AGAAGG10E0 0.0063 1.00E-03 WATGCAAATGAG GGGGC10E0 .05505 0.0015 1.00E-04 DGGGYGKGGC NTATR10E0 0.0003 1.00E-05 NNYTAATTAR DTATR10E0 .03106 0.0003 1.00E-05 ADBTAATTAR *** GGGG10E0 .04569 0.0074 1.00E-03 GGGGGGGG GKTAATGR TAATTAGN Consensus 4 m protein phosphor endocytosis Biological ProcessGO:WTGTF2IIP Targets ulticellular organismdev iption, DNA−templated iption, DNA−templated phosphor doi: intr acellular signaltr 6 y RNApolymerase II TTS promoter−TSS non−coding intron Intergenic exon 5' UTR 3' UTR ylation ylation https://doi.org/10.1101/854851 1.00E-03 .0-40.0015 1.00E-04 .0-5000 158 0.0003 1.00E-05 −log10(pv -au q-value p-value elopment alue) 0.0093 8 ansduction E Number ofPeaks with consensus PhyloP score 46 68 68 31

10 −1 0 1 2 3 4 5

GTF2I Targets Percent ofPeakswith consensus 12 15.68% 37.34% 33.14% 10.37% 45.67% 3.02% 4.46% 6.96% 4.46% 2.03%

Random Targets *** under a Percent ofbackground 14 ; peaks withmotif this versionpostedDecember3,2019. 33.01% 28.44% 39.29% 11.88% 1.73% 2.63% 4.40% 2.45% 0.98% 7.12% CC-BY-NC 4.0Internationallicense B

Rep4 peaks Rep3 peaks Rep2 peaks Rep1 peaks 1.0 2.0 CPM 3.0 4.0 -0.5Kbp -0.5Kbp WT GTF2IIP H3K4me3 H3K4me3 region H3K4me3 region rep4 rep3 rep2 rep1 0.5Kbp 0.5Kbp

1.0 CPM2.0 3.0 4.0 -0.5Kbp -0.5Kbp H3K4me3 H3K4me3 region WT input H3K4me3 region . The copyrightholderforthispreprint(whichwas 0.5Kbp 0.5Kbp 1.0 2.0 2.5 3.0 3.5 4.0 0.5 1.5

1.0 CPM2.0 3.0 4.0 -0.5Kbp -0.5Kbp WT GTF2IIP H3K27me3 H3K27me3 region H3K27me3 region 0.5Kbp 0.5Kbp

1.0 CPM2.0 3.0 4.0 -0.5Kbp -0.5Kbp H3K27me3 WT input H3K27me3 region H3K27me3 region 0.5Kbp 0.5Kbp

4.0 3.0 3.5 2.0 2.5 1.0 1.5 0.5 GTF2I target CPM target GTF2I Mapk14 Motif Name NF bioRxiv preprint CPM ZBTB33 Znf263 OI CNNBRGCGCCCCCTGSTGGC BORIS not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailable CTCF E2F4 GSC CRX Otx2 GFX A 60 60 60 60 60 60 60 Figure 3:ComparisonofGTF2IRD1andGTF2Ibindingsites. D T 0 0 0 0 0 0 0 A F A1SAR :AP1

proportion of targets AYAGTGCCMYCTRGTGGCCA WT_GTF2I_ChIP_rep1 WT_GTF2I_ChIP_rep4 WT_GTF2I_ChIP_rep3 WT_GTF2I_ChIP_rep2 WT_GTF2IRD1_ChIP_rep3 WT_GTF2IRD1_ChIP_rep2 WT_GTF2IRD1_ChIP_rep1 GGVTCTCGCGAGAAC

TGGAAAA GTF2I Targets ATTCTCGCGAGA GGCGGGAAAH

CVGTSCTCCC 0.0 0.2 0.4 0.6 0.8 1.0 NYT Consensus GCT RGGA AATCCYB AA

WR GTF2IRD1 TT TCC AR TGAGTCAB doi: *** Targets https://doi.org/10.1101/854851 .0-30.0134 1.00E-03 .0-30.0316 1.00E-03 0.0316 1.00E-03 0.0174 1.00E-03 0 1.00E-14 .0-20.1954 1.00E-02 0.1632 1.00E-02 0.0316 1.00E-03 .0-20.2917 1.00E-02 .0-10 1.00E-11 -au q-value P-value TTS promoter−TSS non−coding intron Intergenic exon 5' UTR 3' UTR Number ofPeaks with consensus 28 30 49 24 10 12 21 6 9 3 Percent ofPeaks with consensus B 22.05% 23.62% 38.58% 18.90% 16.54% 4.72% 7.87% 9.45% 7.09% 2.36% PhyloP score

−1 0 1 2 3 4 5 GTF2IRD1 Targets under a Percent ofbackground peaks withmotif ; this versionpostedDecember3,2019. 14.77% 27.42% 4.71% 0.60% 9.18% 2.18% 2.75% 1.71% 1.81% 0.34% CC-BY-NC 4.0Internationallicense

GTF2I Targets *** 0 GTF2IRD1 &ASDgenes E G GTF2I &ASDgenes GTF2IRD1 &pLI>.9 Both &ASDgenes GTF2I &pLI>.9 Both &pLI>.9 C Biological ProcessGO:GTF2IandGTF2IRD1commontargets positiv 1 ranscription, initiation e regulationofDNA−templatedtranscription, response tom intr regulation ofaxonextension GTF2IRD1-bound TSS acellular signaltr regulation ofsynapticmembr cellular responsetoreactive oxygen species synaptic v 1.0 actin filamentorganization angiogenesis uscle stretch esicle clustering synaptic v . −log10(pv The copyrightholderforthispreprint(whichwas ansduction 1513 2 esicle cycle alue) ane adhesion 2.0 148 OR 3 GTF2I-bound TSS 4.0 1265 4 8.0 GTF2IRD1 GAPDH GAPDH GTF2I bioRxiv preprint not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailable Mutant L J G E C A and haslittleeffectontranscriptioninthebrain. amount ofanN-truncatedproteinwithdiminishedbindingatGtf2ird1promoter Figure 4:FrameshiftmutationinGtf2ird1exonthreeresultsadecreased WT Genotype log10(Pvalue) CPM -strand CPM 134,350,000 0 10 20 30 40 Genotype 0 10 20 30 40 150 KDa -2.0 Kb -2.0 Kb 37 KDa 150 KDa 0 5 10 15 20 25 30 37 KDa TAATCC Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 Gtf2ird1 151bp+ 5’ 5’ DE genes 2−1 −2 GCCTGCGTGGCGGTACACAATGAGAGCGTCTTCGTGATGGGCACC GCCTGCGTTGGCGGTACACAATGAGAGCGTCTTCGTGATGGGCACC A A C C WT vsGtf2ird1 GGATTA WT IP WT input Gtf2ird1 log2(fold change) 100bp+ TSS TSS doi: V V WT WT A G GGATTA 2 1 0 Gtf2ird1 134,400,000 https://doi.org/10.1101/854851 62bp+ Gtf2ird1 V G

Gtf2ird1 +/- H Gtf2ird1 +/- G rep3 rep2 rep1 rep3 rep2 rep1 Gtf2ird1 −/− +2.0 Kb +2.0 Kb Gtf2ird1

WT N

-/- G WT -/-

TSS Gtf2ird1 E X

Gtf2ird1 Gtf2ird1

Gtf2ird1 +/- S CPM 0 10 20 30 40 M CPM 0 10 20 30 40 +/- Gtf2ird1 WT V -2.0 Kb -2.0 Kb -/- F

Gtf2ird1 134,450,000 Proportion of GTF2IRD1-bound genes WT -/- V H Gtf2ird1 Gtf2ird1 +/-

0.0 0.2 0.4 M

Gtf2ird1 -/- Gtf2ird1 GUR/Bdnf enrichment Gtf2ird1 +/- 10 15 20 25 30 35 0 5 -/- TSS 5 0 5 100% 75% 50% 25% TSS IP -/-

input -/- 3’ log2(RPKM) Quantile 3’ Gtf2ird1 WT IP under a

−/− Relative GTF2I intensity IP rep3 rep2 rep1 B

rep3 rep2 rep1 Relative GTF2IRD1 intensity ; Relative GTF2IRD1protein levels Relative GTF2Iprotein levels +2.0 Kb +2.0 Kb

0.0 1.0 2.0 this versionpostedDecember3,2019.

0.0 0.5 1.0 1.5 Gtf2ird1 Gtf2ird1 Gtf2ird1 WT ** Gtf2ird1 Gtf2ird1 WT CC-BY-NC 4.0Internationallicense Gtf2ird1 −/− +/− −/− +/− +/+ * N K +/- I log10(Pvalue)

proportion of genes Gtf2ird1 X

0 5 10 15 GUR/Pcbp enrichment 10 20 30 40 50 Gtf2ird1 3− −1 −2 −3 0.0 0.2 0.4 0.6 0.8 1.0 0 F D DE peaks 2−1 −2 +/- GTF2IRD1 targetgenes

Gtf2ird1 WT IP log2(FC) wrt to Gapdh HOM vWTFC: HOM vWTFC: log2(FC) wrt to Gapdh log2(fold change) +/- WT vsGtf2ird1 all genes 6 5 4 3 2 −/− −7 −6 −5 −4 −3 −2 IP log2(fold change) Gtf2ird1 Gtf2i exon 25/27 Gtf2ird1 exon 8/9 3 2 1 0 ** *** 2 1 0 *** 0.0 0.4 0.8 -/- −/− −0.4 *** . The copyrightholderforthispreprint(whichwas . 0.4 0.0 bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Figure 5: Homozygous frameshift mutation in Gtf2ird1 is sufficient to cause behavioral phenotypes. A B

WT n= 24 Gtf2ird1+/− n= 38 WT n= 22 −/− Gtf2ird1+/− n= 37 Gtf2ird1 n= 30 −/− ** Gtf2ird1 n= 24 *** **

Time on ledge (s) WT Gtf2ird1+/− Gtf2ird1−/− buried Number marbles 0 5 10 15 20 25 30 0 20 40 60 80 C D WT n= 25 +/− WT n= 25 Gtf2ird1 n= 37 +/− −/− Gtf2ird1 n= 37 Gtf2ird1 −/− n= 29 Gtf2ird1 n= 29 Distance (m) Time in center (s) 0 10 20 30 40 50 60 0 50 150 250 10 20 30 40 50 60 10 20 30 40 50 60 Time (10 minutes) E F Time (10 minutes) WT n= 24 WT n= 25 +/− n= 35 +/− Gtf2ird1 n= 37 −/− Gtf2ird1 Gtf2ird1 n= 27 Gtf2ird1−/− n= 28 T/S

T/S

20 60 100 T/S % time freezing Baseline % inhibition of startle ** −20

4dB 8dB 16dB 0 20 40 60 80 100 Pre pulse above background 1 2 3 4 5 minute G H WT n= 24 +/− WT n= 24 Gtf2ird1 n= 35 +/− −/− Gtf2ird1 n= 35 Gtf2ird1 n= 27 −/− Gtf2ird1 n= 27 Tone * * * % time freezing

% time freezing Baseline ** 0 20 40 60 80 100 0 20 40 60 80 100

1 2 3 4 5 6 7 8 1 3 5 7 9 minute minute bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Figure 6: Mutating both Gtf2i and Gtf2ird1 does not result in larger differences in brain transcriptomes.

A B 134,350,000 134,400,000 134,450,000 134,250,000 134,300,000 Gtf2i+/-/Gtf2ird1+/- Gtf2i+/-/Gtf2ird1+/-

Gtf2i Gtf2ird1 Gtf2ird1 Gtf2i Gtf2ird1 Gtf2i Gtf2ird1 Gtf2i Gtf2ird1 Gtf2i Gtf2ird1 Gtf2i Gtf2ird1 Gtf2i Gtf2ird1 Gtf2i Gtf2ird1 Gtf2i Gtf2ird1 Gtf2i Gtf2ird1 Gtf2ird1 Gtf2ird1 X Gtf2ird1 Gtf2ird1 Gtf2ird1 -strand

5’CACCCCGAACATTACGACCTCGCAACC ... ATCATTAAGAGA3’ X WT H P E H Y D L A T ... I I K R

5’CACCCCGAATTACGACCTCGCAACCTT ... CTTCATCATTAAGAGA3’ Mutant H P E L R P R N L ... L H H X WT

Mutant Gtf2i+/+/Gtf2ird1+/+ Gtf2i-/-/Gtf2ird1-/- Gtf2i+/-/Gtf2ird1+/-

C D G WT vs Gtf2i/Gtf2ird1+/−

+/- -/- +/- -/- +/- -/- Relative GTF2I protein levels Gtf2i exon 25/27 WT Gtf2i+/−/Gtf2ird1+/− DE genes Gtf2i−/−/Gtf2ird1−/− Gtf2i

+/- /Gtf2ird1-/- /Gtf2ird1 +/- /Gtf2ird1-/- /Gtf2ird1 +/- /Gtf2ird1-/- /Gtf2ird1 *** Genotype WT Gtf2i Gtf2i WT Gtf2i Gtf2i WT Gtf2i Gtf2i * ** *** GTF2I 150 KDa GAPDH 37 KDa log2(FC) wrt to Gapdh Relative GTF2I intensity Relative −9 −7 −5 0.0 0.5 1.0 1.5 2.0

Apobec3

F log10(Pvalue) Gtf2ird1 E Relative GTF2IRD1 protein levels Gtf2ird1 exon 8/9 +/- +/- +/- -/- -/- -/- WT Gtf2i+/−/Gtf2ird1+/− Gtf2i−/−/Gtf2ird1−/−

+/- /Gtf2ird1-/- /Gtf2ird1 +/- /Gtf2ird1-/- /Gtf2ird1 +/- /Gtf2ird1-/- /Gtf2ird1 ** ***

** 0 5 10 15 20 Genotype WT Gtf2i Gtf2i WT Gtf2i Gtf2i WT Gtf2i Gtf2i * ** GTF2IRD1 150 KDa −2 −1 0 1 2 GAPDH 37 KDa log2(FC) wrt to Gapdh log2(fold change) −7 −6 −5 −4 −3 Relative GTF2IRD1 intensity Relative 0.0 0.5 1.0 1.5

H I

HET v WT FC: all genes 0.8 1.0 HET v WT FC: GTF2I target genes 0.6 0.0 0.4 0.8 proportion of genes tion of GTF2I-bound genes 0.0 0.2 0.4 −0.4 0.0 0.4

25% 50% 75% 100% 0.0 0.2 0.4

Propor −2 −1 0 1 2 log2(RPKM) Quantile log2FC bioRxiv preprint doi: https://doi.org/10.1101/854851; this version posted December 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Figure 7: Gtf2i does not modify most of the phenotypes of Gtf2ird1 mutation.

A B C Gtf2ird1+/- Gtf2i+/-/Gtf2ird1+/-

WT n= 22 WT n= 22 Gtf2i+/−/Gtf2ird1+/− n= 23 Gtf2i+/−/Gtf2ird1+/− n= 23 Gtf2ird1+/− n= 17 Gtf2ird1+/− n= 17 Gtf2i+/−/Gtf2ird1−/− n= 18 Gtf2i+/−/Gtf2ird1−/− n= 18 X *

* bles buried bles Time on ledge (s) Number of mar

+/+ +/+ +/- +/- Gtf2i /Gtf2ird1 Gtf2i /Gtf2ird1 0 5 10 15 20 25 30 0 20 40 60 80 Gtf2ird1+/- Gtf2i+/-/Gtf2ird1-/-

D E F WT n= 24 Gtf2i+/−/Gtf2ird1+/− n= 25 +/− Gtf2ird1 n= 21 WT n= 22 +/− −/− +/− +/− WT n= 22 Gtf2i /Gtf2ird1 n= 18 Gtf2i /Gtf2ird1 n= 20 +/− +/− +/− Gtf2i /Gtf2ird1 n= 20 Gtf2ird1 n= 17 +/− +/− −/− Gtf2ird1 n= 17 Gtf2i /Gtf2ird1 n= 18 +/− −/− * Gtf2i /Gtf2ird1 n= 18 *

* * ** ** * Distance (m) % inhibition of startle 0 20 40 60 80 100 Time spent in center (s) 0 10 20 30 40 50 60 0 50 150 250

10 20 30 40 50 60 10 20 30 40 50 60 4dB 8dB 16dB Time (10 minutes) Time (10 minutes) Pre pulse intensity over background G H I

n= 18 WT n= 18 WT n= 18 WT +/− +/− +/− +/− +/− +/− Gtf2i /Gtf2ird1 n= 21 Gtf2i /Gtf2ird1 n= 21 Gtf2i /Gtf2ird1 n= 21 +/− +/− +/− Gtf2ird1 n= 17 Gtf2ird1 n= 17 Gtf2ird1 n= 17 +/− −/− +/− −/− +/− −/− Gtf2i /Gtf2ird1 Gtf2i /Gtf2ird1 Gtf2i /Gtf2ird1 n= 16 n= 16 n= 16 Tone

* * T/S T/S

T/S % Time freezing % Time freezing Baseline % Time freezing ** baseline 0 20 40 60 80 100 0 10 20 30 40 50 60 0 10 20 30 40

1 2 3 4 5 1 2 3 4 5 6 7 8 1 3 5 7 9 Time (minute) Time (minute) Time (minute)