FUNCTIONAL CHARACTERIZATION OF SCHIZOPHRENIA-ASSOCIATED VARIATION IN CACNA1C

by Nicole Eckart

A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy in Human Genetics

Baltimore, Maryland March, 2016

Abstract

Background

Schizophrenia is a complex psychiatric disorder with heritability estimated to be around 80%. Genetic studies have identified over 100 schizophrenia risk loci. These risk loci most often lie in non-coding sequences and are enriched for expression quantitative trait loci (eQTLs), suggesting that dysregulation of transcriptional control plays a role in complex disease pathogenesis. Of particular interest are non-coding variants in calcium channel subunits, which have been associated with multiple psychiatric disorders.

Specifically, genome-wide association studies (GWAS) have repeatedly identified the single nucleotide polymorphism (SNP) rs1006737, in the third intron of CACNA1C, as strongly associated with schizophrenia.

Methods

We genotyped schizophrenia-associated variants and measured gene expression by qPCR in human post mortem brain samples, testing for statistically significant genotype-expression correlations to identify eQTLs. We then investigated the putative eQTLs further with dual luciferase reporter assays, transfecting reporter constructs for all variants in high linkage disequilibrium (LD) with the schizophrenia-associated variant into relevant cell lines.

Furthermore, we investigated allele-specific binding of variant sequences in the putative eQTL using electrophoretic mobility shift assays (EMSAs). In an EMSA, nuclear extract from each of two cell lines is incubated with radiolabeled DNA probes, and a band of larger molecular weight is produced if the probe binds to a protein or protein complex from the nuclear extract. We also identified specific DNA-protein interactions using protein microarrays, in which proteins are immobilized on a glass slide and fluorescently labeled DNA probes are hybridized to the microarray, producing a signal when they bind a protein.

Lastly, we identified potential regulatory elements that interact with the promoters of our genes of interest using circular chromosome conformation capture with next-generation sequencing (4C-seq). In this assay, cells were cross-linked to capture protein-mediated DNA-DNA interactions. After digestion, interacting DNA fragments were ligated together to form small hybrid circular molecules. Then, for a known viewpoint of interest, the unknown interacting fragments were identified through next-generation sequencing.

Results and Conclusions

Here, we showed that rs1006737 marks an eQTL for CACNA1C transcript levels in human post mortem brain tissue. We tested 16 SNPs in high LD with rs1006737 and found that one, rs4765905, consistently showed allele-dependent regulatory function in reporter assays. We found allele-specific protein binding for 13 SNPs, including rs4765905. Using protein microarrays, we identified several proteins that bound more than three SNPs but not control sequences, suggesting possible functional interactions and combinatorial haplotype effects. Finally, using 4C-seq, we showed interaction of the CACNA1C promoter with the eQTL and other potential regulatory regions. Our results elucidate the pathogenic relevance of one of the best-supported risk loci for schizophrenia.

Advisor: Dimitrios Avramopoulos

Reader: Andrew S. McCallion

Acknowledgements

First, I’d like to thank my thesis advisor, Dimitri Avramopoulos, for his guidance and support. I have learned so much about genetics and genomics and have become a better scientist from working with you. I’d also like to thank the members of my thesis committee, Andy McCallion, Dave Valle, Kathy Burns, and Ann Pulver, for your feedback and many suggestions as this research project progressed.

This work wouldn’t have been possible without the support of my lab mates, past and present. Ruihua Wang extracted much of the DNA and RNA needed for the eQTL study and always was a valuable resource for reagents, protocols, and encouragement.

Megan Szymanski Pierce also extracted DNA and RNA for the eQTL study and helped me settle into the lab before she graduated. Rebecca Yang, a summer student, performed some of the protein microarrays. Thank you to Mariela Zeledon for teaching me the dual luciferase assay and EMSA protocols, Cassandra Obie for teaching me everything I need to know about cell culture, and Gary Steele for helping me troubleshoot PCR and cloning experiments. I would also like to thank Xuan Pham for sharing a bay with me and making my time in lab so enjoyable. I will miss all of our conversations about science and life!

And thank you to all of the many other members of the Avramopoulos, Valle, and Vernon labs for your constant support and intellectual contributions.

Our collaborators in Heng Zhu’s lab enabled us to conduct the protein microarray study. Thank you especially to Qifeng Song, who manufactured the microarrays and taught me the protocol, and to Cedric Moore, who helped me with the analysis. Thank you, also, to our collaborators in Andy McCallion’s lab. Maggie Baker provided lots of support and advice as we optimized the 4C-seq protocol together. Sarah McClymont and Xylena Reed helped me with next-gen sequencing. Xylena and Dave Gorkin also offered their insight on the experimental design of the dual luciferase assays.

Thank you to the Human Genetics program for giving me this incredible opportunity. Dave Valle, Kirby Smith, Andy McCallion, and Sandy Muscelli do so much to run the program and support the students’ education. Thank you to the many faculty and staff who have been so influential by sharing their personal stories and offering their insights on how to become a successful scientist.

I would also like to thank all of my classmates for always being there to encourage each other. You have challenged me to become a better student, presenter, and scientist. Thank you to all of the Human Genetics students for being mentors, role models, and friends. And a special thanks to Foram Ashar, Shannon Ellis, and Courtney Woods for not only being wonderful colleagues, but also the greatest of friends.

Thank you to my amazing family and friends for supporting me throughout this entire process. I am grateful for your encouragement and proud to share these accomplishments with you. To my in-laws, who love me as one of their own, I’m so happy to be a part of your family. Thank you to my parents for all of your love and support, for the many opportunities you gave me, and for teaching me the virtues of hard work, integrity, and compassion. And finally, I am thankful for my husband, Tim, and his unwavering support. Thank you for all of the sacrifices you’ve made, your patience, and your sense of humor. Without all of you, none of this would have been possible.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Schizophrenia Risk Variants as Potential eQTLs
2.1 Introduction
2.2 Methods
2.3 Results
2.4 Discussion
2.5 Chapter 2 Tables
2.6 Chapter 2 Figures
Chapter 3: Enhancer Reporter Assays
3.1 Introduction
3.2 Methods
3.3 Results
3.4 Discussion
3.5 Chapter 3 Tables
3.6 Chapter 3 Figures
Chapter 4: Protein Binding Assays
4.1 Introduction
4.2 Methods
4.3 Results
4.4 Discussion
4.5 Chapter 4 Tables
4.6 Chapter 4 Figures
Chapter 5: Chromatin Conformation Capture
5.1 Introduction
5.2 Methods
5.3 Results
5.4 Discussion
5.5 Chapter 5 Tables
5.6 Chapter 5 Figures
Chapter 6: Conclusion
References
Curriculum Vitae

List of Tables

Table 1. Schizophrenia risk variants chosen for the eQTL study and the most proximal gene.

Table 2. Transcript-specific primers for qPCR of cDNA from post mortem brain samples.

Table 3. Primer sequences to genotype rs12807809 in NRGN.

Table 4. Primer sequences for dual luciferase reporter constructs.

Table 5. EMSA probe sequences.

Table 6. Protein microarray probe sequences.

Table 7. Summary of EMSA binding results.

Table 8. Complete protein microarray results.

Table 9. Summary of proteins binding multiple variants in the protein microarray.

Table 10. Primers used for 4C-seq.

Table 11. Summary of sequencing metrics for 4C-seq.

List of Figures

Figure 1. Schematic representation of primers for RT-qPCR.

Figure 2. NRG3 transcript class IV expression by genotype of rs7899151.

Figure 3. NRG3 exon 1B transcript expression by genotype of rs10748842 and rs60827755.

Figure 4. NRG3 exon 1B transcript expression relative to class I expression by genotype at rs10748842 and rs60827755.

Figure 5. CACNA1C expression by genotype of rs1006737.

Figure 6. Dual luciferase reporter assay results.

Figure 7. Dual luciferase replication in HEK293 cells.

Figure 8. Dual luciferase replication results in SK-N-SH cells.

Figure 9. EMSA for rs4765905 with HEK293 and SK-N-SH nuclear extracts.

Figure 10. EMSAs for the remaining 15 variants in LD with rs1006737.

Figure 11. Competition EMSAs.

Figure 12. 4C-seq results from the CACNA1C promoter viewpoint in SK-N-SH cells.

Figure 13. 4C-seq results from the CACNA1C promoter viewpoint in HEK293 cells.

Figure 14. 4C-seq results from the CCAT promoter viewpoint.

Chapter 1: Introduction

Schizophrenia is a complex, chronic psychiatric disorder affecting 1% of the American population. It is characterized by a variable phenotype, including positive symptoms such as hallucinations and delusions, negative symptoms such as flat affect and social withdrawal, and cognitive impairment. Onset typically occurs in the late teens to early twenties (van Os and Kapur, 2009; The National Institute of Mental Health, 2016). The course of schizophrenia is lifelong, taking a significant toll on affected individuals’ well-being. The disease carries a massive public health burden, with increased rates of homelessness, drug abuse, and suicide among affected individuals, and almost $65 billion spent on care in the United States in 2002. Costs include hospitalization, supervised living, and extensive treatment for patients (McEvoy, 2007; Mueser and McGurk, 2004; Picchioni and Murray, 2007). With such high prevalence and cost to society, it is critical to understand the pathogenesis of schizophrenia in order to develop effective treatments and reduce the burden of the disease.

Although schizophrenia is a complex disease with both environmental and genetic risk factors, biological relatives of patients with schizophrenia have a higher risk of developing the disorder than the general population (Eaton, 1985; McGue and Gottesman, 1991). Furthermore, twin studies show monozygotic twin concordance rates of 41-65% and dizygotic twin concordance rates of 0-28%, depending on the population surveyed (Cardno and Gottesman, 2000). These twin studies yield heritability estimates around 80%, indicating that the genetic component of schizophrenia pathogenesis is very strong (Cardno and Gottesman, 2000; Owen et al., 2016; Sullivan et al., 2003). To identify specific genetic risk loci, researchers have used linkage studies, which look for co-segregation of genetic markers with disease in large pedigrees (Claes et al., 2012). Not all linkage studies have produced reproducible findings; however, a large meta-analysis of 20 genome-wide linkage studies showed strong statistical evidence for linkage at 10 loci: 1p13.3-q23.3, 2p12-q23.3, 3p25.3-p22.1, 5q23.2-q34, 6pter-p21.1, 8p22-p21.1, 11q22.3-q24.1, 14pter-q13.1, 20p12.3-p11, and 22pter-q12.3 (Lewis et al., 2003). Linkage studies were at the forefront of identifying genetic risk loci for schizophrenia, but the implicated regions remain quite large and require fine mapping to identify candidate functional variants.
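The twin-based heritability estimate cited above rests on how much more alike monozygotic pairs are than dizygotic pairs. As a rough illustration only (the cited studies fit liability-threshold models rather than applying this formula directly), Falconer's classical approximation relates variance components to twin correlations in an underlying liability. For a binary diagnosis, those correlations must first be estimated from concordance rates and population prevalence, so concordance rates cannot be substituted for them directly:

```latex
h^2 \approx 2\,(r_{\mathrm{MZ}} - r_{\mathrm{DZ}}), \qquad
c^2 \approx 2\,r_{\mathrm{DZ}} - r_{\mathrm{MZ}}, \qquad
e^2 \approx 1 - r_{\mathrm{MZ}}
```

Here h² is the additive genetic, c² the shared environmental, and e² the unique environmental proportion of liability variance. With hypothetical liability correlations of r_MZ = 0.85 and r_DZ = 0.45, chosen only to illustrate the arithmetic, these give h² ≈ 0.80, c² ≈ 0.05, and e² ≈ 0.15.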

Genome-wide association studies (GWAS) identify potential risk loci in complex genetic disorders, like schizophrenia, by comparing allele frequencies of common variants between large populations of cases and controls. The loci identified by GWAS are often much smaller than those identified by linkage studies because of the higher density of markers and the larger sample sizes (Bentham and Vyse, 2013). Recent independent GWAS for schizophrenia risk loci have achieved sample sizes in the tens of thousands, and as a result they have identified over 100 loci with robust statistical support (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). These loci are enriched for genes highly expressed in the brain cortex and for ion channel subunits, suggesting that tissue-specific expression and activity of ion channels may play an important role in the pathogenesis of psychiatric disorders (Pers et al., 2016). Although GWAS have made great strides in identifying genetic risk loci, little progress has been made in understanding the molecular mechanisms by which these loci contribute to disease risk.
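The case-control comparison of allele frequencies described above can be illustrated with a minimal sketch of an allelic association test for a single SNP. It assumes a simple 2 × 2 table of allele counts and a Pearson chi-square test with one degree of freedom. Large GWAS such as those cited typically use logistic regression with covariates (for example, ancestry principal components) instead, and the function name and counts below are hypothetical. The 5 × 10⁻⁸ threshold is the conventional genome-wide significance level.

```python
import math

# Conventional genome-wide significance threshold for GWAS.
GENOME_WIDE_ALPHA = 5e-8


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic 2x2 chi-square test (1 df, no continuity correction).

    Arguments are allele counts (two alleles per genotyped person).
    Returns (chi-square statistic, p-value, allelic odds ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 5,000 cases and 5,000 controls (10,000 alleles each).
chi2, p, or_ = allelic_test(6000, 4000, 5000, 5000)
print(f"chi2={chi2:.1f}  p={p:.2e}  OR={or_:.2f}  genome-wide={p < GENOME_WIDE_ALPHA}")
```

In this made-up example, the risk allele frequency is 0.60 in cases and 0.50 in controls (odds ratio 1.5). With tens of thousands of alleles, a difference of that size easily passes the genome-wide threshold.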

Complicating the study of schizophrenia is its variable presentation. For a diagnosis of schizophrenia, the American Psychiatric Association (APA) specifies that at least two of the following five symptoms be present: delusions, hallucinations, disorganized speech, disorganized or catatonic behavior, and negative symptoms, with at least one of them among the first three (delusions, hallucinations, or disorganized speech) (American Psychiatric Association, 2013).
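The symptom-count rule described above can be written as a small predicate. This is a deliberately simplified sketch. It encodes only the "at least two of five, including at least one of the first three" requirement and ignores the duration, functional-impairment, and exclusion criteria that a clinical diagnosis also requires. The symptom labels are hypothetical identifiers chosen for this example.

```python
# Hypothetical labels for the five symptom domains listed in the text.
CORE_SYMPTOMS = {"delusions", "hallucinations", "disorganized_speech"}
ALL_SYMPTOMS = CORE_SYMPTOMS | {"disorganized_or_catatonic_behavior", "negative_symptoms"}


def meets_symptom_count_rule(symptoms):
    """True if >= 2 of the five symptoms are present and >= 1 is a core symptom."""
    present = set(symptoms)
    unknown = present - ALL_SYMPTOMS
    if unknown:
        raise ValueError(f"unrecognized symptoms: {sorted(unknown)}")
    return len(present) >= 2 and bool(present & CORE_SYMPTOMS)


print(meets_symptom_count_rule({"hallucinations", "negative_symptoms"}))  # True
print(meets_symptom_count_rule({"negative_symptoms", "disorganized_or_catatonic_behavior"}))  # False
```

Two patients can satisfy the rule with no symptoms in common, for example delusions plus negative symptoms versus hallucinations plus disorganized speech. This illustrates the heterogeneity of the case population.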

Factor analysis can be used to derive detailed quantitative phenotypes that take into account variables beyond case or control status (McGrath et al., 2009). Genetic analysis of these quantitative phenotypes may uncover risk variants that would otherwise be obscured by the heterogeneity of the case population (Chen et al., 2009).

Additionally, schizophrenia has overlapping phenotypes with another chronic psychiatric disease, bipolar disorder. Shared features include a similar age of onset, delusions, negative symptoms, and increased risk of suicide (van Os and Kapur, 2009; Williams et al., 2011a). Relatives of a proband with either bipolar disorder or schizophrenia have increased risk for both disorders (Lichtenstein et al., 2009; Owen, 2012). These observations suggest that schizophrenia and bipolar disorder are not entirely discrete disorders and may have a common etiology. Recent GWAS have confirmed that some genetic risk factors are shared between the two disorders (Ferreira et al., 2008; Hamshere et al., 2013; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014).

These various types of genetic studies of complex disease all identify risk loci for schizophrenia. However, the identification of a genetic risk variant that shows a statistical association with disease is of no value unless it is followed by work that points to the specific genes or other functional entities influenced by the variant. Therefore, the goal of my thesis work has been to perform functional studies that determine the molecular mechanisms of schizophrenia risk loci and link their function to the wider context of disease.

The majority of complex disease-associated variants are in non-coding regions, so the most prevalent hypothesis in the field is that these non-coding variants function as regulatory elements, and the disruption of these elements is what increases disease risk.

In fact, disease-associated variants are enriched for eQTLs, but this observation still does not point to the specific mechanism through which each disease-associated variant functions (Nicolae et al., 2010). Therefore, in this work we seek to elucidate the biological connection between several well-supported genetic variants and schizophrenia by studying how genotype influences the complex regulation of gene expression.

First, we examined human post mortem brain tissue for correlations between genotype at 14 well-supported schizophrenia risk variants and local gene expression. These variants were chosen for investigation because they were identified by linkage studies, GWAS, or factor analysis, and some are associated with both schizophrenia and bipolar disorder. We found significant genotype-expression correlations for two variants, rs10748842 in NRG3 and rs1006737 in CACNA1C. The NRG3 results were incorporated with the work of Mariela Zeledón (Zeledón et al., 2015), and the CACNA1C results served as the foundation for the remainder of my thesis work.
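The statistical test used to call eQTLs is not specified here. As an illustration only, one common approach is an additive model in which normalized expression is regressed on the number of copies of one allele (0, 1, or 2) carried by each donor. The sketch below estimates that slope and a permutation p-value using only the standard library. The function names, parameters, and example data are hypothetical, and a real analysis would usually also adjust for covariates such as age, sex, and post mortem interval.

```python
import random
import statistics


def _slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("dosages have no variance (monomorphic genotype)")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def additive_eqtl(dosages, expression, n_perm=1000, seed=0):
    """Slope of expression on allele dosage, with a two-sided permutation p-value.

    dosages: allele counts per donor (0, 1, or 2); expression: normalized levels.
    """
    observed = _slope(dosages, expression)
    rng = random.Random(seed)
    shuffled = list(expression)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        if abs(_slope(dosages, shuffled)) >= abs(observed):
            as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical data: expression rises by 0.5 units per copy of the allele.
dosages = [0, 0, 1, 1, 2, 2] * 5
expression = [1.0 + 0.5 * d for d in dosages]
print(additive_eqtl(dosages, expression))
```

Permutation avoids distributional assumptions about the expression values. The smallest attainable p-value is 1 / (n_perm + 1), so n_perm must be increased when many variants are tested.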

We were specifically interested in CACNA1C, which encodes an alpha1 subunit of an L-type voltage-gated calcium channel, because calcium channel subunits have been implicated in the pathogenesis of multiple psychiatric disorders (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). For example, CACNB2 is associated with schizophrenia and with a combined group of patients diagnosed with any one of five psychiatric disorders (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013), and missense mutations in CACNA1H are implicated in autism spectrum disorders (Splawski et al., 2006). The presence of multiple association signals within this family of genes further suggests that disruption of calcium signaling increases risk for schizophrenia. Calcium signaling is involved in neurotransmitter release and regulation of gene expression, and disruption of these functions may play an important role in psychiatric disease (Khosravani and Zamponi, 2006).

The schizophrenia-associated variant rs1006737 tags a non-coding linkage disequilibrium (LD) block of ~60 kb contained completely within intron 3 of CACNA1C. Any variant(s) in this LD block may be functional and drive the correlation with CACNA1C expression. Therefore, we characterized 16 variants in high LD through a combination of in vitro reporter and protein binding assays. We found that rs4765905 has an allele-specific enhancer effect in dual luciferase reporter assays in SK-N-SH cells and is therefore the lead functional candidate. We also observed that all 16 variants exhibit allele-specific protein binding, suggestive of a haplotype-mediated enhancer effect.

With circular chromosome conformation capture and next-generation sequencing (4C-seq) analysis, we observed that the LD block tagged by rs1006737 physically interacts with the CACNA1C promoter in three-dimensional space in the cell. This further suggests that the LD block has a regulatory effect on CACNA1C expression. Together with the other data, this indicates that the effect is likely driven by the genotype at rs4765905 in combination with the other variants, and is mediated by DNA-protein complexes that bring DNA sequences in or near the LD block into physical proximity with the CACNA1C promoter.

In this work, we have used an innovative strategy to characterize the possible regulatory mechanisms of a non-coding schizophrenia-associated locus. This combination of experiments in human post mortem brain tissue, in cell culture, and in vitro leads to a better understanding of the ways in which genetic variation can perturb gene expression. This work made significant and novel contributions to the research on schizophrenia and has implications for the study of other disease-associated non-coding variants.

Chapter 2: Schizophrenia Risk Variants as Potential eQTLs

2.1 Introduction

2.1.1 Background on Schizophrenia-Associated Non-Coding Variants

Genetic studies have identified numerous risk variants for complex diseases, including schizophrenia, that might otherwise have gone undiscovered. The vast majority of these risk variants are in non-coding regions of the genome, and their function remains unknown. Many recent studies have shown that non-coding disease-associated variants likely have roles in regulating gene expression and epigenetic landscapes (Dimas and Dermitzakis, 2009; Grisanzio et al., 2012; McVicker et al., 2013; Nicolae et al., 2010; Schaub et al., 2012). Changes to the timing and quantity of gene expression can be more subtle than changes to protein sequence or structure, which may help to explain why complex diseases often have later onset and milder phenotypes than some Mendelian disorders caused by coding variants (Nicolae et al., 2010). Therefore, we were interested in evaluating the regulatory function of several schizophrenia risk variants.

We chose variants that have been identified through linkage studies, GWAS, and factor analysis, prioritizing variants that were implicated in both schizophrenia and bipolar disorder and those that were proximal to genes (Table 1). We hypothesize that some of these non-coding variants will be expression quantitative trait loci (eQTLs) that regulate local gene expression. The regulatory effects may change the total expression of a gene or may influence alternative splicing, so we sought to quantify both types of changes (Figure 1 and Table 2). It is possible that these non-coding variants contribute to the pathogenesis of schizophrenia through aberrant gene regulation.

We chose to quantify gene expression in post mortem brain samples from the superior temporal gyrus (STG), which is involved in the processing of language and facial stimuli and in social cognition. Studies have shown that patients with schizophrenia have structural abnormalities in the STG (Bigler et al., 2007; Kasai et al., 2003; Radua et al., 2010). To follow up our positive results, we also quantified expression in post mortem dorsolateral prefrontal cortex (DLPFC) from a separate set of individuals. The DLPFC functions in many cognitive processes, such as planning and working memory, and patients with schizophrenia show reduced overall activity in the frontal lobe (Chan et al., 2008; Elliott, 2003; Monsell, 2003; Phan et al., 2006). Both are relevant regions of the brain for the study of schizophrenia, and together they provide insight into how gene regulation may be similar or different across brain regions.

2.1.2 CACNA1C

We chose to investigate three intronic variants in the CACNA1C gene on the short arm of chromosome 12. The variant rs1006737 lies within the third intron of CACNA1C, in a region that is not in linkage disequilibrium (LD) with any coding variation (Johnson et al., 2008; Kent et al., 2002). It has been associated by GWAS with both bipolar disorder and schizophrenia (Ferreira et al., 2008; Green et al., 2010; Liu et al., 2011; Ripke et al., 2011). A nearby variant, rs4765913 (r² = 0.401 and D’ = 0.876 with rs1006737), was also identified as a risk variant in a bipolar disorder GWAS and in a joint GWAS of schizophrenia and bipolar disorder (Johnson et al., 2008; Sklar et al., 2011). A third variant, rs7972947, in the first intron of CACNA1C and with no significant LD to the other schizophrenia risk variants in CACNA1C, was associated with schizophrenia at a level just short of genome-wide significance (p = 7 × 10⁻⁷) (Ripke et al., 2011).
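The r² and D’ values quoted above are the two standard pairwise LD statistics. The rs4765913 example shows why both are reported: D’ can be close to 1 while r² is much lower when the two SNPs have different allele frequencies. The sketch below computes both from the four two-locus haplotype frequencies. It assumes phased haplotype frequencies are already available (in practice they are estimated from genotype data or reference panels), and the function name and example frequencies are hypothetical.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r^2) for two biallelic loci from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("LD is undefined when a locus is monomorphic")
    d = p_AB - p_A * p_B
    # D' scales D by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Hypothetical haplotypes: allele B only ever occurs with allele A (no aB haplotype),
# so D' = 1, but r^2 is only 0.25 because the allele frequencies differ (0.5 vs 0.2).
print(ld_stats(0.2, 0.3, 0.0, 0.5))
```

D’ = 1 means that at most three of the four possible haplotypes occur. r² instead measures how well one SNP predicts the other, which is why it is the usual criterion for defining sets of SNPs in high LD with a GWAS signal.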

CACNA1C encodes an alpha1 subunit of the long-lasting (L-type) voltage-gated calcium channel and is expressed in heart, urinary bladder, and frontal cortex (Rebhan et al., 1997). L-type calcium channels are high-voltage activated, are sensitive to dihydropyridine agonists and antagonists, and activate calcium-dependent gene transcription. Additionally, alpha1 subunits contain the main functional components of the calcium channel, including the transmembrane pore, voltage sensor, and gating apparatus (Khosravani and Zamponi, 2006). Changes in expression of alpha1 subunits such as CACNA1C may therefore lead to global changes in calcium transport and gene expression.

Additionally, variants in calcium channel subunit genes are enriched in genetic studies of multiple psychiatric disorders (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). Another schizophrenia susceptibility gene, MIR137, encodes a microRNA that is predicted to target CACNA1C and reduce translation of its mRNA (Ripke et al., 2011). This evidence provides strong support for the potential of CACNA1C to contribute to the pathogenesis of schizophrenia.

CACNA1C has over 60 exons and undergoes extensive alternative splicing, making it difficult to survey all transcripts (Kent et al., 2002). There is no known alternative splicing of the first four exons, proximal to the schizophrenia-associated variants. However, alternative splicing of exons 31-33 produces alpha1 calcium channel subunits with different activation potentials (Tang et al., 2004). Therefore, we measured expression of classes of functionally distinct CACNA1C transcripts defined by splicing of exons 31-33.

2.1.3 NRG1

NRG1 was originally identified as a schizophrenia susceptibility gene through a genome-wide linkage scan of Icelandic families (Stefansson et al., 2002). Further investigation identified the variant rs7014762, in the 5’ core promoter of NRG1, as associated with schizophrenia (Nicodemus et al., 2009). Haplotype association analysis identified rs35753505, which has no significant LD with rs7014762, as a schizophrenia risk variant in a Scottish population (Stefansson et al., 2003). rs35753505 was also nominally associated with bipolar disorder in an independent cohort (Prata et al., 2009).

NRG1 is reported to be involved in brain development and neuronal function, and is therefore a plausible candidate gene for schizophrenia pathogenesis (Nicodemus et al., 2009). NRG1 encodes many classes of transcripts that differ in 5’ exon usage (Law et al., 2006; Steinthorsdottir et al., 2004). As both risk variants are upstream of NRG1, they may alter promoter and/or 5’ exon usage, so we quantified the four most common classes of NRG1 transcripts (I-IV), each of which has a different 5’ exon splicing pattern.

2.1.4 NRG3

A non-coding variant in the first intron of NRG3, rs1080293, was associated with schizophrenia as a discrete phenotype in an Ashkenazi Jewish cohort, but the association did not withstand correction for multiple testing. However, when variants were tested for association with quantitative traits derived from a factor analysis of detailed phenotype information, rs10748842 in the first intron of NRG3 was strongly associated with the “delusion” factor. Weaker associations were observed between rs1339844 and the “scholastic” factor, rs7899151 and the “disorganized” factor, and rs10886221 and the “hallucination” factor (Chen et al., 2009). We also included rs60827755, another variant in the first intron of NRG3, because preliminary data from Mariela Zeledón in our research group showed that the variant had regulatory potential in dual luciferase reporter assays in vitro. Two of the variants in the first intron, rs60827755 and rs10748842, were in strong but incomplete LD (r² = 0.92, D’ = 1.00), whereas the remaining variants showed no strong LD (Johnson et al., 2008).

Chromosome 10q22-23, which includes NRG3, was originally identified as a schizophrenia risk locus by a linkage study in Ashkenazi Jewish families conducted by our research group (Fallin et al., 2003). The chromosome 10q22 linkage peak was confirmed in an independent linkage study in families of Han Chinese descent (Faraone et al., 2006). NRG3 is strongly expressed in the central nervous system and is a paralog of NRG1, another well-supported schizophrenia susceptibility gene. NRG3 has a pattern of alternative splicing at the 5’ end similar to that of NRG1 (Kao et al., 2010). Since many of our variants of interest were upstream of or in the first intron of NRG3, we hypothesized that they may affect alternative splicing at the 5’ end of the gene, and we therefore quantified four classes of transcripts defined by their 5’ exon usage.

2.1.5 NRGN

A non-coding variant approximately 3.5 kb upstream of NRGN, rs12807809, was associated with schizophrenia in a GWAS of combined samples from eight European locations, the International Schizophrenia Consortium, and the European-American portion of the Molecular Genetics of Schizophrenia study (Stefansson et al., 2009). In an independent cross-disorder association study, the same variant was also found to be nominally associated with bipolar disorder (Williams et al., 2011a).

NRGN is expressed exclusively in the brain and localizes to the dendrites of neurons. Based on work studying response to thyroid dysfunction, NRGN may play a role in psychotic and cognitive features of disease (Stefansson et al., 2009). There is no evidence of alternative splicing, so NRGN mRNA transcripts were quantified using the first two exons (Kent et al., 2002).

2.1.6 PBRM1

The synonymous variant rs2251219 in exon 28 of PBRM1 was identified through a genome-wide meta-analysis of five case-control cohorts of bipolar disorder and major depressive disorder (Consortium, 2010). It was also found to be associated with schizophrenia in a completely independent cross-disorder study (Williams et al., 2011a).

PBRM1 encodes polybromo-1, a protein important for chromatin remodeling, and it is overexpressed in post mortem brain samples of patients diagnosed with bipolar disorder (Consortium, 2010). Although the functional connection between PBRM1 and schizophrenia remains to be discovered, it is a very strong statistical candidate based on the associated variant and altered expression in patients. We attempted to quantify four classes of PBRM1 transcripts based on alternative splicing around rs2251219, a synonymous variant in exon 28 that may affect splicing. Transcripts have been observed that include both exons 26 and 27, skip exon 26, skip exon 27, or skip both exons 26 and 27 (Kent et al., 2002). We designed primers for all four classes of transcripts to quantify PBRM1 expression. However, we were unable to amplify any cDNAs that include exon 26 but skip exon 27. This may be a result of low expression of that particular class of transcript in the brain.
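Transcript-specific qPCR primers that overlap exon-exon junctions can distinguish these classes only if each class contains at least one junction that no other class shares. The sketch below enumerates those junctions for the four PBRM1 classes just described. For illustration, it assumes that exons 25 and 28 are the constitutive flanking exons and considers only this region. The class labels and function names are hypothetical.

```python
# Exon order in each PBRM1 transcript class around the alternatively spliced region.
# Flanking exons 25 and 28 are assumed constitutive for this illustration.
CLASSES = {
    "includes_26_and_27": [25, 26, 27, 28],
    "skips_26": [25, 27, 28],
    "skips_27": [25, 26, 28],
    "skips_26_and_27": [25, 28],
}


def junctions(exons):
    """Set of (upstream exon, downstream exon) splice junctions in a transcript."""
    return set(zip(exons, exons[1:]))


def class_specific_junctions(classes):
    """For each class, the junctions that no other class contains."""
    result = {}
    for name, exons in classes.items():
        shared = set()
        for other, other_exons in classes.items():
            if other != name:
                shared |= junctions(other_exons)
        result[name] = sorted(junctions(exons) - shared)
    return result


for name, unique in class_specific_junctions(CLASSES).items():
    print(name, unique)
# includes_26_and_27 [(26, 27)]
# skips_26 [(25, 27)]
# skips_27 [(26, 28)]
# skips_26_and_27 [(25, 28)]
```

Under these assumptions, each class has exactly one diagnostic junction. For example, a primer spanning the exon 26-27 junction should amplify only transcripts that include both exons, whereas the exon 25-26 junction is shared by two classes and could not distinguish them.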

2.1.7 ZNF804A

An intronic variant in ZNF804A, rs1344706, was associated with schizophrenia in a number of GWAS and replication studies, and the evidence of association was even stronger when patients with bipolar disorder were included in the case group (International Schizophrenia Consortium et al., 2009; O’Donovan et al., 2008; Riley et al., 2010; Steinberg et al., 2011; Williams et al., 2011b).

At the time of the study design, rs1344706 was the best-supported risk locus for schizophrenia, despite the lack of research on the function of the gene. There is no evidence of alternative splicing in ZNF804A, so we quantified expression using exons 3 and 4, downstream of the risk variant, which is located in the second intron (Kent et al., 2002).

2.2 Methods

2.2.1 Brain Samples

We extracted DNA and RNA from 195 STG (Brodmann area 22) and 94 DLPFC flash-frozen post mortem brain samples without macroscopic pathology, acquired from the Harvard Brain Tissue Resource Center and the Maryland Brain Bank, respectively.

DNA was extracted from 10 mg of tissue using the Gentra Puregene Tissue kit (Qiagen) according to the manufacturer’s protocol. RNA was extracted from 50 mg of tissue using the RNeasy Lipid Tissue Mini Kit (Qiagen) according to the manufacturer’s protocol.

This work was done largely by Ruihua Wang and Megan Szymanski Pierce.

STG donors had an average age of 62.1 years and were 77.4% male, and the tissue had an average post mortem interval (PMI), from death to tissue extraction, of 23.2 hours. DLPFC donors had an average age of 35.1 years and were 66.0% male, and the tissue had an average PMI of 14.7 hours.

2.2.2 Variant Genotyping

We genotyped 14 variants of interest using TaqMan SNP Genotyping Assays (Applied Biosystems) (Table 1). Commercially available assays existed for CACNA1C variants rs1006737, rs4765913, and rs7972947; NRG3 variants rs10748842, rs1339844, rs1080293, and rs10886221; NRGN variant rs12807809; PBRM1 variant rs2251219; and ZNF804A variant rs1344706. Custom assays were designed for NRG1 variants rs7014762 and rs35753505 and NRG3 variants rs7899151 and rs60827755. All assays were performed according to the manufacturer’s protocol.

We were unable to genotype the NRGN variant rs12807809 with the TaqMan SNP Genotyping Assay, so we designed primers around the SNP (Table 3) and submitted the resulting amplicons for Sanger sequencing through the Synthesis and Sequencing Facility at Johns Hopkins University School of Medicine. Genotypes were called manually by inspecting the Sanger sequence trace files in the CodonCode Aligner software.

2.2.3 Transcript Quantification

cDNA was prepared from extracted RNA with MuLV reverse transcriptase and random hexamers (Applied Biosystems). Transcript-specific primers were designed to overlap unique exon-exon junctions so that they amplify only the cDNA of the desired transcript or class of transcripts (Figure 1 and Table 2). qPCR amplifications were performed in triplicate using SYBR Green (Applied Biosystems), and results were normalized to two housekeeping genes (MRIP and ACTB) for greater accuracy. The cycling conditions were 95 °C for 10 minutes, followed by 40 cycles of 95 °C for 15 seconds and 60 °C for 1 minute. A melting curve was produced and used to verify the absence of non-specific amplification or amplification of more than one sequence.
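The exact normalization formula is not spelled out here. As a hedged illustration only, the sketch below assumes the common ΔCt approach: technical triplicates are averaged, and the target Ct is subtracted from the mean Ct of MRIP and ACTB. Under an assumed amplification efficiency of about 100%, this difference equals the log2 ratio of the target quantity to the geometric mean of the two reference quantities. The function names and Ct values are hypothetical.

```python
from statistics import mean


def mean_ct(replicates):
    """Average the technical (triplicate) Ct values for one sample and assay."""
    return mean(replicates)


def log2_relative_expression(target_cts, mrip_cts, actb_cts):
    """Log2 expression of a target transcript relative to MRIP and ACTB.

    Assumes ~100% amplification efficiency (one doubling per cycle), so
    log2(quantity) = constant - Ct. The mean reference Ct minus the target Ct
    is then log2(target / geometric mean of the two reference quantities).
    """
    reference_ct = (mean_ct(mrip_cts) + mean_ct(actb_cts)) / 2
    return reference_ct - mean_ct(target_cts)


# Hypothetical Ct values for one sample (not data from this study).
value = log2_relative_expression(
    target_cts=[28.0, 28.5, 27.5],
    mrip_cts=[24.25, 23.75, 24.0],
    actb_cts=[18.0, 18.0, 18.0],
)
print(value)  # -7.0
```

Higher values indicate higher expression, and under the efficiency assumption each unit corresponds to a two-fold difference.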

We experienced some difficulty with two of the four classes of PBRM1 transcripts. Class II transcripts, which include exon 26 but skip exon 27, would not amplify under standard PCR conditions. We tried multiple combinations of primers, but none produced a distinct product, possibly because this class of transcript is not expressed in the STG. Additionally, class IV transcripts, which include both exons 26 and 27, amplified under standard PCR conditions but not with SYBR Green.

We attempted a lower annealing temperature of 55 °C instead of 60 °C, which amplified a product but produced four peaks on the dissociation curve, suggesting that multiple sequences had been amplified.

Because of a variant next to the unique exon junction that defines class II NRG3 transcripts (exons 1B1 – 6), we were unable to design a primer spanning the junction. Therefore, the primers that amplify class II NRG3 transcripts will also amplify class III (1B2 – 6) and class IV (1B2 – 4 – 6) transcripts. Exon 1B2 is an extended version of exon 1B1 that contains an additional 75 bp. Class I transcripts use an entirely separate first exon (1A1) and can therefore still be quantified separately. This exon nomenclature follows the recommendations of Zeledón et al. (2015).

2.2.4 eQTL Analysis

Transcript expression levels were log transformed and normalized to two housekeeping genes, MRIP and ACTB, as described by Szymanski et al. (2011). Correlations between genotype and normalized expression were calculated in the statistical software R using a generalized linear model corrected for age, sex, PMI, and plate. Only local eQTLs were considered; that is, each variant of interest was tested only against expression of its most proximal gene.
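The model was fitted in R. As a language-neutral sketch of the same idea, the Python code below assumes additive genotype coding (0, 1, or 2 copies of the risk allele) and a Gaussian model with identity link, under which a generalized linear model reduces to ordinary least squares. Age, sex, PMI, and plate enter as covariates, with plate dummy-coded. The helper names are hypothetical, the data are synthetic, and the p-value uses a normal approximation to the t distribution rather than the exact t test that R would report.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dummy_code(labels):
    """Dummy-code a categorical covariate such as plate, dropping the first level."""
    levels = sorted(set(labels))
    return [[1.0 if lab == lev else 0.0 for lev in levels[1:]] for lab in labels]


def eqtl_test(log_expr, dosage, covariates):
    """OLS of log expression on risk-allele dosage plus covariates.

    Returns (beta, t, p) for the dosage term; p is a two-sided normal
    approximation (assumption), reasonable only for large samples.
    """
    X = [[1.0, float(d)] + list(c) for d, c in zip(dosage, covariates)]
    n, k = len(X), len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y for row, y in zip(X, log_expr)) for p in range(k)]
    beta = solve(xtx, xty)
    residuals = [y - sum(b * x for b, x in zip(beta, row)) for y, row in zip(log_expr, X)]
    sigma2 = sum(r * r for r in residuals) / (n - k)
    unit = [1.0 if j == 1 else 0.0 for j in range(k)]
    se = (sigma2 * solve(xtx, unit)[1]) ** 0.5  # sqrt(sigma^2 * [(X'X)^-1]_11)
    t = beta[1] / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return beta[1], t, p


# Synthetic example (hypothetical): true dosage effect of -0.5 log2 units.
dosage = [i % 3 for i in range(30)]
age = [20.0 + i for i in range(30)]
sex = [float(i % 2) for i in range(30)]
plate = dummy_code([i // 10 for i in range(30)])
noise = [0.05 * ((7 * i) % 5 - 2) for i in range(30)]
log_expr = [1.0 - 0.5 * d + 0.01 * a + e for d, a, e in zip(dosage, age, noise)]
covariates = [[a, s] + pl for a, s, pl in zip(age, sex, plate)]
beta, t, p = eqtl_test(log_expr, dosage, covariates)
print(round(beta, 3), round(t, 1), p)
```

A negative coefficient means expression decreases with each additional copy of the risk allele.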

2.3 Results

In the STG, the risk allele (G) of rs7899151 was correlated with increased expression of class IV NRG3 transcripts (p = 0.009, Figure 2). Class IV transcripts are defined by the inclusion of exon 4, as named in the Zeledón nomenclature system, which the other classes of transcripts (I, II, and III) lack (Zeledón et al., 2015). In the DLPFC, rs7899151 showed no significant correlation with NRG3 expression.

In the STG, the risk allele (C) of rs10748842 had a weak correlation with decreased expression of transcripts using the alternative first exon 1B (classes II, III, and IV) (p = 0.026) (Figure 3). We also observed that rs10748842 was correlated with the relative expression of transcripts using exon 1B when adjusted for transcripts using exon 1A (class I) (p = 5.11 × 10⁻⁹) (Figure 4). This suggests that while absolute expression of NRG3 transcripts may be only mildly affected by genotype, the ratio of transcripts based on first exon usage (1A vs. 1B) varies significantly by genotype. To follow up these findings, we also genotyped rs10748842 and measured NRG3 expression in the DLPFC. Here, the risk allele was strongly correlated with decreased absolute expression of transcripts using exon 1B (p = 1.9 × 10⁻⁷) (Figure 3), as well as with relative expression of transcripts using exon 1B compared to transcripts using exon 1A (p = 4.7 × 10⁻⁸) (Figure 4).

As expected given the strong LD (r² = 0.92), the risk allele (G) of rs60827755 showed genotype-expression correlations similar to those of rs10748842. In the STG, the risk allele was correlated with decreased absolute expression of exon 1B transcripts (p = 0.048) (Figure 3) and with relative expression when adjusted for exon 1A transcripts (p = 1.9 × 10⁻⁷) (Figure 4). In the DLPFC, the risk allele was correlated with decreased absolute and relative expression of exon 1B transcripts (p = 1.42 × 10⁻⁷ and p = 6.0 × 10⁻⁸, respectively) (Figures 3 and 4, respectively).

In CACNA1C, we found that in the STG the risk allele (A) of rs1006737 was correlated with progressively decreased expression of all three classes of CACNA1C transcripts quantified (class WT, p = 1.14 × 10⁻³; class B, p = 2.06 × 10⁻³; class D, p = 8.35 × 10⁻³). In the DLPFC, we observed a non-significant trend toward increased expression of all three CACNA1C transcript classes with the risk allele (A) (Figure 5). Although the direction of effect in the DLPFC is opposite to that in the STG, it matches the direction observed by Bigos et al. (2010) in a larger set of DLPFC samples.

In the remaining four genes investigated (NRG1, NRGN, PBRM1, and ZNF804A), we did not observe any significant correlation between expression and genotype at the local variants of interest.

2.4 Discussion

Identifying how genetic variants influence disease risk is a major goal in human genetics and is becoming increasingly important as GWAS and other types of genetic studies produce large numbers of statistical associations without conclusively identifying the underlying genes and mechanisms. In this study, we sought to identify eQTLs among some of the best-supported schizophrenia risk variants. Genotype at four of the 14 variants examined (rs7899151, rs10748842, and rs60827755 in NRG3 and rs1006737 in CACNA1C) showed some correlation with local gene expression. Despite the lack of positive results for the remaining ten variants, we cannot rule out the possibility that they do indeed function as eQTLs. They may regulate more distal genes that were not quantified, be active at developmental stages not captured by our samples, or function in cell types that were not assayed.

Our data suggest that the non-coding variant rs7899151, associated with the “disorganization” factor quantitative trait, affects alternative splicing of NRG3 in the STG. The correlation was not statistically significant in the DLPFC, which may reflect the smaller sample size or a true biological difference in NRG3 regulation between the two brain regions.

We conclude that the “delusion” factor-associated variant rs10748842 and its closely linked neighbor rs60827755 tag an eQTL that affects the relative expression of different NRG3 transcripts. This suggests a transcript-specific mechanism in which the risk variants affect the use of alternative NRG3 first exons. Further work investigating the regulatory potential of these two variants in dual luciferase reporter assays has been published previously by our research group and implicates rs60827755 as a functional regulatory variant, with weak evidence for rs10748842 (Zeledón et al., 2015). Additional information about differences between the protein products of the various classes of NRG3 transcripts would give insight into how changes in NRG3 expression may contribute to disease pathogenesis.

Our data support the conclusion that the association between rs1006737 and schizophrenia is likely due to changes in the expression of the CACNA1C gene. It should be noted that although we quantified alternatively spliced classes of transcripts, the risk variant is located approximately 400 kb upstream of the alternatively spliced exons. It is possible that the variant affects alternative splicing elsewhere in the gene. In the STG, where our sample is larger, the risk allele (A) is correlated with decreased CACNA1C expression, but the effect trends in the opposite direction in the DLPFC.

These results suggest there may be differences between brain regions, which may reflect the importance of finely tuned regulation of CACNA1C in the brain.

The DLPFC results are of particular interest because Bigos et al. (2010) previously reported that, in the same region, the risk allele (A) is correlated with increased CACNA1C expression (p = 0.039). There are notable differences between their study design and ours: their sample was larger (n = 261) and included donors of Caucasian and African ancestry, and they quantified expression by microarray, which does not discriminate between alternative transcripts. Nevertheless, this independent finding strengthens our observation that rs1006737 is correlated with CACNA1C expression and may have diverse region-specific roles.

Consistent with the hypothesis that non-coding disease-associated variants contribute to disease pathogenesis through their function as eQTLs, we identified three independent eQTLs and their target genes. Our strongest result was for the intronic variant rs1006737, which showed a tissue-specific effect on CACNA1C expression. The goal of the next portion of this research project is to elucidate the molecular mechanism underlying this correlation between genotype and expression.

2.5 Chapter 2 Tables

Table 1. Schizophrenia risk variants chosen for the eQTL study and the most proximal gene.

Proximal Gene | SNP | Relevance to Schizophrenia | Relative Position to Proximal Gene | LD
CACNA1C | rs1006737 | BD GWAS & Associated with SZ | Third Intron | r² = 0.401 with rs4765913
CACNA1C | rs4765913 | BD GWAS | Third Intron | r² = 0.401 with rs1006737
CACNA1C | rs7972947 | BD & SZ GWAS | First Intron | –
NRG1 | rs7014762 | Linked to SZ | 1.6 kb Upstream | –
NRG1 | rs35753505 | Associated with SZ | 23 kb Upstream | –
NRG3 | rs1080293 | Associated with SZ | First Intron | –
NRG3 | rs1339844 | Associated with Scholastic Factor | 121 kb Downstream | –
NRG3 | rs7899151 | Associated with Disorganization Factor | 686 kb Upstream | –
NRG3 | rs10886221 | Associated with Hallucination Factor | 389 kb Downstream | –
NRG3 | rs10748842 | Associated with Delusion Factor | First Intron | r² = 0.920 with rs60827755
NRG3 | rs60827755 | Shows regulatory potential in vitro | First Intron | r² = 0.920 with rs10748842
NRGN | rs12807809 | SZ GWAS & Associated with BD | 3.5 kb Upstream | –
PBRM1 | rs2251219 | BD GWAS & Associated with SZ | Exon 28 (Synonymous) | –
ZNF804A | rs1344706 | BD & SZ GWAS | Penultimate Intron | –

SZ represents schizophrenia; BD represents bipolar disorder; SNP represents single nucleotide polymorphism; LD represents linkage disequilibrium.

Table 2. Transcript-specific primers for qPCR of cDNA from post mortem brain samples.

Transcript Class | Unique Exon Junction | Primer Name | Primer Sequence
CACNA1C Class WT | 32-33 | CAC_t1_R | GGGTATGTTCAGCTGGGTTT
CACNA1C Class WT | 32-33 | CAC_t1_L | GGAATACATTTGACGCCTTGA
CACNA1C Class B | 32ext-33 | CAC_t2_R | TATGTTCAGCTGGCTCGG
CACNA1C Class B | 32ext-33 | CAC_t2_L | TGCATGGAATACATTTGACG
CACNA1C Class D | 31-33 | CAC_t3_R | TTGGGTATGTTCAGCTGGATT
CACNA1C Class D | 31-33 | CAC_t3_L | TGATCCCTGGAATGTTTTTGA
NRG1 Class I | 4-5 | NRG1_t1_R | GCATACCAGTGATGATCTCGTT
NRG1 Class I | 4-5 | NRG1_t1_L | CACTGGCTGATTCTGGAGAG
NRG1 Class II | 4-8 | NRG3_t2_R2 | TTTCGATCACAAAGCACTCG
NRG1 Class II | 4-8 | NRG3_t2_L2 | TGTATGCGTTGGGAGAGGAG
NRG1 Class III | 7-8 | NRG1_t2,3_R | ACTCCCCTCCATTCACACAG
NRG1 Class III | 7-8 | NRG1_t3_L | CAAACTGCTCCTAAACTTTCTACA
NRG1 Class IV | E187-3 | NRG1_t4_R | AACTGGTTTCACACCGAAGG
NRG1 Class IV | E187-3 | NRG1_t4_L | CGAGTTGGCACCACAGC
NRG3 Class I | 1A1-6 | NRG3_R | GTTTCGATCACAAAGCACTCG
NRG3 Class I | 1A1-6 | NRG3_t1_L | CAGCCCCAAATTTCATACG
NRG3 Class II* | 1B1-6 | NRG3_R | GTTTCGATCACAAAGCACTCG
NRG3 Class II* | 1B1-6 | NRG3_t2_L | GGAGAGGAGGGGGACTACAC
NRG3 Class III | 1B2-6 | NRG3_R | GTTTCGATCACAAAGCACTCG
NRG3 Class III | 1B2-6 | NRG3_t3_L | AAAGAAAAGCCCAAGATACGA
NRG3 Class IV | 1B1-4 | NRG3_R | GTTTCGATCACAAAGCACTCG
NRG3 Class IV | 1B1-4 | NRG3_t4_L | GCTGCTGACGAATTCATACAAA
NRGN | 1-2 | NRGN_R | GAAAACTCGCCTGGATTTTG
NRGN | 1-2 | NRGN_L | CAGCATGGACTGCTGCAC
PBRM1 Class I | 25-28 | PBRM_t1_R | CCTTGGTTCATCACACCTTCA
PBRM1 Class I | 25-28 | PBRM_t1,2,3_L | TACTCTTTCGGGGAGCTCAG
PBRM1 Class II† | 26-28 | – | –
PBRM1 Class III | 25-27 | PBRM_t3_R | GCCACCCATCATGCCTTC
PBRM1 Class III | 25-27 | PBRM_t1,2,3_L | TACTCTTTCGGGGAGCTCAG
PBRM1 Class IV | 26-27 | PBRM_t4_R2 | TAGCCACCCATCATGCAGC
PBRM1 Class IV | 26-27 | PBRM_t4_L2 | GCTCGGGGAGAAGCACTC
ZNF804A | 2-3 | ZNF_R3 | TTTCGAGCAAATTCCCTTTG
ZNF804A | 2-3 | ZNF_L3 | GCACCAGGAGTTTGACAATCA
MRIP | 3-4 | MRIPqF | AGATGCCCACGACCCTTC
MRIP | 3-4 | MRIPqR | GCGTCAGAATACACAGGGAGA
ACTB | 3-4 | ACTB-TRANS-1F | CGAGAAGATGACCCAGATCA
ACTB | 3-4 | ACTB-TRANS-1R | AGAGGCGTACAGGGATAGCA

*Class II primers also amplify Class III and IV transcripts.
†Class II primers never amplified a product.

Table 3. Primer sequences to genotype rs12807809 in NRGN.

Primer Name | Primer Sequence
rs12807809_F | ACCTCCTCACACTGTTCTCG
rs12807809_R | ATGGCTTAAGGGGTCCATTC

2.6 Chapter 2 Figures

Figure 1. Schematic representation of primers for RT-qPCR.

Primer design for the three classes of CACNA1C transcripts (WT, B, and D) as named by Tang et al. (2004). Thick grey boxes represent exons, which are numbered above. Solid arrows represent primers, and primers designed across introns are connected by dashed lines; primers span the exon junction in order to uniquely amplify a single class of cDNA transcripts.

[Figure 2 panels: Transcript Class IV. Top (STG): p = 0.009. Bottom (DLPFC): p = n.s. Y-axis: log2 expression; x-axis: rs7899151 genotype (Non-Risk, Het., Risk).]

Figure 2. NRG3 transcript class IV expression by genotype of rs7899151.

On the y-axes is the log2 expression of NRG3 transcript class IV. Samples are grouped by genotype on the x-axes. Expression in the STG and DLPFC is shown in the top and bottom panels, respectively. P-values from the generalized linear regression are indicated above each plot.

[Figure 3 panels: Exon 1B transcripts by genotype of rs10748842 (left column) and rs60827755 (right column). Top row (STG): p = 0.026 and p = 0.048. Bottom row (DLPFC): p = 1.9 × 10⁻⁷ and p = 1.42 × 10⁻⁷. Y-axis: log2 expression; x-axis: genotype (Non-Risk, Het., Risk).]

Figure 3. NRG3 exon 1B transcript expression by genotype of rs10748842 and rs60827755.

On the y-axes is the log2 expression of transcripts starting with exon 1B. Samples are grouped by genotype on the x-axes. Data from the STG are shown on the top row and data from the DLPFC are shown on the bottom row. The first column shows rs10748842 genotype information, while the second shows the highly correlated rs60827755 data.

[Figure 4 panels: Exon 1B transcripts relative to exon 1A transcripts by genotype of rs10748842 (left column) and rs60827755 (right column). Top row (STG): p = 5.11 × 10⁻⁹ and p = 1.9 × 10⁻⁷. Bottom row (DLPFC): p = 4.7 × 10⁻⁸ and p = 6.0 × 10⁻⁸. Y-axis: log2 expression; x-axis: genotype (Non-Risk, Het., Risk).]

Figure 4. NRG3 exon 1B transcript expression relative to class I expression by genotype at rs10748842 and rs60827755.

On the y-axes is the log2 expression of transcripts starting with exon 1B, adjusted for the quantity of transcripts starting with exon 1A. Samples are grouped by genotype on the x-axes. Data from the STG are shown on the top row and data from the DLPFC on the bottom row. The first column shows rs10748842 genotype information, while the second shows data for the highly correlated rs60827755.


Figure 5. CACNA1C expression by genotype of rs1006737.

On the y-axes is the log2 expression of CACNA1C transcripts. Samples are grouped by genotype on the x-axes. Expression in the STG and DLPFC is shown in the top and bottom rows, respectively. The left column shows class WT transcripts, the middle column class B transcripts, and the right column class D transcripts. P-values from the generalized linear regression are indicated above each plot.

Chapter 3: Enhancer Reporter Assays

3.1 Introduction

GWAS are one of the most efficient means of identifying new risk loci for complex diseases. As of the writing of this dissertation, at least 14,876 single nucleotide polymorphisms (SNPs) have been associated with complex disease (Welter et al., 2014).

Although only about 10% of known SNPs are genotyped for inclusion in a GWAS (Grant and Hakonarson, 2008), these representative SNPs tag LD blocks that contain other SNPs in high LD, as well as other types of variants, including copy number variants (CNVs), genomic rearrangements, and retrotransposon insertions (Raychaudhuri, 2011). Therefore, a reported disease-associated SNP may not itself be the functional variant causing pathogenesis; rather, one or more of the other variants in LD may be functional. Identifying the particular causal variant(s) driving a disease association is essential to understanding the mechanism of disease and developing therapies (Raychaudhuri, 2011).

Most disease-associated SNPs are located in non-coding regions (Nicolae et al., 2010). While it is relatively easy to predict the consequences of coding variants, non-coding variants remain much more difficult to interpret. On a large scale, disease-associated non-coding variants are predicted to be enriched for eQTLs, acting either through direct regulation of gene expression or through changes in epigenetic modifications (Dimas and Dermitzakis, 2009; Grisanzio et al., 2012; McVicker et al., 2013; Nicolae et al., 2010; Schaub et al., 2012). Additionally, new collaborative projects such as the Encyclopedia of DNA Elements (ENCODE) and the International Human Epigenome Project can offer clues about how a particular non-coding variant may contribute to disease risk (Zhang et al., 2012).

In this work, we seek a detailed characterization of the variants in high LD with rs1006737, a SNP that is associated with both schizophrenia and bipolar disorder and that our group and others (Bigos et al., 2010) have found to be correlated with altered CACNA1C expression in human post mortem brain samples. rs1006737 is located in intron 3, more than 100 kb from each flanking exon, and has no coding variants in significant LD.

We decided to evaluate the regulatory potential of rs1006737 plus an additional 15 SNPs in strong LD. Thirteen SNPs were correlated with rs1006737 at r² > 0.8 (rs7965923, rs769087, rs2159100, rs12315711, rs11062170, rs4765905, rs758170, rs10774035, rs10774036, rs10744560, rs12311439, rs1024582, rs4298967), while another two (rs34382810 and rs2370414) were included because of their physical proximity (<600 bp) to rs7965923 and rs10774035, respectively; their r² with rs1006737 was > 0.7.
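The two-tier selection rule above can be written out explicitly. This is a minimal sketch under stated assumptions: r² is approximated here as the squared Pearson correlation of genotype dosages (a common proxy; r² computed from phased haplotypes, as reported by LD tools, can differ), and the SNP names, positions, and r² values in the example are hypothetical placeholders, not values from this study.

```python
def r2_from_dosages(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    Approximation (assumption): dosage-based r2 stands in for haplotype r2.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def select_variants(r2, positions, core=0.8, secondary=0.7, window_bp=600):
    """Keep SNPs with r2 > core, plus SNPs with r2 > secondary that lie
    within window_bp of a SNP that already passed the core threshold."""
    core_snps = [s for s in r2 if r2[s] > core]
    extra = [
        s for s in r2
        if s not in core_snps
        and r2[s] > secondary
        and any(abs(positions[s] - positions[c]) < window_bp for c in core_snps)
    ]
    return core_snps, extra


# Hypothetical r2 values with the index SNP and hypothetical positions (bp).
r2 = {"snpA": 0.95, "snpB": 0.75, "snpC": 0.72, "snpD": 0.50}
positions = {"snpA": 1000, "snpB": 1400, "snpC": 5000, "snpD": 1100}
print(select_variants(r2, positions))  # (['snpA'], ['snpB'])
```

Here snpC passes the secondary r² threshold but is excluded because it is not physically close to any core SNP.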

We measured regulatory potential through dual luciferase reporter assays in two different cell lines: SK-N-SH cells, which are derived from a human neuroblastoma metastasis, and HEK293 cells, which, despite being derived from kidney, show a transcriptional profile similar to that of neurons (Shaw et al., 2002). Luciferase constructs were prepared for each allele of each variant, containing a ~1 kb region around the variant inserted upstream of the SV40 promoter and the firefly luciferase gene. For each allele, four independently prepared constructs were transfected into the cell lines to minimize the effects of differences in DNA purity between preparations.

3.2 Methods

3.2.1 Cell Culture

HEK293 and SK-N-SH cell lines were obtained from ATCC. HEK293 cells were grown in Dulbecco's Modified Eagle Medium (DMEM, ThermoFisher) with 10% fetal bovine serum (FBS, Gemini Bioproducts). SK-N-SH cells were grown in DMEM with 10% FBS and 1x B27 supplement (ThermoFisher). Both cell lines were grown at 37 °C in 5% CO2.

For general cell culture work, refer to the laboratory notes below:

Clean the hood with 70% EtOH before and after each use. Spray everything with 70% EtOH before placing it in the hood. Always rinse the vacuum tubing with bleach before and after use. When the vacuum collection flask is full, move the black rubber stopper and tubing to a new, empty flask. Immediately cover the old flask with foil and take it to be autoclaved. If the vacuum is not working but the collection flask is not full, replace the filter. If the Fireboy ™ runs out of gas, replace the canister. New canisters are under the sink in the cell culture room. Turn off the Fireboy ™ gas valve (black knob by the canister attachment), remove the old canister, and discard it in the biohazard box. Pop the black cap off the new canister and put the new canister into the Fireboy ™. Turn the gas valve back on. If liquid gets into the pipette aid, or if there are problems with suction, replace the filter by removing the black nose and inserting a new white 0.8 µm filter.

To thaw a stock of cells, remove the cells from the freezer or liquid nitrogen tank and immediately place them in a 37 °C water bath. As soon as the cells are thawed (thawing takes ~5 minutes for 1 mL), transfer them to a 15 mL conical tube. Add media to fill the tube (~13 mL), then spin for 4 minutes at 1,000 rpm. Aspirate the supernatant. Resuspend the cells in ~5 mL media, then fill the tube with media (~9 mL). Spin again for 4 minutes at 1,000 rpm. Aspirate the supernatant. Resuspend the cells in ~5 mL media, transfer them to a 750mm flask, and add 5 mL media. Store the cells in the 37 °C incubator. Replace the media the next day to make sure all DMSO is gone.

SK-N-SH cells require fresh media every 3-4 days. HEK293 cells require fresh media once the media has started to turn orange/yellow. To re-feed cells, aspirate the old media and add 10 mL of new media (for 750mm flasks). Make sure to label flasks with the date of each re-feeding.

When cells approach 90% confluence, they need to be split. To split cells, aspirate the old media from the flasks. Add 2-3 mL of PBS to the side of each flask so as not to dislodge the cells from the surface of the flask. Gently swirl the PBS over the surface of the flask, then aspirate the PBS. Add 1 mL trypsin to each flask and place it on the 37 °C plate warmer.

After about 1 minute, check the flasks under the microscope to see whether all cells have detached from the surface. Tap the flask on the side of the microscope platform or on the palm of your hand to loosen the cells from the surface. Continue to incubate with trypsin until the cells are detached, but for no more than 3 minutes. Inactivate the trypsin by adding 9 mL media to the flask.

Pipette the media and cells up and down to disrupt any cells that have adhered to one another. Cap the flask, then shake it back and forth to distribute the cells evenly. Do not swirl, because all the cells will end up in the center of the flask. Return the flasks to the 37 °C incubator.

When cells are no longer needed for experimental procedures, freeze down the remaining cells to serve as a stock for the future. Aspirate the media and rinse with PBS.

Then trypsinize the cells. Add 20 mL DMEM to the cells and transfer them to a 50 mL conical tube. Count the cells (1 mL in 9 mL of Z Pack buffer) using the automated cell counter (Coulter). For each vial of frozen stock, 2-3 million cells are desired. Spin down the cells in the 50 mL conical tube for 4 minutes at 1,000 rpm. Aspirate the media, leaving the cell pellet. Make enough 10% DMSO media for 1 mL per vial of cells, plus an extra 1 mL.

Add the DMSO first, because it is viscous, and pipette up and down to mix after adding the media. Resuspend the cells in 10% DMSO media and put 1 mL into each vial. Store the vials at -80 °C for at least 24 hours; they can then be stored in liquid nitrogen.
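The arithmetic behind the freezing step can be sketched as follows. This is a hypothetical helper, assuming that the counter reports cells per mL of the diluted counting sample (1 mL of suspension in 9 mL of buffer, a 10-fold dilution), that the volume of the cell suspension is known, and a target of 2.5 million cells per vial (within the 2-3 million range above).

```python
def freezing_plan(counted_cells_per_ml, suspension_ml, dilution_factor=10,
                  cells_per_vial=2.5e6):
    """Plan vials and 10% DMSO freezing media from a cell count.

    Assumption: counted_cells_per_ml is measured in the diluted counting
    sample, so it is multiplied by the dilution factor and suspension volume.
    """
    total_cells = counted_cells_per_ml * dilution_factor * suspension_ml
    vials = int(total_cells // cells_per_vial)
    media_ml = vials * 1.0 + 1.0   # 1 mL per vial plus 1 mL extra
    dmso_ml = media_ml / 10        # 10% DMSO (v/v)
    return {
        "total_cells": total_cells,
        "vials": vials,
        "dmso_ml": dmso_ml,
        "base_media_ml": media_ml - dmso_ml,
    }


# Hypothetical count: 1.0e5 cells/mL in the diluted sample, 20 mL suspension.
plan = freezing_plan(counted_cells_per_ml=1.0e5, suspension_ml=20)
print(plan["vials"], plan["dmso_ml"])  # 8 0.9
```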

3.2.2 Construct Synthesis

Primers were designed to amplify an approximately 1 kb locus including a SNP of interest (r² > 0.8 with rs1006737) from genomic DNA (Table 4). For each SNP, two constructs were made: one with the non-risk allele and one with the risk allele. This was achieved by amplifying genomic DNA extracted from DLPFC brain samples that were homozygous for either allele, using high-fidelity Pfu Turbo DNA polymerase (Agilent). In addition to the SNP-based constructs, an inert 1 kb “spacer” construct was designed to control for the size of the insert in the vector (Grice et al., 2005). Single-nucleotide A overhangs were added to the amplicons for TA cloning by adding 0.5 µL 100 mM dATP, 0.5 µL 50 mM MgCl2, 1 µL 10x buffer, 0.1 µL Taq polymerase (Invitrogen), and 0.9 µL water to 7 µL amplicon and incubating at 72 °C for 15 minutes. We then TA cloned the A-overhang amplicons into the pCR8/GW/TOPO entry vector (Invitrogen) and transformed them into competent Top10 OneShot E. coli (Invitrogen) according to the manufacturer’s protocols.
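As a quick check of the A-tailing recipe, the final concentrations follow from C1V1 = C2V2 over the total reaction volume. The sketch below only restates the volumes and stock concentrations given above; the dictionary layout and function name are illustrative choices, not part of the original protocol.

```python
# Volumes (uL) and stock concentrations from the A-tailing recipe above.
# Components without a stated stock concentration are listed with None.
components = {
    "dATP (mM)": (0.5, 100.0),
    "MgCl2 (mM)": (0.5, 50.0),
    "buffer (x)": (1.0, 10.0),
    "Taq polymerase": (0.1, None),
    "water": (0.9, None),
    "amplicon": (7.0, None),
}


def final_concentrations(components):
    """Return the total volume and each final concentration, C2 = C1 * V1 / V_total."""
    total_ul = sum(volume for volume, _ in components.values())
    finals = {
        name: stock * volume / total_ul
        for name, (volume, stock) in components.items()
        if stock is not None
    }
    return total_ul, finals


total_ul, finals = final_concentrations(components)
print(round(total_ul, 6), finals)
```

The volumes sum to a 10 µL reaction, so the 10x buffer ends up at 1x.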

Plasmids were isolated with the QIAprep Spin Miniprep kit (Qiagen), and inserts were submitted for Sanger sequencing at the Synthesis and Sequencing Core at Johns Hopkins University using the RV3 and GL2 primers, which flank the insertion site, to determine the orientation of the insert and verify its sequence.

Sequence-verified amplicons in the forward orientation were then recombined with Gateway LR Clonase II (Invitrogen) into a modified pGL3-Promoter luciferase vector (Promega), with the Gateway cassette inserted at the SmaI cut site at position 28 (Grice et al., 2005). They were again transformed into Top10 OneShot E. coli, and plasmids were isolated with the QIAprep Spin Miniprep kit. Plasmids were digested with NotI (NEB), which cuts the destination vector once, and run on agarose gels to check their size and integrity. Inserts in plasmids of the correct size were submitted for Sanger sequencing using the RV3 and PGL primers, which flank the insertion site, for sequence verification.

At least four independent clones of each construct were isolated to serve as biological replicates. For a given experiment, all replicates of all constructs to be transfected were diluted to 100 ± 5 ng/µL. DNA concentrations were measured using the Take5 plate on the Synergy2 plate reader (BioTek) in a single session the day before transfection. To minimize evaporation, the constructs were stored in capped strips of PCR tubes at -20 °C overnight and thawed immediately before transfection.
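Diluting each preparation to the target concentration is a C1V1 = C2V2 calculation. A minimal sketch, assuming each measured stock is diluted in place with water and using the 100 ± 5 ng/µL target stated above; the function names are hypothetical.

```python
TARGET_NG_PER_UL = 100.0
TOLERANCE_NG_PER_UL = 5.0


def water_to_add(measured_ng_per_ul, current_volume_ul, target=TARGET_NG_PER_UL):
    """Volume of water (uL) to add so the stock reaches the target concentration.

    Assumption: the stock is diluted in place with water.
    """
    if measured_ng_per_ul < target:
        raise ValueError("stock is below target; re-prep or concentrate the DNA")
    # C1 * V1 = C2 * V2  ->  V2 = C1 * V1 / C2; water = V2 - V1
    return current_volume_ul * (measured_ng_per_ul / target - 1.0)


def within_tolerance(measured_ng_per_ul, target=TARGET_NG_PER_UL,
                     tolerance=TOLERANCE_NG_PER_UL):
    """True if a re-measured concentration is acceptable for transfection."""
    return abs(measured_ng_per_ul - target) <= tolerance


print(water_to_add(250.0, 20.0))  # 30.0
print(within_tolerance(103.0), within_tolerance(106.0))  # True False
```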

3.2.3 Reporter Assays

Approximately 75,000 HEK293 or SK-N-SH cells, as counted by the automated cell counter (Coulter), were seeded in 24-well plates 24 hours before transfection. The corner wells of the plates were not seeded, as we had noticed significant evaporation of media from those wells.

A master mix of 53 µL Opti-MEM reduced serum media (ThermoFisher) and 2 µL Lipofectamine 2000 (Invitrogen) per transfection, plus 10% extra for pipetting errors, was made. A separate master mix of 50.75 µL Opti-MEM and 0.25 µL of 100 ng/µL (0.025 µg) Renilla transfection control (Promega) per transfection, plus 10% extra, was also made. For each transfection, 4 µL of 100 ng/µL (0.4 µg) experimental construct was pipetted into a microcentrifuge tube. Then 51 µL of the Opti-MEM and Renilla master mix was added, followed by 55 µL of the Opti-MEM and Lipofectamine master mix. These mixtures were incubated at room temperature for 20 minutes, and then 100 µL of each reaction was pipetted slowly in a spiral pattern into the appropriate well of cells.
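The master-mix volumes scale with the number of transfections plus the 10% overage. The sketch below encodes the per-transfection volumes given above; the helper names are hypothetical and rounding to 0.01 µL is an arbitrary choice. It also recomputes the DNA mass per transfection from the 100 ng/µL stocks.

```python
# Per-transfection volumes (uL) from the protocol above.
PER_TRANSFECTION_UL = {
    "Lipofectamine mix": {"Opti-MEM": 53.0, "Lipofectamine 2000": 2.0},
    "Renilla mix": {"Opti-MEM": 50.75, "Renilla control (100 ng/uL)": 0.25},
}


def master_mix(n_transfections, overage=0.10):
    """Reagent volumes (uL) for each master mix, including pipetting overage."""
    scale = n_transfections * (1.0 + overage)
    return {
        mix: {reagent: round(volume * scale, 2) for reagent, volume in parts.items()}
        for mix, parts in PER_TRANSFECTION_UL.items()
    }


def dna_ng(volume_ul, concentration_ng_per_ul=100.0):
    """Mass of DNA (ng) delivered by a given volume of a 100 ng/uL stock."""
    return volume_ul * concentration_ng_per_ul


mixes = master_mix(20)
print(mixes["Renilla mix"])
print(dna_ng(0.25), dna_ng(4.0))  # 25.0 400.0
```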

After 4 hours, the Opti-MEM media was replaced with the standard media for the cell type. At 24 hours after transfection, cells were rinsed with 1 mL of PBS, lysed with 150 µL passive lysis buffer (Promega), and placed on the orbital shaker for 20 minutes. The cell lysate was then pipetted into a well of a 96-well plate and briefly spun down.

Approximately 1 hour before lysis, the LARII, Stop&Glo, and passive lysis buffer reagents (Promega) were prepared according to the manufacturer’s protocol. During the 20 minute lysis incubation, the Synergy2 plate reader (BioTek) was prepared for the luciferase reading. First, the injector tubing was washed with 5 mL of water, then 5 mL of 70% EtOH, then 5 mL of air to dry. Then, the injectors were primed with 2 mL of reagent: LARII in injector 1 and Stop&Glo in injector 2.

To determine the appropriate dilution of the cell lysates for luciferase quantification, undiluted (1x), 1:10, 1:50, and 1:100 dilutions of two of the spacer control lysates were assayed in advance. Luminescence readings should fall in the 1,000-100,000 RLU range. All lysates were then diluted accordingly, and 25 µL of each diluted lysate was put into an opaque, white, round-bottom 96-well plate to be assayed for luciferase activity with the Dual-Luciferase Reporter Assay System (Promega) on the Synergy2 plate reader (BioTek). Using the automated injectors, 50 µL of LARII (Promega) was added to 25 µL cell lysate, and the experimental (firefly) luciferase reading was taken after a 3 second low-speed shake and a 2 second pause. Then, 50 µL Stop&Glo (Promega) was added to the sample, and the Renilla luciferase output was measured after a 3 second low-speed shake and a 2 second pause.
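The choice of lysate dilution from the pilot readings can be expressed as a simple rule. This sketch assumes that the least-diluted option keeping every pilot spacer lysate within 1,000-100,000 RLU is preferred (an assumed design choice that keeps signal as high as possible); the function name is hypothetical and the RLU readings in the example are invented for illustration, not measurements from this study.

```python
RLU_MIN, RLU_MAX = 1_000, 100_000


def choose_dilution(pilot_readings):
    """Pick the smallest dilution factor at which every pilot lysate reads in range.

    pilot_readings: list of dicts mapping dilution factor (1, 10, 50, 100) to RLU.
    Assumed rule: prefer the least dilution that satisfies the range.
    """
    candidates = set.intersection(*(set(r) for r in pilot_readings))
    usable = [
        factor for factor in sorted(candidates)
        if all(RLU_MIN <= r[factor] <= RLU_MAX for r in pilot_readings)
    ]
    if not usable:
        raise ValueError("no tested dilution keeps all readings in range")
    return usable[0]


# Hypothetical readings (RLU) for two spacer-control lysates.
spacer_1 = {1: 450_000, 10: 52_000, 50: 9_800, 100: 4_700}
spacer_2 = {1: 380_000, 10: 41_000, 50: 8_300, 100: 4_100}
print(choose_dilution([spacer_1, spacer_2]))  # 10
```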

After completion of the assay, the remaining LARII and Stop&Glo reagents may be purged from the automated injectors into their respective containers and stored at -80 °C. The injectors must then be cleaned with 5 mL of water, then 5 mL of 70% EtOH, then 5 mL of air to dry.

3.2.4 Analysis

For each transfection, the ratio of experimental firefly to Renilla luciferase activity was calculated, and measurements were then normalized to the average of four biological replicates of the spacer construct. The standard error of the four biological replicates was calculated for each allele of each SNP tested. Student’s t-tests were conducted to determine whether the two alleles (risk and non-risk) of a given SNP differed significantly in driving expression of the firefly luciferase reporter gene.
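The analysis steps above can be sketched as follows. This is a minimal illustration, not the original analysis script: it assumes four replicate transfections per allele, a pooled-variance (equal-variance) two-sample Student's t-test, and a two-sided α = 0.05 with a critical value of 2.447 for 6 degrees of freedom. The function names are hypothetical and all readings are invented.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF6 = 2.447  # two-sided alpha = 0.05, df = 4 + 4 - 2 (assumed design)


def normalized_ratios(firefly, renilla, spacer_ratios):
    """Firefly/Renilla ratio per transfection, divided by the mean spacer ratio."""
    spacer_mean = mean(spacer_ratios)
    return [(f / r) / spacer_mean for f, r in zip(firefly, renilla)]


def standard_error(values):
    """Standard error of the mean across biological replicates."""
    return stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Two-sample Student's t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical readings for four replicate transfections per construct.
spacer = [1.0, 1.1, 0.9, 1.0]  # firefly/Renilla ratios of the spacer construct
renilla = [1000, 1000, 1000, 1000]
non_risk = normalized_ratios([2100, 1900, 2000, 2200], renilla, spacer)
risk = normalized_ratios([1200, 1300, 1100, 1250], renilla, spacer)
t = student_t(non_risk, risk)
print(round(t, 2), abs(t) > T_CRIT_DF6)
print(round(standard_error(non_risk), 3), round(standard_error(risk), 3))
```

Comparing |t| with a tabulated critical value avoids needing a t-distribution function; R's `t.test(..., var.equal = TRUE)` would report the exact p-value.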

3.3 Results

In total, we tested 16 SNPs in a non-coding LD block for allele-specific enhancer activity in dual luciferase reporter assays. In addition to the schizophrenia-associated SNP rs1006737, 13 SNPs were correlated with rs1006737 at r² > 0.8, while another two (rs34382810 and rs2370414) were included because of their physical proximity (<600 bp) to the 14 SNPs in LD; their r² with rs1006737 was > 0.7. In four instances, two SNPs were located within 1 kb of each other, so they were assayed together in the same construct. Because of their proximity and high LD, we tested each such pair only as the two risk alleles together or the two non-risk alleles together.

In HEK293 cells, we observed potentially positive results for SNPs rs12315711, rs4765905, rs12311439, and rs1024582 (Figure 6). We scrutinized these results further by repeating the assay several times for these four variants, plus one variant that did not show an allele-specific difference, rs2159100. However, the SNPs that originally gave weaker results (rs12315711, rs12311439, and rs1024582) did not replicate in the follow-up assays, and rs4765905 often showed significant differences between alleles, but the direction was not consistent (Figure 7).

In the SK-N-SH cell line, the risk allele (C) of rs4765905 consistently showed significantly reduced expression of the luciferase reporter gene (Figures 6 and 8).

From these cell-based assays, we see that the two alleles of rs4765905 show evidence of functional differences, which may also exist in vivo. Such differences may drive the genotype-dependent differences we see in CACNA1C expression in the post mortem brain samples.

3.4 Discussion

Our effort to identify specific variant sequences driving the association with CACNA1C transcript levels and disease risk produced interesting new information. First, we identified one of the 16 tested variant sequences, the sequence including rs4765905, as showing consistent allele-specific effects on reporter gene expression in the SK-N-SH cell line. There are many possible reasons for false negatives in reporter assays, so we do not consider this an exclusion of the remaining sequences, but rather a reason to focus attention on rs4765905. It is quite possible that other sequences, in cis or trans, variant or not, participate in the regulation of CACNA1C, perhaps even interacting with rs4765905 in the genomic context of the gene, but are not sufficient to drive gene expression alone when transfected into cells in a reporter gene construct.

Of note, previous work by Roussos et al. (2014) found another SNP, rs2159100, to show allele-specific activity in reporter assays in both HEK293 and Neuro2A cells. We tested this SNP in HEK293 cells in five independent experiments but did not observe the same effect. Note, however, that our constructs contained the SV40 promoter rather than a minimal promoter as in Roussos et al. (2014), and that each of our tests involved transfecting four independent DNA preparations of each construct. The latter safeguards against false positives, because DNA preparation efficiency can systematically influence the results of replicates made from a single preparation, keeping them from being true biological replicates. Either or both of these differences might explain the disagreement.

While rs4765905 showed significant allelic effects with a consistent direction in SK-N-SH cells, this was not the case in HEK293 cells, where the effect was often significant but its direction was not consistent. This result might again reflect complex regulation of CACNA1C, similar to the differences we observed between STG and DLPFC, and might also explain discrepancies with previously published data.

It is possible that unknown, subtle variables affect the regulatory activity of these sequences in addition to the effect of the sequence variants, making it difficult to capture the differences consistently in experiments. Nevertheless, in at least one cell line, this SNP showed a consistent and significant effect over three independent experiments.

3.5 Chapter 3 Tables

Table 4. Primer sequences for dual luciferase reporter constructs.

Primer Name     Primer Sequence
CACf            ATCTTACTGACGCGTTATGGGGTCCTGCTTTGTTC
CACr2           ATCTTACTGAGATCTAAGAACATGGGGTTTGGAGA
rs1006737_F     ATGTCAGTGGTCGACGGAGCAACCCAAGGCTAAT
rs1006737_R     GTAATGCTGGATCCGCTTGCTGGACCTGAGATTC
rs1024582_F3    ATGTCAGTGGTCGACGAATCAGACAGCCCACGTTT
rs1024582_R3    GCTAATGCTGGATCCTGGACAAGCCTGCTCTAGGT
rs10774035_F2   ATGTCAGTGGTCGACCTGAACCAGCTGCATCAAAA
rs10774035_R    TACGATGCTGGATCCAAGCGGGACTTAGCTCACAA
rs10774036_F    TGACATGAAGTCGACGGGCTGATAGGGAAGAGGAC
rs10774036_R    ATAGCTAATGGATCCGCTGGGAGAAAACATCTGGA
rs11062170_F    AGCAGCTACGTCGACGCCAGGGATTTTCATTTTGA
rs11062170_R    TGACATGAAGGATCCGCGATCACTGTTGATTGATACA
rs12315711_F    ATGTCAGTGGTCGACTTGGATGACATCAGCGAAGA
rs12315711_R    GTACATGCTGGATCCCCCAAACCTGATTCTGTCTG
rs2159100_F     ATGTCAGTGGTCGACGGCAACTGCATCCTTTGATT
rs2159100_R     GTAATGCTCGGATCCCTCAAAACAGGCCAACCATT
rs4298967_F2    ATGTCAGTGGTCGACGCACCACCACACTCAGCTAAT
rs4298967_R2    GTAATGCTGGATCCGTGAAGCAAAGGAGGAGGTG
rs4765905_F     ATGTCAGTGGTCGACTGCTGCTGAAGAATCAATGC
rs4765905_R     GTACATGCTGGATCCCAAGGAAAATGGCCAAAAGA
rs758170_F      ATCTTACTGACGCGTTCAATGGCTGGTCAGAGCTA
rs758170_R2     CTCTTGCTAAGATCTCACTGTCCTACAACCCAAACG
rs7965923_F     ATGTCAGTGGTCGACAAGCCATTCCGTTCTGAAAA
rs7965923_R     GTACATGCTGGATCCCACATTGTGAAGGGAACATCA

The primer names indicate which variant is amplified, with the following exceptions: primers CACf and CACr2 amplify the variant rs758170, the primers for rs7965923 also amplify rs34382810, the primers for rs10774036 also amplify rs10744560, the primers for rs10774035 also amplify rs2370414, and the primers for rs1006737 also amplify rs769087.

3.6 Chapter 3 Figures

Figure 6. Dual luciferase reporter assay results.

Relative firefly luciferase activity for constructs transfected in HEK293 cells (top) and SK-N-SH cells (bottom). The firefly/Renilla ratio is normalized to 1.0 for the control construct. The SNP name(s) within each construct are listed below the bars. Each bar represents the average of four independent construct preparations, and error bars represent standard error. Red bars correspond to risk alleles and blue bars to the alternative alleles. T-test p-values are shown above an allele pair only when the difference between alleles is significant.

[Figure: bar charts of normalized firefly/Renilla luciferase activity for the control construct and for constructs containing rs2159100, rs12315711, rs4765905, rs12311439, and rs1024582; p-values for significant allele differences are shown above the bars.]

Figure 7. Dual luciferase replication in HEK293 cells.

Replications of a subset of DLR constructs transfected in HEK293 cells. Data from Figure 6 are shown again in the top panel. Relative firefly luciferase activity is shown as the average of four independent construct preparations for each allele. Error bars represent standard error. The non-risk allele is shown in blue and the risk allele in red. Significant differences between the two alleles of a construct are indicated as p-values above the pair.


Figure 8. Dual luciferase replication results in SK-N-SH cells.

Replications of a subset of DLR constructs transfected in SK-N-SH cells. The top panel shows the same data as Figure 6. Relative firefly luciferase activity is shown as the average of four independent construct preparations for each allele. Error bars represent standard error. The non-risk allele is shown in blue and the risk allele in red. Significant differences between the two alleles of a construct are indicated as p-values above the pair.

Chapter 4: Protein Binding Assays

4.1 Introduction

Regulatory elements, such as enhancers, act through interactions with protein complexes made up of transcription factors, activators, co-activators, and other proteins. These proteins work together to remodel chromatin, create nucleosome-free regions, recruit more transcription factors, and interact with the promoter to drive gene expression (Vernimmen and Bickmore, 2015). Enhancers are critical for transcription, as deletion of an enhancer element can reduce the binding of general transcription factors and RNA polymerase II at the promoter (Spicuglia et al., 2002; Zhao et al., 2007). Additionally, there is evidence that a given enhancer and promoter pair must be capable of binding the same protein complexes in order to work together to drive gene expression (van Arensbergen et al., 2014). This compatibility may be disrupted by genetic variants that change the binding affinity of activator proteins, which bind DNA in a sequence-specific manner. Although activators are not sufficient to drive transcription, they recruit co-activators to form the functional regulatory DNA-protein complex that ultimately bridges the enhancer and promoter (Vernimmen and Bickmore, 2015). Therefore, schizophrenia-associated variants in eQTLs may function by altering protein binding affinity and disrupting the formation of the required regulatory protein complex.

We are especially interested in CACNA1C regulation because calcium signaling plays an important role in neuronal biology. CACNA1C is one of four L-type alpha1 subunits, which are primarily responsible for signaling to the nucleus (Helton et al., 2005; Ma et al., 2012). In addition to CACNA1C, several other calcium channel subunits have been associated with schizophrenia and other psychiatric diseases (Curtis et al., 2011; Pietrobon, 2010; Schmunk and Gargus, 2013; Sklar et al., 2011; 2013). Calcium signaling modulates many important neuronal functions that, when disturbed, may contribute to pathogenesis, including neurotransmitter release, dendrite growth, transcriptional activation, and neuronal plasticity (Hofmann et al., 2014; Khosravani and Zamponi, 2006). We hope that by understanding the regulation of CACNA1C expression, we may be able to identify new therapeutic targets that could restore disrupted calcium signaling in the disease state.

Our data support the hypothesis that rs1006737 tags an eQTL regulating CACNA1C expression. We have shown that the genotype at this schizophrenia-risk variant is correlated with CACNA1C mRNA expression in post mortem brain samples and that a variant in high LD, rs4765905, has allele-specific enhancer activity in vitro.

However, the molecular mechanism by which the eQTL functions remains unknown, so we investigated protein binding to further characterize allele-specific regulatory effects on CACNA1C.

While rs4765905 is our lead functional candidate from the dual luciferase reporter assay, we chose to continue investigating all 16 variants tagged by the schizophrenia-associated variant rs1006737. A negative result in the dual luciferase reporter assays does not preclude a variant from being functional, as it may be active under conditions that were not assayed, such as at different developmental stages, in other cell types, or in response to cellular stressors. Additionally, there is literature suggesting that multiple variants in LD work together in a combinatorial manner to regulate gene expression (Corradin et al., 2014a). Such multi-variant interactions would not have been captured by our dual luciferase reporter assay design.

To determine whether any transcription factors, activators, or other regulatory proteins can bind the SNP context sequences, and whether they do so in an allele-specific manner, we conducted electrophoretic mobility shift assays (EMSAs) and protein microarrays. In an EMSA, a radiolabeled DNA probe is incubated with or without proteins from the nuclear extract of a cell line, and the samples are then run on an electrophoretic gel. When a protein or protein complex binds a probe, the new complex has a higher molecular weight than the probe alone and migrates more slowly. EMSAs have the advantage of using whole nuclear protein extract, which allows a test DNA sequence to interact with proteins in their native conformation as well as with any protein complexes formed in the cell. However, the identities of the proteins that bind the DNA probe remain unknown. Therefore, we also used a protein microarray, in which proteins are immobilized on a glass slide and a fluorescently labeled DNA probe hybridized to the microarray produces a fluorescent signal wherever it binds. The protein microarray does not test DNA-protein interactions in a native setting, but it allows us to generate a candidate list of specific proteins capable of interacting with the variant sequences.

4.2 Methods

4.2.1 Protein Extraction

HEK293 and SK-N-SH cells were cultured according to the methods in Chapter 3 (section 3.2.1 Cell Culture). Approximately 4-5 million cells were used for protein extraction with the NE-PER Nuclear and Cytosolic Extraction reagents (ThermoFisher Scientific) according to the manufacturer's instructions. A former graduate student, David Gorkin, validated this kit by running Western blots with antibodies against nuclear- and cytosol-specific proteins. Both nuclear and cytosolic proteins were saved and quantified against BSA protein standards. 100 µL of each protein standard was placed into a tube, plus one tube of H2O. A 1:10 dilution of each protein extract was made, and 100 µL was placed into a tube. Then 1 mL of 50:1 A:B solution was added to each tube, and the tubes were vortexed and incubated at 37°C for 30 minutes. The tubes were then cooled at room temperature for 10 minutes, and concentration was measured on the Nanodrop (Thermo Scientific). Extracts were diluted to 1 mg/mL, separated into 32 µL aliquots, and stored at -80°C. To avoid sample degradation from freeze/thaw cycles, aliquots were never re-frozen; any protein remaining after use was discarded.
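Turning the standard-curve readings into a working dilution is simple arithmetic. The sketch below assumes a linear standard curve over the BSA standards, which is an assumption, not something the protocol states. It corrects for the 1:10 pre-dilution of the extract and computes the diluent volume needed to reach 1 mg/mL. All standard concentrations and absorbance values are hypothetical, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def extract_concentration(absorbance, slope, intercept, dilution_factor=10):
    """Concentration (mg/mL) of the undiluted extract from its diluted reading."""
    return (absorbance - intercept) / slope * dilution_factor

def diluent_volume(stock_mg_ml, stock_ul, target_mg_ml=1.0):
    """Diluent (uL) to add to stock_ul of extract to reach target_mg_ml."""
    if stock_mg_ml < target_mg_ml:
        raise ValueError("extract is already below the target concentration")
    return stock_ul * (stock_mg_ml / target_mg_ml - 1)

# Hypothetical BSA standards (mg/mL) and their absorbance readings.
std_conc = [0.0, 0.25, 0.5, 1.0, 2.0]
std_abs = [0.05, 0.15, 0.25, 0.45, 0.85]
slope, intercept = fit_line(std_conc, std_abs)

conc = extract_concentration(0.31, slope, intercept)  # reading of the 1:10 dilution
print(f"{conc:.2f} mg/mL; add {diluent_volume(conc, 50):.0f} uL diluent to 50 uL extract")
```

If the standards curve noticeably at the high end, a non-linear fit or a narrower standard range would be more appropriate than the straight line assumed here.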

4.2.2 Radiolabeled Oligo Preparation

A 21-nt genomic region centered on the SNP of interest was selected for probe design (Table 5). For each SNP, four oligos were created: forward and reverse strands for both the non-risk and risk alleles. Oligos were ordered from IDT. To anneal the single-stranded oligos of each allele, 100 µM of the forward and 100 µM of the reverse strand were incubated together in a single PCR tube at 95°C for 15 minutes and then allowed to cool slowly in the thermocycler for 1 hour. Then 10 µL of the double-stranded oligo was treated with 1.2 µL shrimp alkaline phosphatase (Affymetrix), 2.0 µL 10x SAP buffer, and 6.8 µL DEPC-treated water. This mixture was incubated at 37°C for 1 hour and 95°C for 15 minutes, and then allowed to cool slowly in the thermocycler for 1 hour.
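The probe layout can be expressed directly: 10 nt of flanking sequence on each side of the SNP, one forward oligo per allele, and its reverse complement as the partner strand. The sketch below is illustrative only, and the helper names are hypothetical. The rs1006737 flanks are read off the Table 5 sequences, and the function reproduces the four rs1006737 oligos listed there.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def design_probes(left_flank, right_flank, alleles, length=21):
    """Forward and reverse oligos for each allele, with the SNP at the centre.

    left_flank / right_flank: forward-strand sequence 5' and 3' of the SNP;
    each must provide at least (length - 1) // 2 bases.
    """
    half = (length - 1) // 2
    if len(left_flank) < half or len(right_flank) < half:
        raise ValueError("flanks are too short for the requested probe length")
    probes = {}
    for allele in alleles:
        fwd = left_flank[-half:] + allele + right_flank[:half]
        probes[allele] = (fwd, reverse_complement(fwd))
    return probes

# rs1006737: 10-nt flanks taken from the Table 5 probe sequences.
probes = design_probes("TCAGCCCGAA", "TGTTTTCAGA", ["A", "G"])
for allele, (fwd, rev) in probes.items():
    print(allele, fwd, rev)
```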

SAP-treated oligos were labeled with [γ-32P]ATP. 2.0 µL of the SAP-treated oligo was combined with 1.0 µL T4 Polynucleotide Kinase (New England Biolabs), 1.4 µL 10x PNK buffer, 1.0 µL [γ-32P]ATP, and 8.6 µL DEPC-treated water and incubated at 37°C for 1 hour. Probes were purified by adding 3 µL of QuickClean enzyme removal resin (Clontech), pipetting up and down, and transferring to a Spin-X 0.22 µm cellulose column (Costar). Columns were spun for 15 minutes in the centrifuge in the hot room. To the flow-through, 3 µL yeast tRNA (Ambion), 6 µL 5M NH4Ac, and 24 µL cold 100% EtOH were added, and the solution was stored at -80°C overnight. The next day, the solution was spun for 15 minutes in the centrifuge in the hot room and the supernatant was removed. The pellet was rinsed with 1 mL of 70% EtOH, air dried for 10-15 minutes, and resuspended in 25 µL DEPC-treated water.
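As a quick arithmetic check on the SAP and kinase reactions above, the component volumes sum to 20 µL and 14 µL, and in each case the 10x buffer ends up at 1x. A minimal sketch follows. The helper name is hypothetical, and the dictionaries simply restate the volumes given in the text.

```python
def check_reaction(components, buffer_key, buffer_stock_x=10.0):
    """Total volume (uL) and final buffer strength (x) of a reaction mix."""
    total = sum(components.values())
    final_x = components[buffer_key] * buffer_stock_x / total
    return total, final_x

sap = {"ds oligo": 10.0, "SAP": 1.2, "10x SAP buffer": 2.0, "DEPC water": 6.8}
pnk = {"SAP-treated oligo": 2.0, "T4 PNK": 1.0, "10x PNK buffer": 1.4,
       "[g-32P]ATP": 1.0, "DEPC water": 8.6}

results = {}
for name, mix, buf in [("SAP", sap, "10x SAP buffer"), ("PNK", pnk, "10x PNK buffer")]:
    results[name] = check_reaction(mix, buf)
    total, x = results[name]
    print(f"{name}: {total:.1f} uL total, buffer at {x:.2f}x")
```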

Radiolabeled probes were counted by adding 2 µL of probe to 10 mL of scintillation fluid and measuring them in the scintillation counter (Beckman Coulter) with user program 8, which measures 32P counts per minute (CPM) over 2 minutes per sample.

Probes were diluted to 4,000 CPM/µL in DEPC-treated water and recounted to ensure that the two alleles of a given SNP had counts within 10% of each other.
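The dilution to 4,000 CPM/µL and the 10% matching check can be written as a short calculation. This is a sketch under stated assumptions. The count of 400,000 CPM and the 50 µL final volume are hypothetical. The text does not say what the 10% is relative to, so the sketch measures the difference relative to the larger of the two counts.

```python
def cpm_per_ul(counted_cpm, counted_volume_ul=2.0):
    """CPM per uL of probe, from a scintillation count of counted_volume_ul."""
    return counted_cpm / counted_volume_ul

def dilution_volumes(stock_cpm_per_ul, final_ul, target_cpm_per_ul=4000.0):
    """(probe uL, water uL) that give final_ul of probe at target_cpm_per_ul."""
    probe_ul = target_cpm_per_ul * final_ul / stock_cpm_per_ul
    if probe_ul > final_ul:
        raise ValueError("stock is too dilute to reach the target")
    return probe_ul, final_ul - probe_ul

def alleles_matched(cpm_a, cpm_b, tolerance=0.10):
    """True if two allele probes differ by at most `tolerance` of the larger count."""
    return abs(cpm_a - cpm_b) / max(cpm_a, cpm_b) <= tolerance

# Hypothetical count: 400,000 CPM measured in the 2 uL counting aliquot.
stock = cpm_per_ul(400_000)
probe_ul, water_ul = dilution_volumes(stock, final_ul=50.0)
print(probe_ul, water_ul, alleles_matched(4000, 3700))
```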

4.2.3 Electrophoretic Mobility Shift Assay (EMSA)

The EMSA buffer was made in house. For 15 mL of 2x buffer, 3.6 mL of 50% glycerol, 360 µL of 1M HEPES (pH 7.9), 120 µL of 1M Tris-HCl (pH 8.3), 60 µL of 0.5M EDTA (pH 8.0), 150 µL of 100mM DTT, and 10.7 mL DEPC-treated water were mixed, and a small amount of bromophenol blue was added for color (loading dye). The buffer was stored at 4°C and remained usable for several months.
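The recipe implies the following working concentrations. The calculation is C2 = C1·V1/V2, and the listed volumes, including 10.7 mL water, sum to 14.99 mL, close to the nominal 15 mL. Because each 20 µL EMSA sample contains 10 µL of the 2x buffer, the reaction concentrations are half the 2x values. The helper name is hypothetical. The stock concentrations and volumes are those given above.

```python
def final_concentrations(stocks, total_ml):
    """C2 = C1 * V1 / V2 for each component; stocks maps name -> (C1, V1 in mL)."""
    return {name: c1 * v1 / total_ml for name, (c1, v1) in stocks.items()}

buffer_2x = {
    "glycerol (%)": (50.0, 3.6),
    "HEPES pH 7.9 (mM)": (1000.0, 0.360),
    "Tris-HCl pH 8.3 (mM)": (1000.0, 0.120),
    "EDTA pH 8.0 (mM)": (500.0, 0.060),
    "DTT (mM)": (100.0, 0.150),
}
conc = final_concentrations(buffer_2x, 15.0)
for name, c2 in conc.items():
    print(f"{name}: {c2:g} in 2x buffer, {c2 / 2:g} in the 20 uL reaction")
```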

For a typical experiment, each probe was run in a well without any cell extract, the cell extract was run in a well without any probe, and one well containing only buffer, BSA, poly dIdC, and water served as a negative control. For competition experiments, the negative control, probe-only, and extract-only wells were omitted to leave more wells for different ratios of unlabeled:labeled oligos.

EMSAs were run on Mini-PROTEAN 5% TBE precast gels (BioRad).

Each well received one experimental or control sample, and samples were prepared in separate tubes. Each sample contained 10 µL of 2x EMSA buffer, 0.3 µL of 10 mg/mL BSA (New England Biolabs), and 1 µg poly dIdC (Roche). Depending on the conditions for a particular sample, 5.0 µL of 1 mg/mL protein extract, 1.0 µL of 4,000 CPM/µL radiolabeled DNA probe, and/or unlabeled DNA competitor were added, and DEPC-treated water was added to bring the total volume per sample to 20 µL. Components were added in this order, as needed: DEPC-treated water, 2x buffer, BSA, poly dIdC, cell extract, and unlabeled competitor. Before the radiolabeled probe was added, the sample was incubated on ice for 10 minutes. After the probe was added, the samples were incubated on ice for an additional 20 minutes.

While the samples were incubating, the Mini-PROTEAN 5% TBE precast gels were removed from their packaging, the paper strip was removed from the bottom of each gel, and the gel was placed in the electrophoresis chamber so that the side with the shorter plastic edge faced the middle of the chamber. If only one gel was being run, a dummy plastic gel cassette was placed in the other side of the chamber. The middle of the chamber, between the two gels, was filled with 1x TBE (BioRad), and the outer portion of the chamber was filled to about 1/3 full. The plastic combs were removed from the gels, and each well was washed by pipetting 200 µL of TBE into it. The gels were then pre-run, before samples were added, at 150 V for 15 minutes.

To ensure even loading, 19 µL (of the 20 µL prepared) of each sample was loaded into the appropriate well. Pipetting was done slowly, with the tip aimed toward the side of the well. The gel was run at 150 V for 20-30 minutes, until the blue loading dye was ~2/3 of the way down the gel. While the gel was running, two pieces of cellophane (BioRad) were soaked in a separate plastic bin filled with DI water to prepare for drying the gel. After the run, the gel was removed from the chamber and placed into a plastic bowl with some 1x TBE. The gel was dried with the GelAir Dryer system (BioRad). The plastic square frame was placed on top of the raised square assembly table and sprinkled with DI water.

One sheet of pre-soaked cellophane was placed onto the assembly table and frame. The cellophane was smoothed to ensure that there were no air bubbles between the cellophane and the assembly table, and water was sprinkled onto the cellophane so it did not dry out. The gel was removed from the plastic cassette with the provided green lever tool. Once the gel was attached to only one side of the cassette, a horizontal notch was cut into the gel with a razor blade to demarcate well #1. The edges of the gel were also trimmed with the razor blade. The gel was then placed onto the sheet of cellophane and released from the second side of the cassette. This was facilitated by first splashing the gel and cassette up and down in the plastic bowl with 1x TBE until one edge of the gel became detached from the cassette. The gel was then pressed firmly onto the cellophane, and the cassette was pulled away starting at the detached edge. Next, the second piece of cellophane was placed on top of the first piece and the gel. Again, the cellophane was lowered slowly onto the assembly table so that no air bubbles formed between the layers of cellophane; any bubbles were pushed out toward the edges. Finally, the metal frame was placed on top of the plastic frame and clipped together with two green clips per side. The frame was then placed into the GelAir Dryer with no heat for ~3 hours.

When the gel was completely dry (the cellophane is taut and the gel is so thin that it can hardly be distinguished from the two pieces of cellophane), it was cut out of the frame with ~0.5 inches of cellophane border on each edge. The gel was then taped onto a piece of cardboard that fits into the lead-lined cassettes. Using the Glow Writer marker, the tape was marked to label the gel and demarcate well #1. The gel taped to cardboard was then placed into a lead-lined cassette with an amplifier and BioMax MR film (VWR) in the dark room. The film was exposed in the lead-lined cassette at -80°C for several hours to several days, depending on the intensity of the signal.

To develop the film, the lead-lined cassette was taken back into the dark room. The film was removed from the cassette and air dried for 30 seconds before it was loaded onto the tray of the X-OMAT 2000 developer (Kodak); the button on the side of the developer, next to the loading tray, must be pressed before the film is fed into the machine. The gel was re-exposed to new film if necessary.

4.2.4 Protein Microarray Manufacturing

The protein microarray was developed by our colleague Dr. Heng Zhu, and manufacturing was conducted by Dr. Qifeng Song of the Zhu laboratory. Protein microarrays were manufactured from human proteins purified from yeast as GST fusions, as described previously (Hu et al., 2011; Hu et al., 2013). The protein microarray included 4,215 nuclear proteins and transcription factors printed in duplicate on a glass slide.

Microarrays were stored in a -80°C freezer, and each microarray was stamped with a bar code for identification.

4.2.5 Fluorescently Labeled Oligo Preparation

Probes were designed from the same 21-nt sequence centered on the SNP of interest as for the EMSA (section 4.2.2 Radiolabeled Oligo Preparation), with a modified T7B reverse primer sequence (5' CCCTATAGTGAGTgcTATTA 3') added to the 3' end (Table 6). This design allowed us to make many double-stranded probes with the same Cy5-labeled T7B forward primer. The SNP-based and T7B forward oligos were ordered from IDT. Two T7B forward oligos were ordered: one with a Cy5 label to produce fluorescent probes, and one without a label to make unlabeled competitors.
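One way to read this design, stated as an assumption: because the same tail sits at the 3' end of every SNP oligo, a single primer complementary to that tail can anneal to every template and be extended across the SNP to make the second strand. The sketch below builds such a template and derives that primer as the reverse complement of the tail. The T7B forward oligo sequence itself is not given in this section, so the derived sequence is illustrative only. The rs4765905_C forward sequence from Table 5 serves only as an example input, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement, preserving case (lower case marks the modified bases)."""
    return seq.translate(COMPLEMENT)[::-1]

T7B_TAIL = "CCCTATAGTGAGTgcTATTA"  # modified T7B sequence given in the text

def microarray_template(snp_oligo):
    """SNP-based oligo with the modified T7B sequence appended at its 3' end."""
    return snp_oligo + T7B_TAIL

# Assumption: a shared primer equal to the reverse complement of the tail
# anneals to the 3' end of every template and primes second-strand synthesis.
forward_primer = reverse_complement(T7B_TAIL)
template = microarray_template("GCAATCTTCTCTTGGGGTCTG")
print(template)
print(forward_primer)
```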

To make the double-stranded Cy5-labeled probes, 14.2 µL of water, 2.0 µL of 10x PCR buffer, 0.8 µL of 10mM dNTPs, 0.8 µL of 50mM MgCl2, 0.2 µL of Taq, 1.0 µL of 100 µM Cy5-labeled T7B forward oligo, and 1.0 µL of 100 µM SNP-based oligo were mixed. An excess of unlabeled competitor was made by mixing 284 µL of water, 40 µL of 10x PCR buffer, 16 µL of 10mM dNTPs, 16 µL of 50mM MgCl2, 4 µL of Taq, 20.0 µL of 100 µM unlabeled T7B forward oligo, and 20.0 µL of 100 µM SNP-based oligo.

Both the labeled and the unlabeled reactions were run for 5 cycles of 55°C for 5 minutes followed by 72°C for 10 minutes.
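The unlabeled-competitor mix above is the 20 µL labeled recipe scaled 20-fold, to 400 µL. A small sketch of that scaling follows. The dictionary keys are just labels, and the helper names are hypothetical. It also shows that each 100 µM oligo ends up at 5 µM in the reaction, by C2 = C1·V1/V2.

```python
LABELED_MIX_UL = {
    "water": 14.2,
    "10x PCR buffer": 2.0,
    "10 mM dNTPs": 0.8,
    "50 mM MgCl2": 0.8,
    "Taq": 0.2,
    "100 uM T7B forward oligo": 1.0,
    "100 uM SNP-based oligo": 1.0,
}

def scale_mix(mix, factor):
    """Scale every component volume by the same factor (rounded to 0.001 uL)."""
    return {name: round(vol * factor, 3) for name, vol in mix.items()}

def final_conc(stock_conc, component_ul, total_ul):
    """C2 = C1 * V1 / V2."""
    return stock_conc * component_ul / total_ul

unlabeled = scale_mix(LABELED_MIX_UL, 20)
labeled_total = sum(LABELED_MIX_UL.values())
oligo_uM = final_conc(100.0, LABELED_MIX_UL["100 uM SNP-based oligo"], labeled_total)
print(unlabeled)
print(f"labeled total {labeled_total:.1f} uL; each oligo at {oligo_uM:.1f} uM")
```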

To purify the double-stranded oligos, 1/10 volume of 3M sodium acetate, pH 5.2, was added and mixed well. Then 2 volumes of cold 100% ethanol were added and mixed well. The solution was stored at -20°C overnight. The next day, the solution was spun at 13k rpm for 10-15 minutes and the supernatant was decanted. The tube was filled with 70% ethanol, spun briefly, and the supernatant was again decanted. The pellet was dried in the SpeedVac (Savant) and resuspended in water (20 µL for labeled and 400 µL for unlabeled oligos).

The quality of the labeled and unlabeled oligos was evaluated by running a 1:10 dilution of the double-stranded oligos next to the single-stranded oligos on a 2% agarose, 1% NuSieve gel with a low molecular weight ladder at 100 V for ~30 minutes. The Cy5-labeled oligos should run slightly larger than the unlabeled double-stranded oligos, which in turn should run slightly larger than the single-stranded oligos.

4.2.6 Protein Microarray

Four protein microarray hybridizations were performed for each SNP of interest: two technical replicates for each allele. The Cy5-labeled allele was hybridized at a 1:10 ratio with the unlabeled reciprocal allele as a competitor to ensure sequence specificity. The 2x base buffer was made in house and composed of 50mM HEPES pH 8.0, 100mM L-glutamic acid potassium salt monohydrate, 0.2% Triton-X, 16mM magnesium acetate tetrahydrate, and 20% glycerol.

First, the microarray was blocked with the unlabeled competitor to quench any non-specific DNA-binding proteins. For two technical replicates, 1600 µL of blocking solution was prepared: 800 µL of 2x base buffer, 4.8 µL of 1.0M DTT, 400 nM of the unlabeled competitor, and water to a final volume of 1600 µL. Each protein microarray slide was blocked with 750 µL of blocking solution in the microarray hybridization chamber (Agilent), according to the manufacturer's recommendations, for 4 hours at 4°C in the dark with agitation (Nutator). After the 4-hour incubation, the microarray slide was placed in a humidified chamber, and the gasket slide was removed from the microarray hybridization chamber, spun for 2 minutes at 2,000 rpm in a 50 mL conical tube, and cleaned with 70% EtOH before hybridization.

The hybridization solution was prepared from 800 µL of 2x base buffer, 4.8 µL of 1.0M DTT, 400 nM of the unlabeled competitor, 40 nM Cy5-labeled probe, and water to a final volume of 1600 µL. Note that the unlabeled competitor and the Cy5-labeled probe are reciprocal alleles of the same SNP. Each microarray slide was hybridized with 750 µL of hybridization solution in the microarray hybridization chamber overnight at 4°C in the dark with agitation (Nutator).
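Because the competitor and probe are specified as final concentrations, the volume of each stock to add follows from V1 = C2·V2/C1. In the sketch below, the oligo stock concentrations are hypothetical placeholders, since they are not given in the text. The remaining numbers come from the recipe above: 4.8 µL of 1.0 M DTT in 1600 µL gives 3 mM, and 400 nM competitor against 40 nM probe gives the 10:1 ratio. Setting the probe volume to zero gives the blocking solution.

```python
def stock_volume_ul(final_nM, total_ul, stock_nM):
    """V1 = C2 * V2 / C1."""
    return final_nM * total_ul / stock_nM

TOTAL_UL = 1600.0
base_2x_ul = 800.0                          # 2x base buffer -> 1x
dtt_ul = 4.8                                # of 1.0 M (1000 mM) DTT
dtt_final_mM = 1000.0 * dtt_ul / TOTAL_UL

# Hypothetical stock concentrations of the purified oligos (not from the text).
competitor_stock_nM = 20_000.0
probe_stock_nM = 5_000.0
competitor_ul = stock_volume_ul(400.0, TOTAL_UL, competitor_stock_nM)
probe_ul = stock_volume_ul(40.0, TOTAL_UL, probe_stock_nM)
water_ul = TOTAL_UL - base_2x_ul - dtt_ul - competitor_ul - probe_ul

print(f"DTT {dtt_final_mM:.1f} mM; competitor {competitor_ul:.1f} uL; "
      f"probe {probe_ul:.1f} uL; water {water_ul:.1f} uL")
```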

After the overnight hybridization, the microarray slide was washed with 4 mL 1x base buffer in a 4-well Nunc rectangular dish (ThermoFisher) for 2 minutes at 4°C in the dark on an orbital shaker. Excess wash buffer was dabbed off the edges of the microarray, and the slide was spun in a micro slide box at 4°C for 2 minutes at 2,000 rpm.

Generally, the microarray slides were imaged immediately after washing, but in some cases they had to be stored at -20°C until they could be imaged. The microarray slides were imaged with the GenePix4000B scanner and GenePixPro software at 635 nm, PMT gain of 650, power 100%, and pixel size of 10 µm.

To clean the reusable materials between microarray experiments, the gasket slides from the microarray hybridization chamber were put into 50 mL conical tubes with 1 mL 5M sodium hydroxide and 49 mL DI water and shaken for 20 minutes on an orbital shaker. This was repeated 3 more times. The gasket slides were then spun dry in empty 50 mL conical tubes at 2,000 rpm for 5 minutes. The metal components of the microarray hybridization chamber were separated and put into an empty p1000 tip box, which was filled with DI water and shaken on an orbital shaker for 20 minutes. The components were then rinsed and allowed to air dry. The Nunc dish from the wash step was cleaned by filling all 4 wells with DI water and shaking it on an orbital shaker for 20 minutes; it was then rinsed with fresh DI water and allowed to air dry.

4.2.7 Protein Microarray Analysis

To analyze the protein microarray, a settings file mapping the locations of proteins on the microarray first had to be manually aligned to each microarray image in the GenePixPro software. The brightness and contrast of the whole microarray slide were readjusted to optimize the visibility of signals for analysis of each 16x16 grid on the slide. The size of each protein spot in the settings file was adjusted to cover as much of the signal observed in the image as possible while minimizing inclusion of background. Because the proteins are printed in duplicate in adjacent spots, a positive signal looked like two bright red circles next to each other. Spots with positive signals were marked as such. If a spot looked like two vertical rectangles rather than a single circle, it had been printed poorly and was not marked positive. If the spot appeared the same red intensity as the background, or appeared black, it was not a positive signal. If a scratch or other artifact created an intense red signal overlapping a protein spot, the spot was marked as bad to prevent false positives.

After the settings file was aligned to each microarray image, the GenePixPro software was used to calculate foreground and local background signal intensities for each protein spot. The foreground signal was calculated from the manually adjusted spot size in the settings file, and the background signal was calculated from a concentric ring of the same area around the spot.
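GenePixPro performs this computation internally. The sketch below illustrates only the geometry of an equal-area background ring, π(R² − r²) = πr², so R = r√2, and a foreground-minus-background signal on a hypothetical pixel patch. It is not GenePixPro's algorithm. The use of medians rather than means is an assumption of the sketch, and the function names are hypothetical.

```python
import math
import statistics

def background_ring(spot_radius):
    """Inner and outer radii of a concentric ring with the same area as the spot.

    pi * (R**2 - r**2) == pi * r**2  implies  R = r * sqrt(2).
    """
    return spot_radius, spot_radius * math.sqrt(2)

def spot_signals(image, cx, cy, radius):
    """Median foreground, median local background, and their difference.

    image is a list of rows of pixel intensities; (cx, cy) is the spot centre.
    """
    inner, outer = background_ring(radius)
    foreground, background = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d = math.hypot(x - cx, y - cy)
            if d <= inner:
                foreground.append(value)
            elif d <= outer:
                background.append(value)
    fg = statistics.median(foreground)
    bg = statistics.median(background)
    return fg, bg, fg - bg

# Hypothetical 21 x 21 pixel patch: a bright spot of radius 3 on a dim background.
patch = [[1000 if math.hypot(x - 10, y - 10) <= 3 else 100 for x in range(21)]
         for y in range(21)]
print(spot_signals(patch, 10, 10, 3))
```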

To quantify positive signals, a Z-score was calculated for the signal intensity at each spot on the microarray, and these Z-scores were visualized in a density plot. Because each protein was printed in duplicate and two technical replicates were performed for each allele, a DNA-protein interaction was considered positive only if the average Z-score of all 4 spots was >4, a stringent threshold that minimizes false positives. The proteins binding each allele of a particular SNP were then categorized, based on their Z-scores, as binding both alleles, preferring one allele, or allele-specific. A protein was considered to bind both alleles of a SNP if the average Z-score for both alleles was >4 and the difference between the allele Z-scores was <2. A protein was considered to bind both alleles but show a preference for one if the average Z-score was >4 for both alleles but the difference between the allele Z-scores was >2. If only one allele exceeded the threshold of 4, the binding was considered allele-specific.
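The thresholds above translate into a small decision rule. In the sketch below, the Z-scores are taken relative to all spots on the array, which is an assumption because the text does not specify the reference population. Each allele's score is the mean over its 4 spots. A difference of exactly 2 is not covered by the text, and the sketch treats it as a preference. The example Z-scores and the function names are hypothetical.

```python
from statistics import mean, stdev

def z_scores(intensities):
    """Z-score of each spot relative to all spots on the array."""
    mu, sd = mean(intensities), stdev(intensities)
    return [(v - mu) / sd for v in intensities]

def allele_z(spot_zs):
    """Average Z over the 4 spots for one allele (2 duplicates x 2 replicates)."""
    return mean(spot_zs)

def classify(z_risk, z_nonrisk, threshold=4.0, margin=2.0):
    """Binding category for one protein from its averaged allele Z-scores."""
    risk_pos, nonrisk_pos = z_risk > threshold, z_nonrisk > threshold
    if risk_pos and nonrisk_pos:
        if abs(z_risk - z_nonrisk) < margin:
            return "binds both alleles"
        # A difference of exactly 2 is not specified; treated as a preference here.
        return "prefers risk allele" if z_risk > z_nonrisk else "prefers non-risk allele"
    if risk_pos:
        return "risk-allele specific"
    if nonrisk_pos:
        return "non-risk-allele specific"
    return "not bound"

# Hypothetical Z-scores for the 4 spots of one protein, per allele.
print(classify(allele_z([6.1, 5.8, 6.3, 5.9]), allele_z([5.2, 4.9, 5.5, 5.0])))
```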

4.3 Results

To further characterize allele-specific regulatory effects, we assayed nuclear protein interactions for both the non-risk and risk alleles of all 16 SNP-containing sequences by EMSA, using nuclear extracts from HEK293 and SK-N-SH cells. We found that all 32 sequences could bind nuclear proteins in vitro and that 13 of the 16 SNPs showed allele-specific differences in protein binding affinity (Figures 9 and 10). For these 13 SNPs, we reproduced the observation of protein binding and confirmed allele specificity by competing out each radiolabeled allele with the unlabeled reciprocal allele (Figure 11 and Table 7).

The variant rs4765905, for which we found consistent differential enhancer effects in the dual luciferase reporter assay in SK-N-SH cells, showed two shifted bands, presumably corresponding to two different protein binding partners. Compared with the non-risk allele (G), the risk allele (C) showed a stronger signal for the upper band (the larger protein) and a weaker signal for the lower band (Figure 9). After quantification using ImageJ (Abramoff et al., 2004), the ratio of top to bottom band intensity was 6.70 for the risk allele and 0.91 for the non-risk allele in HEK293 nuclear extracts, and 4.47 vs. 0.75 in SK-N-SH nuclear extracts.

To identify the specific proteins capable of binding the sequences that contained these SNPs, we utilized a microarray containing 4,215 transcription factor and nuclear proteins (Hu et al., 2013). All assayed sequences bound to one or more proteins and some proteins showed allele-dependent capacity to bind (Table 8).

Interestingly, we observed that some proteins bound several of the sequences. To exclude the possibility that these proteins bind DNA promiscuously or were artifacts of our microarray, we applied two filters. The first filter, addressing promiscuously binding proteins, was based on data we previously published, in which we tested 460 DNA motifs on an earlier version of the microarray containing 1,017 proteins (Hu et al., 2009; Xie et al., 2010). The second, addressing artifacts of the current microarray, was based on three negative control sequence motifs selected from those in the published experiment that did not bind our candidate proteins. After removing proteins that bound more than six of the 460 DNA motifs (>1.3%) or any of the three negative controls, five proteins binding between three and seven of the SNP sequences remained: PKNOX2, PRNP, EIF1AD, GADD45A, and ZKSCAN5 (Table 9). An additional 13 proteins did not bind our negative control sequences but were not present on the earlier version of the microarray, so we have no further data on their frequency of DNA binding.
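The two filters can be written as a short decision function. In this sketch, the protein names and hit counts are hypothetical placeholders, not results from this study, and the function name is illustrative. It shows that "more than six of 460 motifs" corresponds to the >1.3% cut-off. It also shows how proteins absent from the earlier array fall into a separate "no prior data" group, as with the 13 additional proteins described above.

```python
def filter_status(protein, motif_hits, control_hits, max_motif_hits=6):
    """Apply the promiscuity and negative-control filters to one protein.

    motif_hits: protein -> number of the 460 published motifs it bound; proteins
    absent from the earlier 1,017-protein array are missing from the dict.
    control_hits: protein -> number of the 3 negative-control sequences it bound.
    """
    if control_hits.get(protein, 0) > 0:
        return "removed: binds a negative control"
    if protein not in motif_hits:
        return "no prior data"
    if motif_hits[protein] > max_motif_hits:
        return "removed: promiscuous"
    return "kept"

print(f"cut-off: more than 6 of 460 motifs = {6 / 460:.1%}")

# Hypothetical proteins and counts, not results from this study.
motif_hits = {"PROT_A": 4, "PROT_B": 25, "PROT_D": 2}
control_hits = {"PROT_D": 1}
for p in ["PROT_A", "PROT_B", "PROT_C", "PROT_D"]:
    print(p, filter_status(p, motif_hits, control_hits))
```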

The presence of multiple protein-binding sequences in the EMSAs, together with the strong overlap of binding proteins identified on the microarray, may suggest that multiple SNPs participate in protein-mediated complexes to regulate the gene. This is consistent with previous reports of combinatorial haplotype effects, in which multiple variant sequences in LD act together to regulate gene expression (Corradin et al., 2014b).

4.4 Discussion

Our EMSA results suggest that many of the SNPs in the schizophrenia-associated LD block in intron 3 of CACNA1C differentially bind nuclear proteins. Although there are no data on how often this occurs for random sequences, this observation, together with our finding that some of the proteins on the microarray bind multiple variant sequences, raises the possibility that many of these SNPs participate in protein-mediated 3D interactions contributing to the regulation of CACNA1C.

Interestingly, what connects these SNPs is that they are in near-complete LD, located on two segregating haplotypes associated with different CACNA1C mRNA levels. Such a pattern could well be the result of balancing selection, if each regulatory haplotype gains an advantage when its frequency is reduced. A similar result, extending to many more disease-associated haplotypes, has been reported by Corradin et al. (2014b).

While all this remains speculative, our list of proteins provides a good starting point for further studies. The most interesting of these proteins is perhaps PKNOX2. It binds only ~1% of the sequences tested in our previous experiment and none of our three negative controls, yet it binds seven of the 16 SNP-based sequences, preferring the risk allele in five and the non-risk allele in one. Although PKNOX2 does not bind rs4765905, it may interact with other SNPs in the locus to fine-tune the regulatory effect of rs4765905. Interestingly, PKNOX2, a homeobox-containing gene highly expressed in the brain (Imoto et al., 2001), has previously been associated with substance abuse (Guo et al., 2013) and with formal thought disorder in schizophrenia (Wang et al., 2012).

4.5 Chapter 4 Tables

Table 5. EMSA probe sequences.

Allele         Forward Sequence        Reverse Sequence
rs1006737_A    TCAGCCCGAAATGTTTTCAGA   TCTGAAAACATTTCGGGCTGA
rs1006737_G    TCAGCCCGAAGTGTTTTCAGA   TCTGAAAACACTTCGGGCTGA
rs1024582_A    GTTCACGGGGAATCTTGTAAA   TTTACAAGATTCCCCGTGAAC
rs1024582_G    GTTCACGGGGGATCTTGTAAA   TTTACAAGATCCCCCGTGAAC
rs10744560_C   CCTCCAGTCTCACTCATCGTG   CACGATGAGTGAGACTGGAGG
rs10744560_T   CCTCCAGTCTTACTCATCGTG   CACGATGAGTAAGACTGGAGG
rs10774035_C   GAAGCAGAGCCGCTTTGAGAT   ATCTCAAAGCGGCTCTGCTTC
rs10774035_T   GAAGCAGAGCTGCTTTGAGAT   ATCTCAAAGCAGCTCTGCTTC
rs10774036_C   CCACCATAACCGCTCACCAAA   TTTGGTGAGCGGTTATGGTGG
rs10774036_T   CCACCATAACTGCTCACCAAA   TTTGGTGAGCAGTTATGGTGG
rs11062170_C   TTTTAGCGGTCTTACCAGGGC   GCCCTGGTAAGACCGCTAAAA
rs11062170_G   TTTTAGCGGTGTTACCAGGGC   GCCCTGGTAACACCGCTAAAA
rs12311439_A   GCAGAGCTGGAATCACTCAAA   TTTGAGTGATTCCAGCTCTGC
rs12311439_T   GCAGAGCTGGTATCACTCAAA   TTTGAGTGATACCAGCTCTGC
rs12315711_A   GCCTTTCTGCACCCTATGATG   CATCATAGGGTGCAGAAAGGC
rs12315711_T   GCCTTTCTGCTCCCTATGATG   CATCATAGGGAGCAGAAAGGC
rs2159100_C    TAAAAATATACGTTCAAGCAA   TTGCTTGAACGTATATTTTTA
rs2159100_T    TAAAAATATATGTTCAAGCAA   TTGCTTGAACATATATTTTTA
rs2370414_A    AGCTCCCCACACCCCGCCCTG   CAGGGCGGGGTGTGGGGAGCT
rs2370414_G    AGCTCCCCACGCCCCGCCCTG   CAGGGCGGGGCGTGGGGAGCT
rs34382810_A   CAGAACCATCATCTCCTGGTC   GACCAGGAGATGATGGTTCTG
rs34382810_C   CAGAACCATCCTCTCCTGGTC   GACCAGGAGAGGATGGTTCTG
rs4298967_A    GGGCATTAAAACATTTTAGTG   CACTAAAATGTTTTAATGCCC
rs4298967_G    GGGCATTAAAGCATTTTAGTG   CACTAAAATGCTTTAATGCCC
rs4765905_C    GCAATCTTCTCTTGGGGTCTG   CAGACCCCAAGAGAAGATTGC
rs4765905_G    GCAATCTTCTGTTGGGGTCTG   CAGACCCCAACAGAAGATTGC
rs758170_C     GTAAATAGTCCGCCTGAAAAA   TTTTTCAGGCGGACTATTTAC
rs758170_T     GTAAATAGTCTGCCTGAAAAA   TTTTTCAGGCAGACTATTTAC
rs769087_A     AACAAGGAAAAACTGAGAACT   AGTTCTCAGTTTTTCCTTGTT
rs769087_G     AACAAGGAAAGACTGAGAACT   AGTTCTCAGTCTTTCCTTGTT
rs7965923_A    AGTTCTCAGTATTTCCTTGTT   AACAAGGAAATACTGAGAACT
rs7965923_C    AGTTCTCAGTCTTTCCTTGTT   AACAAGGAAAGACTGAGAACT

Forward and reverse strands of each allele were annealed to create double-stranded probes.
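Because each reverse sequence in Table 5 is meant to pair with its forward sequence, a quick consistency check is that the two are exact reverse complements, giving a fully paired 21-bp duplex. A minimal sketch, assuming both strands are written 5' to 3' (which the check itself tests); the helper names are hypothetical and only two rows of the table are included:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def forms_blunt_duplex(forward: str, reverse: str) -> bool:
    """True if the reverse oligo is the exact reverse complement of the forward oligo."""
    return len(forward) == len(reverse) and reverse_complement(forward) == reverse


def variant_offsets(allele1: str, allele2: str) -> list:
    """0-based positions at which two allele probes differ."""
    return [i for i, (a, b) in enumerate(zip(allele1, allele2)) if a != b]


# Rows copied from Table 5.
probes = {
    "rs1006737_A": ("TCAGCCCGAAATGTTTTCAGA", "TCTGAAAACATTTCGGGCTGA"),
    "rs1006737_G": ("TCAGCCCGAAGTGTTTTCAGA", "TCTGAAAACACTTCGGGCTGA"),
}

for allele, (fwd, rev) in probes.items():
    print(allele, forms_blunt_duplex(fwd, rev))
print(variant_offsets(probes["rs1006737_A"][0], probes["rs1006737_G"][0]))
```

For rs1006737 the two alleles differ only at index 10, the central base of the 21-mer, so the variant sits in the middle of the duplex.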


Table 6. Protein microarray probe sequences.

Primer Name      Sequence
Cy5_T7B_For      TAATAGCACTCACTATAGGG
T7B_For          TAATAGCACTCACTATAGGG
rs1006737a_T7B   TCAGCCCGAAATGTTTTCAGACCCTATAGTGAGTGCTATTA
rs1006737g_T7B   TCAGCCCGAAGTGTTTTCAGACCCTATAGTGAGTGCTATTA
rs1024582a_T7B   GTTCACGGGGAATCTTGTAAACCCTATAGTGAGTGCTATTA
rs1024582g_T7B   GTTCACGGGGGATCTTGTAAACCCTATAGTGAGTGCTATTA
rs10744560c_T7B  CCTCCAGTCTCACTCATCGTGCCCTATAGTGAGTGCTATTA
rs10744560t_T7B  CCTCCAGTCTTACTCATCGTGCCCTATAGTGAGTGCTATTA
rs10774035c_T7B  GAAGCAGAGCCGCTTTGAGATCCCTATAGTGAGTGCTATTA
rs10774035t_T7B  GAAGCAGAGCTGCTTTGAGATCCCTATAGTGAGTGCTATTA
rs10774036c_T7B  CCACCATAACCGCTCACCAAACCCTATAGTGAGTGCTATTA
rs10774036t_T7B  CCACCATAACTGCTCACCAAACCCTATAGTGAGTGCTATTA
rs11062170c_T7B  TTTTAGCGGTCTTACCAGGGCCCCTATAGTGAGTGCTATTA
rs11062170g_T7B  TTTTAGCGGTGTTACCAGGGCCCCTATAGTGAGTGCTATTA
rs12311439a_T7B  GCAGAGCTGGAATCACTCAAACCCTATAGTGAGTGCTATTA
rs12311439t_T7B  GCAGAGCTGGTATCACTCAAACCCTATAGTGAGTGCTATTA
rs12315711a_T7B  GCCTTTCTGCACCCTATGATGCCCTATAGTGAGTGCTATTA
rs12315711t_T7B  GCCTTTCTGCTCCCTATGATGCCCTATAGTGAGTGCTATTA
rs2159100c_T7B   TAAAAATATACGTTCAAGCAACCCTATAGTGAGTGCTATTA
rs2159100t_T7B   TAAAAATATATGTTCAAGCAACCCTATAGTGAGTGCTATTA
rs2370414a_T7B   AGCTCCCCACACCCCGCCCTGCCCTATAGTGAGTGCTATTA
rs2370414g_T7B   AGCTCCCCACGCCCCGCCCTGCCCTATAGTGAGTGCTATTA
rs34382810a_T7B  CAGAACCATCATCTCCTGGTCCCCTATAGTGAGTGCTATTA
rs34382810c_T7B  CAGAACCATCCTCTCCTGGTCCCCTATAGTGAGTGCTATTA
rs4298967a_T7B   GGGCATTAAAACATTTTAGTGCCCTATAGTGAGTGCTATTA
rs4298967g_T7B   GGGCATTAAAGCATTTTAGTGCCCTATAGTGAGTGCTATTA
rs4765905c_T7B   GCAATCTTCTCTTGGGGTCTGCCCTATAGTGAGTGCTATTA
rs4765905g_T7B   GCAATCTTCTGTTGGGGTCTGCCCTATAGTGAGTGCTATTA
rs758170c_T7B    GTAAATAGTCCGCCTGAAAAACCCTATAGTGAGTGCTATTA
rs758170t_T7B    GTAAATAGTCTGCCTGAAAAACCCTATAGTGAGTGCTATTA
rs769087a_T7B    TCATGGGTTAAATATTTTTAACCCTATAGTGAGTGCTATTA
rs769087g_T7B    TCATGGGTTAGATATTTTTAACCCTATAGTGAGTGCTATTA
rs7965923a_T7B   AGTTCTCAGTATTTCCTTGTTCCCTATAGTGAGTGCTATTA
rs7965923c_T7B   AGTTCTCAGTCTTTCCTTGTTCCCTATAGTGAGTGCTATTA

Cy5_T7B_For and T7B_For were used with the variant primers to make labeled and unlabeled double-stranded probes, respectively; Cy5_T7B_For carried a Cy5 fluorophore.
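The variant primers in Table 6 share a common design: each is a 21-nt variant-containing sequence (for most SNPs, the forward sequence from Table 5) followed by CCCTATAGTGAGTGCTATTA, which is the reverse complement of T7B_For. T7B_For or Cy5_T7B_For can therefore pair with the 3' end of every variant primer. A minimal sketch of that design check; the function names are hypothetical, and because the text does not detail how the second strand was completed, the sketch checks only sequence complementarity:

```python
T7B_FOR = "TAATAGCACTCACTATAGGG"  # from Table 6; Cy5_T7B_For has the same sequence plus Cy5

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# The constant tail shared by every variant primer is the T7B_For binding site.
T7B_SITE = reverse_complement(T7B_FOR)


def variant_primer(probe: str) -> str:
    """Append the T7B_For binding site to a 21-nt variant sequence."""
    return probe + T7B_SITE


def has_t7b_site(oligo: str) -> bool:
    """True if T7B_For can pair perfectly with the 3' end of the oligo."""
    return oligo.endswith(T7B_SITE)


print(T7B_SITE)
print(variant_primer("TCAGCCCGAAATGTTTTCAGA"))
```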

Table 7. Summary of EMSA binding results.

SNP          Protein Binding   Allele Specificity   Competition Assay
rs1006737    +                 +                    +
rs1024582    +                 +                    +
rs10744560   +                 +                    +
rs10774035   +                 +                    +
rs10774036   +                 +                    +
rs11062170   +                 +                    +
rs12311439   +                 +                    +
rs12315711   +                 -                    n/a
rs2159100    +                 +                    +
rs2370414    +                 +                    +
rs34382810   +                 -                    n/a
rs4298967    +                 +                    +
rs4765905    +                 +                    +
rs758170     +                 +                    +
rs769087     +                 +                    +
rs7965923    +                 -                    n/a

All SNPs showed a shift indicative of protein binding (“+” under Protein Binding). SNPs rs34382810, rs7965923, and rs12315711 did not show allele-specific binding (“-” under Allele Specificity) and were therefore not tested with a competition EMSA (n/a under Competition Assay). Competition EMSAs for all other SNPs confirmed the allele specificity observed in the standard EMSA.

Table 8. Complete protein microarray results.

Columns: the 16 SNPs in LD with rs1006737, three negative control motifs, and N (number of the 460 oligos from Hu et al., 2009 and Xie et al., 2010 bound). Each row lists its non-empty cells in column order.

RBM12 Risk/Both Risk Both Risk/Both Risk Risk/Both Common Risk Common Risk Risk Binds 0

TERF1 Risk Risk/Both Risk Risk Common Common Risk Both Common Risk Common Binds 83

Bicc1 Risk Common Both Both Common Common Risk Both Common Risk Binds Binds Binds n/a

C9orf156 Both Risk Risk Risk Common Common Risk Common Risk Common Binds Binds Binds 9

EXOG Both Risk Risk Common Common Risk Common Risk Risk Common Binds Binds n/a

UNG Both Common Common/Both Common Risk Both Risk Common Risk Both 50

HIST1H1A Both Risk Risk Both Both Risk Risk Common Risk Binds Binds 2

GAPDH Common Common Common Risk/Both Risk Common Risk Common n/a

MGC17986 Both Common Both Common Risk Common/Both Common Risk Binds Binds Binds n/a

NPM2 Both Risk Both Risk/Both Common Common Common Risk Binds 61

PKNOX2 Both Common Risk Risk Risk Risk Risk 5

TAF15 Risk Common Common Risk Common Risk Common n/a

EIF4B Risk/Both Common/Both Risk/Both Common Risk Risk Binds 0

GAS7 Risk Both Both Common Risk Risk Binds Binds n/a

HR Both Common Common Risk Both Common Binds Binds Binds n/a

INHBB Common Risk Common/Both Risk Risk Risk Binds Binds n/a

MAX Risk Risk Common/Both Risk Risk Common 0

SMARCC1 Both Both Risk Common/Both Risk Common Binds 1

WHSC2 Both Risk/Both Risk Common Risk Risk Binds 6

DDX6 Common/Both Risk Risk Risk Common n/a

DLX4 Risk Common/Both Risk Risk Common n/a

BNC Risk Common Common Risk/Both n/a

CS Common Common Risk Common n/a

HNRNPK Common Risk/Both Risk Risk Binds Binds Binds n/a

LZTFL1 Risk Common Risk Risk 68

PCBP3 Risk Common Risk Risk n/a

PRNP Risk Common Common Common 3

PSPC1 Risk Common Risk Risk 0

ROD1 Risk Risk Common Risk n/a

TARDBP Common Common Risk Risk n/a

AHNAK Common/Both Risk Risk/Both n/a

BNIP3L Both Common/Both Risk/Both n/a

CBFA2T3 Risk Risk Risk 73

CPSF4 Risk Risk Common 19

EIF1AD Risk Both Risk n/a

GADD45A Both Common Both 6

HIST1H1C Risk/Both Common/Both Both n/a

PCBP1 Risk Both Both n/a

TCEAL6 Risk Common Risk 23

TIA1 Risk Common Risk 10

ZKSCAN5 Risk Common Risk 2

ZNF34 Both Risk Risk 59

ALDOA Common/Both Common n/a

CIAO1 Risk Common n/a

DAZAP1 Common Risk 27

HEAB Common Risk n/a

HMGB1 Risk Risk n/a

HMGN1 Common Common/Both 1

HNRNPC Risk Common n/a

KIAA0907 Risk Risk 16

NPM1 Risk/Both Risk 32

RBPMS Common Risk n/a

SMPX Risk Risk 9

SPG7 Risk Risk n/a

TRIM24 Risk Risk n/a

ZNF207 Common Risk 4


AAK1 Risk n/a

AI043120 Risk/Both n/a

ANXA11 Risk 10

ARC Risk n/a

BAD Risk 4

BCL6 Risk n/a

CBX5 Risk/Both 1

DDX21 Common/Both n/a

DNMT3A Common/Both 4

EEF1A2 Both n/a

HAGH Risk n/a

HIST1H2BO Risk 2

HMGN3 Risk n/a

HSPE1 Risk 71

IARS2 Risk n/a

JMJ Common/Both n/a

MAPK8 Risk n/a

MLX Risk 23

MSI1 Common 12

NFKB1 Common n/a

OAT Both n/a

OBFC1 Risk 0

PABPC1 Common 38

PAXIP1 Risk 9

PCBP2 Risk n/a

PCBP4 Risk n/a

PCNXL2 Both 1

PPIH Common n/a

PRM2 Risk n/a

RALY Risk 0

SCAPER Common n/a

SFRS1 Risk n/a

SFRS16 Risk n/a

SP100 Risk 1

SPR Risk 13

SSBP1 Risk n/a

SUB1 Risk/Both n/a

TOB2 Common 5

TP53 Risk/Both 1

TXNL4A Risk n/a

UBE2K Common Binds n/a

UQCRB Risk/Both 12

ZC3H15 Both n/a

ZCCHC10 Common Binds n/a

ZNF330 Risk 68

ZNF77 Risk n/a

ZNF830 Risk n/a

CBX1 Binds n/a

DRAP1 Binds n/a

ETNK2 Binds n/a

FAIM3 Binds Binds n/a

MRPS26 Binds n/a

NR4A1 Binds 5

PCCA Binds n/a

RIN3 Binds n/a

RPL11 Binds n/a

TMEM161A Binds n/a

ZNF187 Binds n/a

Blue fill: protein binds only the non-risk allele; red fill: protein binds only the risk allele; grey fill: protein binds both alleles; no fill: no binding.

Table 9. Summary of proteins binding multiple variants in the protein microarray.

Protein    N (of 460 oligos bound)
PKNOX2     5
PRNP       3
EIF1AD     1
GADD45A    6
ZKSCAN5    2

Proteins listed bind at least 3 of the 16 SNPs, none of the 3 control oligos, and 6 or fewer of the 460 oligos tested previously (Hu et al., 2009; Xie et al., 2010). Blue fill: protein binds only the non-risk allele; red fill: protein binds only the risk allele; grey fill: protein binds both alleles; no fill: no binding.

4.6 Chapter 4 Figures

Figure 9. EMSA for rs4765905 with HEK293 and SK-N-SH nuclear extracts.

Nuclear extracts plus buffer are run in lanes 1 and 2. Probes plus buffer are run in lanes 3-5. Probes are incubated with nuclear extract, as indicated above the lane numbers, in lanes 6-11. Lane 12 is buffer alone. The control allele is a positive control for the assay from an unrelated variant.

Figure 10. EMSAs for the remaining 15 variants in LD with rs1006737.

Figure 11 panels (lanes 1-12, SK-N-SH nuclear extract): competition EMSAs for rs2159100, rs1006737, rs769087, rs10774035, rs1024582, rs2370414, rs4298967, rs11062170, rs10744560, rs10774036, rs12311439, rs4765905, and rs758170.

Figure 11. Competition EMSAs.

In the competition EMSAs, the amount of unlabeled competitor allele added (ng of DNA) is shown above each well. Controls A and B are the A and G alleles of rs60827755 (NRG3), respectively. Control C is the G allele of rs4765905.

Chapter 5: Chromatin Conformation Capture

5.1 Introduction

Within a cell, chromatin is organized in a three-dimensional (3D) architecture that facilitates functional interactions between loci that are not necessarily proximal on the linear chromosome. This functional compartmentalization allows distal enhancers to work together with promoters to drive gene expression (Ulianov et al., 2015). Data suggest that enhancers and promoters form loop structures that allow the enhancer-bound complex of transcription factors, activators, and co-activators to interact with the promoter. In some cases, the pre-initiation complex may even form at the enhancer before it is transferred to the promoter to recruit RNA polymerase II and begin transcription. Additionally, disrupting the enhancer sequence has been observed to reduce transcription of the target gene (Vernimmen and Bickmore, 2015). Cis-interactions between enhancers and promoters are often lineage- or cell-type-specific, which allows different cell types to express the genes they require at the appropriate levels (Gorkin et al., 2014). 3D interactions between enhancers and promoters have been observed even when transcription is not actively occurring, suggesting that these structures are relatively stable within a cell type and may be captured in cellular assays (Danino et al., 2015).

We were particularly interested in determining whether the schizophrenia-associated variants tagged by rs1006737 in the third intron of CACNA1C interact with promoters, which would add further support to their role as an eQTL. The CACNA1C locus has recently been shown to also produce a carboxy-terminal peptide named CCAT, which is transcribed from the last exon (exon 47) of the CACNA1C gene. According to a follow-up report, its expression is driven by a separate promoter overlapping exon 46 of CACNA1C (Gomez-Ospina et al., 2013). Interestingly, CCAT is expressed in the developing rat brain and has been shown to repress CACNA1C expression (Gomez-Ospina et al., 2013). It is therefore possible that the differences we observe in post mortem tissue between the alleles of the schizophrenia-associated variants are driven by changes in the expression of CCAT rather than of CACNA1C itself. Given this close relationship between CCAT and CACNA1C, understanding the enhancer-promoter interactions of both genes will be informative for the overall regulation of CACNA1C expression.

Dual luciferase reporter assays provide useful information on the regulatory potential of DNA sequences, and protein binding assays can enrich this information by identifying the specific mediators of the regulatory effect. However, neither provides much information on the target of this regulation. Linking a regulatory sequence to its specific target is important if we are to understand the connection between the variants and schizophrenia. Circular chromatin conformation capture with next-generation sequencing (4C-seq) captures the three-dimensional structure of chromatin and allows identification of DNA sequences that physically interact with a chosen region, which are candidate regulatory elements.

In 4C-seq, DNA-protein complexes are cross-linked, the DNA is digested into short fragments that remain bound to the associated proteins, the fragments are ligated together, and the crosslinking is reversed. Primers are designed for specific fragments of interest, called viewpoints, and are used to amplify and sequence the unknown fragments ligated to the viewpoint. Using next-generation sequencing, these interacting fragments can be mapped back to the genome and quantified to determine which interactions are most frequent (Göndör et al., 2008; Splinter et al., 2012). 4C-seq allows the identification of previously unsuspected interactions, which may prove particularly relevant to this project, as promoters have been shown to often interact with multiple enhancers (Gorkin et al., 2014).

5.2 Methods

5.2.1 Experimental Design

To safeguard against false negatives due to cell-type-specific interactions, we used two human cell lines, SK-N-SH and HEK293. Additionally, we used three different primary restriction enzymes for the CACNA1C viewpoint so that the amplicons in each experiment would differ. Each experiment was performed in duplicate. We then applied quality controls requiring at least one million reads from each experiment, more than 80% of which originated from the viewpoint and more than a quarter of which mapped uniquely back to the genome. This resulted in discarding some technical replicates.
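The quality-control thresholds above can be expressed as a single check per experiment. A minimal sketch, assuming read counts have already been tallied per barcode and that both fractions are computed relative to the experiment's total reads (the text does not state the denominator for the uniquely mapped fraction); the record type and example numbers are hypothetical:

```python
from dataclasses import dataclass

MIN_READS = 1_000_000
MIN_VIEWPOINT_FRACTION = 0.80
MIN_UNIQUE_FRACTION = 0.25


@dataclass
class ExperimentReads:
    """Hypothetical read tallies for one barcoded 4C-seq experiment."""
    name: str
    total: int           # all reads assigned to this barcode
    from_viewpoint: int  # reads that begin with the expected viewpoint sequence
    unique: int          # reads mapping uniquely to the genome


def passes_qc(e: ExperimentReads) -> bool:
    """Apply the read-count, viewpoint-fraction, and unique-mapping thresholds."""
    if e.total < MIN_READS:
        return False
    return (e.from_viewpoint / e.total > MIN_VIEWPOINT_FRACTION
            and e.unique / e.total > MIN_UNIQUE_FRACTION)


runs = [
    ExperimentReads("rep1", total=1_500_000, from_viewpoint=1_300_000, unique=500_000),
    ExperimentReads("rep2", total=900_000, from_viewpoint=850_000, unique=400_000),
    ExperimentReads("rep3", total=2_000_000, from_viewpoint=1_500_000, unique=900_000),
]
kept = [e.name for e in runs if passes_qc(e)]
print(kept)
```

In this toy example, rep2 fails on total read count and rep3 fails because only 75% of its reads come from the viewpoint.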

5.2.2 Crosslinking

The following methods are largely based on the protocol published by Splinter et al. (2012), with minor changes. HEK293 and SK-N-SH cells were cultured as described in the Chapter 3 Methods (section 3.2.1 Cell Culture).

Each experimental sample required 10^7 cells. To crosslink, cells were trypsinized, resuspended in medium, and transferred to a 250 mL bottle. Cells were counted with a Beckman Coulter cell counter, and 10^7 cells were pipetted into 15 mL conical tubes. Tubes were centrifuged at 1,000 rpm for 4 minutes and the supernatant was aspirated. Cells were resuspended in 5 mL PBS + 10% FBS, and then PBS + 10% FBS + formaldehyde solution was added for crosslinking. The standard formaldehyde concentration was 4%, but we also tried variable crosslinking conditions, including 2% formaldehyde; experiments with non-standard crosslinking conditions are indicated as such. After formaldehyde was added, cells were tumbled on an end-over-end rotator for 10 minutes at room temperature. Non-standard crosslinking times of 5 minutes of tumbling were also used, and these samples were labeled accordingly. To quench the formaldehyde, 1.425 mL 1M glycine was added and the cells were immediately put on ice. Cells were then spun at 400 rcf at 4° for 8 minutes and the supernatant was aspirated. Cells were resuspended in 1 mL cold lysis buffer (as described in Splinter et al., 2012) and incubated on ice for 10 minutes. Lysis efficiency was checked with trypan blue staining. The lysed cells were centrifuged at 750 rcf at 4° for 5 minutes and the supernatant was pipetted off. The lysate was resuspended in 200 µL lysis buffer and transferred to a safe-lock freezer tube. The freezer tubes were placed into 12 mL inoculation tubes so they could be spun at 540 rcf at 4° for 2 minutes. The supernatant was pipetted off and the lysate was stored at -80° until template preparation.

5.2.3 Template Preparation

The lysate pellet was resuspended in 450 µL water and transferred to a microcentrifuge tube, and 60 µL of buffer for restriction enzyme #1 (RE1) was added. The tube was incubated at 37° and 15 µL 10% SDS was added. The tube was then shaken at 900 rpm for 1 hour at 37°. Next, 75 µL 20% Triton X was added and the tube was shaken at 900 rpm for an additional hour at 37°. A 5 µL aliquot was taken from the tube and stored at 4° as an undigested control. Then, 200 units of RE1 were added and the sample was shaken at 900 rpm for 4 hours at the active temperature of RE1 (generally 37°). An additional 200 units of RE1 were added and the sample was shaken at 900 rpm overnight at 37°. Another 200 units of RE1 were added and the sample was shaken at 900 rpm for 4 hours at 37°. A 5 µL aliquot was removed and stored at 4° as a digested control.

Digestion efficiency was determined by adding 90 µL 10 mM Tris-HCl pH 7.5 and 5 µL 10 mg/mL Proteinase K (New England Biolabs) to each control aliquot (undigested and digested) and incubating for 4 hours at 65°. Then, 100 µL phenol-chloroform was added and mixed vigorously. The control aliquots were spun at 16,400 g for 10 minutes at room temperature, and 20 µL of each resulting aqueous phase was run on a 0.6% agarose gel. Undigested controls should show a very large, high-molecular-weight band, whereas digested controls should show a smear.

Where possible, RE1 was inactivated by heat treatment for 20 minutes at 65° or 80°, depending on the restriction enzyme. If RE1 could not be heat inactivated, 80 µL 10% SDS was added and the sample was incubated for 30 minutes at 65°. The sample was then transferred to a 50 mL conical tube, and 5.4 mL water, 700 µL 10x T4 ligase buffer, and 375 µL 20% Triton X were added and incubated for 1 hour at 37°. For these samples, ligation then continued by adding only T4 ligase.

To ligate the digested ends of chromatin together, the sample was transferred to a 50 mL Falcon tube, and 5.7 mL water, 700 µL ligase buffer, and 50 units T4 ligase (New England Biolabs) were added. The sample was mixed by swirling and incubated overnight at 16°. Then, a 100 µL aliquot was removed as a ligation control.

To test ligation efficiency, 5 µL Proteinase K was added to the ligation control aliquot, which was incubated for 4 hours at 65°. Then, 100 µL phenol-chloroform was added and mixed vigorously. The control was spun at 13,000 g for 10 minutes at room temperature, and 20 µL of the resulting aqueous phase was run on a 0.6% agarose gel alongside 20 µL each of the remaining undigested and digested controls from the previous check. The ligated aliquot should be a smear similar to the digested aliquot, but shifted slightly toward higher molecular weight.

To reverse the crosslinking, 30 µL Proteinase K was added to the digested and ligated sample, which was incubated overnight at 65°. The next morning, 30 µL 10 mg/mL RNase A (Thermo Scientific) was added and the sample was incubated for 45 minutes at 37°.

Then, 7 mL phenol-chloroform was added and mixed vigorously. Samples were spun at 3220 g for 15 minutes at room temperature. The aqueous phase was transferred to a new 50 mL Falcon tube, and 7 mL water, 1.5 mL 2M NaAc pH 5.6, 7 µL 20 mg/mL glycogen (Affymetrix), and 35 mL 100% EtOH were added. The sample was mixed and incubated at -80° until frozen solid (~5 hours), then spun at 3220 g for 20 minutes at 4°. Note that the original protocol specifies 8346 g, but the tubes cracked at that speed, resulting in substantial sample loss. The supernatant was removed and 10 mL cold 70% EtOH was added. The sample was spun at 3220 g for 15 minutes at 4°. The supernatant was removed and the pellet was briefly air dried at room temperature. The pellet was then dissolved in 150 µL 10 mM Tris-HCl pH 7.5 at 37° for ~30 minutes, and the sample was stored at -20° until the second round of digestion and ligation.

For the second digestion, the sample was removed from the freezer, and 50 µL restriction enzyme 2 (RE2) buffer, 50 units RE2, and water to 500 µL were added. The sample was incubated overnight at the active temperature of RE2 (generally 37°). A 5 µL aliquot was then removed as a digested control.

Digestion efficiency of the second digestion was determined by adding 95 µL 10 mM Tris-HCl pH 7.5 to the control aliquot and running 20 µL on a 0.6% agarose gel alongside 20 µL of the previous ligation control. The digested aliquot should appear as a smear of lower molecular weight than the ligated control.

In the sample, RE2 was heat inactivated at 65° or 80°, depending on the restriction enzyme, for 25 minutes. If RE2 could not be heat inactivated, 500 µL phenol-chloroform was added and mixed vigorously. The sample was spun at 13,000 g for 10 minutes at room temperature. The aqueous layer was then transferred to a fresh tube, and 50 µL 2M NaAc pH 5.6 and 950 µL 100% EtOH were added. The sample was incubated at -80° until frozen solid, then spun for 20 minutes at 13,000 g at 4°, and the supernatant was removed. Next, 150 µL cold 70% EtOH was added, the sample was spun again for 10 minutes at 13,000 g at 4°, and the supernatant was removed. The pellet was resuspended in 500 µL 10 mM Tris-HCl pH 7.5 by vortexing and pipetting up and down.

For the second ligation, the sample was transferred to a 50 mL Falcon tube, and 12.1 mL water, 1.4 mL 10x ligation buffer, and 100 units T4 DNA ligase were added. The sample was incubated at 16° overnight.

To purify the sample, 700 µL 2M NaAc pH 5.6, 7 µL glycogen, and 35 mL 100% EtOH were added and mixed well. The sample was stored at -80° until frozen solid, then spun at 3300 g for 45 minutes at 4°. Again, the original protocol specifies 8346 g, but the tubes cracked at that speed, resulting in substantial sample loss. The supernatant was removed and 10 mL 70% EtOH was added. The sample was spun for 15 minutes at 3300 g at 4° and the supernatant was removed. The pellet was briefly air dried at room temperature and dissolved in 150 µL 10 mM Tris-HCl pH 7.5 at 37° for ~30 minutes.

Each sample was then divided into three 50 µL aliquots, and each aliquot was purified with the QIAquick PCR Purification Kit (Qiagen) according to the manufacturer's protocol. The eluates from the three aliquots of a sample were pooled back together, and the DNA concentration was measured on a NanoDrop. This DNA was the 4C template, and it was stored at -20°.

5.2.4 Primer Design and Optimization

In order to sequence the unknown interacting fragments bound to the viewpoint fragment, primers were designed on the edges of the viewpoint fragment facing outward.

The sequencing primer was designed within 50 bp of the RE1 cut site, and the reverse primer was designed near the RE2 cut site. Primers could not contain any variants and had to map uniquely to the genome. Generally, 2-5 primer pairs were designed per viewpoint and ordered from IDT.
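Two of the placement rules, distance to the RE1 site and absence of variants, can be checked directly against the viewpoint fragment sequence. A minimal sketch, assuming the fragment is written so that the sequencing primer reads toward the RE1 site, that variant positions are given as 0-based offsets within the fragment, and that the RE1 recognition site is a plain string (GATC here, a hypothetical choice). Genome-wide uniqueness would still require an aligner and is not checked; all sequences below are hypothetical:

```python
from typing import Iterable


def primer_to_site_distance(fragment: str, primer: str, site: str) -> int:
    """Number of bases between the primer's 3' end and the next occurrence of `site`."""
    start = fragment.find(primer)
    if start < 0:
        raise ValueError("primer not found in fragment")
    primer_end = start + len(primer)
    site_pos = fragment.find(site, primer_end)
    if site_pos < 0:
        raise ValueError("restriction site not found downstream of primer")
    return site_pos - primer_end


def overlaps_variant(fragment: str, primer: str, variant_offsets: Iterable[int]) -> bool:
    """True if any variant offset falls inside the primer's footprint."""
    start = fragment.find(primer)
    if start < 0:
        raise ValueError("primer not found in fragment")
    return any(start <= v < start + len(primer) for v in variant_offsets)


def acceptable(fragment: str, primer: str, site: str,
               variant_offsets: Iterable[int], max_distance: int = 50) -> bool:
    """Primer lies within max_distance of the site and covers no known variant."""
    return (primer_to_site_distance(fragment, primer, site) <= max_distance
            and not overlaps_variant(fragment, primer, list(variant_offsets)))


# Hypothetical toy fragment: 4 nt, a 20-nt primer, a 30-nt spacer, then a GATC site.
primer = "ACGTTGCAAGCTTCGGTCCA"
fragment = "TTTT" + primer + "A" * 30 + "GATC" + "TTTT"
print(primer_to_site_distance(fragment, primer, "GATC"))
print(acceptable(fragment, primer, "GATC", variant_offsets=[60]))
```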

Many primer combinations were tested with the following protocol:

2.5 µL 10x PCR Buffer 1 (Expand Long Template Polymerase kit)

0.5 µL 10 mM dNTPs

0.35 µL 100 µM sequencing primer (forward)

0.35 µL 100 µM reverse primer

0.35 µL Expand Long Template Polymerase

100 ng template DNA

Water to 25 µL

Thermocycler:

94° 2 min

30 cycles of: 94° 10 sec, 55° 1 min, 68° 3 min

68° 5 min

4° hold

The resulting PCR products were run on a 1.5% agarose gel. The optimal product was one to three distinct bands over a background smear. More than three bands indicated a lack of primer specificity, and the absence of a distinct band resulted in poor sequencing quality.

For primers that worked well, Illumina sequencing adapter and barcode sequences were added to the primers, which were ordered from IDT at 200 nmol with HPLC purification (Table 10). Forward primers had the Illumina sequencing adapter (5’ aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatct 3’) and a 12-nt barcode added to the 5’ end. Reverse primers had only the Illumina adapter (5’ caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatc 3’) added to the 5’ end. The Illumina adapter was required for the amplicons to bind to the Illumina MiSeq flow cell and form bridges, and for the Illumina sequencing primer to bind the amplicon and begin the sequencing-by-synthesis reaction.
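The tailed primers can be assembled programmatically from their parts. A minimal sketch, assuming the 5' to 3' order adapter, barcode, gene-specific primer for forward primers (the text states only that both were added to the 5' end). The adapter sequences are copied from the paragraph above; the barcode and gene-specific primers in the example are hypothetical:

```python
# Adapter sequences as listed in the text (5' to 3'), converted to uppercase.
FORWARD_ADAPTER = "aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatct".upper()
REVERSE_ADAPTER = "caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatc".upper()
BARCODE_LENGTH = 12
VALID_BASES = set("ACGT")


def _check_dna(seq: str, what: str) -> None:
    """Reject empty sequences or sequences containing non-ACGT characters."""
    if not seq or set(seq) - VALID_BASES:
        raise ValueError(f"{what} must be a non-empty A/C/G/T sequence")


def forward_primer(barcode: str, gene_specific: str) -> str:
    """5' adapter + 12-nt barcode + gene-specific forward (reading) primer."""
    _check_dna(barcode, "barcode")
    _check_dna(gene_specific, "gene-specific primer")
    if len(barcode) != BARCODE_LENGTH:
        raise ValueError(f"barcode must be {BARCODE_LENGTH} nt")
    return FORWARD_ADAPTER + barcode + gene_specific


def reverse_primer(gene_specific: str) -> str:
    """5' adapter + gene-specific reverse primer (no barcode)."""
    _check_dna(gene_specific, "gene-specific primer")
    return REVERSE_ADAPTER + gene_specific


fp = forward_primer("ACGTACGTACGT", "GCTAGCTAGCTAGCTAGCTA")
rp = reverse_primer("TTTTGGGGCCCCAAAA")
print(len(fp), len(rp))
```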

Twelve-nucleotide barcodes were designed based on the Nextera 8-nt barcodes; they allowed multiple samples to be sequenced at once and deconvoluted in analysis. Because the samples were low complexity, barcodes that would be pooled together were chosen to have even base representation at each position to allow accurate cluster calling, and at least eight barcoded samples had to be sequenced together. Before library preparation, these longer PCR primers were tested under the same conditions described above. If one to three distinct bands plus a smear were seen on the gel, as described by Splinter et al. (2012), the library could be prepared.
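The requirement that pooled barcodes have even base representation at each position can be checked by tallying bases column by column. A minimal sketch; the 50% ceiling on any single base per position is a hypothetical threshold (the text gives no numeric criterion), and the eight-barcode minimum comes from the paragraph above. The example pool is generated only for illustration and is not the set of barcodes used in this work:

```python
from collections import Counter
from typing import Dict, List


def base_fractions(barcodes: List[str]) -> List[Dict[str, float]]:
    """Per-position fraction of A, C, G, and T across a barcode pool."""
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")
    fractions = []
    for pos in range(length):
        counts = Counter(b[pos] for b in barcodes)
        fractions.append({base: counts[base] / len(barcodes) for base in "ACGT"})
    return fractions


def is_balanced_pool(barcodes: List[str], min_pool: int = 8,
                     max_fraction: float = 0.5) -> bool:
    """Pool is large enough, has no duplicates, and no base dominates any position."""
    if len(barcodes) < min_pool or len(set(barcodes)) != len(barcodes):
        return False
    return all(max(f.values()) <= max_fraction for f in base_fractions(barcodes))


# Illustrative 12-nt pool in which every base occurs exactly twice at each position.
BASES = "ACGT"
pool = ["".join(BASES[(i + p) % 4] for p in range(12)) for i in range(4)]
pool += ["".join(BASES[(i + 2 * p) % 4] for p in range(12)) for i in range(4)]
print(is_balanced_pool(pool))
```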

5.2.5 Library Preparation

A large 800 µL PCR was used to generate libraries. Each reaction was made in a microcentrifuge tube, then split into 16 separate 50 µL reactions in PCR tubes. After PCR cycling was complete, the 16 reactions were pooled back together.

Reaction mix (800 µL total):
80 µL 10x PCR Buffer 1 (Expand Long Template Polymerase kit)
16 µL 10 mM dNTPs
11.2 µL 100 µM reading primer (forward)
11.2 µL 100 µM reverse primer
11.2 µL Expand Long Template Polymerase
Variable: template DNA (3.2 µg if 100 ng worked in a 25 µL reaction)
Water to 800 µL
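The arithmetic behind this reaction (keeping template concentration constant when scaling a 25 µL test reaction to 800 µL, and filling to volume with water) can be sketched as follows. The helper names are hypothetical, and the 100 ng/µL template stock in the usage line is an assumed value, not one reported here.

```python
"""Sketch: arithmetic for scaling the library PCR from a test reaction."""

TOTAL_UL = 800.0  # total library PCR volume (from the recipe)
TUBE_UL = 50.0    # volume of each reaction after splitting into PCR tubes

# Fixed-volume components from the recipe, in microliters.
FIXED_COMPONENTS_UL = {
    "10x PCR Buffer 1": 80.0,
    "10 mM dNTPs": 16.0,
    "100 uM reading primer (forward)": 11.2,
    "100 uM reverse primer": 11.2,
    "Expand Long Template Polymerase": 11.2,
}


def scaled_template_ng(test_ng: float, test_ul: float, total_ul: float = TOTAL_UL) -> float:
    """Keep template concentration constant when scaling up a test reaction."""
    return test_ng * total_ul / test_ul


def water_ul(template_ng: float, stock_ng_per_ul: float, total_ul: float = TOTAL_UL) -> float:
    """Water needed to bring the reaction to volume for a given template stock."""
    template_ul = template_ng / stock_ng_per_ul
    water = total_ul - sum(FIXED_COMPONENTS_UL.values()) - template_ul
    if water < 0:
        raise ValueError("template stock too dilute for this reaction volume")
    return water


def tube_count(total_ul: float = TOTAL_UL, tube_ul: float = TUBE_UL) -> int:
    """Number of PCR tubes the master reaction is split into."""
    return round(total_ul / tube_ul)


template = scaled_template_ng(100, 25)  # 100 ng in 25 uL -> 3200 ng in 800 uL
print(template, tube_count(), water_ul(template, 100))  # hypothetical 100 ng/uL stock
```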

Thermocycler program:
94 °C, 2 min
30 cycles of: 94 °C, 10 s; 55 °C, 1 min; 68 °C, 3 min
68 °C, 5 min
4 °C, hold

To purify the libraries and remove unincorporated nucleotides and unused primers, each library was divided into 3 aliquots, which were purified with the High Pure PCR Product Purification Kit (Roche) according to the manufacturer’s directions. After purification, the 3 aliquots from a given library were pooled back together. Because the recovered libraries could contain residue from the spin columns, they were centrifuged at 13,000 rpm for 10 minutes, and the supernatant was transferred to a fresh microcentrifuge tube. Library DNA concentration was measured on a NanoDrop, and library quality was checked by running 300 ng of each library on a 1.5% agarose gel. Each library should produce amplicons similar to those from the primer optimization step: 1-3 distinct bands over a smear, as seen on the primer optimization gel.

5.2.6 Sequencing

Sequencing reactions were run on the Illumina MiSeq according to the manufacturer’s recommendations. Eight to 12 libraries were combined in equimolar amounts to create a 4 nM pool. The pool was denatured and combined with 10% PhiX control to create a 20 pM pool according to the Illumina “Preparing Libraries for Sequencing on the MiSeq” instruction manual. The pool was sequenced with single-end reads using the MiSeq v2 300-cycle kit or the v3 150-cycle kit. In the sample sheet wizard, the following options were chosen: prep kit = TruSeqHT, index reads = 0, single-end reads, FASTQ only, and 1 sample = “pool”. The resulting ".fastq" file was saved for analysis.
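A sketch of the equimolar pooling arithmetic follows. It converts a dsDNA concentration in ng/µL to nM using the common approximation of 660 g/mol per base pair, computes the volume of each library needed for a 4 nM pool (C1V1 = C2V2), and splits a final volume between library and PhiX when both have been diluted to the same pM concentration. The function names, the 20 µL pool volume, and any fragment lengths are assumptions; the Illumina instruction manual remains the authoritative protocol for denaturation and dilution.

```python
"""Sketch: equimolar pooling arithmetic for MiSeq loading."""

BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA concentration to molarity: ng/uL -> nM."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_length_bp)


def equimolar_pool(library_nM: list[float], pool_nM: float = 4.0, pool_ul: float = 20.0):
    """Volumes of each library (and water) giving an equimolar pool at pool_nM."""
    share_nM = pool_nM / len(library_nM)                    # each library's final molarity
    volumes = [share_nM * pool_ul / c for c in library_nM]  # C1*V1 = C2*V2
    water = pool_ul - sum(volumes)
    if water < 0:
        raise ValueError("libraries too dilute for this pool volume")
    return volumes, water


def phix_spike(final_ul: float, phix_fraction: float = 0.10):
    """Split a final volume between library and PhiX diluted to the same pM."""
    phix_ul = final_ul * phix_fraction
    return final_ul - phix_ul, phix_ul


# Hypothetical example: a 10 ng/uL library with a 500 bp mean fragment length.
print(ng_per_ul_to_nM(10, 500))
```

Because each library contributes the same final molarity, more concentrated libraries simply receive proportionally smaller volumes.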

5.2.7 Analysis

The resulting ".fastq" files were processed with our 4C-seq analysis pipeline, developed by Dimitrios Avramopoulos, which uses Unix shell scripts, Perl scripts, and R scripts. First, reads are sorted into separate files by barcode (100% match required); each barcode represents one experiment. Each file is then processed separately in the following steps. First, reads matching the viewpoint sequence between the viewpoint primer and the primary enzyme restriction site are extracted, allowing no more than two mismatches. This viewpoint sequence is removed, so the remaining fragments start at the ligation junction. Next, any fragment containing a secondary restriction site is truncated at that site, as such reads would be hybrid fragments resulting from multiple fragment ligations. The resulting fragments are then aligned to the human genome (hg19) using Bowtie2 (Langmead et al., 2009). Reads mapping to the chromosome of interest (chromosome 12 for CACNA1C) are extracted and their start positions are recorded (Table 11). Start positions that map to primary enzyme restriction sites are counted; these counts represent ligation events.
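The pipeline implements these preprocessing steps with shell, Perl, and R scripts; the Python sketch below only illustrates their logic and is not that code. It assumes each read begins with the 12 nt barcode followed by the viewpoint sequence (primer through the primary restriction site), represents alignments from the external aligner as (chromosome, start) pairs, truncates fragments immediately before the secondary site, and ignores strand when comparing start positions with primary site coordinates. All function names are hypothetical.

```python
"""Sketch: 4C-seq read preprocessing and ligation-event counting."""
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes):
    """Bin reads by an exact 5' barcode match and strip the barcode.

    barcodes maps an experiment name to its barcode; reads that match no
    barcode are discarded.
    """
    bins = {name: [] for name in barcodes}
    for read in reads:
        for name, bc in barcodes.items():
            if read.startswith(bc):
                bins[name].append(read[len(bc):])
                break
    return bins


def strip_viewpoint(read, viewpoint, max_mismatches=2):
    """Remove the viewpoint sequence if it matches with at most
    max_mismatches mismatches; otherwise return None."""
    if len(read) < len(viewpoint):
        return None
    if hamming(read[:len(viewpoint)], viewpoint) > max_mismatches:
        return None
    return read[len(viewpoint):]


def truncate_at_site(fragment, secondary_site):
    """Cut a captured fragment just before the first secondary restriction site."""
    idx = fragment.find(secondary_site)
    return fragment if idx == -1 else fragment[:idx]


def count_ligation_events(alignments, chrom, primary_site_positions):
    """Count alignment start positions on chrom that fall on primary sites.

    alignments is an iterable of (chromosome, start) pairs from an external
    aligner; strand-specific start coordinates are not handled here.
    """
    counts = Counter()
    for c, start in alignments:
        if c == chrom and start in primary_site_positions:
            counts[start] += 1
    return counts
```

In a full run, the truncated fragments would be written back to FASTQ, aligned with Bowtie2, and the alignment starts passed to `count_ligation_events`.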

Many factors must be accounted for in order to quantitatively evaluate how frequently DNA fragments interact with the viewpoint. The 4C template preparation captures two types of fragments: those that start at a primary site and end at a secondary site, and those that contain no secondary site and have primary sites at both ends. The latter, which we call "blind," have an amplification disadvantage, because any potential amplicon would need to include a primary-to-secondary fragment in order to circularize and be amplified. Counts also depend on fragment size and GC content, which affect amplification efficiency, and on distance from the viewpoint. Counts are strongly negatively correlated with distance from the viewpoint, and the relationship is exponential at small distances. To correct for these factors, we generate a file that contains this information, together with the squared fragment length and the squared distance from the viewpoint, for every possible fragment in the region of interest, in this case 1 Mb upstream and downstream of the CACNA1C promoter.
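The covariate correction can be sketched as a regression of counts on fragment length, distance from the viewpoint, their squares, GC content, and a blind-fragment indicator. The pipeline fits a generalized linear model in R whose family and link are not specified here; as a swapped-in simplification, this Python sketch fits an ordinary least-squares model to log(1 + count) with NumPy and returns the residuals. The function name and the choice of model are assumptions.

```python
"""Sketch: regress 4C counts on fragment covariates and return residuals."""
import numpy as np


def fragment_residuals(counts, length_bp, distance_bp, gc_fraction, blind):
    """Residuals of log1p(counts) after a least-squares fit on fragment covariates.

    Assumption: a Gaussian linear model on log1p counts stands in for the
    pipeline's R GLM, whose family and link are not specified in the text.
    """
    y = np.log1p(np.asarray(counts, dtype=float))
    length = np.asarray(length_bp, dtype=float)
    dist = np.abs(np.asarray(distance_bp, dtype=float))  # distance from viewpoint
    covariates = np.column_stack([
        length, length ** 2,             # fragment length and its square
        dist, dist ** 2,                 # distance and its square
        np.asarray(gc_fraction, dtype=float),
        np.asarray(blind, dtype=float),  # 1 for "blind" fragments, else 0
    ])
    # Standardize columns so the squared terms do not dominate numerically.
    sd = covariates.std(axis=0)
    sd[sd == 0] = 1.0
    covariates = (covariates - covariates.mean(axis=0)) / sd
    design = np.column_stack([np.ones(len(y)), covariates])  # add intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta
```

A fragment whose count exceeds what its length, GC content, blind status, and distance predict receives a large positive residual, which is the quantity carried forward to scoring.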

This file is then used as input to R to calculate a residual for the counts at each site with a generalized linear model. The residual counts for the two ends of each primary fragment are then merged into one value, the maximum of the two. We chose the maximum (rather than, for example, the average) because there are many reasons an end may fail to generate products that can be amplified or mapped back to the genome (e.g., the product is too long or too short, or bound proteins interfere), even if the fragment does take part in interactions. The rationale is that a strong result from one end is sufficient to suggest an interaction. The resulting file, which now contains count residuals for each primary fragment, is passed again to R to calculate Z scores and p-values. Occasional very high counts, presumably from the strongest interactions, create a highly skewed distribution that inflates the variance estimate and can deflate the signal. To overcome this, the highest-count fragments are iteratively removed and the p-values recalculated. Finally, the p-values are -log10 transformed and reported in a ".bed" format file, which can be loaded into the genome browser and visualized. To improve visibility, the -log10(p-values) in this file are capped at 20. The analysis pipeline also uses R to plot the residual counts across the region, to confirm that the effect of distance has been corrected, and to create QQ plots of the residuals, to confirm that their distribution after all adjustments is close to normal, with only the relatively few high values expected for true interactions.
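The merging and scoring steps might look like the following Python sketch. It takes the maximum residual of each fragment's two ends, converts residuals to Z scores and one-sided normal p-values, caps -log10(p) at 20, and formats four-column lines. The number of top values excluded from the mean and SD estimate (`n_iter`) is a hypothetical parameter, since the pipeline's stopping rule is not described, and the exact column layout of the pipeline's .bed file is an assumption.

```python
"""Sketch: merge fragment-end residuals, score them, and write BED-style lines."""
import math
import statistics


def merge_sides(side_residuals):
    """Keep the larger residual of the two ends of each primary fragment."""
    return [max(left, right) for left, right in side_residuals]


def neg_log10_pvalues(values, n_iter=1, cap=20.0):
    """One-sided -log10 p-values from Z scores, capped at `cap`.

    The n_iter highest values are left out when estimating the mean and SD
    (hypothetical stopping rule), so extreme interactions do not inflate the SD.
    """
    kept = sorted(values)
    for _ in range(n_iter):
        if len(kept) > 2:
            kept.pop()  # drop the current highest value
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)
    if sd == 0:
        raise ValueError("residuals have zero spread")
    scores = []
    for v in values:
        z = (v - mean) / sd
        p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
        p = max(p, 1e-300)                      # avoid log10(0) on underflow
        scores.append(min(-math.log10(p), cap))
    return scores


def bed_lines(chrom, fragments, scores):
    """Four-column lines: chrom, start, end (BED coordinates), score."""
    return [f"{chrom}\t{start}\t{end}\t{score:.2f}"
            for (start, end), score in zip(fragments, scores)]
```

Excluding the top values matters: with a single extreme residual in a small set, including it in the SD estimate can pull its own score far below the cap.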

5.3 Results

Based on our underlying hypothesis that CACNA1C is the target gene with which the schizophrenia-associated variants and the corresponding eQTL interact, we performed 4C-seq to describe DNA-DNA interactions. In this analysis, we used two different viewpoints as candidate target promoters for interactions with the eQTL variants. The first was the established primary CACNA1C promoter. The second was the promoter of an alternative transcript previously shown to express a short part of the 3’ end of the gene, producing a peptide called CCAT that has transcription factor activity and affects the expression of the CACNA1C gene itself (Gomez-Ospina et al., 2013). We performed experiments in SK-N-SH and HEK293 cells. To confirm initial positive results from the CACNA1C promoter experiments, we used three different primary restriction enzymes in independent experiments in SK-N-SH cells.

Our results for the CACNA1C promoter viewpoint in SK-N-SH cells with the three different primary enzymes (Figure 12) showed that, in addition to the region immediately surrounding the viewpoint, two additional regions consistently interacted with the CACNA1C promoter. The first was the 68 kb region in the middle of the 330 kb intron 3 that contains the schizophrenia-associated SNPs. The second was a region at the 3’ end of the gene, downstream of exon 12 (REGION A in Figure 12). The results for the CACNA1C promoter viewpoint in HEK293 cells showed a similar pattern of interactions (Figure 13).

Our results from the CCAT promoter viewpoint did not reveal interactions with the schizophrenia-associated variant region (Figure 14); therefore, this viewpoint was not investigated with more than one primary enzyme. Consistent interactions were, however, observed in both cell lines with the CACNA1C promoter, with the region at the 3’ end of the gene downstream of exon 12, and with an additional region in the 3’ end of the gene.

5.4 Discussion

Our 4C-seq results confirmed that the region carrying the schizophrenia-associated variants interacts with the CACNA1C promoter. This is in agreement with the 3C experiment of Roussos et al. (2014) and supports the role of this region as the regulatory element underlying the eQTL. By taking a semi-agnostic approach with 4C-seq and using both the CACNA1C and CCAT promoters as viewpoints, we captured a much broader picture of the regulatory landscape of these genes, and we showed that the interacting region extends beyond the single SNP tested by Roussos et al. (2014), encompassing practically all of the schizophrenia-associated SNPs in the LD region.

Additionally, we identified a region downstream of exon 12 that may also regulate the CACNA1C promoter. Interestingly, this region was also identified in our experiments from the CCAT promoter. This suggests that the two transcripts may share regulatory elements, and it highlights the importance of CCAT, which has received little attention in the literature since it was first reported (Gomez-Ospina et al., 2006, 2013; Schroder et al., 2009).

5.5 Chapter 5 Tables

Table 10. Primers Used for 4C-seq

Primer Name        Primer Sequence

3CCATDpn_0_3       caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatcgctgtgtccctgtacctaga
3CCATHind_5_3      aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctggactccttaggatagccaggcactaagcag
3CCATHind_6_3      aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttaggcatgctctatagccaggcactaagcag
3CCATHind_7_3      aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctctctctaccagaatagccaggcactaagcag
3CCATHind_8_3      aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctcagagagggctaatagccaggcactaagcag

3CAC_Dpn_Hind_0_6  caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatccagaaatgcagccgatgata
3CAC_Hind_8_6      aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctcagagagggctacatttctccaaggtggaagc
3CAC_Hind_9_6      aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctgctacgctcgagcatttctccaaggtggaagc
3CAC_Hind_11_6     aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctaagaggcagtagcatttctccaaggtggaagc
3CAC_Hind_12_6     aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctgtagaggataagcatttctccaaggtggaagc

5CACCvi_0_5        caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatcgcctctcccgatttatttt
5CACEco_1_5        aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttaaggcgacgtagggaaggcctctgtgagc
5CACEco_2_5        aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctcgtactagaggcgggaaggcctctgtgagc
5CACEco_3_5        aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctaggcagaatcctgggaaggcctctgtgagc
5CACEco_4_5        aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttcctgagcggacgggaaggcctctgtgagc

3CACCvi_0_5        caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatctgcactaagaggagaccttg
3CACEco_1_5        aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttaaggcgacgtactctacctttccagcaacat
3CACEco_10_5       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctcgaggctgaagactctacctttccagcaacat
3CACEco_3_5        aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctaggcagaatcctctctacctttccagcaacat
3CACEco_4_5        aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttcctgagcggacctctacctttccagcaacat

3CAC_Cvi_Eco_0_6   caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatcggccaaagaaaacaaccaga
3CAC_Eco_1_6       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttaaggcgacgtaaacatgccagcaaacgctat
3CAC_Eco_10_6      aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctcgaggctgaagaaacatgccagcaaacgctat
3CAC_Eco_3_6       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctaggcagaatcctaacatgccagcaaacgctat
3CAC_Eco_4_6       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttcctgagcggacaacatgccagcaaacgctat

5CAC_Cvi_Bst_0_1   caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatcgtgcggtgctcagttcttg
5CAC_Bst_5_2       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctggactccttagggctcagttcaaaatcctggag
5CAC_Bst_6_2       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttaggcatgctctgctcagttcaaaatcctggag
5CAC_Bst_7_2       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctctctctaccagagctcagttcaaaatcctggag

5CAC_Dpn_Nco_0_2   caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatctaccagaggggagagggaag
5CAC_Nco_1_1       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttaaggcgacgtaatcacatcccagcactcctc
5CAC_Nco_10_1      aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctcgaggctgaagaatcacatcccagcactcctc
5CAC_Nco_3_1       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctaggcagaatcctatcacatcccagcactcctc
5CAC_Nco_4_1       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttcctgagcggacatcacatcccagcactcctc

5CAC_Nla_0_1       caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatcgggcccgctccctttgac
5CAC_Dpn_Nla_8_2   aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctcagagagggctaaatttgcccgactaccagag
5CAC_Dpn_Nla_9_2   aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctgctacgctcgagaatttgcccgactaccagag
5CAC_Dpn_Nla_11_2  aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctaagaggcagtagaatttgcccgactaccagag
5CAC_Dpn_Nla_12_2  aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctgtagaggataagaatttgcccgactaccagag

3CAC_Cvi_Bst_0_2   caagcagaagacggcatacgagatgtgactggagttcagacgtgtgctcttccgatcccatttctagatgcggttcc
3CAC_Bst_5_2       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctggactccttaggccatttctagatgcggttcc
3CAC_Bst_6_2       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatcttaggcatgctctccatttctagatgcggttcc
3CAC_Bst_7_2       aatgatacggcgaccaccgaatctacactctttccctacacgacgctcttccgatctctctctaccagaccatttctagatgcggttcc

Primers are named by the end of the viewpoint at which they were designed (5’ or 3’), the viewpoint (CAC for the CACNA1C promoter or CCAT for the CCAT promoter), the restriction enzyme near which the primer was designed, the other restriction enzyme used to generate the template (if applicable), the barcode number, and the primer design number.

In each section, the reverse primer, designed close to RE2, is listed first, followed by the forward sequencing primers, whose barcodes differentiate replicates and cell types.

Table 11. Summary of sequencing metrics for 4C-seq.

Viewpoint  Cell Line  Primary Enzyme (RE1)   Replicate   Total Reads   Reads Mapping to Viewpoint   Reads Mapping to Chr. 12
CACNA1C    SK-N-SH    EcoRI                  1           2,796,849     83%                          61%
CACNA1C    SK-N-SH    NcoI                   1           2,928,518     85%                          25%
CACNA1C    SK-N-SH    NcoI                   2           2,850,733     81%                          28%
CACNA1C    SK-N-SH    HindIII                1           8,633,143     90%                          80%
CACNA1C    HEK293     HindIII                1           6,859,247     88%                          71%
CACNA1C    HEK293     HindIII                2           7,302,600     90%                          84%
CCAT       HEK293     HindIII                1           1,437,643     97%                          74%
CCAT       SK-N-SH    HindIII                1           1,546,521     96%                          55%

Viewpoint indicates which promoter, CACNA1C or CCAT, was used. The “Total Reads” column indicates the total number of reads starting with the barcode unique to that experiment, followed by the percent of reads mapping to the viewpoint sequence, and the percent of reads that continue beyond the viewpoint to an interacting fragment on chromosome 12, where CACNA1C is located.

5.6 Chapter 5 Figures

Figure 12. 4C-seq results from the CACNA1C promoter viewpoint in SK-N-SH cells.

Bars indicate -log10(p-values) for excess read counts suggesting interaction with the viewpoint. Three regions with high densities of interactions are indicated by shading: the CACNA1C promoter, the schizophrenia-associated SNPs in intron 3 of CACNA1C (labeled SZ SNPs), and REGION A at the 3’ end of CACNA1C, downstream of exon 12.


Figure 13. 4C-seq results from the CACNA1C promoter viewpoint in HEK293 cells.

Bars indicate -log10(p-values) for excess read counts suggesting interaction with the viewpoint. Three regions with high densities of interactions are indicated by shading: the CACNA1C promoter, the schizophrenia-associated SNPs in intron 3 of CACNA1C (labeled SZ SNPs), and REGION A at the 3’ end of CACNA1C, downstream of exon 12.


Figure 14. 4C-seq results from the CCAT promoter viewpoint.

Bars indicate -log10(p-values) for excess read counts suggesting interaction with the viewpoint. Five regions are indicated by shading: the CACNA1C promoter; the schizophrenia-associated SNPs in intron 3 of CACNA1C (labeled SZ SNPs); REGION A at the 3’ end of CACNA1C, downstream of exon 12; the CCAT promoter; and REGION B, downstream of the CACNA1C and CCAT genes.

Chapter 6: Conclusion

The goal of this thesis work was to go beyond the linkage and association studies that have identified schizophrenia risk loci and to conduct functional studies to elucidate how these risk loci may contribute to disease. Risk loci are often difficult to interpret because they lie in non-coding regions, and although trait-associated variants are enriched overall for eQTLs (Nicolae et al., 2010), there has been little follow-up on specific loci. It is essential to elucidate the functions of genetic risk variants in order to understand the molecular and cellular pathogenesis of schizophrenia, a common disease that carries a large burden to society.

In NRG3, we identified two independent potential eQTLs. The first, tagged by rs7899151, showed a weak correlation between the risk allele (G) and increased expression of class IV NRG3 transcripts, which include exon 4, in the STG but not the DLPFC. The second, tagged by rs10748842, was also correlated with alternative splicing. In the STG, the risk allele (C) was weakly correlated with decreased expression of transcripts using the alternative first exon 1B (classes II, III, and IV) and with significantly reduced expression of exon 1B transcripts relative to exon 1A transcripts. In the DLPFC, both total and relative expression were significantly decreased with the risk allele of rs10748842. Further work on the function of NRG3 variants was done by Mariela Zeledón, another graduate student in our research group (Zeledón et al., 2015).

In CACNA1C, we found a relationship between genotype at rs1006737, in the third intron, and expression of the gene. In the STG, the risk allele (A) was correlated with significantly decreased CACNA1C expression, while in the DLPFC we observed a non-significant trend in the opposite direction. Our DLPFC trend agrees with Bigos et al. (2010), who observed that the risk allele (A) was correlated with a statistically significant increase in CACNA1C expression in a larger, independent DLPFC sample set.

To investigate the molecular mechanism that drives this correlation, we characterized the regulatory potential of all the SNPs in high LD with rs1006737, because the schizophrenia-associated variant may not itself be functional but may instead tag a locus likely to contain functional variants. In dual luciferase reporter assays conducted in SK-N-SH cells, rs4765905 showed allele-specific regulatory function: the risk allele (C) drove lower expression of the firefly luciferase reporter gene. Although we did not observe consistent effects in HEK293 cells, this cell-type difference may reflect the complex regulation of CACNA1C, and it parallels the differences we observed in genotype-dependent CACNA1C expression between the STG and DLPFC.

The effects of regulatory DNA sequences are generally mediated by protein interactions, including those with transcription factors, activators, and co-activators. Therefore, to further characterize the regulatory potential of the schizophrenia-associated variants, we conducted protein binding assays to look for allele-specific interactions. In EMSAs, we observed that all 16 SNPs were capable of binding proteins or protein complexes from nuclear extracts of SK-N-SH and HEK293 cells. For 13 of the 16 SNPs, the same proteins or protein complexes appeared to bind both alleles, but one allele was preferred, as indicated by a higher-intensity signal.

Using protein microarrays, we identified the specific nuclear proteins capable of binding the variant sequences. All tested sequences bound at least one protein, and many bound in an allele-dependent manner. Furthermore, we observed that five proteins bound multiple variant sequences but none of the negative controls: PKNOX2, PRNP, EIF1AD, GADD45A, and ZKSCAN5.

Given that all the variant sequences bound proteins in EMSAs and that several variant sequences bound the same proteins on the microarray, it is possible that multiple SNPs in this locus act together through protein-mediated complexes to regulate CACNA1C expression. This is reminiscent of the proposed combinatorial effects of variants in LD on gene expression (Corradin et al., 2014b). Such combinatorial effects could also drive the strong LD that links these 16 SNPs into only two haplotypes, by favoring particular allele combinations in the context of balancing selection. Similar mechanisms may be important for associations with behavioral and other phenotypes.

A better understanding of these phenomena is important for realizing the translational potential of this and other GWAS-identified associations.

Finally, using 4C-seq, we confirmed that the eQTL tagged by rs1006737 interacts with the CACNA1C promoter in the three-dimensional space of the nucleus. We also investigated the promoter of CCAT, a related but independently transcribed product of the CACNA1C locus that encodes a transcription factor shown to negatively regulate CACNA1C expression (Gomez-Ospina et al., 2013). The CCAT promoter did not interact with the third intron of CACNA1C. However, both promoters interacted with a region in intron 12, suggesting that this region may regulate the expression of both transcripts.

In this work, we focused on developing a substantial understanding of the complex regulation of CACNA1C expression. We showed that the schizophrenia-associated variants in the third intron of CACNA1C act as an eQTL and reside in a region that interacts with the CACNA1C promoter, and we identified candidate transcription factors that may be involved. This work is important because it contributes to the understanding of gene regulation, a fundamental biological process, and sheds light on how CACNA1C may contribute to the pathogenesis of complex psychiatric disorders. The genes and pathways that emerge from this work may ultimately serve as therapeutic targets, making a real difference in the lives of patients and their relatives.

References

Abramoff, M.D., Magalhães, P.J., and Ram, S.J. (2004). Image Processing with ImageJ. Biophotonics Int. 11, 36–42.

American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (Arlington, VA: American Psychiatric Publishing).

Van Arensbergen, J., van Steensel, B., and Bussemaker, H.J. (2014). In search of the determinants of enhancer–promoter interaction specificity. Trends Cell Biol. 24, 695–702.

Bentham, J., and Vyse, T.J. (2013). The development of genome-wide association studies and their application to complex diseases, including lupus. Lupus 22, 1205–1213.

Bigler, E.D., Mortensen, S., Neeley, E.S., Ozonoff, S., Krasny, L., Johnson, M., Lu, J., Provencal, S.L., McMahon, W., and Lainhart, J.E. (2007). Superior Temporal Gyrus, Language Function, and Autism. Dev. Neuropsychol. 31, 217–238.

Bigos, K.L., Mattay, V.S., Callicott, J.H., Straub, R.E., Vakkalanka, R., Kolachana, B., Hyde, T.M., Lipska, B.K., Kleinman, J.E., and Weinberger, D.R. (2010). Genetic Variation in CACNA1C Affects Brain Circuitries Related to Mental Illness. Arch. Gen. Psychiatry 67, 939.

Cardno, A.G., and Gottesman, I.I. (2000). Twin studies of schizophrenia: From bow-and-arrow concordances to Star Wars Mx and functional genomics. Am. J. Med. Genet. 97, 12–17.

Chan, R.C.K., Shum, D., Toulopoulou, T., and Chen, E.Y.H. (2008). Assessment of executive functions: Review of instruments and identification of critical issues. Arch. Clin. Neuropsychol. 23, 201–216.

Chen, P.-L., Avramopoulos, D., Lasseter, V.K., McGrath, J.A., Fallin, M.D., Liang, K.-Y., Nestadt, G., Feng, N., Steel, G., Cutting, A.S., et al. (2009). Fine Mapping on Chromosome 10q22-q23 Implicates Neuregulin 3 in Schizophrenia. Am. J. Hum. Genet. 84, 21–34.

Claes, S., Tang, Y.-L., Gillespie, C.F., and Cubells, J.F. (2012). Chapter 3 - Human genetics of schizophrenia. In Handbook of Clinical Neurology, M.J. Aminoff, F. Boller, and D.F. Swaab, eds. (Elsevier), pp. 37–52.

Bipolar Disorder Genome Study (BiGS) Consortium (2010). Meta-analysis of genome-wide association data identifies a risk locus for major mood disorders on 3p21.1. Nat. Genet. 42, 128–131.

Corradin, O., Saiakhova, A., Akhtar-Zaidi, B., Myeroff, L., Willis, J., Cowper-Sal·lari, R., Lupien, M., Markowitz, S., and Scacheri, P.C. (2014a). Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res. 24, 1–13.

Corradin, O., Saiakhova, A., Akhtar-Zaidi, B., Myeroff, L., Willis, J., Cowper-Sal·lari, R., Lupien, M., Markowitz, S., and Scacheri, P.C. (2014b). Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res. 24, 1–13.

Cross-Disorder Group of the Psychiatric Genomics Consortium (2013). Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet Lond. Engl. 381, 1371–1379.

Curtis, D., Vine, A.E., McQuillin, A., Bass, N.J., Pereira, A., Kandaswamy, R., Lawrence, J., Anjorin, A., Choudhury, K., Datta, S.R., et al. (2011). Case–case genome-wide association analysis shows markers differentially associated with schizophrenia and bipolar disorder and implicates calcium channel genes. Psychiatr. Genet. 21, 1–4.

Danino, Y.M., Even, D., Ideses, D., and Juven-Gershon, T. (2015). The core promoter: At the heart of gene expression. Biochim. Biophys. Acta BBA - Gene Regul. Mech. 1849, 1116–1131.

Dimas, A.S., and Dermitzakis, E.T. (2009). Genetic variation of regulatory systems. Curr. Opin. Genet. Dev. 19, 586–590.

Eaton, W.W. (1985). Epidemiology of Schizophrenia. Epidemiol. Rev. 7, 105–126.

Elliott, R. (2003). Executive functions and their disorders: Imaging in clinical neuroscience. Br. Med. Bull. 65, 49–59.

Fallin, M.D., Lasseter, V.K., Wolyniec, P.S., McGrath, J.A., Nestadt, G., Valle, D., Liang, K.-Y., and Pulver, A.E. (2003). Genomewide linkage scan for schizophrenia susceptibility loci among Ashkenazi Jewish families shows evidence of linkage on chromosome 10q22. Am. J. Hum. Genet. 73, 601–611.

Faraone, S.V., Hwu, H.-G., Liu, C.-M., Chen, W.J., Tsuang, M.-M., Liu, S.-K., Shieh, M.-H., Hwang, T.-J., Ou-Yang, W.-C., Chen, C.-Y., et al. (2006). Genome scan of Han Chinese schizophrenia families from Taiwan: confirmation of linkage to 10q22.3. Am. J. Psychiatry 163, 1760–1766.

Ferreira, M.A.R., O’Donovan, M.C., Meng, Y.A., Jones, I.R., Ruderfer, D.M., Jones, L., Fan, J., Kirov, G., Perlis, R.H., Green, E.K., et al. (2008). Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar disorder. Nat. Genet. 40, 1056–1058.

Gomez-Ospina, N., Tsuruta, F., Barreto-Chang, O., Hu, L., and Dolmetsch, R. (2006). The C terminus of the L-type voltage-gated calcium channel Ca(V)1.2 encodes a transcription factor. Cell 127, 591–606.

Gomez-Ospina, N., Panagiotakos, G., Portmann, T., Pasca, S.P., Rabah, D., Budzillo, A., Kinet, J.P., and Dolmetsch, R.E. (2013). A Promoter in the Coding Region of the Calcium Channel Gene CACNA1C Generates the Transcription Factor CCAT. PLoS ONE 8, e60526.

Göndör, A., Rougier, C., and Ohlsson, R. (2008). High-resolution circular chromosome conformation capture assay. Nat. Protoc. 3, 303–313.

Gorkin, D.U., Leung, D., and Ren, B. (2014). The 3D Genome in Transcriptional Regulation and Pluripotency. Cell Stem Cell 14, 762–775.

Grant, S.F.A., and Hakonarson, H. (2008). Microarray Technology and Applications in the Arena of Genome-Wide Association. Clin. Chem. 54, 1116–1124.

Green, E.K., Grozeva, D., Jones, I., Jones, L., Kirov, G., Caesar, S., Gordon-Smith, K., Fraser, C., Forty, L., Russell, E., et al. (2010). The bipolar disorder risk allele at CACNA1C also confers risk of recurrent major depression and of schizophrenia. Mol. Psychiatry 15, 1016–1022.

Grice, E.A., Rochelle, E.S., Green, E.D., Chakravarti, A., and McCallion, A.S. (2005). Evaluation of the RET regulatory landscape reveals the biological relevance of a HSCR-implicated enhancer. Hum. Mol. Genet. 14, 3837–3845.

Grisanzio, C., Werner, L., Takeda, D., Awoyemi, B.C., Pomerantz, M.M., Yamada, H., Sooriakumaran, P., Robinson, B.D., Leung, R., Schinzel, A.C., et al. (2012). Genetic and functional analyses implicate the NUDT11, HNF1B, and SLC22A3 genes in prostate cancer pathogenesis. Proc. Natl. Acad. Sci. 109, 11252–11257.

Guo, X., Liu, Z., Wang, X., and Zhang, H. (2013). Genetic association test for multiple traits at gene level. Genet. Epidemiol. 37, 122–129.

Hamshere, M.L., Walters, J.T.R., Smith, R., Richards, A.L., Green, E., Grozeva, D., Jones, I., Forty, L., Jones, L., Gordon-Smith, K., et al. (2013). Genome-wide significant associations in schizophrenia to ITIH3/4, CACNA1C and SDCCAG8, and extensive replication of associations reported by the Schizophrenia PGC. Mol. Psychiatry 18, 708–712.

Helton, T.D., Xu, W., and Lipscombe, D. (2005). Neuronal L-Type Calcium Channels Open Quickly and Are Inhibited Slowly. J. Neurosci. 25, 10247–10251.

Hofmann, F., Flockerzi, V., Kahl, S., and Wegener, J.W. (2014). L-Type CaV1.2 Calcium Channels: From In Vitro Findings to In Vivo Function. Physiol. Rev. 94, 303–326.

Hu, S., Xie, Z., Onishi, A., Yu, X., Jiang, L., Lin, J., Rho, H., Woodard, C., Wang, H., Jeong, J.-S., et al. (2009). Profiling the human protein-DNA interactome reveals ERK2 as a transcriptional repressor of interferon signaling. Cell 139, 610–622.

Hu, S., Xie, Z., Blackshaw, S., Qian, J., and Zhu, H. (2011). Characterization of Protein–DNA Interactions Using Protein Microarrays. Cold Spring Harb. Protoc. 2011, pdb.prot5614.

Hu, S., Wan, J., Su, Y., Song, Q., Zeng, Y., Nguyen, H.N., Shin, J., Cox, E., Rho, H.S., Woodard, C., et al. (2013). DNA methylation presents distinct binding sites for human transcription factors. eLife 2, e00726.

Imoto, I., Sonoda, I., Yuki, Y., and Inazawa, J. (2001). Identification and characterization of human PKNOX2, a novel homeobox-containing gene. Biochem. Biophys. Res. Commun. 287, 270–276.

International Schizophrenia Consortium, Purcell, S.M., Wray, N.R., Stone, J.L., Visscher, P.M., O’Donovan, M.C., Sullivan, P.F., and Sklar, P. (2009). Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752.

Johnson, A.D., Handsaker, R.E., Pulit, S.L., Nizzari, M.M., O’Donnell, C.J., and de Bakker, P.I.W. (2008). SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24, 2938–2939.

Kao, W.-T., Wang, Y., Kleinman, J.E., Lipska, B.K., Hyde, T.M., Weinberger, D.R., and Law, A.J. (2010). Common genetic variation in Neuregulin 3 (NRG3) influences risk for schizophrenia and impacts NRG3 expression in human brain. Proc. Natl. Acad. Sci. U. S. A. 107, 15619–15624.

Kasai, K., Shenton, M.E., Salisbury, D.F., Hirayasu, Y., Lee, C.-U., Ciszewski, A.A., Yurgelun-Todd, D., Kikinis, R., Jolesz, F.A., and McCarley, R.W. (2003). Progressive Decrease of Left Superior Temporal Gyrus Gray Matter Volume in Patients With First-Episode Schizophrenia. Am. J. Psychiatry 160, 156–164.

Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. (2002). The Human Genome Browser at UCSC. Genome Res. 12, 996–1006.

Khosravani, H., and Zamponi, G.W. (2006). Voltage-Gated Calcium Channels and Idiopathic Generalized Epilepsies. Physiol. Rev. 86, 941–966.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.

Law, A.J., Lipska, B.K., Weickert, C.S., Hyde, T.M., Straub, R.E., Hashimoto, R., Harrison, P.J., Kleinman, J.E., and Weinberger, D.R. (2006). Neuregulin 1 transcripts are differentially expressed in schizophrenia and regulated by 5’ SNPs associated with the disease. Proc. Natl. Acad. Sci. U. S. A. 103, 6747–6752.

Lewis, C.M., Levinson, D.F., Wise, L.H., DeLisi, L.E., Straub, R.E., Hovatta, I., Williams, N.M., Schwab, S.G., Pulver, A.E., Faraone, S.V., et al. (2003). Genome Scan Meta-Analysis of Schizophrenia and Bipolar Disorder, Part II: Schizophrenia. Am. J. Hum. Genet. 73, 34–48.

Lichtenstein, P., Yip, B.H., Björk, C., Pawitan, Y., Cannon, T.D., Sullivan, P.F., and Hultman, C.M. (2009). Common genetic influences for schizophrenia and bipolar disorder: A population-based study of 2 million nuclear families. Lancet 373.

Liu, Y., Blackwood, D.H., Caesar, S., de Geus, E.J.C., Farmer, A., Ferreira, M.A.R., Ferrier, I.N., Fraser, C., Gordon-Smith, K., Green, E.K., et al. (2011). Meta-analysis of genome-wide association data of bipolar disorder and major depressive disorder. Mol. Psychiatry 16, 2–4.

Ma, H., Cohen, S., Li, B., and Tsien, R.W. (2012). Exploring the dominant role of Cav1 channels in signalling to the nucleus. Biosci. Rep. 33.

McEvoy, J. (2007). The costs of schizophrenia. J Clin Psychiatry.

McGrath, J.A., Avramopoulos, D., Lasseter, V.K., et al. (2009). Familiality of novel factorial dimensions of schizophrenia. Arch. Gen. Psychiatry 66, 591–600.

McGue, M., and Gottesman, I.I. (1991). The genetic epidemiology of schizophrenia and the design of linkage studies. Eur. Arch. Psychiatry Clin. Neurosci. 240, 174–181.

McVicker, G., Geijn, B. van de, Degner, J.F., Cain, C.E., Banovich, N.E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y., and Pritchard, J.K. (2013). Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science 342, 747–749.

Monsell, S. (2003). Task switching. Trends Cogn. Sci. 7, 134–140.

Mueser, K.T., and McGurk, S.R. (2004). Schizophrenia. The Lancet 363, 2063–2072.

Nicodemus, K.K., Law, A.J., Luna, A., Vakkalanka, R., Straub, R.E., Kleinman, J.E., and Weinberger, D.R. (2009). A 5’ promoter region SNP in NRG1 is associated with schizophrenia risk and type III isoform expression. Mol. Psychiatry 14, 741–743.

Nicolae, D.L., Gamazon, E., Zhang, W., Duan, S., Dolan, M.E., and Cox, N.J. (2010). Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS. PLoS Genet. 6, e1000888.

O’Donovan, M.C., Craddock, N., Norton, N., Williams, H., Peirce, T., Moskvina, V., Nikolov, I., Hamshere, M., Carroll, L., Georgieva, L., et al. (2008). Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat. Genet. 40, 1053–1055.

Van Os, J., and Kapur, S. (2009). Schizophrenia. The Lancet 374, 635–645.

Owen, M.J. (2012). Implications of genetic findings for understanding schizophrenia. Schizophr. Bull. 38, 904–907.

Owen, M.J., Sawa, A., and Mortensen, P.B. (2016). Schizophrenia. Lancet Lond. Engl.

Pers, T.H., Timshel, P., Ripke, S., Lent, S., Schizophrenia Working Group of the Psychiatric Genomics Consortium, Sullivan, P.F., O’Donovan, M.C., Franke, L., and Hirschhorn, J.N. (2016). Comprehensive analysis of schizophrenia-associated loci highlights ion channel pathways and biologically plausible candidate causal genes. Hum. Mol. Genet.

Phan, J.H., Quo, C.-F., and Wang, M.D. (2006). Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics. In Progress in Brain Research, S.E. Hemby and S. Bahn, ed. (Elsevier), pp. 83–108.

Picchioni, M.M., and Murray, R.M. (2007). Schizophrenia. BMJ 335, 91–95.

Pietrobon, D. (2010). CaV2.1 channelopathies. Pflüg. Arch. - Eur. J. Physiol. 460, 375–393.

Prata, D.P., Breen, G., Osborne, S., Munro, J., St Clair, D., and Collier, D.A. (2009). An association study of the neuregulin 1 gene, bipolar affective disorder and psychosis. Psychiatr. Genet. 19, 113–116.

Radua, J., Phillips, M.L., Russell, T., Lawrence, N., Marshall, N., Kalidindi, S., El-Hage, W., McDonald, C., Giampietro, V., Brammer, M.J., et al. (2010). Neural response to specific components of fearful faces in healthy and schizophrenic adults. NeuroImage 49, 939–946.

Raychaudhuri, S. (2011). Mapping Rare and Common Causal Alleles for Complex Human Diseases. Cell 147, 57–69.

Rebhan, M., Chalifa-Caspi, V., Prilusky, J., and Lancet, D. (1997). GeneCards: integrating information about genes, proteins and diseases. Trends Genet. 13, 163.

Riley, B., Thiselton, D., Maher, B.S., Bigdeli, T., Wormley, B., McMichael, G.O., Fanous, A.H., Vladimirov, V., O’Neill, F.A., Walsh, D., et al. (2010). Replication of association between schizophrenia and ZNF804A in the Irish Case-Control Study of Schizophrenia sample. Mol. Psychiatry 15, 29–37.

Ripke, S., Sanders, A.R., Kendler, K.S., Levinson, D.F., Sklar, P., Holmans, P.A., Lin, D.-Y., Duan, J., Ophoff, R.A., Andreassen, O.A., et al. (2011). Genome-wide association study identifies five new schizophrenia loci. Nat. Genet. 43, 969–976.

Roussos, P., Mitchell, A.C., Voloudakis, G., Fullard, J.F., Pothula, V.M., Tsang, J., Stahl, E.A., Georgakopoulos, A., Ruderfer, D.M., Charney, A., et al. (2014). A Role for Noncoding Variation in Schizophrenia. Cell Rep. 9, 1417–1429.

Schaub, M.A., Boyle, A.P., Kundaje, A., Batzoglou, S., and Snyder, M. (2012). Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759.

Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427.

Schmunk, G., and Gargus, J.J. (2013). Channelopathy pathogenesis in autism spectrum disorders. Front. Genet. 4.

Schroder, E., Byse, M., and Satin, J. (2009). L-type calcium channel C terminus autoregulates transcription. Circ. Res. 104, 1373–1381.

Shaw, G., Morse, S., Ararat, M., and Graham, F.L. (2002). Preferential transformation of human neuronal cells by human adenoviruses and the origin of HEK 293 cells. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 16, 869–871.

Sklar, P., Ripke, S., Scott, L.J., Andreassen, O.A., Cichon, S., Craddock, N., Edenberg, H.J., Nurnberger, J.I., Rietschel, M., Blackwood, D., et al. (2011). Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat. Genet. 43, 977–983.

Spicuglia, S., Kumar, S., Yeh, J.-H., Vachez, E., Chasson, L., Gorbatch, S., Cautres, J., and Ferrier, P. (2002). Promoter Activation by Enhancer-Dependent and -Independent Loading of Activator and Coactivator Complexes. Mol. Cell 10, 1479–1487.

Splawski, I., Yoo, D.S., Stotz, S.C., Cherry, A., Clapham, D.E., and Keating, M.T. (2006). CACNA1H Mutations in Autism Spectrum Disorders. J. Biol. Chem. 281, 22085–22091.

Splinter, E., de Wit, E., van de Werken, H.J.G., Klous, P., and de Laat, W. (2012). Determining long-range chromatin interactions for selected genomic sites using 4C-seq technology: From fixation to computation. Methods 58, 221–230.

Stefansson, H., Sigurdsson, E., Steinthorsdottir, V., Bjornsdottir, S., Sigmundsson, T., Ghosh, S., Brynjolfsson, J., Gunnarsdottir, S., Ivarsson, O., Chou, T.T., et al. (2002). Neuregulin 1 and susceptibility to schizophrenia. Am. J. Hum. Genet. 71, 877–892.

Stefansson, H., Sarginson, J., Kong, A., Yates, P., Steinthorsdottir, V., Gudfinnsson, E., Gunnarsdottir, S., Walker, N., Petursson, H., Crombie, C., et al. (2003). Association of neuregulin 1 with schizophrenia confirmed in a Scottish population. Am. J. Hum. Genet. 72, 83–87.

Stefansson, H., Ophoff, R.A., Steinberg, S., Andreassen, O.A., Cichon, S., Rujescu, D., Werge, T., Pietiläinen, O.P.H., Mors, O., Mortensen, P.B., et al. (2009). Common variants conferring risk of schizophrenia. Nature 460, 744–747.

Steinberg, S., Mors, O., Børglum, A.D., Gustafsson, O., Werge, T., Mortensen, P.B., Andreassen, O.A., Sigurdsson, E., Thorgeirsson, T.E., Böttcher, Y., et al. (2011). Expanding the range of ZNF804A variants conferring risk of psychosis. Mol. Psychiatry 16, 59–66.

Steinthorsdottir, V., Stefansson, H., Ghosh, S., Birgisdottir, B., Bjornsdottir, S., Fasquel, A.C., Olafsson, O., Stefansson, K., and Gulcher, J.R. (2004). Multiple novel transcription initiation sites for NRG1. Gene 342, 97–105.

Sullivan, P.F., Kendler, K.S., and Neale, M.C. (2003). Schizophrenia as a complex trait: evidence from a meta-analysis of twin studies. Arch. Gen. Psychiatry 60, 1187–1192.

Szymanski, M., Wang, R., Bassett, S.S., and Avramopoulos, D. (2011). Alzheimer’s risk variants in the clusterin gene are associated with alternative splicing. Transl. Psychiatry 1, e18.

Tang, Z.Z., Liang, M.C., Lu, S., Yu, D., Yu, C.Y., Yue, D.T., and Soong, T.W. (2004). Transcript Scanning Reveals Novel and Extensive Splice Variations in Human L-type Voltage-gated Calcium Channel, Cav1.2 α1 Subunit. J. Biol. Chem. 279, 44335–44343.

Ulianov, S.V., Gavrilov, A.A., and Razin, S.V. (2015). Chapter Five - Nuclear Compartments, Genome Folding, and Enhancer-Promoter Communication. In International Review of Cell and Molecular Biology, K.W. Jeon, ed. (Academic Press), pp. 183–244.

Vernimmen, D., and Bickmore, W.A. (2015). The Hierarchy of Transcriptional Activation: From Enhancer to Promoter. Trends Genet. 31, 696–708.

Wang, K.-S., Zhang, Q., Liu, X., Wu, L., and Zeng, M. (2012). PKNOX2 is associated with formal thought disorder in schizophrenia: a meta-analysis of two genome-wide association studies. J. Mol. Neurosci. MN 48, 265–272.

Welter, D., MacArthur, J., Morales, J., Burdett, T., Hall, P., Junkins, H., Klemm, A., Flicek, P., Manolio, T., Hindorff, L., et al. (2014). The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006.

Williams, H.J., Craddock, N., Russo, G., Hamshere, M.L., Moskvina, V., Dwyer, S., Smith, R.L., Green, E., Grozeva, D., Holmans, P., et al. (2011a). Most genome-wide significant susceptibility loci for schizophrenia and bipolar disorder reported to date cross-traditional diagnostic boundaries. Hum. Mol. Genet. 20, 387–391.

Williams, H.J., Norton, N., Dwyer, S., Moskvina, V., Nikolov, I., Carroll, L., Georgieva, L., Williams, N.M., Morris, D.W., Quinn, E.M., et al. (2011b). Fine mapping of ZNF804A and genome-wide significant evidence for its involvement in schizophrenia and bipolar disorder. Mol. Psychiatry 16, 429–441.

Xie, Z., Hu, S., Blackshaw, S., Zhu, H., and Qian, J. (2010). hPDI: a database of experimental human protein–DNA interactions. Bioinformatics 26, 287–289.

Zeledón, M., Eckart, N., Taub, M., Vernon, H., Szymanski, M., Wang, R., Chen, P.-L., Nestadt, G., McGrath, J.A., Sawa, A., et al. (2015). Identification and functional studies of regulatory variants responsible for the association of NRG3 with a delusion phenotype in schizophrenia. Mol. Neuropsychiatry 1, 36–46.

Zhang, Q., Shen, Q., Xu, Z., Chen, M., Cheng, L., Zhai, J., Gu, H., Bao, X., Chen, X., Wang, K., et al. (2012). The Effects of CACNA1C Gene Polymorphism on Spatial Working Memory in Both Healthy Controls and Patients with Schizophrenia or Bipolar Disorder. Neuropsychopharmacology 37, 677–684.

Zhao, H., Friedman, R.D., and Fournier, R.E.K. (2007). The Locus Control Region Activates Serpin Gene Expression through Recruitment of Liver-Specific Transcription Factors and RNA Polymerase II. Mol. Cell. Biol. 27, 5286–5295.

Cross-Disorder Group of the Psychiatric Genomics Consortium (2013). Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. The Lancet 381, 1371–1379.

The National Institute of Mental Health (2016). Schizophrenia.

Curriculum Vitae

Nicole Eckart (508) 272-0215 www.linkedin.com/in/nicoleeckart

EDUCATION

Johns Hopkins University, Baltimore, MD Ph.D. Human Genetics and Molecular Biology Expected March 2016

University of Massachusetts, Amherst, MA B.S. Biology September 2007 - May 2010

University of Ghana, Ghana, West Africa Study Abroad January - May 2009

Fairfield University, CT Biology Major September 2006 – May 2007

RESEARCH EXPERIENCE

Ph.D. Candidate August 2010 - Present Laboratory of Dimitrios Avramopoulos, M.D., Ph.D. Johns Hopkins University

• Elucidated the function of non-coding genetic variants associated with schizophrenia to interpret their contribution to disease and identify potential drug targets
• Discovered a correlation between the genotype of non-coding variants and mRNA levels in 284 post-mortem brain samples, demonstrating that non-coding variants play a role in transcriptional regulation
• Identified DNA-protein interactions between non-coding variants and transcription factor proteins, suggesting a molecular mechanism of transcriptional regulation
• Optimized the Circularized Chromatin Conformation Capture with Next-Generation Sequencing (4C-seq) protocol at the university to identify three-dimensional DNA interactions
• Appointed to train all 75 users on the BioTek Synergy2 plate reader and software, prepared quick-start documentation, and recognized as the technical expert for the instrument

Volunteer Health Educator September 2014 – June 2015 Harriet Lane Pediatric Clinic Johns Hopkins Children's Center

• Developed plans of action for over 50 adolescent patients to make healthy lifestyle changes using motivational interviewing techniques; over 10 female patients made follow-up appointments to receive long-acting reversible contraceptives
• Educated patients, their parents, and their friends about methods for reducing the risk of HIV and other sexually transmitted infections, as well as the benefits of various methods of contraception
• Earned HIV/AIDS Counseling & Testing Skills Level I Certification from the Maryland Department of Health and Mental Hygiene

Undergraduate Researcher February 2008 – May 2010 Laboratory of Ana Caicedo, Ph.D. University of Massachusetts Amherst

• Investigated the genetic and phenotypic diversity of the grass species Brachypodium distachyon, a model organism for biofuels and agricultural crops such as wheat and barley
• Demonstrated a preliminary correlation between an increased number of chromosome sets in the plant and increased biomass, which produces a more profitable crop

Undergraduate Intern June – August 2008 Criminalistics Unit Massachusetts State Police Crime Laboratory

• Optimized protocols for manual detection of biological fluids with various UV light sources
• Developed and wrote a user protocol for new software for automated detection of sperm cells
• Systematically organized case files, evidence, office, and laboratory supplies
• Ensured efficient workflow by calling investigators and attorneys to coordinate evidence processing and proper documentation

Undergraduate Researcher September – December 2007 Laboratory of Theodore Stankowich, Ph.D. University of Massachusetts Amherst

• Built chambers and objects for the novel object recognition behavioral test to determine whether behavioral differences exist between species of spiders
• Conducted the novel object recognition behavioral test for over 25 spiders by observing their behavior inside the chamber and recording time spent interacting with a novel object compared to a familiar object
• Collected spider species in the field and managed housing and feeding of approximately 25 spiders

LEADERSHIP EXPERIENCE

Organizing Member, Journeys of Women in Genetics August 2014 - Present Institute of Genetic Medicine Johns Hopkins University

• Identified, selected, and invited six female faculty members as workshop speakers to provide interesting and diverse perspectives, appealing to a broad audience of female students
• Coordinated four annual workshops and facilitated discussions between female faculty and trainees about the obstacles women in science face and possible strategies to overcome those challenges
• Successfully negotiated with and gained approval from stakeholders to begin hosting a new annual co-ed professional development event
• Marketed the new event via emails and posters, successfully attracted a full audience of about 70 trainees, and received rave reviews from participants
• Moderated a panel of four faculty members for the inaugural professional development event

Editor, Student Guidebook August 2014 - Present Institute of Genetic Medicine Johns Hopkins University

• Edited the second edition of the guidebook for content, organization, and grammar
• Authored sections on research ethics requirements, professional development plans, health insurance, taxes, and vacation time

Organizing Member, New Student Recruitment February 2011 - 2015 Institute of Genetic Medicine Johns Hopkins University

• Hosted events for recruits to meet current students and facilitated conversation

• Led a team of other graduate students, delegating tasks to ensure the recruitment activities stayed on schedule
• Coordinated with current students to provide meals and transportation for 5-15 recruits at each of four interviewing sessions each year
• Interviewed recruits to provide information about the graduate program and answer any questions

MENTORING EXPERIENCE

Mentor to Undergraduate Students June – August 2012, 2014, 2015 Department of Psychiatry Johns Hopkins University

• Designed research projects for two undergraduate students, suitable to complete within their 10-week internship; both students continued their training and were accepted into medical school
• Supervised students in basic molecular biology techniques and reinforced fundamental concepts in genetics and biology as they related to their projects
• Advised students in the development and delivery of poster and oral presentations of their results, as well as journal club presentations

Tutor to Graduate Students October 2012 – June 2014 Institute of Genetic Medicine Johns Hopkins University

• Clarified difficult concepts with five students on an individual or small-group basis and reviewed errors on homework assignments and exams for the Fundamentals of Genetics course
• Two students requested continued tutorship throughout the completion of their first- and second-year courses and oral qualifying exams
• Distilled two years of curriculum into a series of concise one-page study guides, requested and utilized by 10 students in preparation for their qualifying exams

Tutor to Elementary Students January – May 2008 Citizen Scholars Program Holyoke, MA

• Tutored children ages 5-13, assisted them with their homework assignments, and refocused their energy toward completing assignments
• Directed group play activities, encouraging positive social behaviors

PEER REVIEWED PUBLICATIONS

Eckart N, Song Q, Yang R, Wang R, Zhu H, McCallion A, Avramopoulos D. “Functional characterization of schizophrenia-associated variation in CACNA1C.” Manuscript in preparation.

Zeledón M, Eckart N, Taub M, Vernon H, Szymanski M, Wang R, et al. “Identification and functional studies of regulatory variants responsible for the association of NRG3 with a delusion phenotype in schizophrenia.” Molecular Neuropsychiatry. 2015;1:36–46.

Avramopoulos D, Pearce BD, McGrath J, Wolyniec P, Wang R, Eckart N, et al. “Infection and inflammation in schizophrenia and bipolar disorder: a genome wide study for interactions with genetic variation.” PLoS One. 2015 Mar 17;10(3).

POSTER PRESENTATIONS

N. Eckart, R. Wang, R. Yang, Q. Song, H. Zhu, D. Valle, D. Avramopoulos, Regulatory Function of Schizophrenia-Associated Variants in CACNA1C. 3rd Annual Maryland Genetics, Epidemiology, and Medicine Genetics Research Day: February, 2016 in Baltimore, MD.

N. Eckart, R. Wang, R. Yang, Q. Song, H. Zhu, D. Valle, D. Avramopoulos, Regulatory Function of Schizophrenia-Associated Variants in CACNA1C. 65th Annual Meeting of The American Society of Human Genetics: October, 2015 in Baltimore, MD.

N. Eckart, R. Wang, M. Zeledón, M. Szymanski-Pierce, D. Valle, D. Avramopoulos, Regulatory function of CACNA1C schizophrenia-associated variants. 2nd Annual Maryland Genetics, Epidemiology, and Medicine Genetics Research Day: February, 2015 in Baltimore, MD.

N. Eckart, R. Wang, M. Zeledón, M. Szymanski-Pierce, D. Valle, D. Avramopoulos, Regulatory function of CACNA1C schizophrenia-associated variants. 64th Annual Meeting of The American Society of Human Genetics: October, 2014 in San Diego, CA.

N. Eckart, R. Wang, M. Szymanski-Pierce, M. Zeledón, S. Goswami, D. Valle, et al., Regulatory Function of CACNA1C Schizophrenia-Associated Variants. 1st Annual Maryland Genetics, Epidemiology, and Medicine Genetics Research Day: February, 2014 in Baltimore, MD.

N. Eckart, R. Wang, M. Szymanski-Pierce, M. Zeledón, S. Goswami, D. Valle, et al., Regulatory Function of CACNA1C Schizophrenia-Associated Variants. Presented at the 63rd Annual Meeting of The American Society of Human Genetics: October, 2013 in Boston, MA.

M. Zeledón, M. Taub, N. Eckart, M. Beer, R. Wang, M. Szymanski, et al., Variants in NRG3 Associated with Delusion Have Regulatory Potential and Differentially Bind to Nuclear Proteins. Presented at the 63rd Annual Meeting of The American Society of Human Genetics: October, 2013 in Boston, MA.

N. Eckart, R. Wang, J. McGrath, P. Wolyniec, M. Zeledón, M. Szymanski, et al., Regulatory effects of genes associated with Schizophrenia. Presented at the 62nd Annual Meeting of The American Society of Human Genetics: November, 2012 in San Francisco, CA.

M. Zeledón, M. Taub, N. Eckart, R. Wang, M. Szymanski, P. Chen, et al., Delusion-associated SNPs in NRG3 Show Regulatory Potential, Dysregulate NRG3 Splicing and Differentially Bind to Nuclear Proteins. Presented at the 62nd Annual Meeting of The American Society of Human Genetics: November, 2012 in San Francisco, CA.

M. Zeledón, M. Taub, N. Eckart, R. Wang, M. Szymanski, P. Chen, et al., Variants in NRG3 at 10q23.1 correlated with a subtype of schizophrenia with severe delusions. Presented at the 61st Annual Meeting of The American Society of Human Genetics: October, 2011 in Montreal, Quebec.

AWARDS AND HONORS

2016: Winner of the Maryland Genetics, Epidemiology, and Medicine Genetics Research Day Predoctoral Poster Competition
2010: B.S. in Biology, Magna Cum Laude
2009 - 2010: Commonwealth College Honors Research Grant at the University of Massachusetts Amherst
2009: Walpole Chamber of Commerce Scholarship
2009: Richardi Memorial Scholarship
2009: Howard Hughes Medical Institute Research Sponsorship
2006 - 2010: Walpole Scholarship Foundation Scholarship
2006: Presidential Scholarship at Fairfield University
