Investigation of Splicing Variants Contributing Risk to Psychiatric Disorders

by

Emma Reble

A thesis submitted in conformity with the requirements for the degree of Master of Science

Institute of Medical Science University of Toronto

© Copyright by Emma Reble 2017

Investigation of Gene Splicing Variants Contributing Risk to Psychiatric Disorders

Emma Reble

Master of Science

Institute of Medical Science University of Toronto

2017

Abstract

The majority of the variants found through genome-wide association studies (GWAS) to be associated with psychiatric disorders are not found to cause a coding change; it is therefore predicted that these variants cause gene expression changes, including changes to alternative splicing. Datasets of variants associated with alternative splicing changes in the general population have also been curated with the aim of aiding the search for functional GWAS variants. This study uses these datasets to identify variants associated with alternatively spliced isoforms in schizophrenia and bipolar disorder.

Alternatively spliced isoforms in four genes (APOPT1, AS3MT, NEK4 and RPGRIP1L) were identified that converge on the mitochondria and cilia, implicating these cellular structures in psychiatric disease and providing avenues of genes and pathways for further study. A method to study alternative splicing changes in psychiatric disorders is presented that would also be a useful tool for studying other complex genetic disorders.

Table of Contents

Table of Contents
Acknowledgments
Statement of Contributions
List of Abbreviations
List of Tables
List of Figures
Chapter 1: Introduction and Background
1.1 Psychiatric Disorders and Their Genetic Contributions
1.1.1 Schizophrenia
1.1.2 Bipolar Disorder
1.1.3 Elucidating Genetic Risk for Psychiatric Disorders
1.2 Gene Splicing
1.2.1 Splicing Mechanisms
1.2.2 Types of Alternative Splicing
1.2.3 Alternative Splicing in Mendelian Disease
1.2.4 Alternative Splicing in the Brain
1.2.5 Alternative Splicing in Psychiatric Disorders
1.2.5.1 Alternative Splicing in Autism Spectrum Disorder (ASD)
1.2.5.2 Alternative Splicing in Bipolar Disorder
1.2.5.3 Alternative Splicing in Schizophrenia
1.3 Splicing Predictions and Datasets
1.3.1 SPIDEX
1.3.2 Altrans
1.3.3 sQTLSeekeR
1.3.4 Protein Truncating Variants (PTV)
1.3.5 DLPFC sQTLs
1.4 Protein Structure and Function Prediction
1.5 Hypothesis, Aims and Overview of Thesis
Chapter 2: Materials and Methods
2.1 Identification of Schizophrenia and Bipolar Disorder Candidate SNVs
2.2 Brain tissue samples
2.3 SNV Genotyping
2.4 Expression of Alternative Splicing Isoforms
2.4.1 Identification of Splicing Variants
2.4.2 Quantification of Identified Splicing Variants Using ddPCR
2.5 Predicting the Mechanism of the Splice Variants
2.6 Predicting the Effect on Protein Function
Chapter 3: Results
3.1 Splicing SNVs associated with Schizophrenia and Bipolar Disorder
3.2 Genes studied for alternative splicing
3.2.1 APOPT1
3.2.1.1 APOPT1 SNVs Associated with Alternative Splicing
3.2.1.2 Identification of Two APOPT1 Transcript Isoforms
3.2.1.3 Quantification of APOPT1 Isoforms and Association to rs4906337, rs10431750 and rs7140568
3.2.1.4 Protein Function Prediction of APOPT1 Isoforms
3.2.2 AS3MT
3.2.2.1 AS3MT SNV Predicted to Alter Splicing
3.2.2.2 Identification of Two AS3MT Transcript Isoforms
3.2.2.3 Quantification of AS3MT Isoforms and Association to rs3740400
3.2.2.4 Protein Function Prediction of AS3MT Isoforms
3.2.2.5 Hypothesis of inefficient arsenic metabolism contributing to risk of schizophrenia
3.2.3 BNIP3L
3.2.3.1 SNVs Associated with Alternative BNIP3L Transcripts
3.2.3.2 Identification of Two BNIP3L Transcript Isoforms
3.2.3.3 Quantification of BNIP3L Isoforms and Association to rs11779570, rs11786753, rs60377146 and rs11135938
3.2.3.4 Protein Function Prediction of BNIP3L Isoforms
3.2.4 CACNA1D
3.2.5 KIF21B
3.2.6 LRRC48
3.2.6.1 LRRC48 SNVs Predicted to Alter Exon 7 and 8 Splicing
3.2.6.2 LRRC48 SNV Predicted to Alter Exon 2 Splicing
3.2.6.3 Identification of Three LRRC48 Transcript Isoforms
3.2.6.4 Quantification of LRRC48 Isoforms and Association to rs11870660
3.2.6.5 Protein Function Prediction of LRRC48 Isoforms
3.2.7 NEK4
3.2.7.1 NEK4 SNVs Predicted to Alter Exon 5 Splicing
3.2.7.2 NEK4 SNVs Predicted to Alter Exon 4 Splicing
3.2.7.3 Identification of Two NEK4 Transcript Isoforms
3.2.7.4 Quantification of NEK4 Isoforms and Association to SNVs associated with NEK4 exon 4 skipping
3.2.7.5 Protein Function Prediction of NEK4 Isoforms
3.2.8 NGEF
3.2.9 PRKAG1
3.2.10 PPP1R16B
3.2.11 RPGRIP1L
3.2.11.1 RPGRIP1L SNVs Associated with Alternative Splicing
3.2.11.2 Identification of Two RPGRIP1L Transcript Isoforms
3.2.11.3 Quantification of RPGRIP1L Isoforms and Association to SNVs associated with RPGRIP1L exon 20 skipping
3.2.11.4 Protein Function Prediction of RPGRIP1L Isoforms
3.2.12 SNAP91
3.2.13 STAB1
Chapter 4: Discussion and Conclusions
4.1 Implications
4.1.1 Alternative transcripts converge on the mitochondria and the cilia
4.1.2 Combined Alternative Splicing Model of Psychiatric Risk
4.1.3 Negative Results
4.1.4 Useful method to identify splicing changes in other complex disorders
4.2 Future Directions
4.2.1 Causal SNV confirmation
4.2.2 Experimentally tested protein function
4.2.3 Combined alternative splicing and environmental effects
4.2.4 Splicing Therapies
4.3 Conclusions
Bibliography

Acknowledgments

First and foremost I would like to thank my supervisor, Dr. Cathy Barr, for giving me the opportunity to work in her lab as well as her support and guidance throughout this project. She is incredibly dedicated and passionate about science and genetics in particular and has stimulated and encouraged my own passion for research. I would also like to thank my committee members, Dr. Ana Andreazza, Dr. Jennifer Mitchell and Dr. Jeff Wrana for their feedback and insight on this project.

I would like to thank the Harvard Brain Tissue Resource, the Douglas Brain Bell Canada Brain Bank and the National Institute of Child Health and Human Development for providing the brain tissue samples. I would also like to acknowledge the support of this research by the Ontario Mental Health Foundation and my scholarship supported by the Hospital for Sick Children Research Training Centre.

Finally, I would especially like to thank all of the Barr lab members for their support and friendship throughout the past two years. Specifically, I would like to thank Karen Wigg for her help with the bioinformatics portion of this project, Yu Feng for all her help and troubleshooting in the lab, Dr. Aidan Dineen for his advice and mentorship and Kaitlyn Price for her encouragement.

Statement of Contributions

Karen Wigg was involved in the bioinformatics analyses including downloading data, making BED files and training of bioinformatics techniques. Karen Wigg and Kaitlyn Price were both involved in the preparation and analysis of the genotyping results. Yu Feng performed the brain sample DNA and RNA extractions and was involved in the training of lab techniques.

The brain tissue samples were generously received from the Harvard Brain Tissue Resource, the Douglas Brain Bell Canada Brain Bank and the National Institute of Child Health and Human Development.

This research was supported by the Ontario Mental Health Foundation and I am supported by a scholarship from the Hospital for Sick Children Research Training Centre.

List of Abbreviations

AChE: Acetylcholinesterase
ANOVA: Analysis Of Variance
AON: Antisense Oligonucleotide
APOPT1: Apoptogenic 1, mitochondrial
AS3MT: Arsenic (+3 Oxidation State) Methyltransferase
ASD: Autism Spectrum Disorder
ATM: Ataxia Telangiectasia Mutated
BD: Bipolar Disorder
BNIP3L: BCL2 Interacting Protein 3 Like
CACNA1D: Calcium Voltage-Gated Channel Subunit Alpha1 D
cDNA: Complementary Deoxyribonucleic Acid
CNV: Copy Number Variation
ddPCR: Digital Droplet Polymerase Chain Reaction
DLPFC: Dorsolateral Prefrontal Cortex
DNA: Deoxyribonucleic Acid
DRD2: Dopamine Receptor D2
eQTL: expression Quantitative Trait Loci
ESE: Exonic Splicing Enhancer
ESS: Exonic Splicing Silencer
EtBr: Ethidium Bromide
FTDP-17: Frontotemporal Dementia with Parkinsonism-17
GABA: Gamma-Aminobutyric Acid
GAD: Glutamic Acid Decarboxylase
GO: Gene Ontology
GTEx: Genotype Tissue Expression
GWAS: Genome Wide Association Study
hnRNP: Heterogeneous Nuclear Ribonucleoprotein
ISE: Intronic Splicing Enhancer
ISS: Intronic Splicing Silencer
KIF21B: Kinesin Family Member 21B
LD: Linkage Disequilibrium
LRRC48: Leucine Rich Repeat Containing 48
MAF: Minor Allele Frequency
MAPT: Microtubule-Associated Protein Tau
mRNA: Messenger Ribonucleic Acid
N-DRC: Nexin-Dynein Regulatory Complex
NEK4: Never In Mitosis Gene A Related Kinase 4
NF1: Neurofibromin 1
NGEF: Neuronal Guanine Nucleotide Exchange Factor
NOVA: Neuro-Oncological Ventral Antigen
NMD: Nonsense Mediated Decay
NRG: Neuregulin
PCR: Polymerase Chain Reaction
PPP1R16B: Protein Phosphatase 1 Regulatory Subunit 16 B
PRKAG1: Protein Kinase AMP-activated Non-Catalytic Subunit Gamma 1
PSI: Percent Spliced In
PTBP: Polypyrimidine Tract Binding Protein
PTV: Protein Truncating Variant
RIN: RNA Integrity Number
RNA: Ribonucleic Acid
ROS: Reactive Oxygen Species
RPGRIP1L: Retinitis Pigmentosa GTPase Regulator Interacting Protein 1-Like
SCZ: Schizophrenia
SNAP91: Synaptosome Associated Protein 91
snRNP: Small Nuclear Ribonucleoprotein
SNV: Single Nucleotide Variant
sQTL: splicing Quantitative Trait Loci
SR: Serine/Arginine Rich
SRRM4/nSR100: Serine/Arginine Repetitive Matrix Protein 4
STAB1: Stabilin 1
VNTR: Variable Number Tandem Repeat

List of Tables

Table 1. Demographics of individuals from whom brain sample tissues were collected.
Table 2. Schizophrenia and alternative splicing associated SNVs.
Table 3. Bipolar disorder and alternative splicing associated SNVs.
Table 4. SNVs associated with alternative splicing of APOPT1.
Table 5. SNV predicted to alter splicing of AS3MT.
Table 6. SNVs associated with alternative BNIP3L transcripts.
Table 7. SNV predicted to alter splicing of CACNA1D.
Table 8. SNV associated with alternative splicing of KIF21B.
Table 9. SNVs predicted to alter splicing of LRRC48.
Table 10. SNV predicted to alter splicing of LRRC48.
Table 11. SNVs associated with alternative splicing of NEK4.
Table 12. SNVs associated with alternative splicing of NEK4.
Table 13. Brain tissue sample genotypes of SNVs associated with alternative splicing of NEK4.
Table 14. SNV predicted to alter splicing of NGEF.
Table 15. SNVs associated with alternative splicing of PRKAG1.
Table 16. Brain tissue sample genotypes of SNVs associated with alternative splicing of PRKAG1.
Table 17. SNVs associated with alternative splicing of PPP1R16B.
Table 18. SNVs associated with alternative splicing of RPGRIP1L.
Table 19. Brain tissue sample genotypes of SNVs associated with alternative splicing of RPGRIP1L.
Table 20. SNVs associated with alternative splicing of SNAP91.
Table 21. Brain tissue sample genotypes of SNVs associated with alternative splicing of SNAP91.
Table 22. SNV predicted to alter splicing of STAB1.
Table 23. Summary of major findings: alternatively spliced transcripts associated with schizophrenia and bipolar disorder risk alleles.

List of Figures

Figure 1. Schematic of spliceosome snRNPs binding to pre-mRNA.
Figure 2. Events leading to alternative transcripts.
Figure 3. Schematic of the cDNA assay used to identify alternative splicing transcripts with exon skipping.
Figure 4. Schematic of the cDNA assay used to identify alternative splicing transcripts with intron retention.
Figure 5. Schematic of the cDNA assay used to identify alternative splicing transcripts with alternative first exons.
Figure 6. Schematic of the probe-based assay used to identify the proportion of each splicing isoform.
Figure 7. Digital Droplet PCR output of probe-based exon skipping assay.
Figure 8. Identification of two APOPT1 transcript isoforms through cDNA PCR amplification.
Figure 9. Quantification of APOPT1 transcripts in the DLPFC.
Figure 10. Quantification of APOPT1 transcripts in the hippocampus.
Figure 11. PredictProtein structural and binding site results for full-length and truncated APOPT1 protein sequences.
Figure 12. Arsenic metabolism pathway.
Figure 13. rs3740400 SpliceAid results.
Figure 14. Identification of two AS3MT transcript isoforms through cDNA PCR amplification.
Figure 15. Quantification of AS3MT transcripts in the DLPFC.
Figure 16. Quantification of AS3MT transcripts in the hippocampus.
Figure 17. AS3MT Pfam protein predictions.
Figure 18. Identification of two BNIP3L transcript isoforms through cDNA PCR amplification.
Figure 19. Quantification of BNIP3L transcripts in the DLPFC.
Figure 20. Quantification of BNIP3L transcripts in the hippocampus.
Figure 21. PredictProtein binding site results for full-length and truncated BNIP3L protein sequences.
Figure 22. No identification of exon 25 skipped CACNA1D transcript.
Figure 23. No identification of exon 28 skipped KIF21B transcript.
Figure 24. No identification of exon 7 or 8 skipped LRRC48 transcripts.
Figure 25. Identification of three LRRC48 transcript isoforms through cDNA PCR amplification.
Figure 26. Quantification of LRRC48 transcripts in the DLPFC.
Figure 27. Quantification of LRRC48 transcripts in the hippocampus.
Figure 28. No identification of exon 5 skipped NEK4 transcript.
Figure 29. Identification of two NEK4 transcript isoforms through DLPFC cDNA PCR amplification.
Figure 30. Quantification of NEK4 transcripts in the DLPFC.
Figure 31. Quantification of NEK4 transcripts in the hippocampus.
Figure 32. NEK4 Pfam protein predictions.
Figure 33. No identification of exon 2 skipped NGEF transcript.
Figure 34. No identification of PRKAG1 transcript with partial intron 2 retention.
Figure 35. No identification of exon 7 skipped PPP1R16B transcript.
Figure 36. Identification of two RPGRIP1L transcript isoforms through cDNA PCR amplification.
Figure 37. Quantification of RPGRIP1L transcripts in the DLPFC.
Figure 38. Quantification of RPGRIP1L transcripts in the hippocampus.
Figure 39. No identification of SNAP91 transcript with intron 11 retention.

Chapter 1

Introduction and Background

1.1 Psychiatric Disorders and Their Genetic Contributions

1.1.1 Schizophrenia

Schizophrenia is a chronic and devastating disorder affecting ~1.1% of the population, with little variance in prevalence worldwide (Regier et al., 1993). The onset of schizophrenia is generally just prior to or during early adulthood. The hallmarks of this disorder are abnormal social behavior and a distortion in the perception of reality (Regier et al., 1993). Symptoms are divided into four main categories: positive, negative, cognitive and mood disturbances. Positive symptoms include delusions and hallucinations; negative symptoms include a flat affect and social and emotional withdrawal; cognitive symptoms include problems with attention, memory, decision-making and executive functioning; and mood disturbances include dysphoria and depression.

The underlying mechanisms of schizophrenia are not well understood; however, a strong genetic component is observed, with family and twin studies identifying heritability estimates of ~81% (Sullivan, Kendler, & Neale, 2003). Despite known genetic involvement, the specific genes contributing to schizophrenia and their potential impact are still not well known. This lack of etiologic understanding, among other factors, has prevented the discovery of effective and personalized medicine. Current treatments largely target the positive symptoms of schizophrenia and are often accompanied by severe side effects with little impact on the other categories of detrimental symptoms (Wright, 2014). Due to the lack of effective treatment strategies for most patients, two-thirds of individuals with schizophrenia live with detrimental symptoms and 5-13% eventually die by suicide (Pompili et al., 2007; Saha, Chant, Welham, & McGrath, 2005). Taken together, these factors have caused schizophrenia to be a large social, financial and health burden to patients, families and society (Knapp, Mangalore, & Simon, 2004).

Unlike monogenic disorders where a mutation in a single gene will cause a specific phenotype, the genetic basis of schizophrenia is very complex. Variants in hundreds of genes have been identified in schizophrenia patients, suggesting that multiple genes and variants are likely to be contributing to the schizophrenia phenotype (International Schizophrenia et al., 2009; S. H. Lee et al., 2012; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). Debate exists in the field with regards to the relative involvement of rare or common variants in schizophrenia etiology. Two main theories have been suggested about the genetic architecture of schizophrenia. One proposes an extreme heterogeneity or multiple rare variant model in which different rare variants cause the same schizophrenic phenotype in different patients (McClellan & King, 2010). The other theory is a polygenic/multifactorial threshold model in which a predisposition and accumulation of both common and rare risk variants exceeds the threshold of disease for schizophrenia (Gottesman & Shields, 1967; International Schizophrenia et al., 2009). Due to the identification of many common variants associated with schizophrenia, the latter theory is becoming the most widely accepted.

With the goal to identify variants involved in psychiatric disorders, genome-wide association studies (GWAS) have been performed using large samples of psychiatric patients and controls. GWAS examine genetic variants in control and affected groups to search for associations between variants and a disorder that are more commonly seen in patients. These studies have supported the polygenic theories of psychiatric disorders (Sullivan, Daly, & O'Donovan, 2012). GWAS are responsible for identifying new genetic loci associated with psychiatric disorders, such as a breakthrough paper that recently identified 108 regions associated with schizophrenia (Ripke et al., 2013; Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium, 2011; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014).
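The case-control comparison described above can be illustrated with a minimal sketch of a single-variant allelic association test. This is only an assumption about how one such comparison might look: it computes a Pearson chi-square statistic and an odds ratio from a 2x2 table of allele counts, and it omits the covariate adjustment and multiple-testing correction that published GWAS apply. The function name and the counts in the usage example are hypothetical.

```python
def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (chi-square statistic with 1 d.f., odds ratio) for a 2x2 allele-count table.

    Rows are cases and controls; columns are the tested allele and the other allele.
    Illustrative only: assumes all four counts are positive.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of no association
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, odds_ratio


# Hypothetical counts: the tested allele is more frequent in cases than in controls
chi2, odds_ratio = allelic_chi_square(60, 40, 40, 60)
print(chi2, odds_ratio)
```

A larger statistic indicates a larger difference in allele frequency between the two groups than expected by chance; an odds ratio above 1 indicates that the tested allele is enriched in cases.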

Although some rare variants of large effect have been identified for schizophrenia, there is more evidence to support the multiple common variants theory with each variant contributing to the risk of schizophrenia (International Schizophrenia et al., 2009). An estimated 23% of variation in schizophrenia risk is thought to be caused by common single nucleotide variants (SNVs), some of which are causal risk variants (S. H. Lee et al., 2012). Therefore, variants associated with schizophrenia are not necessarily found at an extremely low frequency in the general population. Even in the case where a rare, likely causal variant has been identified in schizophrenia patients, it is now thought that common risk variants are still necessary for the disorder to occur. A recent study suggests that common risk variants are contributing to schizophrenia in patients with rare copy number variations (CNVs) (Tansey et al., 2016). Polygenic risk profile scores (RPS) were used to determine the common risk allele burden in schizophrenia cases with a rare schizophrenia-associated CNV, schizophrenia cases with no rare schizophrenia-associated CNV and healthy controls. It was found that both schizophrenia patients with and without a rare schizophrenia-associated CNV had a higher RPS than controls and also that the RPS was not significantly different between schizophrenia patients with and without a schizophrenia-associated CNV (Tansey et al., 2016). This has strong implications for the role of common variants in schizophrenia.
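A polygenic risk profile score is commonly computed as a weighted sum of risk allele counts, and the sketch below illustrates that general idea. It is an assumption for illustration and not necessarily the procedure used by Tansey et al. (2016): the SNV identifiers, the weights (taken here to be log odds ratios from a discovery GWAS) and the treatment of missing genotypes as zero risk alleles are all hypothetical simplifications.

```python
def risk_profile_score(genotypes, weights):
    """Weighted sum of risk allele counts.

    genotypes: dict mapping SNV id -> number of risk alleles carried (0, 1 or 2)
    weights:   dict mapping SNV id -> effect size, e.g. log odds ratio (assumed)
    SNVs missing from `genotypes` are counted as 0 risk alleles (a simplification).
    """
    score = 0.0
    for snv, weight in weights.items():
        score += weight * genotypes.get(snv, 0)
    return score


# Hypothetical effect sizes and one hypothetical individual
weights = {"snv_a": 0.10, "snv_b": 0.20, "snv_c": 0.05}
individual = {"snv_a": 2, "snv_b": 1}
print(risk_profile_score(individual, weights))
```

Scores computed this way can then be compared between groups, such as cases with a CNV, cases without a CNV and controls.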

1.1.2 Bipolar Disorder

Bipolar spectrum disorder is a chronic disorder affecting 2.4% of people worldwide (Merikangas et al., 2011). Bipolar disorder (BD) is characterized by disruptive changes in mood and behavior and is categorized into two subtypes: bipolar disorder I (BD-I) and bipolar disorder II (BD-II). Both subtypes involve periods of mania and depression, with BD-I involving manic episodes while BD-II involves periods of hypomania. Hypomanic episodes involve elevated moods, lack of sleep and increased motor drive, while mania encompasses these symptoms but is often more severe with psychotic symptoms present in 75% of patients with manic episodes (Goodwin, Jamison, & Ghaemi, 2007). The risk of suicide is 20 times higher in individuals with bipolar disorder, especially when the disorder is untreated, and 15-20% of people with bipolar disorder die by suicide (Gibbons, Hur, Brown, & Mann, 2009; Pompili et al., 2013; Schaffer et al., 2015). Both BD-I and BD-II affect social, cognitive and occupational functioning, can lead to hospitalization and cause a decreased quality of life (Martinez-Aran et al., 2007).

The underlying causes of bipolar disorder are not well known; treatments therefore target only acute symptoms and long-term management. Antipsychotics, mood stabilizers and anti-depressants are used for acute treatment of mania and depression; however, their effects are variable (Grande & Vieta, 2015; Vieta et al., 2010). Lithium is also used as a treatment for bipolar disorder and has generally been found to be the most effective and widely used treatment for both acute symptoms and long-term management (Miura et al., 2014). Lithium is thought to act through glycogen synthase kinase 3, cAMP response element-binding protein and Na(+)-K(+) ATPase signaling; however, its exact mechanism of action is still not known (Alda, 2015). Neither the causes of bipolar disorder nor the specific mechanism of action of its treatments are well established; therefore, more research is required.

A known genetic component has been identified for bipolar disorder with family and twin studies suggesting 77% heritability (Edvardsen et al., 2008). Similar to schizophrenia, bipolar disorder is recognized as a complex genetic disorder with multiple genes and variants that interact with each other and the environment to contribute to the bipolar disorder phenotype (Craddock & Sklar, 2013). Genome-wide association studies have been performed with genotype data from bipolar disorder patients and controls to identify variants that are more prevalent among patients with bipolar disorder and therefore may contribute risk to the disorder (D. T. Chen et al., 2013; Muhleisen et al., 2014). The largest and most recent GWAS of bipolar disorder included genotype data of over 9 million variants from 9,784 people with bipolar disorder and 30,471 controls (Hou et al., 2016). Four genetic loci were replicated that had previously been associated with bipolar disorder and two new genetic loci were identified.

1.1.3 Elucidating Genetic Risk for Psychiatric Disorders

The large number of GWAS and candidate gene studies for psychiatric disorders has identified a high degree of genetic overlap between psychiatric disorders, with five genetic loci found to be common between schizophrenia, bipolar disorder, major depressive disorder, attention deficit-hyperactivity disorder and autism spectrum disorder (Cross-Disorder Group of the Psychiatric Genomics Consortium et al., 2013). A high degree of genetic similarity has especially been found between schizophrenia and bipolar disorder. A recent study aimed at identifying potential genetic associations between bipolar disorder and the risk loci identified in the recent schizophrenia GWAS found that 22 of the schizophrenia-associated index SNVs were also nominally associated with bipolar disorder (Forstner et al., 2017; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). Psychiatric disorders clearly have a strong and overlapping genetic contribution; however, the exact mechanism by which the associated genes and variants contribute risk to these disorders is yet to be determined.

The next step in elucidating the contribution of genetic risk to psychiatric disorders is to identify the genes within these regions and further, the mechanism by which these DNA variants are contributing risk. The initial focus of GWAS variants would be to identify non-synonymous, protein-coding changing variants because the effects of these DNA changes on protein structure are more easily predicted. However, many of the GWAS variants identified in psychiatric disorders, and variants identified by GWAS in general, cannot be explained by a coding region SNV. Approximately 93% of GWAS variants are not found in the coding region of the genome (Maurano et al., 2012). For example, within the recently identified 108 loci associated with schizophrenia, only 11 regions contained a potentially causal non-synonymous, protein-coding changing variant (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). Therefore more insight is needed into the mechanisms by which these variants identified by GWAS to be associated with psychiatric disorders are related to the risk of the disorder.

1.2 Gene Splicing

1.2.1 Splicing Mechanisms

Splicing of precursor mRNA to remove introns and join exons together is necessary to create a readable transcript to be further translated into a functional RNA or protein. Splicing occurs through the spliceosome complex and requires multiple RNAs and proteins in order to function properly (Rappsilber, Ryder, Lamond, & Mann, 2002; Zhou, Licklider, Gygi, & Reed, 2002). The spliceosome and the critical consensus splice sites are the basis of the splicing reaction, and are further influenced by intronic and exonic splicing enhancers and inhibitors. The spliceosome consists of the small nuclear ribonucleoproteins (snRNPs) U1 and U2 and the U4, U5 and U6 tri-snRNP (Y. Wang et al., 2015). These snRNPs bind to the critical consensus splice sites: the 5’ splice site or splicing donor, whose consensus sequence is GU, and the 3’ splice site or splicing acceptor, whose consensus sequence is AG. The U1 snRNP binds to the 5’ splice site, which initiates splicing and triggers interactions between the U2 snRNP with the branchpoint sequence and the U4, U5 and U6 tri-snRNP with the 5’ and 3’ splice sites (Havens, Duelli, & Hastings, 2013) (Figure 1).

Figure 1. Schematic of spliceosome snRNPs binding to pre-mRNA.
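The consensus dinucleotides described above (GU at the 5’ splice site and AG at the 3’ splice site) can be checked with a short sketch. The function name and example sequences are hypothetical, and the check is deliberately minimal: it inspects only the first and last two bases of an intron and ignores the branchpoint sequence and any enhancer or silencer elements.

```python
def has_canonical_splice_sites(intron):
    """Return True if an intron sequence starts with GU and ends with AG.

    Accepts RNA (U) or DNA (T) input in either case; DNA is converted to RNA
    so that the donor site reads GU as in the text.
    """
    seq = intron.upper().replace("T", "U")
    return len(seq) >= 4 and seq.startswith("GU") and seq.endswith("AG")


# Hypothetical intron sequences
print(has_canonical_splice_sites("GUAAGUCCUUUCAG"))  # canonical GU...AG
print(has_canonical_splice_sites("AUAAGUCCAC"))      # lacks both consensus dinucleotides
```

A variant that changes either dinucleotide would fail this check, which is one simple way a cis-acting variant could be flagged as potentially disrupting a consensus splice site.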

Proper binding of the snRNPs to the 5’ splice site, branchpoint sequence and 3’ splice site is necessary for proper splicing to occur (Havens et al., 2013). The resulting sequence of exons within the mature mRNA transcript after splicing will be determined by the interactions between the cis- and trans-acting elements, which either promote or inhibit interactions between the spliceosome and weak splice sites. Two families of RNA-binding proteins have been identified that act as trans-acting elements and bind to cis-elements of pre-mRNA transcripts to determine the splicing pattern (He & Smith, 2009; Long & Caceres, 2009). Positive cis-acting elements such as exonic splicing enhancers (ESEs) and intronic splicing enhancers (ISEs) interact with positive trans-acting elements, such as serine/arginine-rich (SR) proteins to promote interactions between the spliceosome and weak splice sites. Negative cis-acting elements such as exonic splicing silencers (ESSs) and intronic splicing silencers (ISSs) interact with negative trans-acting elements such as heterogeneous nuclear ribonucleoproteins (hnRNPs) to inhibit interactions between the spliceosome and weak splice sites and generally promote exon exclusion or skipping events. SR proteins generally antagonize the function of hnRNPs and result in exon inclusion, however some studies have shown that SR proteins regulate exon skipping (Shen & Mattox, 2012). Changes in binding site sequences can affect which splice sites are used and can change the resulting mRNA sequence.

1.2.2 Types of Alternative Splicing

Multiple forms of alternative splicing have been identified that create structurally and functionally different proteins from the same pre-mRNA transcript. More than 90% of all human multi-exon genes are predicted to be affected by alternative splicing of mRNA (Pan, Shai, Lee, Frey, & Blencowe, 2008; E. T. Wang et al., 2008). The most common form of alternative splicing is exon skipping, which accounts for around one third of alternative splicing events in humans (Blencowe, 2006). Other forms of alternative splicing include mutually exclusive exons, intron retention and alternative 5’ or 3’ splice sites. Alternative transcripts may also arise through the use of alternative promoters/first exons and alternative poly A sites/terminal exons (Figure 2).


Figure 2. Events leading to alternative transcripts.

Alternative splicing may even result in very small changes to the protein sequence, such as the case with NAGNAG acceptors that cause only one codon change depending on which splice site acceptor is used and are found in approximately 30% of human genes (Hiller et al., 2004). These small protein changes have the potential to cause very large consequences and be involved in disease. For example, an SNV in the SFTPA2 gene, rs1650232, creates an NAGNAG splice acceptor and the addition of a codon into the coding sequence of the surfactant protein A2, and has been found to be associated with risk for respiratory diseases (Azad et al., 2013).
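The NAGNAG pattern can be made concrete with a short motif scan. This sketch assumes only the sequence pattern itself (any base, then AG, then any base, then AG) and uses a hypothetical function name and hypothetical input sequences. A real tandem acceptor must also sit at an intron-exon boundary, which this scan does not check, so its hits are candidates only.

```python
import re

# Lookahead so that overlapping occurrences are all reported
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(sequence):
    """Return (0-based position, motif) for every NAGNAG occurrence in a DNA or RNA sequence."""
    seq = sequence.upper().replace("U", "T")
    return [(match.start(), match.group(1)) for match in NAGNAG.finditer(seq)]


# Hypothetical sequence around a candidate acceptor site
print(find_nagnag("TTTCAGCAGGT"))
```

Because the two AG dinucleotides are three bases apart, choosing one acceptor over the other shifts the exon boundary by exactly one codon, which is why such sites change the protein by a single amino acid.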

This study will focus specifically on cis-acting splice variants, which refer to variants within a gene that affect the splicing of its pre-mRNA transcript. Both exonic and intronic variants may affect splicing in this way. 25% of exonic variants are predicted to alter splicing, including silent exonic variants that have been previously overlooked but are now predicted to cause splicing changes as well as missense exonic variants previously thought to be involved in disease by causing an amino acid change but may in fact contribute to disease by altering splicing (Lim, Ferraris, Filloux, Raphael, & Fairbrother, 2011; Sterne-Weiler, Howard, Mort, Cooper, & Sanford, 2011). Intronic splice site variants are also estimated to account for ~15% of disease variants (Krawczak, Reiss, & Cooper, 1992). Both intronic and exonic variants taken together have been estimated to account for over 30% of all disease causing variants (Lim et al., 2011). In cases where DNA and RNA sequences have been used to identify the mechanism of disease-causing variants, 50% of variants have been found to act by affecting splicing (Ars et al., 2000; Teraoka et al., 1999). Furthermore, a model predicting the effect of disease-causing mutations to either affect splicing or alter the coding sequence found that ~60% of mutations are predicted to be splicing mutations (Lopez-Bigas, Audit, Ouzounis, Parra, & Guigo, 2005). The exact prevalence of disease-causing variants that affect splicing is unknown, however the multiple lines of evidence discussed above suggest that it is likely much higher than previously thought. Along with disease-causing variants that cause splicing changes, common variants that alter the splicing pattern and efficiency may also affect predisposition to disease and disease severity.

1.2.3 Alternative Splicing in Mendelian Disease

Disruption of splicing regulation has been identified in many genetic disorders, and has been best characterized in Mendelian diseases. Around 50% of the mutations in the ATM and NF1 genes that cause ataxia-telangiectasia and neurofibromatosis type 1, respectively, are mutations that cause aberrant splicing (Ars et al., 2000; Teraoka et al., 1999). For these disorders, the sequencing of DNA and mRNA from patients has been essential in determining disease-causing splicing mutations (Cartegni, Chew, & Krainer, 2002). In frontotemporal dementia with Parkinsonism linked to chromosome 17 (FTDP-17), cis-acting mutations that affect the splicing of MAPT have been identified as causal; these mutations alter the ratio of three-repeat to four-repeat isoforms, which is determined by the exclusion or inclusion of exon 10 (Buee, Bussiere, Buee-Scherrer, Delacourte, & Hof, 2000; V. M. Lee, Goedert, & Trojanowski, 2001). As well, mutations in the dystrophin gene that cause inappropriate splicing through exon skipping or disruption of splicing enhancers have been found to cause Duchenne muscular dystrophy (Ahn & Kunkel, 1993).

1.2.4 Alternative Splicing in the Brain

Alternative splicing occurs more frequently in the brain than in any other tissue (Blencowe, 2006; Yeo, Holste, Kreiman, & Burge, 2004). For example, Neurexin genes, which are important for synaptic function, have been found to have almost 3000 different isoforms due to alternative splicing (Missler & Sudhof, 1998). Alternative splicing can also be altered by changes to the environment or exposure to stimuli. For example, forced swimming stress in mice and inhibition of the acetylcholinesterase (AChE) protein have both been found to increase the AChE transcript containing intron 4 (Kaufer, Friedman, Seidman, & Soreq, 1998).

Alternative splicing is also highly regulated by tissue type- and development-specific splicing factors, such as the neuronal splicing factors neuro-oncological ventral antigen 1 (NOVA1) and neuro-oncological ventral antigen 2 (NOVA2), which bind YCAY (Y = C or U) elements (Ule et al., 2005). Mouse knockouts show important roles for these genes in motor neuron function and brain development. NOVA2 knockout mice show mislocalization of neurons from cortical layers II, III and IV to lower layers (Yano, Hayakawa-Yano, Mele, & Darnell, 2010). Different polypyrimidine tract binding proteins (PTBPs) are also found to be brain- or development-specific, with PTBP1 expressed in neural stem and progenitor cells but not in mature neurons, PTBP2 expressed in neurons, and PTBP3 not expressed in neurons or neural cell types (Spellman et al., 2005). PTBP1 has also been found to play a large role in preventing neuronal differentiation: knockdown of PTBP1 in many cell types, including HeLa cells, human embryonic carcinoma cells, human retinal epithelial cells, primary mouse embryo fibroblasts and mouse neural progenitor cells, results in the cells exhibiting characteristics of neuronal cells such as neurite outgrowth and expression of neuronal markers (Xue et al., 2013). Knockdown of PTBP1 also caused upregulation of PTBP2, which is generally targeted for nonsense-mediated decay (NMD) due to exon 10 skipping in the presence of PTBP1 (Xue et al., 2013).
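As an illustration of the YCAY binding element described above (Y = C or U), the following minimal sketch scans an RNA sequence for candidate sites. The function name and the example sequence are hypothetical and are not taken from the cited studies; a match indicates only that the motif is present, not that NOVA binding occurs.

```python
import re

# Y = C or U in RNA. The lookahead lets overlapping sites be reported,
# since YCAY elements can share a nucleotide (e.g. UCAUCAU).
YCAY = re.compile(r"(?=([CU]CA[CU]))")


def find_ycay_sites(rna: str) -> list[int]:
    """Return the 0-based start position of every YCAY element."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [match.start() for match in YCAY.finditer(rna)]


# Hypothetical sequence, used only to show the call.
print(find_ycay_sites("GGUCAUCACAA"))
```

Clusters of closely spaced sites could be found by grouping the returned positions, which is left out here to keep the sketch short.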

Serine/arginine repetitive matrix protein 4 (SRRM4 or nSR100) is another brain specific splicing factor that has been found to play a role in neuronal fate and differentiation. Knockdown of this protein has been reported to inhibit neuronal differentiation and disrupt cortical layering, neurite outgrowth and axon guidance (Quesnel-Vallieres, Irimia, Cordes, & Blencowe, 2015; Raj et al., 2011). nSR100 has also been found to regulate neural microexons, which are 3 to 27 nucleotides in length (Quesnel-Vallieres et al., 2015).

The RBFOX family of RNA-binding proteins, consisting of RBFOX1, RBFOX2, and RBFOX3, are important regulators of splicing in the brain and are expressed at different developmental stages in different neuronal cell types (McKee et al., 2005). The expression pattern is complex, with different family members expressed in specific neural cell types in early development, differentiation and migration.

The large number of brain-specific splicing regulators and their complex expression patterns suggest that alternative splicing in the brain is highly regulated, and that even small changes to the splicing code and its regulators can lead to dysfunction and disease.

1.2.5 Alternative Splicing in Psychiatric Disorders

Due to the lack of causal and protein-altering variants in psychiatric disorders, the high frequency of intronic variants identified by GWAS in psychiatric disorders and the evidence of splicing abnormalities in other neurological disorders, it is predicted that the vast majority of risk variants underlying psychiatric disorders change gene expression, specifically transcription and splicing. Unlike in Mendelian disorders, the role of splicing in complex disorders, such as neurodevelopmental and psychiatric disorders, is much less well understood. However, changes in splicing have been identified in tissues from individuals with schizophrenia, Autism Spectrum Disorder (ASD) and bipolar disorder. These include global changes in alternatively spliced transcripts reported in comparisons of the transcriptomes of patients and controls, genetic variants associated with quantitative changes in transcript isoforms (splicing quantitative trait loci or sQTLs) that are enriched in loci associated with psychiatric disorders, and alterations in splicing networks due to trans-acting splicing factors that are altered in patients with psychiatric disorders. Although aberrant splicing has been identified in psychiatric disorders, there is a lack of information about the mechanism of these splicing disruptions and how they may relate to disease risk and progression (Black & Grabowski, 2003; Clinton, Haroutunian, Davis, & Meador-Woodruff, 2003).

1.2.5.1 Alternative Splicing in Autism Spectrum Disorder (ASD)

Evidence of altered gene expression of splicing regulators and differentially spliced genes has been identified in tissues from patients with ASD. In a study of gene expression in RNA isolated from blood samples from boys with ASD compared to typically developing boys, 53 genes were identified to have differential exon usage (Stamova et al., 2013). Multiple splicing regulators were identified among the 53 differentially spliced genes including SFPQ (splicing factor proline/glutamine rich), SRPK1 (serine/arginine-rich splicing factor kinase 1), SRSF11 (serine/arginine-rich splicing factor 11), SRSF2IP (splicing factor, arginine/serine-rich 2-interacting protein), FUS (fused in sarcoma), and LSM14A (SCD6 homolog A-yeast) (Stamova et al., 2013).

Alterations of gene expression of the RBFOX family of splicing regulators affecting splicing networks have also been implicated in ASD. A reduction in the transcript levels of the RBFOX family of RNA-binding splicing regulators has been identified in brain tissues from patients with ASD, with these tissues also having a misregulation of 5-10% of alternative exons (Voineagu et al., 2011; Weyn-Vanhentenryck et al., 2014). Comparison of the RBFOX targets to the Simons Foundation Autism Research Initiative (SFARI) database identified 48 RBFOX targets that were previously identified as ASD candidate genes, suggesting that the RBFOX family may contribute to ASD through the dysregulation of these 48 target genes (Weyn-Vanhentenryck et al., 2014). The RBFOX genes RBFOX1 and RBFOX3 have also previously been implicated in ASD as well as other neural disorders including epilepsy, Rolandic epilepsy, intellectual disabilities, attention deficit-hyperactivity disorder and schizophrenia (Bhalla et al., 2004; Elia et al., 2010; Hamshere et al., 2009; Lal et al., 2013; Martin et al., 2007; Sebat et al., 2007; B. Xu et al., 2008). This further implicates the RBFOX family of proteins and their targets in the pathogenicity of ASD, language, cognition and psychiatric disorders.

Similar to the reduction of the RBFOX family, a reduction in the transcript levels of the neural specific nSR100/SRRM4 protein has been identified in one third of individuals with idiopathic ASD, with tissues from those patients with reduced nSR100/SRRM4 having dysregulation of splicing of 30-40% of brain-specific microexons (Irimia et al., 2014). Mice with expression of 50% of wild-type levels of nSR100/SRRM4 due to a frameshifting deletion of nSR100/SRRM4 on one chromosome were found to have altered synaptic transmission and neuronal excitability, with an increase in glutamatergic synapses and a decrease in GABAergic synapses (Quesnel-Vallieres et al., 2016). These mice also demonstrated ASD behavioural features such as sensitivity to different environmental stimuli, including a decrease in pre-pulse inhibition to the startle response which is a characteristic of ASD, as well as deficits in social behaviour, which was found to be more profound in male mice compared to female mice, also representative of the higher prevalence of ASD in males (Quesnel-Vallieres et al., 2016). nSR100 function was found to be dependent on neuronal activity, with depolarization of primary neurons by KCl treatment causing a decrease in nSR100/SRRM4 protein expression leading to an increase in microexon skipping (Quesnel-Vallieres et al., 2016). This suggests a potential link between increased neuronal activity in ASD with a decrease in nSR100 and an increase in microexon skipping.

An expanded sample of post-mortem cortex tissues from 48 ASD patients and 49 controls confirmed altered differential splicing in the cortex, with 1,127 splicing events in 833 genes identified (Parikshak et al., 2016). Most of these events involved exclusion of neuron-specific exons and were enriched in genes previously shown to be regulated by neuronal activity. Differential splicing was correlated with differential expression of brain-specific splicing regulators including RBFOX1, nSR100, and PTBP1.

These studies in total point to a major role of altered expression of splicing regulators in ASD with the consequence of altered exon usage in the brain resulting in expression changes in neuronal genes.

1.2.5.2 Alternative Splicing in Bipolar Disorder

Splice variants have been identified for a number of risk genes for bipolar disorder. For example, associated genetic variants in the ankyrin-3 gene (ANK3) have been identified by GWAS of bipolar disorder, and associated markers in this gene have been consistently replicated (Ferreira et al., 2008; Muhleisen et al., 2014). Association with schizophrenia has also been reported, although with less significant results (Nie et al., 2015). One of these variants, rs41283526, was recently identified to be located in a splice site of a small 54 nucleotide exon of ANK3, with the minor allele of this variant creating a loss-of-function site and lack of splicing of this small exon into the ANK3 transcript (Hughes et al., 2016). This minor allele and exon skipping event are thought to have protective effects against bipolar disorder and schizophrenia (Hughes et al., 2016).

Neuregulin 3 (NRG3) has also been implicated in schizophrenia phenotypes, including delusion severity in schizophrenia, and bipolar disorder (Kao et al., 2010; Meier et al., 2013). NRG3 has many different transcript isoforms that can be divided into classes I-IV based on exon homology. Paterson et al. (2017) looked at the expression of each of the NRG3 classes in the brain throughout development as well as in association with bipolar disorder. In the dorsolateral prefrontal cortex (DLPFC), NRG3 class I was most highly expressed during the neonatal and infancy periods, classes II and IV were most highly expressed during the fetal and neonatal periods and class III was consistently expressed throughout development (Paterson et al., 2017). NRG3 classes I and II were more highly expressed in bipolar disorder patients than controls, and an increase in NRG3 classes II and III expression was found to be associated with the T allele of rs10748842 and the G allele of rs6584400, both of which were associated with bipolar disorder and schizophrenia (Kao et al., 2010; Paterson et al., 2017).


1.2.5.3 Alternative Splicing in Schizophrenia

Global changes in alternative splicing have been identified in schizophrenia patients compared to controls. One study examined the level of exon expression in blood and brain samples from schizophrenia patients and controls. The genes containing exons that were differentially expressed between patients and controls in both blood and brain tissue were then compared to genotype data from these samples, and associations between SNVs and differential exon expression were identified in four genes (Oldmeadow et al., 2014). Previous studies have identified changes in gene expression based on schizophrenia risk alleles that affect splicing, such as a SNV in Neuregulin 1 (NRG1) that was associated with lower levels of the NRG1-IVNV isoform in the cortex and a SNV that caused lower levels of a DLG1 isoform in early-onset schizophrenia patients (Paterson, Wang, Kleinman, & Law, 2014; Uezato et al., 2015). Kao et al. (2010) found that NRG3 classes I and IV were more highly expressed in the DLPFC of schizophrenia patients compared to controls. Transcript isoforms of ERBB4 that include exon 16 and exon 26 have been reported to be increased in the DLPFC of schizophrenia patients compared to controls, with different schizophrenia risk SNVs associated with each of these changes (Law, Kleinman, Weinberger, & Weickert, 2007). Another study identified over 1000 genes that show differential splicing and gene expression in schizophrenia patients compared to controls (Wu et al., 2012).

For a few associated variants, molecular studies have sought to identify the functional mechanism by which variants alter splicing. The dopamine receptor D2 (DRD2) gene has long been studied as a risk gene for psychiatric disorders and large GWAS have identified significant markers in the proximity of the DRD2 gene associated with schizophrenia (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). A DRD2 rs1076560(T) variant located in intron 6 was recently found to be associated with alternative splicing of exon 6 (short isoform) and schizophrenia risk in the Han Chinese population (Cohen et al., 2016). Using oligonucleotides carrying the variant, it was also determined that this variant has decreased binding affinity for an alternative splicing regulator, Zinc Finger RAN-Binding Domain Containing 2 (ZRANB2), and that it was associated with a decreased ratio of short:long DRD2 isoforms in cell culture in a minigene assay with co-expression of ZRANB2. Further, altered ratios of DRD2 isoforms were observed in brain tissue samples, with the GG homozygous genotype group of rs1076560 associated with a significant increase in the short:total DRD2 mRNA expression ratio relative to those with the GT or TT genotypes (Cohen et al., 2016).

The molecular mechanism behind the second most highly schizophrenia-associated region identified by the recent schizophrenia GWAS, the 10q24.32 region, has also been extensively studied (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). Recently, a group investigating this region identified a novel AS3MT transcript with exon 2 and 3 skipping that is associated with schizophrenia and is most highly expressed in the brain (Li et al., 2016). This novel transcript was found to be associated with the three repeat major allele of a variable number tandem repeat (VNTR) in the 5’UTR of AS3MT. However, the mechanism by which the three repeat VNTR could cause this double exon skipping is not known, and the splicing change could instead be due to another variant in linkage disequilibrium (LD) with this VNTR. Thus, this alternatively spliced transcript is implicated in schizophrenia but the variant causing this splicing change is still unknown.

The synthesis of the inhibitory neurotransmitter γ-aminobutyric acid (GABA) is regulated by glutamic acid decarboxylase (GAD) genes. Alternative transcripts of the GAD2 gene have been identified in the human brain, including a full length GAD2 transcript coding for the 65kDa GAD enzyme and a truncated GAD2 transcript that lacks an in-frame stop codon and may be subject to NMD (Davis et al., 2016). The transcripts were shown to be differentially expressed across the lifespan with the truncated transcript being a fetal predominant transcript. The full length GAD2 transcript was found to have decreased expression in DLPFC brain tissues from schizophrenia and bipolar disorder patients compared to controls, while the truncated GAD2 transcript was found to be increased in bipolar disorder patients and decreased in schizophrenia patients’ tissues compared to controls (Davis et al., 2016). No differences were observed in the DLPFC from depression samples (Davis et al., 2016). Expression of the full length transcript was also correlated with nicotine exposure and suicide completion in schizophrenia samples, potentially complicating correlations with disease (Davis et al., 2016).

1.3 Splicing Predictions and Datasets

Due to the identification of cis- and trans-acting factors that contribute to splicing, multiple tools have been developed with the aim of predicting splicing patterns and differential splice sites. Two main methods exist for identifying sQTLs. The first method is computational: a model is trained on a framework of known splicing motifs so that, given an input sequence, it can extract splicing features and predict how that sequence will splice (Z. Wang & Burge, 2008). For these methods, known splicing features, including splice sites, branchpoint sequences, ISEs, ESEs, ISSs and ESSs, are used by the model to predict splicing changes. However, the predicted splicing changes may not necessarily have been observed in the human population.

The second method involves identifying quantitative changes in gene structure that are correlated with genetic variation. Large, genotyped RNA-seq datasets now allow quantitative changes in alternative splicing to be correlated with genetic variation. The most ambitious of these is the Genotype-Tissue Expression (GTEx) project (https://gtexportal.org/home/), which is creating a database of RNAseq and genotype data using multiple tissues from hundreds of individuals. The project has developed analysis methods and is a resource for studying the relationship of genetic variation with gene regulation. It can be used to identify associations between genetic variants and quantitative changes in gene expression (expression quantitative trait loci or eQTLs) and associations between genetic variants and quantitative changes in transcript isoforms (splicing quantitative trait loci or sQTLs) (GTEx Consortium, 2015). The project arose from the need to understand the relationship between gene expression and genetic variants, given that the majority of GWAS variants are not found to cause a protein codon change. The GTEx project aimed to create these datasets and methods to identify eQTLs and sQTLs that are associated with complex genetic disorders, and that therefore may be contributing to risk for these disorders through their change in gene expression or alternative transcript usage. Multiple pipelines have been created using the GTEx data to identify sQTLs, including Altrans, sQTLSeekeR and protein truncating variants (PTVs).

The CommonMind Consortium is another effort to understand gene expression changes associated with genetic variation specifically in the brain. Data from over 1,000 post-mortem brain samples from individuals with schizophrenia, bipolar disorder and no known psychiatric disorder have been collected by the CommonMind Consortium (https://www.synapse.org//#!Synapse:syn2759792/wiki/69613). This includes data from multiple brain regions. Genotype and RNAseq data from the DLPFC of the CommonMind Consortium samples have been used to identify sQTLs (Takata, Matsumoto, & Kato, 2017).

Specific models using these methods and their datasets are discussed here and their predictions will be used to identify SNVs affecting splicing associated with schizophrenia and bipolar disorder.

1.3.1 SPIDEX

Xiong et al. (2015) created a computational model of the splicing code and identified 20,813 out of 658,420 SNVs tested that influence splicing. Their model can predict the splicing regulation of any triplet exon sequence. First, they used 10,689 exons to train the model to determine the percent spliced in (PSI) of the central exon in a triplet of exons. The ability of the model to detect differences in splicing based on different genotypes was also tested, and the model was found to correctly predict the direction of the PSI change in 73% of pairs of individuals with differing SNVs.

To determine the effect of both common and rare SNVs on splicing regulation, the model was used to compute the PSI of the sequence with and without the SNV for 16 tissues. The tissue with the largest difference in PSI was then studied for the potential effect. This method identified multiple exonic (missense, nonsense and synonymous) SNVs, intronic SNVs close to splice sites and even intronic SNVs that are over 30 nucleotides away from splice sites but that are still predicted to cause splicing changes. These intronic SNVs more than 30 nucleotides from a splice site were predicted to disrupt splicing regulation 9 times as much as common SNVs in the same region (Xiong et al., 2015).
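The PSI comparison described above can be sketched as follows. PSI is shown here in its conventional read-based form (the percentage of transcripts that include the central exon), whereas the model of Xiong et al. (2015) predicts PSI from sequence. The function names and the per-tissue values in the usage note are hypothetical and serve only to illustrate how the tissue with the largest change would be selected.

```python
def psi(inclusion_reads: float, exclusion_reads: float) -> float:
    """Percent spliced in: share of transcripts that include the central exon."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads support either isoform")
    return 100.0 * inclusion_reads / total


def largest_delta_psi(
    ref_psi: dict[str, float], alt_psi: dict[str, float]
) -> tuple[str, float]:
    """Return (tissue, delta PSI) for the tissue with the largest absolute change.

    ref_psi and alt_psi map tissue name to PSI without and with the SNV.
    """
    deltas = {tissue: alt_psi[tissue] - ref_psi[tissue] for tissue in ref_psi}
    top = max(deltas, key=lambda tissue: abs(deltas[tissue]))
    return top, deltas[top]
```

For example, with made-up values, `largest_delta_psi({"brain": 80.0, "lung": 60.0}, {"brain": 65.0, "lung": 58.0})` selects the brain, where the variant allele lowers PSI by 15 percentage points.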

1.3.2 Altrans

Altrans identifies sQTLs by identifying SNVs that may affect splicing through their association with differences in exon junction expression levels (Ongen & Dermitzakis, 2015). Altrans analyzes the +/-1 Mb region around the transcription start site (TSS) to identify novel and annotated splicing events that result from changes in splice junction usage (GTEx Consortium, 2015).

A study performed by the GTEx Consortium using the Altrans method and RNA sequencing data from nine tissues (subcutaneous adipose, tibial artery, left ventricle of the heart, lung, skeletal muscle, tibial nerve, sun-exposed skin, thyroid and whole blood) identified ~1900 genes with sQTLs per tissue (GTEx Consortium, 2015). 80% of the sQTLs identified by Altrans are exon-skipping events (GTEx Consortium, 2015).

1.3.3 sQTLSeekeR

sQTLSeekeR identifies sQTLs by identifying SNVs that may affect splicing through their association with differences in the relative abundance of transcript isoforms (Monlong, Calvo, Ferreira, & Guigo, 2014). It is implemented in an R package that computes the variability in splicing ratios of a gene using a distance-based approach and computes the genotype variability of the splicing ratios to test the association between a SNV and a gene (Monlong et al., 2014). The transcripts with the highest expression levels are identified for the SNVs predicted to affect splicing. sQTLSeekeR analyzes annotated transcript isoforms, analyzes SNVs within the body of a gene (+/- 5kb) and can detect any variation in the relative abundance of transcript isoforms (Monlong et al., 2014). It is predicted that changes in the relative abundance of transcript isoforms are likely to be most detrimental for genes with many splice isoforms, for which a small change in the ratio of isoforms may cause a significant disruption (Monlong et al., 2014).
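The idea of comparing splicing ratios with a distance can be illustrated with a minimal sketch. It converts isoform expression values for a gene into relative abundances and compares two such profiles. The Hellinger distance is used here as one common choice for comparing compositions; this is an assumption made for illustration, and the package documentation should be consulted for the statistic it actually applies. The function names are hypothetical.

```python
import math


def splicing_ratios(isoform_expr: dict[str, float]) -> dict[str, float]:
    """Convert isoform expression values of one gene into relative abundances."""
    total = sum(isoform_expr.values())
    if total == 0:
        raise ValueError("gene is not expressed")
    return {isoform: value / total for isoform, value in isoform_expr.items()}


def hellinger(p: dict[str, float], q: dict[str, float]) -> float:
    """Hellinger distance between two splicing-ratio profiles (0 = identical, 1 = disjoint)."""
    isoforms = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(i, 0.0)) - math.sqrt(q.get(i, 0.0))) ** 2 for i in isoforms
    )
    return math.sqrt(squared / 2.0)
```

In an sQTL setting, profiles such as these would be computed per individual and the distances between genotype groups compared to the distances within groups.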

A study performed by the GTEx Consortium using the sQTLSeekeR method and RNA sequencing data from nine tissues (subcutaneous adipose, tibial artery, left ventricle of the heart, lung, skeletal muscle, tibial nerve, sun-exposed skin, thyroid and whole blood) identified ~250 genes with sQTLs per tissue (GTEx Consortium, 2015). 60% of these sQTLs detected using sQTLSeekeR were complex splice events and were mostly mutually exclusive exons (GTEx Consortium, 2015).

1.3.4 Protein Truncating Variants (PTV)

Many protein truncating variants (PTVs), which cause a shorter than normal protein product, have been found to be disease causing. Rivas et al. (2015) developed a predictive model to determine the effects of PTVs on transcripts. They tested their model using DNA and mRNA sequences from the GTEx study and identified 16,286 variants that may cause protein truncation (Rivas et al., 2015). In their study, it was found that transcripts containing common PTVs were more weakly expressed, and more tissue-specific in their expression, than transcripts that did not contain common PTVs.

As well as resulting in truncated proteins, transcripts with PTVs have also been predicted to be degraded through nonsense-mediated mRNA decay (NMD) (Chang, Imam, & Wilkinson, 2007). NMD was generally thought to function to remove transcripts encoding nonfunctional truncated proteins that could accumulate toxically (Neu-Yilik, Gehring, Hentze, & Kulozik, 2004). It has now been shown that alternative splicing naturally creates mRNA that will be targeted for NMD as a form of post-transcriptional regulation (Hillman, Green, & Brenner, 2004; Lewis, Green, & Brenner, 2003; Neu-Yilik et al., 2004). In the Rivas et al. (2015) study, it was found that NMD occurred more often for transcripts with rare PTVs than common PTVs, with NMD occurring 54.3% and 35.7% of the time for rare and common PTVs respectively. Tissue specific differences in NMD due to PTVs were also identified (Rivas et al., 2015). Therefore, common PTVs predicted by Rivas et al. (2015) may be detected through cDNA expression.

1.3.5 DLPFC sQTLs

Takata, Matsumoto & Kato (2017) used CommonMind Consortium genotype and RNAseq data from the DLPFC of 206 control samples to identify DLPFC sQTLs. The vast-tools software package was used on the RNAseq data to identify 102,469 alternative splicing events including exon skipping, intron retention and alternative splice site usage. These alternative splicing events were then correlated to genotype and 8,966 sQTLs in 1,341 genes were identified. These sQTLs were found to be enriched in exonic and H3K4me3 regions as well as among GWAS disease-associated regions (Takata et al., 2017).

1.4 Protein Structure and Function Prediction

Identifying protein structure and function of novel proteins arising from alternatively spliced transcripts has become a large and necessary challenge as the increase in high-throughput genomic sequencing has identified more predicted protein sequences. Around 40% of proteins in the NCBI database do not have an annotated function (Bernardes & Pedreira, 2013). The first protein prediction approaches involved comparative or homology modeling, in which the structure and function of a novel protein are inferred from sequence similarity to previously annotated proteins. These methods use sequence similarity between an inputted or target protein sequence and protein sequences already found and annotated in the Protein Data Bank (Q. Xu et al., 2008). If the inputted or target sequence aligns to homologous proteins in the Protein Data Bank, structural information can be inferred about the target protein based on what is known about the homologous proteins, which can further lead to information about function and localization of the target protein. Comparative modeling only aligns sequences of homologous proteins and relies on sequence similarity of at least 30% (Khor, Tye, Lim, & Choong, 2015). This type of method only works if the protein is conserved and if there is already information about this protein or a similar protein in protein databases. It is therefore not ideal for novel proteins and will likely provide little information about their domains, structure and potential function.
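The 30% threshold mentioned above can be made concrete with a small sketch that computes percent identity from a pairwise alignment. This assumes the two sequences have already been aligned (with "-" marking gaps) and that identity is measured over ungapped columns only; other conventions for the denominator exist, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * matches / compared


# Made-up aligned fragments, used only to show the call.
identity = percent_identity("MKT-AY", "MRTLAY")
print(identity, identity >= 30.0)
```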

Thus, more sophisticated computational approaches have been developed that use i) local sequence similarities or motif-based methods, ii) 3D structure-based methods and iii) genomic context- and network-based methods.

Motif and domain-based methods compare local sequence similarities of a target protein to identify potential known domains or motifs, which can help provide information about function. Domains are sequences that are known to fold into a certain structure and are often functionally annotated, therefore identifying known domains in the target sequence can provide information about the function of the protein. Motifs are shorter sequences that can provide information about catalytic sites, binding sites and potentially localization sequences.

3D protein structure-based prediction methods, namely protein threading and ab initio methods, have become more popular as they can be used for protein sequences for which homology based methods do not provide much information. Protein threading methods compare an inputted protein sequence to structural information found in protein templates in the Protein Data Bank, such as similar fold or structural motifs (Q. Xu et al., 2008). Protein threading uses secondary structure information, sequence profile based on homologous proteins, pairwise interactions between nearby residues and solvent accessibility. The main goal of protein threading is to correctly align structural similarities of the target protein to the template protein. PSI-BLAST and ClustalW are often used to generate sequence profiles of homologous proteins similar to the target protein sequence.

Genomic context and network-based methods use high-throughput gene expression and protein interaction data sets. Genes that are co-expressed may have similar functions or be involved in similar pathways, therefore identifying proteins of known function that are expressed with a protein of unknown function can help to infer the unknown protein’s function. As well, identifying proteins that a protein of unknown function interacts with can be important in identifying its potential function. These methods assume that proteins that are co-expressed or interact with each other are likely to function similarly.

I-TASSER, a protein prediction approach developed by Zhang (2009), uses a combination of these protein function prediction methods to provide an in-depth analysis of the potential function of an input amino acid sequence. The I-TASSER pipeline consists of i) identification of protein threading templates, ii) assembly of simulated structures, iii) selection and refinement of model and iv) functional annotation based on identified structures (Yang et al., 2015). The first step uses the LOMETS protein threading algorithm to identify target-template alignments with similar folds or structures from the Protein Data Bank library. The next step involves reassembly of the Protein Data Bank templates into structural models using Monte Carlo simulations and ab initio modeling. Finally, for functional annotation, COACH matches the predicted structural models to the enzyme classification, gene ontology (GO) and ligand-binding site libraries.

PredictProtein is a protein structure and function prediction server that aligns a target protein sequence to over 30 tools including homology-based, structural and functional databases in order to predict structure and function (Yachdav et al., 2014). PredictProtein predicts secondary structure, solvent accessibility features, subcellular localization, protein-protein interactions, binding sites and GO terms based on an input amino acid sequence.

1.5 Hypothesis, Aims and Overview of Thesis

Many genetic variants have been identified that are associated with schizophrenia and bipolar disorder; however, the actual functional role of these variants is not known in many cases. We hypothesize that some of these variants will alter splicing, which will in turn alter gene expression, contributing to the risk for schizophrenia and bipolar disorder. In order to test this hypothesis, two main aims are achieved in this study: i) the identification of schizophrenia and bipolar disorder-associated SNVs that are bioinformatically predicted to alter splicing and ii) an experimentally tested genotype-phenotype correlation between schizophrenia and bipolar disorder-associated SNVs and altered levels of splicing isoforms due to this specific genotype. Seven datasets of SNVs were mined to prioritize variants for testing, including five datasets that contain SNVs with predictions of altered splicing (GTEx Consortium, 2015; Rivas et al., 2015; Takata et al., 2017; Xiong et al., 2015), a sixth dataset containing SNVs associated with schizophrenia identified through the Psychiatric Genomics Consortium (PGC) (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014) and a seventh dataset containing SNVs associated with bipolar disorder (Hou et al., 2016). The relationship between SNVs associated with schizophrenia or bipolar disorder and splicing was tested using DNA and RNA from brain tissue samples from 92 individuals, including schizophrenia and bipolar disorder patients and controls. These individuals were genotyped for the candidate SNVs, and differential splicing by genotype was then assessed using the cDNA from these tissues. Based on the genotype and cDNA expression results, the effect of these SNVs on gene expression was determined using bioinformatics and functional implications of specific SNVs associated with schizophrenia and bipolar disorder were identified.

Chapter 2

Materials and Methods

2.1 Identification of Schizophrenia and Bipolar Disorder Candidate SNVs

Five splicing datasets were mined: one dataset from SPIDEX, three datasets from GTEx (Altrans, sQTLSeekeR and PTVs) and one dataset of DLPFC sQTLs. SNVs from these datasets were cross-referenced with the schizophrenia and bipolar disorder GWASs to identify those that were also GWAS significant or near GWAS significant for schizophrenia or bipolar disorder. This identified a set of candidate SNVs associated with both a splicing change and schizophrenia or bipolar disorder.
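The cross-referencing step can be pictured as a simple intersection between the splicing datasets and the GWAS results. The following is only a minimal sketch under stated assumptions: the record layout, the function name `candidate_snvs` and the placeholder SNV `rs0000001` are hypothetical, and the cutoff is passed in as a parameter rather than fixed. The two real SNVs and their GWAS p values are taken from Table 2.

```python
# Toy splicing-dataset records; the dictionary layout is an assumption for illustration.
splicing_snvs = {
    "rs3740400": {"gene": "AS3MT", "dataset": "SPIDEX"},
    "rs10431750": {"gene": "APOPT1", "dataset": "PTV"},
    "rs0000001": {"gene": "GENE_X", "dataset": "Altrans"},  # made-up placeholder SNV
}

# Toy GWAS p values; the first two are from Table 2, the third is made up.
gwas_p = {"rs3740400": 6.00e-17, "rs10431750": 5.69e-13, "rs0000001": 0.2}


def candidate_snvs(splicing, gwas, p_cutoff):
    """Return splicing-dataset SNVs whose GWAS p value is below p_cutoff."""
    return {
        rsid: {**info, "gwas_p": gwas[rsid]}
        for rsid, info in splicing.items()
        if rsid in gwas and gwas[rsid] < p_cutoff
    }


candidates = candidate_snvs(splicing_snvs, gwas_p, p_cutoff=1e-5)
for rsid, info in candidates.items():
    print(rsid, info["gene"], info["dataset"], info["gwas_p"])
```

In practice the same function would be called once per GWAS, each with its own cutoff.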

Candidate SNVs were then further filtered based on the gene in which they were predicted to cause a splicing change. Inclusion criteria for these genes were expression in the adult hippocampus or DLPFC brain regions, a previously reported association with schizophrenia or bipolar disorder and a function that may relate to schizophrenia or bipolar disorder. Further, genes were ruled out if a clear candidate gene had already been identified in their associated region.

2.2 Brain tissue samples

Brain tissue samples from 8 schizophrenia patients, 5 bipolar disorder patients and 89 controls were obtained from the Harvard Brain Tissue Resource, the Douglas-Bell Canada Brain Bank and the National Institute of Child Health and Human Development (Table 1).

Table 1. Demographics of individuals from whom brain sample tissues were collected.

Disease state   Schizophrenia       Bipolar disorder    Control
# of samples    8                   5                   89
Gender          8 male, 0 female    4 male, 1 female    57 male, 32 female
Mean age        39.75               47                  41.22
Ethnicity       8 Caucasian         5 Caucasian         78 Caucasian, 10 African American, 1 Mixed

The control postmortem brain tissue samples were from healthy individuals less than 65 years of age with no neurological or psychiatric disorder. The postmortem interval for these tissues was less than 24 hours. Exclusion criteria for the control brain tissue samples included conditions in which neurological disease may have influenced the tissue quality, such as stroke, prolonged hypoxic death, Alzheimer’s disease or other psychiatric or neurological disorders. The gender, age, ethnicity and cause of death were provided for each individual. The aim of this study was to identify alternative splicing changes associated with psychiatric disorder risk alleles. Because the risk SNVs analyzed in this study are common in the population, with the lowest minor allele frequency of any SNV in this study being 0.0755, brain samples from controls would still carry the risk alleles and were therefore used to study the relationship between these variants and alternative splicing. Further, as the purpose of this study was to investigate the effect of schizophrenia and bipolar disorder risk variants on alternative splicing, all analyses were performed with samples separated by genotype regardless of disease status.
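As a rough illustration of why control samples are expected to carry the risk alleles, the expected number of carriers can be estimated from the minor allele frequency. This sketch assumes Hardy-Weinberg proportions, which is an assumption made here for illustration and not a calculation reported in the study. The function names are hypothetical; the MAF of 0.0755 and the 92 genotyped samples are taken from the text.

```python
def carrier_frequency(maf):
    """Expected fraction of individuals carrying at least one minor allele,
    assuming Hardy-Weinberg proportions: 1 - (1 - q)^2."""
    return 1.0 - (1.0 - maf) ** 2


def expected_carriers(maf, n_individuals):
    """Expected number of carriers among n_individuals."""
    return carrier_frequency(maf) * n_individuals


lowest_maf = 0.0755  # lowest MAF of any SNV in this study (from the text)
n_genotyped = 92     # samples sent for genotyping (from the text)

print(f"carrier frequency: {carrier_frequency(lowest_maf):.4f}")
print(f"expected carriers: {expected_carriers(lowest_maf, n_genotyped):.1f}")
```

Even for the rarest SNV, roughly one sample in seven would be expected to carry the minor allele under this assumption.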

Two brain regions relevant to schizophrenia and bipolar disorder were collected: dorsolateral prefrontal cortex (DLPFC) Brodmann area 9 (BA9) and hippocampus. These regions were chosen because they have previously been found to be dysregulated and to differ physically between control individuals and individuals with schizophrenia and bipolar disorder, including deficits in dopamine release and dendritic spine density in the prefrontal cortex (Konopaske, Lange, Coyle, & Benes, 2014) and hippocampal volume differences (McClure et al., 2013).

2.3 SNV Genotyping

DNA was extracted from 50mg of each brain tissue. The brain tissues were homogenized with a plastic disposable pestle in a 2ml tube. 1ml of DNA lysis buffer (10mM Tris-HCl pH 8.2, 400mM NaCl and 2mM EDTA) was added, followed by further homogenization using a 1ml syringe and disposable 0.9 gauge needle. 5ul of RNase A (100ug/ml) was added and the sample was incubated at 37°C for 1 hour. 100ul of 10% SDS and 10ul of proteinase K (20mg/ml) were added and the sample was incubated at 50°C overnight while rotating. 1ml of 25:24:1 phenol/chloroform/isoamyl alcohol was added, the sample was centrifuged at 13,000rpm for 10 minutes and the upper phase was transferred to a new Eppendorf tube. 50ul of 3M NaOAc and 1ml of 100% ethanol were added, and the sample was incubated at 20°C for 1 hour and centrifuged at 13,000rpm for 10 minutes. The supernatant was removed and the pellet was washed once with 70% ethanol and then soaked in 70% ethanol to remove the precipitate that stuck to the wall of the tube. The tube was centrifuged at 13,000rpm for 5 minutes, the ethanol was removed and the pellet was dried at room temperature for 10 minutes. 150ul of TE buffer was then added to dissolve the DNA on a shaker overnight.

Extracted DNA from all 102 brain samples was run on a 2% agarose gel to check the quality of the DNA and confirm that there was no apparent degradation. The 92 samples that were of high quality were sent for genotyping. Array-based genotyping was performed using the OmniExpress Platform, which genotyped 733,202 SNVs. This captured 91% of HapMap markers with minor allele frequency (MAF) > 5% in the Caucasian European (CEU) population with linkage disequilibrium (LD) coverage of r² > 0.80. PLINK was used for quality control and analyses (http://pngu.mgh.harvard.edu/purcell/plink/) (Abecasis, Cardon, & Cookson, 2000). Genotypes were checked for Hardy-Weinberg equilibrium, and markers with MAF < 0.05 or a Hardy-Weinberg p < 0.0001 were removed. Samples with more than 5% missing genotypes and SNVs with call rates below 97% were removed. Ethnicity of the samples was checked using principal component analysis. Genotypes were imputed using the Michigan Imputation Server (https://imputationserver.sph.umich.edu/index.html) using 1000 Genomes Phases 1 and 3 (Das et al., 2016).
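The marker-level quality control thresholds described above can be expressed as a small filter. This is a sketch only and not the PLINK implementation: it uses a chi-square approximation for the Hardy-Weinberg test in place of whatever test PLINK applied, and the function names and genotype-count inputs are hypothetical. The three thresholds (MAF 0.05, Hardy-Weinberg p 0.0001, call rate 97%) are taken from the text.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Approximate Hardy-Weinberg p value from genotype counts (chi-square, 1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def marker_passes_qc(n_aa, n_ab, n_bb, n_missing,
                     min_maf=0.05, min_hwe_p=1e-4, min_call_rate=0.97):
    """Apply the MAF, Hardy-Weinberg and call-rate thresholds to one marker.

    Assumes at least one genotype was called for the marker.
    """
    called = n_aa + n_ab + n_bb
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (maf >= min_maf
            and hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p
            and call_rate >= min_call_rate)


# Made-up genotype counts for illustration.
print(marker_passes_qc(50, 35, 7, 0))   # common marker, close to HWE
print(marker_passes_qc(46, 0, 46, 0))   # no heterozygotes: strong HWE deviation
```

With only 92 samples an exact test would usually be preferred over the chi-square approximation; the approximation is used here to keep the sketch dependency-free.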

2.4 Expression of Alternative Splicing Isoforms

2.4.1 Identification of Splicing Variants

The alternative transcript event associated with each disease-associated SNV (e.g., exon skipping, intron retention or alternative first exon usage) was determined based on information from the respective splicing dataset from which the SNV was identified. Transcript isoform expression was then tested based on these predictions using DLPFC and hippocampal cDNA from the brain tissue samples.

RNA was extracted from the brain tissues using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). Samples were placed in a 2ml tube on dry ice and homogenized, and then 1ml of TRIzol reagent was added and mixed. The brain tissue and TRIzol were further homogenized using a 1ml syringe and 0.9 gauge needle and then transferred to a 1ml Eppendorf tube. The samples were mixed well and left to stand at room temperature for 5 minutes. 200ul of phenol-chloroform was added to each sample, shaken vigorously for 20 seconds and then incubated at room temperature for 3 minutes. The samples were centrifuged at 13,000rpm for 15 minutes at 4°C in order to create 3 distinct layers: a lower red phenol-chloroform phase, an interphase and a colorless upper phase, with the RNA found in the upper phase. The aqueous upper phase was transferred to a fresh 1.5ml Eppendorf tube and the RNA was precipitated by adding 500ul of 100% isopropanol. The sample was mixed and incubated at room temperature for 10 minutes. The sample was centrifuged at 13,000rpm for 15 minutes at 4°C and the supernatant was removed. The pellet was gently vortexed in 1ml of 75% ethanol and spun down at 7,500rpm for 5 minutes, and the supernatant was removed. The pellet was left to air dry for 10 minutes in the cabinet and then resuspended in 87ul of DEPC dH2O (pH 7) and redissolved on ice for 10 minutes. The RNA samples were then purified using the RNeasy spin column kit (Qiagen, Crawley, UK) following the manufacturer's protocol, with the additional DNase step added to ensure removal of genomic DNA.

The quality of the RNA extracted from the 102 individuals used to make the cDNA was checked using the Agilent 2100 Bioanalyzer, and only RNA with an RNA integrity number (RIN) > 6 was used. Because RIN differed between the two regions from the 102 brain samples, 89 DLPFC samples and 65 hippocampal samples had RIN > 6 and were used to make cDNA. cDNA was synthesized using the ImProm-II Reverse Transcriptase protocol (Promega, Madison, WI, USA).

For exon-skipping events, primers were designed flanking the predicted spliced-out exon such that a fragment could be amplified both with the exon spliced in and out (Figure 3). PCR amplification was performed on all samples using these primers. The PCR amplification was then run on a 2% agarose gel and the presence of splicing isoforms was identified based on size separation of amplified fragments.
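The size separation relies on the two isoforms differing in length by exactly the length of the skipped exon. The sketch below shows that arithmetic; the function name and all base-pair values are hypothetical and do not correspond to any assay in this study.

```python
def expected_amplicons(upstream_bp, skipped_exon_bp, downstream_bp):
    """Expected PCR product sizes for a flanking-primer exon-skipping assay.

    upstream_bp / downstream_bp: amplified bases within the two flanking exons
    (primer sequences included); skipped_exon_bp: length of the central exon.
    """
    return {
        "exon_included": upstream_bp + skipped_exon_bp + downstream_bp,
        "exon_skipped": upstream_bp + downstream_bp,
    }


# Made-up lengths for illustration only.
sizes = expected_amplicons(upstream_bp=80, skipped_exon_bp=120, downstream_bp=90)
print(sizes)
```

The same arithmetic applies to the intron retention assay, with the retained intron taking the place of the skipped exon.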

Figure 3. Schematic of the cDNA assay used to identify alternative splicing transcripts with exon skipping. Primers were designed in the two exons flanking the predicted skipped out exon in order to amplify isoforms with the central exon spliced in or out.

For intron retention events, primers were designed in the exons flanking the retained intron such that a fragment could be amplified both with the intron spliced in and out (Figure 4). PCR amplification was performed on all samples using these primers. The PCR amplification was then run on a 2% agarose gel and the presence of splicing isoforms was identified based on size separation of amplified fragments.

Figure 4. Schematic of the cDNA assay used to identify alternative splicing transcripts with intron retention. Primers were designed in the two exons flanking the predicted spliced in intron in order to amplify isoforms with the intron spliced in or out.

For alternative first exon events, separate primer assays were designed in each of the first exons such that two different PCR reactions could be performed to amplify the transcript with either first exon (Figure 5). PCR amplification was performed on all samples using these primers. The PCR amplification was then run on a 2% agarose gel to determine whether there was amplification in the different PCRs that corresponded to the alternative first exons.


Figure 5. Schematic of the cDNA assay used to identify alternative splicing transcripts with alternative first exons. Different forward primers were designed in each of the two predicted first exons with the same reverse primer in the second exon in order to amplify isoforms with alternative first exons.

2.4.2 Quantification of Identified Splicing Variants Using ddPCR

Upon identification of multiple splicing isoforms, a quantitative digital droplet PCR (ddPCR) assay was designed to quantify the proportion of each splicing isoform in each sample. ddPCR partitions a normal PCR reaction into 10,000-20,000 droplets and then counts the number of droplets that amplified at the end of the PCR cycling in order to determine the number of copies of target template that were in the original reaction. ddPCR works on the basis that the target template is diluted and partitioned such that droplets generally contain either a single copy of the target template or none. Using ddPCR, the number of copies of each splicing isoform in each brain sample was determined. Probe-based assays were designed with one assay amplifying the isoform with the exon spliced in and using a HEX fluorescent dye for the probe, and a second assay amplifying the isoform with the exon spliced out and using a FAM fluorescent dye for the probe (Figure 6).
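Because a droplet can occasionally receive more than one copy of the template, droplet counts are usually converted to concentrations with a Poisson correction rather than by counting positive droplets directly. The following is a minimal sketch of that conversion, not the instrument software's implementation. The function names are hypothetical, and the droplet volume of 0.85 nL is an assumed value that should be replaced by the figure for the instrument in use.

```python
import math


def copies_per_droplet(n_positive, n_total):
    """Mean template copies per droplet (lambda) from the positive fraction.

    Under Poisson partitioning, P(droplet is negative) = exp(-lambda),
    so lambda = -ln(1 - n_positive / n_total).
    """
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    return -math.log(1.0 - n_positive / n_total)


def copies_per_microlitre(n_positive, n_total, droplet_volume_nl=0.85):
    """Template concentration in copies/uL of reaction.

    droplet_volume_nl is an assumed droplet volume, not a value from the study.
    """
    lam = copies_per_droplet(n_positive, n_total)
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to uL


# Made-up droplet counts for illustration.
print(copies_per_droplet(1500, 15000))
print(copies_per_microlitre(1500, 15000))
```

At low positive fractions lambda is close to the positive fraction itself, which is the regime in which almost every positive droplet holds a single copy.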

Figure 6. Schematic of the probe-based assay used to identify the proportion of each splicing isoform.

These assays were run as duplex ddPCRs on each brain sample. Each droplet with either the full length or exon-skipped transcript, determined by either FAM or HEX fluorescence, was then counted (Figure 7).

Figure 7. Digital Droplet PCR output of probe-based exon skipping assay. Black dots represent negative droplets with no amplification, blue dots represent droplets with FAM fluorescence containing the exon skipped transcript, green dots represent droplets with HEX fluorescence containing the full length transcript and orange dots represent double droplets with both FAM and HEX fluorescence containing the exon skipped and full length transcripts.


The fractional abundance (FA) of the exon skipped transcript compared to the total number of transcripts amplified was determined by the following equation:

FA = A/(A+B)

A refers to the copies/uL of the alternatively spliced transcript, in this case the exon skipped transcript, and B refers to the copies/uL of the full length transcript, in this case the transcript with the exon spliced in.
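The equation above translates directly into code. In this short sketch the function name and the example concentrations are hypothetical; A and B are as defined in the text.

```python
def fractional_abundance(alt_copies_per_ul, full_copies_per_ul):
    """FA = A / (A + B), with A the alternatively spliced (exon-skipped) transcript
    and B the full-length transcript, both in copies/uL."""
    total = alt_copies_per_ul + full_copies_per_ul
    if total <= 0:
        raise ValueError("no transcript detected; FA is undefined")
    return alt_copies_per_ul / total


# Made-up concentrations for illustration.
print(fractional_abundance(25.0, 75.0))
```

Because FA is a ratio within each sample, it does not depend on the total amount of cDNA loaded into the reaction.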

The fractional abundance was determined for each individual brain sample and was then compared across the risk genotypes for the candidate SNVs using an ANOVA. The fractional abundance of each sample by genotype was also plotted as a boxplot using plotly in R. For each plot, the upper and lower hinges are the third and first quartiles, respectively, and the upper and lower whiskers extend to the largest and smallest values within 1.5 times the inter-quartile range of the hinges. Outliers are values that lie beyond the whiskers.
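The genotype comparison can be illustrated with a one-way ANOVA F statistic computed by hand. This is a sketch under stated assumptions and not the analysis code used in the study: the function name, genotype labels and fractional abundance values are made up, and only the F statistic and its degrees of freedom are computed. A p value would then be read from the F distribution with a statistics package.

```python
def one_way_anova_f(groups):
    """One-way ANOVA on {genotype: [fractional abundance values]}.

    Returns (F, df_between, df_within).
    """
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    means = {k: sum(g) / len(g) for k, g in groups.items()}
    ss_between = sum(len(g) * (means[k] - grand_mean) ** 2 for k, g in groups.items())
    ss_within = sum((v - means[k]) ** 2 for k, g in groups.items() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up fractional abundances grouped by a hypothetical SNV genotype.
fa_by_genotype = {
    "CC": [0.10, 0.12, 0.14],
    "CT": [0.20, 0.22, 0.24],
    "TT": [0.30, 0.32, 0.34],
}
f_stat, df_b, df_w = one_way_anova_f(fa_by_genotype)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that mean fractional abundance differs between genotype groups.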

2.5 Predicting the Mechanism of the Splice Variants

Candidate SNVs whose splicing prediction was experimentally validated in the brain tissue samples through the identification of multiple splicing isoforms were input into splicing prediction servers to determine the mechanism by which the SNV would likely cause a splicing change. If the candidate SNV was found within an exonic or intronic region in close proximity to the location of the predicted splicing change, two splicing prediction servers were used: i) RBPMap, which identifies RNA binding proteins that are predicted to bind to an input sequence (Paz, Kosti, Ares, Cline, & Mandel-Gutfreund, 2014) and ii) SpliceAid, which identifies differentially binding RNA motifs based on the two alleles of a SNV (Piva, Giulietti, Burini, & Principato, 2012). If the candidate SNV was not found within an exonic or intronic region in close proximity to the location of the predicted splicing change, a literature search into the mechanism behind the splicing change was performed.

2.6 Predicting the Effect on Protein Function

Protein sequences of the splicing isoforms that were experimentally validated in the brain tissue samples were input into protein prediction servers in order to determine the potential effect of the alternatively spliced isoforms on protein function.

Three sequence-motif based protein prediction methods were employed to analyze functional differences between the full length and alternatively spliced protein sequences: InterPro, PANTHER and Pfam (Finn et al., 2016).

Two automated function prediction methods were also used to determine the potential structural and functional differences between the full-length and alternatively spliced protein sequences: PredictProtein and I-TASSER (Yachdav et al., 2014; Yang et al., 2015). These servers use protein threading and ab initio methods to predict potential structural and binding regions and to identify protein domains similar to the input sequence, which can aid in inferring the function of the target protein.

Chapter 3

Results

3.1 Splicing SNVs associated with Schizophrenia and Bipolar Disorder

63 SNVs from the splicing datasets were found to be GWAS significant or near GWAS significant in the schizophrenia GWAS dataset (p < 10^-5) (Table 2). 142 SNVs from the splicing datasets were found to be GWAS significant or near GWAS significant in the bipolar disorder GWAS dataset (p < 10^-3) (Table 3). Near GWAS significant SNVs were analyzed in this study because larger datasets of cases and controls that are soon to be published may lead these SNVs to become GWAS significant. As well, the bipolar disorder GWAS included a much smaller sample size than the schizophrenia GWAS, which would lead to fewer SNVs associated at more significant p values; therefore, a less stringent p value cutoff was used for bipolar disorder than for schizophrenia in order to capture potential disease-associated SNVs predicted to cause a splicing change. Further, SNVs that are not in LD with the index SNVs may still be contributing to disease and may be identified among the SNVs with less significant GWAS p values.

Of the 63 schizophrenia-associated SNVs and 142 bipolar disorder-associated SNVs, 37 SNVs overlapped. This led to a total of 26 SNVs associated with schizophrenia only, 105 SNVs associated with bipolar disorder only and 37 SNVs associated with both schizophrenia and bipolar disorder that were also associated with an alternative splicing event. These SNVs were then further filtered based on the gene whose splicing they were predicted to alter. After filtering, 72 SNVs in 13 genes were chosen to be genotyped and to have the associated alternative splicing change tested in the brain tissue sample cDNA. For 8 of these genes (PPP1R16B, STAB1, NGEF, LRRC48, CACNA1D, KIF21B, SNAP91 and PRKAG1), the predicted alternative splicing event was not identified in any of the brain tissue samples. However, in 6 genes (APOPT1, AS3MT, BNIP3L, LRRC48, NEK4 and RPGRIP1L), the predicted alternative splicing change was identified in the brain tissue samples and these genes were further studied. LRRC48 is listed in both groups because the isoforms predicted for one of its SNVs were identified while those predicted for its other two SNVs were not (Table 2).
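The SNV counts reported above follow from simple set arithmetic, sketched here using only the totals given in the text (63, 142 and 37). The variable names are hypothetical.

```python
scz_total = 63   # SNVs at or near GWAS significance for schizophrenia
bd_total = 142   # SNVs at or near GWAS significance for bipolar disorder
shared = 37      # SNVs present in both lists

scz_only = scz_total - shared
bd_only = bd_total - shared
unique_total = scz_only + bd_only + shared

print(scz_only, bd_only, shared, unique_total)
```

The three categories are mutually exclusive, so their sum gives the number of distinct SNVs carried forward to the gene-level filtering.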

Table 2. Schizophrenia and alternative splicing associated SNVs.

                       SCZ GWAS  Splicing     Splicing    Splicing  Splicing   cDNA     Alternative splicing
SNV          Gene      p value   database     p value     Z-score   rho value  tested?  isoforms identified?
rs3740400    AS3MT     6.00E-17  SPIDEX       -           2.043     -          Yes      Yes
rs10431750   APOPT1    5.69E-13  PTV          3.1488E-06  -         -          Yes      Yes
rs4906337    APOPT1    5.72E-13  PTV          2.0799E-05  -         -          Yes      Yes
rs7140568    APOPT1    5.74E-13  PTV          2.0799E-05  -         -          Yes      Yes
rs2535627    NEK4      3.96E-11  DLPFC sQTLs  3.26E-06    -         -          Yes      Yes
rs1108842    NEK4      2.27E-09  DLPFC sQTLs  4.66E-07    -         -          Yes      Yes
rs6769789    NEK4      8.92E-09  DLPFC sQTLs  4.69E-04    -         -          Yes      Yes
rs2071042    NEK4      1.10E-08  DLPFC sQTLs  6.43E-04    -         -          Yes      Yes
rs2535629    NEK4      1.33E-08  DLPFC sQTLs  1.19E-06    -         -          Yes      Yes
rs6445538    NEK4      4.10E-08  DLPFC sQTLs  1.64E-03    -         -          Yes      Yes
rs2276817    NEK4      4.49E-08  DLPFC sQTLs  1.73E-03    -         -          Yes      Yes
rs6445539    NEK4      4.58E-08  DLPFC sQTLs  1.64E-03    -         -          Yes      Yes
rs736408     NEK4      4.58E-08  DLPFC sQTLs  4.37E-06    -         -          Yes      Yes
rs2230535    NEK4      7.78E-08  DLPFC sQTLs  9.15E-10    -         -          Yes      Yes
rs1042779    NEK4      9.26E-08  DLPFC sQTLs  4.59E-10    -         -          Yes      Yes
rs3774354    NEK4      1.91E-07  DLPFC sQTLs  1.15E-09    -         -          Yes      Yes
rs11130317   NEK4      1.94E-07  DLPFC sQTLs  9.15E-10    -         -          Yes      Yes
rs1076425    NEK4      1.99E-07  DLPFC sQTLs  2.48E-08    -         -          Yes      Yes
rs11870660   LRRC48    2.16E-07  SPIDEX       -           1.994     -          Yes      Yes
rs2300149    NEK4      2.80E-07  DLPFC sQTLs  8.46E-10    -         -          Yes      Yes
rs1075653    NEK4      2.99E-07  DLPFC sQTLs  2.27E-08    -         -          Yes      Yes
rs9324       NEK4      3.64E-07  DLPFC sQTLs  2.27E-08    -         -          Yes      Yes
rs2071508    NEK4      3.98E-07  DLPFC sQTLs  2.27E-08    -         -          Yes      Yes
rs11177      NEK4      4.25E-07  DLPFC sQTLs  2.52E-09    -         -          Yes      Yes
rs6976       NEK4      4.40E-07  DLPFC sQTLs  2.52E-09    -         -          Yes      Yes
rs10780035   NEK4      4.74E-07  DLPFC sQTLs  2.52E-09    -         -          Yes      Yes
rs13071584   NEK4      6.32E-07  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs3755799    NEK4      6.85E-07  DLPFC sQTLs  9.83E-09    -         -          Yes      Yes
rs11716747   NEK4      1.56E-06  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs4687657    NEK4      1.57E-06  DLPFC sQTLs  1.21E-03    -         -          Yes      Yes
rs2289247    NEK4      1.64E-06  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs1866268    NEK4      1.82E-06  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs3755798    NEK4      2.16E-06  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs12635140   NEK4      2.20E-06  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs1362572    RPGRIP1L  7.23E-06  DLPFC sQTLs  1.54E-26    -         -          Yes      Yes
rs2675959    NGEF      1.13E-09  SPIDEX       -           3.415     -          Yes      No
rs217323     SNAP91    2.71E-09  DLPFC sQTLs  5.45E-23    -         -          Yes      No
rs4368395    PPP1R16B  3.32E-09  sQTLSeekeR   7.2000E-05  -         -          Yes      No
rs217291     SNAP91    3.73E-09  DLPFC sQTLs  5.45E-23    -         -          Yes      No
rs217328     SNAP91    4.16E-09  DLPFC sQTLs  2.49E-22    -         -          Yes      No
rs6093098    PPP1R16B  4.85E-09  sQTLSeekeR   7.2000E-05  -         -          Yes      No
rs1546977    SNAP91    5.51E-09  DLPFC sQTLs  2.49E-22    -         -          Yes      No
rs2022265    SNAP91    6.22E-09  DLPFC sQTLs  3.83E-19    -         -          Yes      No
rs217308     SNAP91    7.88E-09  DLPFC sQTLs  5.45E-23    -         -          Yes      No
rs10854216   PPP1R16B  3.66E-08  sQTLSeekeR   1.2400E-04  -         -          Yes      No
rs4924832    LRRC48    4.89E-08  SPIDEX       -           2.432     -          Yes      No
rs4584886    LRRC48    1.60E-07  SPIDEX       -           -2.204    -          Yes      No
rs7741310    SNAP91    4.38E-07  DLPFC sQTLs  4.50E-10    -         -          Yes      No
rs1158059    SNAP91    4.46E-07  DLPFC sQTLs  5.90E-14    -         -          Yes      No
rs1171114    SNAP91    1.26E-06  DLPFC sQTLs  9.77E-10    -         -          Yes      No
rs66824127   STAB1     3.58E-06  SPIDEX       -           -2.492    -          Yes      No
rs55742290   ARL6IP4   1.74E-10  SPIDEX       -           -2.412    -          No       -
rs61884274   ATG13     9.52E-09  SPIDEX       -           2.352     -          No       -
rs1545762    PLEKHO1   1.01E-08  PTV          1.0992E-03  -         -          No       -
rs3208509    PLEKHO1   1.07E-08  PTV          1.0992E-03  -         -          No       -
rs883563     OGFOD2    3.15E-08  SPIDEX       -           2.014     -          No       -
rs133335     WBP2NL    1.89E-07  sQTLSeekeR   2.6400E-04  -         -          No       -
rs7537216    C1orf132  5.00E-07  Altrans      -           -         0.521      No       -
rs2851435    MPHOSPH9  6.65E-07  SPIDEX       -           -1.998    -          No       -
rs1142469    CD46      1.17E-06  sQTLSeekeR   3.3300E-07  -         -          No       -
rs7541230    CD46      1.31E-06  sQTLSeekeR   3.3300E-07  -         -          No       -
rs1891423    CD46      2.08E-06  sQTLSeekeR   3.3300E-07  -         -          No       -
rs2796275    CD46      2.85E-06  sQTLSeekeR   3.3300E-07  -         -          No       -

Table 3. Bipolar disorder and alternative splicing associated SNVs.

                       BD GWAS   Splicing     Splicing    Splicing  Splicing   cDNA     Alternative splicing
SNV          Gene      p value   database     p value     Z-score   rho value  tested?  isoforms identified?
rs11779570   BNIP3L    9.23E-06  sQTLSeekeR   1.01E-04    -         -          Yes      Yes
rs11786753   BNIP3L    1.01E-05  sQTLSeekeR   1.01E-04    -         -          Yes      Yes
rs60377146   BNIP3L    1.05E-05  sQTLSeekeR   1.94E-04    -         -          Yes      Yes
rs4687657    NEK4      1.48E-05  DLPFC sQTLs  1.21E-03    -         -          Yes      Yes
rs11135938   BNIP3L    1.83E-05  sQTLSeekeR   2.62E-04    -         -          Yes      Yes
rs8050354    RPGRIP1L  4.73E-05  DLPFC sQTLs  9.12E-42    -         -          Yes      Yes
rs1014969    NEK4      4.97E-05  DLPFC sQTLs  8.00E-09    -         -          Yes      Yes
rs3755799    NEK4      5.03E-05  DLPFC sQTLs  9.83E-09    -         -          Yes      Yes
rs1421091    RPGRIP1L  6.63E-05  DLPFC sQTLs  7.31E-41    -         -          Yes      Yes
rs736408     NEK4      9.18E-05  DLPFC sQTLs  4.37E-06    -         -          Yes      Yes
rs12635140   NEK4      9.64E-05  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs13071584   NEK4      9.67E-05  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs1866268    NEK4      9.68E-05  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs11716747   NEK4      9.70E-05  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs2289247    NEK4      9.87E-05  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs3755798    NEK4      1.04E-04  DLPFC sQTLs  4.77E-09    -         -          Yes      Yes
rs1042779    NEK4      1.23E-04  DLPFC sQTLs  4.59E-10    -         -          Yes      Yes
rs2535629    NEK4      1.36E-04  DLPFC sQTLs  1.19E-06    -         -          Yes      Yes
rs2071508    NEK4      2.14E-04  DLPFC sQTLs  2.27E-08    -         -          Yes      Yes
rs11130317   NEK4      2.15E-04  DLPFC sQTLs  9.15E-10    -         -          Yes      Yes
rs1108842    NEK4      2.34E-04  DLPFC sQTLs  4.66E-07    -         -          Yes      Yes
rs2230535    NEK4      2.40E-04  DLPFC sQTLs  9.15E-10    -         -          Yes      Yes
rs1075653    NEK4      2.62E-04  DLPFC sQTLs  2.27E-08    -         -          Yes      Yes
rs9324       NEK4      2.72E-04  DLPFC sQTLs  2.27E-08    -         -          Yes      Yes
rs2300149    NEK4      2.94E-04  DLPFC sQTLs  8.46E-10    -         -          Yes      Yes
rs6499632    RPGRIP1L  3.15E-04  DLPFC sQTLs  2.45E-15    -         -          Yes      Yes
rs1076425    NEK4      3.35E-04  DLPFC sQTLs  2.48E-08    -         -          Yes      Yes
rs1362572    RPGRIP1L  3.58E-04  DLPFC sQTLs  1.54E-26    -         -          Yes      Yes
rs7193898    RPGRIP1L  4.02E-04  DLPFC sQTLs  4.46E-12    -         -          Yes      Yes
rs9788828    RPGRIP1L  4.88E-04  DLPFC sQTLs  6.51E-05    -         -          Yes      Yes
rs8047089    RPGRIP1L  5.57E-04  DLPFC sQTLs  1.14E-39    -         -          Yes      Yes
rs10780035   NEK4      6.03E-04  DLPFC sQTLs  2.52E-09    -         -          Yes      Yes
rs11177      NEK4      6.32E-04  DLPFC sQTLs  2.52E-09    -         -          Yes      Yes
rs3760008    RPGRIP1L  6.80E-04  DLPFC sQTLs  3.04E-39    -         -          Yes      Yes
rs6976       NEK4      7.09E-04  DLPFC sQTLs  2.52E-09    -         -          Yes      Yes
rs1946155    RPGRIP1L  7.57E-04  DLPFC sQTLs  1.14E-39    -         -          Yes      Yes
rs3774354    NEK4      7.81E-04  DLPFC sQTLs  1.15E-09    -         -          Yes      Yes
rs2071042    NEK4      8.96E-04  DLPFC sQTLs  6.43E-04    -         -          Yes      Yes
rs10783299   PRKAG1    1.56E-06  DLPFC sQTLs  2.71E-06    -         -          Yes      No
rs2276817    NEK4      3.21E-06  DLPFC sQTLs  1.73E-03    -         -          Yes      No
rs2293445    PRKAG1    3.96E-06  DLPFC sQTLs  1.79E-06    -         -          Yes      No
rs6445539    NEK4      4.22E-06  DLPFC sQTLs  1.64E-03    -         -          Yes      No
rs11168839   PRKAG1    4.22E-06  DLPFC sQTLs  1.45E-06    -         -          Yes      No
rs6445538    NEK4      4.72E-06  DLPFC sQTLs  1.64E-03    -         -          Yes      No
rs2241726    PRKAG1    4.83E-06  DLPFC sQTLs  1.79E-06    -         -          Yes      No
rs2022265    SNAP91    4.35E-05  DLPFC sQTLs  3.83E-19    -         -          Yes      No
rs7741310    SNAP91    1.07E-04  DLPFC sQTLs  4.50E-10    -         -          Yes      No
rs217291     SNAP91    1.08E-04  DLPFC sQTLs  5.45E-23    -         -          Yes      No
rs1171114    SNAP91    1.24E-04  DLPFC sQTLs  9.77E-10    -         -          Yes      No
rs217308     SNAP91    1.33E-04  DLPFC sQTLs  5.45E-23    -         -          Yes      No
rs10875912   PRKAG1    1.37E-04  DLPFC sQTLs  5.98E-07    -         -          Yes      No
rs217323     SNAP91    1.53E-04  DLPFC sQTLs  5.45E-23    -         -          Yes      No
rs12118735   KIF21B    1.82E-04  PTV          8.00E-03    -         -          Yes      No
rs1546977    SNAP91    2.05E-04  DLPFC sQTLs  2.49E-22    -         -          Yes      No
rs217328     SNAP91    2.15E-04  DLPFC sQTLs  2.49E-22    -         -          Yes      No
rs1158059    SNAP91    4.07E-04  DLPFC sQTLs  5.90E-14    -         -          Yes      No
rs17053453   CACNA1D   8.33E-04  SPIDEX       -           -2.41     -          Yes      No
rs7195043    DEF8      2.41E-06  SPIDEX       -           -2.086    -          No       -
rs115749012  FLOT1     1.05E-05  DLPFC sQTLs  2.37E-22    -         -          No       -
rs115184190  FLOT1     1.06E-05  DLPFC sQTLs  2.37E-22    -         -          No       -
rs116387714  FLOT1     1.15E-05  DLPFC sQTLs  2.37E-22    -         -          No       -
rs115246198  FLOT1     1.35E-05  DLPFC sQTLs  5.90E-20    -         -          No       -
rs115007114  FLOT1     1.43E-05  DLPFC sQTLs  2.37E-22    -         -          No       -
rs115799332  FLOT1     1.50E-05  DLPFC sQTLs  5.90E-20    -         -          No       -
rs114480306  FLOT1     1.50E-05  DLPFC sQTLs  1.04E-14    -         -          No       -
rs116733291  FLOT1     1.58E-05  DLPFC sQTLs  5.90E-20    -         -          No       -
rs116265619  FLOT1     2.34E-05  DLPFC sQTLs  2.37E-22    -         -          No       -
rs8059821    CDK10     3.17E-05  DLPFC sQTLs  1.55E-04    -         -          No       -
rs2282888    SP4       7.67E-05  DLPFC sQTLs  6.00E-03    -         -          No       -
rs463938     DPEP1     8.00E-05  SPIDEX       -           -2.053    -          No       -
rs115973930  FLOT1     8.02E-05  DLPFC sQTLs  1.35E-06    -         -          No       -
rs164749     CDK10     8.09E-05  DLPFC sQTLs  5.89E-06    -         -          No       -
rs154657     CDK10     8.26E-05  DLPFC sQTLs  2.71E-06    -         -          No       -
rs115559348  FLOT1     8.32E-05  DLPFC sQTLs  1.31E-30    -         -          No       -
rs114374585  FLOT1     8.62E-05  DLPFC sQTLs  1.35E-06    -         -          No       -
rs114259349  FLOT1     8.71E-05  DLPFC sQTLs  6.14E-07    -         -          No       -
rs8047581    ZNF276    8.96E-05  DLPFC sQTLs  3.40E-04    -         -          No       -
rs115614113  FLOT1     8.98E-05  DLPFC sQTLs  1.35E-06    -         -          No       -
rs164746     CDK10     1.09E-04  DLPFC sQTLs  2.71E-06    -         -          No       -
rs11078935   MED24     1.22E-04  PTV          6.00E-03    -         -          No       -
rs460879     CDK10     1.24E-04  DLPFC sQTLs  7.40E-06    -         -          No       -
rs6860       CDK10     1.31E-04  DLPFC sQTLs  1.99E-06    -         -          No       -
rs2301249    MPI       1.33E-04  DLPFC sQTLs  7.32E-04    -         -          No       -
rs7085       MPI       1.44E-04  DLPFC sQTLs  5.37E-04    -         -          No       -
rs114202113  FLOT1     1.47E-04  DLPFC sQTLs  1.07E-14    -         -          No       -
rs1378938    MPI       1.52E-04  DLPFC sQTLs  8.50E-04    -         -          No       -
rs7162232    MPI       1.65E-04  DLPFC sQTLs  7.66E-04    -         -          No       -
rs116575337  FLOT1     1.68E-04  DLPFC sQTLs  1.91E-12    -         -          No       -
rs4393623    PRPSAP2   1.70E-04  SPIDEX       -           2.779     -          No       -
rs116523291  FLOT1     1.93E-04  DLPFC sQTLs  1.63E-06    -         -          No       -
rs3828783    LEMD2     1.99E-04  DLPFC sQTLs  1.63E-04    -         -          No       -
rs1205340    ITCH      2.02E-04  DLPFC sQTLs  2.39E-06    -         -          No       -
rs115211595  FLOT1     2.11E-04  DLPFC sQTLs  1.27E-36    -         -          No       -
rs115420460  FLOT1     2.40E-04  DLPFC sQTLs  9.40E-38    -         -          No       -
rs154665     CDK10     2.47E-04  DLPFC sQTLs  4.73E-05    -         -          No       -
rs116295105  FLOT1     2.61E-04  DLPFC sQTLs  9.40E-38    -         -          No       -
rs115339255  FLOT1     2.64E-04  DLPFC sQTLs  9.40E-38    -         -          No       -
rs8177375    FAM118B   2.81E-04  DLPFC sQTLs  6.95E-07    -         -          No       -
rs17127118   ADD3      2.89E-04  DLPFC sQTLs  1.63E-05    -         -          No       -
rs116582597  FLOT1     2.92E-04  DLPFC sQTLs  3.30E-22    -         -          No       -
rs6088512    ITCH      3.01E-04  DLPFC sQTLs  1.88E-06    -         -          No       -
rs2282640    MRPL11    3.09E-04  DLPFC sQTLs  1.61E-07    -         -          No       -
rs12942088   SAT2      3.24E-04  DLPFC sQTLs  5.83E-30    -         -          No       -
rs6059893    ITCH      3.33E-04  DLPFC sQTLs  1.80E-06    -         -          No       -
rs1264308    GTF2H4    3.35E-04  SPIDEX       -           -2.535    -          No       -
rs886420     GTF2H4    3.42E-04  SPIDEX       -           2.931     -          No       -
rs2242663    CTSF      3.42E-04  DLPFC sQTLs  9.36E-04    -         -          No       -
rs10509907   ADD3      3.52E-04  DLPFC sQTLs  5.75E-06    -         -          No       -
rs8079726    MYH10     4.11E-04  Altrans      -           -         0.515      No       -
rs146220333  HLA-A     4.19E-04  PTV          8.00E-03    -         -          No       -
rs551708     SPTBN2    4.19E-04  Altrans      -           -         0.506      No       -
rs523786     AMPD2     4.28E-04  PTV          8.00E-03    -         -          No       -
rs2162943    CDK10     4.35E-04  DLPFC sQTLs  4.99E-10    -         -          No       -
rs7554936    CDC42SE1  4.66E-04  DLPFC sQTLs  6.32E-07    -         -          No       -
rs6534338    KIAA1109  4.67E-04  DLPFC sQTLs  2.24E-13    -         -          No       -
rs6087577    ITCH      4.79E-04  DLPFC sQTLs  1.80E-06    -         -          No       -
rs116823721  FLOT1     4.84E-04  DLPFC sQTLs  4.16E-39    -         -          No       -
rs3746455    ITCH      4.91E-04  DLPFC sQTLs  1.80E-06    -         -          No       -
rs12924572   CDK10     5.13E-04  DLPFC sQTLs  2.50E-05    -         -          No       -
rs11220432   FAM118B   5.42E-04  DLPFC sQTLs  6.83E-07    -         -          No       -
rs1049666    FAM118B   5.61E-04  DLPFC sQTLs  6.83E-07    -         -          No       -
rs1049665    FAM118B   5.81E-04  DLPFC sQTLs  6.83E-07    -         -          No       -
rs9784675    KIF3A     6.16E-04  DLPFC sQTLs  3.38E-04    -         -          No       -
rs2237303    SP4       6.44E-04  DLPFC sQTLs  6.41E-03    -         -          No       -
rs3763253    LEMD2     6.70E-04  DLPFC sQTLs  7.35E-04    -         -          No       -
rs2894342    LEMD2     6.73E-04  DLPFC sQTLs  7.35E-04    -         -          No       -
rs10878372   LRRK2     6.95E-04  DLPFC sQTLs  1.15E-06    -         -          No       -
rs9210       ULK3      7.24E-04  sQTLSeekeR   3.33E-07    -         -          No       -
rs7188458    CDK10     7.25E-04  DLPFC sQTLs  3.28E-05    -         -          No       -
rs2501574    ADD3      7.29E-04  DLPFC sQTLs  3.69E-04    -         -          No       -
rs1045814    CDK10     7.65E-04  DLPFC sQTLs  1.22E-08    -         -          No       -
rs114261345  FLOT1     7.72E-04  DLPFC sQTLs  3.95E-09    -         -          No       -
rs2067606    GABPB2    8.14E-04  DLPFC sQTLs  8.80E-04    -         -          No       -
rs2647266    PPA2      8.22E-04  DLPFC sQTLs  6.93E-08    -         -          No       -
rs2158414    SP4       8.22E-04  DLPFC sQTLs  9.05E-04    -         -          No       -
rs7211847    SAT2      8.80E-04  DLPFC sQTLs  5.29E-39    -         -          No       -
rs114126524  FLOT1     9.18E-04  DLPFC sQTLs  4.81E-09    -         -          No       -
rs11650101   MMD       9.32E-04  sQTLSeekeR   1.00E-06    -         -          No       -
rs6103645    JPH2      9.63E-04  sQTLSeekeR   1.23E-04    -         -          No       -
rs72815607   ANKRD36   9.68E-04  SPIDEX       -           2.215     -          No       -
rs4564652    NOL4      9.85E-04  SPIDEX       -           2.166     -          No       -
rs12924138   CDK10     9.91E-04  DLPFC sQTLs  7.92E-08    -         -          No       -

3.2 Genes studied for alternative splicing

3.2.1 APOPT1

Apoptogenic 1, mitochondrial (APOPT1) was identified by Yasuda et al. (2006) as an apoptogenic protein that localizes to the mitochondria when it was cloned from cultured smooth muscle cells of ApoE-deficient mice. By analyzing the amino acid sequence of APOPT1, it was determined to be a soluble protein with no transmembrane domains and a mitochondrial targeting sequence at the N terminus (Yasuda et al., 2006). When APOPT1 expression vectors were transfected into wild-type mouse smooth muscle cells, the cells died within 24 hours and displayed features of apoptosis. Caspase-dependent apoptosis occurs through mitochondrial outer membrane permeabilization and release of cytochrome c to the cytoplasm (Green & Kroemer, 2004). Mitochondrial outer membrane permeabilization can occur through two main pathways: the Bax/Bak-related pathway or the Cyp-D pathway. Yasuda et al. (2006) performed experiments blocking either the Bax/Bak-related apoptosis pathway with Bcl-2 or the Cyp-D apoptosis pathway with CyA and found that the apoptosis induced by APOPT1 occurred only through the Cyp-D pathway. Further studies on the inhibition of the APOPT1 pathway determined that the apoptosis-preventing PI3K signaling pathway, which involves the activation of Akt followed by inactivation of pro-apoptotic proteins and activation of anti-apoptotic proteins, can prevent apoptosis induced by APOPT1 (Sun et al., 2008). APOPT1 was also found to interact with Bcl2-associated athanogene 3 (BAG3), which is also involved in regulating apoptosis and the cellular response to stress (Y. Chen et al., 2013). Thus, many studies have implicated APOPT1 as an apoptogenic protein.

However, Melchionda et al. (2014) identified a different functional role for APOPT1. APOPT1 mutations were identified in patients with cavitating leukoencephalopathy and cytochrome c oxidase deficiency, and functional studies were performed in control fibroblasts and in fibroblasts from one of these patients, who carried homozygous APOPT1 mutations leading to a truncated protein. Under normal conditions, the APOPT1 protein was found to be degraded by the proteasome system in cell culture; the cells were therefore placed under two other conditions: an apoptosis-inducing condition and an oxidative stress condition. When the apoptosis inducer was used on these fibroblasts, APOPT1 protein was still not detected in either the control or the patient cells. However, under oxidative stress conditions, the APOPT1 protein was detected. Comparing the effect of oxidative stress on fibroblasts with mutant APOPT1 and on control fibroblasts showed increased levels of reactive oxygen species (ROS) after oxidative stress only in the mutant APOPT1 fibroblasts, suggesting that full-length functional APOPT1 is necessary to maintain low levels of ROS (Melchionda et al., 2014).

3.2.1.1 APOPT1 SNVs Associated with Alternative Splicing

Three SNVs in APOPT1 have been identified by the GTEx PTV dataset to be associated with alternative splicing and lead to an exon-skipping event and a truncated APOPT1 protein (Table 4).

Table 4. SNVs associated with alternative splicing of APOPT1.

SNV          Location                      EUR MAF      SCZ p value   Splicing p value
rs4906337    NM_001302652.1:c.361-30C>A    A = 0.2744   5.27E-13      3.1488E-06
rs10431750   NM_001302652.1:c.361-379T>C   C = 0.2744   5.69E-13      2.0799E-05
rs7140568    NM_001302652.1:c.424+867C>T   T = 0.2744   5.74E-13      2.0799E-05

Alternative splicing event (all three SNVs): exon 3 skipping of APOPT1 resulting in a truncated protein (p.Glu121Valfs*6).

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, SCZ = schizophrenia.

The SNVs rs7140568, rs10431750 and rs4906337 were entered into RBPMap and SpliceAid to determine the mechanism by which these SNVs are likely causing exon 3 skipping.

Neither RBPMap nor SpliceAid yielded significant results for rs7140568.

For rs10431750, SpliceAid identified hnRNP C 1/2, ELAV1, Sam68 and SLM-2 to only bind to the T allele and TIA-1 to only bind to the C allele. RBPMap also predicted hnRNP C and hnRNP CL1 to be more likely to bind to the T allele compared to the C allele. Heterogeneous nuclear ribonucleoproteins (hnRNPs) bind to intronic splicing silencers and promote exon skipping, therefore the T allele of rs10431750 likely contributes to splicing out of exon 3 through binding to hnRNPs.

For rs4906337, SpliceAid identified hnRNP H 1/2 and SFRS7 to only bind to the A allele. RBPMap did not yield significant results for rs4906337. Splicing factor arginine/serine-rich 7 (SFRS7) is a member of the serine/arginine-rich family of proteins that promote exon inclusion. Interestingly, Crawford and Patton (2006) showed that the SFRS7 and hnRNP H proteins competed for the same splicing regulatory elements to promote or inhibit splicing of exon 2 of α-tropomyosin (α-TM). Overexpression of the hnRNP H proteins blocked SFRS7 binding and led to decreased exon 2 splicing, while knockdown of the hnRNP H proteins allowed SFRS7 binding and increased exon 2 splicing (Crawford & Patton, 2006). hnRNP H2 and SFRS7 are both expressed in the brain; however, their expression in the brain may differ throughout development and contribute to APOPT1 exon 3 being spliced in or out depending on when each of these proteins is more highly expressed.

3.2.1.2 Identification of Two APOPT1 Transcript Isoforms

Ninety-two brain tissue samples were genotyped for the three APOPT1 SNVs predicted to alter splicing. For rs4906337 and rs7140568 respectively, 50 individuals were homozygous for the C and C alleles, 34 individuals were heterozygous and 8 individuals were homozygous for the A and T alleles. For rs10431750, 48 individuals were homozygous for the T allele, 36 individuals were heterozygous and 8 individuals were homozygous for the C allele. cDNA from these brain tissue samples was amplified using primers in exons 2 and 4 of APOPT1 (forward primer: 5’ GCCATGATTGGATAGGACCC 3’, reverse primer: 5’ GAAGTCCGCCATTTCTTCTG 3’). Two APOPT1 transcripts were identified at varying levels between individuals: a 272bp fragment corresponding to the full APOPT1 transcript including exon 3 and a 208bp fragment corresponding to a smaller APOPT1 transcript with exon 3 spliced out (Figure 8). Both fragments were gel extracted from a 2.5% agarose Ethidium Bromide (EtBr) post-stained gel, gel purified and sequenced, and both the full length and exon 3 skipped APOPT1 sequences were identified.

Figure 8. Identification of two APOPT1 transcript isoforms through cDNA PCR amplification. cDNA PCR amplification using forward and reverse primers in exons 2 and 4 of APOPT1, respectively, amplified a 272bp fragment corresponding to the full length APOPT1 transcript and a 208bp fragment corresponding to a smaller APOPT1 transcript isoform excluding exon 3.
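The genotype counts reported above can be converted into sample allele frequencies for comparison with the EUR MAF values in Table 4. The following is a minimal sketch of that arithmetic only; the function name `allele_frequencies` is illustrative and is not part of the methods used in this study.

```python
def allele_frequencies(n_hom_major, n_het, n_hom_minor):
    """Return (major, minor) allele frequencies from genotype counts."""
    # Each individual carries two alleles at an autosomal SNV.
    total_alleles = 2 * (n_hom_major + n_het + n_hom_minor)
    # Heterozygotes carry one minor allele, minor homozygotes carry two.
    minor = (n_het + 2 * n_hom_minor) / total_alleles
    return 1.0 - minor, minor


# Genotype counts reported in the text for the 92 brain samples.
_, maf_rs4906337 = allele_frequencies(50, 34, 8)    # also applies to rs7140568
_, maf_rs10431750 = allele_frequencies(48, 36, 8)

print(f"rs4906337 / rs7140568 sample MAF: {maf_rs4906337:.4f}")
print(f"rs10431750 sample MAF: {maf_rs10431750:.4f}")
```

Under these counts the sample minor allele frequencies fall close to the European population value of 0.2744 listed in Table 4, which is what would be expected if the sample is broadly representative of that population.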

3.2.1.3 Quantification of APOPT1 Isoforms and Association to rs4906337, rs10431750 and rs7140568

The APOPT1 isoforms with exon 3 spliced in or out were quantified using probe-based assays run as ddPCRs (forward primer: 5’ ACACAAGAATGGAATCAACAG 3’, reverse primer: 5’ TGTAGAAGTCCGCCATTTC 3’, probe for full length: 5’ TGATTCAGTTCTCAGGCCCAGGC 3’, probe for exon 3 skipping: 5’ TAGTAAGGTCAGAAAGCAACATTGAATGC 3’). The fractional abundance of the exon 3 skipped isoform (its number of copies relative to the total number of copies of both isoforms) was determined in the DLPFC and hippocampus of each sample and was tested against each sample’s genotype by ANOVA. The APOPT1 transcript with exon 3 spliced out is significantly decreased in the DLPFC of individuals with the schizophrenia risk alleles A, C and T of rs4906337, rs10431750 and rs7140568, respectively, with an allele dose effect (rs4906337 and rs7140568: p = 1.26e-15, F (2, 84) = 53.35, rs10431750: p = 1.71e-15, F (2, 84) = 52.66) and is most significant for rs4906337 and rs7140568 (Figure 9).

[Figure 9 plot: splicing ratio (y-axis) by genotype (x-axis: Hom. Maj., Het., Hom. Min.)]

Figure 9. Quantification of APOPT1 transcripts in the DLPFC. Splicing ratio of the APOPT1 exon 3 spliced out transcript compared to all APOPT1 transcripts amplified shows a significant decrease with the schizophrenia risk alleles in an allele dose dependent manner in the DLPFC. (Hom. = Homozygous, Maj. = Major Allele, Min. = Minor Allele, Het. = Heterozygous)

The APOPT1 transcript with exon 3 spliced out is also significantly decreased in the hippocampus of individuals with the schizophrenia risk alleles A, C and T of rs4906337, rs10431750 and rs7140568, respectively, with an allele dose effect (rs4906337 and rs7140568: p = 1.1e-13, F (2, 60) = 51.11, rs10431750: p = 5.79e-13, F (2, 60) = 46.74) and is most significant for rs4906337 and rs7140568 (Figure 10).

[Figure 10 plot: splicing ratio (y-axis) by genotype (x-axis: Hom. Maj., Het., Hom. Min.)]

Figure 10. Quantification of APOPT1 transcripts in the hippocampus. Splicing ratio of the APOPT1 exon 3 spliced out transcript compared to all APOPT1 transcripts amplified shows a significant decrease with the schizophrenia risk alleles in an allele dose dependent manner in the hippocampus. (Hom. = Homozygous, Maj. = Major Allele, Min. = Minor Allele, Het. = Heterozygous)
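The quantification described in this section rests on two calculations: the fractional abundance of the exon 3 skipped isoform in each sample, and a one-way ANOVA of that ratio across the three genotype groups. The sketch below illustrates both in plain Python. The copy numbers are hypothetical values chosen for illustration, not data from this study, and expressing the ratio as a percentage is an assumption based on the scale of the splicing ratio axes in Figures 9 and 10.

```python
from statistics import mean


def fractional_abundance(skipped_copies, full_length_copies):
    """Percentage of all detected copies that are the exon-skipped isoform."""
    total = skipped_copies + full_length_copies
    if total == 0:
        raise ValueError("no copies detected for either isoform")
    return 100.0 * skipped_copies / total


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ddPCR copy numbers per sample: (skipped, full length).
samples = {
    "hom_major": [(30, 70), (28, 72), (32, 68)],
    "het": [(20, 80), (22, 78), (18, 82)],
    "hom_minor": [(10, 90), (12, 88), (8, 92)],
}
ratios = [[fractional_abundance(s, f) for s, f in group]
          for group in samples.values()]
f_stat, df1, df2 = one_way_anova_f(ratios)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The degrees of freedom follow directly from the design: three genotype groups give 2 between-group degrees of freedom, and the within-group degrees of freedom equal the number of samples minus three, which is how F (2, 84) and F (2, 60) arise for the DLPFC and hippocampus analyses.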

3.2.1.4 Protein Function Prediction of APOPT1 Isoforms

The shorter APOPT1 transcript isoform skips exon 3, which leads to a frameshift and a premature stop codon. The premature stop codon is predicted to occur 6 amino acids into exon 4 (p.Glu121Valfs*6) and predicts a truncated protein that is 125 amino acids in length, as opposed to 206 amino acids for the full-length protein. The full-length and truncated APOPT1 protein sequences were input into InterPro, PANTHER, Pfam, PredictProtein and I-TASSER.
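The frameshift and the truncated protein length stated above can be checked from numbers already given in the text: the 272bp and 208bp amplicons, the exon 3 boundaries implied by the positions in Table 4 (c.361 to c.424), and the notation p.Glu121Valfs*6. The following is a small illustrative sketch; the function names are hypothetical, and it assumes the usual HGVS reading of fs*N, in which N counts from the first changed residue up to and including the new stop codon.

```python
import re


def causes_frameshift(exon_length_bp):
    """An exon whose length is not a multiple of 3 shifts the frame when skipped."""
    return exon_length_bp % 3 != 0


def truncated_length_from_hgvs(hgvs_p):
    """Protein length implied by a frameshift written as p.<Ref><pos><Alt>fs*<N>."""
    match = re.fullmatch(r"p\.[A-Za-z]{3}(\d+)[A-Za-z]{3}fs\*(\d+)", hgvs_p)
    if match is None:
        raise ValueError(f"not a simple frameshift description: {hgvs_p}")
    first_changed = int(match.group(1))
    stop_offset = int(match.group(2))
    # The stop codon sits at position first_changed + stop_offset - 1,
    # so the last amino acid is one position before that.
    return first_changed + stop_offset - 2


# Exon 3 length from the amplicon sizes reported for the two isoforms.
exon3_from_amplicons = 272 - 208
# Exon 3 length from the cDNA coordinates flanking the SNVs in Table 4.
exon3_from_coordinates = 424 - 361 + 1

print(exon3_from_amplicons, exon3_from_coordinates)
print(causes_frameshift(exon3_from_amplicons))
print(truncated_length_from_hgvs("p.Glu121Valfs*6"))
```

Both routes give a 64bp exon, which is not divisible by three, and the frameshift notation implies a 125 amino acid product, in agreement with the lengths stated in the text.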

The protein families and domains corresponding to the APOPT1 protein sequences were identified as uncharacterized protein family UPF0671 (InterPro), unnamed family PTHR31107 (PANTHER) and domain of unknown function DUF2315 (Pfam). Limited information exists to describe the function of these uncharacterized families and domains.

A comparison of the PredictProtein results for the full-length and truncated APOPT1 proteins identified the loss of 5 potential protein-binding, 3 potential DNA-binding and 1 potential RNA-binding regions, 1 strand and 5 helix structural regions and 15 exposed and 16 buried solvent accessibility regions (Figure 11).

Figure 11. PredictProtein structural and binding site results for full-length and truncated APOPT1 protein sequences. The first row of stick figures represents predicted protein-binding (red diamonds), DNA-binding (yellow circles) and RNA-binding (purple circles) sites. The second row of block figures represents predicted strand (blue boxes) and helix (red boxes) structural regions. The third row of block figures represents predicted exposed (blue boxes) and buried (yellow boxes) solvent accessibility regions. A) The full-length APOPT1 protein predicted structural and binding site results identified 17 protein-binding, 6 DNA-binding and 1 RNA-binding sites, 4 strand and 6 helix structural regions and 31 exposed and 33 buried solvent accessibility regions. B) The truncated APOPT1 protein predicted structural and binding site results identified 12 protein-binding, 3 DNA-binding and 0 RNA-binding sites, 3 strand and 1 helix structural regions and 16 exposed and 17 buried solvent accessibility regions (Yachdav et al., 2014).

I-TASSER protein prediction results identified top protein domains and GO terms that were different for the full-length and truncated APOPT1 protein sequences. The top protein domains were categorized as hydrolases and transferases for the full-length protein but were categorized as oxidoreductases and some hydrolases for the truncated protein. GO molecular terms predicted for the full-length protein were ion binding and DNA binding, but for the truncated protein were monooxygenase activity and nucleotide binding. The GO biological term identified for the full-length protein was DNA integration whereas for the truncated protein it was oxidation-reduction process.

Due to the limited number of experimental studies on APOPT1 protein function, the exact function of the truncated protein resulting from the exon 3 skipped APOPT1 transcript and the resulting effect at the cellular level would have to be tested experimentally. The APOPT1 protein is induced upon oxidative stress; therefore, the effect of both APOPT1 transcripts and their translation into proteins upon oxidative stress should be tested. This exon 3 skipped transcript may also be targeted for nonsense-mediated decay (NMD), or may escape NMD but create a non-functional protein, as a mechanism of decreasing the amount of functional protein that is present in the cell. Lewis, Green & Brenner (2003) found that one third of alternatively spliced transcripts create premature stop codons that target these transcripts for NMD, and proposed that this is a mechanism of controlling protein expression. This process of linking alternative splicing to NMD in order to regulate gene and protein expression is called regulated unproductive splicing and translation (RUST) (Lareau, Brooks, Soergel, Meng, & Brenner, 2007). It is possible that RUST may occur through skipping of exon 3 in APOPT1 in order to regulate the amount of functional APOPT1 protein. As the schizophrenia risk allele is associated with a decrease in the amount of exon 3 skipped transcript and an increase in the full length transcript, this could result in too much functional protein and an imbalance in the amount of apoptosis required for the stressor to which the cell is responding, or perhaps the formation of functional protein at incorrect times.

3.2.2 AS3MT

Arsenic (+3 Oxidation State) Methyltransferase (AS3MT) is a key component in the metabolism of arsenic and catalyzes the conversion of inorganic arsenic into methylated products, which are then further metabolized through reduction and methylation reactions to monomethylarsonic acid (MMA) and dimethylarsinic acid (DMA) that are then excreted in the urine (Figure 12).

Figure 12. Arsenic metabolism pathway. Metabolism of inorganic arsenic (iAsV) to dimethylarsinic acid (DMAIII) through a series of reduction and methylation reactions carried out by glutathione reductase (GSH) and arsenic (+3 oxidation state) methyltransferase (AS3MT), respectively.

DMA, the most methylated form of arsenic in the human body, is associated with decreased reactivity and increased urinary arsenic excretion, which suggests efficient methylation of arsenic is a key component of arsenic detoxification. The efficiency of arsenic methylation and metabolism can be measured based on the amount and ratio of excreted arsenic products, with a higher amount of iAs and MMA being associated with less efficient arsenic methylation and metabolism while a higher amount of DMA is associated with more efficient arsenic methylation and metabolism. Differences in arsenic metabolism and susceptibility to increased iAs and MMA buildup have been associated with SNVs in AS3MT (K. Engstrom et al., 2011; K. S. Engstrom et al., 2013; K. S. Engstrom et al., 2015).

As mentioned previously, a novel exon 2 and 3 skipped AS3MT transcript was recently reported that is associated with schizophrenia and bipolar disorder, most highly expressed in the brain and specific to humans (Li et al., 2016). However, the mechanism of exon skipping was not identified.

3.2.2.1 AS3MT SNV Predicted to Alter Splicing

The major allele, T, of one SNV in AS3MT was found to be significantly associated with schizophrenia and was predicted by the SPIDEX dataset to cause exon 2 skipping (Table 5).

Table 5. SNV predicted to alter splicing of AS3MT.

SNV         Location                EUR MAF      SCZ p value   Splicing Z score   Alternative Splicing Event
rs3740400   NM_020682.3:c.2-97T>G   G = 0.3698   6.00E-17      2.043              Exon 2 skipping of AS3MT

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, SCZ = schizophrenia.

The SNV rs3740400 was entered into RBPMap and SpliceAid to determine the mechanism by which this SNV is likely causing exon 2 skipping.

RBPMap did not yield any predictions for rs3740400. SpliceAid identified one RNA binding protein that is predicted to only bind to the minor allele, G, of rs3740400: SF2/ASF, also known as serine/arginine-rich splicing factor 1 (SRSF1) (Figure 13).

Figure 13. rs3740400 SpliceAid results. A) SF2/ASF does not bind to major allele, T. B) SF2/ASF binds to minor allele, G (Piva et al., 2012).

SRSF1 is a member of the serine/arginine-rich family of splicing factors. SRSF1 promotes recruitment of the U1 and U2 snRNPs, which are both key components of the spliceosome and are necessary for proper exon inclusion. Based on these predictions, the likely mechanism by which the major allele, T, of rs3740400 may be contributing to the splicing out of exon 2 in AS3MT is through decreased binding of SRSF1 which in turn

would lead to a decrease in U1 and U2 snRNP binding and therefore lack of correct splicing of this exon.

Li et al. (2016) identified a novel alternatively spliced AS3MT transcript associated with schizophrenia that has both exons 2 and 3 skipped. Although rs3740400, which is in intron 1 of AS3MT, is only predicted to cause exon 2 skipping by the SPIDEX dataset, it is possible that the decreased binding of SRSF1 may cause skipping of both exons 2 and 3. A similar situation has been identified by Guo et al. (2013) in the IRF-3 gene, whereby a SNV in an SRSF1 binding site in intron 1 of this gene caused decreased binding of SRSF1, which led to skipping of exons 2 and 3 in IRF-3. A minigene assay determined that the single change to the SRSF1 binding sequence was enough to prevent SRSF1 binding and therefore cause exon 2 and 3 skipping (Guo et al., 2013). Therefore, it is possible that disrupting an SRSF1 binding site in intron 1 of AS3MT could cause skipping not only of exon 2 but of both exons 2 and 3, thus making rs3740400 the likely causal variant leading to the newly identified, schizophrenia- and bipolar disorder-associated AS3MT splice variant.

3.2.2.2 Identification of Two AS3MT Transcript Isoforms

Ninety-two brain tissue samples were genotyped for the AS3MT SNV predicted to alter splicing. For rs3740400, 31 individuals were homozygous for the T allele, 48 individuals were heterozygous and 13 individuals were homozygous for the G allele. cDNA from these brain tissue samples was amplified using primers in exons 1 and 4 of AS3MT (forward primer: 5’ AGCCCGCCGTCCTGAGTC 3’, reverse primer: 5’ CTACCCAGATCCAAAATCCAGCAGT 3’). The full length AS3MT transcript, corresponding to a 341 or 305bp fragment, and the exon 2 and 3 skipped AS3MT transcript, corresponding to a 172 or 136bp fragment, were identified at varying levels between individuals (Figure 14). The two possible sizes for each of the full length and exon 2 and 3 skipped transcripts are due to a 36bp VNTR that exists in exon 1 of AS3MT. All fragments were gel extracted from a 2.5% agarose EtBr post-stained gel, gel purified and sequenced, and the full length and exon 2 and 3 skipped AS3MT sequences were identified.

Figure 14. Identification of two AS3MT transcript isoforms through cDNA PCR amplification. cDNA PCR amplification using forward and reverse primers in exons 1 and 4 of AS3MT, respectively, amplified a 341bp fragment corresponding to the full length AS3MT transcript and a 172bp fragment corresponding to a smaller AS3MT transcript isoform excluding exons 2 and 3.
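The four fragment sizes reported for AS3MT follow from two independent sources of length variation: skipping of exons 2 and 3, and the 36bp VNTR in exon 1. The sketch below enumerates the expected bands under that reading. The combined length of exons 2 and 3 (169bp) is not stated in the text and is assumed here from the difference between the 341bp and 172bp fragments; the function name and labels are illustrative.

```python
def expected_band_sizes(full_length_bp, skipped_bp, vntr_bp):
    """Return expected amplicon sizes keyed by (isoform, VNTR allele)."""
    bands = {}
    isoforms = (("full length", 0), ("exons 2 and 3 skipped", skipped_bp))
    vntr_alleles = (("longer VNTR allele", 0), ("shorter VNTR allele", vntr_bp))
    for isoform, isoform_loss in isoforms:
        for vntr_allele, vntr_loss in vntr_alleles:
            # Each source of variation removes a fixed number of base pairs.
            bands[(isoform, vntr_allele)] = full_length_bp - isoform_loss - vntr_loss
    return bands


# Assumption: combined exon 2 + exon 3 length inferred from the reported bands.
skipped_bp = 341 - 172
bands = expected_band_sizes(341, skipped_bp, 36)
for (isoform, vntr_allele), size in bands.items():
    print(f"{isoform}, {vntr_allele}: {size}bp")
```

An individual heterozygous for the VNTR could therefore show up to four bands on the gel, whereas an individual homozygous for either VNTR allele would show at most two.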

3.2.2.3 Quantification of AS3MT Isoforms and Association to rs3740400

The AS3MT isoforms with exons 2 and 3 spliced in or out were quantified using probe-based assays run as ddPCRs (forward primer: 5’ GCCGAGGAGACAGTGAGT 3’, reverse primer: 5’ TAGATGCTCAGGGATCACCA 3’, probe for full length: 5’ TGACGCTGAGATACAGAAGGACGTGC 3’, probe for exon 2 and 3 skipping: 5’ TCGCAGGCCGAGGAGACAATATTATGGC 3’). The fractional abundance of the exon 2 and 3 skipped isoform (its number of copies relative to the total number of copies of both isoforms) was determined in the DLPFC and hippocampus of each sample and was tested against each sample’s genotype by ANOVA. The AS3MT transcript with exons 2 and 3 spliced out is significantly increased in the DLPFC of individuals with the schizophrenia risk allele, T, of rs3740400 with an allele dose effect (p = 2.85e-06, F (2, 84) = 14.92) (Figure 15).

[Figure 15 plot: splicing ratio (y-axis) by genotype (x-axis: GG, GT, TT)]

Figure 15. Quantification of AS3MT transcripts in the DLPFC. Splicing ratio of the AS3MT exons 2 and 3 spliced out transcript compared to all AS3MT transcripts amplified shows a significant increase with the schizophrenia risk allele in an allele dose dependent manner in the DLPFC.

The AS3MT transcript with exons 2 and 3 spliced out is also significantly increased in the hippocampus of individuals with the schizophrenia risk allele, T, of rs3740400 with an allele dose effect (p = 8.81e-06, F (2, 60) = 14.22) (Figure 16).

[Figure 16 plot: splicing ratio (y-axis) by genotype (x-axis: GG, GT, TT)]

Figure 16. Quantification of AS3MT transcripts in the hippocampus. Splicing ratio of the AS3MT exons 2 and 3 spliced out transcript compared to all AS3MT transcripts amplified shows a significant increase with the schizophrenia risk allele in an allele dose dependent manner in the hippocampus.

3.2.2.4 Protein Function Prediction of AS3MT Isoforms

The full length AS3MT start codon exists at the exon 1-2 junction. The smaller AS3MT transcript isoform skips exons 2 and 3, therefore disrupting the start codon at the exon 1-2 junction and leading to the predicted use of an alternative start codon 103 amino acids downstream of the constitutive start site. The full length and downstream start codon protein sequences of AS3MT were input into InterPro, PANTHER and Pfam.

The methyltransferase domain was identified for both AS3MT protein sequences: IPR025714 (InterPro), PTHR43675:SF2 (PANTHER) and PF13847 (Pfam). However, the methyltransferase domain of the full length AS3MT protein spans amino acids 71-215 and is truncated in the smaller AS3MT isoform, spanning amino acids 5-113 (Figure 17).

Figure 17. AS3MT Pfam protein predictions. A) Pfam predicted methyltransferase domain (in green) on full length AS3MT protein sequence. B) Pfam predicted truncated methyltransferase domain (in green) on shortened AS3MT protein sequence.

The shortened AS3MT protein sequence coding for a predicted truncated methyltransferase domain likely would result in inefficient methylation and metabolism of arsenic. Interestingly, rs3740400 has been associated with decreased arsenic metabolism efficiency (K. S. Engstrom et al., 2015). This shortened AS3MT isoform may also be associated with decreased arsenic metabolism efficiency as they are both associated with rs3740400, and the mechanism by which the decreased arsenic metabolism efficiency occurs may be due to the truncation of the AS3MT methyltransferase domain.

3.2.2.5 Hypothesis of inefficient arsenic metabolism contributing to risk of schizophrenia

Arsenic occurs naturally in the Earth’s crust and is found in water, soil and the atmosphere. The major source of arsenic ingestion for humans is drinking water, but humans can also come into contact with arsenic through crops irrigated with arsenic-containing water (especially rice), through pollution, and through many manufactured products and industrial processes that involve arsenic, such as computer chips, fertilizer, glass manufacturing and mining waste (Ayres, 1992). The central nervous system is a major target of arsenic: the mechanism by which it crosses the blood-brain barrier is unknown, but it has been shown to accumulate in the brain (Rodriguez, Limon-Pacheco, Carrizales, Mendoza-Trejo, & Giordano, 2010). Chronic exposure to arsenic has been found in rat brains to cause increased reactive oxygen species, mitochondrial dysfunction and changes to the dopamine system, including elevated striatal dopamine, increased striatal dopamine receptor 1 (DRD1) mRNA, decreased dopamine receptor 2 (DRD2) mRNA in the nucleus accumbens, decreased dopamine and tyrosine hydroxylase in the striatum and cerebral cortex as well as locomotor changes (Chandravanshi, Shukla, Sultana, Pant, & Khanna, 2014; Kim, Seo, Sung, & Kim, 2014; Moreno Avila, Limon-Pacheco, Giordano, & Rodriguez, 2016; Rodriguez et al., 2010).

An association between arsenic and developmental problems in humans has also been identified in many studies. Decreased IQ, perceptual reasoning, working memory and verbal comprehension, and an increased incidence of intellectual disabilities, have been found in children around the world exposed to powdered milk contaminated with arsenic and to arsenic-contaminated water (Dakeishi, Murata, & Grandjean, 2006; Rosado et al., 2007; Wasserman et al., 2014). Hsieh et al. (2017) examined the relationship between arsenic methylation capacity, AS3MT polymorphisms and developmental delay in children from Taiwan and found that certain AS3MT SNVs were associated with decreased arsenic methylation, and in turn both of these factors were associated with increased developmental delay. SNVs in enzymes involved in the reduction of arsenic were not found to be associated with arsenic methylation or developmental delay. These children from Taiwan were not in an area of increased exposure to arsenic and drank water regulated at the World Health Organization standard of <10 µg/L, suggesting that the toxic effects of arsenic do not require extreme concentrations (Hsieh et al., 2017). Other studies have looked at the association of AS3MT SNVs and arsenic methylation capacity, without looking at developmental delay, and have found that multiple AS3MT SNVs are associated with arsenic methylation and metabolism, including rs3740400, where the schizophrenia and alternative splicing-associated major allele, T, was associated with less efficient or decreased arsenic methylation and metabolism (K. S. Engstrom et al., 2015).

Rs3740400, among other AS3MT SNVs, has been associated with schizophrenia, decreased arsenic methylation and metabolism and an alternatively spliced AS3MT isoform that is most highly expressed in the brain. Therefore, it is hypothesized that the AS3MT variants associated with exon 2 and 3 skipping and the truncated AS3MT protein have poorer arsenic methylation which may lead to increased inorganic or toxic arsenic

exposure in these individuals, even at normal arsenic exposure levels. This arsenic exposure would lead to increased ROS, mitochondrial dysfunction and likely even an altered dopamine system. In combination with additional environmental and genetic risk factors, these detrimental mitochondrial and dopaminergic changes, especially if occurring during development, may contribute to the manifestation of schizophrenia or bipolar disorder in at-risk individuals who have the risk allele for rs3740400, or the other AS3MT SNVs associated with schizophrenia and bipolar disorder, as well as other risk variants.

3.2.3 BNIP3L

BCL2/Adenovirus E1B 19kDa Interacting Protein 3-Like (BNIP3L) is a member of the BCL2 family of proteins that contain a single BCL2 homology 3 domain and are involved in the mitochondrial apoptosis and autophagy pathways (J. Zhang & Ney, 2009). The BNIP proteins have been found to be pro-apoptotic by suppressing the anti-apoptotic function of E1B 19kDa or BCL2, and the overexpression of BNIP3 and BNIP3L has been found to induce autophagy (Boyd et al., 1994; Daido et al., 2004; Hamacher-Brady et al., 2007; Sandoval et al., 2008). Of the BNIP proteins, BNIP3L was found to be most highly expressed in the hippocampus and cerebellum in the developing and adult rat brain (Cho, Choi, Park, Sun, & Geum, 2012). Oxygen-glucose deprivation of cultured rat cerebral cortex neurons dramatically increased BNIP3L mRNA expression (Cho et al., 2012). BNIP3L was also found to upregulate NF-κB under hypoxic conditions in the U251 human glioblastoma cell line, and this upregulation was lost in a BNIP3L-knockout cell line (Lu et al., 2012).

3.2.3.1 SNVs Associated with Alternative BNIP3L Transcripts

Four SNVs were found to have a trend for association with bipolar disorder and were identified by the sQTLSeekeR dataset to be associated with the use of an alternative first exon of BNIP3L (Table 6).

Table 6. SNVs associated with alternative BNIP3L transcripts.

SNV          Location                     EUR MAF      BD p value   Splicing p value
rs11779570   NC_000008.10:g.26317889G>A   A = 0.1968   9.23E-06     1.01E-04
rs11786753   NC_000008.10:g.26318240C>A   A = 0.1968   1.01E-05     1.01E-04
rs60377146   NC_000008.10:g.26316766G>A   A = 0.1899   1.05E-05     1.94E-04
rs11135938   NC_000008.10:g.26313736A>C   C = 0.1909   1.83E-05     2.62E-04

Alternative transcript event (all four SNVs): alternative exon 1 of BNIP3L.

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, BD = bipolar disorder.

These variants are all in an intragenic region downstream of BNIP3L and are not in close proximity to the alternative first exons, therefore SpliceAid and RBPMap were not used to identify the mechanism by which these variants could be contributing to this transcript change. However, these SNVs were found to be in intragenic DNA methylation regions, which have been found to regulate alternative promoter usage (Maunakea et al., 2010). This suggests that these variants may affect alternative promoter usage and therefore alternative first exon usage.

3.2.3.2 Identification of Two BNIP3L Transcript Isoforms

Ninety-two brain tissue samples were genotyped for the four BNIP3L SNVs predicted to alter first exon usage. For rs11779570, rs11786753, rs60377146 and rs11135938 respectively, 50 individuals were homozygous for the G, C, G and A alleles, 35 individuals were heterozygous and 7 individuals were homozygous for the A, A, A and C alleles. cDNA from these brain tissue samples was amplified using two different forward primers, one in each of the potential first exons of BNIP3L, and the same reverse primer in exon 2 of BNIP3L (forward primer 1A: 5’ GGACTCGGCTTGTTGTGTTG 3’, forward primer 1B: 5’ CTGTTGCCTCCTGGGGTCTT 3’, reverse primer: 5’ GGAGGATGAGGATGGTACGTG 3’). Two BNIP3L transcripts were identified at varying levels between individuals: a 272bp fragment corresponding to the BNIP3L transcript with one first exon named here as exon 1A and a 157bp fragment corresponding to the BNIP3L transcript with another first exon named here as exon 1B (Figure 18). Both fragments were gel extracted from a 2.5% agarose EtBr post-stained gel, gel purified and sequenced, and both BNIP3L transcripts with the alternative first exon sequences were identified.

Figure 18. Identification of two BNIP3L transcript isoforms through cDNA PCR amplification. cDNA PCR amplification using two forward primers, one in each of the alternative first exons, and one reverse primer in exon 2 of BNIP3L amplified a 272bp fragment corresponding to the BNIP3L transcript using exon 1A and a 157bp fragment corresponding to the BNIP3L transcript using exon 1B.

3.2.3.3 Quantification of BNIP3L Isoforms and Association to rs11779570, rs11786753, rs60377146 and rs11135938

The BNIP3L isoforms with alternative first exons were quantified using probe-based assays run as ddPCRs (forward 1A primer: 5’ ACAATGTCGTCCCACCTAGTC 3’, forward 1B primer: 5’ TTGCCTCCTGGGGTCTT 3’, reverse primer: 5’ GGTAGCTCCACCCAGGAA 3’, probe for exon 1A isoform: 5’ CCCTGCACAACAACAACAACAACTGCG 3’, probe for exon 1B isoform: 5’ AGGGGGATCCGCGCTACC 3’). The fractional abundance of the isoform with exon 1B (its number of copies relative to the total number of copies of both isoforms) was determined in the DLPFC and hippocampus of each sample and was tested against each sample’s genotype by ANOVA. The BNIP3L transcript using exon 1B as the first exon was not significantly increased or decreased in the DLPFC of individuals with the bipolar risk alleles (p = 0.64, F (2, 84) = 0.449) (Figure 19).

[Figure 19 plot: splicing ratio (y-axis) by genotype (x-axis: Hom. Maj., Het., Hom. Min.)]

Figure 19. Quantification of BNIP3L transcripts in the DLPFC. Splicing ratio of the BNIP3L transcript using exon 1B compared to all BNIP3L transcripts amplified shows no difference with the bipolar disorder risk alleles in the DLPFC.

The BNIP3L transcript using exon 1B as the first exon was not significantly increased or decreased in the hippocampus of individuals with the bipolar risk alleles (p = 0.889, F (2, 60) = 0.118) (Figure 20).

[Figure 20 plot: splicing ratio (y-axis) by genotype (x-axis: Hom. Maj., Het., Hom. Min.)]

Figure 20. Quantification of BNIP3L transcripts in the hippocampus. Splicing ratio of the BNIP3L transcript using exon 1B compared to all BNIP3L transcripts amplified shows no difference with the bipolar disorder risk alleles in the hippocampus.

3.2.3.4 Protein Function Prediction of BNIP3L Isoforms

The BNIP3L transcript utilizing exon 1A encodes the full length BNIP3L protein with 219 amino acids. The BNIP3L transcript utilizing exon 1B uses a start codon 40 amino acids downstream of the start codon from the full length BNIP3L isoform, and therefore encodes a protein that is 40 amino acids shorter than the full length isoform. The full length and shortened BNIP3L protein sequences were input into InterPro, PANTHER, Pfam and PredictProtein.

The BNIP3 domain was identified for both BNIP3L protein sequences: IPR010548 (InterPro), PTHR15186:SF9 (PANTHER) and PF06553 (Pfam). However, the BNIP3 domain of the full length BNIP3L protein spans amino acids 29-218 and is truncated in the smaller BNIP3L isoform, spanning amino acids 7-178.

PredictProtein results compared between the full-length and truncated BNIP3L proteins identified the loss of 3 potential protein-binding regions and the creation of a potential

polynucleotide binding region (Figure 21).

Figure 21. PredictProtein binding site results for full-length and truncated BNIP3L protein sequences. The row of stick figures represents predicted protein binding (red diamonds) and polynucleotide binding (yellow circles) sites. A) The full-length BNIP3L protein predicted binding site results identified 11 protein-binding sites. B) The truncated BNIP3L protein predicted binding site results identified 8 protein-binding sites and 1 DNA-binding site (Yachdav et al., 2014).

GO terms also predicted binding as a molecular function of both isoforms but catalytic activity as a molecular function of only the truncated protein.

Many of the BCL2 family of proteins express two alternatively spliced proteins, and the ratio of these different protein isoforms has been found to be very important in determining the effect on apoptosis. For example, Bcl-x has two isoforms: a short, pro-apoptotic Bcl-xS isoform and a long, anti-apoptotic Bcl-xL isoform. Changes in the ratio of these isoforms can lead to increased or decreased apoptosis depending on whether Bcl-xS or Bcl-xL is increased, respectively (Kunisada et al., 2002; Revil et al., 2007). A similar situation has been identified with BNIP3L in a mouse model of cardiac hypertrophy, where a smaller BNIP3L protein was found to form a heterodimer with the full-length BNIP3L protein and prevent its pro-apoptotic function; however, this isoform is not listed in the UCSC browser (Yussman et al., 2002).

The specific functions of these BNIP3L proteins, and how they may differ, remain unclear and need further study. However, based on extensive evidence from other BCL2 proteins that the ratio of isoforms is highly regulated, it is likely that a change in the ratio of the expression of these BNIP3L proteins would impact their effect on apoptosis and cell function.

3.2.4 CACNA1D

Calcium Voltage-Gated Channel Subunit Alpha1 D (CACNA1D) encodes a subunit of the voltage-gated long-lasting L-type calcium channels, which are important for many brain functions, including synaptic transmission and dopamine signaling (Berger & Bartsch, 2014). Variants in this gene have been associated with bipolar disorder, schizophrenia and autism spectrum disorder (Guan et al., 2015; Pinggera et al., 2015; Ross et al., 2016). One SNV in CACNA1D was found to have a trend for association with bipolar disorder and was predicted by the SPIDEX dataset to cause exon 25 skipping (Table 7).

Table 7. SNV predicted to alter splicing of CACNA1D.

| SNV | Location | EUR MAF | BD p value | Splicing Z score | Alternative Splicing Event |
|---|---|---|---|---|---|
| rs17053453 | NM_001128839.2:c.3114+18A>G | G = 0.0755 | 8.33E-04 | -4.4027 | Exon 25 skipping of CACNA1D |

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, BD = bipolar disorder.

92 brain tissue samples were genotyped for the CACNA1D SNV predicted to alter splicing. For rs17053453, 79 individuals were homozygous for the A allele, 13 individuals were heterozygous and no individuals were homozygous for the G allele. cDNA from these brain tissue samples was amplified using primers in exons 24 and 26 of CACNA1D (forward primer: 5’ TCCAGTGCCATCTCCGTTG 3’, reverse primer: 5’ TCAGGGTTACTTTTGGCTTCA 3’). Only a 239bp fragment corresponding to the full CACNA1D transcript including exon 25 was identified for each genotype but no 131bp fragment corresponding to the smaller CACNA1D transcript with exon 25 spliced out was identified in the DLPFC or hippocampus (Figure 22).

Figure 22. No identification of exon 25 skipped CACNA1D transcript. A) Amplification of CACNA1D DLPFC cDNA from 12 individuals heterozygous for rs17053453 and 11 individuals homozygous for the A allele only identified a 239bp fragment corresponding to the full CACNA1D transcript but not a 131bp fragment corresponding to the CACNA1D transcript with exon 25 spliced out. B) Amplification of CACNA1D hippocampal cDNA from 11 individuals heterozygous for rs17053453 and 11 individuals homozygous for the A allele only identified a 239bp fragment corresponding to the full CACNA1D transcript but not a 131bp fragment corresponding to the CACNA1D transcript with exon 25 spliced out.

Therefore, rs17053453 was not identified in this study to be associated with splicing out of exon 25 of CACNA1D in the DLPFC or hippocampus.
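As a quick consistency check, the genotype counts reported for rs17053453 can be converted into a minor allele frequency and compared with the EUR MAF listed in Table 7. The sketch below is illustrative only; `minor_allele_frequency` is a hypothetical helper name and is not part of the analysis pipeline used in this study.

```python
def minor_allele_frequency(hom_major: int, het: int, hom_minor: int) -> float:
    """Return the minor allele frequency implied by diploid genotype counts."""
    n_individuals = hom_major + het + hom_minor
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    # Each individual carries two alleles; heterozygotes carry one minor allele
    # and minor-allele homozygotes carry two.
    minor_alleles = het + 2 * hom_minor
    return minor_alleles / (2 * n_individuals)


# Genotype counts for rs17053453 reported in the text: 79 AA, 13 AG, 0 GG.
counts = (79, 13, 0)
assert sum(counts) == 92  # all 92 genotyped samples are accounted for
maf_g = minor_allele_frequency(*counts)
print(f"G allele frequency in the brain samples: {maf_g:.4f}")
```

The G allele frequency in these samples (13 of 184 alleles, about 0.071) is close to the EUR MAF of 0.0755 listed in Table 7.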

3.2.5 KIF21B

Kinesin Family Member 21B (KIF21B) encodes a member of the kinesin superfamily. This gene has been found to be necessary for proper neuronal morphology and synaptic transmission, as Kif21b knockout mice show decreased spine density and synaptic transmission deficits (Muhia et al., 2016). One SNV in KIF21B was found to have a trend for association with bipolar disorder and was found by the PTV dataset to be associated with exon 28 skipping of KIF21B (Table 8).

Table 8. SNV associated with alternative splicing of KIF21B.

| SNV | Location | EUR MAF | BD p value | Splicing p value | Alternative Splicing Event |
|---|---|---|---|---|---|
| rs12118735 | NM_001252100.1:c.3843-736G>A | A = 0.2833 | 1.82E-04 | 8.00E-3 | Exon 28 skipping of KIF21B |

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, BD = bipolar disorder.

92 brain tissue samples were genotyped for the KIF21B SNV associated with alternative splicing. For rs12118735, 55 individuals were homozygous for the G allele, 30 individuals were heterozygous and 7 individuals were homozygous for the A allele. cDNA from these brain tissue samples was amplified using primers in exons 27 and 29 of KIF21B (forward primer: 5’ ACCGCAATGTCTTCTCTCGT 3’, reverse primer: 5’ AACTCATCTGTGGCATCCAGG 3’). Only a 214bp fragment corresponding to the full KIF21B transcript including exon 28 was identified for each genotype but no 175bp fragment corresponding to the smaller KIF21B transcript with exon 28 spliced out was identified in the DLPFC or hippocampus (Figure 23).

Figure 23. No identification of exon 28 skipped KIF21B transcript. A) Amplification of KIF21B DLPFC cDNA from 7 individuals homozygous for the G allele, 7 heterozygous individuals and 7 individuals homozygous for the A allele of rs12118735 only identified a 214bp fragment corresponding to the full KIF21B transcript but not a 175bp fragment corresponding to the KIF21B transcript with exon 28 spliced out. B) Amplification of KIF21B hippocampal cDNA from 4 individuals homozygous for the G allele, 4 heterozygous individuals and 4 individuals homozygous for the A allele of rs12118735 only identified a 214bp fragment corresponding to the full KIF21B transcript but not a 175bp fragment corresponding to the KIF21B transcript with exon 28 spliced out.

Therefore, rs12118735 was not identified in this study to be associated with splicing out of exon 28 of KIF21B in the DLPFC or hippocampus.

3.2.6 LRRC48

LRRC48 (Leucine Rich Repeat Containing 48), also known as DRC3 (Dynein Regulatory Complex subunit 3), encodes subunit 3 of the nexin-dynein regulatory complex (N-DRC). The N-DRC plays an important role in controlling flagellar and ciliary motility (Awata et al., 2015). A mutant mouse model of LRRC48 with a missense mutation and decreased LRRC48 transcription was reported to have hydrocephalus and laterality defects (Ha, Lindsay, Timms, & Beier, 2016). Laterality defects have been identified in schizophrenia, including handedness and abnormal hemispheric asymmetry (Brandler & Paracchini, 2014; Ribolsi, Daskalakis, Siracusano, & Koch, 2014). Hydrocephalus involves enlargement of the ventricles, which is also one of the hallmarks of schizophrenia (Olabi et al., 2011).

3.2.6.1 LRRC48 SNVs Predicted to Alter Exon 7 and 8 Splicing

Two SNVs in LRRC48 were found to be associated with schizophrenia and were predicted by the SPIDEX dataset to each cause different exon skipping events (Table 9).

Table 9. SNVs predicted to alter splicing of LRRC48.

| SNV | Location | EUR MAF | SCZ p value | Splicing Z score | Alternative Splicing Event |
|---|---|---|---|---|---|
| rs4924832 | NM_001130090.1:c.711+7T>C | C = 0.3767 | 4.89E-08 | 2.432 | Exon 8 skipping of LRRC48 |
| rs4584886 | NM_001130090.1:c.571C>T | T = 0.3748 | 1.60E-07 | -2.204 | Exon 7 skipping of LRRC48 |

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, SCZ = schizophrenia.

92 brain tissue samples were genotyped for the two LRRC48 SNVs predicted to alter splicing. For rs4924832, 40 individuals were homozygous for the T allele, 38 individuals were heterozygous and 14 individuals were homozygous for the C allele. For rs4584886, 43 individuals were homozygous for the C allele, 36 individuals were heterozygous and 13 individuals were homozygous for the T allele. cDNA from these brain tissue samples was amplified using primers in exons 6 and 9 of LRRC48 (forward primer: 5’ GGACCTGAGCTTGTTCAACA 3’, reverse primer: 5’ GAGTCCTCAGCGTACATGCT 3’). Only a 495bp fragment corresponding to the full LRRC48 transcript including both exons 7 and 8 was identified for each genotype, but no 348bp or 432bp fragment corresponding to the smaller LRRC48 transcripts with exon 7 or 8 spliced out, respectively, was identified in the DLPFC or hippocampus (Figure 24).

Figure 24. No identification of exon 7 or 8 skipped LRRC48 transcripts. A) Amplification of LRRC48 DLPFC cDNA from 7 individuals homozygous for the T and C alleles, 7 heterozygous individuals and 7 individuals homozygous for the C and T alleles of rs4924832 and rs4584886, respectively, only identified a 495bp fragment corresponding to the full LRRC48 transcript but not a 348bp or 432bp fragment corresponding to the LRRC48 transcripts with exon 7 or 8 spliced out, respectively. B) Amplification of LRRC48 hippocampal cDNA from 4 individuals homozygous for the T and C alleles, 4 heterozygous individuals and 4 individuals homozygous for the C and T alleles of rs4924832 and rs4584886, respectively, only identified a 495bp fragment corresponding to the full LRRC48 transcript but not a 348bp or 432bp fragment corresponding to the LRRC48 transcripts with exon 7 or 8 spliced out, respectively.

Therefore, rs4924832 and rs4584886 were not identified in this study to be associated with splicing out of exons 7 or 8 of LRRC48 in the DLPFC or hippocampus.


3.2.6.2 LRRC48 SNV Predicted to Alter Exon 2 Splicing

One SNV in LRRC48 was found to be associated with schizophrenia and was predicted by the SPIDEX dataset to cause exon 2 skipping of LRRC48 (Table 10).

Table 10. SNV predicted to alter splicing of LRRC48.

| SNV | Location | EUR MAF | SCZ p value | Splicing Z score | Alternative Splicing Event |
|---|---|---|---|---|---|
| rs11870660 | NM_001033551.2:c.-1469G>A | A = 0.3956 | 2.16E-07 | 1.994 | Exon 2 skipping of LRRC48 |

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, SCZ = schizophrenia.

rs11870660 was input into RBPMap and SpliceAid to determine the mechanism by which this SNV is likely to cause exon 2 skipping.

SpliceAid identified one RNA binding protein that is predicted to bind only to the minor allele, A, of rs11870660: hnRNP I, also known as heterogeneous nuclear ribonucleoprotein I or polypyrimidine tract binding protein 1. Heterogeneous nuclear ribonucleoproteins bind to intronic splicing silencers and promote exon skipping. RBPMap identified SRSF3, SRSF5 and U2AF2 as RNA binding proteins that are predicted to bind only to the major allele, G, of rs11870660. Serine and arginine rich splicing factor 3 (SRSF3) and serine and arginine rich splicing factor 5 (SRSF5) are both serine/arginine rich proteins that promote exon inclusion. The U2 small nuclear RNA auxiliary factor 2 (U2AF2) is a non-snRNP that is necessary for the binding of the U2 snRNP to the branch site; a variant in the binding site of this protein may therefore decrease binding of U2AF2 and, in turn, binding of the U2 snRNP to the branch site. Therefore, the A allele of rs11870660 is likely to promote exon 2 skipping of LRRC48 and the G allele of rs11870660 is likely to promote exon 2 inclusion of LRRC48.
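The reasoning in the preceding paragraph can be laid out as a small lookup. This is only a sketch of the logic, not output from RBPMap or SpliceAid: the dictionary names and the function `predicted_effect` are hypothetical, and labelling U2AF2 as favouring inclusion is a simplifying assumption based on its role in recruiting the U2 snRNP.

```python
# Allele-specific binders predicted for rs11870660, as reported in the text.
ALLELE_SPECIFIC_BINDERS = {
    "A": ["hnRNP I"],                  # SpliceAid: binds the minor allele only
    "G": ["SRSF3", "SRSF5", "U2AF2"],  # RBPMap: bind the major allele only
}

# Direction each factor is expected to push exon 2 (U2AF2 is an assumption).
FACTOR_EFFECT = {
    "hnRNP I": "skipping",
    "SRSF3": "inclusion",
    "SRSF5": "inclusion",
    "U2AF2": "inclusion",
}


def predicted_effect(allele: str) -> str:
    """Summarise the direction favoured by factors that bind only this allele."""
    effects = {FACTOR_EFFECT[factor] for factor in ALLELE_SPECIFIC_BINDERS[allele]}
    # If the allele-specific factors disagree, no single direction is predicted.
    if len(effects) != 1:
        return "ambiguous"
    return effects.pop()


for allele in ("A", "G"):
    print(f"{allele} allele -> exon 2 {predicted_effect(allele)}")
```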

3.2.6.3 Identification of Three LRRC48 Transcript Isoforms

92 brain tissue samples were genotyped for the LRRC48 SNV associated with alternative splicing. For rs11870660, 34 individuals were homozygous for the G allele, 42 individuals were heterozygous and 16 individuals were homozygous for the A allele. cDNA from these brain tissue samples was amplified using primers in exons 1 and 4 of LRRC48 (forward primer: 5’ TTCACCCAGGATGTCTCCCT 3’, reverse primer: 5’ ACATCCTTGAAGAGGATGCCC 3’). Three LRRC48 transcripts were identified at varying levels between individuals: a 471bp fragment corresponding to the full LRRC48 transcript including exon 2, a 382bp fragment corresponding to a smaller LRRC48 transcript with exon 2 spliced out, and a 320bp fragment corresponding to a third, novel LRRC48 transcript, not previously reported, with exon 2 and part of exon 3 spliced out (Figure 25). All fragments were gel extracted from 2.5% agarose EtBr post-stained gel, gel purified and sequenced, and the full length, exon 2 skipped, and exon 2 and partial exon 3 skipped LRRC48 sequences were identified.

Figure 25. Identification of three LRRC48 transcript isoforms through cDNA PCR amplification. cDNA PCR amplification using forward and reverse primers in exons 1 and 4 of LRRC48 amplified a 471bp fragment corresponding to the full length LRRC48 transcript, a 382bp fragment corresponding to a smaller LRRC48 transcript isoform excluding exon 2 and a 320bp fragment corresponding to a novel LRRC48 transcript isoform excluding exon 2 and part of exon 3.

3.2.6.4 Quantification of LRRC48 Isoforms and Association to rs11870660

The LRRC48 isoforms with exon 2 skipping, exon 2 and partial exon 3 skipping and the full length were quantified using probe-based assays run as ddPCRs (forward primer: 5’ GTCTGCTGCGTGGAACC 3’, reverse primer: 5’ TCCCTGATGATAGTGTTGTTGAAG 3’, probe for full length: 5’ AGTTCAGGAGTTCAAGACCAGCCC 3’, probe for exon 2 skipping: 5’ TCCCAGCGCTTGAGAAGGAAAATTCTG 3’, probe for exon 2 and partial exon 3 skipping: 5’ CGCTTGAGAAGCTCCAACCTCTCT 3’). The fractional abundance of the number of copies of the exon 2 or exon 2 and partial exon 3 skipped isoforms compared to the total number of copies of all isoforms was identified in the DLPFC and hippocampus of each sample and was run as an ANOVA with each sample’s genotype. The LRRC48 transcript with exon 2 or exon 2 and partial exon 3 spliced out was not significantly increased or decreased in the DLPFC of individuals with the schizophrenia risk allele (exon 2 skipping: p = 0.0532, F (2, 84) = 3.04, exon 2 and partial exon 3 skipping: p = 0.823, F (2, 84) = 0.196) (Figure 26).
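The quantification described above has two steps: a fractional abundance per sample, then a one-way ANOVA across the three genotype groups. The sketch below illustrates both in plain Python. The function names and all numbers are hypothetical placeholders, not data or software from this study, and the ANOVA is the textbook between-group over within-group mean square ratio.

```python
from statistics import mean


def fractional_abundance(copies: dict, isoform: str) -> float:
    """Percentage of all quantified transcript copies contributed by one isoform."""
    total = sum(copies.values())
    if total <= 0:
        raise ValueError("no copies detected")
    return 100.0 * copies[isoform] / total


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    groups = [list(g) for g in groups]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ddPCR copy numbers for a single sample (illustrative only).
sample = {"full_length": 700, "exon2_skipped": 250, "exon2_partial3_skipped": 50}
print(fractional_abundance(sample, "exon2_skipped"))

# Hypothetical fractional abundances for the three genotype groups.
by_genotype = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df1, df2 = one_way_anova_f(by_genotype)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```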

[Figure 26 plots. Panels A and B: x-axis, Genotype (CC, TC, TT); y-axis, Splicing Ratio.]

Figure 26. Quantification of LRRC48 transcripts in the DLPFC. A) Splicing ratio of the LRRC48 exon 2 spliced out transcript compared to all LRRC48 transcripts amplified shows no difference with the schizophrenia risk allele in the DLPFC. B) Splicing ratio of the LRRC48 exon 2 and partial exon 3 spliced out transcript compared to all LRRC48 transcripts amplified shows no difference with the schizophrenia risk allele in the DLPFC.

The LRRC48 transcript with exon 2 or exon 2 and partial exon 3 spliced out was not significantly increased or decreased in the hippocampus of individuals with the schizophrenia risk allele (exon 2 skipping: p = 0.461, F (2, 84) = 0.785, exon 2 and partial exon 3 skipping: p = 0.385, F (2, 84) = 0.971) (Figure 27).

[Figure 27 plots. Panels A and B: x-axis, Genotype (CC, TC, TT); y-axis, Splicing Ratio.]

Figure 27. Quantification of LRRC48 transcripts in the hippocampus. A) Splicing ratio of the LRRC48 exon 2 spliced out transcript compared to all LRRC48 transcripts amplified shows no difference with the schizophrenia risk allele in the hippocampus. B) Splicing ratio of the LRRC48 exon 2 and partial exon 3 spliced out transcript compared to all LRRC48 transcripts amplified shows no difference with the schizophrenia risk allele in the hippocampus.

3.2.6.5 Protein Function Prediction of LRRC48 Isoforms

The LRRC48 start codon lies in exon 4, so the full length, exon 2 skipped, and exon 2 and partial exon 3 skipped LRRC48 isoforms all encode identical protein sequences. Therefore InterPro, PANTHER, Pfam, PredictProtein and I-TASSER were not used to predict the protein function of the alternatively spliced isoforms. Instead, the 5’UTR sequences were input into IRESite to predict the presence of internal ribosome entry sites (IRES), which are translation initiation sites. IRESite predicted the presence of an IRES in the 5’ sequence of the full length LRRC48 isoform but not the exon 2 or exon 2 and partial exon 3 skipped isoforms (Mokrejs et al., 2006). Therefore altering the ratio of these splicing isoforms may affect the rate at which the LRRC48 protein is expressed.

3.2.7 NEK4

NEK4 (never in mitosis gene A related kinase 4) is a member of the NEK family of proteins that are involved in multiple aspects of cell cycle control as well as ciliary function (Fry, O'Regan, Sabir, & Bayliss, 2012). NEK4 has been found to play a role in regulating the autophagy process and autophagy induction in MCF-7 human breast carcinoma cells, but acts independently or downstream of mTORC1 (Szyniarowski et al., 2011). Knockdown of NEK4 in cells from a mouse model of lymphoma promoted resistance to taxol, a microtubule stabilizing drug, but promoted sensitization to vincristine, a microtubule destabilizing drug, and also prevented polarization of microtubules upon depletion, implicating NEK4 in microtubule regulation (Doles & Hemann, 2010). In the brain, NEK4 has been found to associate with ciliary rootlets and is involved in cilium integrity. Coene et al. (2011) found a decrease in ciliated cells after knocking down NEK4, but no difference in mitotic profiles compared to cells without NEK4 knockdown. Multiple possible functions of NEK4 have been identified in many cell types; in the brain, however, NEK4 has been found to be involved in ciliary function.

3.2.7.1 NEK4 SNVs Predicted to Alter Exon 5 Splicing

Three SNVs were found to be associated with schizophrenia and bipolar disorder and were identified by the DLPFC sQTLs dataset to be associated with exon 5 skipping of NEK4 (Table 11).

Table 11. SNVs associated with alternative splicing of NEK4.

| SNV | Location | EUR MAF | BD p value | SCZ p value | Splicing p value | Alternative Splicing Event |
|---|---|---|---|---|---|---|
| rs2276817 | NM_001166449.1:c.390G>A | A = 0.2584 | 3.21E-06 | 4.49E-08 | 1.73E-03 | Exon 5 skipping of NEK4 |
| rs6445539 | NM_001198974.2:c.879+220C>T | T = 0.2654 | 4.22E-06 | 4.58E-08 | 1.64E-03 | Exon 5 skipping of NEK4 |
| rs6445538 | NM_001198974.2:c.879+228A>G | G = 0.2664 | 4.72E-06 | 4.10E-08 | 1.64E-03 | Exon 5 skipping of NEK4 |

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, BD = bipolar disorder, SCZ = schizophrenia.

92 brain tissue samples were genotyped for the NEK4 SNVs predicted to alter splicing. For rs2276817, rs6445539 and rs6445538 respectively, 50 individuals were homozygous for the G, C and A alleles, 34 individuals were heterozygous, and 8 individuals were homozygous for the A, T and G alleles. cDNA from these brain tissue samples was amplified using primers in exons 4 and 6 of NEK4 (forward primer: 5’ GAAATGGCCACCTTGAAGCA 3’, reverse primer: 5’ AGCCCTCAGAAGAGAGTGGT 3’). Only a 355bp fragment corresponding to the full NEK4 transcript including exon 5 was identified for each genotype but no 200bp fragment corresponding to the smaller NEK4 transcript with exon 5 spliced out was identified in the DLPFC or hippocampus (Figure 28).

Figure 28. No identification of exon 5 skipped NEK4 transcript. A) Amplification of NEK4 DLPFC cDNA from 7 individuals homozygous for the G, C and A alleles, 7 heterozygous individuals and 7 individuals homozygous for the A, T and G alleles of rs2276817, rs6445539 and rs6445538, respectively, only identified a 355bp fragment corresponding to the full NEK4 transcript but not a 200bp fragment corresponding to the NEK4 transcript with exon 5 spliced out. B) Amplification of NEK4 hippocampal cDNA from 4 individuals homozygous for the G, C and A alleles, 4 heterozygous individuals and 4 individuals homozygous for the A, T and G alleles of rs2276817, rs6445539 and rs6445538, respectively, only identified a 355bp fragment corresponding to the full NEK4 transcript but not a 200bp fragment corresponding to the NEK4 transcript with exon 5 spliced out.

Therefore, rs2276817, rs6445539 and rs6445538 were not identified in this study to be associated with splicing out of exon 5 of NEK4 in the DLPFC or hippocampus.

3.2.7.2 NEK4 SNVs Predicted to Alter Exon 4 Splicing

Twenty-seven SNVs were found to be associated with bipolar disorder and schizophrenia and were identified by the DLPFC sQTLs dataset to be associated with exon 4 skipping of NEK4 (Table 12).


Table 12. SNVs associated with alternative splicing of NEK4.

| SNV | Location | EUR MAF | BD p value | SCZ p value | Splicing p value | Alternative Splicing Event |
|---|---|---|---|---|---|---|
| rs4687657 | NM_001166449.1:c.2002C>A | A = 0.3022 | 1.48E-05 | 1.57E-06 | 1.21E-03 | Exon 4 skipping of NEK4 |
| rs1014969 | NC_000003.11:g.52808341G>A | A = 0.4235 | 4.97E-05 | 6.83E-01 | 8.00E-09 | Exon 4 skipping of NEK4 |
| rs3755799 | NC_000003.11:g.52809193G>A | A = 0.3439 | 5.03E-05 | 6.85E-07 | 9.83E-09 | Exon 4 skipping of NEK4 |
| rs736408 | NM_002217.3:c.1383+192C>T | T = 0.3847 | 9.18E-05 | 4.58E-08 | 4.37E-06 | Exon 4 skipping of NEK4 |
| rs12635140 | NM_001010983.2:c.-37+575A>G | G = 0.4314 | 9.64E-05 | 2.20E-06 | 4.77E-09 | Exon 4 skipping of NEK4 |
| rs13071584 | NM_001193533.1:c.93+183A>G | G = 0.4294 | 9.67E-05 | 6.32E-07 | 4.77E-09 | Exon 4 skipping of NEK4 |
| rs1866268 | NM_014366.4:c.-711C>A | A = 0.4304 | 9.68E-05 | 1.82E-06 | 4.77E-09 | Exon 4 skipping of NEK4 |
| rs11716747 | NM_001193533.1:c.2167-2972G>A | A = 0.4304 | 9.70E-05 | 1.56E-06 | 4.77E-09 | Exon 4 skipping of NEK4 |
| rs2289247 | NM_014366.4:c.1099G>A | A = 0.4294 | 9.87E-05 | 1.64E-06 | 4.77E-09 | Exon 4 skipping of NEK4 |
| rs3755798 | NM_001010983.2:c.-1963C>T | T = 0.4294 | 1.04E-04 | 2.16E-06 | 4.77E-09 | Exon 4 skipping of NEK4 |
| rs1042779 | NM_001166434.2:c.1358A>G | G = 0.3986 | 1.23E-04 | 9.26E-08 | 4.59E-10 | Exon 4 skipping of NEK4 |
| rs2535629 | NM_002217.3:c.789+112G>A | A = 0.3728 | 1.36E-04 | 1.33E-08 | 1.19E-06 | Exon 4 skipping of NEK4 |
| rs2071508 | NM_002217.3:c.-1974G>A | A = 0.3907 | 2.14E-04 | 3.98E-07 | 2.27E-08 | Exon 4 skipping of NEK4 |
| rs11130317 | NM_001010983.2:c.329+254G>A | A = 0.4205 | 2.15E-04 | 1.94E-07 | 9.15E-10 | Exon 4 skipping of NEK4 |
| rs1108842 | NM_014366.4:c.-29A>C | A = 0.4791 | 2.34E-04 | 2.27E-09 | 4.66E-07 | Exon 4 skipping of NEK4 |
| rs2230535 | NM_001193533.1:c.201A>G | G = 0.4195 | 2.40E-04 | 7.78E-08 | 9.15E-10 | Exon 4 skipping of NEK4 |
| rs1075653 | NM_001166434.2:c.2069-5T>C | C = 0.3926 | 2.62E-04 | 2.99E-07 | 2.27E-08 | Exon 4 skipping of NEK4 |
| rs9324 | NM_001166434.2:c.2121T>C | C = 0.3926 | 2.72E-04 | 3.64E-07 | 2.27E-08 | Exon 4 skipping of NEK4 |
| rs2300149 | NM_001166434.2:c.1693+560C>T | T = 0.4076 | 2.94E-04 | 2.80E-07 | 8.46E-10 | Exon 4 skipping of NEK4 |
| rs1076425 | NM_001166434.2:c.2069-71A>G | G = 0.3917 | 3.35E-04 | 1.99E-07 | 2.48E-08 | Exon 4 skipping of NEK4 |
| rs10780035 | NM_001193533.1:c.2166+6978G>A | A = 0.4175 | 6.03E-04 | 4.74E-07 | 2.52E-09 | Exon 4 skipping of NEK4 |
| rs11177 | NM_014366.4:c.116G>A | A = 0.4175 | 6.32E-04 | 4.25E-07 | 2.52E-09 | Exon 4 skipping of NEK4 |
| rs6976 | NM_001010983.2:c.*57G>A | A = 0.4165 | 7.09E-04 | 4.40E-07 | 2.52E-09 | Exon 4 skipping of NEK4 |
| rs3774354 | NM_001166434.2:c.799+320G>A | A = 0.3966 | 7.81E-04 | 1.91E-07 | 1.15E-09 | Exon 4 skipping of NEK4 |
| rs2071042 | NM_001166449.1:c.75T>C | C = 0.4046 | 8.96E-04 | 1.10E-08 | 6.43E-04 | Exon 4 skipping of NEK4 |
| rs6769789 | NM_001198974.2:c.879+70G>A | G = 0.4056 | 1.05E-03 | 8.92E-09 | 4.69E-04 | Exon 4 skipping of NEK4 |
| rs2535627 | NG_016006.1:g.24613A>G | G = 0.4071 | 1.40E-03 | 3.96E-11 | 3.26E-06 | Exon 4 skipping of NEK4 |

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, BD = bipolar disorder, SCZ = schizophrenia.

Of the 27 SNVs associated with exon 4 skipping of NEK4, only two are in close proximity to exon 4 of NEK4 and are therefore the most plausible candidates to cause this splicing change. The two SNVs are rs13071584, which is in intron 1 of NEK4, and rs2230535, which is in exon 3 of NEK4. These two SNVs were input into RBPMap and SpliceAid to determine the mechanism by which they are likely to cause exon 4 skipping.

RBPMap did not yield any significant results for rs13071584 or rs2230535.

For rs13071584, SpliceAid predicted that SRSF2 binds to both the A and G alleles but that SRSF9 binds only to the G allele. Serine and arginine rich splicing factor 2 (SRSF2) and serine and arginine rich splicing factor 9 (SRSF9) are both likely to contribute to exon inclusion.

For rs2230535, SpliceAid predicted that ETR-3 binds only to the G allele and not to the A allele. ELAV-Type RNA-binding protein 3 (ETR-3) has previously been identified to cause exon skipping in Tau (Leroy et al., 2006). Therefore the most likely variant to contribute to exon 4 skipping of NEK4 would be the G allele of rs2230535, which is predicted to bind ETR-3.

3.2.7.3 Identification of Two NEK4 Transcript Isoforms

92 brain tissue samples were genotyped for the twenty-seven NEK4 SNVs associated with alternative splicing (Table 13).

Table 13. Brain tissue sample genotypes of SNVs associated with alternative splicing of NEK4.

| SNV | Major Allele | Minor Allele | # Individuals homozygous for Major Allele | # Heterozygous Individuals | # Individuals homozygous for Minor Allele |
|---|---|---|---|---|---|
| rs4687657 | C | A | 46 | 39 | 7 |
| rs1014969 | G | A | 26 | 46 | 20 |
| rs3755799 | G | A | 33 | 46 | 13 |
| rs736408 | C | T | 31 | 43 | 18 |
| rs12635140 | A | G | 25 | 46 | 21 |
| rs13071584 | A | G | 26 | 45 | 21 |
| rs1866268 | C | A | 25 | 46 | 21 |
| rs11716747 | G | A | 25 | 46 | 21 |
| rs2289247 | G | A | 25 | 46 | 21 |
| rs3755798 | C | T | 25 | 46 | 21 |
| rs1042779 | A | G | 29 | 47 | 16 |
| rs2535629 | G | A | 31 | 44 | 17 |
| rs2071508 | G | A | 38 | 40 | 14 |
| rs11130317 | G | A | 31 | 47 | 14 |
| rs1108842 | C | A | 29 | 46 | 17 |
| rs2230535 | A | G | 31 | 47 | 14 |
| rs1075653 | T | C | 34 | 42 | 16 |
| rs9324 | T | C | 33 | 42 | 17 |
| rs2300149 | C | T | 36 | 41 | 15 |
| rs1076425 | A | G | 39 | 39 | 14 |
| rs10780035 | G | A | 35 | 43 | 14 |
| rs11177 | G | A | 35 | 43 | 14 |
| rs6976 | G | A | 35 | 44 | 13 |
| rs3774354 | G | A | 36 | 44 | 12 |
| rs2071042 | T | C | 25 | 45 | 22 |
| rs6769789 | A | G | 25 | 44 | 23 |
| rs2535627 | A | G | 21 | 42 | 29 |

cDNA from the DLPFC brain tissue samples was amplified using primers in exons 3 and 6 of NEK4 (forward primer: 5’ GGACCTAGGAATTGCCCGAG 3’, reverse primer: 5’ AGCCCTCAGAAGAGAGTGGT 3’). Two NEK4 transcripts were identified at varying levels between individuals: a 503bp fragment corresponding to the full NEK4 transcript including exon 4 and a 395bp fragment corresponding to a smaller NEK4 transcript with exon 4 spliced out (Figure 29). Both fragments were gel extracted from 2.5% agarose EtBr post-stained gel, gel purified and sequenced, and both the full length and exon 4 skipped NEK4 sequences were identified.

Figure 29. Identification of two NEK4 transcript isoforms through DLPFC cDNA PCR amplification. cDNA PCR amplification using forward and reverse primers in exons 3 and 6 of NEK4 amplified a 503bp fragment corresponding to the full length NEK4 transcript and a 395bp fragment corresponding to a smaller NEK4 transcript isoform excluding exon 4.

3.2.7.4 Quantification of NEK4 Isoforms and Association to SNVs associated with NEK4 exon 4 skipping

The NEK4 isoforms with exon 4 spliced in or out were quantified using probe-based assays run as ddPCRs (forward primer: 5’ GCTCTGGGCTGTAATCTCTTG 3’, reverse primer: 5’ CATTGGCACACCCTACTACAT 3’, probe for full length: 5’ ACACAGCATCCTAGAGCCCAAACA 3’, probe for exon 4 skipping: 5’ TGGTGGCAGCTTATAGTTGTAGGGT 3’). The fractional abundance of the number of copies of the exon 4 skipped isoform compared to the total number of copies of both isoforms was identified in the DLPFC and hippocampus of each sample and was run as an ANOVA with each sample’s genotype. The NEK4 transcript with exon 4 spliced out is significantly increased in the DLPFC of individuals with the schizophrenia and bipolar disorder risk allele, G, of both rs13071584 and rs2230535 with an allele dose effect (rs13071584: p = 2.39e-11, F (2, 84) = 33.19; rs2230535: p = 2.18e-13, F (2, 84) = 42.08), and the association is most significant for rs2230535 (Figure 30).

[Figure 30 plot: x-axis, Genotype (AA, AG, GG); y-axis, Splicing Ratio.]

Figure 30. Quantification of NEK4 transcripts in the DLPFC. Splicing ratio of the NEK4 exon 4 spliced out transcript compared to all NEK4 transcripts amplified shows a significant increase with the schizophrenia and bipolar disorder risk alleles in an allele dose dependent manner in the DLPFC.
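The ANOVA p values reported in this chapter can be sanity-checked from the F statistics alone. With two numerator degrees of freedom, as in a three-genotype comparison, the upper-tail probability of the F distribution has the closed form p = (1 + 2F/d)^(-d/2), where d is the denominator degrees of freedom. The sketch below applies that formula to values quoted in the text; the function name is a hypothetical helper, and this is a check on the arithmetic rather than the software used in the study.

```python
def f_test_p_value_df1_2(f_stat: float, df_within: int) -> float:
    """Upper-tail p value of an F(2, df_within) statistic.

    Valid only when the numerator degrees of freedom equal 2, where the
    F survival function reduces to (1 + 2F/df_within) ** (-df_within / 2).
    """
    if f_stat < 0 or df_within <= 0:
        raise ValueError("F must be non-negative and df_within positive")
    return (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)


# (label, F statistic, denominator df, reported p value), all taken from the text.
reported = [
    ("LRRC48 exon 2 skipping, DLPFC", 3.04, 84, 0.0532),
    ("NEK4 rs13071584, DLPFC", 33.19, 84, 2.39e-11),
    ("NEK4 rs2230535, DLPFC", 42.08, 84, 2.18e-13),
]

for label, f_stat, df2, p_reported in reported:
    p = f_test_p_value_df1_2(f_stat, df2)
    print(f"{label}: recomputed p = {p:.3g}, reported p = {p_reported:.3g}")
```

The recomputed values agree with the reported p values to within the rounding of the quoted F statistics.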

The NEK4 transcript with exon 4 spliced out is also significantly increased in the hippocampus of individuals with the schizophrenia and bipolar disorder risk allele, G, of rs2230535 with an allele dose effect (rs2230535: p = 0.0274, F (2, 60) = 3.823) (Figure 31).

[Figure 31 plot: x-axis, Genotype (AA, AG, GG); y-axis, Splicing Ratio.]

Figure 31. Quantification of NEK4 transcripts in the hippocampus. Splicing ratio of the NEK4 exon 4 spliced out transcript compared to all NEK4 transcripts amplified shows a significant increase with the schizophrenia and bipolar disorder risk allele in an allele dose dependent manner in the hippocampus.

3.2.7.5 Protein Function Prediction of NEK4 Isoforms

The full length NEK4 transcript encodes a protein with 841 amino acids, while the NEK4 transcript skipping exon 4 causes an in-frame deletion of 36 amino acids and results in a protein with 805 amino acids. The full length and exon 4 skipped NEK4 protein sequences were input into InterPro, PANTHER, Pfam and PredictProtein.
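The in-frame deletion stated above is consistent with the amplicon sizes reported in section 3.2.7.3. The short sketch below works through that arithmetic. It assumes that the difference between the two fragment sizes equals the length of the skipped exon and that exon 4 is entirely coding; the function name is a hypothetical helper used only for illustration.

```python
CODON_LENGTH = 3


def skipped_exon_consequence(full_bp: int, skipped_bp: int):
    """Return (exon_length_bp, in_frame, codons_removed) from two amplicon sizes."""
    exon_bp = full_bp - skipped_bp
    if exon_bp <= 0:
        raise ValueError("full-length amplicon must be longer than the skipped one")
    # A deletion preserves the reading frame only if it removes whole codons.
    in_frame = exon_bp % CODON_LENGTH == 0
    codons_removed = exon_bp // CODON_LENGTH if in_frame else None
    return exon_bp, in_frame, codons_removed


# Amplicon sizes for the NEK4 exon 3 to exon 6 PCR: 503bp (full) and 395bp (exon 4 skipped).
exon_bp, in_frame, codons = skipped_exon_consequence(503, 395)
print(exon_bp, in_frame, codons)  # 108 True 36

FULL_LENGTH_AA = 841  # full length NEK4 protein length stated in the text
print(FULL_LENGTH_AA - codons)  # 805
```

The 108bp difference corresponds to 36 codons, matching the 36 amino acid deletion and the 805 amino acid protein described above.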

The protein kinase domain was identified for both NEK4 protein sequences: IPR000719 (InterPro), PTHR24361:SF452 (PANTHER) and PF00069 (Pfam). However, in the full length NEK4 protein, the protein kinase domain spans amino acids 9-261 and includes all of exon 4. Therefore in the NEK4 protein with exon 4 skipped, 36 amino acids of the protein kinase domain are deleted (Figure 32).


Figure 32. NEK4 Pfam protein predictions. A) Pfam predicted protein kinase domain (in green) on full length NEK4 protein sequence. B) Pfam predicted truncated protein kinase domain (in green) on shortened NEK4 protein sequence.

PredictProtein also identified different potential protein-binding regions for the full length and exon 4 skipped NEK4 protein sequences.

Therefore, the NEK4 protein with exon 4 skipped may have decreased protein kinase activity; however, this would have to be tested experimentally to determine this protein’s exact function.

3.2.8 NGEF

NGEF (neuronal guanine nucleotide exchange factor) is a guanine nucleotide exchange factor protein that has been identified to be involved in axon guidance by activating EphA receptors (Shamah et al., 2001). One SNV in NGEF was found to be associated with schizophrenia and was predicted by the SPIDEX dataset to cause exon 2 skipping (Table 14).

Table 14. SNV predicted to alter splicing of NGEF.

| SNV | Location | EUR MAF | SCZ p value | Splicing Z score | Alternative Splicing Event |
|---|---|---|---|---|---|
| rs2675959 | NM_001114090.1:c.108-92T>A | A = 0.2913 | 1.13E-09 | 3.415 | Exon 2 skipping of NGEF |

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, SCZ = schizophrenia.

92 brain tissue samples were genotyped for the NGEF SNV predicted to alter splicing. For rs2675959, 53 individuals were homozygous for the T allele, 29 individuals were heterozygous and 10 individuals were homozygous for the A allele. cDNA from these brain tissue samples was amplified using primers in exons 1 and 3 of NGEF (forward primer: 5’ GTGGACCACGACAGTTCCA 3’, reverse primer: 5’ GCTGTAGGATCTCAAGCACC 3’). Only a 490bp fragment corresponding to the full NGEF transcript including exon 2 was identified for each genotype but no 347bp fragment corresponding to the smaller NGEF transcript with exon 2 spliced out was identified in the DLPFC or hippocampus (Figure 33).

Figure 33. No identification of exon 2 skipped NGEF transcript. A) Amplification of NGEF DLPFC cDNA from 7 individuals homozygous for the T allele, 7 heterozygous individuals and 7 individuals homozygous for the A allele of rs2675959 only identified a 490bp fragment corresponding to the full NGEF transcript but not a 347bp fragment corresponding to the NGEF transcript with exon 2 spliced out. B) Amplification of NGEF hippocampal cDNA from 4 individuals homozygous for the T allele, 4 heterozygous individuals and 4 individuals homozygous for the A allele of rs2675959 only identified a 490bp fragment corresponding to the full NGEF transcript but not a 347bp fragment corresponding to the NGEF transcript with exon 2 spliced out.

Therefore, rs2675959 was not identified in this study to be associated with splicing out of exon 2 of NGEF in the DLPFC or hippocampus.

3.2.9 PRKAG1

PRKAG1 (protein kinase AMP-activated non-catalytic subunit gamma 1) is a gene that encodes a regulatory subunit of the AMP-activated protein kinase. PRKAG1 has been found to interact with the netrin-DSCAM signaling pathway to induce neurite outgrowth (Zhu et al., 2013). Five SNVs were found to be associated with bipolar disorder and were identified by the DLPFC sQTLs dataset to be associated with partial intron 2 retention of PRKAG1 (Table 15).

Table 15. SNVs associated with alternative splicing of PRKAG1.

| SNV | Location | EUR MAF | BD p value | Splicing p value | Alternative Splicing Event |
|---|---|---|---|---|---|
| rs10875912 | NM_003482.3:c.16053-286A>G | G = 0.3042 | 1.37E-04 | 5.98E-07 | Partial intron 2 retention of PRKAG1 |
| rs2241726 | NM_003482.3:c.2826C>T | T = 0.3479 | 4.83E-06 | 1.79E-06 | Partial intron 2 retention of PRKAG1 |
| rs11168839 | NM_000012.11:g.49457635G>A | A = 0.3489 | 4.22E-06 | 1.45E-06 | Partial intron 2 retention of PRKAG1 |
| rs2293445 | NM_001206709.1:c.382+41C>T | T = 0.3489 | 3.96E-06 | 1.79E-06 | Partial intron 2 retention of PRKAG1 |
| rs10783299 | NM_015086.1:c.1982A>G | G = 0.3539 | 1.56E-06 | 2.71E-06 | Partial intron 2 retention of PRKAG1 |

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, BD = bipolar disorder.

92 brain tissue samples were genotyped for the five PRKAG1 SNVs associated with alternative splicing (Table 16).

Table 16. Brain tissue sample genotypes of SNVs associated with alternative splicing of PRKAG1.

| SNV | Major Allele | Minor Allele | # Individuals homozygous for Major Allele | # Heterozygous Individuals | # Individuals homozygous for Minor Allele |
|---|---|---|---|---|---|
| rs10875912 | A | G | 37 | 41 | 14 |
| rs2241726 | C | T | 36 | 39 | 17 |
| rs11168839 | G | A | 35 | 40 | 17 |
| rs2293445 | C | T | 38 | 38 | 16 |
| rs10783299 | A | G | 34 | 41 | 17 |

cDNA from the brain tissue samples was amplified using primers in exons 2 and 3 of PRKAG1 (forward primer: 5’ TCTTCAGATAGCTCCCCAGC 3’, reverse primer: 5’ AGCTTGTGGGAATCAGGTCA 3’). For each genotype, only a 121bp fragment corresponding to the PRKAG1 transcript without partial intron 2 retention was identified; no 295bp fragment corresponding to the PRKAG1 transcript with partial intron 2 retention was detected in the DLPFC or hippocampus (Figure 34).

Figure 34. No identification of PRKAG1 transcript with partial intron 2 retention. A) Amplification of PRKAG1 DLPFC cDNA from 7 individuals homozygous for the A, C, G, C and A alleles, 7 heterozygous individuals and 7 individuals homozygous for the G, T, A, T and G alleles of rs10875912, rs2241726, rs11168839, rs2293445 and rs10783299, respectively, only identified a 121bp fragment corresponding to the PRKAG1 transcript without partial intron 2 retention but not a 295bp fragment corresponding to the PRKAG1 transcript with partial intron 2 retention. B) Amplification of PRKAG1 hippocampal cDNA from 4 individuals homozygous for the A, C, G, C and A alleles, 4 heterozygous individuals and 4 individuals homozygous for the G, T, A, T and G alleles of rs10875912, rs2241726, rs11168839, rs2293445 and rs10783299, respectively, only identified a 121bp fragment corresponding to the PRKAG1 transcript without partial intron 2 retention but not a 295bp fragment corresponding to the PRKAG1 transcript with partial intron 2 retention.

Therefore, these five SNVs were not identified in this study to be associated with partial intron 2 retention of PRKAG1 in the DLPFC or hippocampus.

3.2.10 PPP1R16B

PPP1R16B (protein phosphatase 1, regulatory subunit 16B) encodes a regulatory protein for protein phosphatase 1, which has many regulatory and signaling roles in neuronal activities, brain signaling cascades, apoptosis and other processes (Shopik et al., 2013). PPP1R16B was implicated in schizophrenia by the most recent schizophrenia GWAS (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). Although it has not been implicated in schizophrenia in any other study, it regulates a phosphatase that plays a role in many neuronal processes and is therefore a potential schizophrenia candidate gene. Three SNVs in PPP1R16B were found to be associated with schizophrenia and were identified by the sQTLSeekeR dataset to be associated with the PPP1R16B isoform with exon 7 spliced out (Table 17).

Table 17. SNVs associated with alternative splicing of PPP1R16B.

SNV         Location                         EUR MAF     SCZ p value  Splicing p value
rs4368395   NM_001172735.2:c.250+5635G>A     G = 0.329   3.32E-09     7.2E-05
rs6093098   NM_001172735.2:c.250+11459T>C    T = 0.327   4.85E-09     7.2E-05
rs10854216  NM_001172735.2:c.-102+13484A>G   A = 0.3231  3.66E-08     1.24E-04

Alternative splicing event (all three SNVs): exon 7 skipping of PPP1R16B.
SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, SCZ = schizophrenia.

92 brain tissue samples were genotyped for the three PPP1R16B SNVs associated with alternative splicing. For rs4368395, 54 individuals were homozygous for the A allele, 30 individuals were heterozygous and 8 individuals were homozygous for the G allele. For rs6093098, 54 individuals were homozygous for the C allele, 31 individuals were heterozygous and 7 individuals were homozygous for the A allele. For rs10854216, 55 individuals were homozygous for the G allele, 28 individuals were heterozygous and 9 individuals were homozygous for the A allele. cDNA from the brain tissue samples was amplified using primers in exons 6 and 8 of PPP1R16B (forward primer: 5’ AAAATCAACGAGATGCGGGT 3’, reverse primer: 5’ CATGGATGAGATGCCAATAG 3’). For each genotype, only a 313bp fragment corresponding to the full PPP1R16B transcript including exon 7 was identified; no 187bp fragment corresponding to the smaller PPP1R16B transcript with exon 7 spliced out was detected in the DLPFC or hippocampus (Figure 35).

Figure 35. No identification of exon 7 skipped PPP1R16B transcript. A) Amplification of PPP1R16B DLPFC cDNA from 7 individuals homozygous for the A, C and G alleles, 7 heterozygous individuals and 7 individuals homozygous for the G, A, and A alleles of rs4368395, rs6093098 and rs10854216, respectively, only identified a 313bp fragment corresponding to the full length PPP1R16B isoform with exon 7 spliced in but not a 187bp fragment corresponding to the PPP1R16B isoform with exon 7 spliced out. B) Amplification of PPP1R16B hippocampal cDNA from 4 individuals homozygous for the A, C and G alleles, 4 heterozygous individuals and 4 individuals homozygous for the G, A, and A alleles of rs4368395, rs6093098 and rs10854216, respectively, only identified a 313bp fragment corresponding to the full length PPP1R16B isoform with exon 7 spliced in but not a 187bp fragment corresponding to the PPP1R16B isoform with exon 7 spliced out.

Therefore, rs4368395, rs6093098 and rs10854216 were not identified in this study to be associated with splicing out of exon 7 of PPP1R16B in the DLPFC or hippocampus.

3.2.11 RPGRIP1L

Retinitis pigmentosa GTPase regulator interacting protein 1-like (RPGRIP1L) encodes a protein that localizes to the basal bodies and transition zone of primary cilia. Mutations in this gene have been found to cause Joubert Syndrome type 7 and Meckel Syndrome type 5, both ciliopathies, and RPGRIP1L knockout mice have a ciliopathy phenotype and die at birth (Delous et al., 2007; Vierkotten, Dildrop, Peters, Wang, & Ruther, 2007). Vierkotten et al. (2007) showed that RPGRIP1L is necessary for multiple processes during embryonic development, including left-right asymmetry development, neural tube formation and cilium-related sonic hedgehog signaling. RPGRIP1L has been found to regulate proteasomal activity in the primary cilia of mouse embryonic fibroblasts and to interact with Psmd2, a 19S proteasomal subunit component (Gerhardt et al., 2015). RPGRIP1L has also been found to interact with NEK4 at the basal bodies of the cilia and is thought to act as a scaffolding protein to recruit proteins to the primary cilium (Coene et al., 2011). RPGRIP1L also lies in a region associated with obesity (Thorleifsson et al., 2009). Stratigopoulos et al. (2008) found that the expression of RPGRIP1L in the mouse hypothalamus decreased after fasting as well as after exposure to cold. These researchers later identified a GWAS obesity risk variant that affects a proximal enhancer, leading to decreased RPGRIP1L levels and in turn decreased leptin receptor localization to the cilium in the hypothalamus, thereby affecting leptin signaling and food intake (Stratigopoulos, LeDuc, Cremona, Chung, & Leibel, 2011). RPGRIP1L therefore encodes a protein with many functions that has been found to recruit multiple proteins to the primary cilia and whose expression has been shown to be regulated by environmental factors as well as by genetic variants.

3.2.11.1 RPGRIP1L SNVs Associated with Alternative Splicing

Ten SNVs were found to have a trend for association with bipolar disorder and were identified by the DLPFC sQTLs dataset to be associated with exon 20 skipping of RPGRIP1L (Table 18). One of these variants was also found to have a trend for association with schizophrenia.

Table 18. SNVs associated with alternative splicing of RPGRIP1L.

SNV        Location                        EUR MAF     BD p value  SCZ p value  Splicing p value
rs8050354  NM_001127897.3:c.3376+3769G>A   A = 0.3231  4.73E-05    -            9.12E-42
rs1421091  NM_001127897.3:c.-2066T>G       G = 0.3230  6.63E-05    -            7.31E-41
rs6499632  NM_001127897.3:c.*2907A>G       G = 0.4851  3.15E-04    -            2.45E-15
rs1362572  NC_000016.10:g.53554008G>A      A = 0.3330  3.58E-04    7.23E-06     1.54E-26
rs7193898  NC_000016.10:g.53551009G>A      A = 0.2744  4.02E-04    -            4.46E-12
rs9788828  NC_000016.10:g.53543607G>T      T = 0.2495  4.88E-04    -            6.51E-05
rs8047089  NM_001127897.3:c.3461+1220G>A   A = 0.2992  5.57E-04    -            1.14E-39
rs3760008  NM_001127897.3:c.*737T>G        G = 0.2962  6.80E-04    -            3.04E-39
rs1946155  NM_001127897.3:c.*1693G>A       A = 0.2962  7.57E-04    -            1.14E-39
rs4784321  NM_001127897.3:c.3596-309G>A    A = 0.2326  1.85E-03    -            5.61E-19

Alternative splicing event (all ten SNVs): exon 20 skipping of RPGRIP1L.
SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, BD = bipolar disorder, SCZ = schizophrenia.

Of the 10 SNVs associated with exon 20 skipping of RPGRIP1L, only five were in close proximity to exon 20 of RPGRIP1L and are therefore the most plausible candidates to cause this splicing change. The five SNVs were rs8050354, which is in intron 24 of RPGRIP1L, rs8047089, which is in intron 25 of RPGRIP1L, rs4784321, which is in intron 26 of RPGRIP1L, and rs3760008 and rs1946155, which are in exon 27 of RPGRIP1L. These five SNVs were entered into RBPMap and SpliceAid to determine the mechanism by which they are likely to cause exon 20 skipping.

RBPMap did not yield any significant results for any of the five SNVs and SpliceAid did not yield any significant results for rs8050354 or rs4784321.

For rs8047089, SpliceAid identified MBNL1 to bind only to the G allele and not the A allele. Muscleblind like splicing regulator 1 (MBNL1) has been found to bind to and compete for the same sequence as the U2AF65 splicing factor, which recruits the U2 snRNP necessary for proper splicing, and therefore MBNL1 promotes exon skipping (Warf, Diegel, von Hippel, & Berglund, 2009).

For rs3760008, SpliceAid identified Sam68 to only bind to the T allele but not the G allele. Sam68 has been found to cause exon 7 skipping in the SMN2 gene (Pedrotti et al., 2010).

For rs1946155, SpliceAid identified hnRNP I to only bind to the A allele but not the G allele. Heterogeneous nuclear ribonucleoprotein I (hnRNP I) or polypyrimidine tract binding protein 1 (PTBP1) is a member of the heterogeneous nuclear ribonucleoproteins that bind to intronic splicing silencers and promote exon skipping. However, PTBP1 is only expressed in neural stem and progenitor cells but not mature neurons, and therefore would not contribute to splicing in the adult brain tissues used in this study but may contribute to RPGRIP1L splicing during development in neural stem cells.

3.2.11.2 Identification of Two RPGRIP1L Transcript Isoforms

92 brain tissue samples were genotyped for the ten RPGRIP1L SNVs associated with alternative splicing (Table 19).

Table 19. Brain tissue sample genotypes of SNVs associated with alternative splicing of RPGRIP1L.

SNV        Major   Minor   # Individuals   # Heterozygous  # Individuals
           Allele  Allele  homozygous for  Individuals     homozygous for
                           Major Allele                    Minor Allele
rs8050354  G       A       42              37              13
rs1421091  T       G       40              37              15
rs6499632  A       G       21              44              27
rs1362572  G       A       42              42              8
rs7193898  G       A       44              44              4
rs9788828  G       T       46              43              3
rs8047089  G       A       46              37              9
rs3760008  T       G       47              36              9
rs1946155  G       A       47              36              9
rs4784321  G       A       46              43              3

cDNA from these brain tissue samples was amplified using primers in exons 19 and 21 of RPGRIP1L (forward primer: 5’ TCGTGGATATCATGCCACATCA 3’, reverse primer: 5’ GCTTTGTTCTGCAAGCTGAC 3’). Two RPGRIP1L transcripts were identified at varying levels between individuals: a 239bp fragment corresponding to the full RPGRIP1L transcript including exon 20 and a 137bp fragment corresponding to a smaller RPGRIP1L transcript with exon 20 spliced out (Figure 36). Both fragments were extracted from a 2.5% agarose gel post-stained with EtBr, gel purified and sequenced, and both the full length and exon 20 skipped RPGRIP1L sequences were identified.

Figure 36. Identification of two RPGRIP1L transcript isoforms through cDNA PCR amplification. cDNA PCR amplification using forward and reverse primers in exons 19 and 21 of RPGRIP1L amplified a 239bp fragment corresponding to the full length RPGRIP1L transcript and a 137bp fragment corresponding to a smaller RPGRIP1L transcript isoform excluding exon 20.

3.2.11.3 Quantification of RPGRIP1L Isoforms and Association to SNVs associated with RPGRIP1L exon 20 skipping

The RPGRIP1L isoforms with exon 20 spliced in or out were quantified using probe-based assays run as ddPCRs (forward primer: 5’ AGCTGACCTTCAGATAGTAAAGAC 3’, reverse primer: 5’ AGAAGGTATCTTTCGTGGATATCAT 3’, probe for full length: 5’ TGGGAACATGTGGAACAGTCAGCA 3’, probe for exon 20 skipping: 5’ ACTGCCTTCTTGTGAAACACTCTGATGT 3’). The fractional abundance of the exon 20 skipped isoform, calculated as the number of copies of this isoform relative to the total number of copies of both isoforms, was determined in the DLPFC and hippocampus of each sample and compared across genotypes using an ANOVA. The RPGRIP1L transcript with exon 20 spliced out is significantly decreased in the DLPFC of individuals with the schizophrenia and bipolar disorder risk alleles A, G and A of rs8047089, rs3760008 and rs1946155, respectively, with an allele dose effect (rs8047089: p < 2e-16, F (2, 84) = 338.9; rs1946155: p < 2e-16, F (2, 84) = 271.9; rs3760008: p < 2e-16, F (2, 84) = 135.9) (Figure 37).
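The calculation described above, a fractional abundance per sample followed by a one-way ANOVA across the three genotype groups, can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis pipeline used in this study: the function names are hypothetical, the copy numbers are invented purely to make the arithmetic easy to follow, and the p value (which would come from the F distribution in a statistics package) is not computed here.

```python
def fractional_abundance(skipped_copies, full_length_copies):
    """Percentage of all detected copies that belong to the exon-skipped isoform."""
    total = skipped_copies + full_length_copies
    if total == 0:
        raise ValueError("no copies detected for either isoform")
    return 100.0 * skipped_copies / total


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL (skipped, full length) copy numbers per sample, grouped by genotype.
HYPOTHETICAL_COPIES = {
    "Hom. Maj.": [(600, 400), (620, 380), (640, 360)],
    "Het.": [(400, 600), (420, 580), (440, 560)],
    "Hom. Min.": [(200, 800), (220, 780), (240, 760)],
}

groups = [
    [fractional_abundance(skipped, full) for skipped, full in samples]
    for samples in HYPOTHETICAL_COPIES.values()
]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.1f}")
```

With three genotype groups the between-group degrees of freedom are 2, and the within-group degrees of freedom equal the number of samples minus 3, which is how a statistic reported as F (2, 84) corresponds to 87 samples with usable measurements.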

[Figure 37 plot: splicing ratio (y-axis, 0 to 100) by genotype (x-axis: Hom. Maj., Het., Hom. Min.).]

Figure 37. Quantification of RPGRIP1L transcripts in the DLPFC. Splicing ratio of the RPGRIP1L exon 20 spliced out transcript compared to all RPGRIP1L transcripts amplified shows a significant decrease with the schizophrenia and bipolar disorder risk allele in an allele dose dependent manner in the DLPFC. (Hom. = Homozygous, Maj. = Major Allele, Min. = Minor Allele, Het. = Heterozygous)

The RPGRIP1L transcript with exon 20 spliced out is significantly decreased in the hippocampus of individuals with the schizophrenia and bipolar disorder risk alleles A, G and A of rs8047089, rs3760008 and rs1946155, respectively, with an allele dose effect (rs8047089: p < 2e-16, F (2, 60) = 229.9; rs1946155: p < 2e-16, F (2, 60) = 182.6; rs3760008: p < 2e-16, F (2, 60) = 182.6) (Figure 38).

[Figure 38 plot: splicing ratio (y-axis) by genotype (x-axis: Hom. Maj., Het., Hom. Min.).]

Figure 38. Quantification of RPGRIP1L transcripts in the hippocampus. Splicing ratio of the RPGRIP1L exon 20 spliced out transcript compared to all RPGRIP1L transcripts amplified shows a significant decrease with the schizophrenia and bipolar disorder risk alleles in an allele dose dependent manner in the hippocampus. (Hom. = Homozygous, Maj. = Major Allele, Min. = Minor Allele, Het. = Heterozygous)

3.2.11.4 Protein Function Prediction of RPGRIP1L Isoforms

The full length RPGRIP1L transcript encodes a protein with 1,315 amino acids, while skipping of exon 20 causes an in-frame deletion of 34 amino acids and results in a protein with 1,281 amino acids. The full length and exon 20 skipped RPGRIP1L protein sequences were input into InterPro, PANTHER, Pfam and PredictProtein.
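The in-frame deletion can be checked against the fragment sizes reported for the two RPGRIP1L amplicons (239bp and 137bp). The Python sketch below is illustrative only and uses a hypothetical helper name. It assumes that both primers lie outside exon 20, so that the size difference between the amplicons equals the exon length, and it ignores the possibility that a codon spanning the new exon junction changes a single flanking amino acid.

```python
def exon_skip_effect(full_amplicon_bp, skipped_amplicon_bp, full_protein_aa):
    """Infer exon length and reading-frame consequence from two amplicon sizes."""
    exon_bp = full_amplicon_bp - skipped_amplicon_bp
    codons_removed, remainder = divmod(exon_bp, 3)
    in_frame = remainder == 0
    # A frameshift changes the downstream sequence, so the protein length
    # cannot be derived from the exon length alone in that case.
    skipped_protein_aa = full_protein_aa - codons_removed if in_frame else None
    return exon_bp, in_frame, codons_removed, skipped_protein_aa


# Amplicon sizes and full length protein size as reported in the text.
result = exon_skip_effect(239, 137, 1315)
print(result)  # (102, True, 34, 1281)
```

The 102bp difference is divisible by three, which agrees with the 34 amino acid in-frame deletion and the 1,281 amino acid protein stated above.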

The deletion of 34 amino acids resulting from exon 20 skipping of RPGRIP1L does not affect any known domains identified by InterPro, PANTHER and Pfam. PredictProtein identified two protein binding regions in the full length RPGRIP1L protein sequence and four protein binding regions in the exon 20 skipped RPGRIP1L protein sequence.

The effects of exon 20 skipping on RPGRIP1L protein function are unclear from the protein prediction results and require experimental testing.

3.2.12 SNAP91

SNAP91 (synaptosome associated protein 91) encodes a protein that localizes to the synapse and has been found to play a role in regulating synaptic vesicles in the rat hippocampus (Petralia et al., 2013). Nine SNVs were found to be associated with both schizophrenia and bipolar disorder and were identified by the DLPFC sQTLs dataset to be associated with intron 11 retention of SNAP91 (Table 20).

Table 20. SNVs associated with alternative splicing of SNAP91.

SNV        Location                        EUR MAF     BD p value  SCZ p value  Splicing p value
rs2022265  NM_001242792.1:c.2015-1196T>C   C = 0.4592  4.35E-05    6.22E-09     1.38E-17
rs7741310  NC_000006.11:g.84250330C>T      C = 0.4254  1.07E-04    4.38E-07     1.35E-08
rs217291   NM_001242792.1:c.131-18642A>G   G = 0.4443  1.08E-04    3.73E-09     1.69E-21
rs1171114  NM_001170423.1:c.-125-3608T>C   T = 0.4254  1.24E-04    1.26E-06     2.74E-08
rs217308   NM_001242792.1:c.453-1058A>G    G = 0.4443  1.33E-04    7.88E-09     1.96E-21
rs217323   NM_001242792.1:c.659-3180C>T    T = 0.4433  1.53E-04    2.71E-09     2.12E-21
rs1546977  NM_001242792.1:c.1141-981T>C    C = 0.4433  2.05E-04    5.51E-09     9.45E-21
rs217328   NM_001242792.1:c.765-3101C>T    T = 0.4433  2.15E-04    4.16E-09     9.70E-21
rs1158059  NM_001242792.1:c.*11-329G>A     G = 0.4831  4.07E-04    4.46E-07     1.83E-12

Alternative splicing event (all nine SNVs): intron 11 retention of SNAP91.
SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, BD = bipolar disorder, SCZ = schizophrenia.

92 brain tissue samples were genotyped for the nine SNAP91 SNVs associated with alternative splicing (Table 21).

Table 21. Brain tissue sample genotypes of SNVs associated with alternative splicing of SNAP91.

SNV        Major   Minor   # Individuals   # Heterozygous  # Individuals
           Allele  Allele  homozygous for  Individuals     homozygous for
                           Major Allele                    Minor Allele
rs2022265  T       C       21              49              22
rs7741310  T       C       28              52              12
rs217291   A       G       30              47              15
rs1171114  C       T       28              52              12
rs217308   A       G       30              47              15
rs217323   C       T       30              47              15
rs1546977  T       C       27              48              17
rs217328   C       T       28              47              17
rs1158059  A       G       21              55              16

cDNA from the brain tissue samples was amplified using primers in exon 11 and exon 12 of SNAP91 (forward primer: 5’ ACTATTGACACATCCCCACCG 3’, reverse primer: 5’ CCCTCCAGAGGAAAAGTCTGG 3’). Only a 120bp fragment corresponding to the SNAP91 transcript without intron 11 retention was identified for each genotype but no 1,895bp fragment corresponding to the SNAP91 transcript with intron 11 retention was identified in the DLPFC or hippocampus (Figure 39).

Figure 39. No identification of SNAP91 transcript with intron 11 retention. A) Amplification of SNAP91 DLPFC cDNA from 7 individuals homozygous for the T, T, C, A, A, C, T, C and A alleles, 7 heterozygous individuals and 7 individuals homozygous for the C, C, T, G, G, T, C, T and G alleles of rs2022265, rs7741310, rs1171114, rs217291, rs217308, rs217323, rs1546977, rs217328 and rs1158059, respectively, only identified a 120bp fragment corresponding to the SNAP91 transcript without intron 11 retention but not a 1,895bp fragment corresponding to the SNAP91 transcript with intron 11 retention. B) Amplification of SNAP91 hippocampal cDNA from 4 individuals homozygous for the T, T, C, A, A, C, T, C and A alleles, 4 heterozygous individuals and 4 individuals homozygous for the C, C, T, G, G, T, C, T and G alleles of rs2022265, rs7741310, rs1171114, rs217291, rs217308, rs217323, rs1546977, rs217328 and rs1158059, respectively, only identified a 120bp fragment corresponding to the SNAP91 transcript without intron 11 retention but not a 1,895bp fragment corresponding to the SNAP91 transcript with intron 11 retention.

Therefore, these nine SNVs were not identified in this study to be associated with intron 11 retention of SNAP91 in the DLPFC or hippocampus.

3.2.13 STAB1

STAB1 (stabilin 1) encodes a transmembrane protein that binds to Gram-negative and Gram-positive bacteria and is thought to be involved in the defense against bacterial infections (Adachi & Tsujimoto, 2002). It has been previously implicated in schizophrenia through pathway analysis of schizophrenia GWAS results (Y. H. Lee, Kim, & Song, 2013) and was identified again to be associated with schizophrenia in the most recent and comprehensive schizophrenia GWAS (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). One SNV in STAB1 was found to be associated with schizophrenia and was predicted by the SPIDEX dataset to cause exon 42 skipping (Table 22).

Table 22. SNV predicted to alter splicing of STAB1.

SNV         Location                  EUR MAF     SCZ p value  Splicing Z Score  Alternative Splicing Event
rs66824127  NM_015136.2:c.4490-71T>C  T = 0.4334  3.58E-06     -2.492            Exon 42 skipping of STAB1

SNV = single nucleotide variant, EUR MAF = minor allele frequency for European Caucasian population, SCZ = schizophrenia.

92 brain tissue samples were genotyped for the STAB1 SNV predicted to alter splicing. For rs66824127, 23 individuals were found to be homozygous for the C allele, 47 individuals were found to be heterozygous and 22 individuals were found to be homozygous for the T allele. cDNA from these brain tissue samples was amplified using primers in exons 41 and 44 of STAB1 (forward primer: 5’ CTCCGGCAATGGCATCTTCT 3’, reverse primer: 5’ GGATGCCATCCCCGCTGTA 3’). However, little or no amplification was visible after gel electrophoresis due to the low expression of STAB1 in the adult DLPFC and hippocampus. STAB1 is expressed in the adolescent brain and would be an interesting candidate to test in adolescent brain tissue samples.

Chapter 4

Discussion and Conclusions

This study identified alternative transcripts that are associated with risk variants for schizophrenia, bipolar disorder or both disorders. Therefore these alternatively spliced transcripts may confer risk for these disorders. This study used multiple datasets, including two GWAS and five datasets identifying SNVs that are associated with a splicing change or predicted to cause a splicing change. The predicted alternative splicing event was then tested using cDNA from brain tissue samples to confirm the identification of these events. In six genes, multiple alternative transcripts were identified. In four of these genes, the alternative splicing events were quantified in each of the brain tissue samples and were found to be significantly associated with the schizophrenia and bipolar disorder risk alleles (Table 23). Thus, these alternative transcripts may be involved in risk for schizophrenia and bipolar disorder.

Table 23. Summary of major findings: alternatively spliced transcripts associated with schizophrenia and bipolar disorder risk alleles.

SNV        Gene      Splicing     BD        SCZ       Splicing p value       Alternative Splicing Event
                     Dataset      p value   p value   from brain samples
rs4906337  APOPT1    PTV          -         5.27E-13  DLPFC: 1.26E-15,       Decreased exon 3 skipping
                                                      hippocampus: 1.1E-13   associated with risk allele
rs3740400  AS3MT     SPIDEX       -         6.00E-17  DLPFC: 2.85E-06,       Increased exons 2 and 3 skipping
                                                      hippocampus: 8.81E-06  associated with risk allele
rs2230535  NEK4      DLPFC sQTLs  2.40E-04  7.78E-08  DLPFC: 2.18E-13,       Increased exon 4 skipping
                                                      hippocampus: 0.0274    associated with risk allele
rs8047089  RPGRIP1L  DLPFC sQTLs  5.57E-04  -         DLPFC: < 2E-16,        Decreased exon 20 skipping
                                                      hippocampus: < 2E-16   associated with risk allele

4.1 Implications

The earlier focus of GWAS follow-up was to identify non-synonymous variants, whose protein-coding changes have more straightforward effects on protein structure. However, because many of the GWAS variants identified in psychiatric disorders could not be explained by a coding region SNV, the focus shifted to identifying variants that affect gene expression, including splicing. This study identified alternative splicing changes that are associated with psychiatric disorders and suggests that these transcripts may contribute risk for these disorders.

In addition to identifying individual genes with disease associated splicing changes, these splicing changes collectively indicate pathogenic gene networks and pathways implicated in these disorders. Identifying these pathogenic gene networks and pathways is critical to understanding the underlying biology of psychiatric disorders in order to develop therapeutic targets. For example, psychiatric disorder-associated splicing changes in NRG1, NRG3 and ERBB4 indicate pathogenic changes to the NRG-ErbB signalling pathway, thus emphasizing the importance of identifying gene networks contributing to disease and opening the possibility of therapeutic interventions in signalling pathways that impact the network of risk genes (Law et al., 2012). The findings from this study also provide avenues for further study of the pathways in which these alternatively spliced genes are found, which can help to better understand the mechanisms behind the disorders and potentially allow for new animal models and treatment targets.

4.1.1 Alternative transcripts converge on the mitochondria and the cilia

The four genes in which alternative transcripts were identified in this study converged on two cellular components: the mitochondria and the cilia.

APOPT1, a pro-apoptotic protein that localizes to the mitochondria, was identified in this study to have an alternatively spliced transcript with exon 3 skipped that was decreased in the DLPFC and hippocampus of individuals with the schizophrenia risk alleles compared to individuals without these risk alleles. Mitochondrial dysfunction has already been highly implicated in both schizophrenia and bipolar disorder. Robicsek et al. (2013) showed that abnormal neuronal differentiation occurred due to mitochondrial dysfunction in induced pluripotent stem cells (iPSCs) derived from schizophrenia patient hair follicles. Evidence of mitochondrial dysfunction, including altered proteins and transcripts involved in mitochondrial function, energy metabolism and the oxidative stress response, has been found in postmortem DLPFC tissues from schizophrenia patients compared to controls (Prabakaran et al., 2004). A decreased number of mitochondria has also been found in the prefrontal cortex and caudate of schizophrenia patients compared to controls (Uranova et al., 2001). Further, decreased complex I subunit levels and increased oxidative stress have been identified in the prefrontal cortex of patients with schizophrenia and bipolar disorder (Andreazza, Wang, Salmasi, Shao, & Young, 2013).

AS3MT, a component of the arsenic metabolism pathway, was identified in this study to have an alternatively spliced transcript with exons 2 and 3 skipped that was increased in the DLPFC and hippocampus of individuals with the schizophrenia risk allele compared to individuals without this risk allele. As discussed previously, arsenic exposure in rats has been found to cause increased ROS and affect the mitochondria; therefore, anything that affects arsenic metabolism may have an indirect consequence on the mitochondria. The predicted effect on the AS3MT protein with exons 2 and 3 skipped is a decrease in the methyltransferase efficiency, and therefore decreased arsenic metabolism efficiency. Individuals with the schizophrenia risk allele associated with the AS3MT protein with exons 2 and 3 skipped will likely have increased ROS and mitochondrial dysfunction due to decreased arsenic metabolism.

Therefore, two genes involved in mitochondrial function, APOPT1 and AS3MT, were identified in this study. This supports the already established role of mitochondrial dysfunction in psychiatric disorders and also provides specific mitochondrial targets for further study.

NEK4 was identified in this study to have an alternatively spliced transcript with exon 4 skipped that was increased in the DLPFC and hippocampus of individuals with the schizophrenia and bipolar disorder risk alleles compared to individuals without these risk alleles. RPGRIP1L was found to have an alternatively spliced transcript with exon 20 skipped that was decreased in the DLPFC and hippocampus of individuals with the schizophrenia and bipolar disorder risk alleles compared to individuals without these risk alleles. NEK4, a serine-threonine kinase, was found to co-localize to the basal body of the primary cilium with RPGRIP1L, a known ciliopathy-associated protein (Coene et al., 2011). Therefore, these proteins both converge on the primary cilia. Limited research exists on the involvement or dysregulation of the primary cilia in schizophrenia and bipolar disorder; however, one of the most highly implicated risk genes for these disorders, disrupted in schizophrenia 1 (DISC1), has been found to localize to and play a role in the function of primary cilia (Marley & von Zastrow, 2010). DISC1 was found to localize to the basal bodies of the primary cilia in rat striatal neurons and a fibroblast cell line, and knockdown of DISC1 in these cells resulted in decreased primary cilia, which was restored after re-expression of DISC1 (Marley & von Zastrow, 2010). Interestingly, this study also showed that the dopamine receptors D1R and D2R, both highly implicated in psychiatric disorders, localized to the primary cilia on these cells (Marley & von Zastrow, 2010). A more recent study by Marley & von Zastrow (2012) showed that knockdown of 20 genes associated with psychiatric disorders, either through interactions with DISC1, rare variants associated with psychiatric disorders or association by GWAS, also resulted in decreased ciliated cells. Interestingly, transfecting these cells with one of the top schizophrenia GWAS hits, MIR137, increased ciliation and the length of the cilia (Marley & von Zastrow, 2012). Furthermore, lithium, the most common and effective treatment of bipolar disorder, has also been found to increase the length of primary cilia in striatal neuronal cell culture and in the mouse brain (Miyoshi, Kasahara, Miyazaki, & Asanuma, 2009). Thus, multiple lines of evidence implicate the primary cilium in psychiatric disorders.

RPGRIP1L has also been found to be important for left-right asymmetry development (Vierkotten et al., 2007). Interestingly, abnormal asymmetry has been identified in schizophrenia patients. Oertel et al. (2010) identified reduced left compared to right asymmetry in the temporal lobe of schizophrenia patients when compared to controls based on gray matter volume and functional activation. First degree unaffected relatives of schizophrenia patients were also studied and showed an intermediate reduced asymmetry phenotype, suggesting a genetic component (Oertel et al., 2010). Additionally, motile cilia in the brain have been found to play an important role in neuronal migration, left-right asymmetry and body plan development (Sawamoto et al., 2006). Therefore, psychiatric disorder risk variants, such as those identified in RPGRIP1L causing alternative splicing changes in this ciliary-associated gene, may contribute to the abnormalities in left-right asymmetry and laterality observed in schizophrenia.

Therefore, two genes involved in ciliary function, NEK4 and RPGRIP1L, were identified in this study. This further supports the role of ciliary dysfunction in psychiatric disorders and also provides specific ciliary targets for further study.

The identification of psychiatric disorder-associated alternatively spliced isoforms that converge on the mitochondria and cilia further implicates these cellular components as dysfunctional in psychiatric disorders. It not only provides novel genes to target and continue to research in terms of their relationship and contribution to risk for psychiatric disorders, but also identifies specific alternatively spliced transcripts of these genes as risk transcripts for psychiatric disorders.

4.1.2 Combined Alternative Splicing Model of Psychiatric Risk

Psychiatric disorders are complex genetic disorders involving multiple genes and variants, each likely contributing a small amount of risk that manifests as disease when the genetic load passes a certain threshold, or when environmental factors interact with genetic risk factors to pass that threshold. The identification of multiple variants that each cause a splicing change fits with this additive model. Multiple small transcript changes converging on similar pathways could cause significantly increased risk for schizophrenia and bipolar disorder. For example, one could argue that an individual with the AS3MT risk allele and an increased amount of AS3MT exon 2 and 3 skipped transcript and protein would have a decreased tolerance to the detrimental effects of arsenic, which would increase this individual’s risk for psychiatric disorders. Along with other risk variants that affect alternative splicing, such as variants affecting APOPT1, this would lead to an increase in the risk for psychiatric disorders. A greater number of variants that cause alternative splicing changes would increase an individual’s risk for psychiatric disorders, especially if these variants affect genes that act in similar pathways.

APOPT1 is a pro-apoptotic protein that targets the mitochondria; therefore, different amounts of this protein’s isoforms may affect when and under what conditions apoptosis takes place. In conjunction with potentially increased ROS due to decreased arsenic metabolism by inefficient AS3MT, this may produce dysfunctional mitochondria that harm the cell’s energy levels and affect neuronal function, and that may not be properly degraded if the apoptosis mechanisms, including the function of APOPT1, are not properly regulated.

RPGRIP1L is a protein that localizes to the primary cilia and has been found to play an important role in sonic hedgehog signaling, a signaling pathway that occurs at the primary cilia and is crucial for embryonic development and neuronal proliferation and differentiation (Vierkotten, Dildrop, Peters, Wang, & Ruther, 2007). Alterations in the levels of RPGRIP1L transcripts, in combination with alterations in other primary cilia genes including NEK4, could affect primary ciliary function as well as sonic hedgehog signaling and embryonic development, contributing to the altered neural development and connections found in psychiatric disorders.

A combination of small changes affecting mitochondrial or primary cilia function may collectively lead to dysfunction of these organelles, and ultimately lead to the manifestation of schizophrenia or bipolar disorder. Mitochondrial and primary ciliary dysfunction would lead to altered neurodevelopment, neuronal function, immuno-inflammatory pathways and response, and altered neurotransmitter system function, including dopaminergic function, all of which are hallmarks of schizophrenia and bipolar disorder and would contribute to the improper neuronal connections underlying the symptoms of these disorders.
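The additive threshold idea described in this section can be illustrated with a minimal sketch. This is only an illustration and not an analysis performed in this study: the variant names, effect sizes, environmental term and threshold below are hypothetical placeholders chosen to make the arithmetic easy to follow.

```python
from typing import Dict


def liability(risk_allele_counts: Dict[str, int],
              effect_sizes: Dict[str, float],
              environment: float = 0.0) -> float:
    """Sum per-variant contributions (risk allele count x effect) plus an environmental term."""
    genetic = sum(risk_allele_counts.get(variant, 0) * effect
                  for variant, effect in effect_sizes.items())
    return genetic + environment


def exceeds_threshold(total: float, threshold: float) -> bool:
    """Return True when the combined liability passes the (hypothetical) disease threshold."""
    return total > threshold


# Hypothetical per-allele effects for three splice-altering variants.
effects = {"variant_A": 0.25, "variant_B": 0.5, "variant_C": 0.25}
# Hypothetical individual: two risk alleles at variant_A, one at variant_B, none at variant_C.
counts = {"variant_A": 2, "variant_B": 1}

baseline = liability(counts, effects)                    # 2*0.25 + 1*0.5
with_env = liability(counts, effects, environment=0.75)  # same genotype plus an environmental load
```

In this toy example, no single variant is sufficient on its own; the threshold is passed only when several small contributions, genetic or environmental, are combined.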

4.1.3 Negative Results

Alternatively spliced isoforms were identified in six genes that were predicted by five datasets of sQTLs. However, in eight genes the alternative splicing change that was predicted by the five sQTL datasets used in this study was not identified. The alternative transcripts of the eight genes may not be expressed in the DLPFC or hippocampus of the brain, may be expressed in other developmental time periods or may be a result of activity-dependent splicing.

Not all of the splicing prediction datasets used data from brain tissue samples to identify associations between genetic variants and splicing changes. The PTV and DLPFC sQTLs datasets used RNAseq and genotype data from brain tissue samples; however, the Altrans, sQTLseekeR, and SPIDEX datasets used RNAseq and genotype data from a combination of multiple tissues to identify sQTLs. Therefore, these alternatively spliced isoforms may exist in other tissues but would not be detected in this study, as only brain tissues from two brain regions were used: the DLPFC and the hippocampus. Furthermore, these alternative splicing changes may not exist specifically in the adult brain, but may be present in the brain during other developmental stages. For example, the neuregulin 3 gene (NRG3) has many transcript isoforms that have differential expression in the DLPFC during different developmental periods (Paterson et al., 2017). The genes tested in this study may also have transcript isoforms that are expressed only in certain developmental periods, and these isoforms may therefore not have been identified because only adult brain tissue samples were used in this study.

Alternatively spliced transcripts that are associated with environmental stressors have also been identified, such as an increase in intron 4 retention of the acetylcholinesterase gene (AChE) following inhibition of the AChE protein or forced swimming stress in mice (Kaufer et al., 1998). Therefore, the predicted alternatively spliced transcripts that were not detected may result from activity-dependent splicing, in which case they would not be identified in this study.

4.1.4 Useful method to identify splicing changes in other complex disorders

This study provides a method to identify GWAS variants that are functionally contributing risk to psychiatric disorders by altering splicing, and would be a useful model for identifying functional GWAS SNVs for other complex genetic disorders.

This study used datasets of sQTLs from the Genotype-Tissue Expression (GTEx) project and the CommonMind Consortium, as well as a splicing model, to determine variants that are associated with and predicted to cause an alternative splicing change. These datasets were then mined for variants that had been found to be associated with schizophrenia and bipolar disorder in order to identify candidate sQTLs. The candidate sQTLs were then tested in brain tissue samples to determine the presence of alternatively spliced isoforms and their association with the schizophrenia and bipolar disorder risk alleles.

Alternative splicing changes have been identified in many complex genetic disorders. Therefore this method could be used for any genetic disorder by comparing the GWAS variants of the disorder with the variants found in datasets of sQTLs to identify candidate SNVs and splicing isoforms. These candidate sQTLs and their associated splicing changes could then be tested in tissue samples related to the specific disorder of interest.
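The comparison step of this method, in which GWAS variants are intersected with sQTL datasets to obtain candidate SNVs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function name, the dictionary layout of each dataset and all variant and gene identifiers are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple


def candidate_sqtls(gwas_snvs: Set[str],
                    sqtl_datasets: Dict[str, Dict[str, str]]
                    ) -> List[Tuple[str, str, List[str]]]:
    """Return (SNV, gene, supporting datasets) for every GWAS SNV found in an sQTL dataset.

    sqtl_datasets maps a dataset name to a {SNV id: affected gene} dictionary
    (an assumed, simplified layout for illustration only).
    """
    hits: Dict[Tuple[str, str], List[str]] = {}
    for dataset_name, snv_to_gene in sqtl_datasets.items():
        for snv, gene in snv_to_gene.items():
            if snv in gwas_snvs:
                hits.setdefault((snv, gene), []).append(dataset_name)
    return sorted((snv, gene, sorted(names)) for (snv, gene), names in hits.items())


# Placeholder identifiers; these are not real variants or results.
gwas = {"rs_example_1", "rs_example_2", "rs_example_3"}
datasets = {
    "Altrans": {"rs_example_1": "GENE_A", "rs_other": "GENE_X"},
    "sQTLseekeR": {"rs_example_1": "GENE_A", "rs_example_2": "GENE_B"},
}
candidates = candidate_sqtls(gwas, datasets)
```

Each returned candidate would then be carried forward to experimental testing in tissue relevant to the disorder of interest; recording which datasets support a candidate makes it easy to prioritize variants predicted by more than one source.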

4.2 Future Directions

4.2.1 Causal SNV confirmation

Genetic risk variants for psychiatric disorders were found to be associated with an alternative splicing change in this study. The mechanism by which these risk variants cause the splicing change was predicted; however, whether the associated splicing change is caused by these specific risk variants, by a combination of variants or by another variant in LD with the risk variants cannot be known with certainty unless it is tested experimentally. Predictions were inferred from the sequence and the SNV change; however, many proteins that bind to these sequences can induce either exon inclusion or exclusion depending on the larger surrounding sequence as well as the expressed trans factors. For example, Nova proteins block exon inclusion when bound to exonic sequences but promote exon inclusion when bound to intronic sequences (Ule et al., 2006). Therefore, a future direction of this study would be to determine whether the SNVs identified in this study are the functional SNVs causing these associated splicing changes. A mini-gene assay with the risk and non-risk alleles transfected into cell culture could be used to test the effect of each allele on the splicing of the predicted gene. A CRISPR/Cas9 assay could also be used to edit the predicted splice-altering SNVs to determine the effect of the risk and non-risk alleles on splicing.

Different splicing trans factors that bind to these sequences are also present in different cell types in different amounts; therefore, a variant may cause a splicing change only in a certain cell type or tissue. It would also be important to determine whether the risk allele causes its associated splicing change only in neuronal cell types or whether the splicing change is also caused by this variant in other cell types. This could be done by quantifying the expression of these alternatively spliced isoforms in other human tissues and determining the association with genotype, or by performing mini-gene assays in multiple cell lines that express different splicing factors. Furthermore, it is also necessary to determine which trans splicing factor binds preferentially to the risk or non-risk allele and causes the splicing change. An electrophoretic mobility shift assay (EMSA) could be performed to determine whether the splicing trans factors identified by RBPMap and SpliceAid bind preferentially to the risk or non-risk allele.
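As a rough illustration of relating isoform expression to genotype in a given tissue, the sketch below summarizes the fraction of skipped transcript by risk allele count. It is a descriptive summary only, not a statistical test and not the analysis performed in this study; the function names, the input layout and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def skipped_fraction(skipped: float, full_length: float) -> float:
    """Fraction of total transcript that is the exon-skipped isoform."""
    total = skipped + full_length
    if total <= 0:
        raise ValueError("no transcript detected for this sample")
    return skipped / total


def mean_fraction_by_genotype(samples: Iterable[Tuple[int, float, float]]) -> Dict[int, float]:
    """Average skipped fraction per risk allele count (0, 1 or 2).

    Each sample is (risk allele count, skipped isoform quantity, full-length quantity),
    an assumed layout for illustration.
    """
    groups = defaultdict(list)
    for risk_alleles, skipped, full_length in samples:
        groups[risk_alleles].append(skipped_fraction(skipped, full_length))
    return {genotype: mean(values) for genotype, values in sorted(groups.items())}


# Hypothetical measurements for one tissue; not data from this study.
example = [(0, 10, 30), (0, 10, 30), (1, 20, 20), (2, 30, 10)]
by_genotype = mean_fraction_by_genotype(example)
```

Repeating such a summary per tissue or cell line would show whether the relationship between risk allele count and isoform fraction is restricted to neuronal samples; a formal association test would still be needed to draw conclusions.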

4.2.2 Experimentally tested protein function

Function at the protein level can be predicted based on the protein sequences of these alternatively spliced transcripts; however, the specific protein function of each alternatively spliced transcript, especially in different cell types and environmental conditions, cannot be known until it is tested experimentally. Therefore, the function of the proteins encoded by these alternatively spliced transcripts and how they contribute to cellular and system function would need to be experimentally tested.

For each of the four genes identified in this study to have an alternatively spliced transcript associated with psychiatric disease, a mini-gene assay in which either the full-length or alternatively spliced sequence is transfected into neuronal cell culture could be performed in order to determine the function of the proteins encoded by the alternatively spliced transcripts and how this function may differ from that of the proteins encoded by the full-length transcripts. The cellular effects at both ends of the spectrum, with only the protein encoded by the full-length transcript or only the protein encoded by the alternatively spliced transcript present, should be tested, as well as the effects of biologically relevant levels with both proteins present at different levels based on the risk and non-risk alleles.

For APOPT1, the fate of the exon 3 skipped transcript, whether it undergoes nonsense-mediated decay or encodes a functional protein, would be identified under normal and oxidative stress conditions in neural cell types. The effect of changing the ratio of the exon 3 skipped and un-skipped APOPT1 transcripts would also be identified under normal conditions and under oxidative stress conditions induced by serum withdrawal. As well, to our knowledge no studies of APOPT1 in brain cells exist to date; therefore, the role of APOPT1 in neural cell types would also be determined. For AS3MT, the function of the exon 2 and 3 skipped transcript would be determined under normal and arsenic-treated conditions in neural cell types. The effect of changing the ratio of the exon 2 and 3 skipped and un-skipped AS3MT transcripts would also be identified under normal and arsenic-treated conditions. In these studies, the effect on the mitochondria, reactive oxygen species and apoptosis would be determined by examining mitochondrial morphology, oxygen consumption, the enzymatic function of mitochondrial subunits, the amount of hydroxyl radicals, carbonyl groups and nitration, and the amount of cell death. As well, for AS3MT, the effect on downstream neurotransmitter systems would be determined by measuring the levels of dopamine and dopamine receptors.

For NEK4, the function of the exon 4 skipped transcript would be determined in neural cell types, as well as the effect of changing the ratio of the exon 4 skipped and un-skipped NEK4 transcripts. For RPGRIP1L, the function of the exon 20 skipped transcript would be determined in neural cell types, as well as the effect of changing the ratio of the exon 20 skipped and un-skipped RPGRIP1L transcripts. In these studies, the effect on the primary cilia would be determined. As well, because DISC1 has been found to localize to the primary cilia, the potential interaction between RPGRIP1L, DISC1 and NEK4 would be explored.

Identifying not only the alternative transcripts themselves but also the specific variants that are causing these alternative splicing changes can be beneficial when testing patient samples where the genotype for these variants is known. Cell models of schizophrenia have been created by reprogramming cells from schizophrenia patients into iPSCs and further differentiating them into neural cell types (Tran, Ladran, & Brennand, 2013). Creating cell models from patients with a known genotype for the sQTLs identified in this study would be a useful method to test the effects of these alternatively spliced transcripts and their protein functions in neuronal cell types known to express these genes.

4.2.3 Combined alternative splicing and environmental effects

Because these four genes localize to similar cellular structures and are likely involved in similar pathways, the combined effects of these alternative risk transcripts on cellular and neuronal function would be important to test and would be more relevant to psychiatric risk than their individual effects.

A future direction of this work would be to determine the effect on the mitochondria of expressing the alternatively spliced transcripts of APOPT1 and AS3MT. The effect on the primary cilia of expressing the alternatively spliced transcripts of NEK4 and RPGRIP1L would also be determined. As well, the effects of expressing the alternatively spliced transcripts of all four genes (APOPT1, AS3MT, NEK4 and RPGRIP1L) together would be determined as an additive model of alternative splicing. In the mitochondria, the levels of complex I subunit, oxidation, nitration and lipid peroxidation would be measured, as these have been found to be altered in the prefrontal cortex of patients with schizophrenia and bipolar disorder (Andreazza et al., 2013). The number of ciliated cells as well as the length of the cilia would be measured, as knockdown of genes implicated in psychiatric disorders has been found to decrease the primary cilia (Marley & von Zastrow, 2012).

This potential dysfunction of the mitochondria and cilia would likely negatively affect neuronal differentiation, as mitochondrial dysfunction has been found to affect neuronal differentiation in iPSCs derived from schizophrenia patient hair follicles (Robicsek et al., 2013). As well, the effects of these combined alternative splicing changes on dopamine receptors 1 and 2, dopamine and tyrosine hydroxylase expression would be determined, as arsenic exposure has been found to affect the dopaminergic system, dopamine receptors have been found to localize to the primary cilia and the dopaminergic system has been found to be dysregulated in schizophrenia (Bauer, Praschak-Rieder, Kasper, & Willeit, 2012; Chandravanshi et al., 2014; Cross-Disorder Group of the Psychiatric Genomics et al., 2013; Kim et al., 2014; Marley & von Zastrow, 2010; Rodriguez et al., 2010).

Many studies of long-term arsenic exposure in rats have identified harmful changes to the mitochondria and downstream neurotransmitter systems (Rodriguez et al., 2010). The AS3MT isoform discussed in this study, in which exons 2 and 3 are skipped and which would likely lead to a protein with decreased arsenic metabolism, is human-specific; therefore, it would not be present in the rats used in these studies (Li et al., 2016). An interesting future study would be to express the human-specific AS3MT isoform at biologically relevant levels in rats with and without long-term arsenic exposure, and compare the effects to rats with only the full-length AS3MT isoform expressed. It is predicted that the already detrimental effects found in rats with chronic arsenic exposure would be exacerbated in rats that express the human-specific AS3MT alternatively spliced isoform and are also chronically exposed to arsenic, and that rats not exposed to arsenic would not have these detrimental effects.

4.2.4 Splicing Therapies

Identifying alternatively spliced transcripts that are associated with disease unlocks a potential therapeutic area. Multiple mechanisms of splicing therapies have been proposed, some of which are being tested in Mendelian diseases.

One approach is to directly target the alternatively spliced transcript associated with disease. For example, the survival motor neuron (SMN) protein is encoded by two nearly identical genes, SMN1 and SMN2; transcripts from SMN2 show increased exon 7 skipping and therefore yield less functional SMN protein. Spinal muscular atrophy (SMA) is caused by mutations in SMN1, which lead to a reduction in the levels of the SMN protein. Small molecule drugs have been developed as a potential treatment of SMA that decrease the amount of exon 7 skipping of SMN2 and therefore increase the amount of functional SMN protein (Andreassi et al., 2001).

Antisense oligonucleotides (AONs) are also being developed to correct the reading frame of alternatively spliced transcripts associated with disease. Duchenne muscular dystrophy (DMD) is caused by mutations in the DMD gene that change the reading frame or result in premature stop codons, so that functional dystrophin protein is not produced. AONs have been developed to bind to the exons that contain these mutations so that these exons are not spliced into the dystrophin transcript and a functional protein is produced (Goyenvalle et al., 2015; Jarmin, Kymalainen, Popplewell, & Dickson, 2014).

Another approach is to target the protein isoform encoded by the disease-associated alternatively spliced transcript. For example, cyclooxygenase (COX) is expressed as two protein isoforms: cyclooxygenase-1, which is constitutively expressed, and cyclooxygenase-2, which is expressed under inflammatory conditions. Small molecule drugs that only inhibit cyclooxygenase-2 have been developed for the treatment of arthritis to block its activity under inflammatory conditions (Patrignani, Capone, & Tacconelli, 2003).

The most novel method is spliceosome-mediated RNA trans-splicing (SMaRT). This method forms a hybrid of the target mRNA with a corrected mRNA sequence at the location of the aberrant splicing region. Frontotemporal dementia with Parkinsonism linked to chromosome 17 (FTDP-17) is caused by cis-acting mutations that cause increased inclusion of exon 10 in MAPT. The SMaRT method has been used as a potential therapy for FTDP-17: a corrected mRNA sequence without exon 10 is joined to the target mRNA with exon 10 to create a final mRNA sequence that does not include exon 10 (Rodriguez-Martin et al., 2009).

Identifying pathways that contain multiple genes with splicing changes associated with psychiatric disorders also provides a potential therapeutic area. Schizophrenia-associated alternative splicing changes in NRG1 and ERBB4 implicated the NRG1-ErbB4 signaling pathway in schizophrenia. Upon further study of this pathway, the ERBB4 schizophrenia-risk haplotype associated with alternative splicing changes in ERBB4 was also found to be associated with increased PIK3CD mRNA and protein; however, this increased PIK3CD was not found in post-mortem brain tissue samples from schizophrenia patients (Law et al., 2012). This was hypothesized to be a result of anti-psychotic medication use potentially decreasing the elevated PIK3CD levels, and this hypothesis was supported by amelioration of amphetamine-induced schizophrenia symptoms in rats treated with a PIK3CD inhibitor (Law et al., 2012). Thus, identification of NRG1 and ERBB4 schizophrenia-associated alternatively spliced transcripts led Law et al. (2012) to study the NRG1-ErbB4 signaling pathway, which was critical in identifying a PIK3CD inhibitor as a new potential treatment for schizophrenia.

Splicing-based therapies and therapies identified through alternative splicing targets have shown promise in the examples described above; therefore, identifying targets for these therapies is beneficial. The alternatively spliced transcripts identified in this study could be used as targets for splicing-based therapies in the future.

4.3 Conclusions

In conclusion, this study identified schizophrenia and bipolar disorder risk variants that are associated with an alternative splicing event. This study also demonstrated a useful method for identifying alternatively spliced transcripts associated with GWAS variants that could be used to study other complex genetic disorders. For psychiatric disorders, the alternatively spliced transcripts identified in this study likely confer risk for these disorders and are candidates that can be further studied for their specific contribution to schizophrenia and bipolar disorder at a cellular and systems level. As well, these genes act at similar cellular components, further implicating these cellular structures in schizophrenia and bipolar disorder. Finally, the transcripts identified in this study provide useful targets for splicing therapies. Identifying variants that cause alternatively spliced transcripts is important for identifying functional variants that contribute risk to psychiatric disorders, both to better understand the mechanisms of risk for these disorders and to discover new therapeutic options.

Bibliography

Abecasis, G. R., Cardon, L. R., & Cookson, W. O. (2000). A general test of association for quantitative traits in nuclear families. Am J Hum Genet, 66(1), 279-292.
Adachi, H., & Tsujimoto, M. (2002). FEEL-1, a novel scavenger receptor with in vitro bacteria-binding and angiogenesis-modulating activities. J Biol Chem, 277(37), 34264-34270. doi:10.1074/jbc.M204277200
Ahn, A. H., & Kunkel, L. M. (1993). The structural and functional diversity of dystrophin. Nat Genet, 3(4), 283-291. doi:10.1038/ng0493-283
Alda, M. (2015). Lithium in the treatment of bipolar disorder: pharmacology and pharmacogenetics. Mol Psychiatry, 20(6), 661-670. doi:10.1038/mp.2015.4
Andreassi, C., Jarecki, J., Zhou, J., Coovert, D. D., Monani, U. R., Chen, X., . . . Burghes, A. H. (2001). Aclarubicin treatment restores SMN levels to cells derived from type I spinal muscular atrophy patients. Hum Mol Genet, 10(24), 2841-2849.
Andreazza, A. C., Wang, J. F., Salmasi, F., Shao, L., & Young, L. T. (2013). Specific subcellular changes in oxidative stress in prefrontal cortex from patients with bipolar disorder. J Neurochem, 127(4), 552-561. doi:10.1111/jnc.12316
Ars, E., Serra, E., Garcia, J., Kruyer, H., Gaona, A., Lazaro, C., & Estivill, X. (2000). Mutations affecting mRNA splicing are the most common molecular defects in patients with neurofibromatosis type 1. Hum Mol Genet, 9(2), 237-247.
Awata, J., Song, K., Lin, J., King, S. M., Sanderson, M. J., Nicastro, D., & Witman, G. B. (2015). DRC3 connects the N-DRC to dynein g to regulate flagellar waveform. Mol Biol Cell, 26(15), 2788-2800. doi:10.1091/mbc.E15-01-0018
Ayres, R. U. (1992). Toxic heavy metals: materials cycle optimization. Proc Natl Acad Sci U S A, 89(3), 815-820.
Azad, A. K., Curtis, A., Papp, A., Webb, A., Knoell, D., Sadee, W., & Schlesinger, L. S. (2013). Allelic mRNA expression imbalance in C-type lectins reveals a frequent regulatory SNP in the human surfactant protein A (SP-A) gene. Genes Immun, 14(2), 99-106. doi:10.1038/gene.2012.61
Bauer, M., Praschak-Rieder, N., Kasper, S., & Willeit, M. (2012). Is dopamine neurotransmission altered in prodromal schizophrenia? A review of the evidence. Curr Pharm Des, 18(12), 1568-1579.
Berger, S. M., & Bartsch, D. (2014). The role of L-type voltage-gated calcium channels Cav1.2 and Cav1.3 in normal and pathological brain function. Cell Tissue Res, 357(2), 463-476. doi:10.1007/s00441-014-1936-3
Bernardes, J. S., & Pedreira, C. E. (2013). A review of protein function prediction under machine learning perspective. Recent Pat Biotechnol, 7(2), 122-141.
Bhalla, K., Phillips, H. A., Crawford, J., McKenzie, O. L., Mulley, J. C., Eyre, H., . . . Callen, D. F. (2004). The de novo chromosome 16 translocations of two patients with abnormal phenotypes (mental retardation and epilepsy) disrupt the A2BP1 gene. J Hum Genet, 49(6), 308-311. doi:10.1007/s10038-004-0145-4
Black, D. L., & Grabowski, P. J. (2003). Alternative pre-mRNA splicing and neuronal function. Prog Mol Subcell Biol, 31, 187-216.
Blencowe, B. J. (2006). Alternative splicing: new insights from global analyses. Cell, 126(1), 37-47. doi:10.1016/j.cell.2006.06.023

Boyd, J. M., Malstrom, S., Subramanian, T., Venkatesh, L. K., Schaeper, U., Elangovan, B., . . . Chinnadurai, G. (1994). Adenovirus E1B 19 kDa and Bcl-2 proteins interact with a common set of cellular proteins. Cell, 79(2), 341-351.
Brandler, W. M., & Paracchini, S. (2014). The genetic relationship between handedness and neurodevelopmental disorders. Trends Mol Med, 20(2), 83-90. doi:10.1016/j.molmed.2013.10.008
Buee, L., Bussiere, T., Buee-Scherrer, V., Delacourte, A., & Hof, P. R. (2000). Tau protein isoforms, phosphorylation and role in neurodegenerative disorders. Brain Res Brain Res Rev, 33(1), 95-130.
Cartegni, L., Chew, S. L., & Krainer, A. R. (2002). Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet, 3(4), 285-298. doi:10.1038/nrg775
Chandravanshi, L. P., Shukla, R. K., Sultana, S., Pant, A. B., & Khanna, V. K. (2014). Early life arsenic exposure and brain dopaminergic alterations in rats. Int J Dev Neurosci, 38, 91-104. doi:10.1016/j.ijdevneu.2014.08.009
Chang, Y. F., Imam, J. S., & Wilkinson, M. F. (2007). The nonsense-mediated decay RNA surveillance pathway. Annu Rev Biochem, 76, 51-74. doi:10.1146/annurev.biochem.76.050106.093909
Chen, D. T., Jiang, X., Akula, N., Shugart, Y. Y., Wendland, J. R., Steele, C. J., . . . Strauss, J. (2013). Genome-wide association study meta-analysis of European and Asian-ancestry samples identifies three novel loci associated with bipolar disorder. Mol Psychiatry, 18(2), 195-205. doi:10.1038/mp.2011.157
Chen, Y., Yang, L. N., Cheng, L., Tu, S., Guo, S. J., Le, H. Y., . . . Ge, F. (2013). Bcl2-associated athanogene 3 interactome analysis reveals a new role in modulating proteasome activity. Mol Cell Proteomics, 12(10), 2804-2819. doi:10.1074/mcp.M112.025882
Cho, B., Choi, S. Y., Park, O. H., Sun, W., & Geum, D. (2012). Differential expression of BNIP family members of BH3-only proteins during the development and after axotomy in the rat. Mol Cells, 33(6), 605-610. doi:10.1007/s10059-012-0051-0
Clinton, S. M., Haroutunian, V., Davis, K. L., & Meador-Woodruff, J. H. (2003). Altered transcript expression of NMDA receptor-associated postsynaptic proteins in the thalamus of subjects with schizophrenia. Am J Psychiatry, 160(6), 1100-1109. doi:10.1176/appi.ajp.160.6.1100
Coene, K. L., Mans, D. A., Boldt, K., Gloeckner, C. J., van Reeuwijk, J., Bolat, E., . . . Roepman, R. (2011). The ciliopathy-associated protein homologs RPGRIP1 and RPGRIP1L are linked to cilium integrity through interaction with Nek4 serine/threonine kinase. Hum Mol Genet, 20(18), 3592-3605. doi:10.1093/hmg/ddr280
Cohen, O. S., Weickert, T. W., Hess, J. L., Paish, L. M., McCoy, S. Y., Rothmond, D. A., . . . Glatt, S. J. (2016). A splicing-regulatory polymorphism in DRD2 disrupts ZRANB2 binding, impairs cognitive functioning and increases risk for schizophrenia in six Han Chinese samples. Mol Psychiatry, 21(7), 975-982. doi:10.1038/mp.2015.137
Craddock, N., & Sklar, P. (2013). Genetics of bipolar disorder. Lancet, 381(9878), 1654-1662. doi:10.1016/S0140-6736(13)60855-7

Crawford, J. B., & Patton, J. G. (2006). Activation of alpha-tropomyosin exon 2 is regulated by the SR protein 9G8 and heterogeneous nuclear ribonucleoproteins H and F. Mol Cell Biol, 26(23), 8791-8802. doi:10.1128/MCB.01677-06
Cross-Disorder Group of the Psychiatric Genomics Consortium, Lee, S. H., Ripke, S., Neale, B. M., Faraone, S. V., Purcell, S. M., . . . Wray, N. R. (2013). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet, 45(9), 984-994. doi:10.1038/ng.2711
Cross-Disorder Group of the Psychiatric Genomics Consortium, Smoller, J. W., Craddock, N., Kendler, K., Lee, P. H., Neale, B. M., . . . Sullivan, P. F. (2013). Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet, 381(9875), 1371-1379. doi:10.1016/S0140-6736(12)62129-1
Daido, S., Kanzawa, T., Yamamoto, A., Takeuchi, H., Kondo, Y., & Kondo, S. (2004). Pivotal role of the cell death factor BNIP3 in ceramide-induced autophagic cell death in malignant glioma cells. Cancer Res, 64(12), 4286-4293. doi:10.1158/0008-5472.CAN-03-3084
Dakeishi, M., Murata, K., & Grandjean, P. (2006). Long-term consequences of arsenic poisoning during infancy due to contaminated milk powder. Environ Health, 5, 31. doi:10.1186/1476-069X-5-31
Das, S., Forer, L., Schonherr, S., Sidore, C., Locke, A. E., Kwong, A., . . . Fuchsberger, C. (2016). Next-generation genotype imputation service and methods. Nat Genet, 48(10), 1284-1287. doi:10.1038/ng.3656
Davis, K. N., Tao, R., Li, C., Gao, Y., Gondre-Lewis, M. C., Lipska, B. K., . . . Hyde, T. M. (2016). GAD2 Alternative Transcripts in the Human Prefrontal Cortex, and in Schizophrenia and Affective Disorders. PLoS One, 11(2), e0148558. doi:10.1371/journal.pone.0148558
Delous, M., Baala, L., Salomon, R., Laclef, C., Vierkotten, J., Tory, K., . . . Saunier, S. (2007). The ciliary gene RPGRIP1L is mutated in cerebello-oculo-renal syndrome (Joubert syndrome type B) and Meckel syndrome. Nat Genet, 39(7), 875-881. doi:10.1038/ng2039
Doles, J., & Hemann, M. T. (2010). Nek4 status differentially alters sensitivity to distinct microtubule poisons. Cancer Res, 70(3), 1033-1041. doi:10.1158/0008-5472.CAN-09-2113
Edvardsen, J., Torgersen, S., Roysamb, E., Lygren, S., Skre, I., Onstad, S., & Oien, P. A. (2008). Heritability of bipolar spectrum disorders. Unity or heterogeneity? J Affect Disord, 106(3), 229-240. doi:10.1016/j.jad.2007.07.001
Elia, J., Gai, X., Xie, H. M., Perin, J. C., Geiger, E., Glessner, J. T., . . . White, P. S. (2010). Rare structural variants found in attention-deficit hyperactivity disorder are preferentially associated with neurodevelopmental genes. Mol Psychiatry, 15(6), 637-646. doi:10.1038/mp.2009.57
Engstrom, K., Vahter, M., Mlakar, S. J., Concha, G., Nermell, B., Raqib, R., . . . Broberg, K. (2011). Polymorphisms in arsenic(+III oxidation state) methyltransferase (AS3MT) predict gene expression of AS3MT as well as arsenic metabolism. Environ Health Perspect, 119(2), 182-188. doi:10.1289/ehp.1002471
Engstrom, K. S., Hossain, M. B., Lauss, M., Ahmed, S., Raqib, R., Vahter, M., & Broberg, K. (2013). Efficient arsenic metabolism--the AS3MT haplotype is associated with DNA methylation and expression of multiple genes around AS3MT. PLoS One, 8(1), e53732. doi:10.1371/journal.pone.0053732
Engstrom, K. S., Vahter, M., Fletcher, T., Leonardi, G., Goessler, W., Gurzau, E., . . . Broberg, K. (2015). Genetic variation in arsenic (+3 oxidation state) methyltransferase (AS3MT), arsenic metabolism and risk of basal cell carcinoma in a European population. Environ Mol Mutagen, 56(1), 60-69. doi:10.1002/em.21896
Ferreira, M. A., O'Donovan, M. C., Meng, Y. A., Jones, I. R., Ruderfer, D. M., Jones, L., . . . Wellcome Trust Case Control Consortium. (2008). Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar disorder. Nat Genet, 40(9), 1056-1058. doi:10.1038/ng.209
Finn, R. D., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Mistry, J., Mitchell, A. L., . . . Bateman, A. (2016). The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res, 44(D1), D279-285. doi:10.1093/nar/gkv1344
Forstner, A. J., Hecker, J., Hofmann, A., Maaser, A., Reinbold, C. S., Muhleisen, T. W., . . . Nothen, M. M. (2017). Identification of shared risk loci and pathways for bipolar disorder and schizophrenia. PLoS One, 12(2), e0171595. doi:10.1371/journal.pone.0171595
Fry, A. M., O'Regan, L., Sabir, S. R., & Bayliss, R. (2012). Cell cycle regulation by the NEK family of protein kinases. J Cell Sci, 125(Pt 19), 4423-4433. doi:10.1242/jcs.111195
Gerhardt, C., Lier, J. M., Burmuhl, S., Struchtrup, A., Deutschmann, K., Vetter, M., . . . Ruther, U. (2015). The transition zone protein Rpgrip1l regulates proteasomal activity at the primary cilium. J Cell Biol, 210(1), 115-133. doi:10.1083/jcb.201408060
Gibbons, R. D., Hur, K., Brown, C. H., & Mann, J. J. (2009). Relationship between antiepileptic drugs and suicide attempts in patients with bipolar disorder. Arch Gen Psychiatry, 66(12), 1354-1360. doi:10.1001/archgenpsychiatry.2009.159
Goodwin, F. K., Jamison, K. R., & Ghaemi, S. N. (2007). Manic-depressive illness: bipolar disorders and recurrent depression (2nd ed.). New York; Oxford: Oxford University Press.
Gottesman, II, & Shields, J. (1967). A polygenic theory of schizophrenia. Proc Natl Acad Sci U S A, 58(1), 199-205.
Goyenvalle, A., Griffith, G., Babbs, A., El Andaloussi, S., Ezzat, K., Avril, A., . . . Garcia, L. (2015). Functional correction in mouse models of muscular dystrophy using exon-skipping tricyclo-DNA oligomers. Nat Med, 21(3), 270-275. doi:10.1038/nm.3765
Grande, I., & Vieta, E. (2015). Pharmacotherapy of acute mania: monotherapy or combination therapy with mood stabilizers and antipsychotics? CNS Drugs, 29(3), 221-227. doi:10.1007/s40263-015-0235-1
Green, D. R., & Kroemer, G. (2004). The pathophysiology of mitochondrial cell death. Science, 305(5684), 626-629. doi:10.1126/science.1099320
GTEx Consortium. (2015). The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science, 348(6235), 648-660. doi:10.1126/science.1262110

Guan, F., Li, L., Qiao, C., Chen, G., Yan, T., Li, T., . . . Liu, X. (2015). Evaluation of genetic susceptibility of common variants in CACNA1D with schizophrenia in Han Chinese. Sci Rep, 5, 12935. doi:10.1038/srep12935
Guo, R., Li, Y., Ning, J., Sun, D., Lin, L., & Liu, X. (2013). HnRNP A1/A2 and SF2/ASF regulate alternative splicing of interferon regulatory factor-3 and affect immunomodulatory functions in human non-small cell lung cancer cells. PLoS One, 8(4), e62729. doi:10.1371/journal.pone.0062729
Ha, S., Lindsay, A. M., Timms, A. E., & Beier, D. R. (2016). Mutations in Dnaaf1 and Lrrc48 Cause Hydrocephalus, Laterality Defects, and Sinusitis in Mice. G3 (Bethesda), 6(8), 2479-2487. doi:10.1534/g3.116.030791
Hamacher-Brady, A., Brady, N. R., Logue, S. E., Sayen, M. R., Jinno, M., Kirshenbaum, L. A., . . . Gustafsson, A. B. (2007). Response to myocardial ischemia/reperfusion injury involves Bnip3 and autophagy. Cell Death Differ, 14(1), 146-157. doi:10.1038/sj.cdd.4401936
Hamshere, M. L., Green, E. K., Jones, I. R., Jones, L., Moskvina, V., Kirov, G., . . . Craddock, N. (2009). Genetic utility of broadly defined bipolar schizoaffective disorder as a diagnostic concept. Br J Psychiatry, 195(1), 23-29. doi:10.1192/bjp.bp.108.061424
Havens, M. A., Duelli, D. M., & Hastings, M. L. (2013). Targeting RNA splicing for disease therapy. Wiley Interdiscip Rev RNA, 4(3), 247-266. doi:10.1002/wrna.1158
He, Y., & Smith, R. (2009). Nuclear functions of heterogeneous nuclear ribonucleoproteins A/B. Cell Mol Life Sci, 66(7), 1239-1256. doi:10.1007/s00018-008-8532-1
Hiller, M., Huse, K., Szafranski, K., Jahn, N., Hampe, J., Schreiber, S., . . . Platzer, M. (2004). Widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity. Nat Genet, 36(12), 1255-1257. doi:10.1038/ng1469
Hillman, R. T., Green, R. E., & Brenner, S. E. (2004). An unappreciated role for RNA surveillance. Genome Biol, 5(2), R8. doi:10.1186/gb-2004-5-2-r8
Hou, L., Bergen, S. E., Akula, N., Song, J., Hultman, C. M., Landen, M., . . . McMahon, F. J. (2016). Genome-wide association study of 40,000 individuals identifies two novel loci associated with bipolar disorder. Hum Mol Genet, 25(15), 3383-3394. doi:10.1093/hmg/ddw181
Hsieh, R. L., Su, C. T., Shiue, H. S., Chen, W. J., Huang, S. R., Lin, Y. C., . . . Hsueh, Y. M. (2017). Relation of polymorphism of arsenic metabolism genes to arsenic methylation capacity and developmental delay in preschool children in Taiwan. Toxicol Appl Pharmacol, 321, 37-47. doi:10.1016/j.taap.2017.02.016
Hughes, T., Hansson, L., Sonderby, I. E., Athanasiu, L., Zuber, V., Tesli, M., . . . Djurovic, S. (2016). A Loss-of-Function Variant in a Minor Isoform of ANK3 Protects Against Bipolar Disorder and Schizophrenia. Biol Psychiatry, 80(4), 323-330. doi:10.1016/j.biopsych.2015.09.021
International Schizophrenia, C., Purcell, S. M., Wray, N. R., Stone, J. L., Visscher, P. M., O'Donovan, M. C., . . . Sklar, P. (2009). Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature, 460(7256), 748-752. doi:10.1038/nature08185
Irimia, M., Weatheritt, R. J., Ellis, J. D., Parikshak, N. N., Gonatopoulos-Pournatzis, T., Babor, M., . . . Blencowe, B. J. (2014). A highly conserved program of neuronal microexons is misregulated in autistic brains. Cell, 159(7), 1511-1523. doi:10.1016/j.cell.2014.11.035
Jarmin, S., Kymalainen, H., Popplewell, L., & Dickson, G. (2014). New developments in the use of gene therapy to treat Duchenne muscular dystrophy. Expert Opin Biol Ther, 14(2), 209-230. doi:10.1517/14712598.2014.866087
Kao, W. T., Wang, Y., Kleinman, J. E., Lipska, B. K., Hyde, T. M., Weinberger, D. R., & Law, A. J. (2010). Common genetic variation in Neuregulin 3 (NRG3) influences risk for schizophrenia and impacts NRG3 expression in human brain. Proc Natl Acad Sci U S A, 107(35), 15619-15624. doi:10.1073/pnas.1005410107
Kaufer, D., Friedman, A., Seidman, S., & Soreq, H. (1998). Acute stress facilitates long-lasting changes in cholinergic gene expression. Nature, 393(6683), 373-377. doi:10.1038/30741
Khor, B. Y., Tye, G. J., Lim, T. S., & Choong, Y. S. (2015). General overview on structure prediction of twilight-zone proteins. Theor Biol Med Model, 12, 15. doi:10.1186/s12976-015-0014-1
Kim, M., Seo, S., Sung, K., & Kim, K. (2014). Arsenic exposure in drinking water alters the dopamine system in the brains of C57BL/6 mice. Biol Trace Elem Res, 162(1-3), 175-180. doi:10.1007/s12011-014-0145-y
Knapp, M., Mangalore, R., & Simon, J. (2004). The global costs of schizophrenia. Schizophr Bull, 30(2), 279-293.
Konopaske, G. T., Lange, N., Coyle, J. T., & Benes, F. M. (2014). Prefrontal cortical dendritic spine pathology in schizophrenia and bipolar disorder. JAMA Psychiatry, 71(12), 1323-1331. doi:10.1001/jamapsychiatry.2014.1582
Krawczak, M., Reiss, J., & Cooper, D. N. (1992). The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: causes and consequences. Hum Genet, 90(1-2), 41-54.
Kunisada, K., Tone, E., Negoro, S., Nakaoka, Y., Oshima, Y., Osugi, T., . . . Yamauchi-Takihara, K. (2002). Bcl-xl reduces doxorubicin-induced myocardial damage but fails to control cardiac gene downregulation. Cardiovasc Res, 53(4), 936-943.
Lal, D., Reinthaler, E. M., Altmuller, J., Toliat, M. R., Thiele, H., Nurnberg, P., . . . Neubauer, B. A. (2013). RBFOX1 and RBFOX3 mutations in rolandic epilepsy. PLoS One, 8(9), e73323. doi:10.1371/journal.pone.0073323
Lareau, L. F., Brooks, A. N., Soergel, D. A., Meng, Q., & Brenner, S. E. (2007). The coupling of alternative splicing and nonsense-mediated mRNA decay. Adv Exp Med Biol, 623, 190-211.
Law, A. J., Kleinman, J. E., Weinberger, D. R., & Weickert, C. S. (2007). Disease-associated intronic variants in the ErbB4 gene are related to altered ErbB4 splice-variant expression in the brain in schizophrenia. Hum Mol Genet, 16(2), 129-141. doi:10.1093/hmg/ddl449
Law, A. J., Wang, Y., Sei, Y., O'Donnell, P., Piantadosi, P., Papaleo, F., . . . Weinberger, D. R. (2012). Neuregulin 1-ErbB4-PI3K signaling in schizophrenia and phosphoinositide 3-kinase-p110delta inhibition as a potential therapeutic strategy. Proc Natl Acad Sci U S A, 109(30), 12165-12170. doi:10.1073/pnas.1206118109
Lee, S. H., DeCandia, T. R., Ripke, S., Yang, J., Schizophrenia Psychiatric Genome-Wide Association Study, C., International Schizophrenia, C., . . . Wray, N. R. (2012). Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat Genet, 44(3), 247-250. doi:10.1038/ng.1108
Lee, V. M., Goedert, M., & Trojanowski, J. Q. (2001). Neurodegenerative tauopathies. Annu Rev Neurosci, 24, 1121-1159. doi:10.1146/annurev.neuro.24.1.1121
Lee, Y. H., Kim, J. H., & Song, G. G. (2013). Pathway analysis of a genome-wide association study in schizophrenia. Gene, 525(1), 107-115. doi:10.1016/j.gene.2013.04.014
Leroy, O., Dhaenens, C. M., Schraen-Maschke, S., Belarbi, K., Delacourte, A., Andreadis, A., . . . Caillet-Boudin, M. L. (2006). ETR-3 represses Tau exons 2/3 inclusion, a splicing event abnormally enhanced in myotonic dystrophy type I. J Neurosci Res, 84(4), 852-859. doi:10.1002/jnr.20980
Lewis, B. P., Green, R. E., & Brenner, S. E. (2003). Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proc Natl Acad Sci U S A, 100(1), 189-192. doi:10.1073/pnas.0136770100
Li, M., Jaffe, A. E., Straub, R. E., Tao, R., Shin, J. H., Wang, Y., . . . Weinberger, D. R. (2016). A human-specific AS3MT isoform and BORCS7 are molecular risk factors in the 10q24.32 schizophrenia-associated locus. Nat Med, 22(6), 649-656. doi:10.1038/nm.4096
Lim, K. H., Ferraris, L., Filloux, M. E., Raphael, B. J., & Fairbrother, W. G. (2011). Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes. Proc Natl Acad Sci U S A, 108(27), 11093-11098. doi:10.1073/pnas.1101135108
Long, J. C., & Caceres, J. F. (2009). The SR protein family of splicing factors: master regulators of gene expression. Biochem J, 417(1), 15-27. doi:10.1042/BJ20081501
Lopez-Bigas, N., Audit, B., Ouzounis, C., Parra, G., & Guigo, R. (2005). Are splicing mutations the most frequent cause of hereditary disease? FEBS Lett, 579(9), 1900-1903. doi:10.1016/j.febslet.2005.02.047
Lu, Y., Wang, L., He, M., Huang, W., Li, H., Wang, Y., . . . Qiu, X. (2012). Nix protein positively regulates NF-kappaB activation in gliomas. PLoS One, 7(9), e44559. doi:10.1371/journal.pone.0044559
Marley, A., & von Zastrow, M. (2010). DISC1 regulates primary cilia that display specific dopamine receptors. PLoS One, 5(5), e10902. doi:10.1371/journal.pone.0010902
Marley, A., & von Zastrow, M. (2012). A simple cell-based assay reveals that diverse neuropsychiatric risk genes converge on primary cilia. PLoS One, 7(10), e46647. doi:10.1371/journal.pone.0046647
Martin, C. L., Duvall, J. A., Ilkin, Y., Simon, J. S., Arreaza, M. G., Wilkes, K., . . . Geschwind, D. H. (2007). Cytogenetic and molecular characterization of A2BP1/FOX1 as a candidate gene for autism. Am J Med Genet B Neuropsychiatr Genet, 144B(7), 869-876. doi:10.1002/ajmg.b.30530
Martinez-Aran, A., Vieta, E., Torrent, C., Sanchez-Moreno, J., Goikolea, J. M., Salamero, M., . . . Ayuso-Mateos, J. L. (2007). Functional outcome in bipolar disorder: the role of clinical and cognitive factors. Bipolar Disord, 9(1-2), 103-113. doi:10.1111/j.1399-5618.2007.00327.x
Maunakea, A. K., Nagarajan, R. P., Bilenky, M., Ballinger, T. J., D'Souza, C., Fouse, S. D., . . . Costello, J. F. (2010). Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature, 466(7303), 253-257. doi:10.1038/nature09165
Maurano, M. T., Humbert, R., Rynes, E., Thurman, R. E., Haugen, E., Wang, H., . . . Stamatoyannopoulos, J. A. (2012). Systematic localization of common disease-associated variation in regulatory DNA. Science, 337(6099), 1190-1195.
McClellan, J., & King, M. C. (2010). Genomic analysis of mental illness: a changing landscape. JAMA, 303(24), 2523-2524. doi:10.1001/jama.2010.869
McClure, R. K., Styner, M., Maltbie, E., Lieberman, J. A., Gouttard, S., Gerig, G., . . . Zhu, H. (2013). Localized differences in caudate and hippocampal shape are associated with schizophrenia but not antipsychotic type. Psychiatry Res, 211(1), 1-10. doi:10.1016/j.pscychresns.2012.07.001
McKee, A. E., Minet, E., Stern, C., Riahi, S., Stiles, C. D., & Silver, P. A. (2005). A genome-wide in situ hybridization map of RNA-binding proteins reveals anatomically restricted expression in the developing mouse brain. BMC Dev Biol, 5, 14. doi:10.1186/1471-213X-5-14
Meier, S., Strohmaier, J., Breuer, R., Mattheisen, M., Degenhardt, F., Muhleisen, T. W., . . . Wust, S. (2013). Neuregulin 3 is associated with attention deficits in schizophrenia and bipolar disorder. Int J Neuropsychopharmacol, 16(3), 549-556. doi:10.1017/S1461145712000697
Melchionda, L., Haack, T. B., Hardy, S., Abbink, T. E., Fernandez-Vizarra, E., Lamantea, E., . . . Zeviani, M. (2014). Mutations in APOPT1, encoding a mitochondrial protein, cause cavitating leukoencephalopathy with cytochrome c oxidase deficiency. Am J Hum Genet, 95(3), 315-325. doi:10.1016/j.ajhg.2014.08.003
Merikangas, K. R., Jin, R., He, J. P., Kessler, R. C., Lee, S., Sampson, N. A., . . . Zarkov, Z. (2011). Prevalence and correlates of bipolar spectrum disorder in the world mental health survey initiative. Arch Gen Psychiatry, 68(3), 241-251. doi:10.1001/archgenpsychiatry.2011.12
Missler, M., & Sudhof, T. C. (1998). Neurexins: three genes and 1001 products. Trends Genet, 14(1), 20-26. doi:10.1016/S0168-9525(97)01324-3
Miura, T., Noma, H., Furukawa, T. A., Mitsuyasu, H., Tanaka, S., Stockton, S., . . . Kanba, S. (2014). Comparative efficacy and tolerability of pharmacological treatments in the maintenance treatment of bipolar disorder: a systematic review and network meta-analysis. Lancet Psychiatry, 1(5), 351-359. doi:10.1016/S2215-0366(14)70314-1
Miyoshi, K., Kasahara, K., Miyazaki, I., & Asanuma, M. (2009). Lithium treatment elongates primary cilia in the mouse brain and in cultured cells. Biochem Biophys Res Commun, 388(4), 757-762. doi:10.1016/j.bbrc.2009.08.099
Mokrejs, M., Vopalensky, V., Kolenaty, O., Masek, T., Feketova, Z., Sekyrova, P., . . . Pospisek, M. (2006). IRESite: the database of experimentally verified IRES structures (http://www.iresite.org). Nucleic Acids Res, 34(Database issue), D125-130. doi:10.1093/nar/gkj081
Monlong, J., Calvo, M., Ferreira, P. G., & Guigo, R. (2014). Identification of genetic variants associated with alternative splicing using sQTLseekeR. Nat Commun, 5, 4698. doi:10.1038/ncomms5698
Moreno Avila, C. L., Limon-Pacheco, J. H., Giordano, M., & Rodriguez, V. M. (2016). Chronic Exposure to Arsenic in Drinking Water Causes Alterations in Locomotor Activity and Decreases Striatal mRNA for the D2 Dopamine Receptor in CD1 Male Mice. J Toxicol, 2016, 4763434. doi:10.1155/2016/4763434
Muhia, M., Thies, E., Labonte, D., Ghiretti, A. E., Gromova, K. V., Xompero, F., . . . Kneussel, M. (2016). The Kinesin KIF21B Regulates Microtubule Dynamics and Is Essential for Neuronal Morphology, Synapse Function, and Learning and Memory. Cell Rep, 15(5), 968-977. doi:10.1016/j.celrep.2016.03.086
Muhleisen, T. W., Leber, M., Schulze, T. G., Strohmaier, J., Degenhardt, F., Treutlein, J., . . . Cichon, S. (2014). Genome-wide association study reveals two new risk loci for bipolar disorder. Nat Commun, 5, 3339. doi:10.1038/ncomms4339
Neu-Yilik, G., Gehring, N. H., Hentze, M. W., & Kulozik, A. E. (2004). Nonsense-mediated mRNA decay: from vacuum cleaner to Swiss army knife. Genome Biol, 5(4), 218. doi:10.1186/gb-2004-5-4-218
Nie, F., Wang, X., Zhao, P., Yang, H., Zhu, W., Zhao, Y., . . . Ma, J. (2015). Genetic analysis of SNPs in CACNA1C and ANK3 gene with schizophrenia: A comprehensive meta-analysis. Am J Med Genet B Neuropsychiatr Genet, 168(8), 637-648. doi:10.1002/ajmg.b.32348
Oertel, V., Knochel, C., Rotarska-Jagiela, A., Schonmeyer, R., Lindner, M., van de Ven, V., . . . Linden, D. E. (2010). Reduced laterality as a trait marker of schizophrenia--evidence from structural and functional neuroimaging. J Neurosci, 30(6), 2289-2299. doi:10.1523/JNEUROSCI.4575-09.2010
Olabi, B., Ellison-Wright, I., McIntosh, A. M., Wood, S. J., Bullmore, E., & Lawrie, S. M. (2011). Are there progressive brain changes in schizophrenia? A meta-analysis of structural magnetic resonance imaging studies. Biol Psychiatry, 70(1), 88-96. doi:10.1016/j.biopsych.2011.01.032
Oldmeadow, C., Mossman, D., Evans, T. J., Holliday, E. G., Tooney, P. A., Cairns, M. J., . . . Scott, R. J. (2014). Combined analysis of exon splicing and genome wide polymorphism data predict schizophrenia risk loci. J Psychiatr Res, 52, 44-49. doi:10.1016/j.jpsychires.2014.01.011
Ongen, H., & Dermitzakis, E. T. (2015). Alternative Splicing QTLs in European and African Populations. Am J Hum Genet, 97(4), 567-575. doi:10.1016/j.ajhg.2015.09.004
Pan, Q., Shai, O., Lee, L. J., Frey, B. J., & Blencowe, B. J. (2008). Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet, 40(12), 1413-1415. doi:10.1038/ng.259
Parikshak, N. N., Swarup, V., Belgard, T. G., Irimia, M., Ramaswami, G., Gandal, M. J., . . . Geschwind, D. H. (2016). Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature, 540(7633), 423-427. doi:10.1038/nature20612
Paterson, C., Wang, Y., Hyde, T. M., Weinberger, D. R., Kleinman, J. E., & Law, A. J. (2017). Temporal, Diagnostic, and Tissue-Specific Regulation of NRG3 Isoform Expression in Human Brain Development and Affective Disorders. Am J Psychiatry, 174(3), 256-265. doi:10.1176/appi.ajp.2016.16060721
Paterson, C., Wang, Y., Kleinman, J. E., & Law, A. J. (2014). Effects of schizophrenia risk variation in the NRG1 gene on NRG1-IV splicing during fetal and early postnatal human neocortical development. Am J Psychiatry, 171(9), 979-989. doi:10.1176/appi.ajp.2014.13111518
Patrignani, P., Capone, M. L., & Tacconelli, S. (2003). Clinical pharmacology of etoricoxib: a novel selective COX2 inhibitor. Expert Opin Pharmacother, 4(2), 265-284. doi:10.1517/14656566.4.2.265
Paz, I., Kosti, I., Ares, M., Jr., Cline, M., & Mandel-Gutfreund, Y. (2014). RBPmap: a web server for mapping binding sites of RNA-binding proteins. Nucleic Acids Res, 42(Web Server issue), W361-367. doi:10.1093/nar/gku406
Pedrotti, S., Bielli, P., Paronetto, M. P., Ciccosanti, F., Fimia, G. M., Stamm, S., . . . Sette, C. (2010). The splicing regulator Sam68 binds to a novel exonic splicing silencer and functions in SMN2 alternative splicing in spinal muscular atrophy. EMBO J, 29(7), 1235-1247. doi:10.1038/emboj.2010.19
Petralia, R. S., Wang, Y. X., Indig, F. E., Bushlin, I., Wu, F., Mattson, M. P., & Yao, P. J. (2013). Reduction of AP180 and CALM produces defects in synaptic vesicle size and density. Neuromolecular Med, 15(1), 49-60. doi:10.1007/s12017-012-8194-x
Pinggera, A., Lieb, A., Benedetti, B., Lampert, M., Monteleone, S., Liedl, K. R., . . . Striessnig, J. (2015). CACNA1D de novo mutations in autism spectrum disorders activate Cav1.3 L-type calcium channels. Biol Psychiatry, 77(9), 816-822. doi:10.1016/j.biopsych.2014.11.020
Piva, F., Giulietti, M., Burini, A. B., & Principato, G. (2012). SpliceAid 2: a database of human splicing factors expression data and RNA target motifs. Hum Mutat, 33(1), 81-85. doi:10.1002/humu.21609
Pompili, M., Amador, X. F., Girardi, P., Harkavy-Friedman, J., Harrow, M., Kaplan, K., . . . Tatarelli, R. (2007). Suicide risk in schizophrenia: learning from the past to change the future. Ann Gen Psychiatry, 6, 10. doi:10.1186/1744-859X-6-10
Pompili, M., Gonda, X., Serafini, G., Innamorati, M., Sher, L., Amore, M., . . . Girardi, P. (2013). Epidemiology of suicide in bipolar disorders: a systematic review of the literature. Bipolar Disord, 15(5), 457-490. doi:10.1111/bdi.12087
Prabakaran, S., Swatton, J. E., Ryan, M. M., Huffaker, S. J., Huang, J. T., Griffin, J. L., . . . Bahn, S. (2004). Mitochondrial dysfunction in schizophrenia: evidence for compromised brain metabolism and oxidative stress. Mol Psychiatry, 9(7), 684-697, 643. doi:10.1038/sj.mp.4001511
Quesnel-Vallieres, M., Dargaei, Z., Irimia, M., Gonatopoulos-Pournatzis, T., Ip, J. Y., Wu, M., . . . Cordes, S. P. (2016). Misregulation of an Activity-Dependent Splicing Network as a Common Mechanism Underlying Autism Spectrum Disorders. Mol Cell, 64(6), 1023-1034. doi:10.1016/j.molcel.2016.11.033
Quesnel-Vallieres, M., Irimia, M., Cordes, S. P., & Blencowe, B. J. (2015). Essential roles for the splicing regulator nSR100/SRRM4 during nervous system development. Genes Dev, 29(7), 746-759. doi:10.1101/gad.256115.114
Raj, B., O'Hanlon, D., Vessey, J. P., Pan, Q., Ray, D., Buckley, N. J., . . . Blencowe, B. J. (2011). Cross-regulation between an alternative splicing activator and a transcription repressor controls neurogenesis. Mol Cell, 43(5), 843-850. doi:10.1016/j.molcel.2011.08.014
Rappsilber, J., Ryder, U., Lamond, A. I., & Mann, M. (2002). Large-scale proteomic analysis of the human spliceosome. Genome Res, 12(8), 1231-1245. doi:10.1101/gr.473902
Regier, D. A., Narrow, W. E., Rae, D. S., Manderscheid, R. W., Locke, B. Z., & Goodwin, F. K. (1993). The de facto US mental and addictive disorders service system. Epidemiologic catchment area prospective 1-year prevalence rates of disorders and services. Arch Gen Psychiatry, 50(2), 85-94.
Revil, T., Toutant, J., Shkreta, L., Garneau, D., Cloutier, P., & Chabot, B. (2007). Protein kinase C-dependent control of Bcl-x alternative splicing. Mol Cell Biol, 27(24), 8431-8441. doi:10.1128/MCB.00565-07
Ribolsi, M., Daskalakis, Z. J., Siracusano, A., & Koch, G. (2014). Abnormal asymmetry of brain connectivity in schizophrenia. Front Hum Neurosci, 8, 1010. doi:10.3389/fnhum.2014.01010
Ripke, S., O'Dushlaine, C., Chambert, K., Moran, J. L., Kahler, A. K., Akterin, S., . . . Sullivan, P. F. (2013). Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat Genet, 45(10), 1150-1159. doi:10.1038/ng.2742
Rivas, M. A., Pirinen, M., Conrad, D. F., Lek, M., Tsang, E. K., Karczewski, K. J., . . . MacArthur, D. G. (2015). Effect of predicted protein-truncating genetic variants on the human transcriptome. Science, 348(6235), 666-669. doi:10.1126/science.1261877
Robicsek, O., Karry, R., Petit, I., Salman-Kesner, N., Muller, F. J., Klein, E., . . . Ben-Shachar, D. (2013). Abnormal neuronal differentiation and mitochondrial dysfunction in hair follicle-derived induced pluripotent stem cells of schizophrenia patients. Mol Psychiatry, 18(10), 1067-1076. doi:10.1038/mp.2013.67
Rodriguez, V. M., Limon-Pacheco, J. H., Carrizales, L., Mendoza-Trejo, M. S., & Giordano, M. (2010). Chronic exposure to low levels of inorganic arsenic causes alterations in locomotor activity and in the expression of dopaminergic and antioxidant systems in the albino rat. Neurotoxicol Teratol, 32(6), 640-647. doi:10.1016/j.ntt.2010.07.005
Rodriguez-Martin, T., Anthony, K., Garcia-Blanco, M. A., Mansfield, S. G., Anderton, B. H., & Gallo, J. M. (2009). Correction of tau mis-splicing caused by FTDP-17 MAPT mutations by spliceosome-mediated RNA trans-splicing. Hum Mol Genet, 18(17), 3266-3273. doi:10.1093/hmg/ddp264
Rosado, J. L., Ronquillo, D., Kordas, K., Rojas, O., Alatorre, J., Lopez, P., . . . Stoltzfus, R. J. (2007). Arsenic exposure and cognitive performance in Mexican schoolchildren. Environ Health Perspect, 115(9), 1371-1375. doi:10.1289/ehp.9961
Ross, J., Gedvilaite, E., Badner, J. A., Erdman, C., Baird, L., Matsunami, N., . . . Byerley, W. (2016). A Rare Variant in CACNA1D Segregates with 7 Bipolar I Disorder Cases in a Large Pedigree. Mol Neuropsychiatry, 2(3), 145-150. doi:10.1159/000448041
Saha, S., Chant, D., Welham, J., & McGrath, J. (2005). A systematic review of the prevalence of schizophrenia. PLoS Med, 2(5), e141. doi:10.1371/journal.pmed.0020141
Sandoval, H., Thiagarajan, P., Dasgupta, S. K., Schumacher, A., Prchal, J. T., Chen, M., & Wang, J. (2008). Essential role for Nix in autophagic maturation of erythroid cells. Nature, 454(7201), 232-235. doi:10.1038/nature07006
Sawamoto, K., Wichterle, H., Gonzalez-Perez, O., Cholfin, J. A., Yamada, M., Spassky, N., . . . Alvarez-Buylla, A. (2006). New neurons follow the flow of cerebrospinal fluid in the adult brain. Science, 311(5761), 629-632. doi:10.1126/science.1119133
Schaffer, A., Isometsa, E. T., Tondo, L., D, H. M., Turecki, G., Reis, C., . . . Yatham, L. N. (2015). International Society for Bipolar Disorders Task Force on Suicide: meta-analyses and meta-regression of correlates of suicide attempts and suicide deaths in bipolar disorder. Bipolar Disord, 17(1), 1-16. doi:10.1111/bdi.12271
Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium. (2011). Genome-wide association study identifies five new schizophrenia loci. Nat Genet, 43(10), 969-976.
Schizophrenia Working Group of the Psychiatric Genomics Consortium. (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511(7510), 421-427. doi:10.1038/nature13595
Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T., . . . Wigler, M. (2007). Strong association of de novo copy number mutations with autism. Science, 316(5823), 445-449. doi:10.1126/science.1138659
Shamah, S. M., Lin, M. Z., Goldberg, J. L., Estrach, S., Sahin, M., Hu, L., . . . Greenberg, M. E. (2001). EphA receptors regulate growth cone dynamics through the novel guanine nucleotide exchange factor ephexin. Cell, 105(2), 233-244.
Shen, M., & Mattox, W. (2012). Activation and repression functions of an SR splicing regulator depend on exonic versus intronic-binding position. Nucleic Acids Res, 40(1), 428-437. doi:10.1093/nar/gkr713
Shopik, M. J., Li, L., Luu, H. A., Obeidat, M., Holmes, C. F., & Ballermann, B. J. (2013). Multi-directional function of the protein phosphatase 1 regulatory subunit TIMAP. Biochem Biophys Res Commun, 435(4), 567-573. doi:10.1016/j.bbrc.2013.05.012
Spellman, R., Rideau, A., Matlin, A., Gooding, C., Robinson, F., McGlincy, N., . . . Smith, C. W. (2005). Regulation of alternative splicing by PTB and associated factors. Biochem Soc Trans, 33(Pt 3), 457-460. doi:10.1042/BST0330457
Stamova, B. S., Tian, Y., Nordahl, C. W., Shen, M. D., Rogers, S., Amaral, D. G., & Sharp, F. R. (2013). Evidence for differential alternative splicing in blood of young boys with autism spectrum disorders. Mol Autism, 4(1), 30. doi:10.1186/2040-2392-4-30
Sterne-Weiler, T., Howard, J., Mort, M., Cooper, D. N., & Sanford, J. R. (2011). Loss of exon identity is a common mechanism of human inherited disease. Genome Res, 21(10), 1563-1571. doi:10.1101/gr.118638.110
Stratigopoulos, G., LeDuc, C. A., Cremona, M. L., Chung, W. K., & Leibel, R. L. (2011). Cut-like homeobox 1 (CUX1) regulates expression of the fat mass and obesity-associated and retinitis pigmentosa GTPase regulator-interacting protein-1-like (RPGRIP1L) genes and coordinates leptin receptor signaling. J Biol Chem, 286(3), 2155-2170. doi:10.1074/jbc.M110.188482
Stratigopoulos, G., Padilla, S. L., LeDuc, C. A., Watson, E., Hattersley, A. T., McCarthy, M. I., . . . Leibel, R. L. (2008). Regulation of Fto/Ftm gene expression in mice and humans. Am J Physiol Regul Integr Comp Physiol, 294(4), R1185-1196. doi:10.1152/ajpregu.00839.2007
Sullivan, P. F., Daly, M. J., & O'Donovan, M. (2012). Genetic architectures of psychiatric disorders: the emerging picture and its implications. Nat Rev Genet, 13(8), 537-551. doi:10.1038/nrg3240
Sullivan, P. F., Kendler, K. S., & Neale, M. C. (2003). Schizophrenia as a complex trait: evidence from a meta-analysis of twin studies. Arch Gen Psychiatry, 60(12), 1187-1192. doi:10.1001/archpsyc.60.12.1187
Sun, X., Yasuda, O., Takemura, Y., Kawamoto, H., Higuchi, M., Baba, Y., . . . Rakugi, H. (2008). Akt activation prevents Apop-1-induced death of cells. Biochem Biophys Res Commun, 377(4), 1097-1101. doi:10.1016/j.bbrc.2008.10.109
Szyniarowski, P., Corcelle-Termeau, E., Farkas, T., Hoyer-Hansen, M., Nylandsted, J., Kallunki, T., & Jaattela, M. (2011). A comprehensive siRNA screen for kinases that suppress macroautophagy in optimal growth conditions. Autophagy, 7(8), 892-903.
Takata, A., Matsumoto, N., & Kato, T. (2017). Genome-wide identification of splicing QTLs in the human brain and their enrichment among schizophrenia-associated loci. Nat Commun, 8, 14519. doi:10.1038/ncomms14519
Tansey, K. E., Rees, E., Linden, D. E., Ripke, S., Chambert, K. D., Moran, J. L., . . . O'Donovan, M. C. (2016). Common alleles contribute to schizophrenia in CNV carriers. Mol Psychiatry, 21(8), 1085-1089. doi:10.1038/mp.2015.143
Teraoka, S. N., Telatar, M., Becker-Catania, S., Liang, T., Onengut, S., Tolun, A., . . . Concannon, P. (1999). Splicing defects in the ataxia-telangiectasia gene, ATM: underlying mutations and consequences. Am J Hum Genet, 64(6), 1617-1631. doi:10.1086/302418
Thorleifsson, G., Walters, G. B., Gudbjartsson, D. F., Steinthorsdottir, V., Sulem, P., Helgadottir, A., . . . Stefansson, K. (2009). Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat Genet, 41(1), 18-24. doi:10.1038/ng.274
Tran, N. N., Ladran, I. G., & Brennand, K. J. (2013). Modeling schizophrenia using induced pluripotent stem cell-derived and fibroblast-induced neurons. Schizophr Bull, 39(1), 4-10. doi:10.1093/schbul/sbs127
Uezato, A., Yamamoto, N., Iwayama, Y., Hiraoka, S., Hiraaki, E., Umino, A., . . . Nishikawa, T. (2015). Reduced cortical expression of a newly identified splicing variant of the DLG1 gene in patients with early-onset schizophrenia. Transl Psychiatry, 5, e654. doi:10.1038/tp.2015.154
Ule, J., Stefani, G., Mele, A., Ruggiu, M., Wang, X., Taneri, B., . . . Darnell, R. B. (2006). An RNA map predicting Nova-dependent splicing regulation. Nature, 444(7119), 580-586. doi:10.1038/nature05304
Ule, J., Ule, A., Spencer, J., Williams, A., Hu, J. S., Cline, M., . . . Darnell, R. B. (2005). Nova regulates brain-specific splicing to shape the synapse. Nat Genet, 37(8), 844-852. doi:10.1038/ng1610
Uranova, N., Orlovskaya, D., Vikhreva, O., Zimina, I., Kolomeets, N., Vostrikov, V., & Rachmanova, V. (2001). Electron microscopy of oligodendroglia in severe mental illness. Brain Res Bull, 55(5), 597-610.
Vierkotten, J., Dildrop, R., Peters, T., Wang, B., & Ruther, U. (2007). Ftm is a novel basal body protein of cilia involved in Shh signalling. Development, 134(14), 2569-2577. doi:10.1242/dev.003715
Vieta, E., Locklear, J., Gunther, O., Ekman, M., Miltenburger, C., Chatterton, M. L., . . . Paulsson, B. (2010). Treatment options for bipolar depression: a systematic review of randomized, controlled trials. J Clin Psychopharmacol, 30(5), 579-590. doi:10.1097/JCP.0b013e3181f15849
Voineagu, I., Wang, X., Johnston, P., Lowe, J. K., Tian, Y., Horvath, S., . . . Geschwind, D. H. (2011). Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature, 474(7351), 380-384. doi:10.1038/nature10110
Wang, E. T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., . . . Burge, C. B. (2008). Alternative isoform regulation in human tissue transcriptomes. Nature, 456(7221), 470-476. doi:10.1038/nature07509
Wang, Y., Liu, J., Huang, B. O., Xu, Y. M., Li, J., Huang, L. F., . . . Wang, X. Z. (2015). Mechanism of alternative splicing and its regulation. Biomed Rep, 3(2), 152-158. doi:10.3892/br.2014.407
Wang, Z., & Burge, C. B. (2008). Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA, 14(5), 802-813. doi:10.1261/rna.876308
Warf, M. B., Diegel, J. V., von Hippel, P. H., & Berglund, J. A. (2009). The protein factors MBNL1 and U2AF65 bind alternative RNA structures to regulate splicing. Proc Natl Acad Sci U S A, 106(23), 9203-9208. doi:10.1073/pnas.0900342106
Wasserman, G. A., Liu, X., Loiacono, N. J., Kline, J., Factor-Litvak, P., van Geen, A., . . . Graziano, J. H. (2014). A cross-sectional study of well water arsenic and child IQ in Maine schoolchildren. Environ Health, 13(1), 23. doi:10.1186/1476-069X-13-23
Weyn-Vanhentenryck, S. M., Mele, A., Yan, Q., Sun, S., Farny, N., Zhang, Z., . . . Zhang, C. (2014). HITS-CLIP and integrative modeling define the Rbfox splicing-regulatory network linked to brain development and autism. Cell Rep, 6(6), 1139-1152. doi:10.1016/j.celrep.2014.02.005
Wright, J. (2014). Genetics: Unravelling complexity. Nature, 508(7494), S6-7. doi:10.1038/508S6a
Wu, J. Q., Wang, X., Beveridge, N. J., Tooney, P. A., Scott, R. J., Carr, V. J., & Cairns, M. J. (2012). Transcriptome sequencing revealed significant alteration of cortical promoter usage and splicing in schizophrenia. PLoS One, 7(4), e36351. doi:10.1371/journal.pone.0036351
Xiong, H. Y., Alipanahi, B., Lee, L. J., Bretschneider, H., Merico, D., Yuen, R. K., . . . Frey, B. J. (2015). RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science, 347(6218), 1254806. doi:10.1126/science.1254806
Xu, B., Roos, J. L., Levy, S., van Rensburg, E. J., Gogos, J. A., & Karayiorgou, M. (2008). Strong association of de novo copy number mutations with sporadic schizophrenia. Nat Genet, 40(7), 880-885. doi:10.1038/ng.162
Xu, Q., Canutescu, A. A., Wang, G., Shapovalov, M., Obradovic, Z., & Dunbrack, R. L., Jr. (2008). Statistical analysis of interface similarity in crystals of homologous proteins. J Mol Biol, 381(2), 487-507. doi:10.1016/j.jmb.2008.06.002
Xue, Y., Ouyang, K., Huang, J., Zhou, Y., Ouyang, H., Li, H., . . . Fu, X. D. (2013). Direct conversion of fibroblasts to neurons by reprogramming PTB-regulated microRNA circuits. Cell, 152(1-2), 82-96. doi:10.1016/j.cell.2012.11.045
Yachdav, G., Kloppmann, E., Kajan, L., Hecht, M., Goldberg, T., Hamp, T., . . . Rost, B. (2014). PredictProtein--an open resource for online prediction of protein structural and functional features. Nucleic Acids Res, 42(Web Server issue), W337-343. doi:10.1093/nar/gku366
Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., & Zhang, Y. (2015). The I-TASSER Suite: protein structure and function prediction. Nat Methods, 12(1), 7-8. doi:10.1038/nmeth.3213
Yano, M., Hayakawa-Yano, Y., Mele, A., & Darnell, R. B. (2010). Nova2 regulates neuronal migration through an RNA switch in disabled-1 signaling. Neuron, 66(6), 848-858. doi:10.1016/j.neuron.2010.05.007
Yasuda, O., Fukuo, K., Sun, X., Nishitani, M., Yotsui, T., Higuchi, M., . . . Ogihara, T. (2006). Apop-1, a novel protein inducing cyclophilin D-dependent but Bax/Bak-related channel-independent apoptosis. J Biol Chem, 281(33), 23899-23907. doi:10.1074/jbc.M512610200
Yeo, G., Holste, D., Kreiman, G., & Burge, C. B. (2004). Variation in alternative splicing across human tissues. Genome Biol, 5(10), R74. doi:10.1186/gb-2004-5-10-r74
Yussman, M. G., Toyokawa, T., Odley, A., Lynch, R. A., Wu, G., Colbert, M. C., . . . Dorn, G. W., 2nd. (2002). Mitochondrial death protein Nix is induced in cardiac hypertrophy and triggers apoptotic cardiomyopathy. Nat Med, 8(7), 725-730. doi:10.1038/nm719
Zhang, J., & Ney, P. A. (2009). Role of BNIP3 and NIX in cell death, autophagy, and mitophagy. Cell Death Differ, 16(7), 939-946. doi:10.1038/cdd.2009.16
Zhang, Y. (2009). I-TASSER: fully automated protein structure prediction in CASP8. Proteins, 77 Suppl 9, 100-113. doi:10.1002/prot.22588
Zhou, Z., Licklider, L. J., Gygi, S. P., & Reed, R. (2002). Comprehensive proteomic analysis of the human spliceosome. Nature, 419(6903), 182-185. doi:10.1038/nature01031
Zhu, K., Chen, X., Liu, J., Ye, H., Zhu, L., & Wu, J. Y. (2013). AMPK interacts with DSCAM and plays an important role in netrin-1 induced neurite outgrowth. Protein Cell, 4(2), 155-161. doi:10.1007/s13238-012-2126-2