DISC1 in Psychiatric Illness

Rebecca Debono

Department of Pathology

University of Otago

A thesis submitted for the degree of

Doctor of Philosophy

2014

Abstract

Major psychiatric illnesses including schizophrenia and bipolar disorder can be highly debilitating and collectively have a lifetime incidence of approximately 1% in the general population. Twin and adoption studies have shown that there is a definite hereditary element to these disorders, estimated to be as high as 80% in some cases. Although there has been extensive research into these disorders from both pathological and genetic viewpoints, they are still poorly understood, and treatment of these disorders remains challenging.

The Disrupted-in-Schizophrenia 1 (DISC1) gene on human chromosome 1 was identified as a consequence of its involvement in a balanced translocation (1;11) (q42.1;q14.3) segregating with major psychiatric illnesses in a single Scottish family. Since its discovery, the DISC1 gene has been the subject of numerous genetic and biological studies. However, due to original evidence showing that the chromosome 11 region disrupted by this translocation was largely devoid of genes, the 11q14.3 region has not gained as much attention. Only recently has it been suggested that this region may also be of interest, with a possible ability to create fusions with DISC1.

The DISC1 and 11q14.3 regions were examined by analysis of genome-wide scan data made available by the Genetic Association Information Network (GAIN) and other investigators (non-GAIN) through dbGaP for association with bipolar disorder, schizophrenia and a combination of the two disorders. No single-nucleotide polymorphisms (SNPs) within the DISC1 region were found to be significant (p-value < 0.05) after correction for multiple testing in any of the datasets tested. There was evidence, however, supporting an association with schizophrenia for a group of SNPs in the 11q14.3 region in males, with corrected p-values reaching 0.024.

The molecular function of DISC1 was also assessed via a yeast two-hybrid screen using DISC1 as a bait. This analysis revealed 386 potential interacting partners of DISC1, of which only 13 had been reported previously. Gene set enrichment analysis of these data, combined with the published DISC1 interaction data and a further set of unpublished interactants, led to an hypothesis of DISC1 having an additional function in the construction and maintenance of the primary cilium. Analysis of genetic interaction between DISC1 and a selection of SNPs in the genes that led to this hypothesis, by logistic regression, showed some evidence for interaction between DISC1 and TRAF3IP1, SYNE1 and FEZ2 at a genetic level (uncorrected significance only).

The role of DISC1 in major psychiatric illness is not conclusive, with the most convincing evidence for genetic involvement to date coming from the translocation in the original Scottish family. DISC1 has suggested roles in brain development and there have been suggestive associations through linkage and association analyses for a role in psychiatric illness; however, the role of DISC1 as a strong candidate contributing to major psychiatric illness needs to be reconsidered as, after extensive analyses, the evidence remains anecdotal.

Acknowledgements

My greatest thanks go to my supervisors, Dr David Markie and Associate Professor Tony Merriman, who have been unwavering in their support and have always had an open door to my questions and frustrations. Without your expertise and encouragement I may not have made it to the light at the end of the tunnel! Thank you for the opportunity to work with you during this project and for all the wisdom you have passed on.

To Dr Alison Fitches for her supervision and input to my work in the earlier days and to Professor Stephen Robertson who took over this role toward the end. You both added insight to my project from a slightly more removed perspective that was always fresh and very welcome.

Much appreciation goes to the funding bodies that have contributed to this work. To the University of Otago for providing me with a postgraduate scholarship and to the HS and JC Anderson Charitable Trust and Maurice and Phyllis Paykel Trust for grants that aided in the funding of this research project.

I would like to thank both the Pathology and Biochemistry Departments as well as the Genetics Teaching Program for their support during my years as a PhD student. Thanks to those who I have been lucky enough to share office space with, and to the Merriman lab group who have accepted me as one of their own even though I was so far away. A huge thank you to Ruth and Murray for their help with SNPmax and to Associate Professor Mik Black and Les McNoe for their great knowledge of statistics and all things genotyping. To Tanya, as promised: you are awesome! Who knows what statistical analysis I would still be trying to conduct if it weren’t for you! To Dr Zandra Jenkins for her wise words, helpful ideas and expert tips in the lab and to Lynne and Andrea who always had the answers I needed.

To my friends who have scattered far and wide in the time it has taken me to complete this thesis, thank you for making my life outside of work so much fun and knowing when not to ask if I was done yet. A special thank you to Megan: we were in this together right from the beginning and although you reached the end before me, you were still supportive and understanding for my final push to the finish.

To my parents, all four of them, thank you for your love and encouragement. Knowing that you all believed in me kept me going through the tough bits! Special mention must go to my Mum, who read every word of this thesis with her red pen in hand - I couldn’t have done this without you.

Finally I want to thank Thomas, who I’m sure feels like he has actually completed this thesis too - we made it! You were a great sounding board and provided me with many useful suggestions along the way, but mainly I thank you for your unwavering support, love and calming influence the whole way through.

Regulatory and Data Accession Approvals

Genetic modifications described in this thesis were conducted under approvals granted by the Environmental Protection Authority (EPA) (formerly Environmental Risk Management Authority (ERMA)).

The yeast two-hybrid experimental work was carried out in containment under approval code GMD000085. The construction and expression of fusion proteins was carried out in containment under application code GMD04095. Work carried out using Escherichia coli as a host was conducted under approval code GMD003361 and work using Homo sapiens cell lines was conducted under approval code GMD003362. No animals, animal tissues or animal products were used in this research.

This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113 and 085475 (Wellcome Trust Case-Control Consortium, 2007).

Funding support for the Genome-Wide Association Study of Schizophrenia and Molecular Genetics of Schizophrenia (MGS) was provided by Genomics Research Branch at NIMH (see below) and the genotyping and analysis of samples was provided through the Genetic Association Information Network and under the MGS U01s: MH79469 and MH79470. Assistance with data cleaning was provided by the National Center for Biotechnology Information.

The MGS dataset(s) used for the analyses described in this thesis were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/dbgap through dbGaP accession numbers phs000021.v3.p2 (GAIN Schizophrenia) and phs000167.v1.p1 (non-GAIN Schizophrenia). Samples and associated phenotype data for the MGS GWAS study were collected under the following grants: NIMH Schizophrenia Genetics Initiative U01s: MH46276 (CR Cloninger), MH46289 (C Kaufmann), and MH46318 (MT Tsuang); and MGS Part 1 (MGS1) and Part 2 (MGS2) R01s: MH67257 (NG Buccola), MH59588 (BJ Mowry), MH59571 (PV Gejman), MH59587 (F Amin), MH59565 (Robert Freedman), MH60870 (WF Byerley), MH59566 (DW Black), MH59586 (JM Silverman), MH61675 (DF Levinson), and MH60879 (CR Cloninger).

Funding support for the Whole Genome Association Study of Bipolar Disorder was provided by the National Institute of Mental Health (NIMH) and the genotyping of samples was provided through the Genetic Association Information Network (GAIN). The datasets used for the analyses described in this thesis were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through accession number phs000017.v3.p1. Samples and associated phenotype data for the Collaborative Genomic Study of Bipolar Disorder were provided by the NIMH Genetics Initiative for Bipolar Disorder. Data and biomaterials were collected in four projects that participated in the NIMH Bipolar Disorder Genetics Initiative (1991-98) under the following grants: U01s MH46282 (J Nurnberger, M Miller and E Bowman), MH46280 (T Reich, A Goate and J Rice), MH46274 (JR DePaulo, Jr., S Simpson and C Stine) and NIMH Intramural Research Program, Clinical Neurogenetics Branch (E Gershon, D Kazuba and E Maxwell). Data and biomaterials were collected as part of ten projects that participated in the NIMH Bipolar Disorder Genetics Initiative (1999-2003) under the following grants: R01s MH59545 (J Nurnberger, MJ Miller, ES Bowman, NL Rau, PR Moe, N Samavedy, R El-Mallakh, H Manji, DA Glitz, ET Meyer, C Smiley, T Foroud, L Flury, DM Dick, and H Edenberg), MH059534 (J Rice, T Reich, A Goate and L Bierut), MH59533 (M McInnis, JR DePaulo, Jr., DF MacKinnon, FM Mondimore, JB Potash, PP Zandi, D Avramopoulos, and J Payne), MH59553 (W Berrettini), MH60068 (W Byerley and M Vawter), MH059548 (W Coryell and R Crowe), MH59535 (E Gershon, J Badner, F McMahon, C Liu, A Sanders, M Caserta, S Dinwiddie, T Nguyen and D Harakal), MH59567 (J Kelsoe and R McKinney), MH059556 (W Scheftner, HM Kravitz, D Marta, A Vaughn-Brown and L Bederow) and NIMH Intramural Research Program 1Z01MH002810-01 (FJ McMahon, L Kassem, S Detera-Wadleigh, L Austin and DL Murphy).

Publications

Papers:

Debono, R., Topless, R., Markie, D., Black, M.A., and Merriman, T.R. (2012). Analysis of the DISC1 translocation partner (11q14.3) in genetic risk of schizophrenia. Genes Brain Behav, 11, 859-63.

Conference Contributions:

Poster Presentation (not in published proceedings)

Debono, R., Topless, R., Markie, D., Black, M. A., and Merriman, T. R. (2012, August). Assessment of genetic association of the DISC1 translocation partner 11q14.3 with schizophrenia. Poster session presented at the Queenstown Molecular Biology (QMB) Meetings, Queenstown, New Zealand.

Oral Presentation (not in published proceedings)

Debono, Rebecca. (2010, November). DISC1 as a Candidate in Major Psychiatric Illness. Postgraduate Speaker presentation at the Genetics Otago Symposium, Dunedin, New Zealand.

Contents

Abstract iii

Acknowledgements v

Approvals vii

Publications ix

List of Figures xix

List of Tables xxiii

List of Abbreviations xxvii

1 Background and Introduction 1
1.1 Major Psychiatric Illness 1
1.1.1 Schizophrenia 1
1.1.2 Bipolar Disorder 3
1.1.3 Recurrent Major Depression 4
1.1.4 Summary 5
1.2 Identification of Causal Genetic Variants 7
1.2.1 Linkage Analysis 7
1.2.2 Association Analysis 9
1.2.3 Sequencing Analysis 11
1.2.4 Analysis of Genetic Interaction 13
1.2.5 Considerations of Analyses 15
1.2.6 Comparison of Linkage and Association Analyses 21
1.2.7 Using Linkage and Association Studies in Psychiatric Illness 21
1.2.8 Functional Analysis 24
1.2.9 Summary 28
1.3 Disrupted in Schizophrenia 1 (DISC1) 29
1.3.1 The Discovery of DISC1 29
1.3.2 The Biology of DISC1 30
1.3.3 DISC1 in Psychiatric Illness 35
1.3.4 Summary 38
1.4 Current Research Area 39
1.4.1 Background 39
1.4.2 Micro-RNA 40
1.4.3 Objectives of this Research 42
1.4.4 Conclusion 43

2 Materials and Methods 45
2.1 In Silico Analysis Tools 45
2.1.1 miRNA Binding Site Prediction 45
2.1.2 Enrichment Analysis 45
2.1.3 Network Analysis 46
2.2 Data Sets 46
2.2.1 GAIN 46
2.2.2 non-GAIN 47
2.2.3 Wellcome Trust Case Control Consortium (WTCCC) 47
2.2.4 Access to Datasets 48
2.3 Association Analysis of Data 48
2.3.1 Quality Control of Data 48
2.3.2 Population Substructure 49
2.3.3 BC|SNPmax 49
2.3.4 Corrections for Multiple Testing 51
2.3.5 Power Calculations 51
2.3.6 Genotyping Integrity 52
2.4 Meta-Analysis of SNPs Reported in Literature 52
2.4.1 Haploview and Linkage Disequilibrium Analysis 52
2.4.2 STATA 52
2.5 Epistasis Analysis 53
2.5.1 Imputation 53
2.5.2 Statistical Analyses 53
2.6 Primer Design and Synthesis 54
2.7 Polymerase Chain Reaction (PCR) 54
2.7.1 Optimisation 54
2.7.2 Standard Reaction Conditions 55
2.7.3 Hot Start PCR 55
2.7.4 Long PCR 55
2.7.5 Touch Down PCR 56
2.8 Gel Electrophoresis 56
2.8.1 Agarose Gel Electrophoresis 56
2.8.2 Polyacrylamide Gel Electrophoresis 56
2.9 DNA Sequencing 57
2.9.1 Pre-Sequencing Template Purification 57
2.9.2 Fluorescent Cycle Sequencing 57
2.9.3 Re-Suspension and Running 57
2.10 Yeast Methods 58
2.10.1 PCR of Yeast Colonies 58
2.10.2 Lithium Acetate Transformation 58
2.10.3 Replica Plating 59
2.10.4 Storage of Yeast 60
2.11 Yeast Two-Hybrid Methods 60
2.11.1 Construction of Baits and Preys 60
2.11.2 Library Screen 61
2.11.3 Prey Identification 62
2.12 Bacteriological Methods 63
2.12.1 Competent Escherichia coli Preparation 63
2.12.2 Plasmid DNA Preparation (Miniprep Method) 63
2.12.3 Plasmid DNA Preparation (Midiprep Method) 64
2.12.4 Transformation of Competent Cells 64
2.13 Gateway® Vector System 64
2.13.1 PCR from cDNA 65
2.13.2 Entry Vector Cloning Reactions (BP Reactions) 66
2.13.3 Destination Vector Transfer (LR Reactions) 66
2.14 Mammalian Cell Culture 67
2.14.1 Cell Maintenance and Storage 67
2.14.2 Mammalian Cell Transfection 67
2.15 Methods 68
2.15.1 Protein Isolation from Mammalian Cell Cultures 68
2.15.2 Coomassie Blue Staining 69
2.15.3 GST Pulldown 69
2.15.4 Western Blotting 69
2.16 Functional Analyses 71
2.16.1 Localisation of Proteins 71
2.16.2 Fluorescent Microscopy 71
2.17 Materials 72
2.17.1 Yeast Media 72
2.17.2 Bacteriological Media 73
2.17.3 Mammalian Cell Culture Media 74
2.17.4 Yeast Strains 75
2.17.5 Bacterial Strains 76
2.17.6 Mammalian Cell Lines 76
2.17.7 Plasmids 77
2.17.8 Antibodies 80

3 Association of DISC1 and 11q14.3 with Psychiatric Illness 83
3.1 Introduction 83
3.1.1 DISC1 Associations in the Literature 83
3.1.2 Discovery of a Putative miRNA Interaction 86
3.1.3 The 11q14.3 Region 87
3.1.4 This Study 88
3.2 Results 90
3.2.1 In Silico Analysis of miR-575 and rs11122324 90
3.2.2 Assessment of Datasets 91
3.2.3 Association Analysis of rs11122324 93
3.2.4 Association Analysis of DISC1 96
3.2.5 Association Analysis of 11q14.3 104
3.2.6 Population Stratification 111
3.3 Discussion 112
3.3.1 GWAS Analysis of Datasets in the Literature 112
3.3.2 Exclusion of the WTCCC Dataset 113
3.3.3 Separate Analysis of the GAIN and nonGAIN Datasets 115
3.3.4 Meta-Analysis Assumptions and Problems 115
3.3.5 Validity of Splitting Analysis by Gender 115
3.3.6 Multiple Testing Correction 116
3.3.7 Conclusions on rs11122324 Association with Psychiatric Illness 117
3.3.8 Discussion of DISC1 Association with Psychiatric Illness 119
3.3.9 Discussion of 11q14.3 Association with Psychiatric Illness 121
3.3.10 Power 122
3.3.11 Psychiatric Genomics Consortium Results 124
3.3.12 Conclusions 125

4 Discovery of Proteins Interacting with DISC1 129
4.1 Introduction 129
4.1.1 DISC1 129
4.1.2 Yeast Two-Hybrid Analysis 130
4.1.3 Validation of Physical Interaction 135
4.1.4 This Study 137
4.2 Results 139
4.2.1 Library Screen 139
4.2.2 Analysis of Interacting Proteins 142
4.2.3 Preys chosen for Further Investigation 146
4.2.4 Cloning of Genes 148
4.2.5 Confirmation of Interaction in Human Cells 156
4.3 Discussion 164
4.3.1 Addressing the Limitations of the Yeast Two-Hybrid System 164
4.3.2 Inclusion of DISC1 Interactions in the Literature 171
4.3.3 Development of an Hypothesis 172
4.3.4 Assessment of Interaction in Human Cells 175
4.3.5 Conclusions 182

5 Interaction at a Genetic Level 183
5.1 Introduction 183
5.1.1 DISC1 Interactions in the Literature 183
5.1.2 Gene-Gene Epistatic Interaction 185
5.1.3 Aims of This Study 187
5.2 Results 188
5.2.1 Validation of Published Epistatic Relationships 188
5.2.2 Assessment of the Confirmed Interacting Gene Set 192
5.2.3 Gene x Gene Interactions 201
5.2.4 Assessment of Epistasis - Confirmed Gene Set 210
5.2.5 Assessment of Prey Subset 225
5.2.6 Main Effect Analysis for the Prey Subset 236
5.2.7 Assessment of Epistasis - Prey Subset Genes 241
5.3 Discussion 248
5.3.1 Validation of Published Epistatic Relationships 248
5.3.2 Interacting Gene Set: Inflation and Association 248
5.3.3 Epistatic Relationships with Confirmed Interactants 253
5.3.4 Epistatic Relationships with Prey Subset 256
5.3.5 Limitations 258
5.3.6 Conclusions 261

6 Summary and Future Directions 263
6.1 Summary and Discussion of Main Findings 263
6.1.1 The Discovery of DISC1 263
6.1.2 Association Analysis 267
6.1.3 Biological Analysis 271
6.1.4 Genetic Analysis of DISC1 Interaction Partners 273
6.2 Future Directions 276
6.2.1 Association Analysis 276
6.2.2 Molecular Function of DISC1 279
6.3 Concluding Comments 281

Appendices 283

Appendix A 285

Appendix B 287

Appendix C 289


Appendix D 291

Appendix E 293

References 295


List of Figures

1.1 Rare Effects Model 22
1.2 DISC1 Main Isoforms 31
1.3 Pedigree of the Scottish Family Carrying the DISC1 Translocation 36
1.4 miRNA Biogenesis 41

2.1 The Gateway® Cloning system 65
2.2 pDONR™201 Structure 78
2.3 pDEST™27 Structure 78
2.4 pDEST/TO/myc-His Structure 79
2.5 pDEST/TO/EYFP(mCherry)/myc-His Structure 80

3.1 Summary of Major Association Analysis Findings 85
3.2 Predicted DISC1-DISC1FP1 Fusion Constructs 88
3.3 In silico analyses of the predicted hybridisation of miR-575 to a site in the 3′UTR of the DISC1 ES isoform 90
3.4 Q-Q Plots of Genome Wide Data 91
3.5 Principal Component Analysis of DISC1 and 11q14.3 Regions 92
3.6 Principal Component Analysis in Combined GAIN and nonGAIN Schizophrenia Dataset 93
3.7 Manhattan Plots of Uncorrected Individual DISC1 Analyses 97
3.8 Manhattan Plots of Corrected Individual DISC1 Analyses 98
3.9 Manhattan Plots of the Combined GAIN and non-GAIN SCZ DISC1 Analyses 99
3.10 Manhattan Plots of the Combined GAIN and non-GAIN SCZ and GAIN BPD DISC1 Analyses 100
3.11 Haploview Image of the s11122331 SNP Cluster 101
3.12 Haploview Image with Published Fourth SNP 102
3.13 STATA Analysis of rs1538979 103
3.14 Manhattan Plots of the Uncorrected Individual Chromosome 11q14.3 Analyses 105
3.15 Manhattan Plots of the Corrected Individual Chromosome 11q14.3 Analyses 106
3.16 Manhattan Plots of the Combined GAIN and non-GAIN SCZ and BPD 11q14.3 Analyses 109
3.17 Manhattan Plots of the Combined GAIN and non-GAIN SCZ 11q14.3 Analyses 110
3.18 Haploview Image of Associated Chromosome 11 SNPs 111
3.19 DISC1FP Structure 122
3.20 Statistical Power Graph 123

4.1 Overlap in Published Yeast Two-Hybrid Analyses 130
4.2 Yeast 2-Hybrid Mechanism of Action 131
4.3 Gap Repair Cloning 134
4.4 Overlap of Published DISC1 Interacting Proteins and the WTAC Data 138
4.5 Baits for DISC1 139
4.6 PCR Amplification of Preys 140
4.7 Example of Sequence from Yeast Two-Hybrid Analysis 140
4.8 ToppGene Enrichment Analysis 143
4.9 ToppGene Enrichment Analysis Including Published Interactions 144
4.10 pDONR™201 Sequence Following BP Reaction 149
4.11 Entry Clone Sequence Trace 149
4.12 DISC1 Expression Construct 151
4.13 Prey Genes Expression Construct 152
4.14 Confirmation of pDEST/TO/myc-His-DISC1 Construct 153
4.15 Digests of pDEST/TO/myc-His-DISC1 154
4.16 Confirmation of pDEST™27-Prey Constructs 155
4.17 Localisation of DISC1 in Human Cells 156
4.18 Co-Localisation of DISC1 and Prey Proteins in Human Cells 157
4.19 Localisation of RAB11A in Human Cells 158
4.20 Co-Localisation of DISC1 and RAD51 in Human Cells 158
4.21 Transfection Efficiency of RAB11A and DISC1 in Human T-REx™-293 Cells 159
4.22 Expression of Prey-GST Fusion Proteins in Human Cells 160
4.23 Expression of DISC1 Fusion Proteins in Human Cells 162
4.24 Expression of DISC1 at Varying Levels of Transfection 162
4.25 GST Pulldown of DISC1 163
4.26 Overlap of All known DISC1 Yeast Two-Hybrid Interaction Datasets 166
4.27 Assessment of Library Bias 169

5.1 Linkage Disequilibrium between rs1754605 and rs821616 188
5.2 Possible Proxy SNPs for rs1391768 190
5.3 Q-Q Plots Of Published Gene Set in Individual Datasets 193
5.4 Q-Q Plots Of Published Gene Set in Combined SCZ Dataset 194
5.5 Q-Q Plots Of Published Gene Set in Combined SCZ and BPD Dataset 195
5.6 Q-Q Plots Of Complete Gene Set in Individual Datasets 197
5.7 Q-Q Plots Of Complete Gene Set in Combined SCZ Dataset 198
5.8 Q-Q Plots Of Complete Gene Set in Combined SCZ and BPD Dataset 199
5.9 Haploview Linkage Disequilibrium Patterns 203
5.10 Q-Q Plots Of the Selected Prey Gene Set in Individual Datasets 228
5.11 Haploview Plot of Significant SNPs in ATP6V1B2 230
5.12 Q-Q Plots of Selected Prey Gene Set in Combined SCZ Dataset 230
5.13 Haploview Plot of Significant SNPs in USO1 231
5.14 Q-Q Plots of Selected Prey Gene Set in Combined BPD and SCZ Dataset 232
5.15 Haploview Plots of Prey Subset SNPs 235
5.16 Q-Q Plots Of Psychiatric Genomics Consortium Data 250
5.17 TRAF3IP1-DISC1 Interacting Domain 256
5.18 Statistical Power Graphs for Epistasis Analysis 260

6.1 Summary of Thesis 264


List of Tables

1.1 Summary of DISC1 Linkage Findings 37

2.1 Gender Split Cohorts 51
2.2 STATA Input File 53

3.1 Association Analysis of rs11122324 94
3.2 Association Analysis of rs11122324 Split by Gender 94
3.3 Meta-Analyses of rs11122324 95
3.4 Meta-Analyses of rs11122324 Split by Gender 95
3.5 Meta-Analysis of Psychiatric Illness in Combined BPD and SCZ Dataset 101
3.6 SNPs Reaching Corrected Significance in the non-GAIN SCZ Dataset 107
3.7 Schizophrenia Meta-Analysis in Males 108
3.8 MGS DISC1 ES Isoform Results 113
3.9 Psychiatric Genomics Consortium Results 124

4.1 Results of DISC1 Yeast Two-Hybrid Screen 141
4.2 Summary of DISC1 Yeast Two-Hybrid Screens 141
4.3 Associated Network Functions from IPA Analysis 145
4.4 Preys Chosen for Confirmation 147
4.5 Summary of BP Clone Sequence Confirmation 150

5.1 Confirmed DISC1 Protein Interactions 184
5.2 χ² Analysis for DISC1xFEZ1 Interaction in non-GAIN SCZ with Grouped Genotypes 189
5.3 χ² Analysis for DISC1xFEZ1 Interaction in non-GAIN SCZ 189
5.4 χ² Analysis for DISC1xNDEL1 Interaction in GAIN and non-GAIN SCZ with Grouped Genotypes 191
5.5 χ² Analysis for DISC1xNDEL1 Interaction in GAIN and non-GAIN SCZ 191
5.6 Individual Dataset Q-Q Analysis Results of Published Gene Set 192
5.7 Combined SCZ Dataset Q-Q Analysis Results for Published Gene Set 194
5.8 Combined SCZ and BPD Dataset Q-Q Analysis Results for Published Gene Set 195
5.9 ToppGene Enrichment Analysis of Confirmed Published Gene Set 196
5.10 Individual Dataset Q-Q Analysis Results for Complete Gene Set 198
5.11 Combined SCZ Dataset Q-Q Analysis Results for Complete Gene Set 198
5.12 Combined SCZ and BPD Dataset Q-Q Analysis Results for Complete Gene Set 199
5.13 ToppGene Enrichment Analysis of Complete Gene Set 200
5.14 Summary Table of non-Synonymous Variants 202
5.15 Minor Allele Frequency Assessment 204
5.16 Association Analysis of Epistasis Candidate SNPs in GAIN SCZ 205
5.17 Association Analysis of Epistasis Candidate SNPs in non-GAIN SCZ 206
5.18 Association Analysis of Epistasis Candidate SNPs in Combined SCZ 207
5.19 Association Analysis of Epistasis Candidate SNPs in BPD 208
5.20 Association Analysis of Epistasis Candidate SNPs in Combined BPD and SCZ 209
5.21 Summary Of Regression Analyses in GAIN SCZ 211
5.22 χ² Analysis for DISC1 rs3738401 x SYNE1 rs214976 Interaction in GAIN SCZ 212
5.23 χ² Analysis for DISC1 rs3738401 x PCNT rs2073376 Interaction in GAIN SCZ 212
5.24 χ² Analysis for DISC1 rs821616 x TRAF3IP1 rs12464423 Interaction in GAIN SCZ 213
5.25 χ² Analysis for DISC1 rs821616 x MAP1A rs2245715 Interaction in GAIN SCZ 213
5.26 χ² Analysis for DISC1 rs821616 x SYNE1 rs214976 Interaction in GAIN SCZ 214
5.27 χ² Analysis for DISC1 rs821616 x SYNE1 rs2252755 Interaction in GAIN SCZ 214
5.28 Summary Of Regression Analyses in non-GAIN SCZ 215
5.29 χ² Analysis for DISC1 rs6675281 x PCNT rs6518291 Interaction in non-GAIN SCZ with Grouped Alleles 216
5.30 χ² Analysis for DISC1 rs6675281 x PCNT rs35940413 Interaction in non-GAIN SCZ with Grouped Alleles 216
5.31 Summary Of Regression Analyses in Combined SCZ 217
5.32 χ² Analysis for DISC1 rs3738401 x SYNE1 rs214976 Interaction in Combined SCZ 218
5.33 χ² Analysis for DISC1 rs821616 x UTRN rs1534443 Interaction in Combined SCZ 218
5.34 Summary Of Regression Analyses in GAIN BPD 219
5.35 χ² Analysis for DISC1 rs821616 x TRAF3IP1 rs12464423 Interaction in GAIN BPD 220
5.36 χ² Analysis for DISC1 rs3738401 x SYNE1 rs214976 Interaction in GAIN BPD 220
5.37 χ² Analysis for DISC1 rs3738401 x AKAP9 rs6960867 Interaction in GAIN BPD 221
5.38 Summary Of Regression Analyses in Combined BPD and SCZ 222
5.39 χ² Analysis for DISC1 rs3738401 x SYNE1 rs214976 Interaction in Combined BPD and SCZ 223
5.40 χ² Analysis for DISC1 rs3738401 x SYNE1 rs214950 Interaction in Combined BPD and SCZ 223
5.41 χ² Analysis for DISC1 rs6675281 x SYNE1 rs6911096 Interaction in Combined BPD and SCZ 224
5.42 Yeast Two-Hybrid Prey Subset 225
5.43 Individual Dataset Q-Q Analysis Results for Selected Prey Genes 229
5.44 Combined SCZ Dataset Q-Q Analysis Results for Selected Prey Genes 230
5.45 Combined BPD and SCZ Dataset Q-Q Analysis Results for Selected Prey Genes 232
5.46 ToppGene Enrichment Analysis of Prey Gene Set 233
5.47 Summary of Non-Synonymous SNPs in Prey Gene Subset 234
5.48 Minor Allele Frequencies of Chosen SNPs from the Prey Subset 235
5.49 Association Analysis of Epistasis Candidate SNPs from the Prey Subset in GAIN SCZ 236
5.50 Association Analysis of Epistasis Candidate SNPs from the Prey Subset in non-GAIN SCZ 237
5.51 Association Analysis of Epistasis Candidate SNPs from the Prey Subset in Combined SCZ 238
5.52 Association Analysis of Epistasis Candidate SNPs from the Prey Subset in BPD 239
5.53 Association Analysis of Epistasis Candidate SNPs from the Prey Subset in Combined BPD and SCZ 240
5.54 Summary Of Regression Analyses for Prey Subset SNPs in GAIN SCZ 241
5.55 χ² Analysis for DISC1 rs3738401 x IQUB rs10255061 Interaction in GAIN SCZ 242
5.56 χ² Analysis for DISC1 rs6675281 x GOLGA4 rs11718848 Interaction in GAIN SCZ 242
5.57 Summary Of Regression Analyses for Prey Subset SNPs in non-GAIN SCZ 243
5.58 χ² Analysis for DISC1 rs3738401 x CEP70 rs1673607 Interaction in non-GAIN SCZ 243
5.59 Summary Of Regression Analyses for Prey Subset SNPs in Combined SCZ 244
5.60 χ² Analysis for DISC1 rs3738401 x FEZ2 rs848642 Interaction in non-GAIN SCZ 244
5.61 Summary Of Regression Analyses for Prey Subset SNPs in GAIN BPD 245
5.62 χ² Analysis for DISC1 rs3738401 x FEZ2 rs848642 Interaction in GAIN BPD 246
5.63 χ² Analysis for DISC1 rs821616 x FEZ2 rs2287104 Interaction in GAIN BPD 246
5.64 Summary Of Regression Analyses for Prey Subset SNPs in Combined BPD and SCZ 247
5.65 χ² Analysis for DISC1 rs3738401 x FEZ2 rs848642 Interaction in Combined BPD and SCZ 247

List of Abbreviations

µL Microlitre
BLAST Basic Local Alignment Search Tool
BPD Bipolar Disorder
cDNA Complementary DNA
CI Confidence Interval
DAPI 4′,6-diamidino-2-phenylindole
df Degrees of Freedom
DISC Disrupted in Schizophrenia
DMEM Dulbecco’s Modified Eagle’s Medium
DMSO Dimethyl Sulfoxide
DNA Deoxyribonucleic Acid
FDR False Discovery Rate
g grams
GAIN Genetic Association Information Network
GO Gene Ontology
kb Kilobase
kDa Kilodaltons
LB Luria-Bertani
LD Linkage Disequilibrium
LOD Logarithm of Odds
MAF Minor Allele Frequency
mg milligrams
miRNA Micro RNA
mL Millilitre
mRNA Messenger RNA
ng nanograms
OR Odds Ratio
PBS Phosphate Buffered Saline
PCR Polymerase Chain Reaction
RNA Ribonucleic Acid
rpm Revolutions Per Minute
SCZ Schizophrenia
SD Synthetic Dextrose
SDS Sodium Dodecyl Sulfate
SIBS South Island Bipolar Study
SNP Single Nucleotide Polymorphism
SOB Super Optimal Broth
SOC SOB with Catabolite Repression
Taq Thermus aquaticus
TB Terrific Broth
UTR Untranslated Region
v/v Volume per Volume
w/v Weight per Volume
WTAC Wellcome Trust Advanced Course
WTCCC Wellcome Trust Case Control Consortium


Chapter 1

Background and Introduction

1.1 Major Psychiatric Illness

Major psychiatric illness covers a wide variety of mental disorders, including the three main disorders: schizophrenia, bipolar disorder and recurrent major depression. Approximately 10–20% of the general population are affected by one of these disorders at some point during their lives. These disorders can be debilitating if left untreated, and in some individuals treatment is not particularly effective; for these reasons, research into major psychiatric illness is currently of great interest. Continuing research will help to determine the underlying biology of the disorders and to identify more effective treatment options. The three aforementioned disorders are discussed in this section.

1.1.1 Schizophrenia

Schizophrenia can be an extremely debilitating mental illness and will affect approximately 1% of the population at some point during their lifetime, with onset occurring most commonly in the second or early third decade of life (Lewis & Levitt, 2002). Schizophrenia can present with both positive and negative symptoms, where positive symptoms are defined as the gain of a behaviour, such as delusions and/or hallucinations, and negative symptoms are defined as the loss of behaviours, such as social withdrawal and loss of fluency or production of speech (American Psychiatric Association, 2013).

1.1.1.1 Subtypes

According to the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-V), schizophrenia can fall into five main subtypes or a group of schizoaffective or schizophrenia-like disorders. The five main subtypes are:

• Paranoid schizophrenia - defined by the presence of prominent delusions and hallucinations, the latter of which are normally auditory. Onset is often later in life, with a better prognosis.
• Disorganised schizophrenia - defined by symptoms of disorganised speech and behaviour as well as flat or inappropriate affect. Often insidious and continuous without remission.
• Catatonic schizophrenia - presence of severe psychomotor impairment, presenting as either a loss or unnecessary excess of movement, is the defining feature.
• Undifferentiated schizophrenia - where symptoms meet the general schizophrenia symptom criteria but do not fall specifically into one of the three previous subtypes.
• Residual schizophrenia - at least one episode of schizophrenia has occurred but currently there are no symptoms.

1.1.1.2 Heritability of Schizophrenia

There is a genetic contribution to schizophrenia, with both adoption studies (Ingraham & Kety, 2000) and twin studies (Onstad et al., 1991) demonstrating that the risk of developing the disorder is increased when a related individual has been diagnosed with the disorder. Additionally, the closer the relationship to the afflicted patient, the higher the increase in risk. Like other psychiatric disorders, the aetiology of schizophrenia is complex, although it can be inherited in a non-Mendelian, polygenic fashion (Risch, 2000).

One meta-analysis of the disorder in twins estimates the heritability of schizophrenia at nearly 80% (Sullivan et al., 2003); however, the loci are difficult to pinpoint due to external environmental factors, as well as the small effect size of each gene and phenotyping issues. Other smaller studies give varying lower estimates of the heritability between individuals, depending on their relatedness, ranging from 2% in first cousins to 48% in monozygotic twins (Tsuang, 2000). It is further predicted that approximately 60% of cases will have no first or second degree relative also diagnosed with the disorder (Gottesman & Erlenmeyer-Kimling, 2001). These results, taken together, indicate that while there is evidence of genetic heritability, there is also a reasonably strong environmental influence in the development of the disorder.

1.1.2 Bipolar Disorder

According to the DSM-V, bipolar disorder is a major psychiatric illness that is characterised by alternating periods of depressive and manic moods, which are often interspersed with periods of normal mood. Symptoms of mania include: inflated self-esteem leading to unrealistic goals and ambitions, decreased need for sleep, distractibility, agitation, inability to concentrate on a single idea and over-involvement in pleasurable activities that may have harmful consequences. Symptoms of depression include: changes to appetite or weight, excessive need for sleep or insomnia, decreased energy, feelings of worthlessness or guilt, difficulty concentrating or making decisions and/or recurrent thoughts of death including plans or attempts. These characteristic symptoms are shared with schizophrenia and major depressive disorder but bipolar disorder is distinct from either of these due to its cyclic nature (Bauer et al., 2002).

Bipolar disorder can be debilitating if left untreated and also affects approximately 1% of the general population worldwide (Soutullo et al., 2005); however, the complete bipolar disorder spectrum of conditions is seen to affect 6.4–8.3% of individuals (Dorey et al., 2008). Patients can have a wide range of symptoms, aside from the characteristic manic depression, which include decreased visual memory and attention spans, cognitive impairments, and an apparent increased risk of chronic disease, such as heart disease (McIntyre et al., 2006).

1.1.2.1 Diagnostic Categories

Bipolar I Disorder: One or more manic or mixed episode, where a mixed episode is the rapid cycling between depression and mania, i.e. both occur in a single twenty-four hour period. Patients may also experience major depressive episodes.

Bipolar II Disorder: This is characterised by recurrent episodes of major depression plus at least one hypomanic episode.


Cyclothymic Disorder: Cyclothymic disorder is a less severe form of bipolar disorder characterised by chronic fluctuating mood disturbances between hypomania and depression (with no major episodes). The symptoms of both the mania and depression are of insufficient number, severity, pervasiveness or duration to be deemed major episodes.

Bipolar Disorder Not Otherwise Specified: As the name indicates, this category includes all bipolar disorder patients that do not exhibit the symptoms that define the other three categories. Symptoms may differ from the other three categories by frequency, duration or severity; for example, a patient may cycle too quickly between manic and depressive episodes to be deemed bipolar disorder I or II, or they might display manic episodes without the depressive periods.

1.1.2.2 Heritability of Bipolar Disorder

Bipolar disorder is thought to have both genetic influence and environmental risk factors, such as prior mental illness and significant adverse life events (Geller & Luby, 1997). The study of the genetics of bipolar disorder was initiated with the study of twins, and it was noted in a Finnish cohort that monozygotic twins had a concordance rate of 43% (dizygotic twins 6%) (Kieseppa et al., 2004). Additional studies have shown the genetic component associated with bipolar disorder to be even more significant, estimated to have 60–80% heritability (Bauer et al., 2002).

1.1.3 Recurrent Major Depression

This is more prevalent than both bipolar disorder and schizophrenia, with a lifetime prevalence of 5–12% in males and 10–25% in females. The percentage of individuals affected at any one time is estimated to be about 5–9% for females and 2–3% for males (American Psychiatric Association, 2013). The onset of this disorder can occur at any age (the highest risk period being the adolescent years) and may present as a number of major depressive episodes clustered together at one time or as isolated incidents over a number of years, ranging in duration from just a few weeks to many years (Lewinsohn et al., 1994). As the patient experiences an increase in the number of episodes there is a greater chance of episodes recurring, so the disorder is often described as worsening with age.

According to the DSM-V, the essential feature of this disorder is more than one major depressive episode in an individual that lacks any symptoms of manic or mixed episodes. A major depressive episode is diagnosed when there is a depressed mood or loss of interest in nearly all activities that lasts for at least two weeks. The DSM-V also excludes any cases that are induced by drugs or alcohol, as these are considered mood altering substances, as well as any cases that overlap with the schizophrenia spectrum of disorders.

1.1.3.1 Diagnostic Categories

Recurrent major depression can be broadly categorised into the following, depending on the severity:

• Mild - five or six depressive symptoms
• Moderate - an intermediate between mild and severe
• Severe - most of the symptoms of depression are present, either with or without psychotic features

Psychotic features are described as being either delusions or hallucinations, which in the case of this disorder generally have a depressive theme. The general symptoms of major depression used to classify the severity of the disorder include feelings of depression, alterations to normal sleep and eating patterns, loss of interest in activities previously enjoyed, social withdrawal, psychomotor changes, fatigue, decreased concentration, sense of worthlessness and often thoughts or attempts of suicide.

1.1.3.2 Heritability of Recurrent Major Depression

As with bipolar disorder and schizophrenia there is a certain level of heritability seen in this illness. One review of heritability presents findings that suggest the level is between 10% and 55% depending on the degree of relatedness and the severity of the disorder (Shih et al., 2004). Again the heritability pattern shows (especially in the findings from twin studies) that there must be genetic contributors to the development of this disorder but these do not provide the complete picture—environment is also important.

1.1.4 Summary

Major psychiatric illnesses such as those described above are a significant cause of morbidity and mortality. The aforementioned disorders are believed to have an inherited predisposition, in addition to environmental influences. A large number of environmental influences may have an effect, and these differ between individuals; however, some of the influences may include significant life events, such as the death of a family member or close friend, prior childhood mental illness, drug or substance abuse or a traumatic incident (Miklowitz & Chang, 2008). The underlying genetic contribution to the common forms of these diseases is complex. There is no single gene responsible; instead, genetic loading increases the risk via the accumulation of small additive genetic changes. As such, these disorders are polygenic, non-Mendelian illnesses with a broad range of phenotypes and variable levels of penetrance.

Though these are three distinct disorders, there is definite overlap in the symptomatology, which is possibly indicative of an overlap in the underlying genetic and environmental causes of the disorders. For this reason these major mental illnesses are often considered together when being assessed from a genetic viewpoint. Some progress has been made in identifying contributing genes (see Section 1.3.3.1) and although the genetic picture is far from complete, an understanding of the normal function of these genes and the mechanism of their contribution to major psychiatric disorders is likely to provide new opportunities for treatment strategies.


1.2 Identification of Causal Genetic Variants

As technological advances become available to the research community, detecting variants that contribute to disease becomes easier. Two of the most common ways to begin identifying these risk variants are linkage and association studies. Once a gene is identified as a candidate for a particular disorder, its function must be assessed to determine its viability as a credible contributor. Care must be taken when determining whether a variant is causal or whether it is reporting on a second variant that is in linkage disequilibrium (LD) with it. This section will discuss linkage, association and sequencing analyses and their application in the identification of genetic contributors to major psychiatric illness.

1.2.1 Linkage Analysis

Linkage analysis provides information about regions of the genome that are more likely to be segregating with a particular disease. The basis of linkage analysis is genetic linkage, a term that describes the phenomenon of one genetic locus being inherited in combination with another, due to their relatively close proximity on a chromosome. Linkage analysis has been used in the advancement of understanding a number of now well characterised genetic disorders including cystic fibrosis (Tsui et al., 1985, 1986) and Huntington’s disease (Gusella et al., 1983).

Underlying linkage analysis is the linkage (or genetic) map, which was first described by Alfred Sturtevant in 1913; this map was an accumulation of recombination frequencies of sex-linked traits in Drosophila (Sturtevant, 1913). In 1911 Thomas Morgan first suggested that genes carried on the same chromosome were linked due to their close proximity and so the unit of distance between genes was deemed the Morgan or centiMorgan (Haldane, 1919).

Linkage analysis calculates the odds of an unknown disease causing variant being in close proximity to a set of genetic markers for which the exact chromosomal location is known. This is achieved by measuring the amount of recombination that has occurred between genetic markers and the unknown disease locus. Probability dictates that the less recombination occurring between two loci the closer in proximity they are on the chromosome and conversely, the further apart they are the more likely it is for recombination to have occurred.

In order to assess recombination, multiple generations of an affected family are required and there must be information on the phenotype and genotypes at each marker for each individual. If, throughout a pedigree, a particular combination of disease phenotype and genetic marker are consistently inherited together, it can be concluded that the genetic marker in question is linked to and therefore in close proximity to the disease causing variant. With a large enough family the number of variants that segregate with the disease by chance alone can be reduced, and thus the location of the disease locus can be narrowed down. As stated previously, the closer two chromosomal loci are to one another the less likely it is that recombination will occur between them; however, unless the genetic marker and the disease causing locus are one and the same there is always some possibility that recombination between the two may occur.

A statistical method is required to assess the likelihood of a given recombination frequency occurring due to genetic linkage, instead of purely by chance. This is achieved through use of LOD scores. The calculation of Logarithm of Odds (LOD) scores is the statistical method used for measuring genetic linkage from human pedigree data and has been successful in the identification of disease genes causing Mendelian disorders for almost fifty-five years (Rice et al., 2001). A LOD score is calculated as the base-10 logarithm of the ratio of the probability of obtaining a set of observations with a specified degree of linkage (known as the recombination fraction), to the probability of obtaining the same set of observations with normal independent assortment (Morton, 1955). This can be expressed as the following equation:

$$\mathrm{LOD} = Z = \log_{10} \frac{(1-\theta)^{S} \times \theta^{R}}{0.5^{(S+R)}}$$

Where S is the number of events showing no recombination, R the number showing recombination and θ the recombination fraction.
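The LOD calculation can be illustrated with a minimal Python sketch. It assumes the simplest phase-known case, in which every meiosis can be scored directly as recombinant or non-recombinant; the function names, the grid search over the recombination fraction (called `theta` in the code) and the example counts are illustrative assumptions and are not taken from the analyses in this thesis.

```python
import math


def lod_score(non_recombinants: int, recombinants: int, theta: float) -> float:
    """LOD score for S non-recombinant and R recombinant meioses at a given theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        # Complete linkage is impossible once a recombinant has been observed.
        return float("-inf")
    log_linked = non_recombinants * math.log10(1.0 - theta)
    if recombinants > 0:
        log_linked += recombinants * math.log10(theta)
    # Under independent assortment every meiosis has probability 0.5.
    log_unlinked = (non_recombinants + recombinants) * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(non_recombinants: int, recombinants: int, steps: int = 500):
    """Grid search for the theta in [0, 0.5] that gives the highest LOD score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod_score(non_recombinants, recombinants, t))
    return best, lod_score(non_recombinants, recombinants, best)


if __name__ == "__main__":
    # Hypothetical pedigree data: 18 non-recombinant and 2 recombinant meioses.
    theta_hat, z = max_lod(18, 2)
    print(f"best theta = {theta_hat:.3f}, LOD = {z:.2f}")
```

Because the scores are logarithms, LOD scores evaluated at the same recombination fraction in independent pedigrees can simply be summed.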

The LOD score method is extremely robust in that the scores are additive, and so LOD scores from different pedigrees with the same disease can be combined to increase power (Rice et al., 2001), although heterogeneity must be considered in these cases as this can confound results. The estimate with the highest LOD score is considered the best, and generally a LOD score of three or more is deemed to be evidence of significant linkage in a pedigree (Morton, 1955), although this interpretation is affected when multiple testing is undertaken. A LOD score of 3.3 is now generally considered as evidence of linkage when genome-wide analysis is carried out (Lander & Schork, 1994).

There are two different types of linkage analysis, parametric and non-parametric. The former is used when the mode of inheritance is known and generates a LOD score by mathematical likelihood of the disease being linked to the marker; non-parametric linkage analysis is used when the mode of inheritance is unknown, which is often the case in complex diseases, and determines the extent of sharing of marker alleles among affected relatives (Hodge, 2001). Parametric linkage analysis should be undertaken wherever possible as this method provides the most information about the disease and can be used to evaluate collective unrelated family groups.

1.2.2 Association Analysis

Association analysis provides information about markers within the genome that may be associated with a particular disease phenotype. This type of analysis broadly consists of a comparison of a number of cases to matched controls at a large number of genotyped markers (often SNPs) to determine if there are any significant differences between the two groups, which could indicate an association to the disease of interest. There are a number of different ways in which this can be done including family based studies and robust studies of unrelated individuals across the entire genome (or specific regions within it).

Family-based association analysis relies on the transmission disequilibrium test, which was first proposed as a method for identifying associations of genetic markers with a particular trait through linkage disequilibrium by Spielman et al. (1993). The test measures the over-transmission of an allele to affected children from heterozygous parents and, unlike conventional linkage methods that use extended pedigrees, these tests require collections of informative family trios only (Spielman & Ewens, 1996). Given that this test is based on transmissions within a family, it is not affected by population substructure like classic association analysis, making it a fairly robust tool in disease analysis (Ewens & Spielman, 1995).
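As a hedged illustration, the transmission disequilibrium test is commonly computed as a McNemar-type chi-square statistic on the transmission counts from heterozygous parents. The sketch below assumes a biallelic marker and that the counts have already been tallied from the trios; the function name and the example counts are hypothetical.

```python
import math


def tdt(transmitted: int, untransmitted: int):
    """McNemar-type TDT statistic and p-value (chi-square, 1 degree of freedom).

    transmitted:   times heterozygous parents passed the test allele to an affected child
    untransmitted: times heterozygous parents passed the other allele instead
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Survival function of a 1-df chi-square, written with the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: allele transmitted 60 times and not transmitted 40 times.
    chi2, p = tdt(60, 40)
    print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Only heterozygous parents contribute to the statistic, which is why homozygous parents are uninformative in this design.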

Linkage disequilibrium refers to the alleles at different loci associating in a non-random manner (Slatkin, 2008). Haploblocks can be used to visualise the LD that exists across a region of the genome and are useful in the mapping of disease loci for genetic diseases. The knowledge of LD patterns across large regions reduces the number of association tests that need to be conducted because a test of one SNP also tests the association of SNPs that are in LD with it (Carlson et al., 2004). LD can be estimated by a number of different metrics due to differences present in the features of the association. D is the linking parameter to these different metrics and can be defined as:

$$D_{AB} = P_{AB} - P_{A} P_{B}$$

where A and B are the alleles and P is their frequency.

The two most commonly used metrics for measuring the extent of LD are $D'$ (Lewontin, 1964) and $r^2$ (based on the Pearson correlation coefficient (Pearson, 1895)); the equations for these two metrics are shown below:

$$D' = \frac{D}{D_{max}}$$

where $D_{max}$ is the theoretical maximum for the allele frequencies, and

$$r^2 = \frac{D^2}{P_{A}(1-P_{A})P_{B}(1-P_{B})}$$

$D'$ is inflated when one allele is rare and also in small sample sets; however, it is a good measure to use if the two alleles have similar frequencies. For this reason the $r^2$ method is generally preferred because disease variants can have low allele frequencies.
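The behaviour of the two metrics can be compared with a short Python sketch. It assumes that the haplotype frequency and both allele frequencies are already known, and it uses the standard piecewise definition of the theoretical maximum of D, which is not spelled out in the text above; the function name and example frequencies are illustrative only.

```python
def ld_metrics(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) from haplotype frequency P_AB and allele frequencies P_A, P_B."""
    d = p_ab - p_a * p_b
    # Theoretical maximum of |D| for these allele frequencies (assumed standard definition).
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical example: a rare allele B (10%) that only ever occurs with a common allele A (50%).
    d, d_prime, r_squared = ld_metrics(p_ab=0.10, p_a=0.50, p_b=0.10)
    print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r_squared:.3f}")
```

In this example D' reaches its maximum while r² stays low, which shows how the two metrics can diverge when one allele is rare.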

Association analysis in unrelated individuals usually takes the form of a classic case-control comparison, so has the capacity to be very large (Pearson & Manolio, 2008). The basis of this kind of test is to match the cases and controls in all ways (ethnicity, age, gender, etc.) except for the disease phenotype. This way, when comparing genetic markers, any significant difference seen between the cases and controls can be attributed to the disease phenotype being tested (Xu et al., 1998). Association studies aim to identify SNPs that are in linkage disequilibrium with disease causing variants. Variants may be inherited together more frequently than would be expected from a random formation of haplotypes; therefore, if a SNP is identified in an association study as being associated with a disease, it is not necessarily directly associated. It may instead be indirectly associated via another disease-causing variant with which it is in linkage disequilibrium.

Due to the nature of the test the larger the group of individuals used the higher the power of the study and the more likely it is that significantly associated markers will be found; however, there are some factors that must be considered when using large groups.


Firstly, it is important in principle that the population does not contain any substructure, as this will reveal differences between a single minority population and the remaining individuals rather than differences that are actually associated with the trait under investigation (Pearson & Manolio, 2008). Given the huge sample sizes required for these studies it is usually not feasible to avoid this issue, so it needs to be corrected for instead (for a discussion of substructure and how it can be combated see Section 1.2.5.2).

Heterogeneity is a problem for association analysis. It may exist at a genetic level where there are multiple different genetic influences on a phenotype, within the phenotype itself, or in the diagnosis of this phenotype. This heterogeneity may result in the masking of an effect that is real. Association analysis is ideal for identifying common variants (regardless of the effect size) or rarer variants as long as they are of major effect. Rare variants with a minor effect are undetectable beneath the noise that is inherent to such a study (Pearson & Manolio, 2008).

Genome-wide association scans (GWAS) have become possible in more recent years with the completion of the Project and the advent of SNP Chips allowing for a more comprehensive coverage and understanding of the human genome (Huang et al., 2009). This robust method of identifying putative causal variants allows for quick and simple candidate gene identification and is well suited to complex disease states such as mental illness, as it is able to pick out numerous regions simultaneously, provided that the study has adequate power. This means that a GWAS is more effective in a case-control cohort than in a family-based analysis, as the cohorts recruited for case-control studies are usually far larger and thus have more power.

1.2.3 Sequencing Analysis

Sequencing of the genomes or exomes of individuals can provide more information about variants that are contributing to a disease phenotype than the micro-satellite and tag SNP data that have historically been used for linkage and association analyses. While sequencing of individual genes has been a readily available technology since the 1970s, particularly with the development of Sanger sequencing (Sanger et al., 1977), these methods were not particularly amenable to the sequencing of whole genomes. Though shotgun sequencing technology was used to complete the human genome sequence in 2001 (Venter et al., 2001), these early sequencing methods were too time consuming and expensive to generate genome or even exome data for multiple individuals, let alone for disease cohorts.


It is with the development of next generation sequencing that the idea of sequencing hundreds or even thousands of individuals became a real possibility. Next generation (or high throughput) sequencing provides the ability to increase output by producing thousands of sequences in parallel, thus reducing the time and cost of sequencing an entire genome. The ability to read the genome of an individual exactly as it is means that any variation in the genome sequence of a patient can be identified and evaluated as contributing to a disease.

The genetic contribution to complex disorders can now be interrogated at a much higher level, with the most powerful advantage of using deep sequencing methods being the ability to look for rare or even de novo variation. Rare variants detectable by deep sequencing include single nucleotide variations (SNVs), structural variations (SVs) such as translocations and inversions, and copy number variations (CNVs) such as duplications and deletions of any size. A number of these rare variants may act together to produce a specific disease phenotype.

Currently, analysis is usually limited to relatively small cohorts (when compared to GWAS meta and mega studies) and alignment of each individual's sequence data to a reference sequence is used to reconstruct their genome. Alternatively, with long read data, more recent approaches have used de novo assembly of individual genomes to produce more accurate representations of structural variants (Eid et al., 2009). The comparison of an individual's genome to a reference genome will provide a list of differences that can be followed up. There are various programmes available that will annotate and prioritize these differences, including VariantMaster (Santoni et al., 2014) and ANNOVAR (Wang et al., 2010a) among hundreds of others.

In the future it may be possible to apply association analysis methods to large cohorts of exome or genome sequenced cases and controls. The cost of doing this is still quite high and therefore out of reach for most researchers; however, the cost of genome sequencing is still falling, making this more viable every year. One way to reduce the cost of such an analysis is the development of control 'banks' that could be used by numerous researchers across multiple disorders. There would need to be some consideration given to the differences between sequencing platforms that may confound results and an ability to match these controls to a set of cases, but there are certainly methods emerging that will combat such issues (Derkach et al., 2014). Next generation re-sequencing of large numbers of individuals is also useful when looking at regions that have been previously implicated in the disease by linkage or association analysis in order to pinpoint specific variants.


1.2.4 Analysis of Genetic Interaction

It is well understood that a complex disease or trait is contributed to by a large number of variants across the genome in a number of genes, each of which has a very small effect in and of itself, but that the combined additive effect of all these polymorphisms is sufficient to cause the trait to manifest phenotypically. It is this theory which underpins the success of GWAS in recent years: each causal variant exhibits an additive, independent effect on the trait, and each can therefore be investigated independently in such studies. Intuitively, however, this is not how a biological system works. Each variant and gene does not act in isolation from all others, and in fact it is likely that the effects of one variant on a given trait are dependent on the presence or absence of one or several others in other regions of the genome (Wei et al., 2014). This genetic interaction, which may be required to enable a given variant to contribute to a disease, is known as epistasis.

Broadly, epistasis can be defined in two different ways. Functional epistasis describes the physical, molecular interactions that proteins and genes have with each other, generally when acting in a functional pathway. Clearly, if given variants disrupted such an interaction this may contribute to disease. The second definition is statistical epistasis, which can be defined as the combined effect of two (or more) given variables beyond what would be expected if each variable caused an effect on the trait in question independently (Phillips, 2008). With the increasing availability of large GWAS datasets it is now possible to investigate these statistical epistatic interactions in silico. A number of techniques have been developed in an attempt to investigate this phenomenon further.

The traditional and most common method involves the use of statistical regression techniques to assess SNP interactions. In particular, the logistic regression of a log likelihood ratio can be used to investigate whether there is a statistically significant increase in the number of affected individuals, relative to controls, with two given alleles at two different loci beyond what would be expected by chance (Cordell, 2009). A similar principle can be applied when using the so-called haplotype-based method. If two alleles are interacting to increase the risk of disease, the presence of these alleles will be significantly higher in cases than in controls, allowing for putative interactions to be tested in essentially the same manner as single variants in a case-control study. Additionally, this methodology is less computationally intensive than logistic regression as there are fewer statistical degrees of freedom because the variants are considered together (Ueki & Cordell, 2012). Generally speaking, however, the haplotype-based method is at greater risk of generating false-positive results as it is more susceptible to the effects of LD.
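A minimal sketch of a regression-style interaction test is given below. It assumes the simplest possible coding, in which each locus is reduced to carrier or non-carrier status; under that assumption the interaction term of a saturated logistic regression model equals a ratio of odds ratios that can be computed directly from the eight cell counts, with a Wald test for significance. The function name, data layout and counts are hypothetical, and this is not the procedure used by any specific program cited above.

```python
import math


def interaction_wald_test(counts):
    """Test for departure from a multiplicative (log-additive) two-locus model.

    counts maps (carrier_at_locus_1, carrier_at_locus_2), each 0 or 1,
    to a (cases, controls) pair of non-zero counts.
    Returns (interaction odds ratio, Wald z, two-sided p-value).
    """
    expected_keys = {(0, 0), (0, 1), (1, 0), (1, 1)}
    if set(counts) != expected_keys:
        raise ValueError("counts must contain exactly the four carrier combinations")
    log_interaction = 0.0
    variance = 0.0
    for (locus_1, locus_2), (cases, controls) in counts.items():
        if cases <= 0 or controls <= 0:
            raise ValueError("every cell needs a non-zero count")
        # logit(1,1) - logit(1,0) - logit(0,1) + logit(0,0)
        sign = 1.0 if locus_1 == locus_2 else -1.0
        log_interaction += sign * math.log(cases / controls)
        variance += 1.0 / cases + 1.0 / controls
    z = log_interaction / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(log_interaction), z, p_value


if __name__ == "__main__":
    # Hypothetical counts: neither locus has an effect alone, but joint carriers are enriched in cases.
    table = {(0, 0): (100, 100), (1, 0): (50, 50), (0, 1): (50, 50), (1, 1): (60, 20)}
    odds_ratio, z, p = interaction_wald_test(table)
    print(f"interaction OR = {odds_ratio:.2f}, z = {z:.2f}, p = {p:.4f}")
```

An interaction odds ratio of one corresponds to two loci whose effects simply multiply, that is, to no statistical epistasis on the log-odds scale.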

A significant issue in recent years in epistatic analysis involves the sheer enormity of the data, and the computational power required to handle it efficiently. Several studies have attempted to investigate epistasis at the genome-wide level with variable success (Wan et al., 2010; Lippert et al., 2013), and the program SIXPAC has been developed in an attempt to make this process more computationally manageable (Prabhu & Pe’er, 2012), although this remains a major practical problem. It should also be noted that such programs have only come online recently. In addition to the computational issues, genome-wide epistasis studies suffer from the stringent corrections required to adequately account for multiple testing. Many more tests are being performed in a genome-wide epistasis study than would be performed in a regular GWAS with the same number of variants, as each SNP has many potential interactors. The potential for random false-positives is high, and so only highly significant findings are likely to be detected with any degree of statistical reliability. To circumvent both these issues, it has become common practice to perform so-called hypothesis-driven epistasis studies. Various candidate loci are investigated in a targeted fashion, either because they are in a gene (or genes) of functional interest, or they have been previously identified in a GWAS, or both. This study design has similar issues to any candidate gene study when compared with genome-wide approaches, in that vast quantities of potentially informative data are not considered, which is particularly relevant when investigating epistatic interactions.
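The scale of the multiple testing burden can be made concrete with a short calculation. The sketch below assumes an exhaustive search over all SNP pairs and a Bonferroni correction, which is only one of several possible correction methods; the panel size of 500,000 SNPs is a hypothetical example.

```python
def pairwise_tests(n_snps: int) -> int:
    """Number of distinct SNP pairs in an exhaustive two-locus scan."""
    return n_snps * (n_snps - 1) // 2


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction (an assumed choice)."""
    return alpha / n_tests


if __name__ == "__main__":
    n_snps = 500_000  # hypothetical genotyping panel
    single = bonferroni_threshold(0.05, n_snps)
    pairs = pairwise_tests(n_snps)
    pairwise = bonferroni_threshold(0.05, pairs)
    print(f"single-SNP tests: {n_snps:,}  threshold: {single:.1e}")
    print(f"pairwise tests:   {pairs:,}  threshold: {pairwise:.1e}")
```

Restricting the analysis to a small set of candidate loci shrinks the number of pairs, and therefore the correction, by many orders of magnitude.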

Although the study of epistasis has the potential to identify many more variants and genes contributing to complex disease, it is not without its pitfalls. As alluded to above, the statistical power to detect these changes is theoretically lower than independent variables in a GWAS of the same size, given the issues of increased multiple testing. There are other problems as well. A traditional GWAS will often not identify the true causal SNP, but rather another variant in high LD. A reduction in the LD between a given causal SNP and the observed SNP causes a proportional reduction in the level of statistical significance and therefore the ability to detect the association. This problem is magnified in epistasis, as an interaction requires two observed SNPs to be in high LD with the relevant causal variants. The solution is of course to have denser genotyping, but this in turn will increase the multiple testing thresholds (Phillips, 2008). Another issue arises if the two variants being investigated are in high LD with each other. This situation will result in two given alleles being inherited together more often than by chance, and is essentially indistinguishable from an epistatic effect, at least in a statistical study (Wei et al., 2014).

Despite these issues, a large number of studies investigating epistasis have been conducted, with many finding statistical evidence for epistatic effects, providing reasonable empirical evidence for the existence of this phenomenon and the validity of its continued study. In an attempt to assess the validity of statistical epistasis analysis, Combarros et al. (2009) attempted to meta-analyse data from over 100 studies investigating the process in Alzheimer's disease. They found that 27 different pairs of SNPs exhibited epistatic associations with the disease. Findings in other diseases have also been successfully replicated (Evans et al., 2011), although many promising findings have failed to replicate (Wei et al., 2014).

There does seem to be sufficient evidence to warrant the on-going study of epistasis in large cohorts with complex disease, but such studies require careful control. Targeted strategies are likely to be more successful than genome-wide approaches due to the lesser requirements of multiple testing; however, technical issues in the analysis need to be considered. Importantly, findings require replication in independent samples to verify their validity, as many interactions studied so far have failed to replicate.

1.2.5 Considerations of Analyses

When conducting association analysis of genetic data, several issues need to be considered and potentially corrected for. With such large datasets it is inevitable that the quality of certain data points will be unacceptable, and could impact the reliability of the results. These need to be controlled for. Additionally, when analysing large cohorts two major statistical phenomena become significant, with the potential to create spurious associations between SNPs and the disease state that may not actually exist. These are population stratification and the issue of multiple testing, and both must be corrected for if the results derived from such studies are to be interpreted in a meaningful way. These considerations are discussed further below.

1.2.5.1 Quality Control

The ability of a large genetic association study to detect true genetic associations depends greatly upon the quality of the data that are input to the analysis. Data that have not been properly controlled, with inconsistencies and errors removed, are at risk of producing false-negative and false-positive results. Quality control is therefore a fundamental aspect of any large association study. There are several aspects to the process.

Sample identity problems are common, and are often the result of sample handling errors. One straightforward way to find individuals who have been misidentified is to cross-reference each sample's reported gender against their sex chromosomes. Individuals reported as female who are consistently homozygous at SNPs located on the X chromosome are most likely male, and vice versa, and can be removed from further analysis to reduce this problem (Turner et al., 2011).

In an ideal scenario, a large genetic association study would contain individuals who were all equally unrelated to one another. Unfortunately the reality of collecting a large sample from within a given population is that some individuals are likely to be related. This so-called cryptic relatedness, if present at a significant level in the dataset, can increase errors if these individuals are considered to be independent of one another. Using programs such as PLINK (Purcell et al., 2007) it is possible to compute pairwise kinship estimates for every individual in the study, looking for similarities across approximately 100 000 SNPs in the genome. Individuals showing cryptic relatedness have been shown to frequently be second or third cousins (Aulchenko et al., 2007), and the simplest way to reduce the resulting error rate is to remove one of the pair from further analysis. This technique is also useful for picking up duplicated samples, often the result of handling errors, and additionally can help to reduce the level of population stratification in the sample set.

Another issue that commonly arises in genomic analyses is poor genotyping efficiency, usually as a result of poor quality DNA in a given individual, or poor quality reading of a given SNP in a particular batch of samples. A sample in which more alleles than average have failed to be called is suggestive of poor quality DNA, which also means that those alleles that have been called are at increased risk of being incorrect (Turner et al., 2011). PLINK can be used to calculate a missingness rate for each individual, defined as 1 minus the efficiency of calling. Individuals with a missingness rate of >0.2 are frequently removed from studies to avoid errors. Certain markers can also routinely fail across the whole sample set, and SNPs with a dataset-wide missingness rate of >0.2 should also be excluded.
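The missingness filter described above can be illustrated with a minimal sketch. It does not reproduce how PLINK implements the calculation; the function names, the coding of failed calls as `None`, and the choice to assess SNP missingness only after poorly genotyped individuals have been removed are all assumptions made for illustration.

```python
def missingness(calls):
    """Return the fraction of calls that failed (None), i.e. 1 minus the call rate."""
    return sum(call is None for call in calls) / len(calls)


def filter_by_missingness(genotypes, threshold=0.2):
    """Return (kept sample indices, kept SNP indices) for a samples-by-SNPs matrix."""
    # Step 1: drop individuals whose missingness exceeds the threshold.
    keep_samples = [i for i, row in enumerate(genotypes)
                    if missingness(row) <= threshold]
    # Step 2: drop SNPs whose missingness across the remaining individuals exceeds it.
    keep_snps = []
    for j in range(len(genotypes[0])):
        column = [genotypes[i][j] for i in keep_samples]
        if column and missingness(column) <= threshold:
            keep_snps.append(j)
    return keep_samples, keep_snps


# Hypothetical toy data: genotypes coded as 0, 1 or 2 copies of an allele.
genotypes = [
    [0, 1, 2, 0, 1, None],           # 1/6 missing: kept
    [None, None, 1, 0, None, None],  # 4/6 missing: removed
    [1, 1, 0, 2, 0, None],           # 1/6 missing: kept
    [2, 0, 1, 1, 1, None],           # 1/6 missing: kept
]
kept_samples, kept_snps = filter_by_missingness(genotypes)
```

In this toy example the last SNP fails in every individual and is therefore excluded, as is the second individual.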

SNPs can be filtered for minor allele frequency (MAF), as most studies will lack the power to detect significant associations involving very rare SNPs. A commonly used threshold is to remove SNPs with a MAF <0.01, as the vast majority of studies will be underpowered to detect significant results below this threshold (Gorlov et al., 2008). The advantage of removing SNPs like this is that it reduces the number of tests, and therefore the correction required for multiple testing. The risk of course is that removing such SNPs may result in the removal of a true disease-associated variant. SNPs can also be filtered if they violate Hardy-Weinberg Equilibrium (HWE). If a given SNP significantly deviates from HWE across the sample population it can indicate a potential genotyping error. In the case group this may of course indicate an association, so it is important to base this control process on analysis of the controls, and aim to remove any SNP with genotype frequencies that deviate from what would be expected under Hardy-Weinberg Equilibrium in the controls.
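As an illustration of these two filters, the sketch below computes the MAF and a Hardy-Weinberg test from genotype counts in the control group. It uses a simple one-degree-of-freedom chi-square goodness-of-fit test, which is an assumption made here for brevity (exact tests are often preferred for rare alleles); the function names are hypothetical.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from counts of the three genotypes AA, AB and BB."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1 - p)


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for deviation from HWE proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with an expected count of zero (monomorphic SNPs) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical control genotype counts.
p_in_equilibrium = hwe_chi_square_p(25, 50, 25)  # matches HWE expectations
p_no_hets = hwe_chi_square_p(50, 0, 50)          # complete lack of heterozygotes
```

A SNP such as the second example, with no heterozygotes among controls despite both alleles being common, would be flagged as a probable genotyping error.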

In the genotyping process samples are generally processed in batches, as a result of the limitations in the number of samples that can be fitted on a plate. There will be differences in the composition of individuals in each batch, and each plate may have different genotyping accuracies, resulting in batch effects that may lead to spurious associations. The technique used to determine if such effects are present is to code each plate in turn as cases, and run it against all other plates as controls to see if there is a significant deviation from the expected genotypes. As each plate should have a random mix of cases and controls, the allele frequencies should be the same between the test plate and all the others. If a plate significantly deviates from this with a p-value < $1 \times 10^{-4}$ then it should be excluded from further analysis (Turner et al., 2011).

1.2.5.2 Population Substructure

Population substructure can cause significant issues for association analysis, with the potential to cause false positive findings if not adequately corrected for. It is a term used to describe systematic ancestry differences between cases and controls. In any given inherited (genetic) disorder it makes intuitive sense that any large group of affected individuals (cases) is more likely to share common ancestors, when compared with a group from the same population that is unaffected by the disorder (controls). Additionally, the larger the sample size the more likely this phenomenon is to occur, to the point where in large GWAS it is almost inevitable that there will be some underlying substructure (Price et al., 2006). In such a situation, it is likely that the case group will have a multitude of alleles at different loci that are more common amongst them than in the control group, leading to spurious positive associations between these alleles and the disease state. As it is not usually possible to avoid population substructure, statistical techniques must be used to correct for it.


The initial step is to analyse the dataset for underlying substructure. The prevailing technique used to do this is genomic control, which calculates the extent of genomic inflation in the sample beyond what would be expected if every individual was equally unrelated. The genomic inflation factor (usually denoted as λ) is defined as the ratio of the median of the empirically observed distribution of the test statistic to the expected median (Price et al., 2010), thus quantifying the level of substructure in the population (and other confounding factors), which can be visualised with the use of Q-Q plots. If no substructure is present (λ = 1.0) no corrections need to be made. A minimal amount of inflation is also reasonable to ignore, although there is some debate about how much exactly, with more conservative arguments placing the maximum acceptable inflation at 1.05, while others accept levels up to 1.1 (Tian et al., 2008; Yang et al., 2011a; Price et al., 2010).
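The definition of the genomic inflation factor can be made concrete with a short sketch. It assumes that each SNP has been tested with a one-degree-of-freedom chi-square statistic, whose expected median under the null hypothesis is approximately 0.455; the function name is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(chi2_stats):
    """Ratio of the observed median test statistic to its expected null median."""
    # The median of a 1-df chi-square variable is the square of the
    # 75th percentile of the standard normal distribution (about 0.455).
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    return median(chi2_stats) / expected_median


# Hypothetical statistics that follow the null distribution exactly:
# evenly spaced quantiles of the 1-df chi-square distribution.
n = 1001
null_stats = [NormalDist().inv_cdf(0.5 + 0.5 * (i + 0.5) / n) ** 2 for i in range(n)]
lambda_null = genomic_inflation_factor(null_stats)

# The same statistics uniformly inflated by 10%.
lambda_inflated = genomic_inflation_factor([1.1 * s for s in null_stats])
```

Under these assumptions the first value is 1.0 (no inflation) and the second is 1.1, the upper limit that some authors would accept without correction.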

If unacceptable levels of stratification exist within a sample set, then corrections need to be applied. If not already completed, quality control of the data should be applied as this may reduce the number of outlying individuals, and therefore reduce the level of genomic inflation (see Section 1.2.5.1). Generally, however, substructure will need to be accounted for and corrected in the statistical analysis of the data. One common method is structured association. This technique assigns each sample to a discrete subpopulation cluster and then attempts to accumulate evidence of association within each cluster (Liu et al., 2013), but can be computationally intensive in large datasets and so is generally not preferred. The most commonly used technique involves principal component analysis (PCA) and the use of eigenvectors. PCA is a statistical modelling method that in genetic analyses is used to infer continuous axes of genetic variation, thereby describing as much variability in a given dataset as possible, within two dimensions. This can be graphically represented to allow a quick interpretation of the substructure (or lack thereof) in a population. The data can be manipulated using the programme EIGENSTRAT (Price et al., 2006), which uses the top principal components as covariates in the association analysis to derive corrected p-values for each SNP analysed. These adjusted p-values can then be further processed for multiple testing as required, to determine if positive associations exist between SNPs and the disorder under investigation.


1.2.5.3 Correction for Multiple Testing

The advent of these larger scale analyses means that significant correction for multiple testing is required as a consequence of testing literally millions of sites to identify regions of significance for the given disease state that is under investigation.

Probability dictates that if many loci are tested some will show significance purely by chance. In statistical analyses such as transmission disequilibrium tests or GWAS the significance of each independent test is observed by a calculated p-value, which is an empirical measure of this probability. More specifically it measures the likelihood that a deviation from the null hypothesis (the assumption that there is no difference between those with the disease and those without) has occurred by chance.

Determination of whether or not to reject the null hypothesis, and accept that there is a true difference between the cases and controls, requires comparison of the obtained p-value for each test to a pre-determined threshold. For a single test this threshold is often set at 0.05 due to the work of Fisher (1925). With a sufficiently powered study this threshold would indicate that 1 in 20 significant results would be purely by chance. Studies often do not have the power required to make this statement true, however, and therefore the other 19 significant results will not necessarily be truly significant. For this reason p-values and power must be interpreted together when assessing the statistical significance of each individual result. The power of a study is determined by a combination of the magnitude of the effect and the sample size. Theoretically power can also be increased by relaxing the significance threshold (that is, accepting larger p-values as significant), but this increases the risk of type one errors. Additionally, as more tests are performed, more and more results will reach significance purely by chance: the problem of multiple testing. To combat this issue, lower thresholds must be used to offset the number of tests being conducted.
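The scale of the multiple testing problem can be shown with a brief calculation. The sketch below assumes that all tests are independent and that the null hypothesis is true for every one of them, which is a simplification; the function names are hypothetical.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among independent null tests."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of null tests reaching significance by chance."""
    return alpha * n_tests


fwer_single = family_wise_error_rate(0.05, 1)             # one test
fwer_twenty = family_wise_error_rate(0.05, 20)            # twenty tests
chance_hits = expected_false_positives(0.05, 1_000_000)   # a million SNPs
```

Under these assumptions, twenty tests at a threshold of 0.05 already give a probability of roughly 0.64 of at least one chance finding, and a million tests would be expected to produce around 50 000.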

Type one (false positive) and type two (false negative) errors can both occur if multiple testing is poorly accounted for. If no or insufficient correction is applied, false positive results will be encountered; conversely, if there is over-correction, results that truly are significant may be lost (false negatives). For this reason it is important to fully consider the available correction methods and to be aware of whether type one or type two errors are more likely with the method chosen.

To complicate matters further, multiple testing is not truly as simple as accounting for the number of loci being investigated. The analysis of SNPs is not this straightforward, as a set of SNPs may not all be independent of one another; therefore, simply using the number of SNPs analysed as the number of tests undertaken is an overestimation of the actual number of independent tests. This lack of independence between pairs or groups of SNPs is termed linkage disequilibrium and must be considered in correction for multiple testing, as ignoring it can lead to an overly conservative correction and thus false negative results. One such overly conservative method is the Bonferroni correction, which simply calculates an adjusted p-value threshold by taking the original threshold (α) and dividing it by the number of markers tested (k):

$$\alpha' = \frac{\alpha}{k}$$
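As a minimal sketch of the Bonferroni correction, the adjusted threshold can be computed and applied as follows. The function names and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_markers):
    """Adjusted threshold: the original threshold divided by the number of markers."""
    return alpha / n_markers


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that falls at or below the Bonferroni-adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# One million markers at an original threshold of 0.05.
genome_wide_threshold = bonferroni_threshold(0.05, 1_000_000)

# Four markers: the adjusted threshold is 0.05 / 4 = 0.0125.
flags = bonferroni_significant([0.01, 0.04, 0.02, 0.005])
```

For one million markers the adjusted threshold is 5 × 10⁻⁸, which illustrates how stringent the correction becomes when every marker is treated as an independent test.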

There are other analytic techniques available that combat this increase in type two error, the gold standard of which is permutation. An example of permutation is the SDMinP method (Obreiter et al., 2005). By randomising affection status over a series of permutations (for example 100 000) this method provides an empirical, experiment-wide level of significance, with reported p-values that are adjusted for inter-marker dependency (linkage disequilibrium). Permutation methods, however, are computationally very intensive, more so with increasing numbers of markers, and therefore require both powerful computing capabilities and time.
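The principle behind permutation-based correction can be sketched as below. This is not the SDMinP method itself but a simpler single-step variant in which each observed statistic is compared with the largest statistic obtained across all markers in each permutation; the test statistic (the absolute difference in mean allele count between cases and controls), the function names and the toy data are all assumptions made for illustration.

```python
import random


def marker_statistics(genotypes, status):
    """Absolute case-control difference in mean allele count for each marker."""
    cases = [g for g, s in zip(genotypes, status) if s == 1]
    controls = [g for g, s in zip(genotypes, status) if s == 0]
    return [abs(sum(g[j] for g in cases) / len(cases)
                - sum(g[j] for g in controls) / len(controls))
            for j in range(len(genotypes[0]))]


def permutation_adjusted_p_values(genotypes, status, n_permutations=1000, seed=0):
    """Experiment-wide empirical p-values from the maximum permuted statistic."""
    rng = random.Random(seed)
    observed = marker_statistics(genotypes, status)
    exceed = [0] * len(observed)
    shuffled = list(status)
    for _ in range(n_permutations):
        # Randomise affection status while leaving genotypes (and hence LD) intact.
        rng.shuffle(shuffled)
        max_stat = max(marker_statistics(genotypes, shuffled))
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add one to numerator and denominator so that no p-value is exactly zero.
    return [(count + 1) / (n_permutations + 1) for count in exceed]


# Hypothetical data: marker 0 separates cases from controls, marker 1 is constant.
genotypes = [[2, 1]] * 10 + [[0, 1]] * 10
status = [1] * 10 + [0] * 10
adjusted = permutation_adjusted_p_values(genotypes, status, n_permutations=200)
```

Because affection status is shuffled across whole individuals, the correlation between markers is preserved in every permutation, which is how methods of this kind account for linkage disequilibrium.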

Finding a middle ground between the overly conservative Bonferroni correction and the time-consuming and difficult permutation methods is a problem for many association analyses. There are variations on the Bonferroni model, known as family-wise error rate corrections, and like the Bonferroni model, they attempt to minimise the type one error, albeit in a slightly less stringent manner. One specific example is the method of Holm (1979). This method uses a step-down procedure which accounts for the fact that after a test has been evaluated there is one less test to account for, and so the stringency of the significance threshold is reduced in a step-wise fashion. This model can be shown in simple terms as follows:

$$\alpha'_{(1)} \leq \frac{\alpha}{k} \;\rightarrow\; \alpha'_{(2)} \leq \frac{\alpha}{k-1} \;\rightarrow\; \ldots$$
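A minimal sketch of the Holm step-down procedure is given below. The p-values are ordered from smallest to largest and the i-th smallest is compared with α/(k − i + 1); testing stops at the first p-value that fails. The function name and example values are hypothetical.

```python
def holm_rejections(p_values, alpha=0.05):
    """Return a True/False flag per test under the Holm step-down procedure."""
    k = len(p_values)
    # Indices of the tests ordered from the smallest to the largest p-value.
    order = sorted(range(k), key=lambda i: p_values[i])
    rejected = [False] * k
    for rank, idx in enumerate(order):  # rank starts at 0
        # Thresholds are alpha/k, alpha/(k-1), ..., alpha/1 in turn.
        if p_values[idx] <= alpha / (k - rank):
            rejected[idx] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail
    return rejected


all_pass = holm_rejections([0.01, 0.04, 0.02, 0.005])
stops_early = holm_rejections([0.001, 0.04, 0.02, 0.03])
```

In the first example all four tests are declared significant, whereas a Bonferroni threshold of 0.0125 would retain only two of them; in the second, the procedure stops after the smallest p-value.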

A further method, which reduces the type two error at the expense of type one, is the false discovery rate method (Benjamini & Hochberg, 1995). This uses a step-up procedure to give q-values (adjusted p-values) for each test, shown simply as:

$$q_{(1)} \leq \frac{\alpha}{k} \;\rightarrow\; q_{(2)} \leq \frac{2\alpha}{k} \;\rightarrow\; \ldots$$
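The step-up procedure can be sketched as follows. The ordered p-values are compared with iα/k, and every test up to and including the largest rank that satisfies its threshold is declared significant, even if some smaller p-values did not satisfy theirs. The function name and example values are hypothetical.

```python
def benjamini_hochberg_rejections(p_values, alpha=0.05):
    """Return a True/False flag per test under the Benjamini-Hochberg procedure."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    # Find the largest rank i (1-based) whose p-value is at or below i * alpha / k.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / k:
            max_rank = rank
    # Step-up: every test ranked at or below max_rank is declared significant.
    rejected = [False] * k
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


step_up = benjamini_hochberg_rejections([0.013, 0.02, 0.5, 0.9])
none_pass = benjamini_hochberg_rejections([0.2, 0.04, 0.03, 0.5])
```

In the first example the smallest p-value (0.013) exceeds its own threshold of 0.0125, but is still declared significant because the second smallest (0.02) falls below its threshold of 0.025.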


Regardless of the efficacy of the multiple testing correction approach chosen, it should be kept in mind that the p-value only reports whether there is evidence of an effect, and should therefore be interpreted together with measures that show the magnitude and direction of the effect.

1.2.6 Comparison of Linkage and Association Analyses

Linkage and association analyses provide a good means to identify disease-causing variants at two ends of a spectrum. Family-based linkage studies can adequately detect rare variants of major effect, while association scans can identify common variants, even if they have a minor effect on the disease. The gap in this area of research falls between the limits of these methods (identification of rare variants of moderate to minor effect), but the identification of such variants is becoming possible with advances in deep sequencing methods.

Rare variants are difficult to identify in association scans due to the variability that is naturally seen between individuals within a population. Where a contributory variant is present at a low frequency within a disease population it is often lost to the noise created by the benign differences between individuals. This is not necessarily an issue in the analysis of genome and exome sequence data, where extremely rare and even de novo mutations can be identified by means other than association analysis. The value of such sequencing is that multiple rare variants affecting the same gene might be identified; these variants may seem inconsequential individually, but when assessed together they may be present in a significant number of individuals. Though the use of these next generation sequencing methods is still prohibitively expensive in some cases, it is rapidly evolving to become more accessible. The relationship between these methods is illustrated in Figure 1.1.

1.2.7 Using Linkage and Association Studies in Psychiatric Illness

Linkage studies have been used in the pursuit of causal variant identification in major psychiatric illness. In bipolar disorder linkage has been suggested across regions of at least 13 of the 23 chromosomes (Willour et al., 2003; Hayden & Nurnberger, 2006), with little evidence of replication from further studies. Of note is a meta-analysis of bipolar disorder across the entire genome that provides no support for linkage at any region (Segurado et al., 2003).

Linkage has also been used to identify regions of the genome that are thought to be involved in schizophrenia. In a review of schizophrenia genetics, Girard et al. (2012) identified that there have been at least 25 linkage studies with findings for schizophrenia, some of which have been supported by further studies. Linkage studies have also found regions of the genome linked with a number of the clinical features of schizophrenia (Ryu et al., 2013).

Figure 1.1: Rare Effects Model. The relationship between effect size and commonality of a given variation, with the best method of detection illustrated where present.

Linkage results for bipolar disorder and schizophrenia must be interpreted with caution as psychiatric illness is, in almost all cases, due to an accumulation of many variants across the genome. Therefore it must be kept in mind that these are complex diseases, and although they are inherited to some degree, linkage is better suited to single gene disorders with simple Mendelian inheritance such as cystic fibrosis (Markianos et al., 2001; Abecasis et al., 2002).

Association analyses, both in a genome-wide and candidate gene capacity, have been conducted extensively for both bipolar disorder and schizophrenia among other psychiatric illnesses. A simple literature search using HuGENavigator (available through http://www.hugenavigator.net/HuGENavigator/) sheds some light on the scope of the work accomplished in this field.

Using the Gene Evidence function, a total of 915 genes are reported in bipolar disorder, 1543 in schizophrenia and 671 in major depression. Using the genome-wide association studies of these illnesses, 580 loci are identified in schizophrenia, 373 in bipolar disorder and 122 in major depression (as at October 2014). These studies span many ethnic groups, from broad cohorts of Caucasians or Asians to far more specific single country cohorts; they vary in study design from family-based to case-control studies; and they vary in cohort size, from only a score of individuals to many thousands. As would be expected, given the large amount of research, there are many reviews on the topic including Girard et al. (2012), Craddock & Sklar (2013), Mowry & Gratten (2013), and Cohen-Woods et al. (2013), to name a few of the more recent ones.

It becomes clear, when assessing the results from these studies, that the majority of them are unconfirmed or vary in effect between cohorts. There are, however, recurring themes in the literature, where certain genes (albeit different variations within them) are found across a wide number of cohorts, through various methods of analysis, to be associated with various psychiatric disorders. Some of the most compelling gene associations with both schizophrenia and bipolar disorder (based on the HuGENavigator database) are BDNF, NRG1, COMT, SLC6A4, DRD2 and DISC1, each with more than 100 entries, for genetic association alone, across both disorders.

There is evidence showing that individuals who are affected by major psychiatric illness have lower fertility than the general population (Battaglia & Bellodi, 1996; Laursen & Munk-Olsen, 2010), which is not necessarily due to a physical inability to reproduce. It is also known that the incidence of major psychiatric illnesses in the population is remaining constant. If these two facts are considered together with the knowledge that there is a heritable component to these disorders, then an explanation to counter the negative selection is necessary. The best explanation for this is that a relatively high rate of de novo mutation contributes to these illnesses. If this is considered a reasonable assumption, then it follows that rare variation will be widespread in these disorders due to the nature of de novo mutation, and that it will be clustered in genes that contribute to these disorders.

The common variants identified by linkage and association analyses seem to contribute low individual relative risk in the heritability of these disorders, though collectively they do contribute at least a quarter of the overall heritability (Lee et al., 2012). However, evidence is emerging that there is an over-representation of rare variants including CNVs in major psychiatric illness that explain a reasonable proportion of the heritability of these disorders (International Schizophrenia Consortium, 2008; Stefansson et al., 2008; Green et al., 2015). Although these variants are rare, individually they seem to explain more of the heritability of the disorders than do individual common variants. Bodmer & Bonilla (2008) show CNVs with frequencies around 0.001% can account for relative risks upwards of 2.7.


The frequency of rare and de novo variation (especially single nucleotide variations) has been found to be higher in individuals with psychiatric illness (Awadalla et al., 2010); this information can be used to identify candidate genes by looking for genes (or loci) that have a high burden of these variants. The variants themselves may be very rare, even discrete given their de novo nature, but certain genes may stand out as containing these variants more commonly in cases than in controls (Purcell et al., 2014).

In recent years, analysis of the literature has turned to convergent functional genomics in an attempt to identify the most recurrent genetic themes in the area, as predicted by association studies as well as functional analyses. This can be a useful method in determining the top candidates for a given disease, though it must be noted that convergent functional genomics is only as good as the original published literature allows; that is to say, the lack of publications that find no association with a particular gene biases these results.

Ayalew et al. (2012) conducted a comprehensive convergent functional analysis for genetic risk in schizophrenia, further to analyses in bipolar disorder conducted by the same group (Le-Niculescu et al., 2009), finding that many of the genes discussed above appear to stand out as top candidates including BDNF, NRG1, COMT, and DISC1.

1.2.8 Functional Analysis

If a locus of interest has been identified for a certain disorder the next step is to assess the function of that locus. The locus may be within a gene and may affect the functioning of the protein product directly, or it may be within an enhancer or other regulatory sequence that affects the function of the protein in a less direct manner. If it can be determined which (if any) protein is affected by the identified variant some determination of its relevance can be made based on the fit with the biology of the disorder. This task can be rather daunting if no function for the identified protein has been established or if there are further unidentified functions of that protein.

There are numerous methods that can be employed to help determine the function of a locus, gene or protein. These range from in silico analysis of the sequence at an identified locus to in vitro or in vivo functional analysis of a protein. A number of methods employed to assess functional relevance of a locus are outlined in the following sections.


1.2.8.1 Identifying a Gene

Once a locus of interest has been identified through linkage or association analysis the locus needs to be further investigated to identify the likely causal gene. A search of online databases will now afford information regarding whether the locus is intronic, intergenic, coding, non-coding or regulatory, and whether it is within a gene, an untranslated region or which genes are up and down-stream from it. This information is a useful initial guide, although the variant identified in such a study (particularly SNPs in association analyses) will almost certainly not be the causal variant, and instead is likely reporting on the actual associated locus via linkage disequilibrium. In a typical association analysis linkage disequilibrium will frequently encompass more than one gene, and therefore the gene cannot necessarily be inferred from the position of the associated marker.

There are two techniques to refine the list of candidate genes that arise from such an association, and potentially isolate the likely causal locus. Different ethnic groups inherit different haploblocks, and therefore have different linkage disequilibrium patterns. Repeating the association analysis in a different ethnic population can produce an overlapping but not completely replicated candidate region, effectively excluding genes not included in the overlapping section. The second approach involves fine mapping of all potential candidate loci within an associated region using next-generation sequencing. The causal gene within a given region is likely to contain a higher burden of rare variants within the cases relative to the controls, whereas the other candidates that are in LD will not.

Once a specific gene has been identified for further investigation the known functions can be easily obtained online and hypotheses surrounding biological plausibility can be generated. The functions of many genes and their protein products are not yet known, however, and even for genes and proteins where functions are annotated, there may remain further unidentified functions. In these cases it is necessary to implement further analysis of the identified gene in order to gain an understanding of whether or not it is functionally relevant.

1.2.8.2 Testing the Function of a Gene

If the function of a gene is unknown a good starting point is to assess if the gene is conserved in any way throughout evolution. A similar protein product may have been annotated in a model organism such as Drosophila melanogaster and may provide a starting point for future analysis. However, this is not always the case and as such experimental techniques must be used to assess putative functions.

One such technique is yeast two-hybrid analysis, which predicts the function of a given protein based on the known functions of proteins with which it interacts. The yeast two-hybrid method will not be discussed here as it is described in detail in Chapter 4. Once a list of interacting proteins has been identified the known functions of any or all of these can be analysed to find common themes or functional enrichment. This information can then provide a basis for the formation of a particular hypothesis or hypotheses to be tested.

It is important to consider that even if a gene appears to fit the biological mechanism of a certain disorder it may not actually be involved as expected, so for this reason any identified function should be tested experimentally. Testing can be done in an in vitro or in vivo manner, with the latter often occurring in model organisms such as mice or zebrafish.

Some insight into the function of a gene may be established by an understanding of the temporal and spatial expression of that gene. This knowledge is often key to deciding the functional relevance of a particular gene to a disease state. This can be achieved in vitro through introduction of the gene of interest artificially fused to a reporter gene (often a fluorescent tag). Expression of the gene under investigation can then be monitored for temporal and spatial expression in different cell types. However, this method does have limitations in that it is by design artificial, and as such there may be issues with over-expression of the construct, which forces the gene product into cellular compartments in which it is not naturally expressed.

The use of antibodies or probes designed to target the gene of interest is a more natural method of analysis. Once the antibody or probe is bound to its target it can then be detected in a manner that allows visualisation of the location and timing of expression. This can be conducted in vitro using immunohistochemistry (Coons et al., 1941) or in situ using in situ hybridisation (Gall & Pardue, 1969).

Reporter gene methods can also be used to assess regulatory elements and what effect (if any) identified variations have on the expression of their target genes. If a reporter gene is placed under the control of the identified regulatory element, expression changes can be monitored in response to different variations in an enhancer or promoter.

The function of a gene and its protein product may also be identified or confirmed by assessing the effects of its down (or up) regulation. This analysis is commonly done by assessing the effects of knocking down or knocking out a gene either in a cell line or in a model organism, which disrupts the gene so that it is left with reduced or no function respectively. The disruption of a gene can be permanent or transient. Permanent disruption is achieved by altering the gene itself via some form of site-specific mutagenesis, while transient disruption can be achieved by the addition of an RNA or DNA fragment that will interfere with the processing of the gene.

RNA interference, or RNAi, is a method by which genes can be interrupted in a specific manner. It was first described in Caenorhabditis elegans (Fire et al., 1998), but a specific kind of RNAi can now be applied to mammals. Short interfering RNAs, or siRNAs, have been shown to knock down gene expression in human and mouse cell lines (Caplen et al., 2001; Elbashir et al., 2001) by specifically degrading a target mRNA of the gene under investigation. More than one gene can be knocked out in a single organism simultaneously using this method (Mortensen, 1993). Another method of gene knock out is Morpholino treatment. Morpholinos act by steric hindrance rather than degradation of the RNA, resulting in the blocking of translation of the protein (Summerton & Weller, 1997).

A gene knock out can be expressed in whole organisms or within specific tissues and also the expression of the gene can be knocked out at any time during development to assess the function of the gene at different developmental stages (Kuhn et al., 1995). Once the gene has been knocked out the phenotype of the altered organism (or cell line) can be compared to that of a wild type counterpart to infer the function of the gene and determine if it is a reasonable candidate in the disease being studied.

Over-expression of genes, particularly at abnormal times during development or in the wrong tissue type can also provide information about the function of the gene. This can be achieved through alterations to the regulatory sequences (promoters) of the gene or by insertion of multiple copies of the gene into the genome to increase expression levels.

Transgenic models such as mouse models can be engineered to be either homozygous or heterozygous for a particular gene deletion or mutation. Models like this are often created using homologous recombination techniques in embryonic stem cells; this method is able to produce mice that are heterozygous for the knockout in specific tissue types. If a mouse contains the transgene in germ line cells, it can be used to produce a line of mice that are completely heterozygous for the transgene. These mice in turn can be mated to produce a homozygous null (or mutant) model (Alberts et al., 2002).
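The expected outcome of mating two heterozygous mice can be illustrated with a short calculation. The sketch below is only an illustration under simple Mendelian assumptions (a single locus, equal transmission of both alleles, no lethality); the allele labels and the function name are hypothetical and do not come from the studies cited.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a, parent_b):
    """Expected offspring genotype fractions for a single-locus cross.

    Each parent is a two-character string giving its two alleles.
    """
    # Enumerate every combination of one allele from each parent.
    offspring = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(offspring.values())
    return {genotype: Fraction(count, total) for genotype, count in offspring.items()}


# Hypothetical labels: '+' is the wild type allele, '-' is the knockout allele.
het_cross = cross("+-", "+-")
for genotype, fraction in het_cross.items():
    print(genotype, fraction)
```

Under these assumptions one quarter of the offspring are expected to be homozygous null, which is why a deficit of that class in real litters can point to a lethal knock out.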


In some cases the complete knock out of a gene is lethal, giving no information about the specific function of the gene other than that it is necessary for life. In other cases the knock out of a gene shows no phenotypic change due to redundancy (the same function is carried out by more than one gene). In the case of lethal knockouts, mutant copies of the gene can be tested rather than a complete knockout; this is especially of interest if an identified variation to the gene exists in patients.

1.2.9 Summary

Although there are a number of tools available for the investigation into genetic risk variants for complex diseases, they all have certain limitations due to the nature of such diseases. Currently the best method is to combine various analyses in order to extract the most information possible from the available data. Where large pedigrees are available linkage proves to be the most informative option; however, due to the vast numbers of unrelated individuals with complex diseases such as major psychiatric illness the favoured starting point is currently association analysis. Investigation into the function of associated genes is becoming easier as more genes are annotated in gene databases; therefore, in the future, there is more hope for determining mechanisms and pathways that are affected in the disorders, enhancing our understanding of the underlying biology.


1.3 Disrupted in Schizophrenia 1 (DISC1)

The DISC1 gene has been identified as a candidate gene in major psychiatric illness through both linkage and association studies (as outlined below). In particular the gene seems to be involved in predisposition to schizophrenia and bipolar disorder. The identification of this gene and the subsequent research into it are outlined in this section.

1.3.1 The Discovery of DISC1

The discovery of DISC1 came after St Clair et al. (1990) began to investigate the cytogenetic database for cases of psychiatric illness. A large Scottish family was identified from a cytogenetic study conducted on boys in Scottish borstals in 1970 (Jacobs et al., 1970). With further investigation, St Clair and colleagues showed a regular, Mendelian autosomal dominant inheritance of major psychiatric disorders including schizophrenia, major depression and bipolar disorder in this family. Major psychiatric illness was found to be segregating with a (1;11)(q42.1;q14.3) translocation.

The mechanism by which this translocation caused the phenotypes observed could have been due to a number of genetic effects; however, the most likely explanation was that the translocation was disrupting a gene at one or other of the breakpoints. This idea was not followed up on until ten years after the description of the apparent inheritance pattern of this translocation by St Clair et al. (1990).

Study of the translocation by Millar et al. (2000) found that while the region surrounding the breakpoint on chromosome eleven was largely absent of genes, the area on chromosome one adjacent to the breakpoint was gene rich. There were two genes that were directly disrupted by the translocation and these became known as Disrupted in Schizophrenia one and two (DISC1 and DISC2, OMIM accession numbers 605210 and 606271 respectively) (Millar et al., 2000). It was later identified that DISC2 was antiparallel and antisense to DISC1. DISC2 was shown to be a 15kb RNA-only gene (Taylor et al., 2003), leaving DISC1 as the only protein coding gene disrupted by the translocation.

Although this pattern of inheritance is unusual in such a complex disease, it was confirmed by genetic testing that the disorders co-segregated with the balanced (1;11)(q42.1;q14.3) translocation, with a maximum LOD score of 7.1 for psychiatric illness or of 3.6 for schizophrenia alone (Blackwood et al., 2001; Gudbjartsson et al., 2005). Analysis of the translocation has shown that when inherited there is about a fifty-fold increase in risk of developing schizophrenia, bipolar disorder or recurrent major depression (Blackwood et al., 2001). These scores suggested that the translocation was strongly linked to the disorders and deserved further investigation.

1.3.2 The Biology of DISC1

As described by Millar et al. (2000), the basic DISC1 gene has a sequence length of 415kb, which encodes a 7.5kb, thirteen-exon transcript. The 854 amino acid protein encoded by DISC1 is made up of an N-terminal head domain and a long helical C-terminal tail domain (Taylor et al., 2003). The gene has a large number of isoforms, which range greatly in size and composition. DISC1 is expressed in a variety of tissues within the human body.

1.3.2.1 Isoforms of DISC1

There are currently a total of twenty-three isoforms of DISC1 detailed on the NCBI Gene database, but a further twenty-four have been identified (Nakata et al., 2009) giving a total of forty-seven. The most commonly expressed transcript in the DISC1 gene is known as the long (L) isoform and consists of thirteen exons (Millar et al., 2000). It is this isoform that has been the subject of the majority of the investigations into DISC1 function. The other main alternative splice variants (shown and described, with the L isoform, in Figure 1.2) are known as the long variant (Lv), short (S) and extremely short (ES) isoforms (Taylor et al., 2003). The isoforms of DISC1 are of particular interest in the study of inherited psychiatric illness because their expression has previously been detected in human fetal brain tissue, indicating a potential role in early neurological development (Chubb et al., 2008).

1.3.2.2 Expression Patterns

DISC1 is a ubiquitous protein with expression observed in the brain, heart, placenta, kidneys and pancreas of adult humans using Northern blot analysis (Millar et al., 2000). Tissues of the brain, heart and placenta show the highest levels of expression. Studies with mice and rats have shown similar results. The slightly shorter, 851 amino acid protein described by Ma et al. (2002) has 56% identity with its human counterpart and shows additional expression in liver and testis. As in humans, heart was a tissue with high expression in the rodent studies. Although expression is present in a number of tissues, particularly the heart, research has focused on the expression within the brain.

Figure 1.2: DISC1 Main Isoforms. A) The reference structure of DISC1 showing all identified exons across the various isoforms. The main exons are shown in black while less common exons are shown in blue. B) The four main isoforms of DISC1. From top to bottom: the Long (L) isoform, the most common isoform, giving the reference structure; the Long Variant (Lv) isoform, which contains a variant form of exon 11 (shown in red); the Short (S) isoform, which truncates at exon 9a; and the Extra Short (Es) isoform, which truncates at exon 3.

Studies have shown that the DISC1 protein is highly expressed in the brain during development, both during embryogenesis and the post-natal period (Ozeki et al., 2003; Schurov et al., 2004; Chubb et al., 2008; Mao et al., 2009). Areas of expression within the adult brain are broad and include the hippocampus, cortex, hypothalamus, brain stem and the cerebellum. This can be compared to the developing brain where expression is seen with a very distinct profile in the developing cortex, the hypothalamus, pons, olfactory area, midbrain and lateral ventricles at different times during development (Schurov et al., 2004). This group also identified that the expression of DISC1 protein in the brain is limited to neurons and is not seen in the glia, shown by co-localisation with the neuronal marker MAP2 but not with an astroglial marker.


Studies suggest that not only is Disc1 mRNA expressed in mouse brain but it is also expressed at its highest levels during critical brain development periods. Austin et al. (2004) found that Disc1 is expressed in the hippocampus of mice throughout hippocampal development (embryonic stages through to adulthood); expression in other areas of the brain, including the hypothalamus and cerebral neocortex, was also seen in adult mice. Interestingly, the group identified regions where Disc1 mRNA is found during development of the brain but is absent in adult mice; these regions are parts of the stria terminalis, the reticular thalamic nucleus and the reuniens thalamic nucleus. From this, a conclusion was drawn that Disc1 dysfunction may have effects on the development and function of the hippocampus. Similarly, enhanced levels of Disc1 mRNA have been identified in the developing brain of rats, and confirmation of translation to protein showed that active Disc1 was present during brain development (Miyoshi et al., 2003).

Chubb and colleagues (2008) identified that the expression of DISC1 mRNA in the brain of humans is more prominent in the periods leading up to adulthood (that is, the prenatal, neonatal and pubertal periods) than in adulthood itself. Additionally, through mouse studies, they identified that the location of expression within the brain alters over time. Conclusions can be drawn from these studies that the combination of spatial and temporal changes in expression of DISC1 indicates a role for the gene in development of the brain and that alterations to the timing or position of the expression may interfere with vital interactions between protein products of DISC1 and other associated proteins.

The fact that Disc1 expression is present in the brain, and specifically the developing brain of both rats and mice provides some basis for the hypothesis that DISC1 could be involved in psychiatric illness in humans.

1.3.2.3 The Function of DISC1

There is currently no knockout of Disc1 available from the International Mouse Phenotyping Consortium (http://www.mousephenotype.org/data/genes/MGI:2447658). The generation of a knockout mouse for the Disc1 protein has proven difficult (Jaaro-Peled, 2009), though there are a number of independent studies that have assessed the function of the DISC1 gene by mutagenesis and dominant negative model systems.

Li et al. (2007) showed that a transgenic mouse expressing a reversible and inducible form of mutant DISC1 created schizophrenia-like phenotypes when expressed during early postnatal development but not when expressed during adulthood. The mouse model used was engineered to express a C-terminal fragment of DISC1 containing only amino acids 671-852 of the full length protein. The system was induced by tamoxifen and when the inducer was not available the mutant DISC1 was rapidly degraded, meaning that expression could be switched on and off at specific points during development. The mutant DISC1 had a dominant-negative effect on the full length protein and the resulting phenotypes identified were consistent with schizophrenia both in gross brain morphology and in behavioural phenotypes. These results suggest that DISC1 expression is important during brain development and that aberrations to expression may lead to schizophrenia.

Evidence exists to suggest that mice carrying missense mutations in Disc1 exhibit behavioural effects similar to mood disorders, and that this phenotype can be rescued by antidepressant treatment. Clapcote et al. (2007) induced missense mutations via ENU mutagenesis. Two different mutations (Q31L and L100P) were assessed in homozygous, heterozygous and compound heterozygous form, with results differing between genotypes. The 31L genotypes showed a stronger phenotype for depression-like behaviours, while the 100P genotypes were more schizophrenia-like, with the 100P mutation appearing to be dominant over the 31L.

The function of Disc1 in the mouse model has been consistent with the conclusions drawn from functional analyses conducted in humans, suggesting that the mouse is a reasonable model for the investigation of DISC1 in major psychiatric illness. Down-regulated Disc1 expression in mice in utero, via introduction of a carboxy-terminal-truncated mutant made to mimic the truncated DISC1 protein created by the t(1;11)(q42.1;q14.3) translocation, showed that the DISC1 protein is a component of the motor complex, and that dysfunctional DISC1 causes improper development of the cerebral cortex by impairing neurite outgrowth (Kamiya et al., 2005). Suppression of wild type DISC1 via RNAi-mediated knock down showed similar effects to over-expression of the truncated DISC1. Additionally, it was identified that the expression of truncated DISC1 alters the localisation of the wild type protein, moving it from a perinuclear location to a more ubiquitous cytoplasmic distribution. These results suggest that the truncated DISC1 acts in a dominant negative fashion over the wild type protein, an interesting model considering that the translocation found in the Scottish family is heterozygous in all cases.

The function of the truncated form of the protein has been further assessed in mice, and evidence from these studies suggests that disruption of the Disc1 transcript results in reduced synaptic activity (Kvajo et al., 2008). This truncated protein was shown to induce schizophrenia-like behaviour, again through a dominant-negative mechanism. The protein still interacts with many of the same substrates as the full-length version, but is no longer completely functional, and it is therefore proposed that this truncated protein is competitively inhibiting interactions with the non-truncated version (Hikida et al., 2007).

Experiments have shown that knockdown of Disc1, using electroporation to introduce shRNA directly into the developing hippocampus in utero, resulted in problems with the migration of granule cells in the dentate gyrus (Meyer & Morris, 2009). Transient knockdown of Disc1 by Niwa et al. (2010), also by shRNA transfer in utero but this time into the prefrontal cortex, showed defects in proliferation and migration of neurons which could be rescued by expression of wild-type protein. The transient disturbance of gene function at a prenatal time period led to abnormalities later in life which could be consistent with schizophrenia.

Work in zebrafish has also been conducted to assess DISC1 function. A study that knocked down the disc1 orthologue in zebrafish by morpholino methods found that the gene was necessary in the development of oligodendrocytes from olig2 precursor cells, in the hind brain as well as other brain regions (Wood et al., 2009). Interestingly, the group found similar results with another schizophrenia candidate gene, NRG1, and concluded that the two genes may work together in neurodevelopment.

A role at the synapse was established for the DISC1 protein through the study of the neocortex by electron microscopy, showing that the protein was present at asymmetric synapses in humans with a bias for the postsynaptic region (Kirkpatrick et al., 2006). More recently other groups have found that protein products of DISC1 are present at the synapse, but the mechanism by which it acts is still not well defined. Camargo et al. (2007) suggest that the role for DISC1 is in regulating plasticity and post synaptic density, based on the results of a yeast two-hybrid analysis. Other studies have investigated the relationship of the DISC1 protein with other proteins and their roles at the synapse, such as Kal-7 and Rac1 (Hayashi-Takagi et al., 2010), and NDE1 and PDE4B (Bradshaw et al., 2008).

The finding that DISC1 is expressed in neurons is consistent with previous studies that identify neuronal proteins such as NDEL1 and FEZ1 as interacting with DISC1, implicating DISC1 in pathways of neuronal migration (Miyoshi et al., 2003; Ozeki et al., 2003; Kamiya et al., 2006). Investigation of the roles of DISC1 in adult hippocampal neurogenesis (Duan et al., 2007) found that the protein controls neuronal migration by associating with NDEL1, and subsequently controlling the rate of integration of new neurons into the brain. Disruption of DISC1 using RNAi resulted in a reduced proliferation of progenitor cells, which in turn led to premature neuronal differentiation. The over-expression of the DISC1 protein showed the opposite effect (Mao et al., 2009).

The roles in progenitor cell proliferation, as well as neuronal migration and differentiation are possibly due to the involvement of DISC1 in network and cascades, an interaction pathway that was established through research into the DISC1/NDEL1 protein interaction (Ozeki et al., 2003). Since this time a number of other microtubule and centrosomal proteins have been found to interact with the DISC1 protein (Brandon et al., 2004; Millar et al., 2005; Kamiya et al., 2008). Some of these interacting proteins have also been identified as schizophrenia risk factors (Gurling et al., 2006; Datta et al., 2010). The interaction between the DISC1 and FEZ1 proteins has been shown to be involved in neurite outgrowth (Miyoshi et al., 2003) and dendritic growth in newly developing neurons (Kang et al., 2011), functions that complement the DISC1-NDEL1 protein interactions.

A number of additional functions of DISC1 have been identified including cAMP signalling via the PDE4 family (Millar et al., 2005; Murdoch et al., 2007), and microtubule network stabilisation (Morris et al., 2003). A number of reviews also detail the many identified functions of DISC1 (Chubb et al., 2008; Brandon, 2007). The multiple roles of DISC1 already identified suggest that it is a pleiotropic protein involved in a number of different protein interactions and pathways, not all of which have yet been identified.

1.3.3 DISC1 in Psychiatric Illness

The Scottish family that was pivotal in the discovery of DISC1 were subjects of a larger resulting study. A total of sixty-seven individuals from the family were followed up (see Figure 1.3), of which twenty-nine carried the translocation, and of these carriers eighteen were diagnosed with some kind of major psychiatric illness (Blackwood et al., 2001). Of the thirty-eight non-carriers there were no cases of diagnosable disorders.
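The strength of the contrast between carriers and non-carriers can be illustrated with a simple calculation on the counts given above. The sketch below is not an analysis reported by Blackwood et al. (2001): it applies a one-sided Fisher's exact test to the two-by-two table of carrier status against diagnosis and, as a loudly stated simplification, treats family members as independent observations, which relatives are not. The function and variable names are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns the hypergeometric probability of seeing a or more
    observations in the top-left cell, with all margins fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return numerator / denominator


# Counts taken from the text: 29 carriers (18 diagnosed), 38 non-carriers (0 diagnosed).
carriers_affected, carriers_unaffected = 18, 29 - 18
noncarriers_affected, noncarriers_unaffected = 0, 38

affected_fraction = carriers_affected / (carriers_affected + carriers_unaffected)
p_value = fisher_one_sided(
    carriers_affected, carriers_unaffected, noncarriers_affected, noncarriers_unaffected
)
print(f"carriers with a major psychiatric diagnosis: {affected_fraction:.1%}")
print(f"one-sided Fisher's exact p-value (independence assumed): {p_value:.2e}")
```

Because relatives share genotype and environment, a linkage statistic such as the LOD score is the appropriate measure for pedigree data; the calculation here only conveys how unlikely the observed split would be by chance.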

Although around 30% of carriers had not been diagnosed with a psychiatric disorder of any kind, these individuals had results that were indistinguishable from those with a psychiatric diagnosis in an event-related potential test. Event-related potential is a means to measure a brain’s response to a thought or perception. These results show that all translocation carriers have an event-related potential test result that is consistent with their relatives who are affected with schizophrenia, and therefore may have similar neurological changes and perhaps the capacity to develop these disorders themselves (Blackwood et al., 2001). The results of analyses carried out in this family are indicative that the disruption of the DISC1 gene is likely to be the cause of the major psychiatric illness seen in this family, leaving DISC1 as a very interesting candidate in the field of psychiatric genetics.

Figure 1.3: Pedigree of the Scottish Family Carrying the DISC1 Translocation. This pedigree has been adapted from Blackwood et al. (2001) and St Clair et al. (1990). Individuals affected by various psychiatric disorders and those that carry the (1;11)(q42.1;q14.3) translocation are highlighted, as indicated by the key.

1.3.3.1 Existing Genetic Evidence for DISC1

A number of independent studies have investigated the role of DISC1 variation in major psychiatric disorders. The studies range over a number of cohorts of varying ethnicities, and have defined several different variants and haplotypes within the gene region that seem to be associated. Together, these studies have found several regions of interest across the gene, suggesting that there may be a number of common variants that have a contributing effect to the development of major psychiatric illness, as is expected in complex disease genetics. While there are a number of studies that have been published suggesting evidence for linkage, only three of these actually reach the level required for significance in a multi-point analysis. Two of these three are of the original Scottish family. Details of several linkage studies in DISC1 can be found in Table 1.1, while a more in depth discussion of the association studies to date can be found in Chapter 3, and of epistatic analysis in Chapter 5.

Table 1.1: Summary of DISC1 Linkage Findings. BPD=Bipolar Disorder, SCZT=Schizotypal, SCZ=Schizophrenia, RMD=Recurrent Major Depression, SCZA=Schizoaffective Disorder. Table adapted from Chubb et al. (2008).

Study                          Diagnoses        Marker                             LOD Score  Position
Blackwood et al. (2001)        SCZ, BPD, RMD    Translocation (1;11)(q42.1;q14.3)  7.1        Intron 8
Hamshere et al. (2005)         SCZA, SCZ, BPD   D1S2800                            3.54       1q42
St Clair et al. (1990)         SCZ, SCZA, RMD   Translocation (1;11)(q42.1;q14.3)  3.3        1q42
Ekelund et al. (2001)          SCZ, SCZA, SCZT  D1S2709                            3.21       Intron 9
Ekelund et al. (2004)          SCZ, SCZA, SCZT  rs1000731                          2.7        Intron 9
Detera-Wadleigh et al. (1999)  BPD, RMD, SCZA   D1S1660-D1S1678                    2.67       1q25-42
Macgregor et al. (2004)        BPD              D1S103                             2.63       1q42
Curtis et al. (2003)           BPD              D1S251                             2.5        1q42
Gejman et al. (1993)           BPD, RMD, SCZA   D1S103                             2.39       1q42
Hwu et al. (2003)              SCZ, SCZA, SCZT  D1S251                             1.2        1q42

Although these studies seem promising, only three of the results reach significance (LOD of at least 3.3); the remainder of the studies are only suggestive of linkage with the LOD scores achieved. In order to assess if there is true linkage in these studies or if they are false positives, more individuals need to be added to the analyses. It is reasonably easy to generate LOD scores of the magnitude seen in these studies by only taking a small number of families, assuming heterogeneity, and then only including those families with a positive score. In such a situation where only selected small families are included, particularly when penetrance is incomplete, it is easy to introduce bias. Unfortunately this information is not often published within studies, so it is difficult to objectively assess the level to which this may be a problem.
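The meaning of a LOD score, and the selection bias described above, can be made concrete with a small sketch. It is illustrative only and does not reproduce any of the cited analyses: it uses the simplest two-point LOD for phase-known, fully informative meioses, and then simulates small families with no true linkage to show how summing only the positive-scoring families inflates the total. The family size, number of families, test recombination fraction and random seed are all arbitrary assumptions, and the function name is hypothetical.

```python
import math
import random


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known meioses.

    Compares the likelihood of the data at recombination fraction theta
    with the likelihood under no linkage (theta = 0.5).
    """
    non_recombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0:
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# Ten informative meioses with no recombinants, evaluated at theta = 0,
# give 10 * log10(2), which is close to the conventional threshold of 3.
print(round(lod_score(0, 10, 0.0), 2))

# Selection bias under the null: 200 simulated families of five meioses each,
# with marker and disease truly unlinked (each meiosis recombinant with p = 0.5).
rng = random.Random(1)
family_lods = []
for _ in range(200):
    recombinants = sum(rng.random() < 0.5 for _ in range(5))
    family_lods.append(lod_score(recombinants, 5, 0.1))

total_all = sum(family_lods)
total_selected = sum(score for score in family_lods if score > 0)
print(f"all families: {total_all:.2f}")
print(f"positive-scoring families only: {total_selected:.2f}")
```

In this simulation the total over all families is negative, as expected when there is no linkage, whereas the total over the selected families is positive, which is the bias that undisclosed family selection can introduce.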

A number of rare variations have also been identified in DISC1. The first is of course the translocation that led to its discovery in the Scottish pedigree. More recently, however, rare variants of more modest effect size have been identified in DISC1.

An increase in the burden of rare variants across DISC1 and 10 of its identified interacting partners has been found in a study of Swedish schizophrenia patients. Through sequencing analysis the authors report a significant increase (by 3.9%) of non-synonymous rare variation in cases compared to controls for variants with a MAF<0.01 (Moens et al., 2011). A slightly larger analysis of 506 cases (schizophrenia or bipolar disorder 1 with psychotic features) and 1211 controls found four non-synonymous (and two synonymous) variations within exon 11 of DISC1. One of the four non-synonymous variants (E751Q) was found in 2.17% of the cases and 1.8% of controls; this particular variant had been previously described in a separate but overlapping study by Song et al. (2010). A second of the variants (E751A) had also been reported previously (Song et al., 2010). The remaining newly identified variants appeared in only a single case each and in no controls. Deep sequencing of a 528kb region at the DISC1 locus, conducted in 653 Caucasian cases (schizophrenia, bipolar disorder and recurrent major depression) and 889 matched controls, revealed a large number of rare variants (Thomson et al., 2014). Of the 2718 SNPs identified, 2010 had minor allele frequencies of less than 1% and only 38% of the total had been previously reported. These SNPs mapped throughout the DISC1 gene, including 36 in coding exons, of which 23 were non-synonymous.
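The minor allele frequency (MAF) cut-off used to call a variant rare can be sketched as follows. This is a minimal illustration assuming a biallelic SNP typed in diploid individuals; the allele count for the example SNP is invented, the function names are hypothetical, and only the sample sizes and SNP totals are taken from the figures quoted above.

```python
def minor_allele_frequency(alt_allele_count, n_individuals):
    """MAF for a biallelic SNP typed in diploid individuals."""
    total_alleles = 2 * n_individuals
    alt_frequency = alt_allele_count / total_alleles
    # The minor allele is whichever of the two alleles is less common.
    return min(alt_frequency, 1 - alt_frequency)


def is_rare(maf, threshold=0.01):
    """Apply the MAF < 1% definition of a rare variant."""
    return maf < threshold


# Sample size from the text: 653 cases plus 889 controls.
n_sequenced = 653 + 889

# Invented example: 12 copies of the alternate allele in the whole sample.
maf = minor_allele_frequency(12, n_sequenced)
print(f"MAF = {maf:.4f}, rare: {is_rare(maf)}")

# Share of the reported SNPs that fall below the 1% threshold.
rare_fraction = 2010 / 2718
print(f"rare SNPs: {rare_fraction:.1%} of those identified")
```

With these numbers roughly three quarters of the identified SNPs are rare, which illustrates why sequencing, rather than genotyping of common markers, is needed to capture most of the variation at the locus.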

These data suggest that there are rare variants in DISC1, and given that only a limited number of studies have so far investigated these rare variants (compared to the number looking at common variants), this seems like a promising avenue for future research.

1.3.3.2 Fusion Proteins

Initially, in the original Scottish family, it was thought that the region of chromosome eleven disrupted by the translocation was devoid of any genes and that the disruption of the gene at 1q42.1 was providing a simple loss (or gain) of function mutation through the truncated DISC1 protein (Millar et al., 2000). More recently, evidence has emerged that there is a transcript present in this region of chromosome eleven and, further, that this transcript is in the same orientation as DISC1. This has led to the suggestion that the translocation does not only disrupt DISC1 but that it may also create a fusion protein between DISC1 and DISC1FP (DISC1 Fusion Protein) (Zhou et al., 2008) (for more detail see Chapter 3).

1.3.4 Summary

There is still a great deal unknown about DISC1, but what is known regarding the location, time and duration of expression, as well as the function, makes the gene an attractive biological candidate in major psychiatric illness. Although the evidence for DISC1 involvement is not conclusive, there is a reasonable amount of genetic evidence from various candidate gene studies supporting the role of DISC1 in major psychiatric illness.


1.4 Current Research Area

Over the 2008-2009 period the Molecular Genetics Laboratory of the University of Otago Pathology Department used the South Island Bipolar Study (SIBS) cohort to undertake an investigation into the involvement of DISC1 in bipolar disorder. The research conducted in this thesis will form a part of the ongoing study of the DISC1 gene and its involvement in major psychiatric illness. Currently the interest of this research into DISC1 can be split into two main sections: (1) the analysis of single nucleotide polymorphisms (SNPs) within the gene, in particular rs11122324 and its effect on a putative microRNA binding site in the 3’ untranslated region (3’UTR) of the gene, and (2) the function of the gene in humans, specifically in, and during development of, the brain.

1.4.1 Background

1.4.1.1 The South Island Bipolar Study (SIBS)

The SIBS cohort is made up of a number of individuals recruited from the hospitals of Christchurch, Dunedin and Invercargill. Samples were collected from the identified patients throughout the South Island. The broad aim of this study was to investigate bipolar disorder on a molecular genetic level. The initial studies conducted in this cohort investigated the observed rates of affective disorders in first (and, occasionally second) degree relatives of individuals diagnosed with bipolar disorder I and II (Joyce et al., 2004).

In this study the importance of accurate diagnosis was taken very seriously, being sure to only include individuals (probands) who had previously received psychiatric treatment for bipolar disorder I and II. The assessment of affective disorders in related individuals (of the proband) was also very stringent, using the standard Diagnostic Interview for Genetic Studies (DIGS), which was undertaken by experienced health professionals. Diagnoses of family members were made based on their interviews and other medical records. The relatives were then diagnosed according to the DSM-IV criteria, for either bipolar disorder I, II, not otherwise specified or other depressive disorder. Blood samples were taken from each proband and the relatives for DNA extraction, and pedigrees were drawn for each family (Joyce et al., 2004).


The resulting cohort contained 210 probands diagnosed with either bipolar disorder I or II. Of these, 153 had at least one first-degree relative, totalling 423 individuals, of which 62% had some form of affective disorder. In a few cases, second-degree relatives were included. The final cohort numbers were 786 individuals, arranged into 140 families.

1.4.2 Micro-RNA

Over the past decade it has become apparent that there is a class of small RNA molecules known as microRNA (miRNA) that have a fundamental role in the regulation of gene expression (Guarnieri & DiLeone, 2008) through direct interaction with target gene transcripts. These elements were first discovered in the model organism Caenorhabditis elegans with a realisation that miRNAs have a role in the control of developmental timing (Lagos-Quintana et al., 2001). MicroRNAs are present in both plants and animals, and are in fact one of the largest gene groups in higher eukaryotes.

Current knowledge of the structure of these regulatory elements is that they are small, single stranded RNA molecules that range in length from eighteen to twenty-five nucleotides (Ying et al., 2008). These RNAs are coded for either within introns of other genes (where they are processed after splicing of introns has occurred) or by their own gene (Chang & Mendell, 2007).

Processing involves a number of steps. The pri-miRNA is transcribed and is then folded to form a stem-loop structure of sixty to eighty nucleotides. These stem-loop structures are cut free from the rest of the sequence by a complex containing a primary enzymatic component named Drosha. These excised hairpins are called pre-miRNAs, which contain a short stem of approximately twenty-two bases and a two base 3’ overhang. These pre-miRNAs are exported to the cytoplasm of the cell for further processing by the enzyme Dicer, which removes the loop structure of the molecule to create an eighteen to twenty-five nucleotide RNA duplex. The final step in the production of a fully mature miRNA is for the duplex to be incorporated into a RISC complex (Ying et al., 2008).

This biogenesis pathway is outlined in Figure 1.4. Only one strand of the miRNA is incorporated within the RISC complex. This is usually the primary miRNA strand, but a number of miRNAs use the alternate strand to varying degrees (Chang & Mendell, 2007).

Genetic variants that affect the binding sites of these molecules have been implicated as contributors to disease. There are two broad ways in which the miRNA associated RISC complex can down-regulate gene expression. The first is by cleavage of mRNA strands following transcription (Rhoades et al., 2002). The second is by repression of translation of mRNAs (Chang & Mendell, 2007). Both of these methods rely on the miRNA complex binding to the mRNA strand at a binding site. Cleavage of the mRNA transcript requires a perfect match between the binding site and miRNA (Chang & Mendell, 2007). This is most likely due to the fact that the better the match between the binding site and the miRNA the better the affinity, and so with a stronger bond the miRNA has a greater effect in repressing the translation of the protein.

Figure 1.4: miRNA Biogenesis. The pri-miRNA is transcribed from the gene (or intron of a gene). The pri-miRNA is processed to remove the cap (m7G) and tail (AAAn) by Drosha (green) and its co-factor Pasha (purple), creating a pre-miRNA, which is exported to the cytoplasm. Here the pre-miRNA is processed by Dicer (orange) to produce the eighteen to twenty-two base pair miRNA duplex. This is unwound and the mature miRNA (preferentially) is loaded into the RNA-induced silencing complex (RISC) (blue). This complex can now bind to the complementary mRNA and regulate gene expression. *The secondary miRNA produced is often degraded; however, sometimes it is taken into a RISC complex instead of the major miRNA and is then able to regulate gene expression. Figure adapted from Esquela-Kerscher & Slack (2006).


Variation in the sequence that makes up the binding site for the miRNA may therefore disrupt the function of the molecule through a decrease in affinity for the binding site. This would lead to an increase in the amount of protein translated from the mRNA and thus a functional variation in the expression levels of a gene, which may in turn contribute to a disease phenotype. Mutations that create a miRNA binding site have been shown to have functional effects. A novel binding site for hsa-miR-189 was identified in patients with Tourette’s syndrome as a single base substitution in the 3’UTR of SLITRK1; this resulted in gene silencing through the repression system of miRNAs (Abelson et al., 2005). Similarly, in the GRN gene, a SNP in the 3’UTR increases the affinity for miRNA-659 at a pre-existing binding site, resulting in amplified suppression of translation. This SNP therefore increased the risk of frontotemporal dementia (Rademakers et al., 2008). It would then appear likely that if a binding site is lost through mutation the opposite would be true: the protein would be expressed at higher levels and potentially have a functional effect.

1.4.3 Objectives of this Research

The broad objectives of this research project are to assess the association of major psychiatric illness with the (1;11)(q42.1;q14.3) translocation and to further elucidate the function of its disrupted gene, DISC1. More specifically, this will involve analyses to identify any variants in the vicinity of the translocation breakpoints that are contributing risk (or protection) in the major psychiatric illnesses bipolar disorder and schizophrenia. This will be achieved by conducting large scale case-control studies. These association studies will be undertaken with particular attention being paid to the SNP rs11122324, which was identified as being associated with bipolar disorder in the SIBS cohort study (Oliver, 2009) (detailed in Chapter 3).

Concerning the molecular function of DISC1, a large yeast two-hybrid screen will be conducted to identify further proteins that interact with DISC1. This protein set will then be used to formulate and test a hypothesis of a functional mechanism by which major psychiatric illness may be caused. Finally, a subset of the genes encoding the proteins identified in the yeast two-hybrid analysis will be tested for interaction with DISC1 at a genetic level, in line with the hypothesis formed. The specific aims of each analysis are detailed in the introduction of each chapter.


1.4.4 Conclusion

Research into the genetic contributions to complex diseases, such as major psychiatric illness, has been ongoing for some time but still has a long way to go. Owing to the complex nature of these disorders, the majority of contributing factors are of minor effect.

In addition, research into the function of DISC1 may provide some insight into the pathway in which the gene is acting. These pathways may elucidate the specific mechanism by which DISC1 interference or disruption can lead to phenotypes such as bipolar disorder and schizophrenia. Further understanding of this gene, and of variants within it shown to be associated with major psychiatric illness, may lead to enhanced diagnostic procedures and a better understanding of the genetic factors underlying such diseases.


Chapter 2

Materials and Methods

2.1 In Silico Analysis Tools

2.1.1 miRNA Binding Site Prediction

In silico analysis was conducted using an online folding server, the DINAmelt two-state hybridisation tool (http://dinamelt.bioinfo.rpi.edu/twostate.php). This tool hybridises a predicted mature miRNA sequence to a short length of genomic DNA. The genomic DNA can include SNPs that may alter the binding affinity of the miRNA. Once each of the alleles at the SNP has been tested, the affinities can be compared.

2.1.2 Enrichment Analysis

Enrichment analysis was conducted using the ToppGene Suite, ToppFun application (https://toppgene.cchmc.org/enrichment.jsp). This tool is used to detect functional enrichment within an input gene list compared with a randomly selected gene list of the same size. ToppFun is able to draw information from 14 annotation categories (Chen et al., 2009) and outputs results in both tabular and graphical form. The correction applied was False Discovery Rate (FDR) with a p-value cut off of 0.05. Bonferroni correction is also shown for all significant results.


2.1.3 Network Analysis

Network analysis was conducted using Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, Redwood City, CA). A basic Core Analysis was run using the Ingenuity Knowledge Base (genes only) option as the reference set and accepting both direct and indirect relationships. Results are shown in tabular form with the top networks being assigned a score which refers to the negative base ten logarithm of a p-value that measures the likelihood of the genes in a given network occurring by chance.
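The relationship between a network score and its underlying p-value can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of the IPA software; only the definition of the score as a negative base ten logarithm is taken from the text above.

```python
import math


def network_score(p_value: float) -> float:
    """Negative base-ten logarithm of a p-value (hypothetical helper)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    return -math.log10(p_value)


def p_value_from_score(score: float) -> float:
    """Invert the score back to the p-value it represents."""
    return 10.0 ** (-score)


if __name__ == "__main__":
    # A higher score corresponds to a smaller chance probability.
    for p in (0.05, 1e-10, 1e-30):
        print(p, round(network_score(p), 2))
```

Under this definition each additional unit of score corresponds to a ten-fold smaller probability of the network arising by chance.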

2.2 Data Sets

All data used in these analyses were obtained from the publicly available cohorts provided by the Wellcome Trust Case Control Consortium (WTCCC), the Molecular Genetics of Schizophrenia (MGS) collaboration and the Genetic Association Information Network (GAIN). Only cases and controls of Caucasian ancestry were included in the analyses.

2.2.1 GAIN

The cohorts provided by this network are an initiative of the Foundation for the National Institutes of Health (http://www.fnih.org) and the US National Institutes of Health, with support from various private partners (Manolio et al., 2007). Samples were selected from around the world, genotyped and then subjected to a rigorous review process to ensure that quality, diagnosis and consent met appropriate standards (Manolio et al., 2007). For the current study bipolar disorder (BPD) and schizophrenia (SCZ) were the only two disorders of the GAIN funded datasets that were utilised; specifics of these cohorts are detailed below. Controls for these cohorts were recruited by the Knowledge Network market research company (San Jose, CA) from throughout the USA and are representative of the general population.

The bipolar disorder data were obtained from a study conducted by the University of California San Diego (Dick et al., 2003). The recruited participants from the USA were of European-American and African-American ethnicities. Cases were diagnosed by an interview in the style of the Diagnostic Interview for Genetic Studies (DIGS) (Nurnberger et al., 1994) performed by trained professionals. Individuals that were diagnosed with bipolar 1 disorder, bipolar 2 disorder, bipolar not otherwise specified and schizoaffective bipolar disorder according to the DSM-IV were included in the study. Following review of each individual, 691 cases and 1081 controls were genotyped on the Affymetrix® Genome-Wide Human SNP Array 6.0.

The schizophrenia dataset was originally obtained as part of a study conducted by Evanston North-Western Healthcare (Suarez et al., 2006) from the Molecular Genetics of Schizophrenia collaboration dataset. Participants in this study were recruited from the USA and Australia and were limited to those of European or African descent. Diagnosis was again by DIGS and DSM-IV criteria for schizophrenia or schizoaffective disorder. For the current study only the European participants were included, giving a total of 1404 cases and 1442 controls (including the 1081 bipolar disorder controls) that passed review and were genotyped on the same Affymetrix® 6.0 SNP array. Hereafter within this thesis these cohorts will be referred to as GAIN bipolar disorder and GAIN schizophrenia respectively.

2.2.2 non-GAIN

A second schizophrenia dataset was made up of those individuals recruited as a part of the Molecular Genetics of Schizophrenia collaboration that were not included in the aforementioned GAIN schizophrenia dataset, and as such was designated the non-GAIN schizophrenia data (and is referred to as such within this thesis). About half of the Molecular Genetics of Schizophrenia collaboration were genotyped using the Affymetrix® 6.0 SNP array and make up the aforementioned GAIN schizophrenia cohort. The remaining individuals (1282 cases and 1365 controls) are those included in the non-GAIN cohort; these were genotyped on the Affymetrix® 6.0 SNP array at a later time.

2.2.3 Wellcome Trust Case Control Consortium (WTCCC)

This is a large multi-disorder cohort (Wellcome Trust Case-Control Consortium, 2007) including cases for seven complex diseases (bipolar disorder, type 1 diabetes mellitus, type 2 diabetes mellitus, rheumatoid arthritis, hypertension, coronary artery disease and Crohn’s disease). Of these seven, this study utilised only the data for bipolar disorder and the shared controls. A total of 3004 controls representative of the general UK population are shared across all disorders analysed in the WTCCC study. The control group is made up of 1504 individuals recruited from the 1958 British Birth Cohort and 1500 individuals recruited from the UK blood service. The 2000 bipolar disorder cases were recruited from across Great Britain and were of Caucasian descent. Genotyping of these samples was performed using the GeneChip 500K Mapping Array Set (Affymetrix).

2.2.4 Access to Datasets

The principal investigator, Associate Professor Tony Merriman, obtained permission to access all data. The WTCCC data were obtained from the European Genome-Phenome Archive (http://www.ebi.ac.uk/ega). The GAIN bipolar disorder, GAIN schizophrenia and non-GAIN schizophrenia data were obtained through the NCBI Genotypes and Phenotypes database (dbGaP) (http://www.ncbi.nlm.nih.gov/dbgap) under the following accession numbers:

• Genome-Wide Association Study of Bipolar Disorder (GAIN), dbGaP Study Accession: phs000017.v3.p1
• Genome-Wide Association Study of Schizophrenia (GAIN), dbGaP Study Accession: phs000021.v3.p2
• Molecular Genetics of Schizophrenia non-GAIN Sample (non-GAIN), dbGaP Study Accession: phs000167.v1.p1

2.3 Association Analysis of Data

2.3.1 Quality Control of Data

The GAIN and non-GAIN datasets were quality controlled prior to commencement of this study by Assistant Research Fellow Ruth Topless. Samples were excluded under the following parameters:

• Heterozygosity outside the acceptable range of 0.26-0.285
• Call rate of < 97%
• Sex ambiguity

SNPs were also filtered and were excluded based on the following:

• Call rate of < 80%
• HWE in controls of < 0.000001
• > 3 Mendelian errors
• > 2 discrepancies between duplicate samples
• Plate effects: p-value < $10^{-8}$ for one plate or < $10^{-4}$ for two plates
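The exclusion criteria listed above can be expressed as simple predicates. The following Python sketch is illustrative only: the record types and field names are hypothetical, the plate-effect criterion is omitted because it requires per-plate test results, and the sketch does not reproduce the pipeline actually used for quality control.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Per-sample summary statistics (hypothetical field names)."""
    heterozygosity: float
    call_rate: float        # proportion of SNPs called, 0-1
    sex_ambiguous: bool


@dataclass
class SnpQC:
    """Per-SNP summary statistics (hypothetical field names)."""
    call_rate: float        # proportion of samples called, 0-1
    hwe_p_controls: float
    mendelian_errors: int
    duplicate_discrepancies: int


def sample_passes(s: SampleQC) -> bool:
    # Keep samples inside the heterozygosity window, with a call rate of
    # at least 97% and no sex ambiguity.
    return (0.26 <= s.heterozygosity <= 0.285
            and s.call_rate >= 0.97
            and not s.sex_ambiguous)


def snp_passes(m: SnpQC) -> bool:
    # Keep SNPs with a call rate of at least 80%, HWE p-value in controls
    # of at least 1e-6, at most 3 Mendelian errors and at most 2 duplicate
    # discrepancies. Plate effects are not modelled in this sketch.
    return (m.call_rate >= 0.80
            and m.hwe_p_controls >= 1e-6
            and m.mendelian_errors <= 3
            and m.duplicate_discrepancies <= 2)
```

A sample or SNP failing any single criterion is excluded, which mirrors the way the lists above are worded.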

2.3.2 Population Substructure

Q-Q plots were generated in SNPmax by the inclusion of the PLINK command --adjust to the additional command line parameters option in a Case-Control analysis (done otherwise as described in Section 2.3.3.2). This method was used in two capacities: 1) to confirm that there was no apparent population substructure in the data used at a genome level, and 2) to assess for inflation due to polygenic effects across gene subsets.

Eigenstrat v1.0 (Li & Yu, 2008a) was used to evaluate the inflation factors associated with each dataset, to determine if substructure was an issue. Additionally, assessment of the first two principal components was conducted across the regions used in association analysis to further confirm that stratification was not present. This was conducted for each dataset using the Eigensoft 6.0.1 SmartPCA function (Patterson et al., 2006).
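For orientation, one widely used definition of the inflation factor is the median of the observed one degree of freedom chi-square statistics divided by the median expected under the null hypothesis (approximately 0.455). The Python sketch below assumes that definition; it is not a reproduction of the Eigenstrat calculation, and the function name is hypothetical.

```python
import statistics

# Median of a chi-square distribution with one degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic inflation factor from a list of 1-df chi-square statistics."""
    if not chi2_stats:
        raise ValueError("at least one test statistic is required")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

Values close to 1 suggest little inflation, whereas values well above 1 are usually taken to indicate stratification or other systematic bias.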

2.3.3 BC|SNPmax

SNPmax is the data handling and analysis software that was utilised to conduct the majority of the data manipulation and association analysis in this study. The software is provided by Biocomputing Platforms Ltd (http://www.bcplatforms.com/). Prior to commencement of this study data sets had undergone quality control and were uploaded in appropriate formats to SNPmax by Assistant Research Fellow Ruth Topless (Biochemistry Department, University of Otago).

2.3.3.1 Defining Gene Regions

Prior to commencing analysis of the data, marker maps were created to define the regions requiring analysis, which in this instance were specific genes. The co-ordinates of each gene according to NCBI dbSNP Build 130 (May 2009 hg 36.3) were identified using the Ensembl database (http://www.ensembl.org/index.html) and were extended to ensure that all upstream and downstream flanking sequences were included. Each marker map was created as a subset of the NCBI dbSNP Build 130 map by entering all appropriate information, including gene name, chromosome and base co-ordinates. The specific base co-ordinates of genes are given in the results section where appropriate.

2.3.3.2 PLINK Case-Control Analysis

One of the many available utilities of SNPmax is the PLINK case-control analysis tool, which was the primary analysis run in this study. PLINK is an open source software package (Purcell et al., 2007), available from http://pngu.mgh.harvard.edu/purcell/plink/index.shtml, that has been integrated into the SNPmax interface. Using this utility, two tests were conducted: a basic allelic association analysis and a Cochran-Armitage trend analysis. Analyses were set up in such a way that they only included SNPs within the defined regions (see section 2.3.3.1) that had a minor allele frequency of >0.01 and an exact Hardy-Weinberg Equilibrium p-value from controls of >0.001 (where a significant p-value represents a deviation from the null hypothesis of the expected equilibrium). Individuals were filtered so that those with a genotype missingness rate of more than 20% were excluded from further analysis.
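The two tests named above can be sketched from genotype counts using their textbook forms. The Python code below is an illustration under stated assumptions rather than PLINK's implementation: the function names are hypothetical, the trend test uses additive weights of 0, 1 and 2, and both statistics are referred to a chi-square distribution with one degree of freedom.

```python
import math


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def allelic_test(cases, controls):
    """Basic allelic chi-square test on a 2x2 table of allele counts.

    cases and controls are genotype counts (n_AA, n_Aa, n_aa), where 'a'
    is the minor allele. Returns (chi-square, p-value).
    """
    case_minor = cases[1] + 2 * cases[2]
    case_major = 2 * cases[0] + cases[1]
    ctrl_minor = controls[1] + 2 * controls[2]
    ctrl_major = 2 * controls[0] + controls[1]
    n = case_minor + case_major + ctrl_minor + ctrl_major
    denom = ((case_minor + case_major) * (ctrl_minor + ctrl_major)
             * (case_minor + ctrl_minor) * (case_major + ctrl_major))
    if denom == 0:
        raise ValueError("a row or column of the allele table is empty")
    stat = n * (case_minor * ctrl_major - case_major * ctrl_minor) ** 2 / denom
    return stat, chi2_1df_pvalue(stat)


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test; additive weights are an assumption."""
    totals = [c + k for c, k in zip(cases, controls)]
    n = sum(totals)
    r = sum(cases)
    s = n - r
    sum_wr = sum(w * c for w, c in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    denom = r * s * (n * sum_w2n - sum_wn ** 2)
    if denom == 0:
        raise ValueError("no variation in genotype or phenotype")
    stat = n * (n * sum_wr - r * sum_wn) ** 2 / denom
    return stat, chi2_1df_pvalue(stat)
```

The allelic test treats each chromosome as an independent observation, whereas the trend test works on individuals and does not rely on Hardy-Weinberg equilibrium in the combined sample, which is one reason for reporting both.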

2.3.3.3 Combining Datasets

To conduct meta-analyses the datasets used were combined to increase power. The ability to combine datasets is a function of SNPmax and requires only that the datasets be called on the same strand (strand orientation can be modified using the SNPmax convert function). The combine function allows elimination of subsets of markers and individuals, which was important in the combination of GAIN bipolar disorder and schizophrenia datasets as the bipolar disorder controls are a subset of the schizophrenia controls (duplicate controls were excluded). Only genotyped SNPs that were common to all datasets being combined were included. Once the cohorts had been combined they were analysed in the same manner as for the individual datasets as outlined in Section 2.3.3.2.
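The logic of combining cohorts described above (retain only SNPs common to all datasets, and count shared controls once) can be sketched with set operations. The data layout and function name below are hypothetical, and the sketch assumes that a control shared between cohorts carries the same identifier in each; strand alignment is assumed to have been done beforehand.

```python
def combine_cohorts(cohorts):
    """Combine cohorts given as dicts with 'snps', 'cases' and 'controls' sets.

    Returns the SNPs genotyped in every cohort together with the pooled,
    de-duplicated case and control identifiers.
    """
    if not cohorts:
        raise ValueError("at least one cohort is required")
    shared_snps = set.intersection(*(set(c["snps"]) for c in cohorts))
    # Set union keeps one copy of any individual present in several cohorts.
    cases = set().union(*(c["cases"] for c in cohorts))
    controls = set().union(*(c["controls"] for c in cohorts))
    return {"snps": shared_snps, "cases": cases, "controls": controls}


if __name__ == "__main__":
    # Toy identifiers only; these are not taken from the study data.
    bpd = {"snps": {"rs1", "rs2", "rs3"}, "cases": {"b1", "b2"},
           "controls": {"c1", "c2"}}
    scz = {"snps": {"rs2", "rs3", "rs4"}, "cases": {"s1"},
           "controls": {"c1", "c2", "c3"}}
    combined = combine_cohorts([bpd, scz])
    print(sorted(combined["snps"]), len(combined["controls"]))
```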

2.3.3.4 Gender Splitting for Sex Bias

Due to the suggestion of sex bias in associations of psychiatric illness (Hennah et al., 2003; Hashimoto et al., 2006; Chen et al., 2007; Hennah & Porteous, 2009; Schumacher et al., 2009), all of the above outlined datasets were divided into separate male only and female only data subsets. The number of individuals in each of the resulting datasets (after filtering for duplicate samples) is outlined in Table 2.1.

Table 2.1: Numbers of cases and controls in each of the gender split data subsets.

Cohort          Gender   Cases   Controls
GAIN SCZ        Male       982        668
                Female     422        774
non-GAIN SCZ    Male       824        679
                Female     355        685
GAIN BPD        Male       334        554
                Female     352        518
WTCCC BPD       Male       751       1446
                Female    1247       1492

2.3.4 Corrections for Multiple Testing

A standard False Discovery Rate (FDR) method (Benjamini & Hochberg, 1995) was used to account for the multiple SNPs and datasets being analysed. The q-value (FDR’s p-value equivalent) is calculated by the following equation:

$$\mathrm{FDR} = E\left[\frac{V}{V+S}\right] = E\left[\frac{V}{R}\right]$$

where V is the number of false discoveries, S is the number of true discoveries, R = V + S is the total number of discoveries and E denotes the expected value.

The FDR p-values were calculated for each individual analysis; that is, the correction allows for the number of SNPs tested within the cohort under examination. However, where results were significant after this correction, further correction was made for the split-sex testing by increasing the number of tests done to reflect this split.
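A minimal Python sketch of the Benjamini-Hochberg step-up adjustment is given below to make the correction concrete. The function name and the optional `n_tests` argument are hypothetical; the latter is one possible reading of "increasing the number of tests" for the split-sex analyses, not a confirmed description of the procedure used.

```python
def bh_adjust(pvalues, n_tests=None):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    n_tests overrides the number of tests used in the correction
    (an assumption, e.g. twice the SNP count for split-sex analyses).
    """
    m = len(pvalues)
    total = m if n_tests is None else n_tests
    if total < m:
        raise ValueError("n_tests cannot be smaller than the number of p-values")
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * total / rank)
        adjusted[i] = running_min
    return adjusted
```

A SNP would be declared significant when its adjusted value falls below the chosen threshold (0.05 in this study).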

2.3.5 Power Calculations

Power calculations were conducted as previously described by Johnson et al. (2001). Briefly, this method calculates the power of an analysis for a specific odds ratio based on the minor allele frequency and the size of the cohort. For a detailed explanation of this procedure and the equations used see the original paper (Johnson et al., 2001).

2.3.6 Genotyping Integrity

SNPs of interest were visually inspected for genotyping integrity using the genotype cluster plots. For the GAIN data these were available for download through dbGaP. In the case of the non-GAIN schizophrenia dataset, cluster plots were constructed using Affymetrix Genotyping Console™ software (http://www.affymetrix.com) using a randomly selected 250 individuals.

2.4 Meta-Analysis of SNPs Reported in Literature

2.4.1 Haploview and Linkage Disequilibrium Analysis

SNPs of interest were analysed for inter-marker linkage disequilibrium (LD). This analysis was conducted using Haploview 4.1 software (download available for free from the Broad Institute at http://www.broad.mit.edu/mpg/haploview/download.php). Data were imported to Haploview from either the GAIN and non-GAIN datasets used in the initial analyses or from the HapMap project phase one and two (data obtainable through the HapMap homepage http://www.hapmap.org/). Once imported, the SNPs of interest were visualised using a haploplot with r² values to assess their degree of linkage disequilibrium within populations.
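The r² measure referred to above can be illustrated for a pair of biallelic SNPs. The Python sketch below assumes that phased haplotype counts are already available; Haploview itself estimates haplotype frequencies from genotype data, which is not reproduced here, and the function name is hypothetical.

```python
def ld_r_squared(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """r-squared between two biallelic SNPs from phased haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first SNP
    and allele B at the second, and so on.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = n_AB / n - p_A * p_B  # disequilibrium coefficient D
    return d * d / denom
```

An r² of 1 indicates that the two SNPs carry the same information, while a value near 0 indicates that they are inherited essentially independently in the sample.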

2.4.2 STATA

STATA was used to conduct meta-analyses of SNPs using the data from this study and data already available from the literature. Analyses were performed first with only the data from the literature and then with the addition of the data from this study. Data were uploaded to STATA via a tab delimited text file (example seen in Table 2.2).

Once uploaded, the data were analysed to obtain a Mantel-Haenszel odds ratio with associated p-value, as well as the B-D p-value (a measure of heterogeneity). If the B-D p-value reached significance (<0.05) then the test was re-run with randomisation to account for the heterogeneity. This randomisation allowed for the odds ratio and p-value to be altered to account for the differences between the sample sets. Results were given in both tabular and graphical form.

Table 2.2: An example of the input file for STATA. Case1 and Control1 refer to the number of minor alleles in cases and controls respectively, while Case0 and Control0 refer to the number of major alleles.

Trial   Trial Name   Year   Case1   Control1   Case0   Control0
1       Example      2009     211        315    1353       1451
2       ExampleA     2009      84         58     602        624
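To make the pooled estimate concrete, the Mantel-Haenszel odds ratio can be computed directly from rows laid out as in Table 2.2. The Python sketch below is a hand calculation under the usual fixed-effect formula, not the STATA routine used in this study; the function name is hypothetical, and neither the p-value nor the heterogeneity test is reproduced.

```python
def mantel_haenszel_or(tables):
    """Pooled odds ratio for the minor allele across studies.

    Each table is (case1, control1, case0, control0): minor-allele counts in
    cases and controls, then major-allele counts in cases and controls.
    """
    numerator = 0.0
    denominator = 0.0
    for case1, control1, case0, control0 in tables:
        n = case1 + control1 + case0 + control0
        if n == 0:
            raise ValueError("empty study table")
        numerator += case1 * control0 / n
        denominator += case0 * control1 / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these tables")
    return numerator / denominator


if __name__ == "__main__":
    # The two example rows of Table 2.2.
    example = [(211, 315, 1353, 1451), (84, 58, 602, 624)]
    print(round(mantel_haenszel_or(example), 3))
```

Each study contributes in proportion to its size, so the larger example trial dominates the pooled estimate.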

2.5 Epistasis Analysis

2.5.1 Imputation

Where imputation of SNPs was necessary, the entire gene containing each SNP was imputed to provide an adequate haplotype set for the imputation to run on. Samples were first pre-phased using the SHAPEIT package in SNPmax, specifying dbSNP build 135 as the marker map. SHAPEIT (Delaneau et al., 2013) is a software package that infers haplotypes in individuals prior to imputation. These data were then imputed through SNPmax using the IMPUTE2 package (Howie et al., 2009), this time specifying the chromosome and range of the gene to be imputed as the marker set and the 1000 Genomes phase I integrated variant set v3 (MacGT1) as the reference population. All imputed SNPs that were subsequently used in analysis had to have a Hardy-Weinberg p-value in controls of >0.001 (where a significant p-value represents a deviation from the null hypothesis of the expected equilibrium) and also a minor allele frequency, in the control cohort, that was comparable to that of the published Caucasian frequencies.

2.5.2 Statistical Analyses

Logistic regression analysis was conducted using the Epicalc package (Chongsuvivatwong, 2012) (available from http://CRAN.R-project.org/package=epicalc) in R (R Core Team, 2013) (free download at http://www.R-project.org/). A likelihood ratio test was used to compare the null model of no interaction between the two SNPs with the alternate model of an interaction, accounting for the main effects of each. The equation below was used, where L(m*) denotes the likelihood of the respective model and ll(m*) the natural log of the model's likelihood.

$$lr = -2\ln\left(\frac{L(m_1)}{L(m_2)}\right) = 2\left(ll(m_2) - ll(m_1)\right)$$
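As an illustration of how this statistic yields a p-value, the Python sketch below takes the two log-likelihoods as inputs, with m1 read as the null model and m2 as the alternate model. It assumes that the interaction adds a single parameter, so that the statistic is compared with a chi-square distribution on one degree of freedom; this is an assumption for illustration, the function name is hypothetical, and the Epicalc routine itself is not reproduced.

```python
import math


def likelihood_ratio_test(ll_null: float, ll_alt: float):
    """Return (lr, p) for nested models differing by one parameter.

    ll_null and ll_alt are the natural-log likelihoods of the null (m1)
    and alternate (m2) models respectively.
    """
    lr = 2.0 * (ll_alt - ll_null)
    # Numerical noise can make lr marginally negative for nested models.
    lr = max(lr, 0.0)
    p = math.erfc(math.sqrt(lr / 2.0))  # chi-square upper tail, 1 df
    return lr, p
```

For example, log-likelihoods of -100.0 and -98.0 give a statistic of 4, which corresponds to a p-value just under 0.05 on one degree of freedom.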

2.6 Primer Design and Synthesis

Primers for use in PCR amplification (refer to section 2.7) and sequencing (refer to section 2.9) were designed for use in both gene specific and plasmid specific reactions. A full list of all primers used in this study can be found in Appendix D.

All primers were designed from the mRNA, cDNA or plasmid sequences available from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/gene). Where multiple isoforms existed the longest transcript variant was used. Primers were designed manually to be between 18 and 25 bp, with a predicted annealing temperature of between 60°C and 70°C. On occasion specific tags were then added to the priming sequence. Oligonucleotides were synthesised by Integrated DNA Technologies and were received after standard desalting. All primers were resuspended to stock concentrations of 100 µM and to working stocks of 5 µM, both of which were stored at -20°C.
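The design constraints and the stock dilution described above can be checked with a short Python sketch. The text does not state which formula was used to predict temperatures, so the Wallace rule (2°C per A/T and 4°C per G/C) is used here purely as a rough stand-in; all function names are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule (an assumption)."""
    s = seq.upper().replace(" ", "")
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def primer_within_limits(seq: str, min_len=18, max_len=25,
                         temp_min=60, temp_max=70) -> bool:
    """Check the length and predicted-temperature window given in the text."""
    length = len(seq.replace(" ", ""))
    return (min_len <= length <= max_len
            and temp_min <= wallace_tm(seq) <= temp_max)


def dilution_volumes(stock_um: float, working_um: float, final_ul: float):
    """Volumes (stock, diluent) in microlitres, from C1*V1 = C2*V2."""
    if working_um > stock_um:
        raise ValueError("working concentration exceeds stock concentration")
    stock_ul = working_um * final_ul / stock_um
    return stock_ul, final_ul - stock_ul
```

With the concentrations given in the text, a 100 µL working stock at 5 µM requires 5 µL of the 100 µM stock and 95 µL of diluent, a 1 in 20 dilution.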

2.7 Polymerase Chain Reaction (PCR)

The methods for the Polymerase Chain Reaction (PCR) are based on those originally described by Mullis & Faloona (1987).

2.7.1 Optimisation

Primer pairs were optimised for annealing temperature by initially setting the temperature to a few degrees below the predicted melting temperature of the oligo. This was altered up or down until a satisfactory product was obtained. For primers that were particularly difficult to optimise, a gradient of temperatures around the predicted melting temperature was used. The duration of the extension step was also optimised between reactions, where the general rule applied was one minute for fragments of up to 1 kb and four minutes for fragments over 1 kb.

2.7.2 Standard Reaction Conditions

PCR reactions contained 1X PCR buffer with 1.5 mM MgCl2 (Roche), 200 µM dNTPs (made up in equal parts from Roche dATP, dCTP, dGTP and dTTP), 0.5 µM primers (forward and reverse), 0.1 U/µL AmpliTaq DNA Polymerase (Roche) and 20 ng of template DNA. PCR volumes ranged from 10-50 µL and were prepared and run in 200 µL thin walled PCR tubes with flat lids.

The standard PCR programme consisted of an initial 94°C denaturation step for four minutes, followed by 35 cycles of: denaturation at 94°C for 30 seconds, annealing at primer dependent temperatures (59-65°C) for 30 seconds and extension at 72°C for one to four minutes. PCR reactions were done in MJ Research thermal cyclers with heated lids.

2.7.3 Hot Start PCR

In an attempt to minimise off target fragments, reactions sometimes used a hot start approach. Reactions contained FastStart High Fidelity 1X reaction buffer (Roche) with 1.8 mM MgCl2 (Roche), 200 µM dNTPs, 0.5 µM primers (forward and reverse), 0.1 U/µL FastStart High Fidelity DNA Polymerase (Roche) and 20 ng of template DNA. Reaction volumes ranged from 10-50 µL and were prepared and run in 200 µL thin walled PCR tubes with flat lids.

2.7.4 Long PCR

This method was used for longer fragments. Reactions contained Expand High Fidelity 1X reaction buffer (Roche) with 1.5 mM MgCl2 (Roche), 200 µM dNTPs, 0.5 µM primers (forward and reverse), 0.1 U/µL Expand High Fidelity DNA Polymerase (Roche) and 20 ng of template DNA. Reaction volumes ranged from 10-50 µL and were prepared and run in 200 µL thin walled PCR tubes with flat lids.


2.7.5 Touch Down PCR

This programme was used in conjunction with the standard, hot start and long (Expand) PCR protocols. The PCR programme consisted of an initial 94°C denaturation step for four minutes, followed by six cycles of: denaturation at 94°C for 30 seconds, annealing for 30 seconds with the temperature starting 5°C above the desired annealing temperature and decreasing by 1°C each cycle, and extension at 72°C for four minutes. Following the initial six cycles, 29 further cycles of denaturation at 94°C for 30 seconds, annealing at the desired temperature for 30 seconds and extension at 72°C for four minutes were conducted without delay. PCR reactions were done in MJ Research thermal cyclers.
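The annealing schedule of the touch down programme can be written out cycle by cycle. The Python sketch below follows a literal reading of the description above (six cycles starting 5°C above the desired temperature and dropping 1°C per cycle, then 29 cycles at the desired temperature); the function name and parameters are hypothetical.

```python
def touchdown_annealing_temps(desired_c: float, start_offset_c: float = 5.0,
                              step_c: float = 1.0, touchdown_cycles: int = 6,
                              standard_cycles: int = 29):
    """Annealing temperature in degrees C for every cycle of the programme."""
    temps = [desired_c + start_offset_c - step_c * cycle
             for cycle in range(touchdown_cycles)]
    temps.extend([desired_c] * standard_cycles)
    return temps


if __name__ == "__main__":
    schedule = touchdown_annealing_temps(52)
    print(schedule[:6], len(schedule))
```

Under this reading the sixth touch down cycle already anneals at the desired temperature, giving 35 cycles in total, the same number as in the standard programme.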

2.8 Gel Electrophoresis

2.8.1 Agarose Gel Electrophoresis

Visualisation of PCR products was conducted using 0.8-2.0% agarose horizontal submarine slab gels containing 0.5 µg/mL ethidium bromide, run in 0.5X TBE and ethidium bromide buffer (45 mM Tris, 45 mM H3BO3 (boric acid), 1.2 mM Na2EDTA and 0.5 µg/mL ethidium bromide). Samples were mixed with 0.1 volumes of loading dye (15% Ficoll and Orange G (Sigma) to produce colour). Gels were electrophoresed at 10 V/cm using a DC power supply until marker separation was adequate. Visualisation was carried out on an ultraviolet transilluminator (Ultra-Lum, Inc.) and images were captured using a Panasonic CCTV video camera and Scion Image software.

2.8.2 Polyacrylamide Gel Electrophoresis

Proteins were separated using vertical SDS polyacrylamide gel apparatus (Hoefer). The separating gel was made by adding 40% acrylamide solution (37.5:1 acrylamide:bis-acrylamide) (BioRad) at 8% w/v to 375 mM Tris-HCl (pH 8.8), 0.1% v/v SDS and 8% v/v sucrose. The separating gel was polymerised using a 1/20 volume of 10% ammonium persulfate (APS) and a 1/4000 volume of TEMED. A stacking gel comprising 4% w/v of the same acrylamide solution in 123 mM Tris-HCl (pH 6.8) and 0.1% v/v SDS was polymerised using a 1/25 volume of 10% APS and a 1/2500 volume of TEMED.


Protein samples were denatured prior to loading by addition of protein denaturing buffer (50 mM Tris-HCl (pH 6.8), 2% SDS, 10% glycerol, 1% β-mercaptoethanol, 12.5 mM EDTA and Orange G to provide colour) and heating to 99°C for five minutes. Samples were loaded along with molecular weight marker SeeBlue® (Invitrogen) and IRDye (680/800) protein marker (Li-Cor) and run at 25 mA for between one and two hours (or until sufficient separation of the marker was visible) in protein running buffer (250 mM glycine, 25 mM Tris, 0.001% w/v SDS).

2.9 DNA Sequencing

2.9.1 Pre-Sequencing Template Purification

PCR product was treated prior to sequencing to remove excess dNTPs and primers by adding 0.7 µL of 5X sequencing buffer (0.4 M Tris-HCl pH 9.0, 10 mM MgCl2), 4 U Exonuclease I (BioLabs) and 0.8 U shrimp alkaline phosphatase (USB) to 2 µL of PCR product in a total volume of 5 µL. These reactions were incubated at 37°C for 15 minutes followed by 80°C for 15 minutes in a thermal cycler (MJ Research, Inc.).

2.9.2 Fluorescent Cycle Sequencing

The sequencing reaction was set up by adding 1.6 µmol of primer, 0.7 µL 5X sequencing buffer and 0.3 µL Big Dye Ready Reaction Mix (version 3.1, ABI) in a final volume of 10 µL. Samples were put through 25 cycles of 96°C for 10 seconds, 50°C for five seconds and 60°C for four minutes in a thermal cycler.

Sequence primers used included the primers that had been used in the original PCR and, in the case of longer sequences, nested primers (these were primers internal to the forward or reverse PCR primers).

2.9.3 Re-Suspension and Running

DNA was precipitated by addition of 62 µL precipitation mix (77.5% EtOH and 10 mM NaAc) to each sample. The supernatant was removed following centrifugation at 13000 rpm in a microfuge for five minutes. Washing of the pellet was then completed with 100 µL of 80% ethanol. Precipitated DNA was dried in a PCR block at 37°C for 10-20 minutes.

Samples required resuspension in 20 µL of Hi-Di formamide (Applied Biosystems) and heating to 96°C for three minutes prior to running. Samples were run for up to 120 minutes (depending on length) on an ABI Prism 310 Genetic Analyzer capillary sequencer (Applied Biosystems) and analysed using ABI PRISM sequencing software. Alternatively they were sent to be analysed by the University of Otago Genetic Analysis Services on an ABI 3730xl DNA Analyser.

Visual inspection of sequences was carried out using the freeware 4-Peaks (available from http://nucleobytes.com/index.php/4peaks), and sequences were aligned to the reference using BLAST (Basic Local Alignment Search Tool) via NCBI (http://blast.ncbi.nlm.nih.gov/Blast.cgi).

2.10 Yeast Methods

2.10.1 PCR of Yeast Colonies

DNA was not purified prior to PCR amplification; instead it was extracted from yeast colonies by picking a small amount of the colony from the plate with a sterile pipette tip and resuspending it in 20 µL of 20 mM NaOH. This solution was then made up to 100 µL with sterile deionised water. This working concentration of DNA was used at 1 µL per 10 µL reaction as per the protocol described in section 2.7.

2.10.2 Lithium Acetate Transformation

This method has been adapted from Gietz et al. (1995) and is a simple but robust method of transforming small DNA fragments into yeast. It was used to obtain large numbers of single colonies of the baits to be screened for interaction. Each individual transformation contained final concentrations of 33% w/v polyethylene glycol 4000 (PEG 4000), 100 mM lithium acetate (LiOAc), 250 µg/mL heat denatured carrier DNA and approximately 20 ng of carrier plasmid.


2.10.2.1 Preparation of Vectors

Plasmids used in the LiOAc transformations were linearised with BamHI restriction enzyme (New England Biolabs). A standard digest consisted of 2 µL of 1X NEBuffer 3 (New England Biolabs), 0.2 µL of 100 µg/mL Bovine Serum Albumin (New England Biolabs), 100 units of BamHI (stock was provided by New England Biolabs at 100 000 units/mL) and approximately 100 ng of plasmid (either pGBAE-B for baits or pACT2 for preys) made up to 15 µL with sterile water. The reaction was incubated at 37°C for 18 hours.

2.10.2.2 Transformation

To obtain enough cells for eight transformations (the method can be scaled up or down to suit needs), 0.7 mL of YPAD was inoculated with the yeast host to be transformed (i.e. AH109 for baits or Y187 for preys) and allowed to grow with shaking (250 rpm) overnight at 30°C. The following morning the culture was diluted in YPAD to a total of 10 mL and allowed to grow under the same conditions for a further four to five hours. Cells were then harvested by centrifugation (five minutes at 3000 rpm), resuspended in 1 mL of 100 mM LiOAc and transferred to a microfuge tube. Two cycles of washing were then carried out using 1 mL of 100 mM LiOAc, finishing with centrifugation to pellet the cells and removal of the supernatant. The following were then carefully layered onto the pelleted cells, in order: 240 µL of a 50% (w/v) solution of PEG 4000, 36 µL of 1 M LiOAc, 50 µL of 2 mg/mL heat-denatured carrier DNA (salmon sperm) and 2.2 µL of 10 ng/µL BamHI linearised plasmid DNA (pGBAE-B for baits and pACT2 for preys). This was then mixed well by vortexing and aliquoted in 32 µL volumes. Following addition of 4 µL of PCR product, reactions were placed in a thermal cycler and incubated at 30°C for 30 minutes, 42°C for 25 minutes and 30°C for one minute. The resulting transformations could then each be diluted with 100 µL of sterile water and spread over selective plates.

2.10.3 Replica Plating

Transfer of colonies from one selective medium to another without disruption of colony placement on the plate is a technique known as replica plating. This process used an 83 mm block, designed to fit 90 mm plates (or a 138 mm block in the case of 140 mm plates) (ReplicaTech Inc), overlaid with squares of sterile cotton velvet. A single square of velvet is placed over the block and secured by an outer ring before the plate containing the colonies is pressed down lightly onto the velvet. A second clean plate is then pressed onto the same velvet surface to pick up the colonies left by the previous plate. It is important to mark the orientation of the plates so that comparisons of individual colonies can be made.

2.10.4 Storage of Yeast

When not in use, yeast strains were stored frozen at -70°C in a 15% glycerol solution. This was achieved by culturing cells overnight in YPAD broth and then aliquoting into 1.5 mL tubes in equal part with 30% glycerol solution.
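The 15% final glycerol concentration follows directly from mixing equal volumes of culture and 30% glycerol. A minimal sketch of the dilution arithmetic is given below; the 0.5 mL volumes are an assumption for illustration, since the text specifies only equal parts.

```python
def final_percent(stock_percent, stock_volume_ml, diluent_volume_ml):
    """Final concentration (%) after mixing a stock with a diluent lacking the solute."""
    return stock_percent * stock_volume_ml / (stock_volume_ml + diluent_volume_ml)


# ASSUMPTION: 0.5 mL of 30% glycerol mixed with 0.5 mL of overnight culture.
glycerol_percent = final_percent(30.0, 0.5, 0.5)
print(f"final glycerol: {glycerol_percent:.0f}%")
```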

2.11 Yeast Two-Hybrid Methods

2.11.1 Construction of Baits and Preys

The preys that were used came in the form of a universal plasmid library that had been pre-transformed into the Y187 yeast host strain (see section 2.17.4.1). A total of four overlapping baits were used to encompass the full length of the DISC1 transcript; these had been designed, constructed and tested for auto-activation prior to commencement of the current study in the following manner.

2.11.1.1 Amplification of Baits

Reactions were set up as hot start reactions (see section 2.7.3) and run using a touch down programme with a final annealing temperature of 52°C (see section 2.7.5). Bait fragments were amplified from QUICK-Clone human cDNA (Clontech), using primers designed to contain a gene specific region of approximately 20 bp at the 3’ end and a tag at the 5’ end.

The tags used are referred to as A1 in the forward primer and A2 in the reverse; these tags correspond to the cloning sites in the plasmid vectors, allowing for homologous recombination (via Gap Repair Cloning, see 2.11.1.2) between the bait PCR fragments and the plasmid vector being used. For later application these tags also include the attB1 and attB2 sites used for cloning into the Gateway® Vector system (see section 2.13).

A1: 5’ GAA TTC ACA AGT TTG TAC AAA AAA GCA GGC TGG
A2: 5’ GTC GAC CAC TTT GTA CAA GAA AGC TGG GTG

Following agarose gel electrophoresis (see section 2.8.1) to confirm fragment length, the PCR product was used, without further purification, in yeast transformations.
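The primer layout described above (a 5’ tag followed by roughly 20 bp of gene specific sequence) can be illustrated with a short sketch. The tag sequences are those given in the text; the example fragment is hypothetical and is not a DISC1 sequence, and the function names are illustrative. The sketch does not check reading frame, melting temperature or primer quality.

```python
A1_TAG = "GAATTCACAAGTTTGTACAAAAAAGCAGGCTGG"  # forward tag, from the text
A2_TAG = "GTCGACCACTTTGTACAAGAAAGCTGGGTG"     # reverse tag, from the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tagged_primers(fragment, gene_specific_len=20):
    """Return (forward, reverse) primers written 5'->3', each as tag + gene specific region."""
    fragment = fragment.upper()
    if len(fragment) < gene_specific_len:
        raise ValueError("fragment is shorter than the gene specific region")
    forward = A1_TAG + fragment[:gene_specific_len]
    reverse = A2_TAG + reverse_complement(fragment[-gene_specific_len:])
    return forward, reverse


# HYPOTHETICAL fragment used purely for illustration.
example_fragment = "ATGGCTAGCTAGGATCCGATCGATCGTACGTAGCTAGCTAGGCTAGCTAACGTTAGCCTGA"
forward_primer, reverse_primer = tagged_primers(example_fragment)
print(forward_primer)
print(reverse_primer)
```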

2.11.1.2 Gap Repair Cloning

Baits were inserted into BamHI-linearised vectors (pGBAD-B) by gap repair cloning as described by Oldenburg et al. (1997). In short, the A1 and A2 sequences added to the baits by PCR were also present on each side of the BamHI cut site in the vector. When both the linearised vector and the PCR product were co-transformed into a yeast host, the host employed gap repair to join the PCR product into the vector by homologous recombination.

To ensure that this process was successful the transformed cells were plated on selective -trp dropout medium, which selects for the cells containing the pGBAD-B vector, and with a low adenine concentration so that the recombinant clones can be identified by their red colour. Each red colony was then confirmed to possess the insert by PCR with the A1 and A2 tagged primers.

2.11.2 Library Screen

The bait constructs, hosted in AH109 (see section 2.17.4), were grown overnight on -trp dropout plates. The resulting colonies were used to inoculate aliquots of 5.6 mL SD+C +Ura/+low Ade and were left to grow overnight at 30°C and 2500 rpm. Simultaneously, an aliquot of the library (see section 2.17.4) (contained within Y187) was cultured under the same conditions in 200 mL of YPAD.

The overnight cultures were grown to saturation and approximately 5 × 10⁸ yeast cells from the library (4 mL) were mixed with the same number of cells from each of the bait cultures (5.6 mL). Each mixture of cells was pelleted by centrifugation and resuspended in a small amount of the supernatant. The resulting cell suspensions were each plated onto a large (140 mm) YPAD plate and incubated at 30°C for six to eight hours to allow mating between the two strains to occur.

Yeast were washed from the plates with sterile water into final volumes of 8 mL. Using 100 µL of this cell suspension, a serial dilution to 10⁴-fold was performed. From these dilutions, 100 µL of the 10⁴-fold dilution was plated onto -leu/-trp plates to identify the number of diploids formed. The undiluted cell suspension was plated onto -leu/-trp/-his to select for diploids with possible interactions. All of these plates were incubated at 30°C for three to four days.

After the incubation period, colonies present on the serial dilution plates were counted to give estimates of the efficacy of mating. A number of the undiluted plates were used to replica plate (see section 2.10.3) onto -leu/-trp/-ade plates to check for activation of a second reporter gene. These plates were incubated for a further two days.
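The estimate of diploid number from the dilution plates can be sketched as follows. The colony count is hypothetical; the 8 mL wash volume, 100 µL plated volume and 5 × 10⁸ input cells are from the text, and the dilution is assumed to be 10⁴-fold. Expressing mating efficiency as diploids per input cell of one parent is one common convention and is an assumption here, not necessarily the definition used in the study.

```python
def diploids_recovered(colonies, dilution_fold, plated_ul, total_ul):
    """Estimate total diploids in the wash from colonies on one dilution plate."""
    return colonies * dilution_fold * (total_ul / plated_ul)


def mating_efficiency(diploids, input_cells):
    """Fraction of input cells of one parent that formed diploids (ASSUMED definition)."""
    return diploids / input_cells


# HYPOTHETICAL count of 150 colonies on the -leu/-trp dilution plate.
diploids = diploids_recovered(colonies=150, dilution_fold=10**4,
                              plated_ul=100, total_ul=8000)
efficiency = mating_efficiency(diploids, 5e8)
print(f"diploids: {diploids:.2e}, efficiency: {efficiency:.1%}")
```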

2.11.3 Prey Identification

Preys were identified, from extracted DNA, using PCR and sequencing followed by BLAST analysis. DNA was extracted and amplified from the individual yeast clones as is described in section 2.10.1 using primers 5’ AD-LD1 and 3’ AD-LD1, and was then sequenced using the standard ABI sequencing methods (see section 2.9).

5’ AD-LD1: CTATTCGATGATGAAGATACCCCACCAAACCC
3’ AD-LD1: GAACTTGCGGGGTTTTTCAGTATCTACGATT

Once sequence data were available, a Perl script (written by a summer student, Cameron McAlister, in 2006) was used to automate a BLAST search to identify the best gene match for each sequence from the human nucleotide database (maintained by NCBI) and aggregate relevant information about total hits. Results from this application are provided as an Excel spreadsheet. This spreadsheet was examined for any sequences that were unable to be called, and these were manually searched using the BLAST algorithm with modified parameters.
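The original Perl script is not reproduced here. As a rough illustration of the summarising step only, the sketch below picks the highest-scoring hit per query and flags queries without hits for manual searching. It assumes BLAST results are already available as tab-separated text in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); the clone and subject names are hypothetical.

```python
import csv
from collections import defaultdict


def summarise_blast(tabular_text, query_ids):
    """Return {query: (best_subject, best_bitscore, n_hits)}, or None where no hit exists."""
    hits = defaultdict(list)
    for row in csv.reader(tabular_text.splitlines(), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, comment or malformed lines
        hits[row[0]].append((row[1], float(row[11])))

    summary = {}
    for query in query_ids:
        query_hits = hits.get(query, [])
        if query_hits:
            subject, bitscore = max(query_hits, key=lambda hit: hit[1])
            summary[query] = (subject, bitscore, len(query_hits))
        else:
            summary[query] = None  # uncalled sequence: flag for a manual search
    return summary


# HYPOTHETICAL clone names, subjects and scores, for illustration only.
example = "\n".join([
    "clone_01\tsubject_A\t99.1\t450\t4\t0\t1\t450\t100\t549\t0.0\t810",
    "clone_01\tsubject_B\t91.0\t300\t27\t0\t1\t300\t50\t349\t1e-110\t400",
])
result = summarise_blast(example, ["clone_01", "clone_02"])
print(result)
```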

2.12 Bacteriological Methods

2.12.1 Competent Escherichia coli Preparation

A 5 mL overnight culture of SOB containing E. coli from either a frozen stock or a plate-grown culture was used to inoculate a 100 mL SOB culture. This large culture was incubated at 30°C until the optical density reached 0.45-0.55 at 500 nm. Divided into two aliquots, the culture was chilled for 10-15 minutes on ice, followed by pelleting at 2500 rpm for 15 minutes at 4°C.

After discarding the supernatant the pellet was resuspended in 33 mL of chilled, filter sterilised RF1 buffer (100 mM KCl, 50 mM MnCl2.4H2O, 30 mM potassium acetate, 15% w/v glycerol, 10 mM CaCl2.2H2O, pH 5.8) and was left on ice for 15 minutes. Cells were again pelleted and the supernatant was discarded. Cells were then resuspended in 8 mL of chilled, filter sterilised RB2 buffer (10 mM MOPS (3-(N-morpholino)propanesulfonic acid), 10 mM KCl, 75 mM CaCl2.2H2O, 15% w/v glycerol, pH 6.8). A final incubation on ice for 15 minutes was completed whilst the cells were aliquoted into pre-chilled 1.5 mL sterile tubes. Aliquots were flash frozen in liquid nitrogen and then stored at -70°C.

2.12.2 Plasmid DNA Preparation (Miniprep Method)

Overnight cultures of 5 mL TB broth supplemented with appropriate antibiotics for selection and inoculated with E. coli (DH5α) containing the plasmid to be purified were incubated at 37°C in a shaking incubator (250 rpm).

The culture was pelleted at 1800 g for five minutes and was then resuspended in 180 µL of resuspension solution (50 mM Tris-HCl pH 7.5, 10 mM EDTA, 10 mg/mL RNase A) and transferred to a 1.5 mL microfuge tube. The resuspended cells were mixed by inversion with 200 µL of cell lysis solution (0.2 M NaOH, 1% SDS), then mixed again following the addition of 200 µL of neutralisation solution (2.55 M potassium acetate pH 4.8). Following centrifugation at 16000 g for five minutes the supernatant was removed to a fresh microfuge tube and 1 mL of DNA purification resin (10 mg/mL diatomaceous earth (Sigma D5384), 50 mM Tris-HCl pH 7.0, 20 mM EDTA, 4 M guanidine thiocyanate (Sigma G6639)) was added and mixed by inversion. The suspension was drawn through a syringe attached to a Magic mini-prep column (Promega A7211) inserted into a vacuum manifold. The column was washed with 2 mL of column wash solution (100 mM NaCl, 10 mM Tris-HCl and 2.5 mM EDTA in 50% ethanol) and 2 mL of 80% ethanol. Once dry, the column was transferred to a microfuge tube and was spun for five minutes at 16000 g to remove any remaining ethanol. The column was transferred to a fresh microfuge tube and 50 µL of hot sterile deionised water was aliquoted into the column. The column was spun for 30 seconds to expel the DNA solution into the microfuge tube.

2.12.3 Plasmid DNA Preparation (Midiprep Method)

Where larger volumes of purified plasmid were required, a Midiprep sized plasmid preparation was done. Either Promega PureYield™ or Macherey-Nagel NucleoBond® Xtra Midi Plus kits were used. Preparations were carried out according to the kit manual in both instances.

2.12.4 Transformation of Competent Cells

Competent cells were thawed on ice before 40 µL was added to a pre-chilled PCR tube containing between 0.2 and 1 µL of the plasmid DNA to be transformed. This reaction was incubated in a thermal cycler at 0°C for 30 minutes followed by a heat shock to 42°C for 30 seconds and returned to 0°C. Cells were subsequently added 1:5 to SOC in a microfuge tube and incubated with gentle shaking for one hour at 37°C. Between 10 µL and 200 µL (depending on the predicted transformation efficiency of the plasmid) was then spread onto an LB plate containing appropriate selection and left to grow overnight at 37°C.

2.13 Gateway® Vector System

The Gateway® vector system (Invitrogen) provides an elegant and simple way to move a gene fragment between various vectors for functional analyses (Hartley et al., 2000) (see Figure 2.1). The system takes its basis from the site-specific recombination abilities of the bacteriophage lambda. The system has two stages: an entry BP reaction and a transfer LR reaction. The BP reaction is the cloning of a gene of interest into an entry clone as described below (section 2.13.2). This reaction requires that the gene to be inserted is flanked by attB sites and is cloned into a vector that contains attP sites (hence BP reaction). In the Gateway® entry vectors these attP1 and attP2 sites are present at the terminal ends of a cassette that contains a negative selection marker (ccdB). During the BP reaction this selection marker is replaced by the gene of interest via recombination, resulting in the formation of attL sites, which allow for subsequent transfer to the various attR containing destination vectors by a second round of recombination known as the LR reaction (section 2.13.3).

Figure 2.1: The Gateway® Cloning system. The figure shows the mechanism of the BP and LR reactions, resulting in entry clones and destination clones respectively.

2.13.1 PCR from cDNA

Primers were designed to amplify full-length transcripts of human genes from Human Adult Brain First Strand cDNA (Stratagene) and, on occasion, from cloned plasmids containing the gene. These primers included the attB1 and attB2 tags, to allow cloning by recombination, as well as a Kozak sequence.

Forward Primer Tag: 5’ GAATTCACAAGTTTGTACAAAAAAGCAGGCTGG 3’
Reverse Primer Tag: 5’ GTCGACCACTTTGTACAAGAAAGCTGGGTG 3’

Amplification was conducted by PCR (see section 2.7) and fragment length was checked using agarose gel electrophoresis (see section 2.8.1).

2.13.2 Entry Vector Cloning Reactions (BP Reactions)

The amplified gene cDNA sequences were cloned into pDONR™201 (see section 2.17.7 for vector map) via a BP Clonase™ reaction.

A single reaction (that was scaled up as necessary) consisted of 0.2 µL of 5X BP buffer, 0.2 µL of BP Clonase™ enzyme, 25 ng of pDONR™201 vector and 0.5 µL of the tagged cDNA PCR product. The reaction was incubated at 25°C in a thermal cycler for 16 hours. The reaction was then transformed (as described in section 2.12.4) into 40 µL of either DH5α or HB101 competent cells and plated on LB agar containing kanamycin (25 µg/mL).

Resulting colonies were checked for insert presence via PCR using the DONR1 and DONR2 primers and were then sequence verified by DNA sequencing.

DONR1: 3’ TCGCGTTAACGCTAGCATGGATCTC 5’
DONR2: 3’ TGTAACATCAGAGATTTTGAGACAC 5’

2.13.3 Destination Vector Transfer (LR Reactions)

Once inserted into the entry vector the cDNA sequence can be easily transferred to multiple destination vectors via the LR Clonase™ reaction (see section 2.17.7 for vector maps).

Each reaction contained 0.2 µL of 5X LR buffer, 0.2 µL of LR Clonase™ enzyme, 5 ng of purified recombinant pDONR™201 and 15 ng of the required destination vector. This reaction was incubated in a thermal cycler at 25°C for 16 hours. The reaction was transformed into 40 µL of either DH5α or HB101 competent cells and plated on LB agar containing the appropriate antibiotic selection.

Inserts of resulting colonies were checked via PCR using the gene specific primers.

2.14 Mammalian Cell Culture

2.14.1 Cell Maintenance and Storage

Cells were taken from frozen storage, thawed quickly and diluted into fully supplemented media to minimise damage by DMSO. Cells were pelleted by a five-minute centrifugation at 250 g and then resuspended in fresh supplemented DMEM to ensure all DMSO was removed.

All cell lines were maintained in 75 cm² flasks in DMEM supplemented with foetal calf serum (5% v/v), penicillin (100 U/mL) and streptomycin (100 mg/mL). Cells were kept at 37°C in a 5% CO2 atmosphere. When cells reached 80-90% confluence they were split by aspirating the media, trypsinising the flasks and adding an appropriate aliquot of the resulting cell suspension to a new flask of fresh media.

Cells were stored in 1 mL aliquots suspended in freezing media (foetal calf serum containing 10% v/v DMSO) either in a -80°C freezer, for short term storage, or in liquid nitrogen for long term storage.

To prepare cells for storage they were trypsinised, pelleted at 250 g for five minutes and then resuspended in freezing media. The volume of freezing media used was determined based on the number of cells present. The suspension was finally aliquoted to 1.5 mL freezing vials (1 mL per vial) and frozen overnight in a -70°C freezer using a Nalgene Cryo Freezing container (this ensures a steady freezing rate of -1°C per minute). Once frozen, cells were transferred to liquid nitrogen for long-term storage if necessary.

2.14.2 Mammalian Cell Transfection

Mammalian cell transfections used the Invitrogen Lipofectamine™ 2000 transfection reagent. Cells were trypsinised from flasks and plated out into 6-well plates at 70-80% confluence (approximate density of 2 × 10⁵ cells per well) in 2.5 mL of foetal calf serum supplemented DMEM, containing no antibiotics.

The transfection procedure required that, for each well, 6 µL of Lipofectamine™ 2000 was added to 250 µL of serum and antibiotic free Optimem™ and that a total of 2000 ng of plasmid DNA was added to a second aliquot of 250 µL of serum and antibiotic free Optimem™. These reaction mixtures were incubated separately for five minutes at room temperature and were then combined for a further incubation of 20 minutes. Finally the transfection reagent:DNA complex was added to the well of the plate in a drop-wise fashion, the plate was swirled to ensure even distribution through the media and then incubated under the standard growth conditions for between 16 and 24 hours.

2.14.2.1 Assessment of Transfection Efficiency

At times, the efficiency of transfection in mammalian cells was assessed by the co-transfection of a fluorescent-fusion plasmid designed to express either EYFP or mCherry. The fluorescent construct used was either pDEST/TO/EYFP/myc-His carrying one of the prey genes or pDEST/TO/mCherry/myc-His carrying DISC1 (see section 2.17.7.5). This fluorescent plasmid was transfected in the same manner as all other plasmids used in this study. During optimisation, the fluorescent plasmid was co-transfected at a 1:1 ratio with the plasmid being tested as a control for transfection efficiency. After the cells had been transfected and incubated for up to 24 hours, transfection efficiency was estimated by comparing the number of cells present with the number expressing EYFP or mCherry via fluorescent microscopy (as outlined in Section 2.16.2).

2.15 Protein Methods

2.15.1 Protein Isolation from Mammalian Cell Cultures

Following transfection in 6-well culture dishes, cells were scraped from the plate using sterile rubber scrapers. The media containing the cells was then spun at 300 g for 5 minutes to pellet the cells. The pellet was washed twice in cold PBS and then resuspended in 330 µL of cold lysis buffer (1% v/v Triton X-100, 25 mM Tris-HCl pH 7.5, 150 mM NaCl, 25 mM KCl and one tablet of cOmplete EDTA-free protease inhibitor (Roche)) with 100 U/mL of Benzonase® Nuclease. This mixture was left on ice for 60 minutes. After 30 minutes the lysates were homogenised by passing the solution through an 18 gauge needle five times, a 25 gauge needle 30 times and finally a 30 gauge needle 15 times. The lysates were kept on ice or frozen at -20°C until required.

2.15.2 Coomassie Blue Staining

This technique was occasionally used to visualise the protein present on an SDS-PAGE gel. The gel was fixed for 30 minutes in a fixing solution (50% v/v methanol and 10% v/v acetic acid). The gel was then transferred to the staining solution (50% v/v methanol and 10% v/v acetic acid with 0.05% w/v Coomassie Brilliant Blue R-250 (Sigma-Aldrich)) and was incubated, with shaking, at room temperature for between two and 18 hours. Destaining, by multiple washes in destaining solution (5% v/v methanol and 7% v/v acetic acid), was repeated until the protein bands were easily distinguished from the background. The gel was then placed onto a white background (usually Whatman 3MM) to be imaged.

2.15.3 GST Pulldown

For each pulldown 20 µL of magnetic glutathione beads (Pierce), suspended as a 25% ethanol slurry, were aliquoted to 1.5 mL tubes. The beads were washed three times in cell lysis buffer (see 2.15.1) using a Magnabind magnetic stand (Pierce). For each reaction 300 µL of cell lysate (produced as described in 2.15.1) was added to the washed beads. The pulldowns were incubated overnight at 4°C on a rotator disk. Following incubation the lysate was removed and the beads were washed three times in cell lysis buffer. Protein was released from the beads by addition of loading buffer (see 2.8.2) and heating to 99°C for five minutes. Samples were subsequently run alongside the pure lysates as described in sections 2.8.2 and 2.15.4.

2.15.4 Western Blotting

2.15.4.1 Transfer to Membrane

After running samples on a gel (as described in section 2.8.2) the protein was transferred to a 0.45 micron nitrocellulose membrane (Thermo Scientific) using a Hoefer transfer unit as follows. The gel was removed from the glass plates, equilibrated in transfer buffer (25 mM Tris base, 250 mM glycine, 0.001% w/v SDS and 20% v/v methanol) and finally overlain with a square of the nitrocellulose membrane.

Following equilibration of the gel, the transfer apparatus was assembled so that the gel and membrane were sandwiched between four layers of Whatman 3MM chromatography paper and a series of scouring pads, enough to ensure contact between the electrodes. All of the transfer materials were soaked in transfer buffer prior to assembly. The transfer unit was then closed, filled with transfer buffer and tapped gently to remove any air bubbles. The transfer was run in an electrophoresis tank (Hoefer), which was filled with cold deionised water, at 300 mA for 60 minutes.

2.15.4.2 Blocking and Primary Antibody Incubation

Following transfer, the membrane was removed from the apparatus and inspected visually for marker transfer. The transfer was deemed successful upon visualisation of the marker on the membrane. However, on occasion the transfer was checked using 0.1%w/v Ponceau S (Sigma) and then destained in PBS. The membrane was briefly rinsed in PBS and then incubated, with shaking, in Odyssey Casein Blocking Buffer (Millennium Science) for 30 minutes at room temperature.

The membrane was then transferred to the primary antibody solution made up to the appropriate concentration in Odyssey Casein Blocking Buffer (0.1% v/v Tween-20) (see section 2.17.8 for details on antibody concentrations) and incubated overnight at 4°C on a shaking platform. Primary antibody was washed from the membrane by completing four rinses in PBS-T for five minutes each. Washing was performed at room temperature with gentle agitation.

2.15.4.3 Odyssey Secondary Antibody and Membrane Reading

The Odyssey secondary antibody was also made up to concentration in Odyssey Casein Blocking Buffer (0.1%v/v Tween-20). Incubation in the secondary antibody was at room temperature for one hour with shaking, protected from light. The membrane was again washed four times in PBS-T and then rinsed in PBS to remove any residual Tween, protected from light at all times. The membrane could then be read immediately or dried between sheets of Whatman 3MM and set aside to read at a later time. The membrane was read using the Odyssey Scanner, ensuring that the appropriate channels were used. Scan images were then analysed using the Odyssey v3.0 Application Software.

2.16 Functional Analyses

2.16.1 Localisation of Proteins

Cells were plated onto sterile coverslips, in 24-well plates, at 30-40% confluence, in 1 mL supplemented DMEM. Cells were immediately transfected with fluorescent constructs as described in Section 2.14.2, with volumes altered to account for the smaller plate size used. For use in the 24-well plates, 2 µL of Lipofectamine™ 2000 was added to 50 µL of serum and antibiotic free Optimem™ and a total of 400 ng of plasmid DNA was added to a second aliquot of 50 µL of serum and antibiotic free Optimem™. After a 24 hour incubation the cells were mounted to slides using VectaShield containing DAPI and then visualised by fluorescent microscopy as outlined in Section 2.16.2.

2.16.2 Fluorescent Microscopy

Plated cells assessed for transfection efficiency were viewed using an Olympus IX71 inverted microscope with appropriate filters (Olympus UIS2 Series: U-MGFPHQ, Excitation filter - BP460-480HQ, Emission filter - BA495-540HQ and Dichromatic filter - DM485) for EYFP and (Olympus UIS2 Series: U-MRFPHQ, Excitation filter - BP535-555HQ, Emission filter - BA570-625HQ and Dichromatic filter - DM565HQ) for mCherry.

A Zeiss Axioplan compound microscope was used to view all slides (fixed cells), and images were captured using a SPOT-RT CCD camera (Diagnostic Instruments). Filters used were Axioplan Zeiss filter 49 (excitation 365, beam splitter 395 and emission 445/50) for DAPI stain, Axioplan Zeiss filter 25 (excitation 400+495+570, beam splitter 410+505+585 and emission 460+530+625) for mCherry and Axioplan Zeiss filter 09 (excitation 450-490, beam splitter 510 and emission 515) for EYFP.

2.17 Materials

2.17.1 Yeast Media

All descriptions of media in this section are in broth form; where solid media was required, 6 g/L of bacteriological agar (Oxoid) was added prior to autoclaving.

2.17.1.1 YPAD

YPAD is a non-selective, nutrient rich media that is ideal for growing all laboratory strains of yeast to high densities. It contains 10 g/L of yeast extract (Difco), 20 g/L of peptone (Oxoid), 20 g/L of glucose (JT Baker) and 50 mg/L of adenine hemisulfate (Sigma). Due to the high adenine concentrations ade1 and ade2 mutants grow as white colonies.

2.17.1.2 Synthetic Dextrose (SD)

Synthetic Dextrose (SD) is a defined minimal media that can be supplemented (refer to section 2.17.1.4) to support the growth of auxotrophic mutants. Prior to supplementation it contains 1.7 g/L of yeast nitrogen base without amino acids and ammonium sulphate (Difco), 5 g/L ammonium sulphate (BDH) and 20 g/L of glucose.

2.17.1.3 SD with Casamino Acids (SD+C)

This is SD media with added casamino acids; that is, a semi-defined media lacking only adenine, tryptophan and uracil. It contains 1.7 g/L of yeast nitrogen base without amino acids and ammonium sulphate, 5 g/L ammonium sulphate, 20 g/L of glucose and 14 g/L of casein acid hydrolysate (Difco). This medium was used when selection for leucine or histidine was not required.

2.17.1.4 Supplementation

Supplementation of media was used on a drop out basis; that is, media was made selective by leaving out certain elements rather than by standard selection, which requires the addition of certain compounds. Fully supplemented media contained tryptophan, histidine, leucine, adenine hemisulfate, methionine and uracil to support the growth of any strains lacking the ability to synthesise these compounds. Supplementation was, for the most part, applied to SD media but was, on occasion, also applied to SD+C.

Supplementation was achieved by addition of the required compounds to molten agar or broth. Histidine, uracil, methionine and tryptophan were all used at a final concentration of 20 µg/mL, while leucine was used at 100 µg/mL. Adenine hemisulfate was used at both low (20 µg/mL) and high (50 µg/mL) concentrations, the former of which was used for red/white colour selection. All supplements were made as 100X stock solutions, were sterilised by autoclaving and stored at room temperature. The exception was tryptophan, which is both heat and light sensitive, so it was sterilised by filtration and stored at 4°C in the dark.
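The stock concentrations and addition volumes implied by the 100X scheme can be worked out as in the sketch below. The final concentrations are those listed in the text; the 500 mL batch size and the function names are assumptions for illustration.

```python
# Final working concentrations (µg/mL), as listed in the text.
FINAL_UG_PER_ML = {
    "histidine": 20,
    "uracil": 20,
    "methionine": 20,
    "tryptophan": 20,
    "leucine": 100,
    "adenine hemisulfate (low)": 20,
    "adenine hemisulfate (high)": 50,
}
STOCK_FOLD = 100  # supplements were made as 100X stocks


def stock_concentration_mg_per_ml(final_ug_per_ml, fold=STOCK_FOLD):
    """Concentration of a `fold`X stock, in mg/mL."""
    return final_ug_per_ml * fold / 1000.0


def stock_volume_ml(medium_volume_ml, fold=STOCK_FOLD):
    """Volume of a `fold`X stock to add to a given volume of medium."""
    return medium_volume_ml / fold


# ASSUMPTION: a 500 mL batch of medium.
for name, final in FINAL_UG_PER_ML.items():
    print(f"{name}: {stock_concentration_mg_per_ml(final):.0f} mg/mL stock, "
          f"add {stock_volume_ml(500):.0f} mL")
```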

2.17.2 Bacteriological Media

All descriptions of media in this section are in broth form; where solid media was required, 6 g/L of bacteriological agar (Oxoid) was added prior to autoclaving.

2.17.2.1 Luria-Bertani (LB)

LB is a standard bacterial medium most commonly used in solid (agar) form. Standard LB contains 10 g of tryptone (Oxoid), 5 g of yeast extract (Oxoid) and 10 g of sodium chloride (Biolab) made up to 1 L. Here, however, a low salt modified version of LB, known as LB Luria, was used; this was identical to the standard LB with the exception of the lower NaCl concentration of 0.5 g/L.

2.17.2.2 Terrific Broth (TB)

TB is used to grow E. coli broth cultures to the high densities required for plasmid DNA preparation; this is possible due to the rich nutrient nature of the medium. TB contains 12 g of tryptone, 24 g of yeast extract and 4 mL of glycerol made up to 900 mL. Following autoclaving of this solution 100 mL of sterile potassium phosphate buffer (17 mM KH2PO4 and 17 mM K2HPO4) is added.

2.17.2.3 Super Optimal Broth (SOB)

SOB is a modified version of LB that allows efficient transformation of plasmids and is used in the culture of E. coli prior to preparing competent cells. It contains 20 g of tryptone, 5 g of yeast extract, 10 mL of 1 M NaCl and 2.5 mL of 1 M KCl. The pH is adjusted to between 6.8 and 7.0, followed by autoclaving of the solution. Finally 10 mL of filter sterilised 1 M MgCl2, 1 M MgSO4 is added to a final total volume of 1 L.

2.17.2.4 SOB with Catabolite Repression (SOC)

SOC was used as a recovery medium for E. coli post transformation. It is prepared by addition of glucose to a SOB solution to a final concentration of 20 mM.

2.17.2.5 Antibiotic Supplementation

Broths were supplemented with appropriate antibiotic selection prior to growth of bacterial strains. Antibiotics were made up in 1000X stock solutions, filter sterilised and stored frozen in aliquots. The final concentration of ampicillin was 200 µg/mL (or carbenicillin at 100 µg/mL), kanamycin was 25 µg/mL and gentamicin was 8 µg/mL.

2.17.3 Mammalian Cell Culture Media

2.17.3.1 Dulbecco’s Modified Eagle Medium (DMEM)

All mammalian cell cultures were grown and maintained in Dulbecco’s Modified Eagle Medium (DMEM) (Gibco), which contains high levels of glucose and glutamine. In flask growth and maintenance this was supplemented with 10%v/v foetal calf serum (Gibco), 1 U/mL penicillin and 100 mg/mL streptomycin.

2.17.3.2 Freezing Media

All mammalian cell lines were frozen in foetal calf serum with 10% v/v dimethyl sulfoxide (DMSO) (Sigma).

2.17.3.3 Phosphate Buffered Saline (PBS)

PBS used to wash cells was prepared with deionised water and PBS tablets (Oxoid) according to the manufacturer’s instructions and was then sterilised by autoclaving. PBS-T was made by addition of 0.1% v/v Tween-20.

2.17.4 Yeast Strains

2.17.4.1 Two-Hybrid Hosts

AH109: MATa, trp1-901, leu2-3, 112, ura3-52, his3-200, gal4Δ, gal80Δ, LYS2::GAL1UAS-GAL1TATA-HIS3, GAL2UAS-GAL2TATA-ADE2, MEL1 URA3::MEL1UAS-MEL1TATA-lacZ

Y187: MATα, ura3-52, his3-200, ade2-101, trp1-901, leu2-3, 112, gal4Δ, met-, gal80Δ, URA3::GAL1UAS-GAL1TATA-lacZ

Both of these haploid strains of Saccharomyces cerevisiae were purchased from Clontech. They are unable to grow in the absence of tryptophan, leucine, histidine and adenine due to deficits in trp1, leu2, his3 and ade2 genes. Additionally, both carry reporter genes under the control of different GAL promoters and these will not be transcribed in the absence of GAL4 function. The strains are partner strains meaning that they are opposite mating types. This means that if one strain is used to host the baits (AH109) and the other is used to host the preys (Y187) then mating to create diploids can occur allowing interaction to occur between the baits and the preys.

2.17.4.2 Prey Library

The prey library used was a Universal Human Normalized Mate and Plate Library (Clontech), which comes ready to use. This library was constructed using cDNA transcripts expressed in multiple tissues to give a fair representation of all expressed genes in humans. It is normalised to lower the copy number of particularly abundant genes so that screens are not biased toward these. The library contains 3 million independent clones. The cDNA constructs of the genes are cloned as fusions with the GAL4 activating domain by homologous recombination with the vector pGADT7-RecAB in the Y187 host.

2.17.5 Bacterial Strains

2.17.5.1 DH5α

DH5α cells (fhuA2 Δ(argF-lacZ)U169 phoA glnV44 Φ80 Δ(lacZ)M15 gyrA96 recA1 relA1 endA1 thi-1 hsdR17) (Invitrogen) were used to host Gateway® clones.

2.17.5.2 HB101

HB101 cells (hsd20(rB-, mB-), recA13, rpsL20, leu, proA2) (Invitrogen) were also on occasion used to host Gateway® clones.

2.17.6 Mammalian Cell Lines

2.17.6.1 HEK293

A cell line originally derived from an unknown type of human embryonic kidney cell by Alex Van der Eb, and transformed with adenovirus by Frank Graham (Graham et al., 1977). The cells were obtained from Alison Fitches (Molecular Pathology Laboratory, Pathology Department, Dunedin School of Medicine).

2.17.6.2 HeLa

HeLa cells were derived from a human cervical adenocarcinoma (Scherer et al., 1953). These cells were obtained from Lynn Slobbe (Developmental Genetics Laboratory, Pathology Department, Dunedin School of Medicine).

2.17.6.3 T-REx™-293

This cell line was derived from the human embryonic kidney cell line (HEK-293) (Graham et al., 1977). Tetracycline-regulated expression-293 (T-REx™-293) cells stably express the tetracycline (Tet) repressor from an integrated pcDNA6/TR plasmid. When transfected with a Tet repressible plasmid, expression can be induced with doxycycline. The cells were purchased from Invitrogen.

2.17.7 Plasmids

2.17.7.1 pGBAE-B

This plasmid (Semple et al., 2005) was used in the construction of baits for the yeast two-hybrid screen. It contains an ampicillin resistance gene for selection in E. coli and a TRP1 gene for selection in yeast. Cloning is achieved via homologous recombination at a single BamHI site that is flanked by attB1 and attB2 sites. Lying on either side of these attB sites are the GAL4 activating and binding domains (in frame).

This plasmid can therefore be used as a red/white reporter system for insertion when grown on adenine-limiting media. In the absence of an insert both of the GAL4 domains are functional and expressed, producing activation of ADE2 and white colony growth; however, if an insertion is present the GAL4 expression is interrupted, resulting in red colonies.

2.17.7.2 pDONR™201

The pDONR™201 (Invitrogen) plasmid is a Gateway® donor vector containing the attP sites necessary to facilitate the BP reaction (Figure 2.2). The plasmid is kanamycin resistant and must be hosted in a ccdB resistant bacterial strain prior to insertion of a gene. It also contains transcription termination sequences, which prevent possible toxicity as a result of the vector promoting expression of the cloned gene.

2.17.7.3 pDEST™27

pDEST™27 (Invitrogen) is one of the Gateway® destination vectors, with attR sites able to facilitate the LR reaction (Figure 2.3). It also contains a glutathione S-transferase (GST) tag under the control of a T7 promoter, so produces proteins with an N-terminal GST fusion from any inserted gene. This plasmid confers ampicillin resistance for selection in E. coli.

Figure 2.2: pDONR™201 Structure. The figure shows the structure (not to scale) of the pDONR™201 donor vector. This figure is adapted from the Gateway® pDONR™ manual (June 2007).

Figure 2.3: pDEST™27 Structure. The figure shows the structure (not to scale) of the pDEST™27 destination vector. This figure is adapted from the Mammalian Expression System with Gateway® Technology manual (March 2012).

2.17.7.4 pDEST/TO/myc-His

This is a Gateway® destination vector constructed by David Markie (Dunedin School of Medicine) via the insertion of the Gateway® cassette and Kozak sequence into the commercially available pcDNA™4/TO/myc-His A (Invitrogen). The plasmid provides tetracycline-inducible expression of c-myc epitope and 6-His tagged proteins in mammalian cells. The inducibility of the plasmid is only valid in T-REx™ cell lines. In all other cell lines the expression is constitutive. The myc-His tag is added to the C terminus of the cloned protein, which must lack its terminal stop codon. Ampicillin resistance for selection in E. coli and zeocin selection in mammalian cells are also included. The plasmid structure is shown in Figure 2.4.

Figure 2.4: pDEST/TO/myc-His Structure. The figure shows (A) the structure (not to scale) of the pDEST/TO/myc-His destination vector and (B) detail of the modified portion of the vector at protein and DNA levels.

2.17.7.5 pDEST/TO/EYFP/myc-His and pDEST/TO/mCherry/myc-His

These destination vectors for the Gateway® system were also constructed by David Markie (Dunedin School of Medicine) by cloning the EYFP or mCherry gene (with Kozak sequence and without a stop codon) into the HindIII site of pDEST/TO/myc-His (see Figure 2.5). They are ampicillin resistant in E. coli and zeocin resistant in mammalian cells.

Tetracycline (or equivalent) induction is necessary in mammalian cells that are expressing the Tet repressor protein. Expression of fluorescence requires the EYFP or mCherry to be fused to the N-terminus of a protein that can be added via Gateway® cloning.

Figure 2.5: pDEST/TO/EYFP(mCherry)/myc-His Structure. The figure shows the altered reading frame of pDEST/TO/myc-His to include the EYFP or mCherry tag. The remainder of the plasmid is identical to pDEST/TO/myc-His so is not shown here.

2.17.7.6 pCMV6-XL5

This plasmid is an OriGene TrueClone™ containing the L isoform of DISC1. The untagged, full length cDNA of DISC1 is under the control of a CMV promoter.

2.17.8 Antibodies

2.17.8.1 Anti-c-Myc Tag Antibody

This is a monoclonal mouse antibody and was raised against a synthetic peptide of the human c-myc protein (MEQKLISEEDL). The antibody was supplied by Millipore at 1 mg/mL and was used at 1 µg/mL for detection. This antibody was used to detect proteins that were fused to a c-myc epitope.

2.17.8.2 Anti-Actin Polyclonal Antibody

This antibody was used as a positive control in the western blot applications, giving a band of 42 kDa. It is an affinity isolated, antigen specific antibody that was raised in rabbit against the eleven C-terminal residues that are common to all actin isoforms (Ser-Gly-Pro-Ser-Ile-Val-His-Arg-Lys-Cys-Phe) attached to a multiple antigen peptide backbone. The antibody was supplied by Sigma at 700 µg/mL and was used at 3.5 µg/mL.

2.17.8.3 Anti-GST Antibody

This monoclonal mouse antibody was used to detect glutathione S-transferase (GST) tagged proteins, giving a band of 28-30 kDa plus the size of the fusion partner. The antibody was produced from clone DG122-2A7 with the GST derived from a pGEX expression vector that had been linked to KLH. It was supplied at 1 mg/mL by Millipore and used at 0.5 µg/mL.

2.17.8.4 Anti-DISC1 Antibody

This is an oligoclonal, anti-rabbit antibody targeted to amino acids 419-428 of human DISC1 and gives a 91 kDa band. The antibody was purchased from Novex® at 0.5 mg/mL and was diluted to 2 µg/mL for use.

2.17.8.5 Odyssey Secondary Antibodies

Two fluorescently labelled secondary antibodies were used for visualisation of bands with the Odyssey® system. The IRDye® 680RD goat (polyclonal) anti-rabbit IgG (H+L), highly cross-adsorbed antibody is detected at a wavelength of 680 nm, giving red fluorescent bands, and the IRDye® 800CW conjugated goat (polyclonal) anti-mouse IgG (H+L), highly cross-adsorbed antibody is detected at a wavelength of 800 nm, giving green fluorescent bands. The different wavelengths of detection meant that the two antibodies could easily be used at the same time. These antibodies were both supplied by LI-COR at 1 mg/mL and were used at a 1:15000 dilution.

Chapter 3

Association of DISC1 and 11q14.3 with Psychiatric Illness

3.1 Introduction

Ongoing research now shows that the genetic basis of disease is often complex, especially as it moves away from the classic Mendelian, single gene disorders. As discussed in Chapter 1, association analysis provides a means to assess the involvement of a multitude of variants in diseases of complex genetic nature. Although the results from such studies can provide great insight into the contributing genetics of a disease, the studies must be carried out carefully and subjected to various quality control measures before conclusions are drawn as to their validity. Like any method, association analyses seldom reveal the entire picture (as environmental and other non-genetic factors must also be considered); however, they do provide a sound starting point in the search for candidate genes and variants.

3.1.1 DISC1 Associations in the Literature

Genetic association studies have previously identified the DISC1 gene as a promising candidate in major psychiatric illnesses among Caucasian and Asian populations (Ekelund et al., 2001; Hennah et al., 2003; Hodgkinson et al., 2004; Thomson et al., 2005; Hashimoto et al., 2006; Palo et al., 2007; Qu et al., 2007). These studies have independently identified a number of regions within the gene to be associated with one or more major psychiatric disorders, indicating that it is likely that the gene has a host of associated variants (a number of these are outlined in Figure 3.1).

Many such associations are of minor effect in conferring risk (or protection), and although these identified variants have yet to be confirmed, collectively these risk variants may play an important role in assessing the heritability of disorders such as bipolar disorder and schizophrenia. Currently, DISC1 remains unconfirmed as a candidate according to a genome-wide level of significance (no published study has detected variants within DISC1 below the widely accepted threshold of p-value < 5 × 10⁻⁸). However, due to the reasonably consistent supporting evidence from a number of small candidate gene studies, DISC1 remains high on the list of possible candidate genes for major psychiatric illness (Hennah & Porteous, 2009), and warrants follow up in larger cohorts.

Caption for Figure 3.1: Summary of Major Association Analysis Findings. Both images show the structure of DISC1 (NCBI37 bases 231762615-232172578). A) The locations of a number of the SNPs previously reported to be associated with schizophrenia and/or bipolar disorder (Hennah et al., 2003; Hodgkinson et al., 2004; Burdick et al., 2005; Callicott et al., 2005; Thomson et al., 2005; Hashimoto et al., 2006; Chen et al., 2007; Qu et al., 2007; Wood et al., 2007; Kilpinen et al., 2008; Hennah & Porteous, 2009). B) The locations of various haploblocks reported to be associated with schizophrenia and/or bipolar disorder; where the haploblock was given a name in the original study this is included above the block (Hennah et al., 2003; Hodgkinson et al., 2004; Callicott et al., 2005; Thomson et al., 2005; Hashimoto et al., 2006; Liu et al., 2006; Maeda et al., 2006; Palo et al., 2007; Hennah & Porteous, 2009). Only SNPs that fall within the DISC1 locus are reported here; some of the associated haploblocks also contain SNPs within TSNAX. This figure was generated using the freeware FancyGENE (http://host13.bioinfo3.ifom-ieo-campus.it/fancygene/).

Figure 3.1: Summary of Major Association Analysis Findings.

3.1.2 Discovery of a Putative miRNA Interaction

A search was previously undertaken for SNPs within miRNA binding sites of candidate genes (not limited to DISC1) for psychiatric disorders. This search was focused on such SNPs based on knowledge of the regulatory function miRNAs play in gene expression. The search identified two such SNPs in the 3’UTR of the DISC1 gene (Oliver, 2009). The first, rs980989, located in the full length isoform of the gene, was found to have no association with bipolar disorder in a family based association study in the SIBS cohort. The second SNP, rs11122324, is located within a variant isoform of DISC1 comprising only three exons. It is predicted that this isoform encodes a short form of the protein in normal individuals (known as the Extra Short (ES) isoform) similar to the truncated protein observed in the Scottish translocation family. This SNP has a minor allele (A) frequency of about 30%, and assessment in the SIBS cohort revealed a significant association with bipolar disorder (p-value=0.022), especially in females (p-value=0.0057) (Oliver, 2009).

This SNP falls within a predicted binding site of miR-575, a miRNA that, as yet, has not been confirmed to be expressed in the human brain. However, the gene for miR-575 is located in intron one of the host gene SCD5, a protein-coding gene which has been shown to be expressed in human brain (Lengi & Corl, 2007), and so it is likely that the miRNA will be co-expressed with the SCD5 gene in the brain. There is also evidence that variant, truncated isoforms of DISC1 that contain the 3’UTR where this SNP occurs are also expressed in human brain (Nakata et al., 2009). With both the isoform of DISC1 that contains the binding site for miR-575, and the SCD5 gene containing the gene for the miRNA, being expressed in the brain, it seems possible that the two may interact. If an interaction does occur it may result in reduced expression of the ES isoform of DISC1 in the brains of affected individuals.

It is hypothesised that the ancestral (major) allele (G), as defined by dbSNP, at the rs11122324 locus within DISC1 is translationally repressing the ES isoform of DISC1 through interaction with miR-575. The risk variant (A), which may have decreased affinity for miR-575, results in increased expression of the ES isoform, and thus functional interference of the full length DISC1 protein.

A non-synonymous SNP, rs3738401, previously described as a candidate risk variant (Palo et al., 2007), was also analysed in the SIBS cohort and did not reach the same level of significance (p-value=0.055), but the gender effect was still present, with females having a more significant association (p-value=0.0148) (Oliver, 2009). This suggested that rs11122324 may be more closely associated with the common causal variant in this population than the previously published SNP. Given its potential functional effect, based on miRNA affinity and subsequent gene expression, it was itself a good candidate for the DISC1 variant responsible for bipolar disorder susceptibility in the South Island population (and possibly others of similar ethnicities), warranting follow up in further data sets. For this reason the association of rs11122324 will be re-evaluated in this study using the larger available datasets.

3.1.3 The 11q14.3 Region

Substantial effort has gone into assessing DISC1 for causal variants; however, the other locus disrupted by the translocation in the Scottish family described in Chapter 1 (at chromosome 11q14.3) has been largely overlooked due to early publications suggesting an absence of genes in the region (Millar et al., 2000). The region of chromosome 11 that is involved in the translocation identified in the Scottish family was originally described to be 11q21 by St Clair et al. (1990). This region was also identified to be in the same vicinity as a region of chromosome 11 involved in a reciprocal translocation with chromosome nine in an independent family segregating with manic depressive illness (Smith et al., 1989). The details of this second translocation are not adequate to establish whether these two translocations are actually in close proximity, in light of the re-defined position of the Scottish family translocation to 11q14.3.

Recently, evidence has been emerging that fusion proteins are created as a result of the (1;11)(q42.1;q14.3) translocation (Zhou et al., 2008, 2010), leaving the chromosome 11 locus as an additional region of interest. According to the genomic fusion sequence described in the original paper (Zhou et al., 2008), the region of 11q14.3 containing DISC1 Fusion Partner 1 (DISC1FP1), the transcript that is involved in the creation of the fusion proteins, is 660 kb in length, spanning 89984400-90648220 bp (NCBI37). There are two possible fusion proteins created by DISC1FP1 and DISC1 (shown in Figure 3.2), the first of which is a fusion of exons 1-3 of DISC1FP1 with exons 9-13 of DISC1 and the second is a fusion of DISC1 exons 1-8 with exons 4-7 of DISC1FP1 (Zhou et al., 2010).

The mechanism by which fusion proteins may cause disease is somewhat more complicated than a simple loss or gain of function mutation. Fusion transcripts have been identified from both derivative chromosomes, the der(11) fusion being more common than the der(1) fusion (Brandon et al., 2009). It is suggested that this is because the open reading frames of the der(1) fusion transcripts lead to splice events which result in an identifiably abnormal transcript which is degraded. The der(11) fusion transcripts are more stable but they are not expected to be translated in an in vivo situation. However, these transcripts do encode a protein that has similar features to the DISC1 truncation protein used to disrupt Disc1 function in mice (Li et al., 2007). Further splice variants of such fusion transcripts have also been identified but as yet there are no in vivo translational products of these transcripts known.

Figure 3.2: Predicted DISC1-DISC1FP1 Fusion Constructs. A) The DISC1 (red) and DISC1FP1 (blue) structures and B) the two predicted fusion constructs created by the (1;11)(q42.1;q14.3) translocation: the der(11) and der(1) fusions. Exon numbers shown in the fusion constructs are the numbers as per the original DISC1 or DISC1FP1. Constructs are not shown to scale.

3.1.4 This Study

Association of DISC1 (with particular interest in rs11122324) and 11q14.3 with bipolar disorder and schizophrenia will be investigated by data analysis of the regions in large cohorts. The GAIN and WTCCC data for bipolar disorder and the GAIN and non-GAIN data for schizophrenia (detailed descriptions of these datasets can be found in Chapter 2) will be analysed individually in the DISC1 and 11q14.3 regions, followed by meta-analyses of various combinations of these data.

3.1.4.1 Aims

The aims of this chapter are to:

1. Attempt to validate and further investigate the association of the rs11122324 SNP in these larger cohorts (compared to the SIBS cohort),

2. Identify any other variants within the DISC1 locus that could play small but additive roles in the genetic contribution to major psychiatric illness, especially bipolar disorder and schizophrenia,

3. Identify possible contributory variants within the 11q14.3 region,

4. Compare any associations found to the current literature to provide a summary of the involvement of DISC1 in major psychiatric illness.

3.2 Results

3.2.1 In Silico Analysis of miR-575 and rs11122324

Hybridisation of the predicted miRNA binding site to the DISC1 ES isoform 3’UTR was conducted using the DINAMelt two-state hybridisation tool (http://mfold.rna.albany.edu/?q=DINAMelt/Two-state-folding) and shows that binding has a lower free energy with the common allele (G) than with the minor allele (A) at rs11122324 (Figure 3.3). The free energies (ΔG) of these hybridisations are -13.7 and -6.6 kcal/mol respectively.

Figure 3.3: In silico analyses of the predicted hybridisation of miR-575 to a site in the 3’UTR of the DISC1 ES isoform. A) Base-pairing of the common G allele at rs11122324 or B) at the risk A allele of rs11122324. In both cases the circled nucleotide is that of rs11122324. In each case the miRNA is shown as the sequence on the left and the mRNA as the sequence on the right. Images produced 30th June 2010.
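
To give a sense of scale for the difference between these two free energies, the following minimal sketch converts it into a ratio of two-state equilibrium constants using K = exp(-ΔG/RT). The temperature of 37 °C is an assumption (it is not stated above) and the function name is purely illustrative; the result is an idealised thermodynamic ratio, not a measured binding affinity.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)


def equilibrium_ratio(dg_a, dg_b, temp_c=37.0):
    """Ratio K_a / K_b of two-state equilibrium constants.

    dg_a and dg_b are free energies in kcal/mol. The default temp_c of
    37.0 is an assumed temperature, not a value taken from the text.
    """
    temp_k = temp_c + 273.15
    return math.exp(-(dg_a - dg_b) / (R_KCAL * temp_k))


dg_g_allele = -13.7  # kcal/mol, common G allele (value reported above)
dg_a_allele = -6.6   # kcal/mol, minor A allele (value reported above)

# Free energy lost when the G allele is replaced by the A allele.
ddg = dg_a_allele - dg_g_allele
ratio = equilibrium_ratio(dg_g_allele, dg_a_allele)
print(f"ddG = {ddg:.1f} kcal/mol; K(G)/K(A) = {ratio:.2e}")
```

Under these assumptions the G allele duplex is favoured by several orders of magnitude, which is in line with the hypothesis that the A allele has decreased affinity for miR-575.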

3.2.2 Assessment of Datasets

The quality of the publicly available data was assessed by generating Q-Q plots using all available genotyped SNPs across the genome. The plots for GAIN BPD, GAIN SCZ and non-GAIN SCZ (Figure 3.4 B-D) all show reasonable quality with inflation factors of < 1.1. The WTCCC BPD dataset (Figure 3.4 A), however, shows a large deviation from the expected values (with an inflation factor of > 1.1), indicating that there are some internal issues with the dataset in its current form. For this reason the WTCCC data were removed from this study.

Figure 3.4: Q-Q Plots of Genome Wide Data. Q-Q plots showing the expected versus observed p-values for all genotyped SNPs in the A) WTCCC BPD (1.435), B) GAIN BPD (1.019), C) GAIN SCZ (1.078) and D) non-GAIN SCZ (1.073) datasets. Numbers shown in parentheses are the inflation factors associated with each dataset.
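
As an illustration of how an inflation factor of this kind is commonly obtained, the sketch below computes the genomic inflation factor as the median observed 1-df chi-square statistic divided by its expected median under the null. This is the conventional definition and is assumed here; the software and exact procedure used for the values in Figure 3.4 are not stated above, and the function name and example p-values are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square distribution (approximately 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(p_values):
    """Genomic inflation factor from two-sided p-values in (0, 1]."""
    # A two-sided p-value corresponds to a 1-df chi-square statistic
    # equal to the squared normal quantile at p / 2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical, evenly spaced p-values standing in for a well-behaved scan.
null_like = [(i - 0.5) / 999 for i in range(1, 1000)]
print(round(inflation_factor(null_like), 3))
```

A value close to 1 indicates that the observed test statistics follow the null distribution across most of the genome; the threshold of 1.1 applied above is the criterion used in this study.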

Each dataset that was used in the association analysis was also assessed for stratification across the specific regions to be analysed (DISC1 and 11q14.3). Analysis of the first two principal components in these data showed that in all datasets there was no substructure evident between the cases and controls (as shown in Figure 3.5), which provides further support that any associations found are not due to stratification between the cases and controls.

Figure 3.5: Principal Component Analysis of DISC1 and 11q14.3 Regions. Graphs showing the first two principal components for A) DISC1 GAIN SCZ, B) 11q14.3 GAIN SCZ, C) DISC1 nonGAIN SCZ, D) 11q14.3 nonGAIN SCZ, E) DISC1 GAIN BP and F) 11q14.3 GAIN BP. Cases (red) and controls (blue) are shown to cluster together in all datasets.

Sub-structure was seen in the combined GAIN and nonGAIN schizophrenia dataset across the DISC1 region (Figure 3.6). This is explained by a distinction between the GAIN and nonGAIN data at this region. However, the cases and controls remain unstructured, so comparison of the cases to the controls is unaffected by this substructure.

Figure 3.6: Principal Component Analysis in Combined GAIN and nonGAIN Schizophrenia Dataset. Graph showing A) the distinct clustering of the GAIN (blue) versus nonGAIN (red) data and B) the unstructured clustering of cases (red) and controls (blue) at the DISC1 region in the combined GAIN and nonGAIN dataset for schizophrenia.

3.2.3 Association Analysis of rs11122324

The association of rs11122324 with bipolar disorder, identified in the SIBS cohort, was not confirmed in any of these larger datasets, as outlined in Table 3.1. Because the association found in the SIBS cohort showed a female biased gender effect, the three datasets were split to assess possible associations in males and females separately (Table 3.2). Splitting the data gives no further evidence for association than the combined gender analysis did.

Table 3.1: Association Analysis of rs11122324. Association of rs11122324 with psychiatric illness for each of the three datasets analysed.

                                       Genotype
Cohort        Status    MAF    GG            GA            AA           p-value  OR (95% CI)
GAIN BPD      Cases     0.339  288 (0.444)   281 (0.434)    79 (0.123)  0.901    0.991 (0.855, 1.148)
              Controls  0.341  439 (0.429)   472 (0.461)   113 (0.110)
GAIN SCZ      Cases     0.350  582 (0.431)   589 (0.437)   178 (0.132)  0.259    1.066 (0.954, 1.193)
              Controls  0.336  597 (0.434)   634 (0.461)   145 (0.105)
non-GAIN SCZ  Cases     0.364  475 (0.414)   508 (0.443)   163 (0.142)  0.103    1.102 (0.981, 1.238)
              Controls  0.342  573 (0.426)   626 (0.465)   147 (0.109)
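
The odds ratios in Table 3.1 appear consistent with an allelic comparison of minor (A) versus major (G) allele counts with a Wald confidence interval. The sketch below illustrates that calculation for the GAIN BPD row; it is an inference from the tabulated numbers rather than a description of the software actually used, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_odds_ratio(case_counts, control_counts, level=0.95):
    """Allelic odds ratio for the minor allele.

    Each argument is a (major_hom, het, minor_hom) tuple of genotype counts.
    Returns (odds_ratio, ci_lower, ci_upper, wald_p_value).
    """
    def allele_counts(counts):
        major_hom, het, minor_hom = counts
        return 2 * minor_hom + het, 2 * major_hom + het

    case_minor, case_major = allele_counts(case_counts)
    ctrl_minor, ctrl_major = allele_counts(control_counts)

    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of the log odds ratio from the 2x2 allele table.
    se = math.sqrt(1 / case_minor + 1 / case_major
                   + 1 / ctrl_minor + 1 / ctrl_major)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z_crit * se)
    upper = math.exp(log_or + z_crit * se)
    p_value = 2 * NormalDist().cdf(-abs(log_or) / se)
    return odds_ratio, lower, upper, p_value


# GAIN BPD genotype counts (GG, GA, AA) taken from Table 3.1.
gain_bpd = allelic_odds_ratio((288, 281, 79), (439, 472, 113))
print(", ".join(f"{value:.3f}" for value in gain_bpd))
```

The same function can be applied to the other rows of the table by substituting the corresponding genotype counts.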

Table 3.2: Association Analysis of rs11122324 Split by Gender. Association of rs11122324 with psychiatric illness for each of the three datasets analysed, with male and female results shown in separate rows.

                                               Genotype
Cohort        Sex     Status    MAF    GG            GA            AA           p-value  OR (95% CI)
GAIN BPD      Male    Cases     0.368  130 (0.404)   147 (0.457)    45 (0.140)  0.246    1.129 (0.920, 1.385)
                      Controls  0.340  226 (0.430)   242 (0.460)    58 (0.110)
              Female  Cases     0.310  161 (0.485)   136 (0.410)    35 (0.105)  0.186    0.868 (0.703, 1.071)
                      Controls  0.341  213 (0.428)   230 (0.462)    55 (0.110)
GAIN SCZ      Male    Cases     0.352  405 (0.428)   417 (0.441)   124 (0.131)  0.544    1.047 (0.902, 1.217)
                      Controls  0.341  270 (0.427)   293 (0.464)    69 (0.109)
              Female  Cases     0.347  177 (0.439)   172 (0.427)    54 (0.134)  0.437    1.074 (0.897, 1.287)
                      Controls  0.331  327 (0.440)   341 (0.458)    76 (0.102)
non-GAIN SCZ  Male    Cases     0.366  331 (0.413)   353 (0.441)   117 (0.146)  0.569    1.045 (0.898, 1.215)
                      Controls  0.356  275 (0.412)   310 (0.464)    83 (0.124)
              Female  Cases     0.358  144 (0.417)   155 (0.449)    46 (0.133)  0.167    1.150 (0.945, 1.388)
                      Controls  0.327  298 (0.440)   316 (0.466)    64 (0.094)

3.2.3.1 Meta-Analysis of rs11122324

Datasets were combined to perform meta-analyses of schizophrenia (GAIN SCZ and non-GAIN SCZ) and of psychiatric illness as a whole (all three datasets). Meta-analyses of these combinations (Table 3.3) show a possible trend toward significance in the combined schizophrenia data (p-value = 0.057). When combined with the bipolar disorder cohort this association disappears. Again there is no further evidence for association in these cohorts after splitting the analysis by gender (Table 3.4). The reduction in sample size due to the splitting actually removes any trend towards significance.

Table 3.3: Meta-Analyses of rs11122324. Association of rs11122324 with psychiatric illness in the combined datasets.

                                          Genotype
Cohort           Status    MAF    GG             GA             AA           p-value  OR (95% CI)
All SCZ          Cases     0.357  1057 (0.424)   1097 (0.440)   341 (0.137)  0.057    1.082 (0.998, 1.172)
                 Controls  0.339  1170 (0.430)   1260 (0.463)   292 (0.107)
All BPD and SCZ  Cases     0.353  1345 (0.428)   1378 (0.438)   420 (0.134)  0.139    1.059 (0.982, 1.143)
                 Controls  0.340  1177 (0.428)   1273 (0.463)   297 (0.108)

Table 3.4: Meta-Analyses of rs11122324 Split by Gender. Association of rs11122324 with psychiatric illness for the combined datasets, with male and female results shown in separate rows.

                                                  Genotype
Cohort           Sex     Status    MAF    GG            GA            AA           p-value  OR (95% CI)
All SCZ          Male    Cases     0.358  736 (0.421)   770 (0.441)   241 (0.138)  0.444    1.042 (0.937, 1.159)
                         Controls  0.349  545 (0.419)   603 (0.464)   152 (0.117)
                 Female  Cases     0.352  321 (0.429)   327 (0.437)   100 (0.134)  0.131    1.107 (0.972, 1.263)
                         Controls  0.330  625 (0.440)   657 (0.462)   140 (0.098)
All BPD and SCZ  Male    Cases     0.360  865 (0.419)   916 (0.443)   285 (0.138)  0.429    1.042 (0.941, 1.154)
                         Controls  0.350  549 (0.417)   611 (0.465)   155 (0.118)
                 Female  Cases     0.340  480 (0.446)   462 (0.429)   135 (0.125)  0.479    1.044 (0.927, 1.175)
                         Controls  0.330  628 (0.439)   662 (0.462)   142 (0.099)

3.2.4 Association Analysis of DISC1

In the absence of confirmation of the association of rs11122324 it was decided to investigate all of the genotyped SNPs in the DISC1 gene. The gene region analysed in this DISC1 analysis was defined to be SNPs from position 229780000 to 230300000 of chromosome one according to NCBI reference sequence 36.3 (2009). Results of the various analyses are detailed below; full results for all SNPs can be found in Appendix B.

3.2.4.1 Individual Dataset Analyses

A number of SNPs reach the significance threshold of p-value<0.05 in the GAIN BPD, GAIN SCZ and non-GAIN SCZ datasets across male only, female only and combined analyses (Figure 3.7); however, none of these SNPs remain significant after correction for multiple testing (Figure 3.8).
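
The corrected values reported in this chapter are described as FDR p-values, but the exact procedure is not named here. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up adjustment, corrected p-values could be derived from a list of per-SNP p-values as follows; the function name and the example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four SNPs.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(value, 3) for value in example])
```

With this adjustment a SNP is reported as significant only if its adjusted value falls below the chosen threshold, here 0.05.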

3.2.4.2 Meta-Analyses of DISC1

To increase the power of this study, meta-analyses of combinations of the available data were conducted, the results of which are detailed below.

In the combined schizophrenia dataset there were a number of SNPs that reached significance across the male only, female only and combined analyses (see Figure 3.9 A); however, none of these SNPs remain significant after correction for multiple testing (see Figure 3.9 B).

Analysis of the combined GAIN BPD, GAIN SCZ and non-GAIN SCZ datasets shows interesting results for a cluster of three SNPs in intron four of the gene (Figure 3.10). The most significant of these three SNPs (rs11122331) is very close to reaching significance in females after correction for multiple testing (Figure 3.10 B). The remaining two SNPs, which are in strong linkage disequilibrium with rs11122331 (see Figure 3.11), are also trending towards significance in females, though none actually reach significance in this dataset. Details of these SNPs can be seen in Table 3.5.

These three SNPs were assessed for genotyping integrity and were all shown to have reasonable calling in all three datasets (the scatter plots from each dataset for each SNP can be found in Appendix B).

Figure 3.7: Manhattan Plots of Uncorrected Individual DISC1 Analyses. Manhattan plots of each of the three datasets analysed over the DISC1 region. Each plot shows the combined (black), male only (blue) and female only (red) analysis. A) GAIN BPD, B) GAIN SCZ, and C) non-GAIN SCZ. Each plot shows the log10 of the p-value plotted against the chromosomal position of that SNP (in Mbp). Any SNPs that fall above the black line have a p-value of <0.05.

Figure 3.8: Manhattan Plots of Corrected Individual DISC1 Analyses. Manhattan plots of each of the three datasets analysed over the DISC1 region after correction for multiple testing. Each plot shows the combined (black), male only (blue) and female only (red) analysis. A) GAIN BPD (n=178), B) GAIN SCZ (n=181) and C) non-GAIN SCZ (n=189). Each plot shows the log10 of the FDR p-value plotted against the chromosomal position of that SNP (in Mbp). Any SNPs that fall above the black line have a corrected p-value of <0.05. The number of tests (n) corrected for in each dataset is shown in parentheses.

Figure 3.9: Manhattan Plots of the Combined GAIN and non-GAIN SCZ DISC1 Analyses. Each plot shows the combined (black), male only (blue) and female only (red) analysis A) prior to correction and B) after correction for multiple testing (n=189). Plotted values are A) the log10 of the p-value or B) the log10 of the FDR p-value, plotted against the chromosomal position of that SNP (in Mbp). Any SNPs that fall above the black line have a p-value of <0.05. The number of tests (n) corrected for is shown in parentheses.

Figure 3.10: Manhattan Plots of the Combined GAIN and non-GAIN SCZ and GAIN BPD DISC1 Analyses. Each plot shows the combined (black), male only (blue) and female only (red) analysis A) prior to correction and B) after correction for multiple testing (n=189). Plotted values are A) the log10 of the p-value or B) the log10 of the FDR p-value, plotted against the chromosomal position of that SNP (in Mbp). Any SNPs that fall above the black line have a p-value of <0.05. The number of tests (n) corrected for is shown in parentheses.

Figure 3.11: Haploview Image of the rs11122331 SNP Cluster. Haploview images created using data from A) the GAIN and non-GAIN datasets and B) the HapMap 2 dataset. Both images show that the three SNPs are in high linkage disequilibrium. Values shown are r² values.

Table 3.5: Meta-Analysis of Psychiatric Illness in Combined BPD and SCZ Dataset. The cluster of SNPs trending towards corrected significance in the combined GAIN BPD, GAIN SCZ and non-GAIN SCZ dataset. Combined, female only and male only analyses are shown in separate rows. The alleles for each SNP (rs11122331 T and A, rs1538979 A and G, and rs11122330 C and T) are represented as 1 and 2 respectively. FDR p-values are based on correction for 189 tests.

                                              Genotype                                              FDR
SNP         Analysis  Status   MAF    11             12            22          p-value   p-value  OR (95% CI)
rs11122331  Combined  Case     0.128  2386 (0.760)   700 (0.223)   53 (0.017)  0.052     0.659    0.900 (0.809, 1.001)
                      Control  0.141  2019 (0.738)   661 (0.242)   54 (0.020)
            Female    Case     0.118   837 (0.778)   224 (0.208)   15 (0.014)  3 × 10⁻⁴  0.052    0.736 (0.623, 0.869)
                      Control  0.154  1016 (0.714)   376 (0.264)   31 (0.022)
            Male      Case     0.134  1549 (0.751)   476 (0.231)   38 (0.018)  0.370     0.808    1.069 (0.924, 1.237)
                      Control  0.126  1003 (0.765)   285 (0.217)   23 (0.018)
rs1538979   Combined  Case     0.137  2351 (0.747)   730 (0.232)   65 (0.021)  0.065     0.659    0.907 (0.818, 1.006)
                      Control  0.149  1995 (0.726)   691 (0.251)   63 (0.023)
            Female    Case     0.129   820 (0.761)   238 (0.221)   20 (0.019)  0.001     0.087    0.767 (0.653, 0.901)
                      Control  0.162  1003 (0.701)   393 (0.275)   35 (0.024)
            Male      Case     0.141  1531 (0.740)   492 (0.238)   45 (0.022)  0.455     0.810    1.056 (0.916, 1.217)
                      Control  0.134   992 (0.753)   298 (0.226)   28 (0.021)
rs11122330  Combined  Case     0.131  2358 (0.756)   704 (0.226)   55 (0.018)  0.085     0.659    0.911 (0.820, 1.013)
                      Control  0.142  2015 (0.738)   659 (0.241)   57 (0.021)
            Female    Case     0.122   825 (0.773)   224 (0.210)   18 (0.017)  0.001     0.087    0.765 (0.649, 0.902)
                      Control  0.154  1017 (0.715)   375 (0.264)   31 (0.022)
            Male      Case     0.135  1533 (0.748)   480 (0.234)   37 (0.018)  0.431     0.810    1.060 (0.917, 1.226)
                      Control  0.128   998 (0.763)   284 (0.217)   26 (0.020)

3.2.4.3 Comparison to Literature

The SNPs rs11122330 and rs1538979 both have published associations with psychiatric illness. The former has been shown to be weakly associated with schizophrenia (G allele, p-value = 0.018) and with affective disorder (G allele, p-value = 0.012) prior to correction (Wood et al., 2007). The latter is significantly associated with bipolar disorder type 1, after correction, in a cohort of Finnish males (T allele, corrected p-value=0.016, OR[95%CI] = 2.73[1.42, 5.27]) (Hennah & Porteous, 2009). Wood et al. (2007) also identified uncorrected significance for a fourth SNP in this haploblock, rs12046794, with schizophrenia (A allele, p-value = 0.034). This SNP is not included in the current study, though given its inclusion in this haploblock (see Figure 3.12) it is tagged by the other three SNPs.

Figure 3.12: Haploview Image with Published Fourth SNP. HapMap 2 dataset haploblock shown earlier with the inclusion of a fourth SNP (rs12046794) in linkage disequilibrium. Values shown are r² values.

Meta-analysis of data for rs1538979 from this study with published data for this SNP (Hennah & Porteous, 2009; Schumacher et al., 2009) reveals a slight increase in significance for a protective effect in schizophrenia in a combined analysis of males and females (OR[95%CI] = 0.89[0.82, 0.97], p-value = 0.007) when compared to an analysis of the published literature alone (OR[95%CI] = 0.87[0.77, 0.98], p-value = 0.024) (Figure 3.13 A and B). The addition of data from the current study to a combined schizophrenia and bipolar disorder analysis further reveals a protective effect (OR[95%CI] = 0.92[0.86, 0.99], p-value = 0.020) that was not seen with the published data alone (OR[95%CI] = 0.94[0.85, 1.04], p-value = 0.242) (Figure 3.13 E and F). No effect is seen in the bipolar disorder meta-analysis at this SNP (Figure 3.13 C and D).
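The relationship between a pooled odds ratio, its confidence interval and the quoted p-value can be checked with a short calculation. The sketch below is illustrative only and is not the STATA procedure used for Figure 3.13: it assumes a Wald test on the log odds ratio with a symmetric 95% interval, and the function name `p_from_or_ci` is hypothetical. Because the published values are rounded, the results should only be expected to land close to the reported p-values (about 0.007 and 0.024 for the two schizophrenia analyses).

```python
import math
from statistics import NormalDist


def p_from_or_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Two-sided Wald p-value implied by an odds ratio and its confidence interval.

    Assumes the interval is symmetric on the log scale (a standard Wald interval).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    # The interval spans 2 * z_crit standard errors on the log-odds scale.
    se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se_log_or
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Schizophrenia, current study plus literature (Figure 3.13 A)
p_with_current = p_from_or_ci(0.89, 0.82, 0.97)
# Schizophrenia, literature only (Figure 3.13 B)
p_literature = p_from_or_ci(0.87, 0.77, 0.98)

print(f"with current study: p = {p_with_current:.3f}")
print(f"literature only:    p = {p_literature:.3f}")
```

The narrower interval in the combined analysis reflects the smaller standard error gained from the added samples, which is why the p-value falls even though the odds ratio itself moves slightly towards 1.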

102 3.2 Results

Figure 3.13: STATA Analysis of rs1538979. GAIN and non-GAIN studies refer to the data used in this study, Germany refers to Schumacher et al. (2009) and Finland, UK, Wales and Scotland refer to Hennah & Porteous (2009). Analyses are shown as follows A) schizophrenia with the current study included, B) schizophrenia literature only, C) bipolar disorder with the current study included, D) bipolar disorder literature only, E) schizophrenia and bipolar disorder with the current study included, F) schizophrenia and bipolar disorder literature only. Where the disease is not specified against the study in E and F it is schizophrenia.


3.2.5 Association Analysis of 11q14.3

The entire 11q14.3 region (defined to be SNPs from position 88760352 to 92460352 of chromosome 11 according to NCBI reference sequence build 37.1 (2010)) was analysed for reasons previously detailed in Chapter 2, Section 2.3.3.1. This region was analysed in the same manner as the DISC1 region and the results of these analyses are detailed below. Full results for all SNPs are detailed in Appendix B.

3.2.5.1 Individual Dataset Analyses

Analysis of the 11q14.3 region shows that although there are significant associations present in the GAIN BPD and GAIN SCZ data sets prior to correction (Figure 3.14 A-B), this significance is lost after FDR correction for multiple testing (Figure 3.15 A-B). This result was maintained in the gender split analyses.

The non-GAIN SCZ data (Figures 3.14 C and 3.15 C) showed corrected significance for a number of SNPs across the gender combined and gender split analyses which are detailed in Table 3.6.


Figure 3.14: Manhattan Plots of the Uncorrected Individual Chromosome 11q14.3 Analyses. Manhattan Plots of each of the three datasets for analysis of the 11q14.3 region. Each plot shows the combined (black), male only (blue) and female only (red) analysis. A) GAIN BPD, B) GAIN SCZ, and C) non-GAIN SCZ. Each plot shows -log10 of the p-value plotted against the chromosomal position of that SNP (in Mbp). Any SNPs that fall above the black line have a p-value of <0.05.


Figure 3.15: Manhattan Plots of the Corrected Individual Chromosome 11q14.3 Analyses. Manhattan Plots of each of the three datasets for analysis of the 11q14.3 region after correction for multiple testing. Each plot shows the combined (black), male only (blue) and female only (red) analysis. A) GAIN BPD (n=833), B) GAIN SCZ (n=836) and C) non-GAIN SCZ (n=836). Each plot shows -log10 of the FDR p-value plotted against the chromosomal position of that SNP (in Mbp). Any SNPs that fall above the black line have a corrected p-value of <0.05. The number of tests (n) corrected for in each dataset is shown in parentheses.


Table 3.6: SNPs Reaching Corrected Significance in the non-GAIN SCZ Dataset. Associations, with corrected significance, found in the analysis of the non-GAIN SCZ dataset. The Analysis column distinguishes the combined, male only and female only analyses (shown as dark grey, light grey and white rows respectively in the original table). FDR p-values are based on correction for 836 tests.

Analysis  SNP         Status   MAF    Genotype 11   Genotype 12  Genotype 22  p-value     FDR p-value  OR (95% CI)
Combined  rs7124944   Case     0.145  840 (0.731)   286 (0.249)  23 (0.020)   5.04x10^-5  0.022        1.416 (1.196, 1.677)
                      Control  0.107  1078 (0.800)  251 (0.186)  18 (0.013)
Male      rs35003084  Case     0.198  508 (0.639)   260 (0.327)  27 (0.034)   7.17x10^-5  0.033        1.491 (1.223, 1.817)
                      Control  0.142  486 (0.736)   161 (0.244)  13 (0.020)
Male      rs7124944   Case     0.156  571 (0.711)   214 (0.267)  18 (0.022)   1.38x10^-4  0.033        1.529 (1.228, 1.904)
                      Control  0.108  535 (0.800)   124 (0.185)  10 (0.015)
Male      rs2509382   Case     0.194  522 (0.650)   250 (0.311)  31 (0.039)   1.74x10^-4  0.033        1.457 (1.196, 1.774)
                      Control  0.142  496 (0.741)   156 (0.233)  17 (0.025)
Male      rs12787172  Case     0.202  511 (0.636)   259 (0.323)  33 (0.041)   1.89x10^-4  0.033        1.444 (1.190, 1.752)
                      Control  0.150  486 (0.726)   166 (0.248)  17 (0.025)
Male      rs3018365   Case     0.141  591 (0.739)   193 (0.241)  16 (0.020)   2.38x10^-4  0.035        1.537 (1.220, 1.935)
                      Control  0.096  542 (0.815)   118 (0.177)  5 (0.008)
Male      rs1404531   Case     0.203  508 (0.633)   264 (0.329)  31 (0.039)   3.31x10^-4  0.036        1.421 (1.172, 1.723)
                      Control  0.152  483 (0.723)   167 (0.250)  18 (0.027)
Male      rs1894134   Case     0.360  330 (0.411)   367 (0.458)  105 (0.131)  3.32x10^-4  0.036        1.329 (1.138, 1.552)
                      Control  0.297  321 (0.481)   297 (0.445)  50 (0.075)
Male      rs11019229  Case     0.202  504 (0.640)   250 (0.317)  34 (0.043)   5.05x10^-4  0.050        1.411 (1.161, 1.713)
                      Control  0.152  476 (0.723)   164 (0.249)  18 (0.027)
Female    rs7944181   Case     0.052  310 (0.896)   36 (0.104)   0 (0.000)    1.36x10^-5  0.012        3.041 (1.799, 5.140)
                      Control  0.018  653 (0.965)   24 (0.035)   0 (0.000)
Female    rs7931883   Case     0.048  312 (0.904)   33 (0.096)   0 (0.000)    5.37x10^-5  0.019        2.907 (1.693, 4.990)
                      Control  0.017  654 (0.966)   23 (0.034)   0 (0.000)
Female    rs11019188  Case     0.045  314 (0.910)   31 (0.090)   0 (0.000)    6.49x10^-5  0.019        2.986 (1.703, 5.237)
                      Control  0.016  656 (0.969)   21 (0.031)   0 (0.000)
Female    rs12420663  Case     0.061  304 (0.879)   42 (0.121)   0 (0.000)    2.05x10^-5  0.044        2.303 (1.466, 3.619)
                      Control  0.027  641 (0.945)   37 (0.055)   0 (0.000)


3.2.5.2 Meta-Analyses of 11q14.3¹

Meta-analysis of the GAIN BPD, GAIN SCZ and non-GAIN SCZ datasets across the 11q14.3 region revealed no significant SNPs in combined or gender split analyses (Figure 3.16).

In analysis of only the schizophrenia data (GAIN SCZ and non-GAIN SCZ datasets) there were no SNPs with corrected significance observed for the combined or female only analyses (Figure 3.17).

Corrected significance was reached in the male only analysis for the group of SNPs detailed in Table 3.7. This significance was retained after accounting for the gender split analyses.

Table 3.7: Schizophrenia Meta-Analysis in Males. Analysis of the GAIN and non-GAIN SCZ dataset. FDR p-values are based on correction for 836 tests.

SNP         Status   MAF    Genotype 11   Genotype 12  Genotype 22  p-value     FDR p-value  OR (95% CI)
rs2509382   Case     0.182  1074 (0.673)  466 (0.292)  57 (0.036)   4.83x10^-5  0.024        1.353 (1.169, 1.566)
            Control  0.141  886 (0.741)   283 (0.237)  27 (0.023)
rs35003084  Case     0.186  1050 (0.662)  485 (0.306)  52 (0.033)   8.18x10^-5  0.024        1.338 (1.157, 1.547)
            Control  0.146  862 (0.729)   296 (0.250)  24 (0.020)
rs12787172  Case     0.188  1055 (0.661)  483 (0.302)  59 (0.037)   1.13x10^-4  0.024        1.326 (1.149, 1.53)
            Control  0.149  867 (0.725)   302 (0.253)  27 (0.023)
rs1404531   Case     0.190  1048 (0.657)  490 (0.307)  58 (0.036)   1.59x10^-4  0.027        1.316 (1.141, 1.518)
            Control  0.151  863 (0.723)   301 (0.252)  30 (0.025)
rs11019229  Case     0.186  1052 (0.666)  467 (0.296)  61 (0.039)   2.90x10^-4  0.040        1.305 (1.130, 1.508)
            Control  0.149  858 (0.726)   295 (0.250)  29 (0.025)
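The MAF, odds ratio and confidence interval columns of Table 3.7 can be related back to the genotype counts. The following is a minimal sketch, assuming the tabulated OR is an allelic odds ratio (minor versus major allele) computed from the pooled counts with a Wald 95% interval; the analysis actually used is defined in the methods chapter, and the function names here are hypothetical. Under that assumption the rs2509382 row is recovered to three decimal places.

```python
import math


def allele_counts(genotypes):
    """Convert (11, 12, 22) genotype counts into (major, minor) allele counts."""
    hom_major, het, hom_minor = genotypes
    return 2 * hom_major + het, 2 * hom_minor + het


def minor_allele_frequency(genotypes):
    major, minor = allele_counts(genotypes)
    return minor / (major + minor)


def allelic_odds_ratio(case_genotypes, control_genotypes, z=1.96):
    """Allelic OR for the minor allele with a Wald confidence interval."""
    case_major, case_minor = allele_counts(case_genotypes)
    ctrl_major, ctrl_minor = allele_counts(control_genotypes)
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) from the four cells of the 2x2 allele table.
    se = math.sqrt(1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# rs2509382, male-only schizophrenia meta-analysis (genotype counts from Table 3.7)
cases = (1074, 466, 57)
controls = (886, 283, 27)

maf_case = minor_allele_frequency(cases)
maf_control = minor_allele_frequency(controls)
or_est, ci_low, ci_high = allelic_odds_ratio(cases, controls)

print(f"MAF cases = {maf_case:.3f}, controls = {maf_control:.3f}")
print(f"OR = {or_est:.3f} ({ci_low:.3f}, {ci_high:.3f})")
```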

¹ Components of this section have been published; see: Debono, R., Topless, R., Markie, D., Black, M.A. and Merriman, T.R. (2012). Analysis of the DISC1 translocation partner (11q14.3) in genetic risk of schizophrenia. Genes Brain Behav 11(7), 859–63.


Figure 3.16: Manhattan Plots of the Combined GAIN and non-GAIN SCZ and BPD 11q14.3 Analyses. Each plot shows the combined (black), male only (blue) and female only (red) analysis. A) prior to correction and B) after correction for multiple testing (n=836). Plotted values are A) -log10 of the p-value or B) -log10 of the FDR p-value, plotted against the chromosomal position of that SNP (in Mbp). Any SNPs that fall above the black line have a p-value of <0.05. The number of tests (n) corrected for in each dataset is shown in parentheses.


Figure 3.17: Manhattan Plots of the Combined GAIN and non-GAIN SCZ 11q14.3 Analyses. Each plot shows the combined (black), male only (blue) and female only (red) analysis. A) prior to correction and B) after correction for multiple testing (n=836). Plotted values are A) -log10 of the p-value or B) -log10 of the FDR p-value, plotted against the chromosomal position of that SNP (in Mbp). Any SNPs that fall above the black line have a p-value of <0.05. The number of tests (n) corrected for in each dataset is shown in parentheses.


The five significant SNPs are in high linkage disequilibrium, as is shown in Figure 3.18. Genotyping integrity was assessed for the five SNPs in the GAIN and non-GAIN datasets and all were found to be called well (scatterplots of the genotype calling can be found in Appendix B).

Figure 3.18: Haploview Image of Chromosome 11 SNPs. Haploblock containing the five significant SNPs in the 11q14.3 region of the GAIN and non-GAIN SCZ dataset. Values shown are r² values.

3.2.6 Population Stratification

Population stratification was assessed for the combined GAIN and non-GAIN data using EIGENSTRAT v1.0 (Li & Yu, 2008b). In the male only analysis the genomic inflation factor was 1.11. Given that this factor is close to that typically expected (<1.1) (Yang et al., 2011a), and that the minor allele frequency of the most significant SNP, rs2509382, is very similar in diverse populations from the 1000 Genomes Study (European (EUR) = 0.161, African (AFR) = 0.152, Asian (ASN) = 0.124), it is unlikely that population stratification is playing a significant role in our data.
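For readers unfamiliar with the inflation factor, the sketch below shows one common way it is estimated from a set of association p-values: the median observed 1-df chi-square statistic divided by its expected value under the null (about 0.455). This is an illustrative calculation only, not the EIGENSTRAT implementation, and the function name and the synthetic p-values are assumptions made for the example.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided association p-values.

    Each p-value (0 < p <= 1) is converted back to a 1-df chi-square statistic;
    lambda is the median observed statistic over the null median.
    """
    nd = NormalDist()
    # For a two-sided test, chi2 = z^2 where z is the normal quantile at p / 2.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.455
    return median(chi2) / expected_median


# Synthetic example: evenly spaced p-values mimic a well-behaved null distribution.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
lambda_null = genomic_inflation(null_p)

# Shifting the same p-values towards zero mimics systematic inflation.
inflated_p = [p ** 1.5 for p in null_p]
lambda_inflated = genomic_inflation(inflated_p)

print(f"lambda (null) = {lambda_null:.3f}")
print(f"lambda (inflated) = {lambda_inflated:.3f}")
```

A value near 1 indicates that the bulk of the test statistics follow the null distribution, which is why values above roughly 1.1 are treated as a warning sign of stratification or other systematic bias.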


3.3 Discussion

3.3.1 GWAS Analysis of Datasets in the Literature

All of the datasets used in this thesis had been previously evaluated by genome wide association scans, and none have revealed variants within DISC1 or the 11q14.3 region as significant at the genome wide level.

The WTCCC dataset analysis revealed a single significant locus for bipolar disorder (Wellcome Trust Case-Control Consortium, 2007). The significant region was at chromosome 16p12, with the most significant SNP being rs420259, where a recessive model of the A allele provides a two-fold increase in risk in a genotypic association (OR[95%CI] = 2.08[1.60-2.71], p-value = 6.29x10^-8). This association was not replicated by the expanded reference group; however, a further four regions showed some evidence of association in this larger group. No association was found for SNPs in the DISC1 region, though the analysis cannot exclude the gene as a candidate: the power to detect associations with a relative risk of 1.3 for alleles with minor allele frequencies of less than 0.05 was less than 40%, meaning that there may be rare variants that remain undetected.

GWAS of the MGS sample (containing the GAIN and nonGAIN samples) did not reveal any significant associations at the genome-wide level (Shi et al., 2009). Again associations in DISC1 cannot be ruled out as the authors state that they only had the power to detect risk alleles with minor allele frequencies of 30-60% at a relative risk of 1.3.

Although Shi et al. (2009) did not take a candidate gene approach to these data, they do provide detail on the results from their GWAS for previously described candidate genes. One of these genes is DISC1, but only the ES isoform is assessed (for details on the isoforms of DISC1 see Section 1.3.2 in Chapter 1). Results for five SNPs are given in the supplementary data; these results are shown in Table 3.8.

Interestingly, one of the SNPs shown is rs11122324, which was the SNP of interest in the miRNA hypothesis due to its location in the 3'UTR of this ES isoform. In the analysis of this SNP in this dataset (GAIN and nonGAIN SCZ) in the current study the p-value = 0.057, OR = 1.082. The difference in values between studies can be attributed to different methods employed in the exclusion of individuals.

Table 3.8: MGS DISC1 ES Isoform Results. Summary of results from the Molecular Genetics of Schizophrenia (MGS) genome wide association scan for DISC1 ES isoform SNPs.

SNP         Allele  OR     p-value
rs2793092   G       1.106  0.034
rs12044355  C       1.089  0.038
rs11122324  T       1.088  0.038
rs12076286  G       1.107  0.044
rs12130935  A       1.144  0.048

A meta-analysis of the MGS data with several other schizophrenia cohorts (totalling 12945 cases and 34591 controls) shows significant associations for seven regions, four on chromosome 6 (MHC), one on chromosome 11 (NRGN) and one on chromosome 18 (TCF4) (Stefansson et al., 2009).

The data genotyped under the GAIN bipolar disorder initiative have also been assessed by GWAS, with the strongest associations found for rs5907577 located at Xq27.1 (p-value = 1.6x10^-6) and rs10193871 in the NAP5 gene within 2q21.2 (p-value = 9.8x10^-6); neither reaches genome-wide levels of significance (Smith et al., 2009).

The current study examined candidate regions rather than the genome as a whole, and for this reason the threshold for significance can be less stringent due to the smaller number of tests being done (here a p-value <0.05 after FDR correction). Variants with smaller population effect sizes can therefore be detected more sensitively as there is less noise from the overall genome.

At the time that the analyses in this section of the thesis were performed, no candidate gene analysis of DISC1 or of the 11q14.3 region had been undertaken with these datasets, although subsequently such an investigation was carried out on the DISC1 region by Mathieson et al. (2012), which is discussed in Section 3.3.8.

3.3.2 Exclusion of the WTCCC Dataset

In the original analysis of the WTCCC dataset, following genotyping of the samples, several rigorous quality control steps were carried out. This resulted in 4.8% of individuals and 6.3% of SNPs being removed from all analyses (Wellcome Trust Case-Control Consortium, 2007). Samples were excluded for a variety of reasons including contamination, false identity or ancestry, high levels of missing data and unexpected levels of heterozygosity. The removal of these SNPs and individuals removed any major substructure from the cohort and resulted in a dataset that was deemed to be suitable for analysis.

The dataset made publicly available by the WTCCC group (that which was to be used in this study) is the raw data collected in its entirety, along with lists of individuals that should be removed for each disease. After excluding these individuals from the bipolar disorder cohort the genomic inflation was still estimated to be 1.25 (which is greater than the threshold value of 1.1—for a more detailed explanation of genomic inflation see Chapter 5, Section 5.2.2). Inflation of this nature could be explained by the cases and controls being genotyped (or called) on opposite strands. However, this does not seem likely in this case as visual inspection of a small number of SNPs reveals no inconsistencies in the calling. Further, a strand conversion of the data to the forward strand reference of the 1000 genomes study dataset provides no improvement to the level of inflation.

The other cohorts used in this study, namely the GAIN and nonGAIN datasets, had undergone quality control measures to remove explainable variation prior to the commencement of analysis (this was completed by Associate Research Fellow Ruth Topless). The quality control carried out on these data was similar to that described above (as was completed by the WTCCC analysis team) and also included the removal of duplicate samples. This meant that upon the commencement of analysis in the current study the GAIN and nonGAIN datasets were already of a usable standard with regard to substructure, and as such the calculations for genomic inflation were under the threshold of 1.1, allowing these data to be used in subsequent analyses.

This was not the case for the WTCCC data. It was not deemed feasible to proceed with quality control of these data as a part of the current study due to the time required to ensure all SNPs and individuals were screened adequately and then subsequently removed. On this basis it was decided that while the GAIN and nonGAIN datasets were sufficiently corrected (with inflation factors of <1.1) the WTCCC dataset (with inflation factor of 1.4) would be excluded from the analyses conducted in this research project.


3.3.3 Separate Analysis of the GAIN and nonGAIN Datasets

The individuals phenotyped for the GAIN and nonGAIN datasets were all collected at once in a single broader cohort known as the MGS cohort (as was described in Chapter 2). The distinction between the two is that the participants were genotyped in two stages approximately six months apart. This genotyping was carried out for both sets of participants on the Affymetrix® 6.0 SNP array, and because of this the GAIN and nonGAIN data are often considered to be a single dataset (Shi et al., 2009).

In the current study, while the datasets were assessed together as a single cohort (as combined GAIN and nonGAIN schizophrenia) they were also analysed as separate cohorts. This was done for a number of reasons: 1) the data are provided as two separate datasets, 2) because of this, QC was conducted independently for each dataset, and so there may be batch effects in the genotyping between GAIN and nonGAIN that are not identified by the QC done, and 3) principal component analysis of the DISC1 region for the combined dataset revealed that the GAIN and nonGAIN data cluster separately (as was shown in Figure 3.6). Taken together these factors provide support for the separate analysis of these datasets, even though the GAIN and nonGAIN data are both part of the broader MGS study.

3.3.4 Meta-Analysis Assumptions and Problems

The combining of datasets to create a meta-analysis assessment for associated variants produced a maximised sample size but also a more heterogeneous cohort due to the combination of patients with bipolar disorder and schizophrenia. However, due to the initial evidence that DISC1, as identified in the Scottish translocation family, is involved in a multitude of psychiatric illnesses including bipolar disorder and schizophrenia, the basis of combining these disorders is supported. Additionally, the Cross-Disorder Group of the Psychiatric Genomics Consortium (2013) calculated that there was a genetic correlation of 0.68 (± 0.04 s.e.) between bipolar disorder and schizophrenia.

3.3.5 Validity of Splitting Analysis by Gender

Sex bias is a phenomenon that often arises in the analysis of psychiatric illness and indeed in DISC1 analyses (Hennah et al., 2003; Hashimoto et al., 2006; Chen et al., 2007; Hennah & Porteous, 2009; Schumacher et al., 2009). An example of this is the HEP3 haplotype, made up of rs751229 (T allele) and rs3738401 (A allele). This haplotype is shown to have a protective effect in schizophrenia; however, this protection is only significant in females (p-value = 0.00024) whereas in males p-value = 0.38 (Hennah et al., 2003), showing that the separate analysis of genders is required in this region. A second study reports that there is an over-transmission of this same haplotype to males (p-value = 0.008) and not to females (Palo et al., 2007). Such results exemplify the fact that the effects of this gene are hugely heterogeneous between populations and therefore it is currently impossible to provide a definitive answer as to which variants are causative.

3.3.6 Multiple Testing Correction

As discussed in Chapter 1 there are numerous methods that can be employed to account for multiple testing in studies such as that conducted here. Permutation analysis is currently the gold standard but is unfortunately computationally very intensive and for this reason was not feasible in the current study. Bonferroni correction is often used in GWAS but this is a very stringent method and does not account for linkage disequilibrium, which means that the number of independent tests being conducted is certainly less than the number of tests actually being corrected for.

The threshold for significance used in this thesis was a p-value of <0.05 after correction for multiple testing. This threshold was applied to each analysis separately and included correction for all SNPs tested within the cohort. The method of correction used in this thesis (False Discovery Rate (FDR)) was chosen as a middle ground between permutation and Bonferroni. FDR correction reduces the type 2 error (false negatives) at the expense of increasing the rate of type 1 error (false positives). While this method still does not account for linkage disequilibrium, the step-up method of FDR correction, which increases α for each test (see Chapter 1), means that it is less stringent than the Bonferroni method. Given that there is less multiple testing present in a candidate gene study compared to a GWAS, a less stringent method of correction is appropriate. While no method is perfect, the increase in type 1 error and hence the lower level of stringency allowed by FDR correction is therefore reasonable in a candidate gene study.
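The difference in stringency between the two corrections can be illustrated with a small worked example. The sketch below assumes that the FDR correction is the Benjamini-Hochberg step-up procedure, which is the usual meaning of a step-up FDR method; the p-values are hypothetical and chosen only to show the behaviour, and the function names are not taken from any package used in this thesis.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg (step-up) FDR-adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the minimum
    # of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five SNPs (illustration only, not thesis data)
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
bonf = bonferroni(raw)
fdr = benjamini_hochberg(raw)

for p, b, f in zip(raw, bonf, fdr):
    print(f"raw = {p:.3f}  Bonferroni = {b:.3f}  FDR = {f:.5f}")
```

In this example both methods retain the two smallest p-values at the 0.05 level, but the third and fourth tests move from about 0.2 under Bonferroni to about 0.051 under FDR, showing how the multiplier shrinks as the rank of the test increases.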


3.3.7 Conclusions on rs11122324 Association with Psychiatric Illness

The inability to validate the rs11122324 association found in the SIBS cohort may be attributed to one or a combination of the following reasons. The first explanation is that the finding in the SIBS cohort, and perhaps the findings of other small studies, are not real but are actually artifacts of the small study size. Small studies have the capacity to produce such artifacts, as a variant may be over-represented in the cases compared to controls purely by chance.

This false inflation of effect sizes is known as the Winner's Curse effect. Because of this, small, under-powered studies require variants to have large effect sizes in order to adequately detect them. In a small study, if a variant has by random chance a larger effect within the sample population it will be detected, and further reported on as having a significant effect when in fact across the population it may not have a large effect at all. In larger studies these artifacts disappear, as the proportion of cases carrying the variant is not as large, and thus these fluctuations are removed.
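The effect can be demonstrated with a simple simulation. The sketch below is a deliberate simplification and not an analysis of the SIBS data: instead of simulating genotypes, each hypothetical small study draws its estimated log odds ratio directly from a normal sampling distribution around a modest true effect. The true effect (0.10), standard error (0.20), number of studies and seed are all arbitrary assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean


def simulate_winners_curse(true_log_or, se, n_studies, alpha=0.05, seed=1):
    """Compare the average effect estimate over all studies with the average
    among only those studies that reach nominal significance."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Each study's estimate scatters around the true effect with the given SE.
    estimates = [rng.gauss(true_log_or, se) for _ in range(n_studies)]
    # Keep only the studies whose Wald statistic passes the threshold.
    significant = [b for b in estimates if abs(b / se) > z_crit]
    power = len(significant) / n_studies
    return mean(estimates), mean(significant), power


all_mean, sig_mean, power = simulate_winners_curse(
    true_log_or=0.10, se=0.20, n_studies=20000
)

print(f"mean estimate, all studies:        {all_mean:.3f}")
print(f"mean estimate, significant only:   {sig_mean:.3f}")
print(f"proportion reaching significance:  {power:.3f}")
```

Across all simulated studies the estimates average out close to the true effect, but the subset that happens to cross the significance threshold reports a much larger effect on average, which is the pattern expected when only significant findings from small cohorts are reported.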

A second explanation is the large amount of heterogeneity which is inherent in disorders such as bipolar disorder and schizophrenia. This heterogeneity is due to the wide diagnostic spectra and also the varying interpretations of these spectra between practitioners, which results in differing diagnoses of similar patients. This phenomenon is demonstrated by previous studies of complex disorders giving such varied results between cohorts. Good, consistent phenotyping is therefore important in data collection, and while continuity is hopefully maintained within datasets, distinct datasets may have variations in diagnosis resulting in problems for meta-analysis.

Perhaps as further understanding of these complex diseases is established the diagnostic power will become more refined, resulting in downstream analyses of patients increasing in accuracy. It is not possible to distinguish which of these is the case in the inability to reproduce the association of rs11122324 with bipolar disorder (or find one with schizophrenia), but the results of this study show that there is not an observable association in the cohorts tested and so the previous result from the SIBS study must be interpreted with caution.

3.3.7.1 miR-575 and rs11122324

The DINAmelt two state hybridisation tool predicts that with the common allele (G) there is a perfect match in complementarity throughout the seed region of the miRNA from positions two to seven, followed by an A. This is known as a 7mer-1A target site. With the single base change at rs11122324, this complementarity is lost. This seed region (position 2-7 of the miRNA) is the minimum requirement for binding at a target site while still retaining specificity (Lewis et al., 2005). The fact that this requirement is not fulfilled with the presence of an A allele at rs11122324 suggests that the affinity for binding is reduced and so the regulation of the transcript is also reduced.

The calculation of thermodynamic stability for RNA duplexes using the DINAmelt server (Markham & Zuker, 2005) gives a free energy (ΔG) of -13.7 kcal/mol for hybridisation between miR-575 and the common allele at rs11122324, consistent with a functional interaction that would produce translational repression of short forms of the DISC1 protein (Doench & Sharp, 2004). In contrast, the risk allele at rs11122324 produces a ΔG of -6.6 kcal/mol, a reduction in affinity for miR-575. However, although thermodynamic stability is a useful predictor for miRNA target sites, the attributes of functional sites are not completely understood and predicted sites require experimental confirmation.

Further confirmation of the binding was found through the polymiRTS database (http://compbio.uthsc.edu/miRSNP), which states that there is a binding site for miR-575 within the 3'UTR of the DISC1 ES isoform at the sequence gcaCTGGCTAcga, which is the same sequence as found in the DINAmelt analysis (30th June 2010).
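The 7mer-1A site definition can be expressed as a simple sequence search. The sketch below is illustrative and is not how DINAmelt or polymiRTS make their predictions. It assumes the mature miR-575 sequence given in `MIR_575` (quoted from memory of the miRBase entry and to be verified before use), and the mutated sequence is a hypothetical single-base change used only to show that any substitution within the seed match abolishes the site; it is not intended to represent the exact position of rs11122324.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def seed_site_7mer_1a(mirna):
    """DNA-alphabet 7mer-1A target site for a miRNA written 5' to 3'.

    The site is the reverse complement of miRNA positions 2-7 followed by an A.
    """
    seed = mirna.upper().replace("U", "T")[1:7]  # positions 2-7 of the miRNA
    seed_match = "".join(COMPLEMENT[base] for base in reversed(seed))
    return seed_match + "A"


def find_sites(utr, mirna):
    """Return the 0-based start positions of 7mer-1A sites in a UTR sequence."""
    site = seed_site_7mer_1a(mirna)
    utr = utr.upper()
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


# Assumed mature miR-575 sequence (verify against miRBase before relying on it).
MIR_575 = "GAGCCAGUUGGACAGGAGC"

reported_site = "gcaCTGGCTAcga"      # sequence quoted from the polymiRTS database
hypothetical_mutant = "gcaCTGACTAcga"  # illustrative single-base change only

print(seed_site_7mer_1a(MIR_575))
print(find_sites(reported_site, MIR_575))
print(find_sites(hypothetical_mutant, MIR_575))
```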

Reassessment of these databases six months later revealed that the binding prediction had changed (this was most likely due to implementation of a new algorithm), resulting in a smaller difference in binding affinity between the two alleles. The ΔG values for the risk (A) allele and the common (G) allele variants of the DISC1 ES isoform became -4.6 and -9.6 kcal/mol respectively.

The research by Doench & Sharp (2004) used luciferase assays to assess the level of energy change that would point to a functional interaction. Their evidence suggested that a ΔG of -5 kcal/mol or higher was unlikely to indicate a real functional interaction, while a ΔG of -6 kcal/mol or less was more likely to indicate an interaction, the likelihood of which increased as the ΔG decreased. Those with a predicted ΔG of around -6 kcal/mol were found to be more equivocal.

Using either algorithm the binding affinity of the miRNA with DISC1 is higher when the common allele (G) at rs11122324 is present rather than the minor allele (A). The estimated predictions could even be considered sufficient to suggest that the miRNA binds with the common allele but not with the risk allele. The implication of this is that expression of the ES isoform would no longer be repressed (or at least would be less repressed) in the presence of the risk allele, leading to potential functional interference with the full length DISC1 protein.

It is important to consider, however, that these are only in silico predictions, which have been shown to be inaccurate at times (Ragan et al., 2011), particularly when taking into account other cellular factors that can affect binding aside from free energy, such as the concentration of the miRNA and mRNA and whether the two are actually expressed together.

The inability to reproduce the observed genetic association at rs11122324, as well as the variability in the binding affinity predictions based on the algorithm used, resulted in the termination of investigation into this hypothesis. It is now believed that the association found in the SIBS cohort was found by chance and that the SNP rs11122324 is not a causative SNP in psychiatric illnesses including bipolar disorder.

3.3.8 Discussion of DISC1 Association with Psychiatric Illness

Recently meta-analyses have been adopted as a means to gain more power from the bipolar disorder and schizophrenia datasets available. The first study of this kind conducted to identify associations between DISC1 and schizophrenia combined nine cohorts of European descent to analyse 50 SNPs in over 10 000 individuals and identified a single SNP (rs17817356) as significantly associated after correction (Schumacher et al., 2009). This SNP (rs17817356) falls within the intron 4-6 interval, which in the Schumacher study contained three additional SNPs that reached significance prior to correction. The SNP identified as significant prior to correction for multiple testing in the current study (rs11122331), as well as the other two SNPs in the described haploblock (rs11122330 and rs1538979), fall within this previously implicated region (within intron four of the gene). One of the additional SNPs found in the Schumacher study (rs1538979) was found to be significant when combined with a second SNP (rs821633).

Schumacher et al. (2009) also identified that intron nine of DISC1 is of particular interest in schizophrenia as a risk interval. This interval was initially found to be significant in a study by Hodgkinson et al. (2004), which found three significant SNPs within this region. Two of these three SNPs were found to be significant by Schumacher's group (rs999710: p-value = 0.0002 and rs9432024: p-value = 0.0013). The results of this study seem to trend in the direction supporting the two associated intervals (intron 9 and introns 4-6) identified by Schumacher et al. (2009), but no corrected significance was found. There were a number of SNPs, particularly in the female only analysis, that reached significance prior to correction. Included in these SNPs was rs9432024 (p-value = 0.048 [OR(95%CI) = 0.888 (0.789, 0.999)]), as well as a second SNP that also appears on the Schumacher list (rs9431708: p-value = 0.041 [OR(95%CI) = 0.889 (0.795, 0.995)]).

More recently (after the current study was conducted) the most comprehensive meta-analysis to date was conducted by Mathieson et al. (2012), which included the data used by Schumacher et al. (2009). The group analysed 1241 SNPs in a cohort of over 11 000 cases and 15 000 controls and found no evidence for association between DISC1 and schizophrenia (Mathieson et al., 2012). The authors find the region with the best evidence for an association with schizophrenia is intron six, which was included in one of the regions identified to trend towards association by Schumacher et al. (2009). These results are interesting in light of the many associations that have been identified by scores of smaller studies. Although Mathieson et al. (2012) admit that their study is underpowered to identify rare variants, they fairly conclusively state that there are no common variants in DISC1 that are associated with schizophrenia.

Opposing results such as these emphasise that if DISC1 is associated with psychiatric illness, there are a vast number of contributing factors that vary extensively between populations and genders, indicating that current association analysis findings alone are not adequate to implicate DISC1 as a causative gene.

3.3.8.1 The Haploblock Containing rs11122331

The SNP rs1538979 has previously been found to be associated with bipolar disorder and schizophrenia. However, there is inter-study heterogeneity regarding the extent and direction of association, as well as the gender with which the association is found.

In a Finnish cohort the T allele at this locus increases the risk of bipolar disorder (type I) in males (OR[95%CI] = 2.73 [1.42-5.57], corrected p-value = 0.016). This significance is lost when the analysis is conducted with combined cohorts. The same allele provides risk in a cohort of females with bipolar disorder from London and protection in a cohort of males with schizophrenia from Aberdeen (Hennah & Porteous, 2009). Another study identifies rs1538979 to be a risk variant in males with schizophrenia when a second SNP, rs821633, is used as a conditional marker (Schumacher et al., 2009). The combination of the rs1538979 A allele and the rs821633 G allele gives an OR of 1.57 with p-value 0.016. This interplay was also observed in the previous study but producing risk in females with schizophrenia when the rs821633 C allele was combined with the T allele of rs1538979.

The addition of the data from this thesis for rs1538979 strengthens the protective effect in schizophrenia, and in combined schizophrenia and bipolar disorder, seen in the aforementioned studies. Taken together this suggests that the rs1538979 variant may be involved in the aetiology of bipolar disorder and schizophrenia; however, there are inconsistencies with regard to the phenotype, gender and direction of effect involving this SNP, depending upon the population under investigation.

A second SNP from the cluster has also been reported. A trend towards association was found for rs11122330 (the GG genotype had a p-value of 0.00833 and an allelic p-value of 0.01808) in schizophrenia and schizo-affective disorder, although the association did not withstand Bonferroni correction and the direction of association was not reported (Wood et al., 2007).

The SNP that almost reached significance in the female cohort of the current study (rs11122331) is the third SNP of this cluster and has not previously been reported on. It is of note, when considering the results described in this section, that none of the three SNPs making up this haploblock were found to be significant by Mathieson et al. (2012), casting doubt on the validity of such results.

3.3.9 Discussion of 11q14.3 Association with Psychiatric Illness

The region of chromosome 11 that is seen to create a fusion protein as a result of the translocation is in fact much smaller than the q14.3 region analysed here (as described in Section 3.1.3), with the actual breakpoint occurring between bases 90361098 and 90361099. The positions of the nominally associated SNPs identified in this study are outside the gene region described for the fusion proteins (8-131 kb upstream of the predicted DISC1FP site, see Figure 3.19). However, this does not exclude the SNPs as candidates in schizophrenia, as intergenic SNPs can be causative in common disease (Cunnington et al., 2010; Carvajal-Carmona et al., 2011).

This specific fusion remains unique to the Scottish family, though the results of this study suggest that this region is a further candidate in psychiatric illness. The possibility of fusion proteins does implicate the chromosome 11 region as a candidate region for psychiatric illnesses, and thus the association of SNPs in the general area does warrant further investigation.

Figure 3.19: DISC1FP structure. The structure of DISC1FP (black) with the breakpoint of the t(1;11)(q42.1;q14.3) translocation (red) and the relative location of the significant SNPs in this study (blue) shown. This figure was generated using the freeware FancyGENE (http://host13.bioinfo3.ifom-ieo-campus.it/fancygene/).

There is not currently a large literature base regarding the fusion proteins described here, but there is some evidence suggesting a role for the fusions in psychiatric illness. Firstly, it has been found that one of the fusion proteins (DISC1-DISC1FP) is insoluble (Zhou et al., 2010), a characteristic of the DISC1 protein as well (Leliveld et al., 2008). This insolubility is seen to cause aggregation of the DISC1 protein in the brains of schizophrenia patients and may limit the protein's ability to efficiently interact with other proteins. Additionally, the DISC1-DISC1FP1 version of the fusion protein has been identified to cluster in the mitochondria and destroy membrane potential (Eykelenboom et al., 2012).

3.3.10 Power

The power of this study is sufficient for conclusions to be drawn for any SNP with an effect of OR>1.3 (or OR<0.7), where the minor allele frequency is at least 10%, in the gender-combined datasets (see Figure 3.20). Therefore it can be concluded that there are no common SNPs with an effect size greater than OR = 1.3 that are significantly associated with schizophrenia in this dataset. Based on the analysis, this study was adequately powered to detect the associations that were found to be significant in this region; however, no correction was applied to account for a possible 'winner's curse' effect.

Figure 3.20: Statistical Power Graph. Graph shows the achievable power at given minor allele frequencies for a range of odds ratios in cohorts of equivalent size to the combined GAIN and non-GAIN SCZ datasets for combined genders (A), males (B) and females (C). The p-value threshold used in these calculations was 0.05.

The study is still underpowered to detect an effect with an odds ratio of <1.3. The size of the study was also limiting in the assessment of rare variants (as is often the case in case-control studies), and although variants with minor allele frequencies of as low as 1% were tested, the ability to detect an association at this level was very low.
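The shape of the power curves discussed above can be illustrated with a rough sketch. This is not the calculation used to produce Figure 3.20 (the tool used is not stated in this section); it is a normal approximation to a two-sided allelic test, and the function name, the approximation, and the use of the cohort sizes quoted in Section 3.3.12 (3148 cases, 2725 controls) as example inputs are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(maf, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation).

    Hypothetical helper for illustration only; not the method used in the thesis.
    """
    # Treat the minor allele frequency as the allele frequency in controls
    # and derive the frequency in cases from the odds ratio.
    p0 = maf
    odds_cases = odds_ratio * p0 / (1 - p0)
    p1 = odds_cases / (1 + odds_cases)

    # Each individual contributes two alleles.
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls

    # Standard error of the difference in allele frequencies.
    se = sqrt(p1 * (1 - p1) / alleles_cases + p0 * (1 - p0) / alleles_controls)

    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p0) / se

    # Probability that the test statistic falls in either rejection region.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Example inputs: cohort sizes are an assumption taken from Section 3.3.12.
    for maf in (0.01, 0.05, 0.10, 0.30):
        power = allelic_power(maf, 1.3, n_cases=3148, n_controls=2725)
        print(f"MAF={maf:.2f}  OR=1.3  approximate power={power:.2f}")
```

Under these assumptions the sketch reproduces the qualitative pattern described in the text: power is high for OR = 1.3 at a minor allele frequency of 10% and drops sharply as the allele becomes rarer.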

The power discussed here is that of the combined schizophrenia analyses. In the individual analyses of the GAIN and non-GAIN datasets the numbers of individuals, and thus the power achievable at each level, were lower, and the same is true for the gender-split analyses, particularly in females (see Figure 3.20 C). For this reason the combining of the datasets, even in light of the significant results in a single individual dataset (as was the case for non-GAIN schizophrenia in the 11q14.3 analysis, for example), is justified.

3.3.11 Psychiatric Genomics Consortium Results

Since this study was performed, a collection of data from various GWAS studies investigating both schizophrenia and bipolar disorder has been made freely available, and mega-analysis of this data has been undertaken by the Psychiatric Genomics Consortium (http://www.med.unc.edu/pgc/results).

The results of these mega-analyses were examined for the SNPs of interest from the current study. The results for each of these SNPs in the schizophrenia (Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium, 2011) and bipolar disorder (Psychiatric GWAS Consortium Bipolar Disorder Working Group, 2011a) analyses are shown in Table 3.9.

Table 3.9: Psychiatric Genomics Consortium Results. Summary of results from the Psychiatric Genomics Consortium for SNPs of interest in Schizophrenia (SCZ) and Bipolar Disorder (BPD) cohorts. Where results for a SNP were not available NA is shown.

Dataset  SNP         Chromosome  MAF   p-value  OR
SCZ      rs11122324  1           0.37  0.779    1.01
SCZ      rs11122330  1           0.15  0.024    1.08
SCZ      rs11122331  1           0.14  0.035    0.93
SCZ      rs1538979   1           0.16  0.050    0.94
SCZ      rs35003084  11          0.16  0.140    0.96
SCZ      rs1404531   11          0.16  0.140    0.96
SCZ      rs11019229  11          0.16  0.183    1.04
SCZ      rs12787172  11          0.16  0.165    0.96
SCZ      rs2509382   11          0.14  0.127    1.04
BPD      rs11122324  1           0.31  0.750    1.01
BPD      rs11122330  1           NA    NA       NA
BPD      rs11122331  1           0.16  0.665    1.02
BPD      rs1538979   1           0.17  0.640    0.98
BPD      rs35003084  11          NA    NA       NA
BPD      rs1404531   11          0.16  0.577    1.02
BPD      rs11019229  11          0.15  0.577    1.02
BPD      rs12787172  11          0.16  0.554    0.98
BPD      rs2509382   11          0.14  0.835    0.99


The numbers of individuals analysed were 17836 cases and 33859 controls for schizophrenia, and 11974 cases and 51792 controls for bipolar disorder. Though the cluster of DISC1 SNPs (rs11122330, rs11122331 and rs1538979) show p-values < 0.05, none of the SNPs detailed in Table 3.9 reach the genome-wide level of significance (5 × 10⁻⁸) required for truly significant association in any of the disorders.
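The comparison between nominal and genome-wide significance can be made explicit with a short sketch. The p-values are the schizophrenia (SCZ) values from Table 3.9; the variable and function names are illustrative, and the nominal threshold is applied as "less than or equal to" so that the tabulated (rounded) value of 0.050 for rs1538979 is counted, which is an assumption about how that value was treated.

```python
GENOME_WIDE_ALPHA = 5e-8   # conventional genome-wide significance threshold
NOMINAL_ALPHA = 0.05       # nominal (uncorrected) threshold

# SCZ p-values as tabulated in Table 3.9.
scz_p_values = {
    "rs11122324": 0.779,
    "rs11122330": 0.024,
    "rs11122331": 0.035,
    "rs1538979": 0.050,
    "rs35003084": 0.140,
    "rs1404531": 0.140,
    "rs11019229": 0.183,
    "rs12787172": 0.165,
    "rs2509382": 0.127,
}


def classify(p_values, nominal=NOMINAL_ALPHA, genome_wide=GENOME_WIDE_ALPHA):
    """Return (nominally significant SNPs, genome-wide significant SNPs)."""
    nominal_hits = sorted(snp for snp, p in p_values.items() if p <= nominal)
    genome_wide_hits = sorted(snp for snp, p in p_values.items() if p < genome_wide)
    return nominal_hits, genome_wide_hits


if __name__ == "__main__":
    nominal_hits, genome_wide_hits = classify(scz_p_values)
    print("Nominal:", nominal_hits)
    print("Genome-wide:", genome_wide_hits)
```

The three DISC1 cluster SNPs pass the nominal threshold, while the genome-wide list is empty, which is the distinction drawn in the text.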

3.3.12 Conclusions

Despite having analysed DISC1 and 11q14.3 in a cohort of up to 3148 cases and 2725 controls very little support for DISC1 was identified. This is a finding that mirrors that of Mathieson et al. (2012), albeit in a smaller cohort, but with the inclusion of bipolar disorder as well as schizophrenia. This lack of findings could be explained if DISC1 is truly not associated with psychiatric illness, or due to the limitations of this study discussed below. The analysis of the 11q14.3 region showed some promising results, in the male cohort for a cluster of five SNPs, results that will require confirmation in other datasets. Of course the limitations discussed below have relevance to both of the regions analysed in this study.

Firstly, this study, although large, was underpowered to detect effects of OR < 1.3 (or OR > 0.7). Although this study can rule out the existence of any common variant within these regions having a large effect on the risk of developing schizophrenia or bipolar disorder (where OR>1.3), variants that may cause a minor effect cannot be ruled out. This poses quite a problem in disorders of a complex nature such as psychiatric illnesses, because it is likely that the contributing variants are numerous and, individually, of very small effect.

Additionally, the power limitations of the cohort size used here restricted the analysis to only include SNPs with a minor allele frequency of at least 1% and even at this level the power is very low. This excludes any SNPs that may have a large effect but are so rare in the population that they go unnoticed in a study such as this, where they are not well represented. Such rare variants have been shown to be associated with psychiatric illness (Walsh et al., 2008; Xu et al., 2008; Need et al., 2009; Levinson et al., 2011; St Clair et al., 1990), including the single variant that initially identified DISC1 as a candidate gene, a translocation that occurred within a single Scottish family.

The search for rare variants is becoming more feasible with the advent of relatively inexpensive deep sequencing methods. This technique has been demonstrated in schizophrenia (Myers et al., 2011), where rare missense variants were found in excess among schizophrenia cases in genes previously suggested to be associated with the disorder. Studies of this nature come with their own limitations, the most important of which to control for is population substructure. When looking for rare variants, the subtle differences between individuals from slightly different backgrounds can have a large confounding effect. Research on rare variants in DISC1 is just beginning and has provided some evidence for association of variants present in less than 1% of the population with schizophrenia, bipolar disorder and recurrent major depression (Thomson et al., 2014; Green et al., 2011), but there is still a great deal to be done.

A further limitation of this study is that only Affymetrix® 6.0 genotyped SNPs (that had minor allele frequencies of at least 0.01) were analysed. This meant that in the DISC1 region approximately 180 SNPs were analysed, when in total there are 948 genotyped SNPs within the analysed region (according to the HapMap Project database (phases 1, 2 and 3)). So only approximately 19% of the known SNPs in the region were captured in this analysis. The same restrictions are also true of the chromosome 11 analysis. However, the variants genotyped on the Affymetrix® 6.0 array capture on average 83% of the common variation in the Caucasian genome (Li et al., 2008b).

Although the analyses undertaken in this study were comprehensive with regard to the data available, they were by no means an exhaustive analysis of the regions. Despite the apparent lack of evidence for DISC1, further investigation is still required before the regions can be ruled out as candidates contributing to psychiatric illness. The future direction of genetic association in complex disease will turn towards deep sequencing to identify (or rule out) rare and copy number variants.

3.3.12.1 Summary

In this study meta-analyses to identify risk variants in DISC1 and 11q14.3 contributing to bipolar disorder and schizophrenia were undertaken, a first for these specific regions in these datasets. The analyses revealed a small number of variants that appear to confer some risk of, or protection against, the major psychiatric illnesses bipolar disorder and schizophrenia. The strongest association found in DISC1 was for a protective effect against psychiatric illness at rs11122331 (A allele) in females. Two additional SNPs (rs11122330 and rs1538979) that are in high linkage disequilibrium with this SNP were also found to be trending towards significance, but none of these SNPs actually reaches significance in this data. In the chromosome 11 region a group of SNPs (rs2509382, rs35003084, rs12787172 and rs1404531) was identified as significantly associated with schizophrenia in the male schizophrenia dataset.

Although some support was found for the involvement of DISC1 in psychiatric illness, the findings are not directly supportive of associations identified in other studies. While some additional support for the involvement of DISC1 in schizophrenia and bipolar disorder is achieved in a meta-analysis of rs1538979 with published data and the data from this study, a role for common variants in the aetiology of these disorders remains unconfirmed, a result that is in line with the largest relevant meta-analysis to date, that of Mathieson et al. (2012).


Chapter 4

Discovery of Proteins Interacting with DISC1

4.1 Introduction

Biological processes are the result of complex interactions between multiple proteins encoded by many genes. Often studying the pathways in which a particular protein or gene participates can provide insight into the function of that gene or protein. To understand in which pathways a specific protein is acting, it is useful to discover which proteins directly (or indirectly) interact with the protein of interest. Where a physical interaction is identified, it can be assumed that the interacting proteins act in the same pathways. From this initial information, hypotheses as to the function (or functions) of the protein of interest can be formed.

4.1.1 DISC1

Despite the popularity of DISC1 as a candidate for major psychiatric illness over the past decades, very little is known about its functions and hence the pathways in which it might act to cause such phenotypes. Expression of the DISC1 gene has been identified in the brain and a few brain-related functions for the protein have been investigated, as detailed in Chapter 1. The picture is still not clear, but evidence suggests involvement in multiple pathways for DISC1.

The DISC1 protein has a globular head domain and a helical tail domain, a structure which is indicative that the protein acts in a scaffold role (Cheung et al., 2007). The putative role as a scaffold protein has been a well-supported function for DISC1 with a number of interaction partners identified. The majority of these interactions were found through yeast two-hybrid analysis of the protein. A total of approximately 111 interacting proteins have so far been published from these analyses (Millar et al., 2003; Morris et al., 2003; Camargo et al., 2007). These three studies overlap slightly (see Figure 4.1) in interactions identified, but as they do not overlap completely, it can be assumed that the data is not yet saturated.

Figure 4.1: Overlap in Published Yeast Two-Hybrid Analyses. An unweighted Venn diagram showing the overlap in the three main published yeast two-hybrid analyses of DISC1 (Millar et al., 2003; Morris et al., 2003; Camargo et al., 2007). Image was created using http://www.pangloss.com/seidel/Protocols/venn.cgi.

4.1.2 Yeast Two-Hybrid Analysis

Yeast two-hybrid analysis is a genetic methodology used in the identification of protein-protein interactions and was originally described in 1989 (Fields & Song, 1989). The original paper describes the system as a way to assess the interactors of a known protein using a simple galactose selection method in the yeast Saccharomyces cerevisiae.


This method relies on the fact that the GAL4 transcription factor has two functional domains that act independently of one another: the first consists of a DNA binding domain, which recognises the upstream activation sequence of the GAL genes and the second, an activation domain that is required in the recruitment of the machinery necessary for transcription. In the two-hybrid system this requirement is exploited by creating two hybrid constructs, the first consisting of the known protein bound to the binding domain (the bait) and the second, a library of genes each bound to the activating domain (the preys). If there is a physical interaction between the protein of the bait hybrid and a prey protein then there is also an interaction between the binding and activation domains of the GAL4 transcription factor, allowing the transcription of the downstream reporter (see Figure 4.2).

Figure 4.2: Yeast 2-Hybrid Mechanism of Action. The protein of interest (X) is expressed as a fusion protein with the binding domain (BD) of the GAL4 gene downstream of the reporter gene. This construct is known as the bait. Prey fusion proteins are a combination of potential interactors (Y and Z) with the activating domain (AD) of the GAL4 gene. A) Where there is a lack of physical interaction between X and a prey protein (in this case Z) the binding domain can bind to the GAL4 sequence but cannot activate transcription of the reporter gene due to the lack of interaction with the activating domain. B) Where there is an interaction between the bait and the prey (this time carrying protein Y) the binding domain and the activating domain act together to initiate the transcription of the reporter gene. UAS is the upstream activating sequence. Figure adapted from Fields & Song (1989).


This original method is still used today, although now there are a number of variations on the technique including using bacteria such as Escherichia coli instead of yeast (Dove & Hochschild, 1998; Hurt et al., 2003) as well as its use to identify DNA-protein interactions (Joung et al., 2000).

The reporter gene can be any gene that produces a detectable difference between cells that contain an interaction and those that do not, and it must be under the control of an upstream activating sequence (UAS). There are a number of different indicators that can be used as reporters. Auxotrophic markers are a common method of positive selection. If there is an interaction, the cell is able to grow on a nutrient-deficient medium because the reporter gene allows the missing nutrient to be made endogenously. Examples of such reporters are the HIS3 and LEU2 genes (Young, 1998). Other methods of reporting include colorimetric (e.g. LacZ) and sensitivity or counter selection (e.g. CYH2) (Leanna & Hannink, 1996).

Results from such studies can be further assessed for functional enrichment to determine if any subset of the interacting proteins fall into a specific biological process pathway and from this the putative function of the gene under investigation can be inferred.

4.1.2.1 Implementing the System

Since the original method was described in 1989, a multitude of variations on the traditional system have been used. These variations include different vectors, hosts, reporters and prey libraries among other things (Fields & Song, 1989; James et al., 1996; Brent & Finley, 1997) which were all developed based on their specific advantages to various different screens. Due to these available variations, it is necessary to consider the choices prior to carrying out a screen and make decisions based on the best method for the desired application. The main considerations to be made and the decisions pertinent to this study are outlined below.

4.1.2.2 Choice of Prey Library

There are currently a number of prey libraries commercially available, which are composed of cDNAs from various tissues and species. For the purposes of the current study it was decided that human cDNA should be used, as the goal of the screen is to identify protein interactions that might tell us something about the function of DISC1 in humans. The ultimate decision was to use a normalised, universal human cDNA library (Clontech). The normalisation of the library provides an advantage in a yeast two-hybrid screen as it gives all interactions an approximately equal opportunity to be identified. In a non-normalised library, those proteins which are more highly expressed will emerge from a screen at a higher frequency purely due to their abundance and not necessarily due to a strong interaction, which may result in a skewed assessment of function.

The universal library was chosen as opposed to a brain specific library to ensure the screen was not limited to interactions only occurring in the brain. In light of the aims of this study being to link the function of DISC1 to the phenotype of major psychiatric illness (or, more simply, disorders of the brain), the universal library and the resulting interactions identified from it should provide us with a more comprehensive knowledge of the putative functions of DISC1. The universal library that was used included fetal brain cDNA, which allows identification of interactions that may occur only during early development, an obviously important time in brain development.

4.1.2.3 Choice of Vector System and Construction of Baits

The construction of baits requires the insertion of a gene or partial gene open reading frame into a vector. This inserted sequence must be cloned downstream of the GAL4 binding domain and open reading frame, to ensure the binding domain and inserted sequence are formed into a fusion protein. The insertion of the gene fragment can be achieved through restriction digest and ligation or through a process known as gap repair cloning (Oldenburg et al., 1997). Due to its simplicity the gap repair cloning method was used to construct the baits used in this study.

Gap repair cloning requires the use of vectors that contain elements allowing for homologous recombination. Here the plasmid pGBAE-B (Semple et al., 2005) was used, which contains the GAL4 binding domain and also the TRP1 gene, which can be used as a selectable marker in yeast. In addition to those requirements, pGBAE-B contains the necessary single restriction site (in this case BamHI) flanked by the homologous recombination sequences (in this case attB1 and attB2) required for gap repair cloning. With the method being based on homologous recombination, when amplifying the bait sequences from cDNA, primers must include the attB1 and attB2 sequences at their 5′ ends. Once the linearised plasmid and the attB-flanked open reading frames are introduced to a yeast host, the process of gap repair is employed by the yeast to recombine the free DNA into the plasmid via homologous recombination (see Figure 4.3).


Figure 4.3: Gap Repair Cloning. The bait open reading frame (ORF) is amplified by PCR using primers that are flanked by the homologous attB sequences. The cloning vector is linearised using the restriction site between the two attB sites (here BamHI). These two fragments are then co-transfected into a yeast host where gap repair occurs via homologous recombination. The attB1 and the attB2 sites are shown here as red and purple boxes respectively. The binding domain (BD) in the figure is depicted as a yellow box.

4.1.2.4 Host Strains and Reporter Genes

The potential for false positive results in yeast two-hybrid analysis means it is important to effectively and efficiently select against such results (the specifics of how this was achieved in this study are discussed in Section 4.3.1.3). Due to the lack of usable antibiotic resistance genes available in yeast, this selection is applied by complementation of biosynthetic genes that are deficient in a host strain. Therefore, it is necessary to select haploid hosts based on their engineered gene deficiencies and to be aware of the supplementation that will be required for such strains to grow.

The strains that were used in this study, AH109 and Y187 (detailed descriptions of the genotypes of these strains can be found in Chapter 2, Section 2.17.4.1), are both deficient in TRP1 and LEU2. This deficiency can be used to select for cells that have been successfully transformed with the plasmid containing the bait or prey. Additionally, these yeast strains are conditionally deficient in ADE2 and HIS3. These genes, which are under the control of different GAL promoters, should only be functional in the presence of GAL4 function, which requires an interaction between bait and prey (although this is not always the case; for further discussion see Section 4.3.1.3). These strains also contain a third, independent reporter based on the MEL1 gene and blue/white selection in the presence of X-α-Gal.
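The selection logic described above can be summarised as a toy model. This is an idealised sketch, not a description of real yeast behaviour: it ignores the false positives discussed in Section 4.3.1.3, the class and function names are hypothetical, and it assumes that the prey plasmid carries LEU2 (the text states only that the bait plasmid pGBAE-B carries TRP1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Idealised yeast cell in a two-hybrid screen (hypothetical model)."""
    has_bait_plasmid: bool    # bait plasmid carries TRP1
    has_prey_plasmid: bool    # assumption: prey plasmid carries LEU2
    bait_prey_interact: bool  # physical interaction between bait and prey


def can_grow(cell, dropout):
    """Return True if the cell can grow on medium lacking the given nutrients.

    `dropout` is a set of nutrient names missing from the medium,
    e.g. {"leu", "trp", "his"}.
    """
    # HIS3 and ADE2 are only expressed when bait and prey are both present
    # and interact, reconstituting GAL4 function.
    reporters_on = (
        cell.has_bait_plasmid and cell.has_prey_plasmid and cell.bait_prey_interact
    )
    can_make = {
        "trp": cell.has_bait_plasmid,
        "leu": cell.has_prey_plasmid,
        "his": reporters_on,
        "ade": reporters_on,
    }
    return all(can_make[nutrient] for nutrient in dropout)


if __name__ == "__main__":
    interacting = Cell(True, True, True)
    non_interacting = Cell(True, True, False)
    bait_only = Cell(True, False, False)
    for name, cell in [("interacting", interacting),
                       ("non-interacting", non_interacting),
                       ("bait only", bait_only)]:
        print(name,
              can_grow(cell, {"leu", "trp"}),
              can_grow(cell, {"leu", "trp", "his", "ade"}))
```

In this model a cell carrying both plasmids grows on medium lacking leucine and tryptophan regardless of interaction, whereas growth on medium additionally lacking histidine and adenine requires the bait and prey to interact.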

4.1.3 Validation of Physical Interaction

Due to the presence of false positive results, interactions identified via yeast two-hybrid analysis require confirmation to assess if those interactions are artifactual or if they are truly occurring in human cells (for reasons discussed in Section 4.3.1.3). This confirmation process is able to strengthen the confidence of the results from the initial screen, but is often still not definitive.

Methods used to determine such an interaction typically assess the temporal and spatial expression of the two proteins to identify whether a true, biologically meaningful interaction is, firstly, possible and, secondly, actually occurring.

The gold standard for assessing physical interaction of two proteins is to capture the endogenous pair by specific antibody in a co-immunoprecipitation analysis. However, when screening a large number of interactions this method requires adjustment to account for the availability and cost of individual antibodies and also the identification of suitable cell lines for the interaction to be observed. For these reasons it is often necessary to create a system that is somewhat more artificial by making modifications to the expression or levels of proteins in the cells (see Sections 4.1.3.1 and 4.1.3.2).

Commonly used methods to assess interaction, in place of the capture of endogenous protein, include co-localisation, pulldown analysis, affinity capture with mass spectrometry and label transfer analysis. These methods all have advantages and disadvantages with respect to the level of artificial manipulation and ability to identify stable or more transient interactions, and as such it is prudent to attempt validation by more than one method.

4.1.3.1 Epitope Tagging

As mentioned, to assess interactions that are occurring in cells, a method of detecting the proteins participating in that interaction is required. The most accurate way to achieve this is to buy or create antibodies against the proteins of interest. However, in practice, this is a costly and time consuming process, especially when considering the interaction being assessed may not actually exist, and thus a protein specific antibody will not be required in downstream applications.

A second, more cost-effective screening method is to attach an epitope tag to one or both of the proteins and use antibodies against those tags to assess the interaction status. Not only is this approach cheaper, it is also faster than the individual antibody approach, as multiple proteins can be tested in parallel with a single antibody. The use of epitope tags does eliminate the ability to test for purely endogenous interactions, because at least one of the proteins being tested must contain the epitope, something that does not occur naturally.

4.1.3.2 Artificial Over-expression

The use of endogenous or transfected protein to assess interaction is an important consideration. As mentioned above the use of transfected protein can be advantageous as it allows for the addition of markers and tags that can be used further down the line; however, the use of endogenous protein is generally accepted as the preferred model simply because it is less artificial.

The use of endogenous protein requires that a cell line expressing both of the desired proteins is used, and so cell lines must be tested for endogenous expression prior to commencement. This identification of a suitable cell line can be difficult and time consuming, and for this reason proteins of interest are often introduced into the cells by transfection. Transfection of proteins creates an artificial environment because the proteins that have been introduced can be expressed at very high levels within the cells.


Although the levels of transfected protein can be manipulated to more closely resemble the in vivo levels of expression, this can be difficult to achieve. One must first know the in vivo levels of expression and have a transfection system that allows such levels to be introduced; this can be difficult, especially if the expression of the protein needs to be very low.

Over-expression of proteins may result in interactions being identified that may not occur if the proteins are only expressed at normal biological levels because the over-expression may force the protein into a sub-cellular localisation that it does not normally occupy, thus allowing for an interaction that may not occur naturally. Artificial expression of proteins also completely overrides any temporal expression of endogenous proteins.

4.1.4 This Study

In addition to a yeast two-hybrid screen to be carried out as a part of this research project (with the intention of identifying protein-interaction partners for DISC1), data were available from further screens undertaken by the students of the Wellcome Trust Advanced Course (WTAC) in Functional Genomics and Systems Biology (Cambridge), directed by Dr. David Markie (Department of Pathology, University of Otago). The first screen was conducted at the 2009 WTAC course and a second was conducted in parallel to this study at the 2010 course.

These additional screens were carried out using the same protocol as was to be used in this study. The results of these screens are currently unpublished and the raw data can be found in Appendix C (Summary of Y2H Results). The screens revealed a list of 556 distinct potential interacting proteins across four baits. A small number of these interacting proteins had been identified in similar, small-scale studies by Millar et al. (2003), Morris et al. (2003) and Camargo et al. (2007). The overlap between the published studies and the WTAC data can be visualised in Figure 4.4.

This WTAC protein interaction dataset is preliminary, with the majority of interactions remaining unconfirmed. The fact that there is not a complete overlap between the data obtained by the Wellcome Trust Advanced Course and that published by other groups suggests that the analysis of interacting proteins is not yet saturated.


Figure 4.4: Overlap of Published DISC1 Interacting Proteins and the WTAC Data. An unweighted Venn diagram of the published DISC1 interacting proteins and those found in the WTAC screens. Image created using http://www.pangloss.com/seidel/Protocols/venn.cgi.

4.1.4.1 Hypothesis and Aims

Based on the current data available from yeast two-hybrid analyses it is hypothesised that DISC1 has a function in either the structuring or regulation of intracellular proteins. In order to extend knowledge of the cellular role of this gene a large yeast two-hybrid screen will be performed. This will help identify any further interactants of DISC1 that were missed in the initial analysis. Any new data from this yeast two-hybrid analysis will be combined with the current data available from the two WTAC screens and the published interactions in the literature. With this data Gene Set Enrichment Analysis will be undertaken to identify functional enrichment in the dataset. This should provide a clearer picture of the function of the gene and allow a further refined hypothesis of DISC1 function.


4.2 Results

4.2.1 Library Screen

Library screens were undertaken using four baits that were constructed for, and used in, the WTAC screens undertaken prior to the commencement of this study (as discussed in Section 4.1.4). The exact composition of the four baits is shown in Figure 4.5. Bait one is DISC1(L)(1-349), which contains codons one through 349; bait two is DISC1(L)(300-605), containing codons 300 through 605; bait three is DISC1(L)(450-670), containing codons 450 through 670; and bait four is DISC1(L)(590-854), containing codons 590 through 854. Together these four baits overlap to cover the entire length of the DISC1 L isoform. Care was taken when designing these baits that known domains were kept intact within bait fragments.
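The coverage claim can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the codon coordinates quoted in the paragraph above and taking codon 854 (the end of bait four) as the last codon of the L isoform; the function names are illustrative only.

```python
# Bait coordinates (first and last codon) as quoted in the text above.
BAITS = {
    "DISC1(L)(1-349)": (1, 349),
    "DISC1(L)(300-605)": (300, 605),
    "DISC1(L)(450-670)": (450, 670),
    "DISC1(L)(590-854)": (590, 854),
}


def covers(baits, first, last):
    """Return True if the baits jointly cover every codon from first to last."""
    reach = first - 1  # last codon known to be covered so far
    for start, end in sorted(baits.values()):
        if start > reach + 1:
            return False  # a gap exists before this bait begins
        reach = max(reach, end)
    return reach >= last


def overlaps(baits):
    """Return the number of codons shared by each pair of consecutive baits."""
    spans = sorted(baits.values())
    return [prev_end - start + 1
            for (_, prev_end), (start, _) in zip(spans, spans[1:])]


print(covers(BAITS, 1, 854))  # True
print(overlaps(BAITS))        # [50, 156, 81]
```

With these coordinates, consecutive baits share 50, 156 and 81 codons respectively.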

Figure 4.5: Baits for DISC1. The labelled box represents the full-length DISC1 cDNA. Baits 1-4 are shown as blue lines; these overlap to obtain full coverage of this DISC1 cDNA sequence. Numbers under each bait construct refer to the first and last amino acids included in that bait. The vertical red boxes indicate the known domains considered when designing the baits; again, the amino acid span for each is shown underneath.

Each of the four baits was used in a yeast two-hybrid screen (as detailed in Chapter 2, Section 2.11) by mating with a Universal Human Normalized Mate and Plate Library (Clontech). Each mating was first spread across plates containing media deficient in leucine, tryptophan and histidine (-leu/-trp/-his plates) to select for diploids with possible interactions. The colonies were then replica plated onto plates further deficient in adenine (-leu/-trp/-his/-ade plates) to check for activation of the second reporter gene.

DNA was extracted from all clones present on the adenine reporter plates using the Sodium Hydroxide method detailed in Chapter 2, Section 2.10.1. Clones were amplified by PCR and examples of the PCR products can be seen in Figure 4.6.

Figure 4.6: PCR Amplification of Preys. Examples of agarose gels run with the PCR products of the screen (A) using 96 clones of bait 1 and (B) 96 clones of bait 3. The marker (M) used in all cases was Roche DNA Molecular Weight Marker XIV (100 bp DNA size marker).

Following PCR the clones were sequenced and the resulting sequence was used to identify and compile a list of interacting proteins (as is described in Chapter 2, Section 2.11.3). An example of the obtained sequence is shown in Figure 4.7.

Figure 4.7: Example of Sequence from Yeast Two-Hybrid Analysis. An example of the sequence trace obtained from a single clone in the yeast two-hybrid screen.


Each of the four screens was carried out once as a part of this research project and important features of each screen are detailed in Table 4.1.

Table 4.1: Results of DISC1 Yeast Two-Hybrid Screen. Important features of the four yeast two-hybrid screens conducted using the DISC1 baits. The number of individual clones that had DNA extracted, that successfully amplified and had sequence analysis attempted and finally that sequenced successfully are shown.

Bait     DNA Extractions   Amplified¹   Successful Sequences
1        337               192          162
2        288               96           82
3        424               192          173
4        390               96           88
Total    1439              576          505

¹ Not all successful amplifications were sequenced; only full 96-well plates were sent for sequence analysis.
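The totals and per-bait sequencing success rates implied by Table 4.1 can be recomputed as follows. This is a small sketch using only the counts in the table; the rate is defined here, as an assumption, as successful sequences divided by the number of clones sent for sequencing (the Amplified column).

```python
# Counts copied from Table 4.1:
# bait -> (DNA extractions, amplified and sent for sequencing, successful sequences)
SCREENS = {
    1: (337, 192, 162),
    2: (288, 96, 82),
    3: (424, 192, 173),
    4: (390, 96, 88),
}


def sequencing_success(screens):
    """Percentage of sequenced clones that gave usable sequence, per bait."""
    return {bait: round(100 * sequenced / amplified, 1)
            for bait, (_, amplified, sequenced) in screens.items()}


# Column totals, in the same order as the tuples above.
totals = tuple(sum(column) for column in zip(*SCREENS.values()))

print(totals)                       # (1439, 576, 505)
print(sequencing_success(SCREENS))  # {1: 84.4, 2: 85.4, 3: 90.1, 4: 91.7}
```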

To expand the screen size, the results of the screens described above were then combined with results obtained through screens conducted by the Functional Genomics and Systems Biology students of the Wellcome Trust Advanced Course (WTAC) over two years (2009-2010). These screens were performed using the same four baits and identical protocols to the screen described here. The combined results of all of these screens are detailed in Table 4.2. A list of all proteins identified in this screen can be found in Appendix C (Summary of Y2H Results).

Table 4.2: Summary of DISC1 Yeast Two-Hybrid Screens. Novel interactions refers to the number of interactions that have not been previously published; they are not necessarily novel within these three screens. The total number of novel interactions across these three screens was 822.

Screen      Bait                Clones Screened   Total Interactions   Total Unique Interactions   Novel Interactions
This Study  DISC1(L)(1-349)     192               162                  135                         132
            DISC1(L)(300-605)   96                82                   52                          48
            DISC1(L)(490-670)   192               173                  144                         141
            DISC1(L)(590-854)   96                88                   80                          75
WTAC 2009   DISC1(L)(1-349)     270               195                  112                         107
            DISC1(L)(300-605)   231               176                  65                          59
            DISC1(L)(490-670)   156               111                  78                          74
            DISC1(L)(590-854)   110               86                   48                          43
WTAC 2010   DISC1(L)(1-349)     82                61                   35                          35
            DISC1(L)(300-605)   46                35                   20                          19
            DISC1(L)(490-670)   338               279                  134                         124
            DISC1(L)(590-854)   398               245                  208                         201
Totals                          2207              1693                 1111                        1058


4.2.2 Analysis of Interacting Proteins

Given the large number of potentially interacting proteins identified (822), these had to be analysed to identify possible functional themes, to obtain a subset of genes that could be used for further research.

This analysis was done using the freeware ToppGene (http://toppgene.cchmc.org/) and its ToppFun method, which is based on enrichment of the annotated functions of the interacting proteins in Gene Ontology (www.geneontology.org/). The results of the analysis for biological processes and cellular component (all conducted on 20 July 2010) are shown in Figure 4.8 for the combined results of this study and the WTAC screens, and in Figure 4.9 with the addition of all published DISC1 interacting proteins.

Each bar in the graphs represents the number of genes from the tested gene list that have some connection to the gene ontology term. The p-value indicates the probability of observing at least that many genes connected with the term if the list were a random selection of genes from throughout the genome. In other words, a significant p-value indicates that the enrichment of a term in the tested list of genes is unlikely to have occurred by chance.
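One common way to obtain an enrichment p-value of this kind is a one-sided hypergeometric test. The sketch below illustrates that calculation only: it is an assumption that this matches the statistic ToppFun reports, it omits the multiple-testing correction that would normally follow, and the function name and all numbers in the usage line are hypothetical rather than taken from this study.

```python
from math import comb


def enrichment_p_value(genome_size, term_size, list_size, hits):
    """One-sided hypergeometric p-value.

    Probability of seeing at least `hits` genes annotated with a term when
    `list_size` genes are drawn at random from a genome of `genome_size`
    genes, of which `term_size` carry the annotation.
    """
    upper = min(term_size, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(genome_size - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical illustration: a 400-gene term, a 20,000-gene background,
# and 40 of 822 listed genes carrying the term.
print(enrichment_p_value(20000, 400, 822, 40))
```

In this hypothetical case about 16 annotated genes would be expected by chance (822 × 400 / 20,000), so observing 40 gives a small p-value.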


Figure 4.8: ToppGene Enrichment Analysis. Enrichment analysis results for Biological Processes and Cellular Components of the dataset including the current study and the WTAC interacting proteins (total of 822 proteins included). The number of genes from the list that fall within each of the represented categories is shown by the bars, while the line indicates the p-value associated with the enrichment at each category.


Figure 4.9: ToppGene Enrichment Analysis Including Published Interactions. Enrichment analysis results for Biological Processes and Cellular Components of the dataset including the current study and the WTAC interacting proteins as well as all currently published DISC1 interacting proteins (total of 905 proteins included). The number of genes from the list that fall within each of the represented categories is shown by the bars, while the line indicates the p-value associated with the enrichment at each category.


Further network analysis was conducted using Ingenuity Pathway Analysis (IPA) software. A core analysis of the complete dataset (that is, the interactions both from this study and the published literature) was performed (analysis conducted in July 2010) using the Ingenuity Knowledge Base (genes only) as a reference set and accepting both direct and indirect relationships. The top five associated network functions from this analysis are shown in Table 4.3.

Table 4.3: Associated Network Functions from IPA Analysis. Summary of the top five networks identified in the IPA core analysis. The score of each analysis refers to the negative base ten logarithm of a p-value that measures the likelihood of the genes in a given network occurring by chance.

ID   Associated Network Function                                                                Score
1    Developmental Disorder, Genetic Disorder, Cellular Function and Maintenance                49
2    Molecular Transport, Cellular Assembly and Organisation                                    47
3    Cellular Function and Maintenance, Cellular Assembly and Organisation, Cellular Movement   47
4    Cellular Assembly and Organisation, Cell Cycle, DNA Replication, Recombination and Repair  37
5    Nucleic Acid Metabolism, Lipid Metabolism, Post-Translational Modification                 35
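Because the score is defined in the caption of Table 4.3 as the negative base ten logarithm of a p-value, each score can be converted back to the p-value it represents. The short sketch below does this for the five scores in the table; the function names are illustrative only.

```python
import math

# Scores copied from Table 4.3 (network ID -> score).
SCORES = {1: 49, 2: 47, 3: 47, 4: 37, 5: 35}


def score_to_p_value(score):
    """Invert score = -log10(p)."""
    return 10.0 ** (-score)


def p_value_to_score(p_value):
    """Negative base ten logarithm of a p-value."""
    return -math.log10(p_value)


for network_id, score in SCORES.items():
    print(f"network {network_id}: score {score} -> p = {score_to_p_value(score):.0e}")
```

A score of 49 therefore corresponds to a p-value of 10^-49, and a difference of two score units between networks corresponds to a hundred-fold difference in p-value.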

4.2.2.1 Individual Protein Analysis

Proteins were individually screened for known function, searching for genes that fit with the initial hypothesis of structuring or regulation of intracellular proteins and the results of the enrichment analysis (molecular transport, etc). This was achieved by assessing the known functions of the proteins as described by NCBI. Proteins that seemed to be of interest as the result of this were noted for further investigation.

4.2.2.2 Hypothesis Development

Following the assessment of the interacting protein list via enrichment analysis and individual protein evaluation, a working hypothesis was developed. This hypothesis is that DISC1 has a role in the construction and maintenance of the primary cilia of cells. For a discussion on how this hypothesis was developed see Section 4.3.3.


4.2.3 Preys chosen for Further Investigation

From the results of the enrichment and functional analyses conducted in the previous sections, a short list of the preys identified in the initial screens was made for further investigation, based on the following criteria:

1. The prey must have more than one independent hit from the combined WTAC screens and this study. (It was decided to use only proteins with multiple independent hits as these are more reliable interactions for reasons discussed in Section 4.3.1.3).
2. The prey must have a known function relevant to the hypothesis of primary cilia involvement.
3. The prey must be unique to this screen (including WTAC), i.e. not published or studied prior (in this capacity).
4. The prey will preferentially have a small coding region (<2500 bp) for ease of amplification without mutation.
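The four criteria above amount to three hard filters and one preference. The sketch below shows one way this could be expressed. It does not describe how the selection was actually performed; the `Prey` record, the `shortlist` function and the two `HYPOTHETICAL_*` entries are invented for illustration. The hit counts and cDNA lengths for CHMP2B and IFT81 are those reported in Table 4.4.

```python
from dataclasses import dataclass


@dataclass
class Prey:
    name: str
    independent_hits: int
    cdna_length_bp: int
    cilia_relevant: bool        # criterion 2, judged manually from annotation
    previously_published: bool  # criterion 3


def shortlist(preys, max_length_bp=2500):
    """Apply criteria 1-3 as filters and criterion 4 as a ranking preference."""
    eligible = [
        p for p in preys
        if p.independent_hits > 1
        and p.cilia_relevant
        and not p.previously_published
    ]
    # Short coding regions sort first; long ones are kept but ranked last.
    return sorted(
        eligible,
        key=lambda p: (p.cdna_length_bp >= max_length_bp, p.cdna_length_bp),
    )


PREYS = [
    Prey("IFT81", 8, 2028, True, False),
    Prey("CHMP2B", 8, 639, True, False),
    Prey("HYPOTHETICAL_A", 1, 900, True, False),   # fails criterion 1
    Prey("HYPOTHETICAL_B", 4, 3100, True, False),  # long cDNA, ranked last
]

print([p.name for p in shortlist(PREYS)])
# ['CHMP2B', 'IFT81', 'HYPOTHETICAL_B']
```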

The preys chosen for follow-up analysis, based on the above criteria, are outlined and briefly described in Table 4.4. In addition, positive and negative control genes to be used in subsequent experiments were also chosen.

As a positive control FEZ1 was chosen based on evidence from Miyoshi et al. (2003) that the two proteins co-localise in neuronal growth cones and evidence of both biochemical and functional interactions between FEZ1 and DISC1 in the regulation of dendrite development (Kang et al., 2011). Of note is the fact that FEZ1 was not one of the interactions identified in the yeast two-hybrid screens conducted here or by the WTAC students.

RAD51 was chosen as a negative control. It has never been identified as being an interacting partner of DISC1 and had previously been cloned into the appropriate vector system for an unrelated project and as such was readily available for use in this capacity.


Table 4.4: Preys Chosen for Confirmation. Brief descriptions of the preys selected to be confirmed for interaction with DISC1 in mammalian cells. Information is summarised from the NCBI Gene database.

Prey Name   Location   cDNA Length (bp)   Independent Hits   Description
ATP6V1B2    8p21.3     1533               2                  One component of a larger multi-subunit enzyme (V-ATPase) which is responsible for acidifying intracellular in eukaryotes. This process is important in protein sorting, endocytosis and proton gradient generation.
CEP70       3q22.3     1791               5                  A component of the centrosome, this protein has roles in the assembly and elongation of .
CHMP2B      3p11.2     639                8                  Makes up part of the Endosomal Sorting Complex Required for Transport III (ESCRT-III). This complex is involved in recycling of cell surface receptors, and the construction of vesicles containing endosomal components. It is known to be expressed in most neurons.
CLINT1      5q33.3     1929               2                  Interacts with Clathrin and has similarity to the endocytic adapter proteins. It is thought to be involved in the transport of vesicles between the Golgi and endosomes. Mutations in this gene have been linked to psychiatric illness.
EXOC5       14q22.3    2124               2                  One component of the exocyst complex, which is necessary for the targeted docking of vesicles to the plasma membrane and is also involved in the generation of cell surface polarity. Other components of the exocyst complex have been associated with ciliopathies.
FEZ1        11q24.2    1179               0                  This will act as a positive control. An orthologue of a gene responsible for axonal bundling and elongation in Caenorhabditis elegans.
IFT81       12q24.13   2028               8                  Known to be involved in the processes of cilium assembly and intraciliary transport, particularly in motile cilium.
NUP62CL     Xq22.3     552                8                  Found within the nuclear pore complexes and identified to be involved in protein transport.
RAB11A      15q22.31   648                36                 A member of the Rab family which are GTPases and may be involved in the transport and localisation of proteins.
RAB5C       17q21.2    648                2                  Another member of the Rab family, thought to ensure correct localisation of proteins.
SNX2        5q23       1557               6                  Belonging to the sorting nexins family, SNX2 has a role in protein sorting within the endocytic pathway. It is identified as having roles in cell communication and protein transport.
SNX5        20p11      1212               3                  Another of the sorting nexins, with roles in endosome sorting and intracellular trafficking.
STX6        1q25.3     765                2                  Identified to have a number of roles in endosomal transport and organisation.
RAD51       15q15.1    1020               0                  This will act as a negative control. RAD51 is involved in the mechanisms of DNA repair and homologous recombination, processes which are not deemed to be important in the formation and maintenance of the primary cilia.


4.2.4 Cloning of Genes

Each of the genes chosen for confirmation, along with DISC1, was cloned into a vector via the Gateway® system as described in Chapter 2, Section 2.13. For details of the vectors used see Chapter 2, Section 2.17.7.

4.2.4.1 cDNA Amplification

Primers were designed for each of the prey genes and were tagged with the attB1 and attB2 Gateway® sequences (for full details on all primers used see Appendix D).

With these primers, each of the chosen prey genes was amplified from cDNA by PCR (as described in Chapter 2, Section 2.7). Primers for the full-length DISC1 cDNA (specifically the L-isoform) were designed in the same way, and the DISC1 cDNA was amplified from an OriGene TrueClone™ containing the cDNA sequence.

After optimisation of the primers, 11 of the 12 prey genes (all but CEP70) and DISC1 were successfully amplified. The positive control to be used in the co-localisation analysis (FEZ1) was also successfully amplified.

4.2.4.2 Entry Clones

These amplified fragments were then cloned into pDONR™201, by BP reaction, to create entry clones. The relevant sequence of the construct upon completion of this reaction is shown in Figure 4.10.

Entry clones of each gene were sequenced using the pDONR3 and pDONR4 primers as well as internal gene primers where necessary. Sequencing was repeated with different clones until a full length gene with no significant mutations was identified. A summary of this sequence analysis can be seen in Table 4.5 and an example of the sequence quality obtained is shown in Figure 4.11.

After numerous unsuccessful attempts to identify sequence-perfect clones, two of the prey genes, CLINT1 and EXOC5, were removed from further analysis.


Figure 4.10: pDONR™201 Sequence Following BP Reaction. The nucleic acid and amino acid sequence of the pDONR™201 construct following completion of the BP reaction. The blue shows the regions originating from the attB1 and attB2 tagged PCR product; the remainder shows most of the attL1 and attL2 sites that are part of the original pDONR™201 vector.

Figure 4.11: Entry Clone Sequence Trace. An example of the sequence quality obtained in the confirmation of entry clones. The sequence shown is the end of the attL1 site in the pDONR™201 vector and the beginning of the SNX2 cDNA sequence.


Table 4.5: Summary of BP Clone Confirmation. The table summarises the key features in the confirmation of the sequence of each of the cloned prey (and DISC1) genes in the pDONR™201 entry vector. Details of the primers can be found in Appendix D.

Gene Name   Primers Used                                                 Description
ATP6V1B2    pDONR3, pDONR4, A1ATP6V1B2-1, A1ATP6V1B2-3, A2ATP6V1B2-2     Clone contains a single synonymous base change c.1526A>C, p.Ala510Ala. This is not a known SNP in the dbSNP database.
CHMP2B      pDONR3, pDONR4, A1CHMP2B-1, A2CHMP2B-2                       Sequence perfect match to reference.
CLINT1      pDONR3, pDONR4, A1CLINT1-1, A2CLINT1-2                       Unable to obtain a suitable clone without mutation. Gene removed from further analysis.
DISC1       pDONR3, pDONR4, A1DISC1-3, A2DISC1-10, DISC1-15, DISC1-17    Clone contains a single synonymous base change c.1407C>T, p.Ile469Ile. This is a known SNP, rs2492367.
EXOC5       pDONR3, pDONR4, A1EXOC5-1, A2EXOC5-2                         Unable to obtain a suitable clone without mutation. Gene removed from further analysis.
FEZ1        pDONR3, pDONR4                                               Sequence perfect match to reference.
IFT81       pDONR3, pDONR4, A1IFT81-1, A2IFT81-2, A1IFT81-3, A2IFT81-4   Sequence perfect match to reference.
NUP62CL     pDONR3, pDONR4                                               Sequence perfect match to reference.
RAB11A      pDONR1, pDONR3, pDONR4, A1RAB11A-1, A2RAB11A-2               Sequence perfect match to reference.
RAB5C       pDONR3, pDONR4, A1RAB5C-1, A2RAB5C-2                         Sequence perfect match to reference.
SNX2        pDONR3, pDONR4                                               Sequence perfect match to reference.
SNX5        pDONR3, pDONR4, A1SNX5-1, A2SNX5-2                           Sequence perfect match to reference.
STX6        pDONR3, pDONR4, A1STX6-1, A2STX6-2                           Sequence perfect match to reference.


4.2.4.3 Destination Clones

For use in the confirmation of interactions in human cell lines (see Section 4.2.5.5) the entry clones were used to transfer the prey genes and DISC1 to destination vectors that would express fusion proteins. DISC1 was transferred to pDEST/TO/myc-His and each of the prey genes to pDEST™27 by LR reaction. The fusion proteins created, along with the relevant sequence of the constructs following this transfer, are shown in Figures 4.12 and 4.13 respectively.

Figure 4.12: DISC1 Expression Construct. The expression construct from the pDEST/TO/myc-His-DISC1 plasmid showing, A) the structure of the fusion protein and B) the nucleic acid sequence and amino acid translation of each component.

The entry clones were also used to transfer DISC1 and the prey genes to destination vectors containing fluorescent tags for use in co-localisation experiments. DISC1 was transferred to pDEST/TO/mCherry/myc-His and the preys plus the positive control, FEZ1, to pDEST/TO/EYFP/myc-His. Due to the way these vectors were constructed (see Chapter 2, Section 2.17.7.5) the sequence surrounding the inserted gene is identical to that shown in Figure 4.12, with the fluorescent protein (either mCherry or EYFP) at the N-terminus of the attB1 peptide.


Figure 4.13: Prey Genes Expression Construct. The expression construct from the pDEST™27-Prey Gene plasmids showing, A) the structure of the fusion protein and B) the nucleic acid sequence and amino acid translation of each component.

RAD51 was chosen as the control for these experiments because it is not known to interact with DISC1 and the required expression vectors had already been constructed. The primer design, amplification and cloning of RAD51 into pDONR™201 and transfer into the destination vector pDEST/TO/EYFP/myc-His was completed previously by Lauren Foreman as part of an honours research project (Foreman, 2012).

4.2.4.4 Confirmation of Expression Constructs

Destination clones were confirmed to contain the appropriate insert by amplification of that insert with the corresponding gene specific primers. These amplified regions were run on agarose gel to ensure a product of the predicted length. Where the length of the insert was not as predicted, further clones were tested until a suitable candidate was found. All destination vectors were eventually found to contain inserts of the correct size.

The sequence of the inserted genes within the destination vectors was assumed to be the same as that found in the pDONR™201 entry clones, as no further PCR was required to move between the Gateway® vectors. However, for the destination vectors to be used in the confirmation of interaction analysis (pDEST™27-preys and pDEST/TO/myc-His-DISC1), the relevant junctions between the vector and the newly inserted cDNA were sequenced to ensure no mutations had occurred at the sites of recombination.

In pDEST/TO/myc-His both recombination sites were sequenced as the myc-His tag is 3’ to the inserted DISC1 sequence but the 5’ recombination site is also important to ensure that the protein fusion is in frame. Analysis of the DISC1 construct confirmed that the sequence at both of these recombination sites was correct and in frame (shown in Figure 4.14).

Figure 4.14: Confirmation of pDEST/TO/myc-His-DISC1 Construct. The reading frame of the relevant portions of pDEST/TO/myc-His are shown in A) and C) above the sequence trace for the region of 20 bp up and downstream of B) the highlighted ATG start site and D) the highlighted final codon of the DISC1 insert.

Due to a reliance on PCR amplification for sequencing of the construct, and the possible risk of PCR contamination in these sequencing reactions, the structure of the pDEST/TO/myc-His-DISC1 clone was additionally confirmed by restriction digest. Digests were performed using HindIII, PstI, PvuII and TaqI. The resulting digests were run on an agarose gel and gave the predicted banding pattern as expected (shown in Figure 4.15).
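A predicted banding pattern of this kind can be derived by locating each recognition site in the circular plasmid sequence and measuring the distances between consecutive sites. The sketch below illustrates the idea under stated assumptions: the recognition sequences are the standard ones for these four enzymes, but the plasmid used here is a short made-up sequence, since the pDEST/TO/myc-His-DISC1 sequence is not reproduced in this section, and the function names are illustrative.

```python
# Standard recognition sequences; all four are palindromic, so only one
# strand needs to be searched.
SITES = {
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "PvuII": "CAGCTG",
    "TaqI": "TCGA",
}


def site_positions(plasmid, site):
    """Start index of every occurrence of `site` in a circular sequence."""
    n = len(plasmid)
    wrapped = plasmid + plasmid[: len(site) - 1]  # catch sites spanning the origin
    return [i for i in range(n) if wrapped.startswith(site, i)]


def fragment_lengths(plasmid, site):
    """Fragment sizes (bp), largest first, after complete digestion of a circle."""
    n = len(plasmid)
    cuts = site_positions(plasmid, site)
    if not cuts:
        return [n]  # no site: the plasmid stays uncut
    pairs = zip(cuts, cuts[1:] + cuts[:1])
    return sorted(((b - a) % n or n for a, b in pairs), reverse=True)


# Made-up 42 bp "plasmid" with two HindIII sites and no PstI site.
toy = "AAGCTT" + "G" * 10 + "AAGCTT" + "C" * 20

print(fragment_lengths(toy, SITES["HindIII"]))  # [26, 16]
print(fragment_lengths(toy, SITES["PstI"]))     # [42]
```

Because the enzyme cuts at the same offset within every copy of its site, the distances between site start positions equal the fragment lengths, so the exact cut position inside the site does not need to be modelled.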


Figure 4.15: Digests of pDEST/TO/myc-His-DISC1. An agarose gel showing the pDEST/TO/myc-His-DISC1 digested with each of HindIII, PstI, PvuII and TaqI. For each there is a cut (C) and an uncut (U) lane, in the uncut lanes buffer was added in place of enzyme during the digest. The marker used (labelled 1Kb) was Invitrogen 1Kb DNA Ladder.

In pDEST™27 the fusion of the GST gene is to the 5’ end of the prey gene and as such the sequence analysis of these constructs was limited to the recombination site at this end of the inserted cDNA. Analysis of this sequence found that all clones had an in-frame recombination between the attB1 site and the ATG start codon of the gene, as shown in Figure 4.16.
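Whether a junction is in frame can be stated as a simple rule: the distance from the start of the upstream reading frame to the ATG of the insert must be a multiple of three, and no stop codon may occur in between. The sketch below applies that rule to a made-up junction sequence. The sequence and the function name are hypothetical; the check actually performed in this study was by inspection of the sequence traces.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def junction_in_frame(sequence, frame_start, atg_index):
    """True if the insert ATG is in the upstream frame with no stop before it."""
    if sequence[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if (atg_index - frame_start) % 3 != 0:
        return False  # the insert is shifted relative to the tag
    upstream = (sequence[i:i + 3] for i in range(frame_start, atg_index, 3))
    return not any(codon in STOP_CODONS for codon in upstream)


# Hypothetical junction: seven upstream codons followed by the insert ATG.
in_frame = "GGTTCTACAAGTTTGGCCACCATGGCT"
# The same junction with one extra base inserted before the ATG.
shifted = "GGTTCTACAAGTTTGGCCACCAATGGCT"

print(junction_in_frame(in_frame, 0, 21))  # True
print(junction_in_frame(shifted, 0, 22))   # False
```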

It should be pointed out that the SNX2 construct lacks the Kozak sequence that is present in the other constructs but that the gene sequence is still inserted in frame and the presence of a Kozak sequence at this location is not necessary for expression of the protein in these experiments.

Additionally, there is a single base ambiguity in the SNX5 sequence; however, close inspection of this locus reveals that the R called, at position 5 of the shown sequence, can be interpreted as the expected A due to the level of G background in the bases immediately prior to the ambiguity.
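In IUPAC nucleotide notation the code R denotes a purine, that is either A or G, so an R call is compatible with an expected A. The sketch below checks a base-called read against a reference while allowing for such ambiguity codes. The two short sequences are made up for illustration and are not the SNX5 trace; the function name is likewise illustrative.

```python
# IUPAC nucleotide codes and the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(read, reference):
    """True if every called base (or ambiguity code) permits the reference base."""
    if len(read) != len(reference):
        return False
    return all(ref in IUPAC[call] for call, ref in zip(read, reference))


# Hypothetical read with an R at position 5 against a reference with A there.
print(compatible("GGCCRTGG", "GGCCATGG"))  # True
# A pyrimidine code (Y) at the same position would not permit an A.
print(compatible("GGCCYTGG", "GGCCATGG"))  # False
```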


Figure 4.16: Confirmation of pDEST™27-Prey Constructs. The reading frame of the relevant portion of pDEST™27 is shown in A) above the sequence traces, shown in B) for the region of 20 bp up and downstream of the highlighted ATG start site for each of the prey genes. Note that the codon NNN differs between constructs to make up a Kozak sequence; this codon is missing from SNX2 as no Kozak sequence was added to this construct. However, the inserted cDNA is still in frame and a Kozak sequence is not necessary at this position for this experiment.


4.2.5 Confirmation of Interaction in Human Cells

4.2.5.1 Sub-Cellular Protein Co-Localisation

Fluorescent fusion proteins of DISC1-mCherry and of each prey gene fused to EYFP (as constructed in Section 4.2.4.3) were used to assess localisation of the proteins within human cells. Human T-REx™-293 cells were plated onto coverslips, transfected with the fluorescent constructs and then viewed by fluorescent microscopy (as described in Chapter 2, Sections 2.16.1 and 2.16.2 respectively).

Initially pDEST/TO/mCherry/myc-His-DISC1 was transfected alone to ascertain the location of this construct in the cells for comparison upon co-transfection with prey constructs. Examples of this localisation are shown in Figure 4.17. The DISC1 was seen to express in dense spots in, or near to, the nucleus of the cells. The images were also captured with the EYFP filter, as a control for further experiments, and it was noted that there was no apparent bleed through of the DISC1-mCherry expression to the EYFP filter.

Figure 4.17: Localisation of DISC1 in Human Cells. Examples of fluorescent microscope images of DISC1-mCherry fusion protein. Images are shown with both mCherry and EYFP filters. The merge image is a combination of both mCherry and EYFP as well as DAPI to show the nucleus of each cell. The scale bar shown represents 10µm.


Co-transfection of pDEST/TO/mCherry/myc-His-DISC1 was then undertaken with the positive control (pDEST/TO/EYFP/myc-His-FEZ1) and two selected prey genes (RAB11A and SNX5). All appeared to co-localise with DISC1 in the same dense spots in, or close to, the nucleus (see Figure 4.18). The other four prey constructs showed the same result.

Figure 4.18: Co-Localisation of DISC1 and Prey Proteins in Human Cells. Examples of fluorescent microscope images showing co-localisation of DISC1-mCherry fusion protein with FEZ1-EYFP, RAB11A-EYFP and SNX5-EYFP. The merge image is a combination of both mCherry and EYFP as well as DAPI to show the nucleus of each cell. The scale bar shown represents 10µm.

Because all of these proteins seemed to have the same expression patterns, pDEST/TO/EYFP/myc-His-RAB11A was transfected independently to assess whether the expression pattern differed from that seen in the co-transfection with DISC1-mCherry. It was found that on its own, this construct had a far more ubiquitous expression pattern (see Figure 4.19) with no evidence for the dense spots seen upon co-transfection. Similar results were also seen for the four other preys.


Figure 4.19: Localisation of RAB11A in Human Cells. Example of fluorescent microscope images of RAB11A-EYFP fusion protein. The scale bar shown represents 10µm.

The change in expression pattern of RAB11A-EYFP in the presence of DISC1-mCherry could be due to a true interaction between the two proteins in cells, or it could suggest that the co-localisation seen between DISC1-mCherry and these prey proteins is due to over expression of the transfected constructs.

To assess this further, the negative control plasmid (pDEST/TO/EYFP/myc-His-RAD51) was co-transfected with pDEST/TO/mCherry/myc-His-DISC1. The same dense spots were seen again as shown in Figure 4.20.

Figure 4.20: Co-Localisation of DISC1 and RAD51 in Human Cells. Examples of fluorescent microscope images showing co-localisation of DISC1-mCherry fusion protein with RAD51-EYFP. The merge image is a combination of both mCherry and EYFP as well as DAPI to show the nucleus of each cell. The scale bar shown represents 10µm.

Without a clear difference between putative interacting proteins and a negative control it was concluded that there was some inherent problem with this experiment (see Section 4.3.4.1 for further discussion) and as such it was not pursued further.


4.2.5.2 Optimisation of Transfection

Various cell lines, ratios of transfection reagent to plasmid DNA, incubation times and cell densities were tested to find the optimum conditions that were transferable to both the preys and DISC1 so that co-transfection would be possible.

The optimum protocol used in subsequent experiments is outlined in Chapter 2, Section 2.14.2. Images showing examples of the transfection efficiencies (see Chapter 2, Section 2.14.2.1) obtained for one prey (RAB11A) and for DISC1 are shown in Figure 4.21.

Figure 4.21: Transfection Efficiency of RAB11A and DISC1 in Human T-REx™-293 Cells. Examples of fluorescent microscope images showing transfection efficiency of pDEST/TO/EYFP/myc-His-RAB11A (A and C) and of pDEST/TO/mCherry/myc-His-DISC1 (B and D). The total number of cells observed with no filter (A and B) can be compared to the number expressing the fluorescent fusion protein (C and D) with the appropriate fluorescent filter.


It is important to note at this point that although this level of transfection was possible, the actual efficiency varied from experiment to experiment even when using the same protocol.

The efficiency of the DISC1 transfections was slightly less but still reasonably high; however, there were increased levels of cytotoxicity with the DISC1 constructs when compared to the prey constructs. This increase in cytotoxicity is evident in the confluence of the plated cells and in the morphology of the cells after transfection with DISC1. The cells take on a more rounded appearance and there are far more floating (dead) cells in the plates following transfection.

4.2.5.3 Detection of Prey Proteins

Expression of the nine pDEST™27-Prey constructs was tested in T-REx™-293 cells. This expression was assessed by transfection of the constructs followed by western blot analysis (as described in Chapter 2, Sections 2.14.2, 2.15.1 and 2.15.4) using an anti-GST primary antibody to detect the fusion protein and an anti-actin loading control.

Seven of the nine constructs were successfully detected by this method as can be seen in Figure 4.22.

Figure 4.22: Expression of Prey-GST Fusion Proteins in Human Cells. Western blots showing expression of seven prey-GST fusion proteins (by detection with anti-GST antibody) with an actin loading control (detected by anti-actin antibody). Size indications shown are in kDa. The sizes of each construct are RAB5C - 49kDa, RAB11A - 50kDa, CHMP2B - 58kDa, SNX2 - 98kDa, STX6 - 55kDa, SNX5 - 73kDa, IFT81 - 106kDa, of which 26kDa is the GST. The size of the actin loading control is 42kDa.


Expression of the NUP62CL and ATP6V1B2 fusion proteins was not observed across several experiments and as such these preys were removed from further analysis. It was later confirmed that the failure to detect these protein fusions was most likely due to loss of activity of the available anti-GST antibody.

4.2.5.4 Detection of DISC1

The expression of the DISC1 fusion proteins proved to be more difficult. After numerous unsuccessful attempts at several different protein extraction protocols, it was noted using the fluorescent pDEST/TO/mCherry/myc-His-DISC1 fusion construct, that the expressed protein was caught up in the cellular debris and subsequently being discarded.

Once this issue was identified, several steps were taken in an attempt to release the expressed DISC1 protein from this debris, including use of different lysis buffers, isolation of various different fractions of the lysate, homogenisation and the use of nucleases. Eventually it was determined that lysis with the use of Benzonase® Nuclease and homogenisation of the lysate through progressively smaller gauge needles (and removal of any centrifugation steps) allowed the expressed DISC1 construct to be detected in western blot analysis with an anti-c-myc antibody. Examples of this expression are shown in Figure 4.23. The secondary band of β-actin seen in Figure 4.23 was not unique to the DISC1 expression and was shown to most likely be a batch effect in the β-actin antibody. As this was a polyclonal antibody, the band was likely due to the presence of other significant B cell clones in that animal with reactivity to another protein. The band was not seen with other batches of the antibody. However, the expression of these DISC1 constructs was not a consistently reproducible event, even under the optimised transfection and lysis conditions.

It was subsequently identified that, as well as being caught up in the cellular debris, the DISC1 fusion protein was possibly forming aggregates within the cells. In an attempt to address this issue, the amount of pDEST/TO/myc-His-DISC1 used in transfections was lowered to try to identify a level at which expression could be seen and aggregate build-up avoided. Tests that sequentially halved the amount of the DISC1 construct down to 1/32 of the original amount showed no expression of the DISC1-myc-His fusion protein when detected with an anti-c-myc antibody. However, when the same blot was re-probed with an anti-DISC1 antibody, all ratios tested showed DISC1 expression (see Figure 4.24).
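The sequential halving described above can be written out as a short sketch. It assumes that the undiluted amount is included as the first point of the series and expresses every amount relative to it; the function name is illustrative only and does not come from the methods of this thesis.

```python
from fractions import Fraction

def halving_series(n_halvings):
    """Relative amounts of construct: the original amount, then each successive half."""
    return [Fraction(1, 2 ** k) for k in range(n_halvings + 1)]

# Five halvings take the series from the original amount down to 1/32 of it.
series = halving_series(5)
print(", ".join(str(amount) for amount in series))
```

Under this assumption, five halvings give six amounts in total, the last being 1/32 of the original.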


Figure 4.23: Expression of DISC1 Fusion Proteins in Human Cells. Western blots showing expression detected by anti-c-myc antibody (in green) of A) mCherry-DISC1-myc (128kDa), B) DISC1-myc (99kDa) and C) DISC1-myc (99kDa) with GST-CHMP2B (58kDa) detected by anti-GST antibody (also in green). In all cases the red is the β-actin loading control, visible at 42kDa with a secondary band visible at 80kDa.

Figure 4.24: Expression of DISC1 at Varying Levels of Transfection. Western blot showing expression of DISC1-myc (99kDa) (detected by anti-DISC1 antibody) across six sequentially lowered amounts of transfected pDEST/TO/myc-His-DISC1.

This initial analysis suggested that the use of an anti-DISC1 antibody may be the solution to the inconsistent expression issues; though again the expression of DISC1 was not consistently reproducible even when using the anti-DISC1 antibody.

4.2.5.5 GST Pulldown Analysis

The seven remaining preys were tested for the presence of an interaction with DISC1 in human cells, via GST pulldown and antibody detection by western blot as described in Chapter 2, Sections 2.15.3–2.15.4.3.

Occasionally these pulldown analyses were suggestive of interactions between certain prey proteins and DISC1, including CHMP2B (see Figure 4.25).


Figure 4.25: GST Pulldown of DISC1. Western blot showing the GST pulldown of DISC1-myc (99kDa) using GST-CHMP2B (58kDa) but not with negative control (GST-RAD51 63kDa). For each of CHMP2B and RAD51 a pre pulldown lysate (L) and post pulldown (PD) lane are shown. GST and c-myc are both shown in green and the actin loading control is shown in red.

Unfortunately, due to the inconsistencies in the expression and detection of the DISC1 construct, no putative pulldown result could be reproduced despite numerous attempts and as such, none of the interactions can be confirmed at this stage.


4.3 Discussion

4.3.1 Addressing the Limitations of the Yeast Two-Hybrid System

The yeast two-hybrid screen provides a robust and efficient method of identifying protein-protein interactions, yet like any experimental system it must be used with caution due to inherent limitations (some of which were described in Section 4.1.2). The protein-protein interaction list that is generated as a result of a yeast two-hybrid screen should therefore not be considered complete or definitive but a guide to reasonable candidates for further investigation.

4.3.1.1 Bait Design

There is some debate as to the optimal design of baits in yeast two-hybrid screens with regard to whether they should express full length protein or several distinct but overlapping fragments of that full length protein. The use of a single full length bait is preferred by some as it is seen to be less artificial. This view may be misleading, however, as there is no guarantee that in the yeast cells the protein will undergo all the post-translational conformational changes that it would in human cells. This may lead to the protein being folded incorrectly and as such, interaction domains that are normally accessible may be hidden or conversely, domains that are not normally accessible may become so. For this reason, along with the fact that large proteins are difficult to clone in their full length form, it is common for baits to be constructed to express known or predicted binding domains which may or may not cover the entire protein.

The aim of the yeast two-hybrid screen in this study was to identify as many potential interacting partners of DISC1 as possible and as such the entire length of the protein had to be interrogated. Due to the length of the DISC1 protein it was decided that it should be broken into four shorter but overlapping fragments that would together cover the entire transcript of the gene. In the design of the bait fragments there was an effort to keep known domain boundaries intact in order to maximise the number of interactions found.


It should be noted that similar issues to those encountered with full length protein are applicable here also. The shorter fragments are also unlikely to fold in a native fashion and so any binding domains normally created by the three-dimensional structure of the protein may be missing in fragmented bait screens. However, the relative ease of cloning four smaller baits was deemed to be preferable in this situation, especially given that any interactions to be further studied would be subjected to confirmation.

4.3.1.2 False Negative Results

A yeast two-hybrid screen may be unable to detect some interactions that are real; these are false negative results. This lack of detection may occur for a number of uncontrollable reasons such as fusion protein toxicity, mis-folding, mis-localisation, lack of appropriate post-translational modification and steric hindrance due to interference introduced by the fused GAL4 protein. Further, it should be considered that the interactions detected are only those which can occur in a yeast nucleus and so may not be a complete representation of interactions that occur in other species.

Other limitations can be controlled, at least to an extent, during experimental design. One example is simple undersaturation of the screen, which can be combated, in theory at least, by repeating the screen a number of times until complete overlap of datasets is obtained. This is not always feasible in practice as there may be huge numbers of apparent interactions.

This was identified as an issue for DISC1 from the outset and is the reason the screen in this study was so extensive. Although the ability to add the two WTAC screens to the data obtained from this analysis did reduce the level of false negatives, there is still not complete overlap between this screen and those that have been published previously (see Figure 4.26), suggesting that either there are still a number of interacting proteins that are left unidentified (the screen is still undersaturated) or there are a high number of false positive interactions identified in these screens (see Section 4.3.1.3 for a discussion of possible causes for this).

Another modifiable cause of false negative results stems from the fact that a number of protein-protein interactions will be tissue specific. It is important when selecting a prey library to ensure that there is a fair representation of all appropriate tissues. The best way to circumvent this issue was to ensure that the library of preys used in this analysis gave a fair representation of the entire human genome, which the normalised universal cDNA library attempts to do, though it does not include all tissue types; indeed, this library is the best currently available to avoid these issues as much as possible.

Figure 4.26: Overlap of All Known DISC1 Yeast Two-Hybrid Interaction Datasets. An unweighted Venn diagram of the published DISC1 interacting proteins and those found in the WTAC screens and the screen of the current study. Image created using http://www.pangloss.com/seidel/Protocols/venn.cgi.

The prey library used was designed to maximise gene representation by using a variety of human tissues, and has been normalised so that highly expressed transcripts are reduced and less frequent transcripts are enhanced. The library contains approximately three million independent clones which should provide very good coverage of the transcripts in the tissues included. The number of cells used in each screen (for both the baits and preys) was two orders of magnitude higher than this (5 × 10⁸), and the number of diploids formed from each screen was in the order of 10⁷, so each screen should have had a good coverage of the 3 × 10⁶ clones in the library. Even so, this library is not comprehensive: not all proteins will be represented and of those that are, not all of them will necessarily be in their complete form, which means interaction domains may be missing.
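The coverage argument above can be made concrete with a rough estimate. The sketch below is not a calculation from this thesis: it assumes that every clone in the library is equally represented (the ideal that normalisation aims for) and that each diploid samples a clone independently, so that the chance of a given clone being missed follows a Poisson approximation. The function name is illustrative.

```python
import math

def expected_coverage(library_size, diploids):
    """Probability that a given clone is sampled at least once.

    Assumes equal representation of all clones and independent sampling,
    so the number of times a clone is drawn is approximately Poisson
    with mean diploids / library_size.
    """
    return 1.0 - math.exp(-diploids / library_size)

# Figures quoted in the text: about 3 x 10^6 clones and about 10^7 diploids per screen.
coverage = expected_coverage(3e6, 1e7)
print(f"{coverage:.3f}")
```

Under these simplifying assumptions a single screen would be expected to sample roughly 96% of the clones at least once. Real libraries deviate from equal representation, so this should be read as an upper-end estimate rather than a measured value.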

This issue of false negatives is difficult to overcome as there is no way to infer which genes have been missed and as such the absence of an interaction in the screen does not exclude it as an interaction in vivo. This is something that must be kept in mind when undertaking enrichment and pathway analysis: there may be missing links.


4.3.1.3 False Positive Results

As was alluded to in the introduction of this chapter, false positive results are also a common artifact of yeast two-hybrid screens. False positive results refer to the interactions isolated from a screen which for one of several reasons should not have been identified as potential interactions.

Auto-Activation. Bait auto-activation refers to the ability of a bait to activate a GAL promoter without any input from the activating domain. Proteins that have such an effect are usually, but not always, transcription factors and for this reason it is wise to test baits for auto-activation prior to using them in a screen.

Similarly, some preys can bind the GAL-upstream activation sequence directly, without interaction with the bait. This is referred to as prey auto-activation. Preys that are consistently isolated across multiple screens with different baits can be considered ‘sticky preys’ and there are lists of such preys available in the literature (Fromont-Racine et al., 1997; Vidalain et al., 2004).

The number of false positive results due to auto-activation has been reduced in more recent years with the introduction of multiple auxotrophic selection markers to many systems. Prior to this addition the levels of false positive results in a single screen could range from 10-70% of the total identified preys (Fang & Macool, 2002). This reduction in auto-activation is simply down to probability, in that the chance of any protein being able to activate two independent GAL promoters is far lower than the chance of it activating a single promoter. Therefore, to reduce the number of false positives due to auto-activation in this study, two auxotrophic selection markers were used, namely histidine and adenine under the control of the GAL1 and GAL2 promoters respectively.
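The probability argument can be illustrated with a minimal sketch. The per-reporter rates below are hypothetical values chosen only for illustration, not rates measured in this or any other screen, and the calculation assumes that activation of the two reporters is independent, as the argument in the text does.

```python
def joint_autoactivation_rate(p_reporter_one, p_reporter_two):
    """Chance that one protein spuriously activates both reporters,
    assuming the two activation events are independent."""
    return p_reporter_one * p_reporter_two

# Hypothetical per-reporter auto-activation rates, for illustration only.
p_his3 = 0.10
p_ade2 = 0.10
print(joint_autoactivation_rate(p_his3, p_ade2))
```

With any per-reporter rate below one, the joint rate is lower than either individual rate, which is why adding an independent second reporter removes many auto-activators.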

Selection for ADE2 after selection for HIS3 was chosen because the two different reporters provide different characteristics. Firstly, the ADE2 reporter, which is fused to the GAL2 promoter, is more stringent than that of the GAL1-HIS3 construct, meaning that a stronger interaction is required between the bait and the prey to satisfy it. The use of this more stringent second reporter was successful in eliminating at least some of the false positive interactions, as is evident in the reduction in clone numbers present after selection for ADE2 compared to after the HIS3 selection.


It must be kept in mind that this increase in stringency may result in an increase in false negative results for low affinity interactions. Secondly, ADE2 selection is less affected by residual growth due to the levels of adenine stored by cells and the rate at which it is used. Due to these differences, primary selection is conducted in the absence of histidine to ensure that low affinity interactions are not immediately lost. This selection is then followed up by omitting the adenine to ensure there are no false positives due to residual growth.

Use of a third or even fourth reporter will continue to reduce the numbers of false positive interactions identified; however, there is still a balance between decreasing false positives and increasing false negatives. For this reason, it was decided that two selectable markers were sufficient in this study.

Host Mutation. False positive results can also arise due to mutations in the host strain activating the reporter genes. Host mutations can cause constitutive expression of a reporter gene, in the absence of a bait or prey protein. False positives of this nature can cause bias within libraries, giving apparent multiple hits for proteins in a screen that are all in fact a result of a single host mutation.

In cases where the host mutation is the cause of activation, controlling for this issue falls back, in part, to the use of multiple selection markers. Many activating mutants will affect only a single reporter gene and as such use of multiple reporters will eliminate a number of these mutants. Additionally, in most screens including that undertaken here, the host strains containing the bait are tested for such spontaneous mutation by assessment of growth on the selectable marker, prior to the screen being conducted. This way suitable clones can be selected on the basis of being mutation free, at least at this point. Mutation may well still occur later in the growth cycle and, given the large number of cells used in each screen, some degree of host mutation is not at all unlikely.

Traditionally the method of controlling for this kind of false positive in prey strains was to confirm interactions by rescuing the preys and re-cloning them into a different host, as persistence of growth after such a test lends more weight to the interaction being real (Petermann et al., 1998). However, in a screen of a library such as the one undertaken in this study, the presence of multiple independent hits provides the same level of confidence in the specificity of an interaction. Within a library of preys there are more often than not several different cDNA lengths for the same gene. The term independent hits refers to these different cDNAs of a single prey being identified to interact with a bait. By sequencing the prey of each interaction it can be determined if the cDNAs involved are all the same clone or if they are independent. Additionally, bait-prey switches do not always confirm true positives. It has been observed that two-hybrid interactions are often directional for proteins that are already known to interact. Because of this, there are instances where previously known interactions are found in yeast two-hybrid screens but do not interact when the bait and prey are switched. These are the reasons that only multiple independent hits were chosen for follow-up analysis in this study.
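The distinction between single hits, repeated hits from the same clone and multiple independent hits can be sketched as a simple classification over sequenced preys. This is an illustrative sketch only: the record layout (gene name plus the start and end of the cDNA insert) and the placeholder gene names are assumptions, not the data format or pipeline used in this study.

```python
from collections import defaultdict

def classify_hits(hits):
    """Classify each prey gene by how its hits were obtained.

    hits: iterable of (gene, insert_start, insert_end) tuples, one per
    sequenced prey clone. Two hits count as independent when their
    cDNA inserts have different boundaries.
    """
    inserts_by_gene = defaultdict(list)
    for gene, start, end in hits:
        inserts_by_gene[gene].append((start, end))

    classification = {}
    for gene, inserts in inserts_by_gene.items():
        if len(inserts) == 1:
            classification[gene] = "single hit"
        elif len(set(inserts)) > 1:
            classification[gene] = "multiple independent hits"
        else:
            classification[gene] = "multiple hits, same clone"
    return classification

# Placeholder records, not results from any screen.
example_hits = [
    ("GENE_A", 10, 900),
    ("GENE_A", 150, 900),   # different insert boundaries: independent
    ("GENE_B", 1, 500),
    ("GENE_B", 1, 500),     # identical insert: same clone recovered twice
    ("GENE_C", 40, 300),
]
print(classify_hits(example_hits))
```

Comparing insert boundaries is a deliberately simple proxy for clone identity; it is used here only to show how sequencing distinguishes the three cases.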

Further, the presence of a library bias can be assessed in this study because, while the two WTAC screens were undertaken with the same batch of cDNA library, that batch was very likely different from the one used in the screen performed here.

Figure 4.27 shows that there is no real difference in the amount of overlap between the two WTAC studies when compared to their overlap with this study. This result suggests that there is unlikely to be a library bias, at least in the batch used by the WTAC.

Figure 4.27: Assessment of Library Bias. An unweighted Venn diagram of the DISC1 interacting proteins found in the WTAC 2009 and 2010 screens and the screen of the current study. Image created using http://www.pangloss.com/seidel/Protocols/venn.cgi.
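The comparison summarised in Figure 4.27 amounts to counting the interactors shared by each pair of screens. A minimal sketch of that pairwise count is given here; the identifiers are placeholders rather than proteins from any of the screens, and the function name is illustrative.

```python
from itertools import combinations

def pairwise_overlap(screens):
    """Map each pair of screen names to the number of shared interactors."""
    return {
        (first, second): len(screens[first] & screens[second])
        for first, second in combinations(sorted(screens), 2)
    }

# Placeholder identifiers, not proteins from any of the screens.
screens = {
    "WTAC 2009": {"P1", "P2", "P3", "P4"},
    "WTAC 2010": {"P2", "P3", "P5"},
    "current study": {"P3", "P4", "P6"},
}
for pair, shared in pairwise_overlap(screens).items():
    print(pair, shared)
```

If a batch-specific library bias were present, the two screens sharing a batch would be expected to overlap with each other noticeably more than either does with a screen from a different batch.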

Biological Irrelevance. Multiple and independent hits in a yeast two-hybrid screen can be accepted as validation that an interaction is truly occurring, that is to say the interaction is not a technical false positive. However, this only confirms that the two proteins interact if they are artificially over-expressed in a yeast nucleus; it does not say anything about the interaction being biologically meaningful. This can therefore be classed as a third type of false positive. Only upon further investigation of these interactions with regard to the temporal and spatial expression of the endogenous proteins in a human cell line can it be deemed that the interaction is real.

This class of false positive is the most difficult to deal with as there is really no way to control for it within a screen. It is because of these false positives in particular that validation experiments such as those described in Section 4.1.3 in the introduction to this chapter are important. The reality of a yeast two-hybrid screen is that all interactions must be considered putative and that none can truly be accepted as real until the appropriate biological analyses have been conducted.

4.3.1.4 Single Hits

Although the number of false positives can be reduced by the controls specified above, they are still present and how they are accounted for remains controversial. Without considering systematic error, the chance of the same false positive occurring multiple times is statistically unlikely. Given this, it is widely accepted that prey proteins that gain multiple hits (especially when the hits are independent) are less likely to be false positives than those that only gain a single hit, and it is the validity of these single hits that remains debatable.

In many cases it is assumed that there are too many false positives among single hit proteins for any of them to be included in further analysis, as was the case in the published DISC1 yeast two-hybrid screens (Millar et al., 2003; Morris et al., 2003; Camargo et al., 2007). However, when each interaction is considered independently of the others there is equal chance that each interaction is real and the fact that some occur more often than others may be attributed to a number of other factors including number of interaction domains and interaction strength.

Given the large number of single hits identified in this screen, including the WTAC data (a total of 566), and the inability to accurately determine which of these single hits were false and which were not, an all or none approach had to be taken. With the large decrease in data that would have resulted from the removal of these clones it was decided that all proteins identified in the screen would be included in the enrichment analysis.


4.3.1.5 Yeast as a Human Model

To undertake a study such as this one, an assumption that Saccharomyces cerevisiae cells are a fair model for their human counterparts must be made. This is, for the most part, a reasonable assumption given that it is a eukaryotic organism and so has the same complex cell structure as humans. This cell structure allows for the formation of tertiary protein structure and the other complex processes necessary for successful human protein interaction. Although it is by definition eukaryotic, yeast comes with many of the benefits of working with prokaryotes in that it is unicellular, meaning it is easy to grow on defined media in a very short time.

However, yeast is not human. Given the artificial environment that the interactions are being identified in, there are always going to be false negative and false positive interactions due solely to the fact that the organism used as the host is not the natural environment in which the interaction would normally occur. For this reason it is not only important to be cautious of the results as discussed in Sections 4.3.1.2 and 4.3.1.3, but also to confirm that the interactions to be studied further occur in human cells.

4.3.2 Inclusion of DISC1 Interactions in the Literature

The method of using interactions to determine the unknown functions of a protein of interest provides a robust system that is easy to implement, requires little prior knowledge, and requires no assumptions about the protein being investigated. While the yeast two-hybrid system can be very sensitive if done at a large enough scale, it is lacking in specificity, a problem that is observed through the lack of support for interactions from other methods. It is also difficult to determine if all of the potential interactions have been captured unless there is complete overlap between studies. Whether a lack of overlap is due to a lack of saturation, false positives or some other limitation is uncertain; however, it has become more widely accepted that the confirmation of an interaction by more than one method is a necessary step in ensuring accuracy (Von Mering et al., 2002; Deane et al., 2002).

The uncertainty in the validity of yeast two-hybrid results due to false positive interactions extends beyond this study and into those of the literature. There are, as mentioned previously, a large number of published DISC1 interactant proteins but the majority of these remain unconfirmed by subsequent analyses in human cells. The specifics of the proteins identified and those that have been confirmed are included in Chapter 5, Section 5.1.1.


The presence of false positives in any of these studies may cause a bias in any subsequent analyses of the data, for example in the gene set enrichment carried out here. Although there may be a bias in the results, it is impossible to accurately predict which proteins should be excluded from the analysis without confirming each of them individually, a task beyond the scope of this thesis. Given this situation it was decided to include all proteins in the development of an hypothesis with the assumptions that any false positives should be random and not weighted to a single pathway, and that further analysis of a set of proteins, chosen based on any hypotheses, should exclude any false positives after the fact. On this basis it is better to include false positives to provide as much initial information to the analysis as possible.

4.3.3 Development of an Hypothesis

The time and resource constraints of this project meant one particular function needed to be selected for further investigation. DISC1 is a large protein with many interactants (as evidenced by the yeast two-hybrid screens) and ubiquitous expression, so is likely to have a multitude of functions, potentially throughout the body.

The initial available results from the WTAC in 2009 led to the formation of an hypothesis that DISC1 had a role in the structuring or regulation of intracellular proteins. This hypothesis was based on enrichment analysis and literature research into the known functions of the interaction proteins that had been identified in this screen.

Enrichment analysis (using ToppFun) conducted on the full set of interacting partners of DISC1 (including the screen undertaken here and the WTAC screens) provided further support for the hypothesis of protein regulation. There was evidence from this analysis that protein transport and localisation were enriched, as were cellular components required in the movement of such molecules (vesicles and Golgi apparatus). The statistical significance of these enrichment functions increased further when adding previously published interactants. The addition of the published interactants identified by Camargo et al. (2007), Morris et al. (2003) and Millar et al. (2003) provided additional enrichment for microtubule associated functions as well as cytoplasmic projections and the centrosome. Network analysis (using IPA) showed significant scores for networks that had similar themes, with molecular transport and cellular organisation, assembly and maintenance being prominent in the results.


In an effort to further refine this hypothesis, the enriched functions were investigated in light of known localisation of DISC1. Centrosomal expression of the DISC1 protein had been previously identified by Morris et al. (2003). This was interesting given the centrosome was an enriched cellular component in the analysis of all known DISC1 interacting partners. Research into the functions and properties of the centrosome revealed that this is the hub for microtubule organisation (Andersen et al., 2003). The centrosome of the cell is made up of a pair of centrioles which have a functional role in the organisation of the centrosome and thus the rest of the cell (Feldman et al., 2007) and are the starting structure for the formation of the primary cilium that is present in most cell types (Sorokin, 1968). As described in Section 4.3.3.1 below, the construction of primary cilia requires the delivery of all necessary macromolecules to the distal end of the growing organelle.

Finally, research conducted by Miyoshi et al. (2009) found that treatment with lithium, a long standing treatment in mental illness, caused elongation of the primary cilia in both mouse brain and in cultured rodent cells. The authors consequently suggested the organelle as a future therapeutic target in psychiatric illness.

The convergence of the enriched roles and networks discussed above along with the DISC1 localisation evidence, the manner in which the primary cilium of cells is constructed, and the modifying effect of a known treatment for psychiatric illness led to the development of the working hypothesis for this thesis: that DISC1 has a role in facilitating the transport of proteins to the centrosome for the construction and functioning of the primary cilium.

4.3.3.1 The Primary Cilium

The primary cilium, as an organelle, was first described by German scientist Zimmerman in 1898, but was not defined as the primary cilium until Sorokin (1968) described this single transient structure, produced from one of a pair of centrioles of a cell. Over a number of years more was learnt about the structure and function of this organelle, from theories that it played a part in repression of cell division to recognition of it as a sensory organelle and its involvement in disease. The first hundred years of primary cilia biology are covered in a review by Wheatley (2005). It is currently accepted that primary cilia are present in most cell types and their role is largely one of sensing and coordination of signalling pathways including Hedgehog, Wnt and dopamine.

Sorokin (1968) identified that the primary cilium has a 9+0 microtubule arrangement rather than the 9+2 arrangement of motile cilia. This arrangement shows that primary cilia lack the central pair of microtubules that are present in their motile counterparts. Primary cilia are only present in cells while in the stationary phase of cell division (G0). When a cell begins to actively divide the structure is reabsorbed by the cell and is subsequently reproduced in the resulting daughter cells (Satir et al., 2010).

Briefly, the construction of a primary cilium begins with the formation of a basal body at the mother centriole. This structure attaches to a vesicle derived from the Golgi and migrates towards the cell surface. The axoneme of the cilium begins to extend into the lumen of this vesicle and expands it outwards. Once it arrives at the cell surface the basal body becomes embedded in the membrane of the cell and the vesicle is exocytosed. The primary cilium then continues to grow outward from the cell’s surface (Sorokin, 1962; Satir et al., 2010). The elongation of the ciliary microtubule structure is from the distal end and so, due to a lack of protein synthesis mechanisms at this location, all necessary components must be transported up the length of the existing structure to the tip where they can be added on to extend the length of the cilium. This process involves a number of proteins including intraflagellar transport proteins (IFTs) which are transported via kinesin and dynein mechanisms (Pedersen & Rosenbaum, 2008). This is of interest given that Kamiya et al. (2005) found that the DISC1 protein was a component of the dynein motor complex, as was described in Chapter 1. Therefore the prediction that a selection of DISC1’s interacting partners are involved in the packaging and transport of the necessary building blocks to the tip of the growing cilium seems reasonable.

At the time that the hypothesis of DISC1 (and interacting proteins) involvement with the primary cilium was posed in this study, a study published by Marley & von Zastrow (2010) identified that DISC1 indeed localises to the base of primary cilia in NIH3T3 cells and, when these cells were depleted of DISC1 protein via siRNA mediated knockdown, the primary cilia were lost. This confirmed that the hypothesis posed here was valid and that investigation into the contribution of other proteins (namely DISC1 interactants) to this pathway was warranted. A second, later study by the same authors established roles for several published interacting partners of DISC1 (CEP63, FEZ1, PDE4B and SYNE1), as well as candidates from other sources, in the loss or reduction of primary cilia in NIH3T3 cells (Marley & von Zastrow, 2012). None of the interaction partners tested by Marley & von Zastrow (2012) had been included in the current study as the focus here was on novel interactions yet to be published.

Further research needs to be done to determine the physiological significance of this and if it could play a role in the development of mental illness. Does mutation of these genes rather than complete knock out confer the same or similar phenotype? Does this loss of primary cilia have an actual effect on the development of the brain? Does a loss of primary cilia later in life alter the structure of the brain somehow and thus account for the late onset of most mental illnesses?

4.3.4 Assessment of Interaction in Human Cells

Knowledge of the rates of false positives among yeast two-hybrid analyses led to an expectation that a number of the interacting preys chosen would not be confirmed in human cell lines, though it was expected that by screening a reasonable number of preys, the validity of some interactions could be strengthened.

Given that the methods used in these experiments have varying levels of artificial execution, it must be kept in mind that protein interactions that are verified by these analyses are still only candidates. Due to forced over-expression it cannot be assured that the two proteins would ever have an opportunity to interact in a normal biological situation. This would require that the proteins were both expressed in the same cell, at the same time and that they localised to the same sub-cellular compartment.

4.3.4.1 Limitations of Co-Localisation

Co-localisation experiments aim to assess the physical location of proteins within the cell. From this analysis it can be determined if two proteins exist together in the same cellular compartment and thus if it is possible that an interaction can occur between them.

The major caveat to this experiment is that it is usually highly artificial because the easiest way to visualise the location of proteins within cells is to create and introduce fluorescently tagged fusion proteins. An alternative approach is to use fluorescently labelled antibodies against the proteins of interest but this comes with a relatively high associated cost when compared to the fusion protein method, especially if many different protein locations are to be interrogated.

For the ease of assessing a large number of proteins and the less restrictive cost, the fluorescent fusion protein approach was taken in this study. Using these fusion constructs, attempts were made to determine if DISC1 and the selected prey proteins were occupying the same cellular regions and therefore if there was any chance of them being able to interact in a human cell.

From the outset this experiment was limited (as the yeast two-hybrid analysis was), in that it would only be able to determine if there was a possible interaction between two proteins that were artificially expressed in the same cell, albeit this time one of human origin. There was still no way of determining if these two proteins would ever be natively expressed at the same time within the cells even if they were co-localising. Further, the introduction of fusion proteins leads to the issue of over-expression that was discussed in Section 4.1.3.2. This was perhaps a particular problem in this study as the fusion proteins used were under the control of a very strong promoter and expression levels were expected to be reasonably high.

The literature describes localisation of DISC1 protein within several different cellular compartments including growth cones, mitochondria, , synapses, membranes, centrosomes and base of the primary cilia. A review of DISC1 by Soares et al. (2011) gives a comprehensive overview of the identified locations of DISC1 expression. Of particular interest to this study is the localisation of DISC1 to the and the base of the primary cilia.

Endogenous DISC1 protein is shown to co-localise with γ-tubulin, a marker of the centrosome, in both COS-7 and cortical neuron cell lines (Kamiya et al., 2008), where it is predicted to be involved in the organisation of microtubules. Morris et al. (2003) describe DISC1 expression in neuronal cells and note that the location of full length DISC1 protein expression is largely perinuclear and that the expression was seen as small dense spots, an expression pattern that would be consistent with centrosomal localisation. Transiently transfected DISC1-GFP fusion protein has also been shown to have the same dense centrosomal expression in SH-SY5Y and HeLa cells (Miyoshi et al., 2004). Marley & von Zastrow (2010) further refine the location to dense spots localising at the base of the primary cilium of NIH3T3 cells.

The expression of DISC1 protein identified in the T-REx™-293 cells showed what was at first thought to be the characteristic centrosomal localisation identified in these previous studies, with discrete regions of perinuclear expression most directly comparable to those found by Marley & von Zastrow (2010) (as was shown in Figure 4.17). However, these spots of expression were perhaps denser than was expected based on this literature. This could be due to the difference in cell lines between the experiments, though given that the expression pattern in the literature was consistent across several cell lines, it may be that there is an issue of aggregation of the protein due to over-expression and that this aggregation of DISC1 meant any proteins co-transfected with it stuck to the aggregates, giving the false impression of co-localisation.

Aggregation or self binding of DISC1 protein in various forms has been described several times in the literature (Brandon et al., 2004; Kamiya et al., 2005; Leliveld et al., 2008, 2009). This aggregation of DISC1 in itself has been linked to mental illness with reports that 20% of patients with schizophrenia, bipolar disorder or major depressive disorder had evidence of aggregated DISC1 in their brains upon post-mortem analysis but that this protein aggregation is absent in both healthy controls and in patients with neurodegenerative disease (Leliveld et al., 2008). Of particular interest amongst these studies were those that identified perinuclear aggresomes of full-length DISC1-GFP fusion proteins (Ottis et al., 2011; Morris et al., 2003). Upon closer inspection, it is these perinuclear aggresomes that resemble the subcellular expression pattern seen in Figure 4.17.

In these studies the aggresome formation could be overcome by using truncated forms of the protein, though not always, and some of the earlier mentioned studies found aggregate formation with truncated versions of DISC1. Truncated forms of the DISC1 protein were also tested for cellular expression in this study (data not included). The protein sections created by the four baits used in the yeast two-hybrid screen were fused to mCherry in the same manner as the full length construct; however, these showed even more obvious signs of protein aggregation within the cells even prior to inspection at a sub-cellular level and as a result no localisation analysis was continued with these truncated proteins.

It cannot be ruled out that the localisation of proteins seen in this study revealed true co-localisation with DISC1 and that RAD51 is not an adequate negative control. Given the potential lack of saturation of the yeast two-hybrid screen and thus the false negatives still requiring identification, it is possible that RAD51 does interact with DISC1. However, it seems most likely that although this type of analysis has worked for other groups, the system chosen to test localisation in this study was not viable, most likely due to the over-expression of the proteins.

Morris et al. (2003) and Marley & von Zastrow (2010) both used viral vectors to deliver protein into cells; this made it possible to introduce lower levels of the exogenous proteins, at what was presumably closer to physiological levels. The large aggresomes seen by Morris et al. (2003) were not present when the viral vector method of delivery was used but were present in 16% of cells that were transiently transfected.

4.3.4.2 Limitations of Pulldown Analysis

The method used for pulldown and co-precipitation of DISC1 and the selected interacting partners was somewhat artificial in its use of epitope tags and transient over-expression of proteins (as was discussed in Section 4.1.3), but it was deemed to be a reasonable first step in the confirmation of interactions in human cells.

This system was chosen as it required minimal cost, owing to only two antibodies being required to test all of the protein interactions, and less time, because a single cell line could be chosen to express all proteins rather than having to identify cell lines that contained endogenous expression. The fact that the system did not work consistently necessitates discussion of why this may have been the case.

It would seem from the results obtained that there was not an issue with the methods used per se, as the expression of the majority of the constructs tested was reliably reproducible. This suggests that the addition of the epitope tags was not creating a global issue with expression or folding, at least not one that was detectable by these means. To examine whether there was an issue with only the c-myc epitope (as opposed to the GST epitope that was expressed with the prey proteins), several of the preys were also tested for expression with the pDEST/TO/myc-His plasmid and all showed expression equivalent to that with the pDEST™27 plasmid (the data for this was not shown in the results section as the analysis was undertaken purely to test this theory). Equally it can be concluded that there was no issue with the antibodies recognising the epitopes; both the anti-GST and the anti-c-myc antibodies worked consistently with control lysates. Transfection and lysis protocols also worked consistently with the prey proteins, which suggests that they were both transferable across a number of different proteins.

As all of the above issues are therefore unlikely causes of the inconsistent DISC1 expression the focus must turn to an issue with the DISC1 construct or protein. Having confirmed the sequence of the inserted DISC1 cDNA and the structure of the pDEST/TO/myc-His plasmid containing it, the issue can reasonably be assumed not to be one of transcription of DISC1 upon transfection into the cells. If this was the case it would be expected that the same issue would be inherent to the plasmid being used, which has been demonstrated to express other protein constructs.

Transfection efficiency of the pDEST/TO/mCherry/myc-His-DISC1 construct varied but was consistently present. Similar levels of variance in transfection efficiency were present with the prey pDEST/TO/EYFP/myc-His plasmids, so it seemed reasonable that these results could be readily transferable to the non-fluorescent version of the plasmid and thus that DISC1 was successfully being introduced into the cells.

An issue of protein translation or maintenance was also a possibility and qPCR of DISC1 from the cell lysates was considered to assess if there was evidence that the mRNA was indeed being produced but that there was no subsequent protein expression evident. However, due to the fact that DISC1 expression was seen on occasion it was decided that qPCR was unlikely to aid in determining exactly what the issue was. If there was protein expression on some occasions, then expression and maintenance of the protein from this plasmid within the chosen cell type was possible and as such the results of qPCR and protein expression comparison may also be inconsistent across several experiments, especially if the issue of detecting DISC1 expression is not one of protein production but one of protein extraction or detection.

Having considered the above it seemed unlikely that the issue lay with transfection, transcription or translation of DISC1 in the T-REx™-293 cells, leaving protein extraction or detection to be considered as reasons for the experimental inconsistencies. Antibody detection of the protein could be ruled out as an issue, as the anti-c-myc antibody was able to detect the c-myc epitope reliably. In addition, a purchased anti-DISC1 antibody was also unable to show consistent DISC1 expression using the same methods (data not included).

Issues with detection of proteins may also be the result of low levels of expression; however, this seemed unlikely as all of the protein constructs relied on the same promoter—CMV, which is well documented to be a very strong promoter leading to high levels of protein expression in mammalian cells, as was seen with the prey gene constructs. The higher levels of cytotoxicity with the DISC1 constructs would certainly have resulted in lower levels of DISC1 expression compared with the preys; however, with optimisation of the transfection protocol to minimise this cell death it seems unlikely that the difference would be enough to lose expression completely, especially given the strong promoter used.

This left extraction issues as a possible cause of the inconsistencies in the ability to show DISC1 expression. This was, in hindsight, the most likely process to be causing an issue given the earlier problems with the fluorescent DISC1 being caught up in the cellular debris (see Section 4.2.5.4), albeit a problem that was considered abated by the addition of nuclease and homogenisation to the extraction protocol. It had been assumed that the localisation of the DISC1 protein within the T-REx™-293 cells was showing that it was a nuclear associated protein and thus there may have been issues of chromosomal DNA contamination making it difficult to extract the protein from the nuclear pellet. The addition of Benzonase® nuclease 'fixed' this issue by degrading the chromosomal DNA and releasing the protein into the lysate.

A second reason for protein extraction issues may be due to protein aggregation. Aggregates of proteins including DISC1 tend to be insoluble (Ottis et al., 2011), even when treated with a detergent (Leliveld et al., 2008). This may explain the lack of DISC1 expression seen in many of the experiments undertaken here, due to the over-expression of DISC1 that is seemingly inherent to the transient expression methods used. It is perhaps unlikely then, if DISC1 is aggregating in the cells, that expression would ever be seen, although there is evidence that the level of aggregation is not always consistent even with transfection of the same construct.

Morris et al. (2003) found that 16% of transiently transfected cells showed large aggregates of DISC1 compared to 84% showing smaller punctate structures. Given that the same group were able to show protein-protein interaction between DISC1 and several interacting partners by transient transfection and co-immunoprecipitation, it is possible that the smaller punctate structures remain soluble within cell lysates and are able to be processed. The proportion of cells exhibiting high levels of aggregation compared with the smaller punctate structures is likely correlated to the degree of over-expression within the cells. Therefore it is possible that the level of over-expression created in this study was higher on average than that achieved by Morris et al. (2003) and this would lead to the failure to detect DISC1 expression consistently in the current study.

There is no clear explanation as to why DISC1 expression was seen in some instances but not in the majority of cases. One hypothesis could be that the ‘random expression’ of DISC1 in this study was not truly random, but in fact was linked to small fluctuations in transfection efficiency achieved between experiments with regard to the number of plasmids taken up by each cell. When the efficiency was high the expressed protein aggregated, was not soluble and was thus not detected in western blot analysis, although it is probably also true that if the efficiency was too low there would not be adequate expression of the DISC1 protein for it to be detected. The middle ground in transfection efficiency may have resulted in the expression remaining below a critical threshold and as such large aggregates were not formed, the protein remained soluble (at least in part) and expression was detectable. This hypothesis unfortunately cannot be confirmed from the results produced in this thesis.

The problem was difficult to resolve using the methods in this study, as these fluctuations in transfection efficiency were random and very difficult to control. Attempts to reduce the amount of DISC1 expressed in each cell by reducing the total amount of plasmid added during transfection also failed to give consistent results, suggesting that other ways of reducing protein expression levels within the cells need to be tested. Alternatively there are methods that can be used to dissolve protein aggregates by addition of denaturants and reducing agents to the lysate; however, this is likely to further interfere with any protein interactions that the assay is trying to detect, rendering any results that may have been drawn somewhat dubious.

4.3.4.3 Future Directions for Validation

If the localisation and co-precipitation of the proteins from this study were to be further analysed, a method for introducing DISC1 at lower concentrations would be required. This may be achieved by using a vector with a weaker promoter or further optimising the transfection protocol to achieve consistently lower levels of DISC1 expression from the current construct (which may involve the use of a different cell line).

Alternative methods might also be considered including using viral vectors or electroporation to introduce the DISC1 protein to the cells at lower levels. Another method worth considering is RNA interference targeted to DISC1 that would knock down the DISC1 expression enough to remove the potential for aggregate formation but to leave sufficient protein to be detected. This method has been used in the past in an effort to reduce the cytotoxic effects of DISC1 over-expression in cell lines. Marley & von Zastrow (2010) used shRNA targeted to endogenous DISC1 which lowered the level of expression by at least 80%, which meant that upon transfection of a tagged DISC1 construct that was resistant to the shRNA the levels of DISC1 expression could be titrated to an optimal level.

The success of these experiments remains the next step in the analysis of the data generated from the yeast two-hybrid screen. Any interactions confirmed by these methods could then be tested in light of the proposed hypothesis, i.e. assessment of effect on primary cilia. This could be achieved initially by the knock out of each protein in cells and assessment of phenotype.

4.3.5 Conclusions

The yeast two-hybrid method of detecting interactions between proteins is a powerful tool to assess the unknown functions of a bait protein. Though it has a number of limitations and considerations involved, it remains one of the best methods to quickly identify a set of candidate genes for further analysis. During the implementation of this method all possible steps were taken to minimise the error inherent to this system, and thus the gene set obtained is the most accurate that could be obtained from the number of screens done.

The databases and bioinformatic tools available to assess the enrichment of gene sets obtained from these analyses are becoming more and more comprehensive as the functions of more and more genes are elucidated. However, it must be kept in mind that there are still gaps, that these databases only include the known functions of proteins, and for these reasons, there may be functions for the protein of interest that are not identified via enrichment analyses.

Together these two methods provide a powerful means to assess protein function as long as the results are treated as candidates and are confirmed by appropriate means. Given this, the unsuccessful attempts to confirm any of the identified interactions in human cells limited the interpretation and functional study of the data obtained.

4.3.5.1 Summary

This study (with the inclusion of the WTAC data) describes the largest yeast two-hybrid screen of DISC1 done to date, identifying a total of 822 interactions of which 797 were novel. Gene enrichment analysis of the resulting prey list suggested a number of putative roles for DISC1 functionally. Upon analysis of the gene set enrichment, the functions of the individual preys and the literature, it is hypothesised that DISC1 and a subset of its interacting partners play a role in the construction and maintenance of the primary cilium of cells. However, due to limitations with regard to the confirmation of protein interactions in human cells no actual assessment of this hypothesis could be undertaken as a part of this thesis.

Chapter 5

Interaction at a Genetic Level

5.1 Introduction

In the study of DISC1, analysis of interaction has been conducted largely at a molecular level, with numerous protein interactions being identified through yeast two-hybrid screens, and a handful of these being confirmed via independent means. However, there has been little assessment of interaction at a genetic level. Given that mental illness has been established as a complex disease state, and that there has been evidence of physical interaction between various proteins, it seems reasonable that there may be such interactions between genes occurring at a genetic level (epistasis) and that such a possibility should be explored.

5.1.1 DISC1 Interactions in the Literature

As was discussed in the previous chapter, with the function of DISC1 under investigation for some years now there have been a large number of interactions identified through yeast two-hybrid screens. A total of 119 interactions have been described in the literature (these are outlined in Appendix E, Summary of DISC1 Interactions). Of these interactions only a small number have been confirmed by secondary experiments or independent analysis (described in Table 5.1).

Table 5.1: Confirmed DISC1 Protein Interactions. Proteins identified as interacting partners of DISC1, by various yeast two-hybrid studies, that have been confirmed by at least one other method or an independently published yeast two-hybrid study, including the current study (only multiple independent hits from the current study are included).

| Protein | Chromosome | Initial Method | Reference | Subsequent Methods (References) |
| --- | --- | --- | --- | --- |
| ACTN2 | 1q42-q43 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Morris et al., 2003) |
| ATF5 | 19q13.3 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Morris et al., 2003); Mammalian Two-Hybrid (Morris et al., 2003) |
| CEP63 | 3q22.2 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Morris et al., 2003); Two-Hybrid (Camargo et al., 2007) |
| EIF3H | 8q24.11 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Morris et al., 2003); Two-Hybrid (Camargo et al., 2007) |
| MAP1A | 15q15.3 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Morris et al., 2003) |
| RANBPD9 | 6p23 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Morris et al., 2003) |
| SPTBN4 | 19q13.13 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Morris et al., 2003) |
| SYNE1 | 6q25 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Morris et al., 2003); Two-Hybrid (Camargo et al., 2007) |
| TRAF3IP1 | 2q37.3 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Morris et al., 2003); Two-Hybrid (Camargo et al., 2007) |
| ITSN1 | 21q22.1 | Two-Hybrid | Morris et al. (2003) | Affinity Capture Western (Wong et al., 2012) |
| FEZ1 | 11q24.2 | Two-Hybrid | Miyoshi et al. (2003) | Affinity Capture Western (Miyoshi et al., 2003) |
| IMMT | 2p11.2 | Two-Hybrid | Millar et al. (2003) | Two-Hybrid (Camargo et al., 2007); Two-Hybrid (Park et al., 2010); Affinity Capture Western (Park et al., 2010); Co-localisation (Park et al., 2010); Far Western (Park et al., 2010); Reconstituted Complex (Park et al., 2010) |
| ANKHD1 | 5q31.3 | Two-Hybrid | Morris et al. (2003) | Co-Immunoprecipitation (Morris et al., 2003) |
| NDEL1 | 17p13.1 | Two-Hybrid | Morris et al. (2003) | Two-Hybrid (Ozeki et al., 2003); Affinity Capture Western (Morris et al., 2003); Affinity Capture Western (Ozeki et al., 2003); Reconstituted Complex (Brandon et al., 2004); Affinity Capture Western (Brandon et al., 2004); Reconstituted Complex (Ozeki et al., 2003); Two-Hybrid (Camargo et al., 2007); Two-Hybrid (Park et al., 2010) |
| PCNT | 21q22.3 | Two-Hybrid | Miyoshi et al. (2004) | Co-Immunoprecipitation (Miyoshi et al., 2004); Co-localisation (Miyoshi et al., 2004) |
| AKAP9 | 7q21-q22 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (Millar et al., 2003) |
| ATF4 | 22q13.1 | Two-Hybrid | Morris et al. (2003) | Two-Hybrid (Millar et al., 2003) |
| KALRN | 3q21.2 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (Millar et al., 2003) |
| SMARCE1 | 17q21.2 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (Millar et al., 2003) |
| SPTAN1 | 9q34.11 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (Millar et al., 2003) |
| CDC5L | 6p21 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (This Study, including WTAC) |
| CDK5RAP3 | 17q21.32 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (This Study, including WTAC) |
| DCTN2 | 12q13.3 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (This Study, including WTAC) |
| EXOC1 | 4q12 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (This Study, including WTAC) |
| FRYL | 4p11 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (This Study, including WTAC) |
| OLFM1 | 9q34.3 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (This Study, including WTAC) |
| PGK1 | Xq13.3 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (This Study, including WTAC) |
| SNX6 | 14q13.1 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (This Study, including WTAC) |
| UTRN | 6q24 | Two-Hybrid | Camargo et al. (2007) | Two-Hybrid (This Study, including WTAC) |

5.1.2 Gene-Gene Epistatic Interaction

There has been a small amount of work undertaken with regard to genetic interactions of DISC1, but this research has been largely confined within DISC1 itself. When compared to the number of association studies of the DISC1 locus, there has been little attention given to the interaction of DISC1 with other genes.

Two studies have found interactions at the genetic level between DISC1 and two genes encoding proteins that are confirmed to interact with DISC1 (based on the evidence shown in Table 5.1). The first of these studies identified an epistatic relationship with SNPs in DISC1 and NDEL1, and the second between DISC1 and FEZ1.

The first of these studies is described by Burdick et al. (2008). They found that in a cohort recruited from the Zucker Hillside Hospital (ZHH), risk of schizophrenia was increased in individuals carrying the G allele at NDEL1 rs1391768 with a Ser/Ser background at DISC1 Ser704Cys (χ² = 8.86, df = 1; p-value = 0.003, OR [95% CI] = 2.44 [1.35-4.41]). This result was calculated by a likelihood ratio test in a backward stepwise regression analysis that was corrected for gender, age, IQ and the main effect of each SNP separately.

This group also found a marginally significant result between the same DISC1 SNP and the unconfirmed interaction partner NDE1 (rs3784859) in a slightly different but overlapping cohort. The association here was with individuals carrying the Cys allele at DISC1 Ser704Cys and the G allele at rs3784859 (χ² = 3.89, df = 1; p-value = 0.049, OR [95% CI] = 2.00 [1.00-3.97]). Although the authors admit that this second result would not stand up to correction for multiple testing, they found it interesting that the interactions identified with NDEL1 and NDE1 are on opposite DISC1 backgrounds. They suggested that these opposite epistatic interactions may be evidence of some kind of competitive imbalance of interaction between the protein products of these two genes and DISC1.

Kang et al. (2011) provide an example of an epistatic interaction between DISC1 and FEZ1. The study showed that across two cohorts there is an association with schizophrenia for interacting genotypes of Ser704Cys in DISC1 and rs12224788 in FEZ1. In their initial cohort, also from the Zucker Hillside Hospital, they found a significantly increased risk for schizophrenia when the C allele at rs12224788 was tested in the context of a Ser/Ser background of DISC1 Ser704Cys. This risk was identified first by chi-squared analysis (χ² = 4.75, df = 1; p-value = 0.029, OR [95% CI] = 2.55 [1.1-6.0]; Fisher’s exact p-value = 0.046) and then by a likelihood ratio in a backwards stepwise regression (Beta = 0.54, p-value = 0.028). Replication of the analysis in the GAIN schizophrenia cohort (using a proxy SNP in 100% linkage disequilibrium, rs1754605) shows similar results in the regression analysis, with the interaction term showing a significant association (Beta = -0.45, p-value = 0.039). This association is, however, in the opposite direction to that found in the ZHH cohort; a chi-squared analysis revealed that the interaction was this time in individuals with the GG genotype at the FEZ1 locus who were also Cys carriers at DISC1 (χ² = 2.83, df = 1; p-value = 0.05, OR [95% CI] = 0.77 [not reported]). The authors claim that though the interactions are different, the pattern of the interactions is consistent, suggesting that although the interactions are with opposing genotype combinations, the direction of effect is also in opposing directions with the ZHH cohort showing risk (with rs12224788 GG x Ser704Cys Cys Carriers) while the GAIN cohort shows protection (with rs12224788 C carriers x Ser704Cys Ser/Ser).

Nicodemus et al. (2010) found an interaction between DISC1 and an unconfirmed protein interaction partner CIT. They reported that being homozygous for the dominant allele at both DISC1 rs1411771 and CIT rs10744743 predicted risk of schizophrenia (OR [95% CI] = 3.01 [1.37-6.98], LRT p-value = 0.007). The same analysis failed to find interaction between the same DISC1 SNP (rs1411771) and five other selected SNPs from CIT (rs440299, rs203340, rs3847960), PAFAH1B1 (rs7212450) and NDEL1 (rs4791707). The DISC1xCIT interaction identified (found in an American cohort) was not replicated when tested in two further cohorts, one of which was the GAIN cohort and the second a pooled cohort of cases and controls from Aberdeen and Germany.

There are also studies that have found interactions with genes that have not been identified as protein interacting partners of DISC1. An interaction with COMT is described by Nicodemus et al. (2007), who identified that an increased number of Met alleles at Val158Met in COMT, in combination with a carrier status of the C allele at DISC1 rs7546310, showed an increased risk for schizophrenia in the National Institute of Mental Health sibling study cohort. When compared to individuals who were Val/Val (A/A) at rs7546310, those with a single copy of the Met allele had an increased risk of almost three-fold (OR [95% CI] = 2.77 [1.09-7.02], p-value = 0.032, LRT p-value = 0.013), and in individuals with two copies of the Met allele the increase was over 14-fold (OR [95% CI] = 14.63 [2.03-105.5], p-value = 0.008).

An interaction has also been found between DISC1 and NRG1 (Mata et al., 2009). The presence of a T allele at the NRG1 SNP rs6994992 in combination with a T/T genotype at DISC1 rs2793092 was determined to be associated with increased lateral ventricle volume in patients with schizophrenia (F = 7.82, df = 2,54; p-value = 0.007).

5.1.3 Aims of This Study

The interaction of DISC1 with a selection of SNPs from known DISC1 interacting partners will be investigated in this chapter. This analysis will be undertaken using the GAIN and non-GAIN datasets (as described in Chapter 2). Included in the analyses will be attempted replication of the findings from Kang et al. (2011) and Burdick et al. (2008) in the aforementioned datasets. These two studies were chosen for validation analysis because their findings are with genes that are confirmed interacting partners at the protein level. Further SNPs from this list of confirmed protein interaction partners will then be selected and analysed for epistatic interaction with DISC1. Finally, a subset of the preys identified in the yeast two-hybrid analysis will be chosen for the same analysis. This set will include the proteins chosen for confirmation in the previous chapter among a group of proteins that have known or apparent functions in the construction and maintenance of the primary cilium.

5.2 Results

Full results not shown in this chapter can be found in Appendix E. Within this appendix, files are named according to the subsection to which they relate.

5.2.1 Validation of Published Epistatic Relationships

5.2.1.1 FEZ1 x DISC1 Interaction

This interaction was assessed in the GAIN schizophrenia dataset by the original authors (Kang et al., 2011), so to avoid overlap in sample data, in this case, the validation will be limited to the non-GAIN schizophrenia dataset.

As in the original analysis, a proxy SNP (rs1754605) was used for the rs821616 (Ser704Cys) locus. The authors report that the r² for these two SNPs is 100 and inspection of the SNPs in the European cohorts of the 1000 Genomes data set reveals an r² of 98, as is shown in Figure 5.1.

Figure 5.1: Linkage Disequilibrium between rs1754605 and rs821616. A Haploview image showing the linkage disequilibrium between rs1754605 and rs821616 in the 1000 Genomes data. The image was generated using all five of the European populations (CEU, TSI, FIN, GBR and IBS), data formatted and downloaded from http://browser.1000genomes.org/Homo_sapiens/UserData/Haploview. The metric shown is the r² value between the SNPs.
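As an illustration of the r² metric referred to for the proxy SNP, the following minimal Python sketch computes r² between two biallelic SNPs from phased haplotype counts. The function name and the counts are hypothetical and are not taken from the 1000 Genomes data; the values of 100 and 98 quoted in the text appear to be r² expressed on a 0 to 100 scale, which is an assumption here.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from the four phased haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total                # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total        # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total        # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B               # linkage disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical haplotype counts, for illustration only.
r2 = ld_r_squared(600, 5, 5, 390)
print(f"r^2 = {r2:.3f} (about {100 * r2:.0f} on a 0-100 scale)")
```

An r² close to 1 means that genotypes at the proxy SNP predict genotypes at the target SNP almost perfectly, which is the property that justifies substituting rs1754605 for rs821616.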

Initially the alleles for each SNP were grouped in the same manner as in the original analysis (carriers of one or two minor alleles were grouped together) and a χ² analysis was conducted using these data (see Table 5.2). The lack of association seen in the χ² analysis is confirmed by logistic regression analysis comparing the SNPxSNP interaction model against the null, giving χ² = 0.03 with a p-value of 0.870 (with 1 df).

Table 5.2: c2 Analysis for DISC1xFEZ1 Interaction in non-GAIN SCZ with Grouped Geno- types. Summary of the c2 analysis of individuals from the non-GAIN schizophrenia dataset falling into each of the grouped genotype classes, for DISC1 rs1754605 and FEZ1 rs12224788, normalised to the major allele homozygotes.

                       Count (frequency)
Genotype (DISC1/FEZ1)  Controls     Cases        χ²    p-value  OR [95% CI]
T,T/C,C                278 (0.206)  232 (0.202)                 1.00
T,T/C,G or G,G         408 (0.303)  363 (0.316)  0.31  0.576    0.94 [0.75, 1.17]
T,C or C,C/C,C         277 (0.206)  220 (0.191)  0.15  0.696    1.05 [0.82, 1.35]
T,C or C,C/C,G or G,G  384 (0.285)  334 (0.291)  0.13  0.722    0.96 [0.76, 1.20]
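The per-class statistics in Table 5.2 can be sketched as a 2 x 2 comparison of each genotype class against the reference class (major allele homozygotes). The code below is a minimal illustration under stated assumptions: a Pearson χ² without continuity correction, a Woolf (log-based) 95% confidence interval, and an odds ratio expressed as the control:case odds of the tested class relative to the reference class. These choices are inferred from the tabulated values rather than stated in the text, and the function name is hypothetical.

```python
import math


def compare_to_reference(ref_controls, ref_cases, controls, cases):
    """Chi-square (1 df), p-value, odds ratio and 95% CI for one genotype class."""
    a, b, c, d = ref_controls, ref_cases, controls, cases
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, no continuity correction (assumption).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Control:case odds in the tested class relative to the reference class.
    odds_ratio = (c / d) / (a / b)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return chi2, p_value, odds_ratio, (lower, upper)


# Second row of Table 5.2 against the reference row.
chi2, p_value, odds_ratio, ci = compare_to_reference(278, 232, 408, 363)
print(f"chi2={chi2:.2f} p={p_value:.3f} OR={odds_ratio:.2f} CI=[{ci[0]:.2f}, {ci[1]:.2f}]")
```

With this orientation, an odds ratio above 1 corresponds to a genotype class that is relatively more frequent among controls than among cases.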

The same analysis was repeated, this time without the grouping of alleles, as there are sufficient individuals in the non-GAIN cohort that each genotype can be represented on its own (Table 5.3).

Table 5.3: χ² Analysis for DISC1xFEZ1 Interaction in non-GAIN SCZ. Summary of the χ² analysis of individuals from the non-GAIN schizophrenia dataset falling into each of the genotype classes, for DISC1 rs1754605 and FEZ1 rs12224788, normalised to the major allele homozygotes.

                       Count (frequency)
Genotype (DISC1/FEZ1)  Controls     Cases        χ²    p-value  OR [95% CI]
T,T/C,C                278 (0.206)  232 (0.202)                 1.00
T,T/C,G                301 (0.223)  271 (0.236)  0.39  0.534    0.93 [0.73, 1.18]
T,T/G,G                107 (0.079)  92 (0.080)   0.03  0.859    0.97 [0.70, 1.35]
T,C/C,C                224 (0.166)  176 (0.153)  0.20  0.654    1.06 [0.82, 1.38]
T,C/C,G                235 (0.174)  212 (0.185)  0.36  0.549    0.93 [0.72, 1.19]
T,C/G,G                77 (0.057)   70 (0.061)   0.21  0.648    0.92 [0.64, 1.33]
C,C/C,C                53 (0.039)   44 (0.038)   0.00  0.981    1.01 [0.65, 1.55]
C,C/C,G                51 (0.038)   37 (0.032)   0.36  0.549    1.15 [0.73, 1.82]
C,C/G,G                21 (0.016)   15 (0.013)   0.20  0.656    1.17 [0.59, 2.32]

Again the logistic regression analysis shows no interaction between these two SNPs (χ² = 0.91 with a p-value of 0.923, with 4 df). The results of the original study (Kang et al., 2011) are not confirmed in this dataset.
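For the grouped (carrier versus non-carrier) coding, the interaction comparison has a single degree of freedom, and a closely related Wald statistic can be written in closed form from the eight cell counts. The sketch below uses that Wald approximation with the counts from Table 5.2; it is a stand-in for, not a reproduction of, the nested logistic regression comparison described in the text, and the function and variable names are hypothetical.

```python
import math


def wald_interaction(cells):
    """cells maps (disc1_carrier, fez1_carrier) -> (controls, cases)."""
    log_interaction = 0.0
    variance = 0.0
    for (disc1, fez1), (controls, cases) in cells.items():
        # Interaction contrast: +log odds for (0,0) and (1,1), -log odds for (0,1) and (1,0).
        sign = 1 if disc1 == fez1 else -1
        log_interaction += sign * math.log(controls / cases)
        variance += 1 / controls + 1 / cases
    stat = log_interaction ** 2 / variance            # approx. chi-square with 1 df
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Grouped genotype counts (controls, cases) taken from Table 5.2.
table_5_2 = {
    (0, 0): (278, 232),
    (0, 1): (408, 363),
    (1, 0): (277, 220),
    (1, 1): (384, 334),
}
stat, p_value = wald_interaction(table_5_2)
print(f"chi2={stat:.2f} p={p_value:.3f}")
```

The 4 df test for ungrouped genotypes has no equivalent closed form and would require fitting both logistic models.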


5.2.1.2 NDEL1 x DISC1 Interaction

The SNPs studied for epistasis between NDEL1 (rs1391768) and DISC1 (rs821616) were not genotyped on the Affymetrix 6.0 chip. Although the DISC1 variant had a good proxy SNP in rs1754605, none of the SNPs in high linkage disequilibrium with rs1391768 (shown in Figure 5.2) were genotyped on the Affymetrix 6.0 platform either. The NDEL1 SNP was therefore imputed.

Figure 5.2: Possible Proxy SNPs for rs1391768. A Haploview image showing possible proxy SNP candidates in NDEL1 for rs1391768. The image was generated using the HapMap version 3 data. The metric shown is the r² value between the SNPs; a black box with no number indicates an r² value of 1.

The interaction between the imputed rs1391768 and rs1754605, with alleles grouped in the same manner as the original data, was estimated by χ² analysis (see Table 5.4). This analysis was conducted using the combined GAIN and non-GAIN schizophrenia dataset, as the original paper (Burdick et al., 2008) finds its association in a different schizophrenia dataset.

Logistic regression analysis of the SNPxSNP interaction gives χ² = 2.09 with a p-value of 0.148 (with 1 df), showing that there is not a significant interaction between these two SNPs in this dataset. Again the analysis was repeated without the grouping of alleles (Table 5.5) and the same logistic regression analysis for the SNPxSNP interaction was undertaken, giving χ² = 5.38 with a p-value of 0.250. There is uncorrected evidence for a difference in schizophrenia risk in individuals who are heterozygous at the DISC1 SNP and homozygous G,G at the NDEL1 SNP (Table 5.5; this genotype class accounts for 7.8% of cases and 9.9% of controls; OR [95% CI] = 1.32 [1.04, 1.67], p-value = 0.021), but overall there is no evidence of an epistatic interaction. The results of Burdick et al. (2008) are not confirmed in this dataset.


Table 5.4: χ² Analysis for DISC1xNDEL1 Interaction in GAIN and non-GAIN SCZ with Grouped Genotypes. Summary of the χ² analysis of individuals from the GAIN and non-GAIN schizophrenia dataset falling into each of the grouped genotype classes, for DISC1 Ser704Cys and NDEL1 rs1391768, normalised to the major allele homozygotes.

                        Count (frequency)
Genotype (DISC1/NDEL1)  Controls      Cases         χ²    p-value  OR [95% CI]
T,T/A,A                 371 (0.136)   355 (0.142)                  1.00
T,T/A,G or G,G          1009 (0.370)  924 (0.370)   0.25  0.614    1.04 [0.88, 1.24]
T,C or C,C/A,A          341 (0.125)   364 (0.146)   1.07  0.301    0.90 [0.73, 1.10]
T,C or C,C/A,G or G,G   1004 (0.368)  857 (0.343)   1.70  0.192    1.12 [0.94, 1.33]

Table 5.5: χ² Analysis for DISC1xNDEL1 Interaction in GAIN and non-GAIN SCZ. Summary of the χ² analysis of individuals from the GAIN and non-GAIN schizophrenia dataset falling into each of the genotype classes, for DISC1 rs1754605 and NDEL1 rs1391768, normalised to the major allele homozygotes.

                        Count (frequency)
Genotype (DISC1/NDEL1)  Controls      Cases         χ²    p-value  OR [95% CI]
T,T/A,A                 371 (0.136)   355 (0.142)                  1.00
T,T/A,G                 720 (0.264)   663 (0.265)   0.18  0.675    1.04 [0.87, 1.24]
T,T/G,G                 289 (0.106)   261 (0.104)   0.26  0.609    1.06 [0.85, 1.32]
T,C/A,A                 284 (0.104)   311 (0.124)   1.49  0.223    0.87 [0.70, 1.09]
T,C/A,G                 555 (0.204)   511 (0.204)   0.16  0.689    1.04 [0.86, 1.26]
T,C/G,G                 270 (0.099)   196 (0.078)   5.34  0.021    1.32 [1.04, 1.67]
C,C/A,A                 57 (0.021)    53 (0.021)    0.02  0.889    1.03 [0.69, 1.54]
C,C/A,G                 130 (0.048)   105 (0.042)   1.27  0.261    1.18 [0.88, 1.59]
C,C/G,G                 49 (0.018)    45 (0.018)    0.04  0.851    1.04 [0.68, 1.60]


5.2.2 Assessment of the Confirmed Interacting Gene Set

To assess whether the confirmed interacting proteins have any evidence of association with mental illness, Q-Q plots of the observed versus expected p-values were generated for each of the datasets. All SNPs within each gene region were included in this analysis, where the gene region was defined as the base co-ordinates of the gene (according to NCBI build 37.1) rounded up or down to the nearest 1000 bp.
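The two steps described above (defining a gene region and building the Q-Q coordinates) can be sketched as follows. This is an illustration under stated assumptions: "rounded up or down" is interpreted here as rounding the start co-ordinate down and the end co-ordinate up, expected p-values use the (rank - 0.5)/n plotting position, and all names and co-ordinates are hypothetical.

```python
import math


def gene_region(start, end, step=1000):
    """Expand gene co-ordinates to the enclosing multiples of `step` (assumed rule)."""
    region_start = (start // step) * step        # round start down
    region_end = -(-end // step) * step          # round end up (ceiling division)
    return region_start, region_end


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, largest values first."""
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # Expected p-values under the null are uniform; (rank - 0.5) / n is one common choice.
    expected = [-math.log10((rank - 0.5) / n) for rank in range(1, n + 1)]
    return list(zip(expected, observed))


print(gene_region(1234567, 1239001))             # hypothetical co-ordinates
for expected, observed in qq_points([0.5, 0.01, 0.2]):
    print(f"{expected:.3f}\t{observed:.3f}")
```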

Genomic inflation and Q-Q plots are typically used as quality control methods to assess any unexpected deviation from an expected distribution of p-values, which may be a result of substructure in the dataset. Here these same methods are used in a less conventional manner, to assess whether there is a deviation from the expected values within a specific gene set that is thought to be associated with major psychiatric illness. An inflation factor of < 1.1 and a mean χ² of < 1 are expected under the null hypothesis (Yang et al., 2011a); values exceeding these thresholds are therefore evidence of some kind of stratification, which in this case may be due to the gene set being associated with psychiatric illness.
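A minimal sketch of the two summary statistics follows. It assumes the conventional definition of the inflation factor (the median observed 1 df χ² statistic divided by the median of the χ² distribution with 1 df, approximately 0.455) and converts p-values back to χ² statistics; the software actually used to obtain the tabulated values is not specified in this section, and the function names are hypothetical.

```python
from statistics import NormalDist, mean, median

_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df (about 0.455).
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value in (0, 1] to a 1 df chi-square statistic."""
    return _NORMAL.inv_cdf(p / 2) ** 2


def inflation_summary(p_values):
    """Return (inflation factor, mean chi-square) for a set of association p-values."""
    stats = [p_to_chi2(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN, mean(stats)


# Evenly spaced p-values mimic the null; squaring them mimics an enriched gene set.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
enriched = [p ** 2 for p in null_like]
print(inflation_summary(null_like))
print(inflation_summary(enriched))
```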

This analysis of inflation was undertaken first for the set of genes that had confirmed protein interaction with DISC1 in the published literature (see Table 5.1). The analysis was performed with both the combined genders and gender split datasets. The genomic inflation factors associated with each of these analyses, along with the mean χ² values, are detailed in Table 5.6. The plots generated for each of the individual datasets can be seen in Figure 5.3 (GAIN bipolar disorder A-C, GAIN schizophrenia D-F and non-GAIN schizophrenia G-I).

Table 5.6: Individual Dataset Q-Q Analysis Results of Published Gene Set. Inflation factors and mean χ² results obtained for combined and gender split analyses of the individual data sets.

Dataset       Cohort    Inflation factor  Mean χ²
GAIN BPD      Combined  1.00              0.99
              Male      1.00              0.88
              Female    1.14              1.06
GAIN SCZ      Combined  1.30              1.14
              Male      1.34              1.12
              Female    1.15              1.11
Non-GAIN SCZ  Combined  1.27              1.19
              Male      1.10              1.07
              Female    1.17              1.08


Figure 5.3: Q-Q Plots Of Published Gene Set in Individual Datasets. The observed versus expected -Log10(P) for the published confirmed interacting partners of DISC1 in A) GAIN BPD Combined Genders (n=1166), B) GAIN BPD Males (n=1172), C) GAIN BPD Females (n=1190), D) GAIN SCZ Combined (n=1170), E) GAIN SCZ Males (n=1171), F) GAIN SCZ Females (n=1171), G) non-GAIN SCZ Combined (n=1207), H) non-GAIN SCZ Males (n=1206) and I) non-GAIN SCZ Females (n=1211). Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.

There is no evidence for inflation (values <1.1) in the GAIN bipolar dataset for the combined or male-only cohorts. There is a suggestion of association in the female-only cohort, with an inflation factor of 1.14. There is evidence of inflation in both of the schizophrenia datasets, with the combined analyses both having inflation factors of >1.2 and the males in the GAIN data reaching above 1.3.


To achieve more power by use of a larger dataset, the GAIN and non-GAIN schizophrenia cohorts were combined into a single dataset and the same analysis was repeated (Table 5.7 and Figure 5.4).

Table 5.7: Combined SCZ Dataset Q-Q Analysis Results for Published Gene Set. Inflation factors and mean χ² results obtained for combined GAIN and non-GAIN SCZ dataset.

Cohort    Inflation factor  Mean χ²
Combined  1.13              1.17
Male      1.23              1.17
Female    1.12              1.09

Figure 5.4: Q-Q Plots Of Published Gene Set in Combined SCZ Dataset. The observed versus expected -Log10(P) for the published confirmed interacting partners of DISC1 in A) Combined Genders (n=1187), B) Males (n=1185), C) Females (n=1189), in the combined GAIN and non-GAIN SCZ datasets. Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.

In this combined schizophrenia cohort there is some inflation seen in each of the combined, female and male analyses, with the males showing the largest inflation (λ = 1.23).

To further increase the size of the cohort the GAIN bipolar disorder data was also added and again the analysis was repeated (see Table 5.8 and Figure 5.5). The inflation for the combined genders was increased in the larger dataset but both the male and female gender split analyses show less inflation.


Table 5.8: Combined SCZ and BPD Dataset Q-Q Analysis Results for Published Gene Set. Inflation factors and mean χ² results obtained for combined GAIN and non-GAIN datasets.

Cohort    Inflation factor  Mean χ²
Combined  1.27              1.09
Male      1.16              1.08
Female    1.08              0.98

Figure 5.5: Q-Q Plots Of Published Gene Set in Combined SCZ and BPD Dataset. The observed versus expected -Log10(P) for the published confirmed interacting partners of DISC1 in A) Combined Genders (n=1187), B) Males (n=1188), C) Females (n=1190), in the combined GAIN and non-GAIN datasets. Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.

To assess which pathways the inflation seen in these datasets was potentially affecting, a functional enrichment analysis of the genes included in the inflation analysis was conducted using the ToppGene (http://toppgene.cchmc.org/) ToppFun method. The top results for each of Biological Process, Molecular Function and Cellular Component (as defined by Gene Ontology (www.geneontology.org/)) for this gene subset are shown in Table 5.9.
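Enrichment tools of this kind typically score the overlap between an input gene list and each annotation term with a hypergeometric test, followed by a multiple-testing correction. The sketch below illustrates that general approach only; ToppFun's exact p-value method and background gene universe are not described in this section, so the function names, the background size and the example numbers are assumptions.

```python
from math import comb


def hypergeom_upper_tail(overlap, input_size, annotated, background):
    """P(X >= overlap) when input_size genes are drawn from `background` genes,
    of which `annotated` carry the term of interest."""
    total = comb(background, input_size)
    upper = min(input_size, annotated)
    favourable = sum(
        comb(annotated, k) * comb(background - annotated, input_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical example: 8 of 29 input genes carry a term annotated to 792 of 20,000 genes.
p = hypergeom_upper_tail(8, 29, 792, 20000)
print(p, bonferroni(p, 122))
```

The number of tests used in the Bonferroni step is simply the number of terms evaluated within each ontology category.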


Table 5.9: ToppGene Enrichment Analysis of Confirmed Published Gene Set. Enrichment analysis results for Biological Process, Molecular Function and Cellular Component of the confirmed published interactant gene set.

GO: Molecular Function
Hit  GO:ID       Name                              p-value   Bonferroni  Genes in input  Genes in annotation
1    GO:0008092  cytoskeletal protein binding      9.32E-07  1.14E-04    8               792
2    GO:0032403  protein complex binding           2.97E-06  3.62E-04    8               924
3    GO:0005198  structural molecule activity      4.56E-05  5.56E-03    6               641
4    GO:0003779  actin binding                     5.19E-05  6.33E-03    5               392
5    GO:0015631  tubulin binding                   1.40E-04  1.71E-02    4               251

GO: Biological Process
Hit  GO:ID       Name                              p-value   Bonferroni  Genes in input  Genes in annotation
1    GO:0048858  cell projection morphogenesis     6.96E-08  5.31E-05    9               821
2    GO:0032990  cell part morphogenesis           8.47E-08  6.46E-05    9               840
3    GO:0030030  cell projection organisation      1.31E-07  9.99E-05    10              1201
4    GO:0000902  cell morphogenesis                1.36E-06  1.03E-03    9               1165
5    GO:0032989  cellular component morphogenesis  2.32E-06  1.77E-03    9               1242

GO: Cellular Component
Hit  GO:ID       Name                              p-value   Bonferroni  Genes in input  Genes in annotation
1    GO:0044430  cytoskeletal part                 1.82E-10  2.80E-08    13              1477
2    GO:0005856  cytoskeleton                      4.33E-10  6.66E-08    14              1993
3    GO:0015630  microtubule cytoskeleton          6.96E-10  1.07E-07    11              971
4    GO:0005815  microtubule organising centre     3.87E-08  5.96E-06    8               538
5    GO:0097458  neuron part                       6.08E-08  9.36E-06    10              1130


The complete analysis was then repeated, this time with the inclusion of the protein interactions that were found in both a published study and in this study (including the WTAC results) by yeast two-hybrid analysis (see Table 5.1). The analyses in the individual datasets are shown in Figure 5.6 and Table 5.10.

Figure 5.6: Q-Q Plots Of Complete Gene Set in Individual Datasets. The observed versus expected -Log10(P) for the published confirmed interacting partners of DISC1 in A) GAIN BPD Combined Genders (n=1613), B) GAIN BPD Males (n=1621), C) GAIN BPD Females (n=1685), D) GAIN SCZ Combined (n=1618), E) GAIN SCZ Males (n=1618), F) GAIN SCZ Females (n=1626), G) non-GAIN SCZ Combined (n=1682), H) non-GAIN SCZ Males (n=1683) and I) non-GAIN SCZ Females (n=1692). Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.

A similar trend is seen here, with the GAIN bipolar data now having inflation values of exclusively <1.1 and both of the schizophrenia datasets having inflation values of >1.1 in the combined-gender analyses, though not as high as in the previous analysis.


Table 5.10: Individual Dataset Q-Q Analysis Results for Complete Gene Set. Inflation factors and mean χ² results obtained for combined and gender split analyses of the individual data sets.

Dataset       Cohort    Inflation factor  Mean χ²
GAIN BPD      Combined  1.01              1.04
              Male      1.05              1.03
              Female    1.03              0.99
GAIN SCZ      Combined  1.20              1.11
              Male      1.03              1.02
              Female    1.05              1.04
Non-GAIN SCZ  Combined  1.25              1.11
              Male      1.13              1.10
              Female    1.02              1.01

With the combining of the two schizophrenia datasets there were slight increases in both of the gender split analyses but the overall combined analysis showed no real change (see Table 5.11 and Figure 5.7).

Table 5.11: Combined SCZ Dataset Q-Q Analysis Results for Complete Gene Set. Inflation factors and mean χ² results obtained for combined GAIN and non-GAIN SCZ datasets.

Cohort    Inflation factor  Mean χ²
Combined  1.20              1.19
Male      1.23              1.12
Female    1.19              1.14

Figure 5.7: Q-Q Plots Of Complete Gene Set in Combined SCZ Dataset. The observed versus expected -Log10(P) for the published confirmed interacting partners of DISC1 in A) Combined Genders (n=1647), B) Males (n=1647), C) Females (n=1654), in the combined GAIN and non-GAIN SCZ datasets. Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.


Finally the gene set was tested in the combined datasets for both schizophrenia and bipolar disorder (Table 5.12 and Figure 5.8).

Table 5.12: Combined SCZ and BPD Dataset Q-Q Analysis Results for Complete Gene Set. Inflation factors and mean χ² results obtained for combined GAIN and non-GAIN datasets.

Cohort    Inflation factor  Mean χ²
Combined  1.34              1.16
Male      1.19              1.08
Female    1.18              1.08

Figure 5.8: Q-Q Plots Of Complete Gene Set in Combined SCZ and BPD Dataset. The observed versus expected -Log10(P) for the published confirmed interacting partners of DISC1 in A) Combined Genders (n=1647), B) Males (n=1650), C) Females (n=1655), in the combined GAIN and non-GAIN datasets. Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.

Across all of the genes analysed for inflation there were no SNPs that, on their own, reached corrected significance for association with schizophrenia or bipolar disorder (results of these analyses are included in Appendix E).

Enrichment analysis was also repeated to include the additional genes that were used for this second set of inflation analyses. Again the results of this enrichment for Biological Process, Molecular Function and Cellular Component for this gene subset are shown in Table 5.13.


Table 5.13: ToppGene Enrichment Analysis of Complete Gene Set. Enrichment analysis results for Biological Process, Molecular Function and Cellular Component of the complete interactant gene set.

GO: Molecular Function
Hit  GO:ID       Name                                      p-value   Bonferroni  Genes in input  Genes in annotation
1    GO:0008092  cytoskeletal protein binding              2.04E-07  2.87E-05    10              792
2    GO:0032403  protein complex binding                   8.43E-07  1.19E-04    10              924
3    GO:0030507  spectrin binding                          1.14E-05  1.60E-03    3               28
4    GO:0003779  actin binding                             2.89E-05  4.07E-03    6               392
5    GO:0005198  structural molecule activity              4.27E-04  6.02E-02    6               641

GO: Biological Process
Hit  GO:ID       Name                                      p-value   Bonferroni  Genes in input  Genes in annotation
1    GO:0048858  cell projection morphogenesis             2.89E-06  2.49E-03    9               821
2    GO:0048699  generation of neurons                     3.34E-06  2.88E-03    11              1362
3    GO:0032990  cell part morphogenesis                   3.49E-06  3.01E-03    9               840
4    GO:0051960  regulation of nervous system development  4.46E-06  3.84E-03    8               640
5    GO:0022008  neurogenesis                              5.78E-06  4.98E-03    11              1441

GO: Cellular Component
Hit  GO:ID       Name                                      p-value   Bonferroni  Genes in input  Genes in annotation
1    GO:0044430  cytoskeletal part                         6.44E-10  1.15E-07    15              1477
2    GO:0005856  cytoskeleton                              4.02E-09  7.20E-07    16              1993
3    GO:0015630  microtubule cytoskeleton                  7.44E-09  1.33E-06    12              971
4    GO:0097458  neuron part                               4.03E-08  7.22E-06    12              1130
5    GO:0005815  microtubule organising centre             7.03E-08  1.26E-05    9               538


5.2.3 Gene x Gene Interactions

To further investigate this set of confirmed DISC1 interacting partners, gene x gene inter- action analysis was conducted in a similar way to that done earlier with FEZ1 and NDEL1. A subset of SNPs from these interacting partners were selected and were then tested for interaction with DISC1 at the gene level.

5.2.3.1 SNP Selection

To create a subset of SNPs to be used in the gene x gene interaction analysis, all coding non-synonymous variants (according to dbSNP) in each of the genes were identified. The number of such variants for each gene is summarised in Table 5.14.

Given the large number of SNPs identified to be coding and non-synonymous, the list was limited to variants that had minor allele frequencies of 10% or more according to the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2012) (also summarised in Table 5.14).

Due to linkage disequilibrium, or a lack of independence, between pairs or groups of SNPs that are present within the same gene or chromosome, the list was further reduced to include only a single SNP from each region. Where SNPs had an r² value of >70, only a single representative SNP was chosen; this avoided the problem of inadvertently performing the same test twice. The degree of linkage between groups of SNPs that were identified to be within the same gene or chromosome was assessed using Haploview and can be seen in Figure 5.9 A-F.
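The selection procedure described above (a minor allele frequency threshold followed by removal of SNPs in high linkage disequilibrium) can be sketched as a simple greedy filter. The rule used here to pick the representative SNP (the highest minor allele frequency within a linked group) is an assumption, as the text does not state how the representative was chosen; the SNP names and values are hypothetical, and r² is on a 0 to 1 scale, so the Haploview threshold of 70 becomes 0.70.

```python
def select_snps(mafs, r2, maf_min=0.10, r2_max=0.70):
    """mafs: dict of SNP -> minor allele frequency.
    r2: dict of frozenset({snp_a, snp_b}) -> r^2; missing pairs are treated as unlinked."""
    # Keep common variants, ordered by decreasing MAF (ties broken by name).
    candidates = sorted(
        (snp for snp, maf in mafs.items() if maf >= maf_min),
        key=lambda snp: (-mafs[snp], snp),
    )
    kept = []
    for snp in candidates:
        # Retain the SNP only if it is not in high LD with any SNP already kept.
        if all(r2.get(frozenset((snp, other)), 0.0) <= r2_max for other in kept):
            kept.append(snp)
    return kept


# Hypothetical input: snpB is linked to snpA, snpC is too rare.
example_mafs = {"snpA": 0.30, "snpB": 0.28, "snpC": 0.05, "snpD": 0.15}
example_r2 = {frozenset(("snpA", "snpB")): 0.90, frozenset(("snpA", "snpD")): 0.10}
print(select_snps(example_mafs, example_r2))
```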

The same process was carried out for DISC1. Of the 118 non-synonymous SNPs in the gene (also detailed in Appendix E), three had minor allele frequencies of greater than 10%: rs3738401 (MAF = 0.34), rs6675281 (MAF = 0.12) and rs821616 (MAF = 0.30). There is no linkage disequilibrium between these SNPs, with an r² value of zero.


Table 5.14: Summary Table of non-Synonymous Variants. This table details the number of non-synonymous (NS) coding variants and the number of these variants with minor allele frequencies (MAF) of greater than 10%.

Gene      Location  NS SNPs  MAF>10%
ACTN2     1q42-q43  102      0
AKAP9     7q21-q22  329      2
ANKHD1    5q31.3    34       0
ATF4      22q13.1   52       1
ATF5      19q13.3   38       1
CDC5L     6p21      50       0
CDK5RAP3  17q21.32  64       0
CEP63     3q22.2    54       0
DCTN2     12q13.3   35       0
EIF3H     8q24.11   21       0
EXOC1     4q12      59       0
FEZ1      11q24.2   83       0
FRYL      4p11      190      0
IMMT      2p11.2    69       1
ITSN1     21q22.1   104      0
KALRN     3q21.2    234      0
MAP1A     15q15.3   269      3
NDEL1     17p13.1   19       0
OLFM1     9q34.3    22       0
PCNT      21q22.3   486      17
PGK1      Xq13.3    49       0
RANBP9    6p23      36       0
SMARCE1   17q21.2   25       0
SNX6      14q13.1   9        0
SPTAN1    9q34.11   153      0
SPTBN4    19q13.13  157      2
SYNE1     6q25      944      8
TRAF3IP1  2q37.3    71       2
UTRN      6q24      353      2


Figure 5.9: Haploview Linkage Disequilibrium Patterns. Haploblock structures (or lack thereof) showing r² values for A) AKAP9, B) ATF5 and SPTBN4 (Chr 19), C) IMMT and TRAF3IP1 (Chr 2), D) MAP1A, E) SYNE1 and UTRN (Chr 6) and F) PCNT. Images were generated in Haploview using the GAIN bipolar disorder dataset. Where there is a black box with no number indicated, the r² value is 1.

Some of the selected SNPs do not have genotyped proxies on the Affymetrix 6.0 chip and so, for consistency, where a SNP was not genotyped the data were imputed as described in Chapter 2 Section 2.5.1. Details of the expected minor allele frequencies (from the 1000 Genomes and Exome Sequencing Project) along with the imputed genotypes for the list of chosen SNPs across the three cohorts used are shown in Table 5.15.


Table 5.15: Minor Allele Frequency Assessment. The expected (1000 Genomes (1000G) European) frequency and observed imputed minor allele frequencies for the chosen SNPs.

Minor Allele Frequencies

                                           GAIN SCZ            non-GAIN SCZ        GAIN BPD
Gene      SNP         Minor allele  1000G  Cases     Controls  Cases     Controls  Cases     Controls
AKAP9     rs6960867*  G             0.39   0.39      0.38      0.39      0.38      0.37      0.38
ATF4      rs4894      C             0.32   0.31      0.30      0.32      0.31      0.31      0.30
ATF5      rs283526    T             0.49   -         -         0.48      0.48      0.47      0.48
DISC1     rs821616    A             0.30   0.29      0.29      0.28      0.29      0.27      0.29
          rs3738401   T             0.34   0.32      0.30      0.32      0.32      0.32      0.30
          rs6675281   T             0.12   0.14      0.16      0.14      0.14      0.14      0.15
IMMT      rs1050301   A             0.37   0.36      0.33      0.34      0.33      0.34      0.34
MAP1A     rs2245715   A             0.10   0.09      0.10      0.09      0.10      0.10      0.10
          rs62020612  A             0.20   0.20      0.19      0.19      0.19      0.21      0.18
PCNT      rs2839227*  G             0.15   0.13      0.14      0.13      0.13      0.14      0.13
          rs6518291*  G             0.23   0.20      0.20      0.19      0.20      0.21      0.19
          rs35940413  G             0.14   0.12      0.11      0.11      0.11      0.12      0.11
          rs2839245   C             0.10   0.08      0.08      0.06      0.08      0.08      0.07
          rs2070425   T             0.28   0.32      0.34      0.34      0.35      0.32      0.33
          rs2073376   A             0.34   0.37      0.37      0.37      0.38      0.35      0.37
          rs2073380   C             0.22   0.22      0.23      0.21      0.22      0.22      0.22
SPTBN4    rs73931308  A             0.16   0.12      0.12      0.12      0.13      0.11      0.13
          rs814501    A             0.48   0.49      0.51      0.50      0.53      0.50      0.50
SYNE1     rs2252755   G             0.26   0.33      0.33      0.32      0.33      0.33      0.33
          rs6911096*  A             0.23   0.26      0.25      0.25      0.26      0.25      0.25
          rs9479297*  T             0.25   0.30      0.29      0.28      0.29      0.29      0.29
          rs4645434*  C             0.39   0.40      0.37      0.37      0.41      0.39      0.37
          rs214950    A             0.22   0.23      0.22      0.22      0.23      0.24      0.22
          rs214976    G             0.44   0.43      0.41      0.42      0.43      0.45      0.41
TRAF3IP1  rs12464423  T             0.32   0.32      0.31      0.32      0.33      0.33      0.31
UTRN      rs1534443   G             0.24   0.24      0.23      0.23      0.24      0.23      0.23

* Denotes a genotyped SNP, all others have been imputed.

The ATF5 SNP, rs283526, failed Hardy-Weinberg equilibrium testing in the GAIN schizophrenia cohort, but not in the GAIN bipolar disorder or non-GAIN schizophrenia cohorts. This SNP therefore remained in subsequent analyses that did not involve the GAIN schizophrenia data.
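A Hardy-Weinberg check of this kind can be sketched as a χ² goodness-of-fit test comparing observed genotype counts with those expected from the allele frequencies. This is an illustration only: the test actually applied here (for example an exact test) and its threshold are not stated in this section, and the function name is hypothetical. The example counts are the non-GAIN schizophrenia control genotypes for rs283526 reported in Table 5.17.

```python
import math


def hwe_chi2(n_11, n_12, n_22):
    """Chi-square (1 df) and p-value for departure from Hardy-Weinberg proportions.
    Assumes the SNP is polymorphic (all expected counts are non-zero)."""
    n = n_11 + n_12 + n_22
    p = (2 * n_11 + n_12) / (2 * n)              # frequency of allele 1
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_11, n_12, n_22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


chi2, p_value = hwe_chi2(373, 664, 310)
print(f"chi2={chi2:.3f} p={p_value:.3f}")
```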

5.2.3.2 Main Effects Assessment

The association of each of the candidate SNPs with schizophrenia, bipolar disorder or combined psychiatric illness was assessed in each cohort using a simple case control analysis. The results of these analyses are shown in Tables 5.16 - 5.20.
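The tabulated results are numerically consistent with an allelic case-control test, in which genotype counts are converted to allele counts and compared in a 2 x 2 table. The sketch below illustrates that calculation using the rs6960867 genotype counts from the GAIN schizophrenia cohort (Table 5.16); the Pearson χ² without continuity correction and the log-based confidence interval are assumptions, and the function name is hypothetical.

```python
import math


def allelic_test(case_genotypes, control_genotypes):
    """Each argument is (n_11, n_12, n_22), where allele 2 is the minor allele."""
    def allele_counts(genotypes):
        n_11, n_12, n_22 = genotypes
        return 2 * n_11 + n_12, n_12 + 2 * n_22   # (major, minor) allele counts

    case_major, case_minor = allele_counts(case_genotypes)
    ctrl_major, ctrl_minor = allele_counts(control_genotypes)
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))      # chi-square upper tail, 1 df
    odds_ratio = (a * d) / (b * c)                # minor allele odds, cases vs controls
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se),
          math.exp(math.log(odds_ratio) + 1.96 * se))
    return {
        "case_maf": a / (a + b),
        "control_maf": c / (c + d),
        "p_value": p_value,
        "odds_ratio": odds_ratio,
        "ci": ci,
    }


result = allelic_test((509, 632, 210), (525, 648, 205))
print(result)
```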


Table 5.16: Association Analysis of Epistasis Candidate SNPs in GAIN SCZ. Case-Control analysis of the SNP candidates for epistasis, tested in the GAIN schizophrenia cohort.

SNP         Status   MAF    Genotype 11   Genotype 12   Genotype 22   p-value  FDR p-value  OR (95% CI)
rs6960867*  Case     0.389  509 (0.377)   632 (0.468)   210 (0.155)   0.679    0.851        1.023 (0.918, 1.141)
            Control  0.384  525 (0.381)   648 (0.47)    205 (0.149)
rs4894      Case     0.313  634 (0.469)   589 (0.436)   128 (0.095)   0.339    0.606        1.058 (0.943, 1.187)
            Control  0.301  660 (0.479)   607 (0.440)   111 (0.081)
rs821616    Case     0.286  686 (0.508)   558 (0.413)   107 (0.079)   0.990    0.990        1.001 (0.890, 1.126)
            Control  0.286  697 (0.506)   575 (0.417)   106 (0.077)
rs3738401   Case     0.317  635 (0.470)   575 (0.426)   141 (0.104)   0.222    0.551        1.074 (0.958, 1.205)
            Control  0.302  668 (0.485)   588 (0.427)   122 (0.089)
rs6675281   Case     0.137  1012 (0.749)  309 (0.229)   30 (0.022)    0.059    0.548        0.865 (0.744, 1.006)
            Control  0.155  990 (0.718)   350 (0.254)   38 (0.028)
rs1050301   Case     0.356  575 (0.426)   591 (0.437)   185 (0.137)   0.066    0.548        1.111 (0.993, 1.242)
            Control  0.332  604 (0.438)   633 (0.459)   141 (0.102)
rs2245715   Case     0.091  1120 (0.829)  215 (0.159)   16 (0.012)    0.236    0.551        0.8968 (0.749, 1.074)
            Control  0.101  1118 (0.811)  242 (0.176)   18 (0.013)
rs62020612  Case     0.199  866 (0.641)   432 (0.320)   53 (0.039)    0.297    0.571        1.074 (0.939, 1.229)
            Control  0.188  902 (0.655)   434 (0.315)   42 (0.030)
rs2839227*  Case     0.132  1020 (0.755)  306 (0.226)   25 (0.019)    0.756    0.859        0.9755 (0.835, 1.140)
            Control  0.135  1034 (0.750)  317 (0.230)   27 (0.020)
rs6518291*  Case     0.202  859 (0.636)   439 (0.325)   53 (0.039)    0.715    0.851        1.025 (0.898, 1.171)
            Control  0.198  896 (0.650)   419 (0.304)   63 (0.046)
rs35940413  Case     0.116  1052 (0.779)  284 (0.21)    15 (0.011)    0.196    0.551        1.118 (0.944, 1.324)
            Control  0.105  1103 (0.800)  260 (0.189)   15 (0.011)
rs2839245   Case     0.081  1142 (0.845)  200 (0.148)   9 (0.007)     0.442    0.651        1.081 (0.887, 1.317)
            Control  0.075  1178 (0.855)  193 (0.140)   7 (0.005)
rs2070425   Case     0.322  615 (0.455)   601 (0.445)   135 (0.100)   0.204    0.551        0.9295 (0.830, 1.040)
            Control  0.339  603 (0.438)   617 (0.448)   158 (0.115)
rs2073376   Case     0.372  526 (0.389)   645 (0.477)   180 (0.133)   0.596    0.828        1.030 (0.923, 1.150)
            Control  0.365  563 (0.409)   624 (0.453)   191 (0.139)
rs2073380   Case     0.221  814 (0.603)   476 (0.352)   61 (0.045)    0.369    0.615        0.9435 (0.831, 1.071)
            Control  0.232  806 (0.585)   506 (0.367)   66 (0.048)
rs73931308  Case     0.119  1044 (0.773)  292 (0.216)   15 (0.011)    0.948    0.988        0.995 (0.845, 1.171)
            Control  0.120  1059 (0.769)  308 (0.224)   11 (0.008)
rs814501    Case     0.489  328 (0.243)   725 (0.537)   298 (0.221)   0.243    0.551        0.9387 (0.844, 1.044)
            Control  0.505  335 (0.243)   694 (0.504)   348 (0.253)
rs2252755   Case     0.331  586 (0.434)   637 (0.472)   128 (0.095)   0.861    0.936        0.990 (0.885, 1.108)
            Control  0.333  623 (0.452)   593 (0.430)   162 (0.118)
rs6911096*  Case     0.260  743 (0.550)   513 (0.380)   95 (0.070)    0.229    0.551        1.078 (0.954, 1.218)
            Control  0.246  779 (0.565)   520 (0.377)   79 (0.057)
rs9479297*  Case     0.301  670 (0.496)   548 (0.406)   133 (0.098)   0.286    0.571        1.065 (0.948, 1.197)
            Control  0.288  705 (0.512)   552 (0.401)   121 (0.088)
rs4645434*  Case     0.400  498 (0.369)   625 (0.463)   228 (0.169)   0.028    0.548        1.130 (1.013, 1.260)
            Control  0.371  555 (0.403)   623 (0.452)   200 (0.145)
rs214950    Case     0.225  812 (0.601)   470 (0.348)   69 (0.051)    0.696    0.851        1.026 (0.903, 1.165)
            Control  0.221  836 (0.607)   476 (0.345)   66 (0.048)
rs214976    Case     0.431  435 (0.322)   667 (0.494)   249 (0.184)   0.091    0.551        1.097 (0.985, 1.222)
            Control  0.409  478 (0.347)   674 (0.489)   226 (0.164)
rs12464423  Case     0.319  625 (0.463)   590 (0.437)   136 (0.101)   0.415    0.648        1.049 (0.935, 1.176)
            Control  0.309  645 (0.468)   615 (0.446)   118 (0.086)
rs1534443   Case     0.243  768 (0.568)   509 (0.377)   74 (0.055)    0.194    0.551        1.086 (0.959, 1.231)
            Control  0.228  816 (0.592)   495 (0.359)   67 (0.049)

* Denotes a genotyped SNP, all others have been imputed.
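The FDR-adjusted p-values in Table 5.16 are numerically consistent with a Benjamini-Hochberg step-up adjustment across the 25 SNPs tested. This is an inference from the tabulated values, as the adjustment method is not named in this section, and the function name below is hypothetical. A minimal sketch of that adjustment, applied to the unadjusted p-values from Table 5.16 (in table order), is:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Visit p-values from largest to smallest, carrying the running minimum downwards.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position                       # rank in ascending order (1 = smallest)
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Unadjusted p-values from Table 5.16 (rounded to three decimals as tabulated).
table_5_16_p = [
    0.679, 0.339, 0.990, 0.222, 0.059, 0.066, 0.236, 0.297, 0.756, 0.715,
    0.196, 0.442, 0.204, 0.596, 0.369, 0.948, 0.243, 0.861, 0.229, 0.286,
    0.028, 0.696, 0.091, 0.415, 0.194,
]
for raw, adj in zip(table_5_16_p, benjamini_hochberg(table_5_16_p)):
    print(f"{raw:.3f} -> {adj:.3f}")
```

Because the inputs are the rounded tabulated p-values, the adjusted values may differ from the table in the third decimal place.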


Table 5.17: Association Analysis of Epistasis Candidate SNPs in non-GAIN SCZ. Case-Control analysis of the SNP candidates for epistasis, tested in the non-GAIN schizophrenia cohort.

SNP         Status   MAF    Genotype 11   Genotype 12   Genotype 22   p-value  FDR p-value  OR (95% CI)
rs6960867*  Case     0.390  430 (0.374)   542 (0.472)   177 (0.154)   0.565    0.810        1.034 (0.922, 1.159)
            Control  0.382  522 (0.388)   621 (0.461)   204 (0.151)
rs4894      Case     0.321  534 (0.465)   492 (0.428)   123 (0.107)   0.515    0.810        1.041 (0.923, 1.173)
            Control  0.313  632 (0.469)   588 (0.437)   127 (0.094)
rs283526    Case     0.479  299 (0.260)   600 (0.522)   250 (0.218)   0.884    0.884        1.008 (0.902, 1.127)
            Control  0.477  373 (0.277)   664 (0.493)   310 (0.230)
rs821616    Case     0.282  597 (0.520)   456 (0.397)   96 (0.084)    0.616    0.810        0.969 (0.857, 1.096)
            Control  0.288  691 (0.513)   535 (0.397)   121 (0.090)
rs3738401   Case     0.323  537 (0.467)   481 (0.419)   131 (0.114)   0.555    0.810        1.037 (0.920, 1.168)
            Control  0.316  629 (0.467)   586 (0.435)   132 (0.098)
rs6675281   Case     0.138  851 (0.741)   280 (0.244)   18 (0.016)    0.834    0.867        0.983 (0.837, 1.155)
            Control  0.140  995 (0.739)   328 (0.244)   24 (0.018)
rs1050301   Case     0.342  506 (0.440)   501 (0.436)   142 (0.124)   0.502    0.810        1.041 (0.926, 1.171)
            Control  0.333  603 (0.448)   592 (0.439)   152 (0.113)
rs2245715   Case     0.090  951 (0.828)   190 (0.165)   8 (0.007)     0.205    0.810        0.884 (0.731, 1.070)
            Control  0.100  1087 (0.807)  250 (0.186)   10 (0.007)
rs62020612  Case     0.190  757 (0.659)   348 (0.303)   44 (0.038)    0.743    0.840        0.977 (0.848, 1.125)
            Control  0.193  880 (0.653)   413 (0.307)   54 (0.040)
rs2839227*  Case     0.135  853 (0.742)   283 (0.246)   13 (0.011)    0.504    0.810        1.058 (0.897, 1.247)
            Control  0.128  1027 (0.762)  295 (0.219)   25 (0.019)
rs6518291*  Case     0.189  746 (0.649)   371 (0.323)   32 (0.028)    0.372    0.801        0.938 (0.815, 1.080)
            Control  0.199  869 (0.645)   419 (0.311)   59 (0.044)
rs35940413  Case     0.108  912 (0.794)   225 (0.196)   12 (0.010)    0.831    0.867        0.981 (0.821, 1.172)
            Control  0.110  1075 (0.798)  247 (0.183)   25 (0.019)
rs2839245   Case     0.064  1005 (0.875)  140 (0.122)   4 (0.003)     0.033    0.283        0.790 (0.636, 0.981)
            Control  0.080  1148 (0.852)  182 (0.135)   17 (0.013)
rs2070425   Case     0.344  505 (0.440)   498 (0.433)   146 (0.127)   0.586    0.810        0.968 (0.861, 1.088)
            Control  0.351  571 (0.424)   606 (0.450)   170 (0.126)
rs2073376   Case     0.366  465 (0.405)   528 (0.460)   156 (0.136)   0.445    0.810        0.956 (0.852, 1.073)
            Control  0.376  519 (0.385)   643 (0.477)   185 (0.137)
rs2073380   Case     0.214  709 (0.617)   388 (0.338)   52 (0.045)    0.698    0.840        0.974 (0.851, 1.115)
            Control  0.219  820 (0.609)   465 (0.345)   62 (0.046)
rs73931308  Case     0.118  890 (0.775)   246 (0.214)   13 (0.011)    0.124    0.801        0.876 (0.740, 1.037)
            Control  0.133  1000 (0.742)  336 (0.249)   11 (0.008)
rs814501    Case     0.500  275 (0.239)   599 (0.521)   275 (0.239)   0.024    0.283        0.880 (0.740, 0.984)
            Control  0.532  290 (0.215)   681 (0.506)   376 (0.279)
rs2252755   Case     0.324  527 (0.459)   500 (0.435)   122 (0.106)   0.721    0.840        0.979 (0.869, 1.102)
            Control  0.329  617 (0.458)   575 (0.427)   155 (0.115)
rs6911096*  Case     0.250  644 (0.560)   435 (0.379)   70 (0.061)    0.257    0.810        0.929 (0.818, 1.055)
            Control  0.264  722 (0.536)   538 (0.399)   87 (0.065)
rs9479297*  Case     0.278  619 (0.539)   422 (0.367)   108 (0.094)   0.522    0.810        0.960 (0.849, 1.087)
            Control  0.286  710 (0.527)   504 (0.374)   133 (0.099)
rs4645434*  Case     0.370  448 (0.390)   551 (0.480)   150 (0.131)   0.011    0.283        0.863 (0.770, 0.967)
            Control  0.405  473 (0.351)   656 (0.487)   218 (0.162)
rs214950    Case     0.221  696 (0.606)   399 (0.347)   54 (0.047)    0.498    0.810        0.955 (0.836, 1.091)
            Control  0.229  808 (0.600)   462 (0.343)   77 (0.057)
rs214976    Case     0.417  390 (0.339)   560 (0.487)   199 (0.173)   0.279    0.810        0.940 (0.840, 1.052)
            Control  0.432  446 (0.331)   638 (0.474)   263 (0.195)
rs12464423  Case     0.318  545 (0.474)   478 (0.416)   126 (0.110)   0.516    0.810        0.961 (0.853, 1.083)
            Control  0.326  606 (0.450)   603 (0.448)   138 (0.102)
rs1534443   Case     0.234  673 (0.586)   415 (0.361)   61 (0.053)    0.571    0.810        0.963 (0.845, 1.097)
            Control  0.241  764 (0.567)   518 (0.385)   65 (0.048)

* Denotes a genotyped SNP, all others have been imputed.


Table 5.18: Association Analysis of Epistasis Candidate SNPs in Combined SCZ. Case-Control analysis of the SNP candidates for epistasis, tested in the combined GAIN and non-GAIN schizophrenia cohort.

SNP         Status   MAF    Genotype 11   Genotype 12   Genotype 22   p-value  FDR p-value  OR (95% CI)
rs6960867*  Case     0.390  939 (0.376)   1174 (0.470)  387 (0.155)   0.485    0.948        1.029 (0.951, 1.113)
            Control  0.383  1047 (0.384)  1269 (0.466)  409 (0.150)
rs4894      Case     0.317  1168 (0.467)  1081 (0.432)  251 (0.100)   0.270    0.926        1.048 (0.964, 1.138)
            Control  0.307  1292 (0.474)  1195 (0.439)  238 (0.087)
rs821616    Case     0.284  1283 (0.513)  1014 (0.406)  203 (0.081)   0.737    0.948        0.986 (0.905, 1.073)
            Control  0.287  1388 (0.509)  1110 (0.407)  227 (0.083)
rs3738401   Case     0.320  1172 (0.469)  1056 (0.422)  272 (0.109)   0.211    0.843        1.054 (0.971, 1.145)
            Control  0.309  1297 (0.476)  1174 (0.431)  254 (0.093)
rs6675281   Case     0.137  1863 (0.745)  589 (0.236)   48 (0.019)    0.138    0.826        0.92 (0.824, 1.027)
            Control  0.147  1985 (0.728)  678 (0.249)   62 (0.023)
rs1050301   Case     0.349  1081 (0.432)  1092 (0.437)  327 (0.131)   0.068    0.674        1.078 (0.994, 1.169)
            Control  0.332  1207 (0.443)  1225 (0.450)  293 (0.108)
rs2245715   Case     0.091  2071 (0.828)  405 (0.162)   24 (0.010)    0.084    0.674        0.891 (0.782, 1.016)
            Control  0.101  2205 (0.809)  492 (0.181)   28 (0.010)
rs62020612  Case     0.195  1623 (0.649)  780 (0.312)   97 (0.039)    0.590    0.948        1.027 (0.932, 1.132)
            Control  0.191  1782 (0.654)  847 (0.311)   96 (0.035)
rs2839227*  Case     0.133  1873 (0.749)  589 (0.236)   38 (0.015)    0.807    0.948        1.014 (0.906, 1.136)
            Control  0.131  2061 (0.756)  612 (0.225)   52 (0.019)
rs6518291*  Case     0.189  746 (0.649)   371 (0.323)   32 (0.028)    0.926    0.926        0.938 (0.815, 1.080)
            Control  0.199  869 (0.645)   419 (0.311)   59 (0.044)
rs35940413  Case     0.113  1964 (0.786)  509 (0.204)   27 (0.011)    0.425    0.926        1.051 (0.930, 1.188)
            Control  0.108  2178 (0.799)  507 (0.186)   40 (0.015)
rs2839245   Case     0.073  2147 (0.859)  340 (0.136)   13 (0.005)    0.394    0.926        0.939 (0.812, 1.086)
            Control  0.078  2326 (0.854)  375 (0.138)   24 (0.009)
rs2070425   Case     0.332  1120 (0.448)  1099 (0.440)  281 (0.112)   0.175    0.840        0.945 (0.872, 1.025)
            Control  0.345  1174 (0.431)  1223 (0.449)  328 (0.120)
rs2073376   Case     0.369  991 (0.396)   1173 (0.469)  336 (0.134)   0.877    0.948        0.994 (0.918, 1.076)
            Control  0.371  1082 (0.397)  1267 (0.465)  376 (0.138)
rs2073380   Case     0.218  1523 (0.609)  864 (0.346)   113 (0.045)   0.380    0.926        0.960 (0.875, 1.052)
            Control  0.225  1626 (0.597)  971 (0.356)   128 (0.047)
rs814501    Case     0.494  603 (0.241)   1324 (0.530)  573 (0.229)   0.014    0.325        0.908 (0.841, 0.980)
            Control  0.518  625 (0.229)   1375 (0.505)  724 (0.266)
rs2252755   Case     0.327  1113 (0.445)  1137 (0.455)  250 (0.100)   0.725    0.948        0.985 (0.908, 1.069)
            Control  0.331  1240 (0.455)  1168 (0.429)  317 (0.116)
rs6911096*  Case     0.256  1387 (0.555)  948 (0.379)   165 (0.066)   0.948    0.948        1.003 (0.918, 1.095)
            Control  0.255  1501 (0.551)  1058 (0.388)  166 (0.061)
rs9479297*  Case     0.290  1289 (0.516)  970 (0.388)   241 (0.096)   0.699    0.948        1.017 (0.934, 1.107)
            Control  0.287  1415 (0.519)  1056 (0.388)  254 (0.093)
rs4645434*  Case     0.386  946 (0.378)   1176 (0.470)  378 (0.151)   0.861    0.948        0.993 (0.918, 1.074)
            Control  0.388  1028 (0.377)  1279 (0.469)  418 (0.153)
rs214950    Case     0.223  1508 (0.603)  869 (0.348)   123 (0.049)   0.846    0.948        0.991 (0.904, 1.086)
            Control  0.225  1644 (0.603)  938 (0.344)   143 (0.052)
rs214976    Case     0.425  825 (0.330)   1227 (0.491)  448 (0.179)   0.648    0.948        1.018 (0.942, 1.101)
            Control  0.420  924 (0.339)   1312 (0.481)  489 (0.179)
rs12464423  Case     0.318  1170 (0.468)  1068 (0.427)  262 (0.105)   0.915    0.948        1.004 (0.925, 1.091)
            Control  0.317  1251 (0.459)  1218 (0.447)  256 (0.094)
rs1534443   Case     0.239  1441 (0.576)  924 (0.370)   135 (0.054)   0.590    0.948        1.025 (0.937, 1.122)
            Control  0.234  1580 (0.580)  1013 (0.372)  132 (0.048)

* Denotes a genotyped SNP, all others have been imputed.


Table 5.19: Association Analysis of Epistasis Candidate SNPs in BPD. Case-Control analysis of the SNP candidates for epistasis, tested in the GAIN bipolar disorder cohort.

| SNP | Status | MAF | Genotype 11 | Genotype 12 | Genotype 22 | p-value | FDR p-value | OR (95% CI) |
|---|---|---|---|---|---|---|---|---|
| rs6960867* | Case | 0.369 | 262 (0.404) | 294 (0.454) | 92 (0.142) | 0.397 | 0.793 | 0.940 (0.814, 1.085) |
| | Control | 0.383 | 390 (0.380) | 484 (0.472) | 151 (0.147) | | | |
| rs4894 | Case | 0.315 | 304 (0.469) | 280 (0.432) | 64 (0.099) | 0.487 | 0.796 | 1.055 (0.908, 1.226) |
| | Control | 0.303 | 488 (0.476) | 452 (0.441) | 85 (0.083) | | | |
| rs283526 | Case | 0.475 | 167 (0.258) | 347 (0.535) | 134 (0.207) | 0.656 | 0.902 | 0.969 (0.843, 1.114) |
| | Control | 0.482 | 258 (0.252) | 545 (0.532) | 222 (0.217) | | | |
| rs821616 | Case | 0.273 | 345 (0.532) | 252 (0.389) | 51 (0.079) | 0.408 | 0.793 | 0.937 (0.802, 1.094) |
| | Control | 0.286 | 512 (0.500) | 439 (0.428) | 74 (0.072) | | | |
| rs3738401 | Case | 0.317 | 307 (0.474) | 271 (0.418) | 70 (0.108) | 0.324 | 0.793 | 1.079 (0.928, 1.254) |
| | Control | 0.301 | 500 (0.488) | 433 (0.422) | 92 (0.090) | | | |
| rs6675281 | Case | 0.140 | 476 (0.735) | 163 (0.252) | 9 (0.014) | 0.490 | 0.796 | 0.9323 (0.764, 1.137) |
| | Control | 0.148 | 743 (0.725) | 260 (0.254) | 22 (0.021) | | | |
| rs1050301 | Case | 0.342 | 275 (0.424) | 303 (0.468) | 70 (0.108) | 0.822 | 0.977 | 1.017 (0.878, 1.178) |
| | Control | 0.338 | 438 (0.427) | 481 (0.469) | 106 (0.103) | | | |
| rs2245715 | Case | 0.103 | 521 (0.804) | 121 (0.187) | 6 (0.009) | 0.806 | 0.977 | 1.029 (0.818, 1.296) |
| | Control | 0.100 | 834 (0.814) | 177 (0.173) | 14 (0.014) | | | |
| rs62020612 | Case | 0.205 | 406 (0.627) | 218 (0.336) | 24 (0.037) | 0.136 | 0.793 | 1.142 (0.959, 1.361) |
| | Control | 0.184 | 679 (0.662) | 314 (0.306) | 32 (0.031) | | | |
| rs2839227* | Case | 0.140 | 472 (0.728) | 171 (0.264) | 5 (0.008) | 0.566 | 0.865 | 1.061 (0.867, 1.299) |
| | Control | 0.133 | 771 (0.752) | 236 (0.230) | 18 (0.018) | | | |
| rs6518291* | Case | 0.206 | 398 (0.614) | 233 (0.360) | 17 (0.026) | 0.383 | 0.793 | 1.08 (0.908, 1.285) |
| | Control | 0.194 | 671 (0.655) | 311 (0.303) | 43 (0.042) | | | |
| rs35940413 | Case | 0.118 | 503 (0.776) | 137 (0.211) | 8 (0.012) | 0.293 | 0.793 | 1.125 (0.903, 1.401) |
| | Control | 0.106 | 820 (0.800) | 192 (0.187) | 13 (0.013) | | | |
| rs2839245 | Case | 0.081 | 548 (0.846) | 95 (0.147) | 5 (0.008) | 0.375 | 0.793 | 1.125 (0.867, 1.459) |
| | Control | 0.073 | 883 (0.861) | 135 (0.132) | 7 (0.007) | | | |
| rs2070425 | Case | 0.316 | 298 (0.460) | 291 (0.449) | 59 (0.091) | 0.427 | 0.816 | 0.941 (0.811, 1.093) |
| | Control | 0.329 | 464 (0.453) | 448 (0.437) | 113 (0.110) | | | |
| rs2073376 | Case | 0.353 | 266 (0.410) | 307 (0.474) | 75 (0.116) | 0.218 | 0.793 | 0.913 (0.790, 1.055) |
| | Control | 0.374 | 408 (0.398) | 468 (0.457) | 149 (0.145) | | | |
| rs2073380 | Case | 0.218 | 392 (0.605) | 229 (0.353) | 27 (0.042) | 0.659 | 0.902 | 0.963 (0.814, 1.139) |
| | Control | 0.225 | 612 (0.597) | 365 (0.356) | 48 (0.047) | | | |
| rs73931308 | Case | 0.108 | 515 (0.795) | 126 (0.194) | 7 (0.011) | 0.111 | 0.793 | 0.8375 (0.673, 1.042) |
| | Control | 0.126 | 775 (0.756) | 241 (0.235) | 9 (0.009) | | | |
| rs814501 | Case | 0.499 | 153 (0.236) | 343 (0.529) | 152 (0.235) | 0.922 | 0.989 | 0.993 (0.864, 1.141) |
| | Control | 0.501 | 249 (0.243) | 525 (0.512) | 251 (0.245) | | | |
| rs2252755 | Case | 0.330 | 303 (0.468) | 263 (0.406) | 82 (0.127) | 0.848 | 0.977 | 0.986 (0.850, 1.143) |
| | Control | 0.333 | 456 (0.445) | 456 (0.445) | 113 (0.110) | | | |
| rs6911096* | Case | 0.249 | 356 (0.549) | 261 (0.403) | 31 (0.048) | 0.951 | 0.989 | 1.005 (0.856, 1.180) |
| | Control | 0.248 | 581 (0.567) | 379 (0.370) | 65 (0.063) | | | |
| rs9479297* | Case | 0.294 | 314 (0.485) | 287 (0.443) | 47 (0.073) | 0.864 | 0.977 | 1.013 (0.870, 1.181) |
| | Control | 0.291 | 529 (0.516) | 395 (0.385) | 101 (0.099) | | | |
| rs4645434* | Case | 0.387 | 238 (0.367) | 318 (0.491) | 92 (0.142) | 0.268 | 0.793 | 1.084 (0.940, 1.252) |
| | Control | 0.368 | 417 (0.407) | 461 (0.450) | 147 (0.143) | | | |
| rs214950 | Case | 0.243 | 367 (0.566) | 247 (0.381) | 34 (0.052) | 0.087 | 0.793 | 1.155 (0.980, 1.362) |
| | Control | 0.218 | 623 (0.608) | 358 (0.349) | 44 (0.043) | | | |
| rs214976 | Case | 0.450 | 191 (0.295) | 331 (0.511) | 126 (0.194) | 0.012 | 0.315 | 1.197 (1.040, 1.378) |
| | Control | 0.406 | 360 (0.351) | 498 (0.486) | 167 (0.163) | | | |
| rs12464423 | Case | 0.326 | 283 (0.437) | 307 (0.474) | 58 (0.090) | 0.424 | 0.793 | 1.063 (0.916, 1.233) |
| | Control | 0.313 | 473 (0.461) | 462 (0.451) | 90 (0.088) | | | |
| rs1534443 | Case | 0.234 | 376 (0.580) | 241 (0.372) | 31 (0.048) | 0.993 | 0.993 | 1.001 (0.849, 1.180) |
| | Control | 0.234 | 592 (0.578) | 387 (0.378) | 46 (0.045) | | | |

* Denotes a genotyped SNP, all others have been imputed.
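
The MAF and OR columns of Table 5.19 can be checked directly from the genotype counts. The sketch below is illustrative only: it assumes an allelic (allele-count) odds ratio of cases relative to controls with a Woolf-type 95% confidence interval, which reproduces the tabulated values for rs6960867 but is not necessarily the exact procedure or software used for the analysis. The function names are hypothetical.

```python
import math

def allele_counts(n11, n12, n22):
    """Return (major, minor) allele counts from genotype counts 11, 12 and 22."""
    return 2 * n11 + n12, 2 * n22 + n12

def maf(n11, n12, n22):
    """Minor allele frequency from genotype counts."""
    major, minor = allele_counts(n11, n12, n22)
    return minor / (major + minor)

def allelic_or(case_gt, control_gt, z=1.959964):
    """Allelic odds ratio (cases vs controls) with an assumed Woolf 95% CI."""
    case_major, case_minor = allele_counts(*case_gt)
    ctrl_major, ctrl_minor = allele_counts(*control_gt)
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    se = math.sqrt(1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Genotype counts (11, 12, 22) for rs6960867 in Table 5.19
case_counts = (262, 294, 92)
control_counts = (390, 484, 151)

case_maf = maf(*case_counts)
control_maf = maf(*control_counts)
odds_ratio, ci_lower, ci_upper = allelic_or(case_counts, control_counts)
```

For rs6960867 this gives minor allele frequencies of about 0.369 (cases) and 0.383 (controls) and an odds ratio of about 0.940 (0.814, 1.085), in agreement with the table.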


Table 5.20: Association Analysis of Epistasis Candidate SNPs in Combined BPD and SCZ. Case-Control analysis of the SNP candidates for epistasis, tested in the combined GAIN bipolar disorder, GAIN schizophrenia and non-GAIN schizophrenia cohort.

| SNP | Status | MAF | Genotype 11 | Genotype 12 | Genotype 22 | p-value | FDR p-value | OR (95% CI) |
|---|---|---|---|---|---|---|---|---|
| rs6960867* | Case | 0.385 | 1201 (0.382) | 1468 (0.466) | 479 (0.152) | 0.791 | 0.904 | 1.010 (0.938, 1.088) |
| | Control | 0.383 | 1047 (0.384) | 1269 (0.466) | 409 (0.150) | | | |
| rs4894 | Case | 0.316 | 1472 (0.468) | 1361 (0.432) | 315 (0.100) | 0.261 | 0.801 | 1.046 (0.967, 1.131) |
| | Control | 0.307 | 1292 (0.474) | 1195 (0.439) | 238 (0.087) | | | |
| rs821616 | Case | 0.282 | 1628 (0.517) | 1266 (0.402) | 254 (0.081) | 0.533 | 0.904 | 0.975 (0.900, 1.056) |
| | Control | 0.287 | 1388 (0.509) | 1110 (0.407) | 227 (0.083) | | | |
| rs3738401 | Case | 0.319 | 1479 (0.470) | 1327 (0.422) | 342 (0.109) | 0.209 | 0.801 | 1.051 (0.972, 1.137) |
| | Control | 0.309 | 1297 (0.476) | 1174 (0.431) | 254 (0.093) | | | |
| rs6675281 | Case | 0.138 | 2339 (0.743) | 752 (0.239) | 57 (0.018) | 0.137 | 0.801 | 0.924 (0.833, 1.025) |
| | Control | 0.147 | 1985 (0.728) | 678 (0.249) | 62 (0.023) | | | |
| rs1050301 | Case | 0.348 | 1356 (0.431) | 1395 (0.443) | 397 (0.126) | 0.079 | 0.634 | 1.071 (0.992, 1.156) |
| | Control | 0.332 | 1207 (0.443) | 1225 (0.450) | 293 (0.108) | | | |
| rs2245715 | Case | 0.093 | 2592 (0.823) | 526 (0.167) | 30 (0.010) | 0.171 | 0.801 | 0.918 (0.812, 1.038) |
| | Control | 0.101 | 2205 (0.809) | 492 (0.181) | 28 (0.010) | | | |
| rs62020612 | Case | 0.197 | 2029 (0.645) | 998 (0.317) | 121 (0.038) | 0.389 | 0.848 | 1.041 (0.950, 1.141) |
| | Control | 0.191 | 1782 (0.654) | 847 (0.311) | 96 (0.035) | | | |
| rs2839227* | Case | 0.134 | 2345 (0.745) | 760 (0.241) | 43 (0.014) | 0.634 | 0.904 | 1.026 (0.922, 1.142) |
| | Control | 0.131 | 2061 (0.756) | 612 (0.225) | 52 (0.019) | | | |
| rs6518291* | Case | 0.198 | 2003 (0.636) | 1043 (0.331) | 102 (0.032) | 0.949 | 0.949 | 0.997 (0.910, 1.092) |
| | Control | 0.199 | 1765 (0.648) | 838 (0.308) | 122 (0.045) | | | |
| rs35940413 | Case | 0.114 | 2467 (0.784) | 646 (0.205) | 35 (0.011) | 0.300 | 0.801 | 1.063 (0.947, 1.193) |
| | Control | 0.108 | 2178 (0.799) | 507 (0.186) | 40 (0.015) | | | |
| rs2839245 | Case | 0.075 | 2695 (0.856) | 435 (0.138) | 18 (0.006) | 0.568 | 0.904 | 0.961 (0.838, 1.102) |
| | Control | 0.078 | 2326 (0.854) | 375 (0.138) | 24 (0.009) | | | |
| rs2070425 | Case | 0.329 | 1418 (0.450) | 1390 (0.442) | 340 (0.108) | 0.067 | 0.634 | 0.931 (0.862, 1.005) |
| | Control | 0.345 | 1174 (0.431) | 1223 (0.449) | 328 (0.120) | | | |
| rs2073376 | Case | 0.366 | 1257 (0.399) | 1480 (0.470) | 411 (0.131) | 0.588 | 0.904 | 0.979 (0.909, 1.056) |
| | Control | 0.371 | 1082 (0.397) | 1267 (0.465) | 376 (0.138) | | | |
| rs2073380 | Case | 0.218 | 1915 (0.608) | 1093 (0.347) | 140 (0.044) | 0.358 | 0.848 | 0.960 (0.880, 1.047) |
| | Control | 0.225 | 1626 (0.597) | 971 (0.356) | 128 (0.047) | | | |
| rs814501 | Case | 0.495 | 756 (0.240) | 1667 (0.530) | 725 (0.230) | 0.013 | 0.301 | 0.912 (0.848, 0.980) |
| | Control | 0.517 | 625 (0.229) | 1375 (0.505) | 724 (0.266) | | | |
| rs2252755 | Case | 0.328 | 1416 (0.450) | 1400 (0.445) | 332 (0.105) | 0.746 | 0.904 | 0.987 (0.914, 1.067) |
| | Control | 0.331 | 1240 (0.455) | 1168 (0.429) | 317 (0.116) | | | |
| rs6911096* | Case | 0.254 | 1743 (0.554) | 1209 (0.384) | 196 (0.062) | 0.925 | 0.949 | 0.996 (0.917, 1.082) |
| | Control | 0.255 | 1501 (0.551) | 1058 (0.388) | 166 (0.061) | | | |
| rs9479297* | Case | 0.291 | 1603 (0.509) | 1257 (0.399) | 288 (0.091) | 0.620 | 0.904 | 1.020 (0.942, 1.105) |
| | Control | 0.287 | 1415 (0.519) | 1056 (0.388) | 254 (0.093) | | | |
| rs4645434* | Case | 0.387 | 1184 (0.376) | 1494 (0.475) | 470 (0.149) | 0.870 | 0.949 | 0.994 (0.923, 1.071) |
| | Control | 0.388 | 1028 (0.377) | 1279 (0.469) | 418 (0.153) | | | |
| rs214950 | Case | 0.227 | 1875 (0.596) | 1116 (0.355) | 157 (0.050) | 0.743 | 0.904 | 1.015 (0.930, 1.107) |
| | Control | 0.225 | 1644 (0.603) | 938 (0.344) | 143 (0.052) | | | |
| rs214976 | Case | 0.430 | 1016 (0.323) | 1558 (0.495) | 574 (0.182) | 0.293 | 0.801 | 1.04 (0.967, 1.119) |
| | Control | 0.420 | 924 (0.339) | 1312 (0.481) | 489 (0.179) | | | |
| rs12464423 | Case | 0.320 | 1453 (0.462) | 1375 (0.437) | 320 (0.102) | 0.762 | 0.904 | 1.012 (0.936, 1.094) |
| | Control | 0.317 | 1251 (0.459) | 1218 (0.447) | 256 (0.094) | | | |
| rs1534443 | Case | 0.238 | 1817 (0.577) | 1165 (0.370) | 166 (0.053) | 0.660 | 0.904 | 1.019 (0.936, 1.110) |
| | Control | 0.234 | 1580 (0.580) | 1013 (0.372) | 132 (0.048) | | | |

* Denotes a genotyped SNP, all others have been imputed.
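
The FDR-adjusted p-values in these association tables look consistent with a Benjamini-Hochberg step-up adjustment; this is an inference from the tabulated numbers rather than something stated in this section. In Table 5.20, for example, the third-smallest of the 24 p-values is 0.079, and 0.079 × 24 / 3 ≈ 0.632, close to the tabulated 0.634 (the small difference is expected from rounding of the displayed p-values). A minimal sketch of that adjustment, with a hypothetical function name and toy p-values, is:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values (hypothetical, not taken from the thesis)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Note that the largest p-value is always left unchanged by this procedure, which matches the tables (for example 0.949 in Table 5.20).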


Of note is rs814501 (in SPTBN4), which reaches uncorrected significance in the non-GAIN (p-value = 0.024) and combined schizophrenia cohorts (p-value = 0.014) as well as in the combined psychiatric illness cohort (p-value = 0.013). The minor allele frequency of this SNP is high (close to 0.5), so the allele with the lower frequency is not the same in all cohorts. For the purpose of this study, the minor allele at this SNP was set to be the A allele, which is the minor allele in the 1000 Genomes dataset.

A second SNP, rs4645434 (in SYNE1), reaches uncorrected significance in GAIN schizophrenia (p-value = 0.028) and in non-GAIN schizophrenia (p-value = 0.011), but in opposite directions. A single SNP reaches uncorrected significance in the GAIN bipolar disorder cohort (rs214976 in SYNE1, p-value = 0.012). Finally, one SNP in SPTBN4 (rs73931308) failed Hardy-Weinberg equilibrium tests in the combined schizophrenia cohort and was therefore excluded from subsequent analyses that included the combined schizophrenia cohort.
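
For readers unfamiliar with the Hardy-Weinberg check mentioned above, the following is a minimal sketch of a one-degree-of-freedom χ² goodness-of-fit test against Hardy-Weinberg proportions. It is an assumption that a test of this general form was used; the exact test, sample subset and exclusion threshold are not given in this section, and the genotype counts below are hypothetical.

```python
import math

def hwe_chi2(n11, n12, n22):
    """Chi-square test of Hardy-Weinberg proportions (1 df) from genotype counts.

    Assumes both alleles are present, so all expected counts are non-zero.
    """
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)  # frequency of allele 1
    q = 1.0 - p                    # frequency of allele 2
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n11, n12, n22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: the first set is exactly in equilibrium,
# the second has a marked deficit of heterozygotes.
chi2_ok, p_ok = hwe_chi2(25, 50, 25)
chi2_bad, p_bad = hwe_chi2(40, 20, 40)
```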

5.2.4 Assessment of Epistasis - Confirmed Gene Set

Each of the chosen SNPs was analysed for epistatic interaction with each of the three DISC1 SNPs (rs821616, rs3738401 and rs6675281) in the individual and combined GAIN and non-GAIN schizophrenia and bipolar disorder datasets. All data not shown here can be found in Appendix E.

Interaction was first assessed by χ² analysis for all genotype combinations of the two SNPs, normalised to the major allele homozygotes. A likelihood ratio test of a logistic regression comparing the interaction model to the null was then conducted for all SNP pairs. Where a particular SNP-SNP combination had few or no individuals for one of the genotypes (commonly the double minor allele homozygotes, 2,2/2,2), the minor allele homozygotes (2,2) and the heterozygotes (1,2) were grouped (as was carried out in Section 5.2.1) and the same analysis was undertaken.
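
The per-combination comparison against the double major allele homozygotes can be sketched as a 2 x 2 test. The example below assumes an uncorrected Pearson χ² (1 df) and a Woolf-type 95% confidence interval; these assumptions reproduce the A,A/G,G row of Table 5.22 (controls 10, cases 34, against reference counts of 102 and 110), but the function name is hypothetical and the software actually used is not stated here. The tabulated odds ratios appear to be expressed as the odds of being a control rather than a case, so values below 1 indicate a combination that is more frequent in cases.

```python
import math

def compare_to_reference(ref_controls, ref_cases, controls, cases, z=1.959964):
    """Compare one genotype combination with the reference combination.

    Returns (chi2, p_value, odds_ratio, ci_lower, ci_upper) from a 2 x 2 table:
                    controls   cases
      combination      a         b
      reference        c         d
    """
    a, b, c, d = controls, cases, ref_controls, ref_cases
    n = a + b + c + d
    # Uncorrected Pearson chi-square for a 2 x 2 table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    # Odds of control status relative to the reference combination
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_lower = math.exp(math.log(odds_ratio) - z * se)
    ci_upper = math.exp(math.log(odds_ratio) + z * se)
    return chi2, p_value, odds_ratio, ci_lower, ci_upper

# DISC1 rs3738401 A,A / SYNE1 rs214976 G,G versus G,G/G,G (counts from Table 5.22)
chi2, p_value, odds_ratio, ci_lower, ci_upper = compare_to_reference(102, 110, 10, 34)
```

With these counts the sketch gives χ² ≈ 9.54, p ≈ 0.002 and OR ≈ 0.32 [0.15, 0.67], matching the tabulated row.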

5.2.4.1 GAIN Schizophrenia

The analysis of interaction in the GAIN schizophrenia dataset revealed six interactions that showed a significant overall interaction effect (as detailed below). The remainder of the SNPs analysed in this dataset showed no evidence of an epistatic relationship with the DISC1 SNPs tested. Results for all logistic regression analyses in the GAIN schizophrenia dataset are shown in Table 5.21.


Table 5.21: Summary Of Regression Analyses in GAIN SCZ. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs in the GAIN schizophrenia dataset. The interaction model was tested against the null model.

| Gene | SNP | χ² (rs821616) | df | p-value | χ² (rs3738401) | df | p-value | χ² (rs6675281) | df | p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| AKAP9 | rs6960867* | 1.52 | 4 | 0.824 | 5.18 | 4 | 0.269 | 0.92 | 4 | 0.922 |
| ATF4 | rs4894 | 5.49 | 4 | 0.241 | 3.61 | 4 | 0.462 | 3.59 | 4 | 0.464 |
| IMMT | rs1050301 | 1.65 | 4 | 0.800 | 7.59 | 4 | 0.107 | 3.98 | 4 | 0.408 |
| MAP1A | rs62020612 | 9.30 | 4 | 0.054 | 2.81 | 4 | 0.589 | 0.18 | 1 | 0.670 |
| | rs2245715 | 11.93 | 4 | 0.018 | 3.31 | 4 | 0.507 | 0.67 | 1 | 0.414 |
| PCNT | rs2839227* | 1.86 | 4 | 0.761 | 2.82 | 4 | 0.588 | 0.29 | 1 | 0.589 |
| | rs6518291* | 1.52 | 4 | 0.824 | 4.59 | 4 | 0.332 | 4.28 | 4 | 0.370 |
| | rs35940413 | 0.51 | 1 | 0.473 | 3.30 | 4 | 0.509 | 1.09 | 1 | 0.296 |
| | rs2070425 | 7.66 | 4 | 0.105 | 2.67 | 4 | 0.615 | 4.23 | 4 | 0.376 |
| | rs2073380 | 7.30 | 4 | 0.121 | 3.10 | 4 | 0.512 | 3.26 | 4 | 0.515 |
| | rs2839245 | 0.40 | 1 | 0.527 | 0.97 | 1 | 0.324 | 0.05 | 1 | 0.821 |
| | rs2073376 | 3.90 | 4 | 0.420 | 10.12 | 4 | 0.038 | 5.66 | 4 | 0.224 |
| SPTBN4 | rs814501 | 1.77 | 4 | 0.777 | 6.64 | 4 | 0.156 | 0.42 | 4 | 0.981 |
| SYNE1 | rs2252755 | 13.05 | 4 | 0.011 | 5.04 | 4 | 0.283 | 3.06 | 4 | 0.548 |
| | rs6911096* | 1.70 | 4 | 0.791 | 5.30 | 4 | 0.258 | 4.02 | 4 | 0.402 |
| | rs9479297* | 4.53 | 4 | 0.339 | 5.09 | 4 | 0.278 | 1.61 | 4 | 0.807 |
| | rs4645434* | 5.75 | 4 | 0.218 | 7.19 | 4 | 0.126 | 1.83 | 4 | 0.767 |
| | rs214950 | 2.64 | 4 | 0.620 | 5.91 | 4 | 0.206 | 3.56 | 4 | 0.469 |
| | rs214976 | 9.48 | 4 | 0.050 | 16.29 | 4 | 0.003 | 6.56 | 4 | 0.161 |
| TRAF3IP1 | rs12464423 | 11.66 | 4 | 0.020 | 3.52 | 4 | 0.474 | 3.52 | 4 | 0.475 |
| UTRN | rs1534443 | 3.25 | 4 | 0.517 | 5.87 | 4 | 0.209 | 1.59 | 4 | 0.811 |

* Denotes a genotyped SNP, all others have been imputed.
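
The degrees of freedom in Table 5.21 are what would be expected if the null model contains the two SNP main effects only: a 3 x 3 genotype table then leaves (3 - 1)(3 - 1) = 4 interaction terms, and a grouped 2 x 2 table leaves 1. The sketch below illustrates such a likelihood ratio test from the nine cell counts alone. It is a simplified reconstruction under that assumption (genotypes coded as categories, no covariates, hypothetical function names) and is not claimed to reproduce the reported statistics exactly.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def main_effects_loglik(cases, controls, n_iter=50):
    """Maximised log-likelihood of the main-effects-only logistic model."""
    # Cells are ordered with the first SNP's genotype (0, 1, 2) as the outer loop
    X = [[1.0, float(g1 == 1), float(g1 == 2), float(g2 == 1), float(g2 == 2)]
         for g1 in range(3) for g2 in range(3)]
    k = 5
    beta = [0.0] * k
    for _ in range(n_iter):  # Newton-Raphson iterations
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, n_case, n_ctrl in zip(X, cases, controls):
            n = n_case + n_ctrl
            eta = sum(bj * xj for bj, xj in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                grad[i] += x[i] * (n_case - n * p)
                for j in range(k):
                    hess[i][j] += n * p * (1.0 - p) * x[i] * x[j]
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    loglik = 0.0
    for x, n_case, n_ctrl in zip(X, cases, controls):
        eta = sum(bj * xj for bj, xj in zip(beta, x))
        loglik += n_case * eta - (n_case + n_ctrl) * math.log1p(math.exp(eta))
    return loglik

def saturated_loglik(cases, controls):
    """Log-likelihood of the full interaction model (one parameter per cell)."""
    loglik = 0.0
    for n_case, n_ctrl in zip(cases, controls):
        n = n_case + n_ctrl
        if n_case:
            loglik += n_case * math.log(n_case / n)
        if n_ctrl:
            loglik += n_ctrl * math.log(n_ctrl / n)
    return loglik

def interaction_lrt(cases, controls):
    """Likelihood ratio statistic and p-value (4 df) for a 3 x 3 genotype table."""
    stat = 2.0 * (saturated_loglik(cases, controls) - main_effects_loglik(cases, controls))
    stat = max(stat, 0.0)
    # Closed-form chi-square survival function for 4 df
    p_value = math.exp(-stat / 2.0) * (1.0 + stat / 2.0)
    return stat, p_value

# Cell counts from Table 5.22 (DISC1 rs3738401 x SYNE1 rs214976, GAIN SCZ)
cases_522 = [110, 320, 205, 105, 284, 186, 34, 63, 44]
controls_522 = [102, 322, 244, 114, 274, 200, 10, 78, 34]
stat_522, p_522 = interaction_lrt(cases_522, controls_522)

# Hypothetical counts with purely multiplicative odds (no interaction)
stat_null, p_null = interaction_lrt([100, 200, 50, 200, 400, 100, 50, 100, 25], [100] * 9)
```

The closed-form p-value used here agrees with the table: a χ² of 16.29 on 4 df gives exp(-8.145) × 9.145 ≈ 0.003.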

Two of the six significant interactions were with DISC1 rs3738401, which showed a significant interaction with SYNE1 rs214976 (χ² = 16.29, p-value = 0.003) and PCNT rs2073376 (χ² = 10.12, p-value = 0.038).

The remaining four significant interactions were between DISC1 rs821616 and TRAF3IP1 rs12464423 (χ² = 11.66, p-value = 0.020), MAP1A rs2245715 (χ² = 11.93, p-value = 0.018), SYNE1 rs214976 (χ² = 9.48, p-value = 0.050) and SYNE1 rs2252755 (χ² = 13.05, p-value = 0.011). Full results of the χ² analysis for each of the significant interactions are shown in Tables 5.22 - 5.27.


Table 5.22: χ² Analysis for DISC1 rs3738401 x SYNE1 rs214976 Interaction in GAIN SCZ. Summary of the χ² analysis of individuals from the GAIN schizophrenia dataset for DISC1 rs3738401 x SYNE1 rs214976, normalised to the major allele homozygotes.

| Genotype combination (DISC1/SYNE1) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| G,G/G,G | 102 (0.074) | 110 (0.081) | | | 1.00 |
| G,G/G,A | 322 (0.234) | 320 (0.237) | 0.27 | 0.606 | 1.09 [0.80, 1.48] |
| G,G/A,A | 244 (0.177) | 205 (0.152) | 2.24 | 0.134 | 1.28 [0.93, 1.78] |
| G,A/G,G | 114 (0.083) | 105 (0.078) | 0.67 | 0.413 | 1.17 [0.80, 1.71] |
| G,A/G,A | 274 (0.199) | 284 (0.21) | 0.06 | 0.806 | 1.04 [0.76, 1.43] |
| G,A/A,A | 200 (0.145) | 186 (0.138) | 0.75 | 0.387 | 1.16 [0.83, 1.62] |
| A,A/G,G | 10 (0.007) | 34 (0.025) | 9.54 | 0.002 | 0.32 [0.15, 0.67] |
| A,A/G,A | 78 (0.057) | 63 (0.047) | 1.76 | 0.185 | 1.34 [0.87, 2.05] |
| A,A/A,A | 34 (0.025) | 44 (0.033) | 0.47 | 0.494 | 0.83 [0.49, 1.41] |

Table 5.23: χ² Analysis for DISC1 rs3738401 x PCNT rs2073376 Interaction in GAIN SCZ. Summary of the χ² analysis of individuals from the GAIN schizophrenia dataset for DISC1 rs3738401 x PCNT rs2073376, normalised to the major allele homozygotes.

| Genotype combination (DISC1/PCNT) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| G,G/G,G | 276 (0.200) | 245 (0.181) | | | 1.00 |
| G,G/G,A | 299 (0.217) | 313 (0.232) | 1.91 | 0.167 | 0.85 [0.67, 1.07] |
| G,G/A,A | 93 (0.067) | 77 (0.057) | 0.15 | 0.694 | 1.07 [0.76, 1.52] |
| G,A/G,G | 243 (0.176) | 223 (0.165) | 0.07 | 0.795 | 0.97 [0.75, 1.24] |
| G,A/G,A | 268 (0.194) | 259 (0.192) | 0.47 | 0.492 | 0.92 [0.72, 1.17] |
| G,A/A,A | 77 (0.056) | 93 (0.069) | 3.03 | 0.082 | 0.73 [0.52, 1.04] |
| A,A/G,G | 44 (0.032) | 58 (0.043) | 3.30 | 0.069 | 0.67 [0.44, 1.03] |
| A,A/G,A | 57 (0.041) | 73 (0.054) | 3.47 | 0.062 | 0.69 [0.47, 1.02] |
| A,A/A,A | 21 (0.015) | 10 (0.007) | 2.57 | 0.109 | 1.86 [0.86, 4.04] |


Table 5.24: χ² Analysis for DISC1 rs821616 x TRAF3IP1 rs12464423 Interaction in GAIN SCZ. Summary of the χ² analysis of individuals from the GAIN schizophrenia dataset for DISC1 rs821616 x TRAF3IP1 rs12464423, normalised to the major allele homozygotes.

| Genotype combination (DISC1/TRAF3IP1) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| A,A/G,G | 298 (0.216) | 301 (0.223) | | | 1.00 |
| A,A/G,T | 330 (0.239) | 321 (0.238) | 0.11 | 0.739 | 1.04 [0.83, 1.30] |
| A,A/T,T | 69 (0.050) | 64 (0.047) | 0.20 | 0.657 | 1.09 [0.75, 1.59] |
| A,T/G,G | 284 (0.206) | 278 (0.206) | 0.07 | 0.789 | 1.03 [0.82, 1.30] |
| A,T/G,T | 245 (0.178) | 222 (0.164) | 0.77 | 0.379 | 1.11 [0.87, 1.42] |
| A,T/T,T | 46 (0.033) | 58 (0.043) | 1.08 | 0.299 | 0.80 [0.53, 1.22] |
| T,T/G,G | 63 (0.046) | 46 (0.034) | 2.39 | 0.122 | 1.38 [0.92, 2.09] |
| T,T/G,T | 40 (0.029) | 47 (0.035) | 0.43 | 0.511 | 0.86 [0.55, 1.35] |
| T,T/T,T | 3 (0.002) | 14 (0.010) | 6.82 | 0.009 | 0.22 [0.06, 0.76] |

Table 5.25: χ² Analysis for DISC1 rs821616 x MAP1A rs2245715 Interaction in GAIN SCZ. Summary of the χ² analysis of individuals from the GAIN schizophrenia dataset for DISC1 rs821616 x MAP1A rs2245715, normalised to the major allele homozygotes.

| Genotype combination (DISC1/MAP1A) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| A,A/G,G | 590 (0.428) | 562 (0.416) | | | 1.00 |
| A,A/G,A | 100 (0.073) | 117 (0.087) | 1.92 | 0.165 | 0.81 [0.61, 1.09] |
| A,A/A,A | 7 (0.005) | 7 (0.005) | 0.01 | 0.928 | 0.95 [0.33, 2.73] |
| A,T/G,G | 444 (0.322) | 474 (0.351) | 1.66 | 0.198 | 0.89 [0.75, 1.06] |
| A,T/G,A | 121 (0.088) | 77 (0.057) | 6.64 | 0.010 | 1.50 [1.10, 2.04] |
| A,T/A,A | 10 (0.007) | 7 (0.005) | 0.39 | 0.533 | 1.36 [0.51, 3.60] |
| T,T/G,G | 84 (0.061) | 84 (0.062) | 0.09 | 0.768 | 0.95 [0.69, 1.32] |
| T,T/G,A | 21 (0.015) | 21 (0.016) | 0.02 | 0.877 | 0.95 [0.51, 1.76] |
| T,T/A,A | 1 (0.001) | 2 (0.001) | 0.38 | 0.536 | 0.48 [0.04, 5.27] |


Table 5.26: χ² Analysis for DISC1 rs821616 x SYNE1 rs214976 Interaction in GAIN SCZ. Summary of the χ² analysis of individuals from the GAIN schizophrenia dataset for DISC1 rs821616 x SYNE1 rs214976, normalised to the major allele homozygotes.

| Genotype combination (DISC1/SYNE1) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| A,A/G,G | 119 (0.086) | 109 (0.081) | | | 1.00 |
| A,A/G,A | 356 (0.258) | 353 (0.261) | 0.27 | 0.603 | 0.92 [0.69, 1.25] |
| A,A/A,A | 222 (0.161) | 224 (0.166) | 0.35 | 0.553 | 0.91 [0.66, 1.25] |
| A,T/G,G | 93 (0.067) | 115 (0.085) | 2.44 | 0.118 | 0.74 [0.51, 1.08] |
| A,T/G,A | 270 (0.196) | 259 (0.192) | 0.08 | 0.771 | 0.95 [0.70, 1.30] |
| A,T/A,A | 212 (0.154) | 184 (0.136) | 0.10 | 0.746 | 1.06 [0.76, 1.46] |
| T,T/G,G | 14 (0.010) | 25 (0.019) | 3.54 | 0.060 | 0.51 [0.25, 1.04] |
| T,T/G,A | 48 (0.035) | 55 (0.041) | 0.89 | 0.346 | 0.80 [0.50, 1.27] |
| T,T/A,A | 44 (0.032) | 27 (0.020) | 2.09 | 0.148 | 1.49 [0.87, 2.57] |

Table 5.27: χ² Analysis for DISC1 rs821616 x SYNE1 rs2252755 Interaction in GAIN SCZ. Summary of the χ² analysis of individuals from the GAIN schizophrenia dataset for DISC1 rs821616 x SYNE1 rs2252755, normalised to the major allele homozygotes.

| Genotype combination (DISC1/SYNE1) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| A,A/C,C | 321 (0.233) | 283 (0.209) | | | 1.00 |
| A,A/C,G | 305 (0.221) | 333 (0.246) | 3.54 | 0.060 | 0.81 [0.65, 1.01] |
| A,A/G,G | 71 (0.052) | 70 (0.052) | 0.36 | 0.550 | 0.89 [0.62, 1.29] |
| A,T/C,C | 258 (0.187) | 265 (0.196) | 1.63 | 0.201 | 0.86 [0.68, 1.09] |
| A,T/C,G | 233 (0.169) | 249 (0.184) | 2.48 | 0.116 | 0.82 [0.65, 1.05] |
| A,T/G,G | 84 (0.061) | 44 (0.033) | 6.65 | 0.010 | 1.68 [1.13, 2.51] |
| T,T/C,C | 44 (0.032) | 38 (0.028) | 0.01 | 0.930 | 1.02 [0.64, 1.62] |
| T,T/C,G | 55 (0.040) | 55 (0.041) | 0.37 | 0.543 | 0.88 [0.59, 1.32] |
| T,T/G,G | 7 (0.005) | 14 (0.010) | 3.19 | 0.074 | 0.44 [0.18, 1.11] |


5.2.4.2 Non GAIN Schizophrenia

In the analysis of the non-GAIN schizophrenia dataset, two SNPs in the gene PCNT (rs6518291 and rs35940413) showed significant interactions with DISC1 rs6675281 (χ² = 5.78, p-value = 0.016 and χ² = 3.84, p-value = 0.050 respectively). All other interactions tested were not significant, as can be seen in Table 5.28.

Table 5.28: Summary Of Regression Analyses in non-GAIN SCZ. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs in the non-GAIN schizophrenia dataset. The interaction model was tested against the null model.

| Gene | SNP | χ² (rs821616) | df | p-value | χ² (rs3738401) | df | p-value | χ² (rs6675281) | df | p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| AKAP9 | rs6960867* | 2.59 | 4 | 0.628 | 4.75 | 4 | 0.314 | 6.82 | 4 | 0.146 |
| ATF4 | rs4894 | 4.78 | 4 | 0.311 | 1.36 | 4 | 0.849 | 6.18 | 4 | 0.186 |
| ATF5 | rs283526 | 1.14 | 4 | 0.888 | 2.47 | 4 | 0.649 | 2.31 | 4 | 0.679 |
| IMMT | rs1050301 | 4.88 | 4 | 0.300 | 1.58 | 4 | 0.812 | 0.57 | 4 | 0.966 |
| MAP1A | rs62020612 | 6.18 | 4 | 0.186 | 2.03 | 4 | 0.731 | 2.56 | 4 | 0.633 |
| | rs2245715 | 1.01 | 4 | 0.908 | 3.82 | 4 | 0.432 | 0.16 | 1 | 0.686 |
| PCNT | rs2839227* | 3.93 | 4 | 0.415 | 0.55 | 1 | 0.460 | 1.21 | 1 | 0.271 |
| | rs6518291* | 6.34 | 4 | 0.175 | 1.42 | 4 | 0.841 | 5.76 | 1 | 0.016 |
| | rs35940413 | 1.54 | 1 | 0.214 | 5.55 | 4 | 0.235 | 3.84 | 1 | 0.050 |
| | rs2070425 | 0.83 | 4 | 0.934 | 1.46 | 4 | 0.834 | 4.83 | 4 | 0.305 |
| | rs2073380 | 4.17 | 4 | 0.384 | 2.07 | 4 | 0.724 | 2.11 | 4 | 0.716 |
| | rs2839245 | 1.74 | 1 | 0.188 | 0.04 | 1 | 0.851 | 1.77 | 1 | 0.183 |
| | rs2073376 | 1.60 | 4 | 0.808 | 5.47 | 4 | 0.242 | 1.56 | 4 | 0.515 |
| SPTBN4 | rs814501 | 4.58 | 4 | 0.333 | 7.80 | 4 | 0.099 | 2.07 | 4 | 0.723 |
| | rs73931308 | 1.39 | 1 | 0.239 | 2.26 | 4 | 0.689 | 1.14 | 1 | 0.286 |
| SYNE1 | rs2252755 | 5.03 | 4 | 0.284 | 1.25 | 4 | 0.870 | 4.77 | 4 | 0.312 |
| | rs6911096* | 7.79 | 4 | 0.099 | 2.40 | 4 | 0.662 | 7.90 | 4 | 0.095 |
| | rs9479297* | 6.44 | 4 | 0.169 | 1.87 | 4 | 0.759 | 4.94 | 4 | 0.294 |
| | rs4645434* | 3.80 | 4 | 0.433 | 1.79 | 4 | 0.774 | 2.76 | 4 | 0.600 |
| | rs214950 | 0.73 | 4 | 0.948 | 4.26 | 4 | 0.372 | 1.49 | 1 | 0.223 |
| | rs214976 | 8.67 | 4 | 0.070 | 5.19 | 4 | 0.268 | 0.01 | 1 | 0.917 |
| TRAF3IP1 | rs12464423 | 0.71 | 4 | 0.950 | 6.84 | 4 | 0.144 | 0.91 | 4 | 0.923 |
| UTRN | rs1534443 | 7.33 | 4 | 0.120 | 1.18 | 4 | 0.882 | 1.07 | 1 | 0.301 |

* Denotes a genotyped SNP, all others have been imputed.


In both significant interactions, the genotype frequency of individuals that were homozygous for the minor allele at both loci (i.e. T,T/G,G) was very low, and as such the significant interactions seen here are from a grouped analysis. Full results of the χ² analyses for these interactions are shown in Tables 5.29 and 5.30.
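
The grouping step can be illustrated as collapsing the 3 x 3 table of genotype combinations into a 2 x 2 table of major allele homozygotes versus minor allele carriers at each locus, which is why the grouped tests in Table 5.28 have a single degree of freedom. The function name and the counts in this sketch are hypothetical.

```python
def group_minor_carriers(counts_3x3):
    """Collapse a 3 x 3 genotype table to 2 x 2.

    Rows and columns are indexed 0 (major homozygote), 1 (heterozygote)
    and 2 (minor homozygote); indices 1 and 2 are merged at both loci.
    """
    grouped = [[0, 0], [0, 0]]
    for g1, row in enumerate(counts_3x3):
        for g2, n in enumerate(row):
            grouped[min(g1, 1)][min(g2, 1)] += n
    return grouped

# Hypothetical counts with a very sparse double minor homozygote cell
example_counts = [
    [600, 300, 40],
    [200, 100, 12],
    [20, 8, 1],
]
grouped_counts = group_minor_carriers(example_counts)
```

Applied separately to cases and controls, the resulting 2 x 2 tables can then be analysed in the same way as the ungrouped genotype combinations.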

Table 5.29: χ² Analysis for DISC1 rs6675281 x PCNT rs6518291 Interaction in non-GAIN SCZ with Grouped Alleles. Summary of the χ² analysis of individuals from the non-GAIN schizophrenia dataset for DISC1 rs6675281 x PCNT rs6518291, normalised to the major allele homozygotes.

| Genotype combination (DISC1/PCNT) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| C,C/A,A | 642 (0.477) | 576 (0.501) | | | 1.00 |
| C,C/A,G or G,G | 353 (0.262) | 275 (0.239) | 2.04 | 0.153 | 1.15 [0.95, 1.40] |
| C,T or T,T/A,A | 227 (0.169) | 170 (0.148) | 2.41 | 0.121 | 1.20 [0.95, 1.51] |
| C,T or T,T/A,G or G,G | 125 (0.093) | 128 (0.111) | 0.92 | 0.339 | 0.88 [0.67, 1.15] |

Table 5.30: χ² Analysis for DISC1 rs6675281 x PCNT rs35940413 Interaction in non-GAIN SCZ with Grouped Alleles. Summary of the χ² analysis of individuals from the non-GAIN schizophrenia dataset for DISC1 rs6675281 x PCNT rs35940413, normalised to the major allele homozygotes.

| Genotype combination (DISC1/PCNT) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| C,C/A,A | 793 (0.589) | 691 (0.601) | | | 1.00 |
| C,C/A,G or G,G | 202 (0.15) | 160 (0.139) | 0.65 | 0.418 | 1.10 [0.87, 1.39] |
| C,T or T,T/A,A | 282 (0.209) | 221 (0.192) | 1.04 | 0.307 | 1.11 [0.91, 1.36] |
| C,T or T,T/A,G or G,G | 70 (0.052) | 77 (0.067) | 1.82 | 0.178 | 0.79 [0.56, 1.11] |


5.2.4.3 Combined Schizophrenia

Upon combination of the two schizophrenia cohorts there are two SNP combinations that show an overall interaction effect after regression analysis as is seen in Table 5.31.

Table 5.31: Summary Of Regression Analyses in Combined SCZ. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs in the combined GAIN and non-GAIN schizophrenia datasets. The interaction model was tested against the null model.

| Gene | SNP | χ² (rs821616) | df | p-value | χ² (rs3738401) | df | p-value | χ² (rs6675281) | df | p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| AKAP9 | rs6960867* | 3.11 | 4 | 0.540 | 1.17 | 4 | 0.883 | 1.24 | 4 | 0.871 |
| ATF4 | rs4894 | 6.78 | 4 | 0.148 | 1.41 | 4 | 0.842 | 5.85 | 4 | 0.211 |
| ATF5 | rs283526 | 5.08 | 4 | 0.279 | 5.66 | 4 | 0.226 | 6.44 | 4 | 0.168 |
| IMMT | rs1050301 | 2.33 | 4 | 0.675 | 5.16 | 4 | 0.271 | 1.72 | 4 | 0.787 |
| MAP1A | rs62020612 | 0.85 | 4 | 0.932 | 2.66 | 4 | 0.615 | 1.91 | 4 | 0.753 |
| | rs2245715 | 4.25 | 4 | 0.373 | 3.57 | 4 | 0.468 | 0.07 | 1 | 0.791 |
| PCNT | rs2839227* | 3.26 | 4 | 0.515 | 0.71 | 4 | 0.950 | 0.14 | 1 | 0.712 |
| | rs6518291* | 5.51 | 4 | 0.239 | 4.52 | 4 | 0.341 | 2.81 | 4 | 0.590 |
| | rs35940413 | 0.07 | 1 | 0.798 | 7.40 | 4 | 0.116 | 1.00 | 4 | 0.910 |
| | rs2070425 | 4.59 | 4 | 0.331 | 3.33 | 4 | 0.505 | 6.98 | 4 | 0.137 |
| | rs2073380 | 4.25 | 4 | 0.373 | 2.66 | 4 | 0.615 | 3.13 | 4 | 0.536 |
| | rs2839245 | 0.11 | 1 | 0.742 | 7.65 | 4 | 0.105 | 1.89 | 4 | 0.756 |
| | rs2073376 | 1.75 | 4 | 0.781 | 5.91 | 4 | 0.206 | 5.04 | 4 | 0.283 |
| SPTBN4 | rs814501 | 3.52 | 4 | 0.476 | 6.90 | 4 | 0.141 | 1.65 | 4 | 0.801 |
| SYNE1 | rs2252755 | 1.53 | 4 | 0.821 | 4.71 | 4 | 0.318 | 4.69 | 4 | 0.320 |
| | rs6911096* | 5.58 | 4 | 0.233 | 7.23 | 4 | 0.124 | 8.45 | 4 | 0.076 |
| | rs9479297* | 5.73 | 4 | 0.220 | 5.96 | 4 | 0.202 | 4.82 | 4 | 0.306 |
| | rs4645434* | 1.75 | 4 | 0.781 | 5.79 | 4 | 0.215 | 3.28 | 4 | 0.513 |
| | rs214950 | 3.09 | 4 | 0.544 | 7.97 | 4 | 0.093 | 0.80 | 4 | 0.938 |
| | rs214976 | 3.30 | 4 | 0.509 | 12.61 | 4 | 0.013 | 7.26 | 4 | 0.123 |
| TRAF3IP1 | rs12464423 | 4.76 | 4 | 0.313 | 2.69 | 4 | 0.611 | 3.10 | 4 | 0.542 |
| UTRN | rs1534443 | 10.23 | 4 | 0.037 | 3.58 | 4 | 0.465 | 0.84 | 4 | 0.933 |

* Denotes a genotyped SNP, all others have been imputed.

Full results for the two significant interactions, DISC1 rs3738401 x SYNE1 rs214976 (χ² = 12.61, p-value = 0.013) and DISC1 rs821616 x UTRN rs1534443 (χ² = 10.23, p-value = 0.037), are shown in Tables 5.32 and 5.33 respectively.


Table 5.32: χ² Analysis for DISC1 rs3738401 x SYNE1 rs214976 Interaction in Combined SCZ. Summary of the χ² analysis of individuals from the combined schizophrenia dataset for DISC1 rs3738401 x SYNE1 rs214976, normalised to the major allele homozygotes.

| Genotype combination (DISC1/SYNE1) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| G,G/G,G | 234 (0.086) | 205 (0.082) | | | 1.00 |
| G,G/G,A | 619 (0.227) | 584 (0.234) | 0.44 | 0.507 | 0.93 [0.75, 1.16] |
| G,G/A,A | 444 (0.163) | 383 (0.153) | 0.02 | 0.896 | 1.02 [0.81, 1.28] |
| G,A/G,G | 225 (0.083) | 182 (0.073) | 0.33 | 0.564 | 1.08 [0.83, 1.42] |
| G,A/G,A | 552 (0.203) | 512 (0.205) | 0.25 | 0.615 | 0.94 [0.76, 1.18] |
| G,A/A,A | 397 (0.146) | 362 (0.145) | 0.11 | 0.739 | 0.96 [0.76, 1.22] |
| A,A/G,G | 30 (0.011) | 61 (0.024) | 12.47 | 0.000 | 0.43 [0.27, 0.69] |
| A,A/G,A | 141 (0.052) | 131 (0.052) | 0.14 | 0.704 | 0.94 [0.70, 1.28] |
| A,A/A,A | 83 (0.030) | 80 (0.032) | 0.27 | 0.603 | 0.91 [0.63, 1.30] |

Table 5.33: χ² Analysis for DISC1 rs821616 x UTRN rs1534443 Interaction in Combined SCZ. Summary of the χ² analysis of individuals from the combined schizophrenia dataset for DISC1 rs821616 x UTRN rs1534443, normalised to the major allele homozygotes.

| Genotype combination (DISC1/UTRN) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| A,A/A,A | 806 (0.296) | 765 (0.306) | | | 1.00 |
| A,A/A,G | 514 (0.189) | 437 (0.175) | 1.79 | 0.181 | 1.12 [0.95, 1.31] |
| A,A/G,G | 68 (0.025) | 81 (0.032) | 1.75 | 0.186 | 0.80 [0.57, 1.12] |
| A,T/A,A | 659 (0.242) | 559 (0.224) | 2.16 | 0.142 | 1.12 [0.96, 1.30] |
| A,T/A,G | 401 (0.147) | 409 (0.164) | 0.69 | 0.406 | 0.93 [0.79, 1.10] |
| A,T/G,G | 50 (0.018) | 46 (0.018) | 0.02 | 0.882 | 1.03 [0.68, 1.56] |
| T,T/A,A | 115 (0.042) | 117 (0.047) | 0.24 | 0.621 | 0.93 [0.71, 1.23] |
| T,T/A,G | 98 (0.036) | 78 (0.031) | 1.21 | 0.270 | 1.19 [0.87, 1.63] |
| T,T/G,G | 14 (0.005) | 8 (0.003) | 1.32 | 0.250 | 1.66 [0.69, 3.98] |


5.2.4.4 GAIN Bipolar Disorder

The three significant interaction results described below along with those for the remainder of the SNP x SNP interactions tested are shown in Table 5.34.

Table 5.34: Summary Of Regression Analyses in GAIN BPD. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs in the GAIN bipolar disorder dataset. The interaction model was tested against the null model.

| Gene | SNP | χ² (rs821616) | df | p-value | χ² (rs3738401) | df | p-value | χ² (rs6675281) | df | p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| AKAP9 | rs6960867* | 0.56 | 4 | 0.967 | 10.47 | 4 | 0.003 | 2.09 | 4 | 0.719 |
| ATF4 | rs4894 | 2.37 | 4 | 0.668 | 3.68 | 4 | 0.451 | 4.34 | 4 | 0.362 |
| ATF5 | rs283526 | 2.17 | 4 | 0.705 | 3.05 | 4 | 0.550 | 1.03 | 4 | 0.905 |
| IMMT | rs1050301 | 3.57 | 4 | 0.468 | 0.46 | 4 | 0.977 | 0.70 | 4 | 0.952 |
| MAP1A | rs62020612 | 7.75 | 4 | 0.101 | 2.74 | 4 | 0.603 | 0.04 | 1 | 0.841 |
| | rs2245715 | 1.47 | 1 | 0.226 | 0.74 | 1 | 0.389 | 0.10 | 1 | 0.747 |
| PCNT | rs2839227* | 3.13 | 1 | 0.077 | 0.56 | 1 | 0.455 | 0.62 | 1 | 0.432 |
| | rs6518291* | 0.52 | 1 | 0.472 | 7.58 | 4 | 0.188 | 1.48 | 1 | 0.224 |
| | rs35940413 | 0.67 | 1 | 0.415 | 0.33 | 4 | 0.988 | 0.04 | 1 | 0.840 |
| | rs2070425 | 4.91 | 4 | 0.296 | 2.80 | 4 | 0.592 | 1.86 | 1 | 0.172 |
| | rs2073380 | 1.51 | 1 | 0.219 | 0.87 | 4 | 0.929 | 1.95 | 1 | 0.162 |
| | rs2839245 | 0.03 | 1 | 0.870 | 1.62 | 4 | 0.805 | 0.05 | 1 | 0.830 |
| | rs2073376 | 5.59 | 4 | 0.232 | 1.58 | 4 | 0.812 | 2.78 | 4 | 0.596 |
| SPTBN4 | rs814501 | 3.82 | 4 | 0.431 | 4.27 | 4 | 0.371 | 0.34 | 4 | 0.987 |
| | rs73931308 | 0.45 | 1 | 0.500 | 1.80 | 1 | 0.180 | 0.59 | 1 | 0.443 |
| SYNE1 | rs2252755 | 3.70 | 4 | 0.448 | 3.38 | 4 | 0.496 | 2.93 | 4 | 0.570 |
| | rs6911096* | 2.83 | 4 | 0.568 | 1.48 | 4 | 0.830 | 0.54 | 1 | 0.464 |
| | rs9479297* | 0.26 | 4 | 0.992 | 4.39 | 4 | 0.356 | 3.31 | 4 | 0.507 |
| | rs4645434* | 2.82 | 4 | 0.589 | 4.97 | 4 | 0.291 | 1.81 | 4 | 0.770 |
| | rs214950 | 2.97 | 4 | 0.563 | 8.95 | 4 | 0.062 | 0.31 | 1 | 0.575 |
| | rs214976 | 7.51 | 4 | 0.111 | 9.55 | 4 | 0.049 | 0.95 | 4 | 0.918 |
| TRAF3IP1 | rs12464423 | 11.64 | 4 | 0.020 | 4.98 | 4 | 0.290 | 3.78 | 1 | 0.052 |
| UTRN | rs1534443 | 7.65 | 4 | 0.105 | 2.10 | 4 | 0.717 | 5.09 | 4 | 0.278 |

* Denotes a genotyped SNP, all others have been imputed.

In the GAIN bipolar disorder cohort, the same DISC1 x TRAF3IP1 interaction that was seen with the GAIN schizophrenia dataset shows significance (rs821616 x rs12464423: χ² = 11.64, p-value = 0.020). The χ² analysis results are shown in full in Table 5.35.


Table 5.35: χ² Analysis for DISC1 rs821616 x TRAF3IP1 rs12464423 Interaction in GAIN BPD. Summary of the χ² analysis of individuals from the GAIN bipolar disorder dataset for DISC1 rs821616 x TRAF3IP1 rs12464423, normalised to the major allele homozygotes.

| Genotype combination (DISC1/TRAF3IP1) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| A,A/G,G | 217 (0.212) | 151 (0.233) | | | 1.00 |
| A,A/G,T | 242 (0.236) | 172 (0.265) | 0.02 | 0.884 | 0.98 [0.74, 1.30] |
| A,A/T,T | 53 (0.052) | 22 (0.034) | 3.58 | 0.058 | 1.68 [0.98, 2.87] |
| A,T/G,G | 213 (0.208) | 110 (0.170) | 3.56 | 0.059 | 1.35 [0.99, 1.84] |
| A,T/G,T | 191 (0.186) | 111 (0.171) | 1.27 | 0.259 | 1.20 [0.88, 1.64] |
| A,T/T,T | 35 (0.034) | 31 (0.048) | 0.81 | 0.368 | 0.79 [0.46, 1.33] |
| T,T/G,G | 43 (0.042) | 22 (0.034) | 1.19 | 0.275 | 1.36 [0.78, 2.37] |
| T,T/G,T | 29 (0.028) | 24 (0.037) | 0.34 | 0.557 | 0.84 [0.47, 1.50] |
| T,T/T,T | 2 (0.002) | 5 (0.008) | 2.61 | 0.106 | 0.28 [0.05, 1.45] |

The SYNE1 SNP rs214976 that showed a significant interaction in the GAIN schizophrenia and combined schizophrenia datasets also shows a significant interaction here with DISC1 rs3738401 (χ² = 9.55, p-value = 0.049). Full results are shown in Table 5.36.

Table 5.36: χ² Analysis for DISC1 rs3738401 x SYNE1 rs214976 Interaction in GAIN BPD. Summary of the χ² analysis of individuals from the GAIN bipolar disorder dataset for DISC1 rs3738401 x SYNE1 rs214976, normalised to the major allele homozygotes.

| Genotype combination (DISC1/SYNE1) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| G,G/G,G | 78 (0.076) | 56 (0.086) | | | 1.00 |
| G,G/G,A | 238 (0.232) | 151 (0.233) | 0.37 | 0.544 | 1.13 [0.76, 1.69] |
| G,G/A,A | 184 (0.18) | 100 (0.154) | 1.68 | 0.194 | 1.32 [0.87, 2.01] |
| G,A/G,G | 80 (0.078) | 53 (0.082) | 0.10 | 0.747 | 1.08 [0.67, 1.77] |
| G,A/G,A | 205 (0.200) | 150 (0.231) | 0.01 | 0.926 | 0.98 [0.66, 1.47] |
| G,A/A,A | 148 (0.144) | 68 (0.105) | 3.84 | 0.050 | 1.56 [1.00, 2.44] |
| A,A/G,G | 9 (0.009) | 17 (0.026) | 4.89 | 0.027 | 0.38 [0.16, 0.91] |
| A,A/G,A | 55 (0.054) | 30 (0.046) | 0.92 | 0.337 | 1.32 [0.75, 2.31] |
| A,A/A,A | 28 (0.027) | 23 (0.035) | 0.17 | 0.684 | 0.87 [0.46, 1.67] |


DISC1 rs3738401 also shows a significant interaction with AKAP9 rs6960867 (χ² = 10.47, p-value = 0.003); the full analysis of this interaction is shown in Table 5.37.

Table 5.37: χ² Analysis for DISC1 rs3738401 x AKAP9 rs6960867 Interaction in GAIN BPD. Summary of the χ² analysis of individuals from the GAIN bipolar disorder dataset for DISC1 rs3738401 x AKAP9 rs6960867, normalised to the major allele homozygotes.

| Genotype combination (DISC1/AKAP9) | Controls, n (frequency) | Cases, n (frequency) | χ² | p-value | OR [95% CI] |
|---|---|---|---|---|---|
| G,G/A,A | 170 (0.166) | 120 (0.185) | | | 1.00 |
| G,G/A,G | 242 (0.236) | 151 (0.233) | 0.61 | 0.435 | 1.13 [0.83, 1.54] |
| G,G/G,G | 88 (0.086) | 36 (0.056) | 5.64 | 0.018 | 1.73 [1.10, 2.71] |
| G,A/A,A | 179 (0.175) | 109 (0.168) | 0.75 | 0.385 | 1.16 [0.83, 1.62] |
| G,A/A,G | 205 (0.200) | 115 (0.177) | 1.90 | 0.168 | 1.26 [0.91, 1.74] |
| G,A/G,G | 49 (0.048) | 47 (0.073) | 1.69 | 0.194 | 0.74 [0.46, 1.17] |
| A,A/A,A | 41 (0.040) | 33 (0.051) | 0.25 | 0.617 | 0.88 [0.52, 1.47] |
| A,A/A,G | 37 (0.036) | 28 (0.043) | 0.06 | 0.802 | 0.93 [0.54, 1.61] |
| A,A/G,G | 14 (0.014) | 9 (0.014) | 0.04 | 0.833 | 1.10 [0.46, 2.62] |


5.2.4.5 Combined Bipolar Disorder and Schizophrenia

Full results of the logistic regression analysis for all SNP x SNP combinations in the combined bipolar disorder and schizophrenia dataset are shown in Table 5.38.

Table 5.38: Summary Of Regression Analyses in Combined BPD and SCZ. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs in the combined bipolar disorder and schizophrenia datasets. The interaction model was tested against the null model.

                       rs821616             rs3738401            rs6675281
Gene      SNP          χ²     df  p-value   χ²     df  p-value   χ²     df  p-value
AKAP9     rs6960867*   3.11   4   0.589     0.67   4   0.955     1.01   4   0.908
ATF4      rs4894       6.25   4   0.181     1.94   4   0.746     6.19   4   0.186
ATF5      rs283526     6.12   4   0.190     7.38   4   0.117     5.20   4   0.265
IMMT      rs1050301    2.99   4   0.560     3.67   4   0.452     0.97   4   0.915
MAP1A     rs62020612   1.39   4   0.847     3.31   4   0.507     2.54   4   0.637
          rs2245715    2.83   4   0.586     3.90   4   0.420     0.00   1   0.975
PCNT      rs2839227*   2.62   4   0.623     0.50   4   0.974     0.03   1   0.871
          rs6518291*   7.22   4   0.125     5.97   4   0.201     2.57   4   0.631
          rs35940413   0.02   1   0.893     7.10   4   0.131     1.74   4   0.784
          rs2070425    6.65   4   0.156     3.59   4   0.465     8.21   4   0.084
          rs2073380    4.04   4   0.401     1.49   4   0.829     4.43   4   0.351
          rs2839245    0.08   1   0.774     8.05   4   0.090     2.24   4   0.692
          rs2073376    2.03   4   0.731     5.85   4   0.211     4.73   4   0.371
SPTBN4    rs814501     2.03   4   0.731     5.79   4   0.215     0.88   4   0.928
SYNE1     rs2252755    1.63   4   0.804     2.72   4   0.606     3.78   4   0.436
          rs6911096*   3.60   4   0.464     4.47   4   0.346     11.37  4   0.023
          rs9479297*   5.61   4   0.230     5.10   4   0.277     5.40   4   0.248
          rs4645434*   2.46   4   0.652     6.11   4   0.191     2.66   4   0.617
          rs214950     0.97   4   0.915     10.20  4   0.037     0.87   4   0.929
          rs214976     2.74   4   0.602     14.09  4   0.007     6.78   4   0.148
TRAF3IP1  rs12464423   6.91   4   0.141     2.67   4   0.614     3.58   4   0.465
UTRN      rs1534443    6.59   4   0.159     2.17   4   0.705     1.66   4   0.798

* Denotes a genotyped SNP, all others have been imputed.
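Each p-value in Table 5.38 follows from its χ² statistic and degrees of freedom. As a minimal sketch (the helper name is hypothetical, and only df = 1 or an even df is handled because these cases have simple closed forms), the upper-tail probability can be checked as follows:

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Closed forms are used, so only df = 1 or an even df >= 2 is supported.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df >= 2 and df % 2 == 0:
        half = x / 2.0
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    raise ValueError("only df = 1 or even df are handled in this sketch")

# The three nominally significant interactions reported in Table 5.38.
for chi2, df in [(14.09, 4), (10.20, 4), (11.37, 4)]:
    print(chi2, df, round(chi2_sf(chi2, df), 3))
```

Because the tabulated χ² values are rounded to two decimal places, recomputed p-values for very small statistics may differ from the table in the third decimal place.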


The interaction between DISC1 rs3738401 and SYNE1 rs214976 is again significant in this combined cohort (χ² = 14.09, p-value = 0.007); the results are shown in Table 5.39.

Table 5.39: χ² Analysis for DISC1 rs3738401 x SYNE1 rs214976 Interaction in Combined BPD and SCZ. Summary of the χ² analysis of individuals from the combined bipolar disorder and schizophrenia dataset for DISC1 rs3738401 x SYNE1 rs214976, normalised to the major allele homozygotes.

Genotype Combination  Controls n (freq.)  Cases n (freq.)  χ²      p-value  OR [95% CI]
DISC1/SYNE1 G,G/G,G   234 (0.086)         261 (0.083)                       1.00
DISC1/SYNE1 G,G/G,A   619 (0.227)         735 (0.233)      0.35    0.552    0.94 [0.76, 1.15]
DISC1/SYNE1 G,G/A,A   444 (0.163)         483 (0.153)      0.05    0.823    1.03 [0.82, 1.28]
DISC1/SYNE1 G,A/G,G   225 (0.083)         235 (0.075)      0.26    0.612    1.07 [0.83, 1.38]
DISC1/SYNE1 G,A/G,A   552 (0.203)         662 (0.210)      0.46    0.497    0.93 [0.75, 1.15]
DISC1/SYNE1 G,A/A,A   397 (0.146)         430 (0.137)      0.07    0.796    1.03 [0.82, 1.29]
DISC1/SYNE1 A,A/G,G   30 (0.011)          78 (0.025)       13.69   <0.001   0.43 [0.27, 0.68]
DISC1/SYNE1 A,A/G,A   141 (0.052)         161 (0.051)      0.03    0.873    0.98 [0.73, 1.30]
DISC1/SYNE1 A,A/A,A   83 (0.030)          103 (0.033)      0.38    0.537    0.90 [0.64, 1.26]

The interactions between DISC1 rs3738401 and SYNE1 rs214950 (χ² = 10.20, p-value = 0.037) and between DISC1 rs6675281 and SYNE1 rs6911096 (χ² = 11.37, p-value = 0.023) are also significant in this dataset (see Tables 5.40 and 5.41).

Table 5.40: χ² Analysis for DISC1 rs3738401 x SYNE1 rs214950 Interaction in Combined BPD and SCZ. Summary of the χ² analysis of individuals from the combined bipolar disorder and schizophrenia dataset for DISC1 rs3738401 and SYNE1 rs214950, normalised to the major allele homozygotes.

Genotype Combination  Controls n (freq.)  Cases n (freq.)  χ²      p-value  OR [95% CI]
DISC1/SYNE1 G,G/G,G   768 (0.282)         883 (0.280)                       1.00
DISC1/SYNE1 G,G/G,A   455 (0.167)         530 (0.168)      0.03    0.872    0.99 [0.84, 1.16]
DISC1/SYNE1 G,G/A,A   74 (0.027)          66 (0.021)       2.08    0.149    1.29 [0.91, 1.82]
DISC1/SYNE1 G,A/G,G   727 (0.267)         788 (0.250)      0.68    0.408    1.06 [0.92, 1.22]
DISC1/SYNE1 G,A/G,A   389 (0.143)         479 (0.152)      0.66    0.415    0.93 [0.79, 1.10]
DISC1/SYNE1 G,A/A,A   58 (0.021)          60 (0.019)       0.31    0.579    1.11 [0.76, 1.61]
DISC1/SYNE1 A,A/G,G   149 (0.055)         204 (0.065)      2.17    0.140    0.84 [0.67, 1.06]
DISC1/SYNE1 A,A/G,A   94 (0.034)          107 (0.034)      0.00    0.947    1.01 [0.75, 1.35]
DISC1/SYNE1 A,A/A,A   11 (0.004)          31 (0.010)       6.81    0.009    0.41 [0.20, 0.82]


Table 5.41: χ² Analysis for DISC1 rs6675281 x SYNE1 rs6911096 Interaction in Combined BPD and SCZ. Summary of the χ² analysis of individuals from the combined bipolar disorder and schizophrenia dataset for DISC1 rs6675281 and SYNE1 rs6911096, normalised to the major allele homozygotes.

Genotype Combination  Controls n (freq.)  Cases n (freq.)  χ²      p-value  OR [95% CI]
DISC1/SYNE1 C,C/T,T   1068 (0.392)        1282 (0.407)                      1.00
DISC1/SYNE1 C,C/T,A   804 (0.295)         903 (0.287)      1.09    0.297    1.07 [0.94, 1.21]
DISC1/SYNE1 C,C/A,A   113 (0.041)         154 (0.049)      0.95    0.331    0.88 [0.68, 1.14]
DISC1/SYNE1 C,T/T,T   406 (0.149)         428 (0.136)      2.59    0.108    1.14 [0.97, 1.33]
DISC1/SYNE1 C,T/T,A   225 (0.083)         288 (0.091)      0.43    0.513    0.94 [0.77, 1.14]
DISC1/SYNE1 C,T/A,A   47 (0.017)          36 (0.011)       4.04    0.045    1.57 [1.01, 2.44]
DISC1/SYNE1 T,T/T,T   27 (0.010)          33 (0.010)       0.00    0.945    0.98 [0.59, 1.64]
DISC1/SYNE1 T,T/T,A   29 (0.011)          18 (0.006)       4.91    0.027    1.93 [1.07, 3.50]
DISC1/SYNE1 T,T/A,A   6 (0.002)           6 (0.002)        0.10    0.752    1.20 [0.39, 3.73]


5.2.5 Assessment of Prey Subset

A subset of prey genes from the yeast two-hybrid analysis was chosen for further analysis at the genetic level. This subset includes the preys chosen for confirmation in the previous chapter, as well as any other preys with more than one hit in the two-hybrid screen (including the WTAC results) that have a known or predicted function in the construction and maintenance of the primary cilium. The chosen preys, along with a brief description of why each was chosen, are shown in Table 5.42.

Table 5.42: Yeast Two-Hybrid Prey Subset. This table details the preys chosen for genetic analysis from the yeast two-hybrid screen and the number of times each was pulled out as an interactant of DISC1 (including the WTAC and the current study screens). A brief description of why each was identified to be involved with the primary cilium is also included; descriptions are taken from the Gene (http://www.ncbi.nlm.nih.gov/gene/) and GeneCards® (http://www.genecards.org/) descriptions of function.

Gene      Hits  Reason Chosen
ARF4      3     Has involvement in protein trafficking as well as modulation of vesicle budding within the Golgi apparatus.
ATP6V1B2  2     As part of a vacuolar ATPase (V-ATPase), has roles in protein sorting, receptor-mediated endocytosis, and the generation of gradients in synaptic vesicles.
ATP6V1D   2     Is thought to play a role in cilium construction by regulating the transport of proteins to the correct location during cilium biogenesis.
CCT3      4     Is a component of the BBSome, a complex known to be involved in ciliogenesis through the study of polycystic kidney disease. CCTs are specifically involved in the transport of vesicles to the cilium.
CCT4      7     Same as above.
CEP70     5     Located at the centrosome, has roles in the organisation of microtubules at varying times in the cell cycle.
CHMP2B    8     A component of the ESCRT-III complex (Endosomal Sorting Complex Required for Transport III). This complex functions to recycle and degrade cell surface receptors and to concentrate ubiquitinated endosomal cargos to be taken up by vesicles.
CLINT1    2     Has similarities with endocytic adapter proteins and interacts with clathrin (see below). This protein also has known associations with psychotic disorder susceptibility.
CLTC      4     Coats vesicles to target them to specific cellular locations including the cell membrane and the Golgi network. Also involved in endocytosis of a number of macromolecules.
DCTN2     7     Localises to the centrosome where it acts to attach microtubules.
EXOC1     7     A component of the exocyst complex. This complex has been associated with polycystic kidney disease, a known ciliopathy, and is necessary for targeting exocytic vesicles to their correct docking sites.
EXOC5     2     Same as above.
FEZ2      4     This protein has been associated with polycystic kidney disease.
FRYL      5     Plays a role in maintaining cell extensions that polarise cells.
FZD3      2     Part of the frizzled gene family. Most members of this family are involved in the Wnt signalling pathway, which has known interaction with the primary cilia. This gene is also an identified susceptibility locus for schizophrenia.
GOLGA4    2     May have a function in membrane-anchoring that is regulated by Rab-6 within the Golgi apparatus.
IFT81     8     An intraflagellar transport protein predicted to locate to the centrosome and have roles in cilium assembly and trafficking of proteins to the cilium.
IQUB      2     By similarity, it is predicted that the protein encoded by this gene has roles in the construction and maintenance of cilia.
KIF1A     5     A member of the kinesin family of proteins. This family is known to have a role in the transport of macromolecules to the distal ends of microtubules.
NPHP1     3     May interact with IFTs (see above) during cilia assembly.
NUP133    2     A component of the nuclear pore. The nuclear pore has similarities to the ciliary pore and nucleoporins (NUPs) are important for the entry of kinesins into the growing cilia.
NUP155    4     Same as above.
NUP62CL   8     Same as above.
NUP93     2     Same as above.
OCRL      3     Localised in the trans-Golgi network and described as involved in primary cilia assembly.
RAB11A    36    Member of the Rab protein family. Family members may ensure accuracy in the location of docking and fusion of vesicles.
RAB5C     2     Same as above.
SNTN      2     By similarity, the protein encoded by this gene may have a role in linking the membrane of the cilia to the microtubules.
SNX2      6     A member of the sorting nexin family, this protein has roles in the sorting of proteins within the endocytic pathway.
SNX5      3     Same as above.
STX6      2     Involved with the SNARE complex and thus intracellular vesicle trafficking.
TMEM138   2     Shown to be involved in ciliogenesis through experiments in mice.
TPR       9     Associated with the nuclear pore complex, directly interacting with several components. The nuclear export of some proteins requires this protein.
TRIP11    3     Involved in binding the Golgi membranes to the ends of microtubules.
USO1      2     By similarity, this protein has roles in targeted transport of vesicles and anchoring of these vesicles.


This subset of genes was then analysed in the same manner as the previously described set of confirmed interacting genes. The first step was the generation of Q-Q plots of the observed versus expected p-values; again, this was done for combined and split genders in all three datasets. These plots are shown in Figure 5.10 (GAIN bipolar disorder A-C, GAIN schizophrenia D-F and non-GAIN schizophrenia G-I).

Figure 5.10: Q-Q Plots Of the Selected Prey Gene Set in Individual Datasets. The observed versus expected -Log10(P) for the selected interacting partners of DISC1 found by yeast two-hybrid screen in A) GAIN BPD Combined Genders (n=1189), B) GAIN BPD Males (n=1187), C) GAIN BPD Females (n=1281), D) GAIN SCZ Combined (n=1182), E) GAIN SCZ Males (n=1188), F) GAIN SCZ Females (n=1231), G) non-GAIN SCZ Combined (n=1227), H) non-GAIN SCZ Males (n=1232) and I) non-GAIN SCZ Females (n=1269). Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.
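The coordinates of a Q-Q plot of this kind can be sketched as below. This is an illustration only: the function name and the p-values are hypothetical, and the (i - 0.5)/n plotting position for the expected quantiles is an assumption, since the exact convention used to draw Figure 5.10 is not stated.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, largest first."""
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # Expected quantiles of a uniform(0, 1) sample; the (i - 0.5) / n
    # plotting position is an assumption of this sketch.
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))

# Hypothetical p-values, for illustration only.
points = qq_points([0.50, 0.01, 0.20, 0.80])
for expected, observed in points:
    print(f"{expected:.3f}  {observed:.3f}")
```

Points lying above the line of equality indicate p-values smaller than expected under the null hypothesis.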


The genomic inflation factors associated with each of these plots are shown in Table 5.43. There is little evidence of inflation (taken here as values >1.1) in the GAIN schizophrenia dataset: the combined cohort only just reaches this threshold (1.11), and the male and female cohorts analysed separately both fall below it.

The non-GAIN dataset shows some evidence of inflation in all three analyses, with the combined analysis showing the strongest evidence (inflation factor = 1.16). The GAIN bipolar disorder dataset shows the strongest evidence of inflation in the male-only cohort (inflation factor = 1.33); the female cohort also shows some inflation (1.17), but the combined analysis shows none (1.00).

Table 5.43: Individual Dataset Q-Q Analysis Results for Selected Prey Genes. Inflation factors and mean χ² results obtained for combined and gender split analyses of the individual data sets.

Dataset       Cohort    Inflation Factor  Mean χ²
GAIN BPD      Combined  1.00              1.03
              Male      1.33              1.19
              Female    1.17              0.97
GAIN SCZ      Combined  1.11              1.14
              Male      1.00              0.92
              Female    1.05              1.06
non-GAIN SCZ  Combined  1.16              1.11
              Male      1.13              1.08
              Female    1.11              0.95

The inflation seen in the GAIN bipolar disorder male dataset is driven by three SNPs (rs1106634, rs135253777 and rs4922139) that fall within the ATP6V1B2 region. Two of these SNPs are in high linkage disequilibrium, as can be seen in Figure 5.11.

The most strongly associated of the three, rs1106634, shows evidence of increased risk in both the combined analysis (OR[95%CI] = 1.56 [1.28-1.90], FDR p-value = 0.0053) and the male-only analysis (OR[95%CI] = 2.01 [1.51-2.67], FDR p-value = 0.0005). The other two SNPs show increased risk of bipolar disorder only in the male analysis (rs13253777 - OR[95%CI] = 1.56 [1.27-1.93], FDR p-value = 0.0046 and rs4922139 - OR[95%CI] = 1.58 [1.27-1.91], FDR p-value = 0.0046).

The analysis was next conducted in the combined GAIN and non-GAIN schizophrenia dataset giving the plots seen in Figure 5.12 and the inflation factors shown in Table 5.44.


Figure 5.11: Haploview Plot of Significant SNPs in ATP6V1B2. Haploview plot generated from the GAIN bipolar disorder cohort showing the linkage disequilibrium (r² values shown) that exists between the three significantly associated SNPs rs1106634, rs135253777 and rs4922139.

Figure 5.12: Q-Q Plots of Selected Prey Gene Set in Combined SCZ Dataset. The observed versus expected -Log10(P) for the selected interacting partners of DISC1 found by yeast two-hybrid screen in A) Combined Genders (n=1200), B) Males (n=1205), C) Females (n=1241), in the combined GAIN and non-GAIN SCZ datasets. Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.

Table 5.44: Combined SCZ Dataset Q-Q Analysis Results for Selected Prey Genes. Inflation factors and mean χ² results obtained for the combined GAIN and non-GAIN dataset.

Cohort    Inflation Factor  Mean χ²
Combined  1.01              1.25
Male      1.00              1.04
Female    1.10              1.05


Although the calculated inflation factor suggests that there is no inflation in the combined-gender analysis, both visual inspection of the Q-Q plot and the mean χ² of 1.25 (i.e. >1) suggest otherwise. This discrepancy arises because the observed median χ² is low relative to the observed mean χ², which is itself the result of an unusual, bottom-heavy distribution of test statistics. The inflation factor is calculated as the observed median χ² (0.461600) divided by the expected median χ² (0.4549364), so for data with this distribution it does not adequately describe the inflation, and the mean and the spread of the data on the Q-Q plot should also be taken into account.
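The calculation described above can be sketched in a few lines. The two medians are those quoted in the text; the list of test statistics is hypothetical and is included only to show how a bottom-heavy distribution with a few large values can give a median-based inflation factor near 1 alongside a mean χ² well above 1.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom,
# the expected median quoted in the text.
EXPECTED_MEDIAN_CHI2 = 0.4549364

def inflation_summary(chi2_stats):
    """Return (median-based inflation factor, mean chi-square)."""
    lam = statistics.median(chi2_stats) / EXPECTED_MEDIAN_CHI2
    return lam, statistics.mean(chi2_stats)

# Observed median quoted in the text for the combined schizophrenia dataset.
lam_reported = 0.461600 / EXPECTED_MEDIAN_CHI2

# Hypothetical bottom-heavy statistics: most are small, a few are large.
toy = [0.05, 0.10, 0.30, 0.46, 0.46, 0.50, 0.60, 4.00, 6.00]
lam_toy, mean_toy = inflation_summary(toy)
print(round(lam_reported, 2), round(lam_toy, 2), round(mean_toy, 2))
```

The median is insensitive to the size of the largest statistics, which is why the mean and the Q-Q plot carry additional information in this situation.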

There are two SNPs within the USO1 gene region (rs324702 and rs324734) that show a significant association in the combined schizophrenia dataset (GAIN and non-GAIN schizophrenia). For both SNPs the minor allele shows a protective effect against schizophrenia (rs324702 - OR[95%CI] = 0.82 [0.75-0.91], FDR p-value = 0.050; rs324734 - OR[95%CI] = 0.83 [0.75-0.91], FDR p-value = 0.050), and the two SNPs are in high linkage disequilibrium (Figure 5.13).

Figure 5.13: Haploview Plot of Significant SNPs in USO1. Haploview plot generated from the GAIN and non-GAIN schizophrenia cohort showing the linkage disequilibrium (r² values shown) that exists between the two significantly associated SNPs rs324702 and rs324734.


Finally, the analysis was carried out in the combined GAIN and non-GAIN datasets for both bipolar disorder and schizophrenia together (Figure 5.14 and Table 5.45).

Figure 5.14: Q-Q Plots of Selected Prey Gene Set in Combined BPD and SCZ Dataset. The observed versus expected -Log10(P) for the selected interacting partners of DISC1 found by yeast two-hybrid screen in A) Combined Genders (n=1201), B) Males (n=1205), C) Females (n=1243), in the combined GAIN and non-GAIN BPD and SCZ datasets. Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.

Table 5.45: Combined BPD and SCZ Dataset Q-Q Analysis Results for Selected Prey Genes. Inflation factors and mean χ² results obtained for the combined GAIN and non-GAIN dataset.

Cohort    Inflation Factor  Mean χ²
Combined  1.00              1.20
Male      1.00              0.99
Female    1.00              1.05

In this combined dataset, the same issue arises in the combined-gender analysis as in the combined schizophrenia analysis: both the Q-Q plot and the mean χ² suggest inflation, but the inflation factor itself does not adequately represent this because of the unusual distribution of the data and the resulting low median χ² value.

Although this subset of genes was chosen on the basis that each gene had a potential function in the construction and maintenance of the primary cilium, an enrichment analysis using ToppFun was still conducted. The results of this analysis for Biological Process, Molecular Function and Cellular Component are shown in Table 5.46.


Table 5.46: ToppGene Enrichment Analysis of Prey Gene Set. Enrichment analysis results for Biological Process, Molecular Function and Cellular Component of the 35 chosen prey genes.

GO: Molecular Function
Hit  GO:ID       Name                                     p-value   Bonferroni  Genes in Input  Genes in Annotation
1    GO:0017056  structural constituent of nuclear pore   1.04E-06  1.28E-04    3               11
2    GO:0005487  nucleocytoplasmic transporter activity   3.66E-04  4.51E-02    2               15

GO: Biological Process
Hit  GO:ID       Name                                     p-value   Bonferroni  Genes in Input  Genes in Annotation
1    GO:0015031  protein transport                        7.80E-14  5.82E-11    20              1457
2    GO:0045184  establishment of protein localisation    3.08E-13  2.30E-10    20              1567
3    GO:0008104  protein localisation                     1.60E-12  1.20E-09    21              1957
4    GO:0061024  membrane organisation                    1.85E-09  1.38E-06    13              839
5    GO:0046907  intracellular transport                  4.09E-09  3.05E-06    16              1532

GO: Cellular Component
Hit  GO:ID       Name                                     p-value   Bonferroni  Genes in Input  Genes in Annotation
1    GO:0016023  cytoplasmic membrane-bounded vesicle     2.45E-08  4.36E-06    13              1062
2    GO:0031410  cytoplasmic vesicle                      7.17E-08  1.28E-05    13              1163
3    GO:0005643  nuclear pore                             2.57E-07  4.57E-05    5               75
4    GO:0042995  cell projection                          8.99E-07  1.60E-04    14              1711
5    GO:0046930  pore complex                             9.31E-07  1.66E-04    5               97

5.2.5.1 Identification of SNPs to be Tested in the Prey Subset

Using the same criteria as used previously, to include coding non-synonymous SNPs with a minor allele frequency of 10% or greater, a total of 12 SNPs were identified from the chosen gene subset. A summary of the number of coding, non-synonymous SNPs along with the number that have minor allele frequencies of greater than 10% for each gene is shown in Table 5.47.

Genes or chromosomes that contained multiple SNPs were assessed for linkage disequilibrium (see Figure 5.15), and where there was an r² of 0.70 or greater between a pair or group of SNPs, a single representative SNP was chosen. The final list of chosen SNPs, along with their expected and observed minor allele frequencies, is shown in Table 5.48.
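As a minimal illustration of the linkage disequilibrium criterion, r² for a pair of biallelic SNPs can be computed from allele and haplotype frequencies. The function names and the frequencies below are hypothetical, and the threshold is assumed to correspond to r² = 0.70 on the 0 to 1 scale; in practice the values were read from the Haploview plots.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def needs_single_representative(r2, threshold=0.70):
    """True when a pair of SNPs reaches the assumed LD threshold."""
    return r2 >= threshold

# Hypothetical frequencies, for illustration only.
r2 = ld_r2(p_ab=0.28, p_a=0.30, p_b=0.32)
print(round(r2, 2), needs_single_representative(r2))
```

Two SNPs whose alleles always occur together give r² = 1, while independent SNPs give r² = 0.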


Table 5.47: Summary of Non-Synonymous SNPs in Prey Gene Subset. The table details the number of coding non-synonymous SNPs in each of the included prey genes along with the number of these SNPs that have a minor allele frequency of >10%.

Gene      Location      # NS SNPs  MAF>10%
ARF4      3p21.2-p21.1  2          0
ATP6V1B2  8p21.3        35         0
ATP6V1D   14q23-q24.2   14         0
CCT3      1q23          46         0
CCT4      2p15          49         0
CEP70     3q22.3        63         1
CHMP2B    3p11.2        32         0
CLINT1    5q33.3        40         0
CLTC      17q23.1       43         0
DCTN2     12q13.3       35         0
EXOC1     4q12          59         0
EXOC5     14q22.3       29         0
FEZ2      2p21          34         3
FRYL      4p11          190        0
FZD3      8p21          37         0
GOLGA4    3p22-p21.3    223        2
IFT81     12q24.13      68         0
IQUB      7q31.32       92         1
KIF1A     2q37.3        121        0
NPHP1     2q13          67         0
NUP133    1q42.13       111        2
NUP155    5p13.1        95         0
NUP62CL   Xq22.3        16         1
NUP93     16q13         73         0
OCRL      Xq25          69         0
RAB11A    15q22.31      4          0
RAB5C     17q21.2       22         0
SNTN      3p14.2        16         0
SNX2      5q23          31         0
SNX5      20p11         26         0
STX6      1q25.3        22         0
TMEM138   11q12.2       17         0
TPR       1q25          149        1
TRIP11    14q31-32      186        1
USO1      4q21.1        53         0


Figure 5.15: Haploview Plots of Prey Subset SNPs. Haploview plots of the SNPs from the prey subset with potential linkage disequilibrium, A) CEP70 and GOLGA4 (Ch3), B) FEZ2 and C) NUP133 and TPR (Ch1). Plots generated from the GAIN bipolar disorder data. The metric shown is the r² value between the SNPs, where a black box with no number indicates the r² value is 1.

Table 5.48: Minor Allele Frequencies of Chosen SNPs from the Prey Subset. The expected (1000 Genomes (1000G) European) frequency and observed imputed minor allele frequencies for the chosen SNPs from the prey subset.

                                           GAIN SCZ            non-GAIN SCZ        GAIN BPD
Gene     SNP          Minor Allele  1000G  Cases     Controls  Cases     Controls  Cases     Controls
CEP70    rs1673607    C             0.46   0.45      0.44      0.43      0.45      0.46      0.44
DISC1    rs821616     A             0.30   0.29      0.29      0.28      0.29      0.27      0.29
         rs3738401    T             0.34   0.32      0.30      0.32      0.32      0.32      0.30
         rs6675281    T             0.12   0.14      0.15      0.14      0.14      0.14      0.15
FEZ2     rs848642     A             0.29   0.32      0.31      0.32      0.30      0.31      0.30
         rs1544655    G             0.30   0.27      0.26      0.25      0.27      0.28      0.26
         rs2287104    T             0.33   0.31      0.30      0.30      0.31      0.34      0.35
GOLGA4   rs11718848   A             0.37   0.38      0.38      0.39      0.39      0.37      0.38
IQUB     rs10255061   T             0.29   0.29      0.27      0.26      0.27      0.27      0.27
NUP133   rs1065674*   C             0.21   0.21      0.22      0.23      0.21      0.20      0.22
NUP62CL  rs1298577*   G             0.20   -         -         -         -         -         -
TRIP11   rs1051340    T             0.32   -         -         0.33      0.31      0.30      0.33
TPR      rs61744267   T             0.20   0.12      0.12      0.18      0.18      0.18      0.18

* Denotes a genotyped SNP, all others have been imputed.

The NUP62CL SNP, rs1298577, failed Hardy-Weinberg Equilibrium testing in all three cohorts and was removed from subsequent analyses. TRIP11 rs1051340 failed Hardy-Weinberg Equilibrium testing in the GAIN schizophrenia cohort and was omitted from any further analyses done using these data.
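A Hardy-Weinberg check of this kind can be sketched from genotype counts. The example below assumes a χ² goodness-of-fit test with one degree of freedom, which may differ from the test actually applied (exact tests are also commonly used), and the function name is hypothetical. The counts are the control genotype counts for DISC1 rs821616 in the GAIN schizophrenia cohort, as reported later in Table 5.49.

```python
import math

def hwe_chi2(n11, n12, n22):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    n11, n12, n22: counts of major homozygotes, heterozygotes and
    minor homozygotes. Returns (minor allele frequency, chi2, p-value).
    """
    n = n11 + n12 + n22
    q = (2 * n22 + n12) / (2 * n)      # minor allele frequency
    p = 1 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n11, n12, n22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One degree of freedom: three classes, one estimated allele frequency.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return q, chi2, p_value

# Control genotype counts for DISC1 rs821616 in GAIN schizophrenia.
maf, chi2, p_value = hwe_chi2(697, 575, 106)
print(f"MAF={maf:.3f} chi2={chi2:.2f} p={p_value:.2f}")
```

The same counts also give the minor allele frequency, which here agrees with the control MAF reported for this SNP.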


5.2.6 Main Effect Analysis for the Prey Subset

The association of each of the chosen SNPs from the prey subset was assessed in each of the cohorts available as well as in the combined cohorts (see Tables 5.49 - 5.53). The results of the three DISC1 SNPs (rs821616, rs3738401 and rs6675281) are shown again in these tables, so that they can be assessed in this context.

The FEZ2 SNP rs848642 shows evidence of an association with risk of schizophrenia in the combined GAIN and non-GAIN schizophrenia dataset, but only in the uncorrected analysis (p-value = 0.040). In the non-GAIN schizophrenia cohort there is evidence of risk with the TRIP11 SNP rs1051340 (p-value = 0.032) but, again, only in the uncorrected analysis. No other SNPs reach significance in any of the tested datasets.

Table 5.49: Association Analysis of Epistasis Candidate SNPs from the Prey Subset in GAIN SCZ. Case-Control analysis of the SNP candidates for epistasis, tested in the GAIN schizophrenia cohort. Genotype columns give n (frequency).

SNP         Status   MAF    11            12            22           p-value  FDR p-value  OR (95% CI)
rs1673607   Case     0.446  438 (0.324)   620 (0.459)   293 (0.217)  0.765    0.937        1.016 (0.914, 1.131)
            Control  0.442  425 (0.308)   687 (0.499)   266 (0.193)
rs821616    Case     0.286  686 (0.508)   558 (0.413)   107 (0.079)  0.990    0.990        1.001 (0.890, 1.126)
            Control  0.286  697 (0.506)   575 (0.417)   106 (0.077)
rs3738401   Case     0.317  635 (0.470)   575 (0.426)   141 (0.104)  0.222    0.638        1.074 (0.958, 1.205)
            Control  0.302  668 (0.485)   588 (0.427)   122 (0.089)
rs6675281   Case     0.137  1012 (0.749)  309 (0.229)   30 (0.022)   0.059    0.638        0.865 (0.744, 1.006)
            Control  0.155  990 (0.718)   350 (0.254)   38 (0.028)
rs848642    Case     0.323  625 (0.463)   579 (0.429)   147 (0.109)  0.200    0.638        1.078 (0.961, 1.208)
            Control  0.307  669 (0.485)   572 (0.415)   137 (0.099)
rs1544655   Case     0.270  725 (0.537)   523 (0.387)   103 (0.076)  0.307    0.638        1.065 (0.944, 1.201)
            Control  0.258  763 (0.554)   520 (0.377)   95 (0.069)
rs2287104   Case     0.312  632 (0.468)   596 (0.441)   123 (0.091)  0.271    0.638        1.067 (0.951, 1.197)
            Control  0.298  676 (0.491)   583 (0.423)   119 (0.086)
rs11718848  Case     0.385  515 (0.381)   632 (0.468)   204 (0.151)  0.852    0.937        1.010 (0.906, 1.127)
            Control  0.382  511 (0.371)   680 (0.493)   187 (0.136)
rs10255061  Case     0.286  695 (0.514)   539 (0.399)   117 (0.087)  0.348    0.638        1.058 (0.940, 1.191)
            Control  0.275  720 (0.522)   559 (0.406)   99 (0.072)
rs1065674*  Case     0.213  837 (0.620)   453 (0.335)   61 (0.045)   0.781    0.937        0.982 (0.863, 1.117)
            Control  0.216  855 (0.620)   451 (0.327)   72 (0.052)
rs61744267  Case     0.124  1033 (0.765)  301 (0.223)   17 (0.013)   0.603    0.937        1.044 (0.888, 1.228)
            Control  0.119  1063 (0.771)  301 (0.218)   14 (0.010)

* Denotes a genotyped SNP, all others have been imputed.
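The FDR p-values in Table 5.49 are consistent with a Benjamini-Hochberg adjustment applied across the 11 SNPs tested in this cohort. The sketch below shows that adjustment on the unadjusted p-values from the table; treating this as the procedure actually used is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Unadjusted p-values from Table 5.49, in table order.
raw = [0.765, 0.990, 0.222, 0.059, 0.200, 0.307,
       0.271, 0.852, 0.348, 0.781, 0.603]
fdr = [round(p, 3) for p in benjamini_hochberg(raw)]
print(fdr)
```

Each adjusted value is the smallest value of p x m / rank over all tests with an equal or larger p-value, which is why several SNPs share the same FDR p-value.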


Table 5.50: Association Analysis of Epistasis Candidate SNPs from the Prey Subset in non-GAIN SCZ. Case-Control analysis of the SNP candidates for epistasis, tested in the non-GAIN schizophrenia cohort. Genotype columns give n (frequency).

SNP         Status   MAF    11            12            22           p-value  FDR p-value  OR (95% CI)
rs1673607   Case     0.432  363 (0.316)   579 (0.504)   207 (0.180)  0.115    0.276        0.914 (0.817, 1.022)
            Control  0.454  405 (0.301)   660 (0.490)   282 (0.209)
rs821616    Case     0.282  597 (0.520)   456 (0.397)   96 (0.084)   0.616    0.838        0.969 (0.857, 1.096)
            Control  0.288  691 (0.513)   535 (0.397)   121 (0.090)
rs3738401   Case     0.323  537 (0.467)   481 (0.419)   131 (0.114)  0.555    0.838        1.037 (0.920, 1.168)
            Control  0.316  629 (0.467)   586 (0.435)   132 (0.098)
rs6675281   Case     0.138  851 (0.741)   280 (0.244)   18 (0.016)   0.834    0.873        0.983 (0.837, 1.155)
            Control  0.140  995 (0.739)   328 (0.244)   24 (0.018)
rs848642    Case     0.319  541 (0.471)   483 (0.420)   125 (0.109)  0.111    0.276        1.103 (0.978, 1.244)
            Control  0.298  673 (0.500)   545 (0.405)   129 (0.096)
rs1544655   Case     0.252  647 (0.563)   424 (0.369)   78 (0.068)   0.096    0.276        0.898 (0.791, 1.019)
            Control  0.273  702 (0.521)   554 (0.411)   91 (0.068)
rs2287104   Case     0.299  564 (0.491)   483 (0.420)   102 (0.089)  0.326    0.652        0.941 (0.834, 1.062)
            Control  0.312  618 (0.459)   618 (0.459)   111 (0.082)
rs11718848  Case     0.386  428 (0.372)   554 (0.482)   167 (0.145)  0.873    0.873        0.991 (0.884, 1.111)
            Control  0.389  490 (0.364)   667 (0.495)   190 (0.141)
rs10255061  Case     0.264  621 (0.540)   449 (0.391)   79 (0.069)   0.628    0.838        0.969 (0.855, 1.099)
            Control  0.270  726 (0.539)   514 (0.382)   107 (0.079)
rs1065674*  Case     0.228  677 (0.589)   420 (0.366)   52 (0.045)   0.069    0.276        1.133 (0.990, 1.297)
            Control  0.207  862 (0.640)   413 (0.307)   72 (0.053)
rs1051340   Case     0.334  527 (0.459)   476 (0.414)   146 (0.127)  0.032    0.276        1.139 (1.011, 1.283)
            Control  0.306  642 (0.477)   586 (0.435)   119 (0.088)
rs61744267  Case     0.176  787 (0.685)   320 (0.279)   42 (0.037)   0.827    0.873        0.984 (0.850, 1.138)
            Control  0.178  912 (0.677)   390 (0.290)   45 (0.033)

* Denotes a genotyped SNP, all others have been imputed.


Table 5.51: Association Analysis of Epistasis Candidate SNPs from the Prey Subset in Combined SCZ. Case-Control analysis of the SNP candidates for epistasis, tested in the combined GAIN and non-GAIN schizophrenia cohort. Genotype columns give n (frequency).

SNP         Status   MAF    11            12            22           p-value  FDR p-value  OR (95% CI)
rs1673607   Case     0.440  801 (0.320)   1199 (0.480)  500 (0.200)  0.385    0.705        0.966 (0.895, 1.044)
            Control  0.448  830 (0.305)   1347 (0.494)  548 (0.201)
rs821616    Case     0.273  345 (0.532)   252 (0.389)   51 (0.079)   0.408    0.793        0.937 (0.802, 1.094)
            Control  0.286  512 (0.500)   439 (0.428)   74 (0.072)
rs3738401   Case     0.320  1172 (0.469)  1056 (0.422)  272 (0.109)  0.211    0.667        1.054 (0.971, 1.145)
            Control  0.309  1297 (0.476)  1174 (0.431)  254 (0.093)
rs6675281   Case     0.137  1863 (0.745)  589 (0.236)   48 (0.019)   0.138    0.667        0.920 (0.824, 1.027)
            Control  0.147  1985 (0.728)  678 (0.249)   62 (0.023)
rs848642    Case     0.321  1166 (0.466)  1062 (0.425)  272 (0.109)  0.040    0.439        1.091 (1.004, 1.185)
            Control  0.303  1342 (0.492)  1117 (0.410)  266 (0.098)
rs1544655   Case     0.262  1372 (0.549)  947 (0.379)   181 (0.072)  0.683    0.944        0.982 (0.900, 1.071)
            Control  0.265  1465 (0.538)  1074 (0.394)  186 (0.068)
rs2287104   Case     0.306  1196 (0.478)  1079 (0.432)  225 (0.090)  0.909    0.992        1.005 (0.925, 1.092)
            Control  0.305  1294 (0.475)  1201 (0.441)  230 (0.084)
rs11718848  Case     0.386  943 (0.377)   1186 (0.474)  371 (0.148)  0.992    0.992        1.000 (0.925, 1.082)
            Control  0.386  1001 (0.367)  1347 (0.494)  377 (0.138)
rs10255061  Case     0.276  1316 (0.526)  988 (0.395)   196 (0.078)  0.687    0.944        1.018 (0.934, 1.109)
            Control  0.273  1446 (0.531)  1073 (0.394)  206 (0.076)
rs1065674*  Case     0.220  1514 (0.606)  873 (0.349)   113 (0.045)  0.295    0.667        1.051 (0.957, 1.154)
            Control  0.211  1717 (0.630)  864 (0.317)   144 (0.053)
rs1051340   Case     0.298  1278 (0.511)  955 (0.382)   267 (0.107)  0.303    0.667        1.045 (0.961, 1.137)
            Control  0.289  1411 (0.518)  1055 (0.387)  259 (0.095)
rs61744267  Case     0.148  1820 (0.728)  621 (0.248)   59 (0.024)   0.927    0.992        0.995 (0.893, 1.108)
            Control  0.148  1975 (0.725)  691 (0.254)   59 (0.022)

* Denotes a genotyped SNP, all others have been imputed.


Table 5.52: Association Analysis of Epistasis Candidate SNPs from the Prey Subset in BPD. Case-Control analysis of the SNP candidates for epistasis, tested in the GAIN bipolar disorder cohort. Genotype columns give n (frequency).

SNP         Status   MAF    11            12            22           p-value  FDR p-value  OR (95% CI)
rs1673607   Case     0.461  185 (0.285)   328 (0.506)   135 (0.208)  0.195    0.777        1.097 (0.954, 1.261)
            Control  0.439  321 (0.313)   509 (0.497)   195 (0.190)
rs821616    Case     0.273  345 (0.532)   252 (0.389)   51 (0.079)   0.408    0.817        0.937 (0.802, 1.094)
            Control  0.286  512 (0.500)   439 (0.428)   74 (0.072)
rs3738401   Case     0.317  307 (0.474)   271 (0.418)   70 (0.108)   0.324    0.777        1.079 (0.928, 1.254)
            Control  0.301  500 (0.488)   433 (0.422)   92 (0.090)
rs6675281   Case     0.140  476 (0.735)   163 (0.252)   9 (0.014)    0.490    0.839        0.932 (0.764, 1.137)
            Control  0.148  743 (0.725)   260 (0.254)   22 (0.021)
rs848642    Case     0.309  302 (0.466)   292 (0.451)   54 (0.083)   0.818    0.896        1.018 (0.875, 1.183)
            Control  0.305  502 (0.490)   421 (0.411)   102 (0.100)
rs1544655   Case     0.277  334 (0.515)   269 (0.415)   45 (0.069)   0.293    0.777        1.088 (0.930, 1.272)
            Control  0.261  567 (0.553)   382 (0.373)   76 (0.074)
rs2287104   Case     0.343  273 (0.421)   306 (0.472)   69 (0.106)   0.869    0.896        0.988 (0.853, 1.144)
            Control  0.345  438 (0.427)   466 (0.455)   121 (0.118)
rs11718848  Case     0.374  261 (0.403)   289 (0.446)   98 (0.151)   0.737    0.896        0.9757 (0.845, 1.126)
            Control  0.380  382 (0.373)   507 (0.495)   136 (0.133)
rs10255061  Case     0.269  344 (0.531)   260 (0.401)   44 (0.068)   0.841    0.896        1.016 (0.868, 1.189)
            Control  0.265  547 (0.534)   412 (0.402)   66 (0.064)
rs1065674*  Case     0.196  413 (0.637)   216 (0.333)   19 (0.029)   0.119    0.712        0.872 (0.734, 1.036)
            Control  0.219  635 (0.620)   332 (0.324)   58 (0.057)
rs1051340   Case     0.304  321 (0.495)   260 (0.401)   67 (0.103)   0.113    0.712        0.886 (0.763, 1.029)
            Control  0.330  468 (0.457)   437 (0.426)   120 (0.117)
rs61744267  Case     0.182  444 (0.685)   172 (0.265)   32 (0.049)   0.896    0.896        0.988 (0.825, 1.183)
            Control  0.184  684 (0.667)   305 (0.298)   36 (0.035)

* Denotes a genotyped SNP, all others have been imputed.


Table 5.53: Association Analysis of Epistasis Candidate SNPs from the Prey Subset in Combined BPD and SCZ. Case-Control analysis of the SNP candidates for epistasis, tested in the combined GAIN bipolar disorder, GAIN schizophrenia and non-GAIN schizophrenia cohort. Genotype columns give n (frequency).

SNP         Status   MAF    11            12            22           p-value  FDR p-value  OR (95% CI)
rs1673607   Case     0.444  986 (0.313)   1527 (0.485)  635 (0.202)  0.663    0.810        0.984 (0.915, 1.058)
            Control  0.448  830 (0.305)   1347 (0.494)  548 (0.201)
rs821616    Case     0.282  1628 (0.517)  1266 (0.402)  254 (0.081)  0.533    0.810        0.975 (0.900, 1.056)
            Control  0.287  1388 (0.509)  1110 (0.407)  227 (0.083)
rs3738401   Case     0.319  1479 (0.470)  1327 (0.422)  342 (0.109)  0.209    0.591        1.051 (0.972, 1.137)
            Control  0.309  1297 (0.476)  1174 (0.431)  254 (0.093)
rs6675281   Case     0.138  2339 (0.743)  752 (0.239)   57 (0.018)   0.137    0.591        0.924 (0.833, 1.025)
            Control  0.147  1985 (0.728)  678 (0.249)   62 (0.023)
rs848642    Case     0.319  1468 (0.466)  1354 (0.430)  326 (0.104)  0.061    0.591        1.078 (0.997, 1.166)
            Control  0.303  1342 (0.492)  1117 (0.410)  266 (0.098)
rs1544655   Case     0.265  1706 (0.542)  1216 (0.386)  226 (0.072)  0.962    0.962        0.998 (0.919, 1.083)
            Control  0.265  1465 (0.538)  1074 (0.394)  186 (0.068)
rs2287104   Case     0.313  1469 (0.467)  1385 (0.440)  294 (0.093)  0.315    0.612        1.041 (0.963, 1.126)
            Control  0.305  1294 (0.475)  1201 (0.441)  230 (0.084)
rs11718848  Case     0.383  1204 (0.382)  1475 (0.469)  469 (0.149)  0.803    0.891        0.991 (0.919, 1.067)
            Control  0.386  1001 (0.367)  1347 (0.494)  377 (0.138)
rs10255061  Case     0.275  1660 (0.527)  1248 (0.396)  240 (0.076)  0.810    0.891        1.010 (0.931, 1.096)
            Control  0.273  1446 (0.531)  1073 (0.394)  206 (0.076)
rs1065674*  Case     0.215  1927 (0.612)  1089 (0.346)  132 (0.042)  0.642    0.810        1.021 (0.935, 1.116)
            Control  0.211  1717 (0.630)  864 (0.317)   144 (0.053)
rs1051340   Case     0.299  1599 (0.508)  1215 (0.386)  334 (0.106)  0.215    0.591        1.052 (0.971, 1.139)
            Control  0.289  1411 (0.518)  1055 (0.387)  259 (0.095)
rs61744267  Case     0.155  2264 (0.719)  793 (0.252)   91 (0.029)   0.334    0.612        1.051 (0.950, 1.163)
            Control  0.148  1975 (0.725)  691 (0.254)   59 (0.022)

* Denotes a genotyped SNP, all others have been imputed.

5.2.7 Assessment of Epistasis - Prey Subset Genes

Each of the chosen SNPs from the prey gene subset was tested for epistatic interaction with each of the three DISC1 SNPs in the five datasets (GAIN SCZ, non-GAIN SCZ, combined SCZ, GAIN BPD, and combined BPD and SCZ), as described in Section 5.2.4.

Again, full results for each SNP x SNP interaction that are not shown here in the text can be found in Appendix E.

5.2.7.1 GAIN Schizophrenia

A summary of the regression analysis for all tested SNP x SNP combinations in the GAIN schizophrenia dataset is shown in Table 5.54.

Table 5.54: Summary Of Regression Analyses for Prey Subset SNPs in GAIN SCZ. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs from the prey subset in the GAIN schizophrenia dataset. The interaction model was tested against the null model.

                      rs821616                rs3738401               rs6675281
Gene    SNP           χ²      df  p-value     χ²      df  p-value     χ²      df  p-value
CEP70   rs1673607     3.08    4   0.545       2.94    4   0.569       3.57    4   0.467
FEZ2    rs848642      5.62    4   0.229       6.39    4   0.172       1.71    4   0.789
        rs1544655     4.55    4   0.337       5.04    4   0.284       2.09    4   0.720
        rs2287104     3.70    4   0.448       2.23    4   0.693       2.18    4   0.703
GOLGA4  rs11718848    2.01    4   0.734       2.42    4   0.659       9.51    4   0.049
IQUB    rs10255061    1.98    4   0.739       13.41   4   0.009       1.64    1   0.200
NUP133  rs1065674*    6.10    4   0.192       3.43    4   0.489       0.06    1   0.811
TRP     rs61744267    2.13    4   0.713       2.40    4   0.663       2.64    1   0.104

* Denotes a genotyped SNP, all others have been imputed.

A significant interaction was observed between DISC1 rs3738401 and IQUB rs10255061 (χ² = 13.41, p-value = 0.009) in this cohort (full results shown in Table 5.55).
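As a reading aid for the summary tables, the tabulated p-values can be recovered from each test statistic and its degrees of freedom. The sketch below is illustrative only: it assumes each statistic is referred to a chi-squared distribution with the tabulated df (the model fitting itself is described in Section 5.2.4 and is not reproduced here), and the function name `chi2_sf` is a hypothetical helper, not part of the original analysis. It relies on the closed forms that exist for df = 1 and df = 4.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-squared statistic (df = 1 or 4 only)."""
    if df == 1:
        # For 1 df the statistic is a squared standard normal deviate.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 4:
        # Closed form of the chi-squared survival function for 4 df.
        return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)
    raise ValueError("this sketch only handles df = 1 and df = 4")


# DISC1 rs3738401 x IQUB rs10255061 in GAIN SCZ (statistic 13.41, df = 4)
print(f"{chi2_sf(13.41, 4):.3f}")  # prints 0.009
```

The same call with df = 1 covers the rows of the summary tables that report a single degree of freedom.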

Table 5.55: χ² Analysis for DISC1 rs3738401 x IQUB rs10255061 Interaction in GAIN SCZ. Summary of the χ² analysis of individuals from the GAIN schizophrenia dataset for DISC1 rs3738401 x IQUB rs10255061, normalised to the major allele homozygotes.

DISC1/IQUB genotype     Controls, n (freq.)   Cases, n (freq.)   χ²     p-value   OR [95% CI]
G,G/C,C                 329 (0.239)           341 (0.252)        -      -         1.00
G,G/C,T                 280 (0.203)           244 (0.181)        2.21   0.137     1.19 [0.95, 1.50]
G,G/T,T                 59 (0.043)            50 (0.037)         0.95   0.331     1.22 [0.81, 1.84]
G,A/C,C                 326 (0.237)           282 (0.209)        2.60   0.107     1.20 [0.96, 1.49]
G,A/C,T                 226 (0.164)           242 (0.179)        0.07   0.787     0.97 [0.76, 1.23]
G,A/T,T                 36 (0.026)            51 (0.038)         1.84   0.175     0.73 [0.47, 1.15]
A,A/C,C                 65 (0.047)            72 (0.053)         0.13   0.723     0.94 [0.65, 1.35]
A,A/C,T                 53 (0.038)            53 (0.039)         0.03   0.864     1.04 [0.69, 1.56]
A,A/T,T                 4 (0.003)             16 (0.012)         6.59   0.010     0.26 [0.09, 0.78]
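To make the per-genotype rows of these tables easier to check, the following sketch compares one genotype combination against the major allele homozygote reference using a 2x2 table. It is not the original analysis code: it assumes an uncorrected Pearson chi-squared test and a Wald confidence interval, and it orients the odds ratio as controls relative to cases, because that orientation reproduces the tabulated values for the rows checked. The function name `genotype_vs_reference` is hypothetical.

```python
import math


def genotype_vs_reference(ctrl, case, ref_ctrl, ref_case, z=1.96):
    """Compare one genotype combination with the reference combination.

    Works on the 2x2 table [[ref_ctrl, ref_case], [ctrl, case]] and
    returns (chi2, p_value, odds_ratio, ci_low, ci_high).
    """
    n = ctrl + case + ref_ctrl + ref_case
    # Pearson chi-squared for a 2x2 table, without continuity correction.
    numerator = n * (ref_ctrl * case - ref_case * ctrl) ** 2
    denominator = (
        (ref_ctrl + ref_case) * (ctrl + case)
        * (ref_ctrl + ctrl) * (ref_case + case)
    )
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 df
    # Odds ratio oriented as controls relative to cases (assumption).
    odds_ratio = (ctrl * ref_case) / (case * ref_ctrl)
    # Wald interval on the log odds ratio.
    se = math.sqrt(1 / ctrl + 1 / case + 1 / ref_ctrl + 1 / ref_case)
    ci_low = math.exp(math.log(odds_ratio) - z * se)
    ci_high = math.exp(math.log(odds_ratio) + z * se)
    return chi2, p_value, odds_ratio, ci_low, ci_high


# G,G/C,T row of Table 5.55 against the G,G/C,C reference row.
chi2, p, odds, lo, hi = genotype_vs_reference(280, 244, 329, 341)
print(f"{chi2:.2f} {p:.3f} {odds:.2f} [{lo:.2f}, {hi:.2f}]")
```

With this orientation, an odds ratio above 1 corresponds to a genotype combination that is relatively more frequent in controls than in cases.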

There was also evidence of an overall interaction between DISC1 rs6675281 and GOLGA4 rs11718848 in the GAIN schizophrenia cohort, with χ² = 9.51 and p-value = 0.049 (see Table 5.56 for full results).

Table 5.56: χ² Analysis for DISC1 rs6675281 x GOLGA4 rs11718848 Interaction in GAIN SCZ. Summary of the χ² analysis of individuals from the GAIN schizophrenia dataset for DISC1 rs6675281 x GOLGA4 rs11718848, normalised to the major allele homozygotes.

DISC1/GOLGA4 genotype   Controls, n (freq.)   Cases, n (freq.)   χ²     p-value   OR [95% CI]
C,C/C,C                 348 (0.253)           386 (0.286)        -      -         1.00
C,C/C,A                 505 (0.366)           478 (0.354)        2.64   0.104     1.17 [0.97, 1.42]
C,C/A,A                 137 (0.099)           148 (0.110)        0.04   0.850     1.03 [0.78, 1.35]
C,T/C,C                 145 (0.105)           116 (0.086)        5.11   0.024     1.39 [1.04, 1.84]
C,T/C,A                 165 (0.120)           140 (0.104)        3.85   0.050     1.31 [1.00, 1.71]
C,T/A,A                 40 (0.029)            53 (0.039)         0.64   0.423     0.84 [0.54, 1.29]
T,T/C,C                 18 (0.013)            13 (0.010)         1.35   0.245     1.54 [0.74, 3.18]
T,T/C,A                 10 (0.007)            14 (0.010)         0.31   0.579     0.79 [0.35, 1.81]
T,T/A,A                 10 (0.007)            3 (0.002)          4.46   0.035     3.70 [1.01, 13.54]

5.2.7.2 non-GAIN Schizophrenia

The logistic regression summary results for all interactions in the non-GAIN schizophrenia cohort are shown in Table 5.57. In the analysis of the non-GAIN schizophrenia dataset, a single SNP x SNP combination showed a significant interaction (see Table 5.58): DISC1 rs3738401 and CEP70 rs1673607 (χ² = 10.39, p-value = 0.034).

Table 5.57: Summary Of Regression Analyses for Prey Subset SNPs in non-GAIN SCZ. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs from the prey subset in the non-GAIN schizophrenia dataset. The interaction model was tested against the null model.

                      rs821616                rs3738401               rs6675281
Gene    SNP           χ²      df  p-value     χ²      df  p-value     χ²      df  p-value
CEP70   rs1673607     5.18    4   0.269       10.39   4   0.034       8.60    1   0.072
FEZ2    rs848642      3.13    4   0.536       6.78    4   0.148       0.01    1   0.906
        rs1544655     4.21    4   0.378       2.41    4   0.661       1.24    1   0.265
        rs2287104     3.42    4   0.491       2.27    4   0.686       4.42    4   0.352
GOLGA4  rs11718848    3.18    4   0.528       1.14    4   0.889       5.51    4   0.507
IQUB    rs10255061    8.27    4   0.082       1.00    4   0.910       3.94    4   0.414
NUP133  rs1065674*    7.78    4   0.100       1.14    4   0.887       1.06    1   0.900
TRIP11  rs1051340     5.50    4   0.240       2.52    4   0.641       0.04    1   0.852
TRP     rs61744267    6.89    4   0.142       1.26    4   0.869       0.14    1   0.708

* Denotes a genotyped SNP, all others have been imputed.

Table 5.58: χ² Analysis for DISC1 rs3738401 x CEP70 rs1673607 Interaction in non-GAIN SCZ. Summary of the χ² analysis of individuals from the non-GAIN schizophrenia dataset for DISC1 rs3738401 x CEP70 rs1673607, normalised to the major allele homozygotes.

DISC1/CEP70 genotype    Controls, n (freq.)   Cases, n (freq.)   χ²     p-value   OR [95% CI]
G,G/T,T                 188 (0.140)           168 (0.146)        -      -         1.00
G,G/T,C                 294 (0.218)           278 (0.242)        0.17   0.676     0.95 [0.73, 1.23]
G,G/C,C                 147 (0.109)           91 (0.079)         4.65   0.031     1.44 [1.03, 2.02]
G,A/T,T                 186 (0.138)           150 (0.131)        0.45   0.501     1.11 [0.82, 1.49]
G,A/T,C                 287 (0.213)           242 (0.211)        0.18   0.673     1.06 [0.81, 1.39]
G,A/C,C                 113 (0.084)           89 (0.077)         0.51   0.476     1.13 [0.80, 1.61]
A,A/T,T                 31 (0.023)            45 (0.039)         3.62   0.057     0.62 [0.37, 1.02]
A,A/T,C                 79 (0.059)            59 (0.051)         0.79   0.375     1.20 [0.81, 1.78]
A,A/C,C                 22 (0.016)            27 (0.023)         1.08   0.299     0.73 [0.40, 1.33]

5.2.7.3 Combined Schizophrenia

Upon combination of the two schizophrenia cohorts, a single significant interaction was observed, between DISC1 rs3738401 and FEZ2 rs848642 (χ² = 10.65, p-value = 0.031). The remainder of the SNP x SNP interaction results for the combined schizophrenia cohort showed no evidence of epistasis (see Table 5.59). The full results of the analysis for the significant SNP x SNP combination are shown in Table 5.60.

Table 5.59: Summary Of Regression Analyses for Prey Subset SNPs in Combined SCZ. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs from the prey subset in the combined schizophrenia datasets. The interaction model was tested against the null model.

                      rs821616                rs3738401               rs6675281
Gene    SNP           χ²      df  p-value     χ²      df  p-value     χ²      df  p-value
CEP70   rs1673607     4.83    4   0.305       5.14    4   0.274       8.08    4   0.089
FEZ2    rs848642      1.44    4   0.838       10.65   4   0.031       1.38    4   0.849
        rs1544655     3.61    4   0.461       3.61    4   0.460       2.03    4   0.731
        rs2287104     3.23    4   0.520       2.40    4   0.663       3.70    4   0.449
GOLGA4  rs11718848    3.15    4   0.534       0.81    4   0.937       7.18    4   0.127
IQUB    rs10255061    4.08    4   0.395       8.06    4   0.089       7.07    4   0.132
NUP133  rs1065674*    8.31    4   0.081       3.39    4   0.495       3.41    4   0.491
TRP     rs61744267    3.44    4   0.486       2.07    4   0.722       0.93    1   0.336

* Denotes a genotyped SNP, all others have been imputed.

Table 5.60: χ² Analysis for DISC1 rs3738401 x FEZ2 rs848642 Interaction in Combined SCZ. Summary of the χ² analysis of individuals from the combined schizophrenia datasets for DISC1 rs3738401 x FEZ2 rs848642, normalised to the major allele homozygotes.

DISC1/FEZ2 genotype     Controls, n (freq.)   Cases, n (freq.)   χ²     p-value   OR [95% CI]
G,G/G,G                 619 (0.227)           551 (0.220)        -      -         1.00
G,G/G,A                 520 (0.191)           495 (0.198)        0.61   0.435     0.94 [0.79, 1.11]
G,G/A,A                 158 (0.058)           126 (0.050)        0.68   0.408     1.12 [0.86, 1.45]
G,A/G,G                 601 (0.221)           487 (0.195)        1.24   0.266     1.10 [0.93, 1.30]
G,A/G,A                 485 (0.178)           447 (0.179)        0.16   0.692     0.97 [0.81, 1.15]
G,A/A,A                 88 (0.032)            122 (0.049)        8.62   0.003     0.64 [0.48, 0.86]
A,A/G,G                 122 (0.045)           128 (0.051)        1.39   0.238     0.85 [0.65, 1.12]
A,A/G,A                 112 (0.041)           120 (0.048)        1.66   0.197     0.83 [0.63, 1.10]
A,A/A,A                 20 (0.007)            24 (0.010)         0.94   0.331     0.74 [0.41, 1.36]

This SNP x SNP interaction was not significant in either of the individual schizophrenia datasets (GAIN: χ² = 6.39, p-value = 0.172; non-GAIN: χ² = 6.78, p-value = 0.148); however, the increased power of the larger dataset was enough to reach significance (albeit uncorrected).

5.2.7.4 GAIN Bipolar Disorder

Summary results for all interactions tested in the GAIN bipolar disorder dataset are shown in Table 5.61.

Table 5.61: Summary Of Regression Analyses for Prey Subset SNPs in GAIN BPD. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs from the prey subset in the GAIN bipolar disorder dataset. The interaction model was tested against the null model.

                      rs821616                rs3738401               rs6675281
Gene    SNP           χ²      df  p-value     χ²      df  p-value     χ²      df  p-value
CEP70   rs1673607     2.06    4   0.724       4.76    4   0.313       2.66    4   0.689
FEZ2    rs848642      0.62    4   0.960       10.15   4   0.038       0.55    1   0.460
        rs1544655     4.97    4   0.291       1.30    4   0.861       0.00    1   0.982
        rs2287104     15.25   4   0.004       3.72    4   0.446       0.06    1   0.802
GOLGA4  rs11718848    1.59    4   0.810       5.46    4   0.243       7.17    4   0.127
IQUB    rs10255061    1.77    4   0.777       2.81    4   0.590       1.08    1   0.299
NUP133  rs1065674*    1.94    4   0.747       2.82    4   0.588       0.38    1   0.538
TRIP11  rs1051340     0.77    4   0.942       0.55    4   0.969       0.00    1   0.948
TRP     rs61744267    2.90    4   0.574       5.07    4   0.280       0.10    1   0.754

* Denotes a genotyped SNP, all others have been imputed.

The same interaction that was observed as significant in the combined schizophrenia cohort was also significant in the GAIN bipolar disorder cohort (DISC1 rs3738401 x FEZ2 rs848642: χ² = 10.15, p-value = 0.038). For full results see Table 5.62.

A second interaction, involving another FEZ2 SNP, was also significant in this dataset (DISC1 rs821616 x FEZ2 rs2287104: χ² = 15.25, p-value = 0.004). Full results for this analysis are shown in Table 5.63.

Table 5.62: χ² Analysis for DISC1 rs3738401 x FEZ2 rs848642 Interaction in GAIN BPD. Summary of the χ² analysis of individuals from the GAIN bipolar disorder dataset for DISC1 rs3738401 x FEZ2 rs848642, normalised to the major allele homozygotes.

DISC1/FEZ2 genotype     Controls, n (freq.)   Cases, n (freq.)   χ²     p-value   OR [95% CI]
G,G/G,G                 247 (0.241)           146 (0.225)        -      -         1.00
G,G/G,A                 189 (0.184)           141 (0.218)        2.33   0.127     0.79 [0.59, 1.07]
G,G/A,A                 64 (0.062)            20 (0.031)         5.43   0.020     1.89 [1.10, 3.25]
G,A/G,G                 216 (0.211)           127 (0.196)        0.00   0.972     1.01 [0.74, 1.36]
G,A/G,A                 186 (0.181)           117 (0.181)        0.16   0.693     0.94 [0.69, 1.28]
G,A/A,A                 31 (0.030)            27 (0.042)         1.89   0.169     0.68 [0.39, 1.18]
A,A/G,G                 39 (0.038)            29 (0.045)         0.74   0.388     0.79 [0.47, 1.34]
A,A/G,A                 46 (0.045)            34 (0.052)         0.81   0.369     0.80 [0.49, 1.30]
A,A/A,A                 7 (0.007)             7 (0.011)          0.95   0.329     0.59 [0.20, 1.72]

Table 5.63: χ² Analysis for DISC1 rs821616 x FEZ2 rs2287104 Interaction in GAIN BPD. Summary of the χ² analysis of individuals from the GAIN bipolar disorder dataset for DISC1 rs821616 x FEZ2 rs2287104, normalised to the major allele homozygotes.

DISC1/FEZ2 genotype     Controls, n (freq.)   Cases, n (freq.)   χ²     p-value   OR [95% CI]
A,A/C,C                 224 (0.219)           147 (0.227)        -      -         1.00
A,A/C,T                 231 (0.225)           154 (0.238)        0.01   0.916     0.98 [0.74, 1.32]
A,A/T,T                 57 (0.056)            44 (0.068)         0.51   0.474     0.85 [0.54, 1.33]
A,T/C,C                 192 (0.187)           102 (0.157)        1.70   0.192     1.24 [0.90, 1.70]
A,T/C,T                 189 (0.184)           132 (0.204)        0.16   0.689     0.94 [0.69, 1.27]
A,T/T,T                 58 (0.057)            18 (0.028)         6.88   0.009     2.11 [1.20, 3.73]
T,T/C,C                 22 (0.021)            24 (0.037)         2.67   0.103     0.60 [0.33, 1.11]
T,T/C,T                 46 (0.045)            20 (0.031)         2.06   0.151     1.51 [0.86, 2.65]
T,T/T,T                 6 (0.006)             7 (0.011)          1.06   0.304     0.56 [0.19, 1.71]

5.2.7.5 Combined Bipolar Disorder and Schizophrenia

A summary of all regression analyses in the combined bipolar disorder and schizophrenia data is shown in Table 5.64. The significance seen for the interaction between DISC1 rs3738401 and FEZ2 rs848642 in both the combined schizophrenia and GAIN bipolar disorder data increases further in this combined dataset (χ² = 13.87, p-value = 0.008; see Table 5.65).

Table 5.64: Summary Of Regression Analyses for Prey Subset SNPs in Combined BPD and SCZ. Summary of the regression analysis for interaction between DISC1 rs821616, rs3738401 and rs6675281 and each of the chosen SNPs from the prey subset in the combined bipolar disorder and schizophrenia datasets. The interaction model was tested against the null model.

                      rs821616                rs3738401               rs6675281
Gene    SNP           χ²      df  p-value     χ²      df  p-value     χ²      df  p-value
CEP70   rs1673607     3.35    4   0.500       3.96    4   0.411       6.39    4   0.172
FEZ2    rs848642      1.60    4   0.808       13.87   4   0.008       1.11    4   0.893
        rs1544655     2.86    4   0.581       3.98    4   0.408       2.69    4   0.610
        rs2287104     4.34    4   0.362       3.12    4   0.537       2.92    4   0.571
GOLGA4  rs11718848    2.58    4   0.631       0.89    4   0.926       6.92    4   0.140
IQUB    rs10255061    2.86    4   0.581       6.78    4   0.148       6.73    4   0.151
NUP133  rs1065674*    6.10    4   0.192       2.93    4   0.570       4.22    4   0.377
TRP     rs61744267    3.25    4   0.515       5.09    4   0.278       0.69    1   0.406

* Denotes a genotyped SNP, all others have been imputed.

Table 5.65: χ² Analysis for DISC1 rs3738401 x FEZ2 rs848642 Interaction in Combined BPD and SCZ. Summary of the χ² analysis of individuals from the combined bipolar disorder and schizophrenia datasets for DISC1 rs3738401 x FEZ2 rs848642, normalised to the major allele homozygotes.

DISC1/FEZ2 genotype     Controls, n (freq.)   Cases, n (freq.)   χ²     p-value   OR [95% CI]
G,G/G,G                 619 (0.227)           697 (0.221)        -      -         1.00
G,G/G,A                 520 (0.191)           636 (0.202)        1.04   0.307     0.92 [0.79, 1.08]
G,G/A,A                 158 (0.058)           146 (0.046)        2.41   0.120     1.22 [0.95, 1.56]
G,A/G,G                 601 (0.221)           614 (0.195)        1.49   0.222     1.10 [0.94, 1.29]
G,A/G,A                 485 (0.178)           564 (0.179)        0.15   0.698     0.97 [0.82, 1.14]
G,A/A,A                 88 (0.032)            149 (0.047)        7.95   0.005     0.67 [0.50, 0.88]
A,A/G,G                 122 (0.045)           157 (0.050)        1.01   0.314     0.87 [0.67, 1.13]
A,A/G,A                 112 (0.041)           154 (0.049)        2.16   0.141     0.82 [0.63, 1.07]
A,A/A,A                 20 (0.007)            31 (0.010)         1.21   0.272     0.73 [0.41, 1.29]

5.3 Discussion

5.3.1 Validation of Published Epistatic Relationships

The interaction between DISC1 Ser704Cys and FEZ1 rs12224788 found by Kang et al. (2011) was not replicated in the non-GAIN dataset. In this dataset there is no increased (or decreased) risk for schizophrenia in association with the combined genotype at these SNPs. With this result not in support of the original findings, there was no reason to test the interaction in a combined GAIN and non-GAIN schizophrenia cohort.

The same conclusion is true of the DISC1 Ser704Cys by NDEL1 rs1391768 interaction in the combined GAIN and non-GAIN schizophrenia dataset. While there was some suggestion of increased risk of schizophrenia with the two loci in individuals carrying the Cys allele at DISC1 in combination with a GG genotype at the NDEL1 SNP (OR [95% CI] = 1.32 [1.04-1.67], uncorrected p-value = 0.021), there was no evidence of an overall interaction between these two SNPs in relation to schizophrenia. The pattern of increased risk found here was not the same as the increased risk for schizophrenia of G carriers at NDEL1 on a DISC1 Ser/Ser background that was identified in the original paper (Burdick et al., 2008).

The analysis undertaken here does not support epistatic interaction between these two genes and DISC1, though it does not exclude interaction either as the tests were not extended to include further SNPs within these genes.

5.3.2 Interacting Gene Set: Inflation and Association

Deviation from expected values (inflation) in a QQ plot is typically interpreted as evidence for stratification in the data being analysed; however, there is evidence that suggests the presence of inflation in a sample that contains no population stratification or cryptic relatedness may be reporting on a true polygenic effect (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014; Yang et al., 2011b).

Given that the data used in these analyses had undergone quality control and showed no sign of inflation at a genome-wide level (see Figure 3.4), any inflation (λ > 1.1) seen in a subset of genes in these datasets could be attributable to polygenic effects, though it is also possible that this sub-group shows inflation by chance.

In the confirmed published interacting gene set, there was suggestion of inflation in both the GAIN and non-GAIN schizophrenia cohorts, and although the inflation remained after combination of the two datasets it was reduced. This result suggests that any inflation seen in the individual cohorts was not consistent, that is, the contributing genes or SNPs were different in each cohort. The increase in power gained by the addition of the GAIN bipolar disorder data to create a psychiatric illness dataset saw some increase in inflation when compared to the combined schizophrenia data; however, the inflation present in this psychiatric illness dataset is still below what was seen for the GAIN schizophrenia data alone. There was minimal evidence for inflation in the GAIN bipolar disorder data, and only in the female analysis.

With the addition of the interacting proteins that were confirmed by yeast two-hybrid analysis in both the literature and the current study (Wellcome Trust Advanced Course included), the inflation values across all individual datasets were reduced (though still above 1.1 in both schizophrenia datasets in the combined gender analyses), and in the combined datasets there was some increase in inflation. In the combined schizophrenia dataset, the male only analysis remained constant while both the female and combined gender inflation scores increased. In the dataset which included the bipolar disorder cohort, all three inflation factors were increased by the addition of the interacting partners confirmed through this study.

Assessment of the same genes in the publicly available data provided by the PGC (http://www.med.unc.edu/pgc/results, previously described in Chapter 3) lends support to the results found in this study. The analysis of the PGC data is limited to combined genders only and does not include those genes found on the X chromosome, as these data are not available. For the confirmed published interaction set (including those interactions confirmed by the current study) the schizophrenia cohort gave λ = 1.63 (Figure 5.16 A). This result is consistent with inflation and replicates (albeit more strongly) what was found in the current study. In the analysis of the prey subset in the PGC schizophrenia data λ = 1.11 (Figure 5.16 B); again this is consistent with what was found in the current study, with very marginal inflation seen. For the confirmed published interactions set in the PGC bipolar disorder cohort λ = 1.15 (Figure 5.16 C). This slight inflation was not seen in the current study, with all λ < 1.05. In the prey subset the PGC bipolar data gave λ = 0.95 (Figure 5.16 D), a result consistent with no inflation, which again replicates the findings for combined genders in the current study.

Figure 5.16: Q-Q Plots Of Psychiatric Genomics Consortium Data. The observed versus expected -log10(P) in the PGC data for the A) combined published interaction gene set in schizophrenia (n=6454), B) prey subset in schizophrenia (n=6567), C) combined published interaction gene set in bipolar disorder (n=10652), D) prey subset in bipolar disorder (n=10837). Note that the scale of the observed values (y-axis) is not always consistent. The number of SNPs (n) included in each plot is shown in parentheses.

These results suggest that at least some of the confirmed interacting partners may be playing a role in the development of schizophrenia, a result that was expected based on the literature surrounding some of these confirmed interactions (as was discussed in Chapter 1). Interestingly, the addition of genes identified in the current study, which were previously unconfirmed interacting partners of DISC1, increased the inflation, suggesting that they too (at least in part) are playing some role in the development of psychiatric illness.

No individual SNPs within any of the selected genes showed association with disease. Results of this nature seem to support the theory of ‘many small variants of minor effect’ in the development of schizophrenia and bipolar disorder. As was discussed in Chapter 3, deep sequencing methods and larger cohorts should provide more power to combat this issue in the future, but at this stage it is perhaps encouraging to see in which genes these rare variants may lie.

In the prey gene set chosen based on the primary cilia hypothesis there was less evidence for inflation in the datasets on the whole, with the combined gender analyses in both the combined schizophrenia cohort and the combined schizophrenia and bipolar disorder cohort showing inflation factors close to 1.00. However, both the mean χ² and a visual inspection of the Q-Q plots suggested that inflation was present; in fact, the mean χ² values seen for these cohorts (1.25 and 1.20 respectively) were the largest seen in any of the analyses conducted. This inconsistency between tests arose because the median χ², from which the inflation factor is calculated, was influenced by an unusual spread in the data. The largest inflation factor was seen in the GAIN bipolar disorder male only cohort, with some small inflation also seen for the females of this same dataset and for both genders in the non-GAIN schizophrenia cohorts. Results such as this show the importance of interpreting all evidence together before drawing conclusions.
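The difference between the two summaries can be illustrated with a small sketch. It assumes the conventional definition of the inflation factor as the median of the observed 1-df chi-squared statistics divided by the median of the chi-squared distribution with 1 df (approximately 0.4549), and compares it with the mean statistic, whose expected value under the null is 1. The function name and the five statistics are hypothetical and chosen only to show how a few large values can raise the mean while leaving the median, and hence the inflation factor, close to its null value.

```python
import statistics

# Median of the chi-squared distribution with 1 degree of freedom.
CHI2_DF1_MEDIAN = 0.4549364


def inflation_summary(chi2_stats):
    """Return (inflation factor, mean chi-squared) for 1-df test statistics."""
    lam = statistics.median(chi2_stats) / CHI2_DF1_MEDIAN
    mean_chi2 = statistics.mean(chi2_stats)
    return lam, mean_chi2


# Hypothetical statistics: mostly small values plus one large outlier.
toy_stats = [0.05, 0.20, 0.45, 0.90, 8.65]
lam, mean_chi2 = inflation_summary(toy_stats)
print(f"lambda = {lam:.3f}, mean chi2 = {mean_chi2:.2f}")
```

In this toy case the median-based factor stays just below 1 while the mean is roughly doubled, which is the kind of disagreement between summaries described above.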

Unlike in the previously discussed confirmed interacting datasets, there were individual SNPs that showed association in their respective datasets. The association in the GAIN bipolar data of three SNPs within the ATP6V1B2 region is contributing to the inflation seen in the male only analysis. One of these SNPs, intronic rs1106634, has been reported previously. A meta-analysis that contained the GAIN data found association of this SNP with bipolar disorder (p-value = 5.63 × 10⁻⁶), schizophrenia (p-value = 0.024) and with a combination of the two disorders (OR = 1.34, p-value = 3.97 × 10⁻⁶) (Wang et al., 2010b). A second meta-analysis shows an increased risk of major depressive disorder in combined gender analysis (OR = 1.30, p-value = 6.78 × 10⁻⁷) in a ‘broad’ phenotypic model (Shyn et al., 2011). The other two SNPs (intronic rs135253777 and upstream rs4922139) have not previously been reported on. These two SNPs are in high linkage disequilibrium (r² = 92), so it is likely that they are both reporting on the same variant.

There is also a pair of SNPs that reach corrected significance within the USO1 gene region in the combined schizophrenia cohort. One of these (rs324734) is an intronic SNP and the other (rs324702) is upstream of the gene. Neither has previously been identified as associated with any psychiatric disorder.

The fact that there was some inflation seen for the prey gene subset is promising. The gene set may not necessarily have an affiliation to the primary cilia (although enrichment analysis suggests that it may), but the inflation seen does suggest that there is some association with schizophrenia and bipolar disorder.

5.3.2.1 Enrichment Analysis

Enrichment analysis was conducted for the three sets of genes included in the inflation analysis. This analysis was undertaken in an effort to identify if there was any indication that the gene sets showing inflation had a functional enrichment for processes that may in some way be involved in the development of major psychiatric illness.

The gene sets used in these analyses were quite small (up to 35 genes) and so the results are perhaps not as reliable as enrichment conducted with a larger set of genes. When assessing only a small number of genes the significance of an enrichment may be higher by chance, because it is easier to fit a high proportion of a small gene set into a single functional annotation than it is for a larger group. Taking this into consideration, the p-values provided by the analyses are not particularly robust, but the annotations given still provide some insight into the function of these gene sets.
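For orientation, one common way of scoring a functional enrichment is a one-sided hypergeometric (over-representation) test. The text does not state which test the enrichment tool applied, so the sketch below is an assumption made purely for illustration; the function name and all of the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p(hits, set_size, annotated, background):
    """One-sided hypergeometric p-value, P(X >= hits).

    `set_size` genes are drawn without replacement from `background`
    genes, of which `annotated` carry the functional term of interest.
    """
    total = comb(background, set_size)
    upper = min(set_size, annotated)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, set_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 5 of 35 genes carry a term that annotates
# 400 of 20000 background genes.
p_small_set = enrichment_p(5, 35, 400, 20000)
print(f"{p_small_set:.2e}")
```

With a set this small, moving a single gene in or out of an annotation changes the p-value considerably, which is one reason to treat such values as indicative rather than robust.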

There was enrichment seen in both the confirmed published interactant and complete gene sets for protein and molecule binding as well as for cytoskeleton and microtubule association. The addition of the interactions confirmed by yeast two-hybrid screen (i.e. the complete gene set) also showed an enrichment for processes involved with neurons and neurogenesis, while the confirmed published interactant set alone showed a stronger enrichment for functions involved with cell morphology and cell projections.

The enrichment for protein binding, microtubule association, cellular projections and organisation appears to be consistent with the hypothesis of primary cilia construction proposed as part of this study. However, other pathways and functions may also be indicated by this enrichment, including roles in brain development for the complete gene set.

Although the prey gene set was chosen to complement the hypothesis of primary cilia construction and maintenance, the enrichment of this gene set was also assessed. The enrichment found for protein transport and localisation, vesicles, the nuclear pore and cell projections is consistent with what was predicted of this gene set. The limitations of these results with respect to the p-values associated with the enriched functions should be considered when interpreting this result, though it is promising that the predicted functions were the most ‘significant’.

5.3.3 Epistatic Relationships with Confirmed Interactants

The analysis of epistasis between the three chosen DISC1 SNPs (rs821616, rs3738401 and rs6675281) and the set of SNPs chosen from the pool of confirmed interaction partners of DISC1 revealed some evidence that interplay between genes may contribute to the phenotype of psychiatric illness.

It must be noted at this point that none of the interactions tested were subjected to correction for multiple testing, and that if such correction were undertaken none of the associations identified would remain significant. Bonferroni correction would set the p-value threshold at 0.00015 for the 333 tests conducted; the lowest p-value found in these analyses was 0.003 (in the GAIN bipolar disorder dataset with AKAP9 rs6960867 and DISC1 rs3738401). There were a total of 16 significant interactions identified from the 333 tests conducted. These significant interactions were found across all five cohorts, in six of the 10 genes analysed and with all three of the DISC1 SNPs.
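The threshold quoted above follows directly from dividing the nominal significance level by the number of tests. The sketch below reproduces that arithmetic, assuming a nominal level of 0.05; it also reports the number of nominally significant results that would be expected by chance if all 333 tests were independent and no true interaction existed, which is a simplifying assumption because the tests share SNPs and cohorts. The function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05     # nominal significance level (assumed)
N_TESTS = 333    # number of SNP x SNP tests reported in the text

threshold = bonferroni_threshold(ALPHA, N_TESTS)
expected_by_chance = ALPHA * N_TESTS  # under a global null, independent tests

print(f"{threshold:.5f}")           # prints 0.00015
print(f"{expected_by_chance:.2f}")  # prints 16.65
```

The lowest p-value reported for this group of analyses (0.003) is therefore about twenty times larger than the corrected threshold.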

The gene and SNP that showed the most consistent association from this analysis, and is therefore of the most interest from the confirmed interactants, was SYNE1 rs214976. This SNP showed a marginally significant interaction with rs821616 in the GAIN schizophrenia dataset (p-value = 0.050), but more interestingly it showed significant interactions with DISC1 rs3738401 in all but the non-GAIN schizophrenia cohort (for full results of the non-GAIN data see Appendix E).

This lack of association in the non-GAIN schizophrenia cohort meant that there was a decrease in significance in the combined schizophrenia data (p-value = 0.013) compared to the GAIN schizophrenia data alone (p-value = 0.003). However, the addition of the GAIN bipolar disorder data (p-value = 0.049) did increase the significance of the interaction to p-value = 0.007. The genotype that appeared to lend the most influence to the interaction was the DISC1 homozygous minor (AA) in combination with the SYNE1 homozygous major (GG), which was significant in all cohorts except the non-GAIN schizophrenia cohort (p-value = 0.518). This trend of increasing significance for the overall interaction suggests that the results seen here may be real even though corrected significance is not reached. The trend may continue with the addition of more data to the analysis, and is perhaps worth pursuing if such data were available.

TRAF3IP1 rs12464423 is also of interest as it showed significance (again uncorrected) with both schizophrenia and bipolar disorder. This SNP showed significant interactions with DISC1 rs821616 in both the GAIN schizophrenia (p-value = 0.020) and the GAIN bipolar disorder (p-value = 0.020) datasets. Further, there was a trend towards uncorrected significance for the TRAF3IP1 SNP in a grouped allele analysis with DISC1 rs6675281 (p-value = 0.052).

5.3.3.1 SYNE1

The interaction between DISC1 and Spectrin Repeat containing, Nuclear Envelope 1 - SYNE1 was identified by yeast two-hybrid screen first by Morris et al. (2003) and then by Camargo et al. (2007). The interaction was additionally confirmed by affinity capture western (Morris et al., 2003). In the yeast two-hybrid screen of Morris et al. (2003) the interaction was detected in a human heart library but not in a human brain library. The yeast two-hybrid result of Camargo et al. (2007), however, was identified from a human fetal brain library, giving evidence that the protein is expressed in brain at least during fetal development.

The SYNE1 SNP reported here (rs214976) has not been previously identified to be associated with any kind of psychiatric disorder; however, the SYNE1 gene has been identified to be associated with bipolar disorder on several occasions (Xu et al., 2014; Psychiatric GWAS Consortium Bipolar Disorder Working Group, 2011b; Green et al., 2013; Liu et al., 2011).

A protein product of SYNE1 known as Nesprin-1 has been shown to be involved in ciliogenesis in mice. Dawe et al. (2009) report that the siRNA-mediated knockdown of this protein results in a significant decrease in the number of cells with a primary cilium. The group suggest that this loss of primary ciliation is due to a failure of the centrosome to migrate correctly during cellular polarisation. The apical-basal polarisation of the cells remains otherwise intact without the expression of Nesprin-1, suggesting that only the migration of the centrosome is affected by the knockdown and therefore implicating SYNE1 and its protein product in the construction of primary cilia. This phenotype of primary cilium loss with the knockdown of SYNE1 has since been replicated by a second group (Marley & von Zastrow, 2012).

5.3.3.2 TRAF3IP1

Tumor Necrosis Factor Receptor-Associated Factor 3 Interacting Protein 1 - TRAF3IP1 (also known as MIPT3 and IFT54) was also identified as one of the proteins to interact with DISC1 through yeast two-hybrid and co-immunoprecipitation assays by Morris et al. (2003). The same authors show that the binding of DISC1 to microtubule components is mediated by TRAF3IP1, and that when the N-terminus of TRAF3IP1 is truncated DISC1 cannot be found at the microtubules. Ling & Goeddel (2000) had previously identified that it was the N-terminus of the TRAF3IP1 protein that was necessary for interaction with microtubules. These findings show that involvement of the DISC1 protein in the construction of primary cilia is at least in part dependent on an interaction with TRAF3IP1.

Additionally, the orthologue of TRAF3IP1 in Caenorhabditis elegans, DYF-11, has been described as an intraflagellar transport protein (IFT) and was shown to have a role in ciliogenesis (Li et al., 2008a). Similarly, the TRAF3IP1 protein (or its homologue) has been found to be involved with ciliogenesis in zebrafish (elipsa) (Omori et al., 2008) and mice (Berbari et al., 2011). The interacting domains of TRAF3IP1 and DISC1 have been described by Morris et al. (2003) and are reconstructed in Figure 5.17.


Figure 5.17: TRAF3IP1-DISC1 Interacting Domain. The structure of TRAF3IP1 and DISC1 showing the interaction domain (red) and N-terminal domain (blue) as described by Morris et al. (2003). The numbers shown refer to amino acid locations. The locations of the two SNPs showing the epistatic relationship are also shown.

TRAF3IP1 and rs12464423 were part of a study conducted by Moens et al. (2011). In this study rs12464423 does not reach significance when tested for association with schizophrenia; however, another common SNP in the same gene (rs13398676) shows a significant increase in risk at a genotypic level with minor allele homozygotes (OR = 2.07, p-value = 0.008 uncorrected). The same group report that the rs12464423 SNP may cause a splicing effect but emphasise that more data are needed to confirm this prediction.

5.3.4 Epistatic Relationships with Prey Subset

Of the six genes tested in this subset, four showed at least one significant interaction. These interactions were seen at least once with each of the three DISC1 SNPs, but the majority were with rs3738401.

As was the case with the confirmed interactant protein list, none of the SNP x SNP interactions that were identified from the prey subset reached corrected significance. The Bonferroni correction threshold for this group of analyses is a p-value of < 0.0004 (based on the 126 tests conducted). The closest any analysis came to this was that between DISC1 rs821616 and FEZ2 rs2287104, which achieved a p-value of 0.004.
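The threshold quoted above can be reproduced with a short calculation. The following is a minimal sketch, assuming a family-wise significance level of 0.05 (the level used for significance elsewhere in this thesis); the function name is illustrative only and is not taken from the analysis pipeline used in this study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 and the 126 tests reported in the text.
threshold = bonferroni_threshold(alpha=0.05, n_tests=126)

# Smallest p-value reported for the prey subset
# (DISC1 rs821616 x FEZ2 rs2287104).
closest_p = 0.004

print(f"Bonferroni threshold: {threshold:.6f}")
print(f"Closest p-value passes correction: {closest_p < threshold}")
```

On these figures the smallest observed p-value is roughly ten times larger than the corrected threshold.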

A trend similar to that seen with SYNE1 in the confirmed gene set was apparent here with the FEZ2 SNP rs848642 in combination with DISC1 rs3738401. Neither of the individual schizophrenia cohorts showed a significant interaction (GAIN schizophrenia: p-value = 0.172; non-GAIN schizophrenia: p-value = 0.148), but once combined into a single larger cohort significance is reached (p-value = 0.031). A significant interaction is seen in the GAIN bipolar disorder dataset on its own (p-value = 0.038) and the combination of all three datasets gives the most significant association (p-value = 0.008). Again, this could suggest that the interaction is real and may be strengthened in a larger cohort.

5.3.4.1 FEZ2

Fasciculation and Elongation protein Zeta 2 (zygin II), FEZ2, is a paralogue of FEZ1, and the FEZ1 and FEZ2 gene constructs are both considered orthologues of the C. elegans unc-76 gene. FEZ2 was first identified as an interaction partner of DISC1 in the current study, with the protein being pulled as a prey from the yeast two-hybrid screen on four independent occasions (twice in the WTAC 2010 analysis and twice in the analysis conducted as part of this research project).

It is apparent from the literature that little is known about FEZ2 specifically and independently of FEZ1. Fujita et al. (2004) compared the mRNA expression of the two gene products and found differences in both spatial and temporal expression patterns. FEZ1 expression is found exclusively in the brain of rats, with a spike in expression levels in 11-day-old embryos; this expression level gradually decreases over the next few days. In comparison, the FEZ2 mRNA showed expression in the heart, liver, lung, skeletal muscle and kidney tissues as well as in the brain. This more ubiquitous expression was at much lower levels than that seen for FEZ1. The levels of FEZ2 expression remain constant in rat embryos from day seven to day 17, and the levels are higher during this time than in adult tissues, suggesting that the FEZ2 protein product plays a role in development.

In a single yeast two-hybrid screen using FEZ1, FEZ2 and unc-76 proteins as baits, it was discovered that there was almost complete overlap in the preys identified for FEZ1 and unc-76 and that all of these interactions were also observed with FEZ2. There were, however, additional preys that exclusively showed interaction with FEZ2 (Alborghetti et al., 2010). The authors suggest that, because FEZ2 is able to bind all FEZ1 interactants, there may be a compensatory mechanism for lost FEZ1 expression. Fujita et al. (2004) also provide evidence that the function of FEZ2 is similar to that of FEZ1 but that it extends beyond neuronal tissues.

FEZ1 has been shown to have a role in ciliogenesis (Marley & von Zastrow, 2012) and both FEZ1 and FEZ2 have been shown to interact with NEK1. NEK1 is known to play a role in the development of polycystic kidney disease (Surpili et al., 2003), which is classified as a ciliopathy. Given this evidence it seems reasonable to consider FEZ2 as a part of the protein network responsible for the construction and maintenance of the primary cilium. There are no previously reported associations between any SNPs within this gene and psychiatric illness.

5.3.5 Limitations

The data available for use in this study were limited to SNPs genotyped on the Affymetrix 6.0 chip; as a result, the majority of the SNPs used in these analyses were imputed. Consideration was given to using tag SNPs rather than imputed data; however, not all of the SNPs chosen for analysis had genotyped proxies that could be substituted, and it was therefore decided for consistency that all non-genotyped SNPs would be imputed. Imputed SNP data are not as accurate as genotyped data, but the ability to impute genotypes has improved with the increase in size of the available reference sets used to infer the linkage disequilibrium patterns necessary for imputation.

All SNPs that were included in this study had imputed minor allele frequencies that were consistent with those that were expected based on the reported European minor allele frequencies of the 1000 Genomes project. The method used in this study has been accepted as reasonable for the imputation of common SNPs but as inadequate for the imputation of rare variants (Wang et al., 2012), lending more weight to the decision to include only SNPs with minor allele frequencies of 10% or greater.

The number of interactions tested in this study was relatively low when the number of possible SNPs to be included is considered. The number of SNPs tested was, however, considered to be the maximum possible with the available data. The use of imputation increased the amount of data greatly from the number of SNPs that were genotyped in the datasets. With a larger dataset and a higher coverage of the genome (either in the data used or in the reference set used for imputation) the number of SNPs, and therefore interactions, could be extended to include rarer variants.

5.3.5.1 Power

The power of the analyses conducted here (shown in Figure 5.18) suggests that interactions with effect sizes of those found by Burdick et al. (2008) and Kang et al. (2011) should have been detectable in the cohorts used in this study, and therefore the inability to replicate these findings in this dataset cannot be ascribed to small sample size.

Given the number of genotypes that are possible when analysing an interaction between two SNPs, the power to detect an association between a specific genotype combination and psychiatric illness in a given cohort is less than the power to detect an association between a single SNP and the illness. This should be considered when interpreting the results of this study, as it is underpowered to detect some of the associations found with either small effects or very rare genotypes. For example, the DISC1 rs3738401 x SYNE1 rs214976 interaction, which was found to be significant in all but the non-GAIN schizophrenia cohort, only had adequate power (P=0.999) in the combined GAIN and non-GAIN schizophrenia cohort.
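The reduction in power for rare genotype combinations can be illustrated by enumerating the joint genotypes of two biallelic SNPs. The sketch below is illustrative only: it assumes Hardy-Weinberg equilibrium at each SNP and independence between the two SNPs (neither of which was assumed or tested in this form in the present study), uses the 10% minor allele frequency cut-off applied here, and takes a hypothetical cohort size of 1000 individuals.

```python
from itertools import product


def hwe_genotype_freqs(maf: float) -> dict:
    """Genotype frequencies at one biallelic SNP under Hardy-Weinberg equilibrium."""
    major = 1.0 - maf
    return {"AA": major * major, "Aa": 2.0 * major * maf, "aa": maf * maf}


def joint_genotype_freqs(maf1: float, maf2: float) -> dict:
    """Joint frequencies of the 3 x 3 genotype combinations of two independent SNPs."""
    freqs1 = hwe_genotype_freqs(maf1)
    freqs2 = hwe_genotype_freqs(maf2)
    return {(g1, g2): freqs1[g1] * freqs2[g2] for g1, g2 in product(freqs1, freqs2)}


# Both SNPs at the 10% minor allele frequency cut-off used in this study.
joint = joint_genotype_freqs(0.10, 0.10)

# Hypothetical cohort size, chosen only to make the expected counts concrete.
cohort_size = 1000
for (g1, g2), freq in sorted(joint.items(), key=lambda item: -item[1]):
    print(f"{g1} x {g2}: frequency {freq:.4f}, expected count {freq * cohort_size:.1f}")
```

Under these assumptions there are nine joint genotype classes, and the double minor-allele homozygote class is expected in about one individual per 10,000, which is why the rarest combinations contribute very little information in cohorts of the size analysed here.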

Detecting rare variants with enough confidence to proceed with tests of epistasis is not feasible with datasets of the size used here. In the same way that the mounting evidence from the literature suggests that any involvement of DISC1 in mental illness is likely due to rare variants of minor effect that may differ among populations and individuals, it is likely that many epistatic interactions between genes are also the result of such rare variants and thus would be difficult to identify in studies of this size. Therefore, it seems that such procedures should be considered in future larger studies with deeper sequencing capacity, as the complex nature of mental illness leaves potential for interaction at a genetic level.


Figure 5.18: Statistical Power Graphs for Epistasis Analysis. The graphs show the achievable power at given minor allele frequencies for a range of odds ratios in combined genders for A) GAIN schizophrenia, B) non-GAIN schizophrenia C) Combined schizophrenia, D) GAIN bipolar disorder and E) Combined schizophrenia and bipolar disorder.


5.3.6 Conclusions

This study has shown some evidence that the accumulation of variants within the confirmed DISC1 interactant list has some bearing on risk of schizophrenia, but perhaps not bipolar disorder. This was shown by the inflation of observed versus expected p-values in the Q-Q plots generated for this list of genes. For the gene set based on the primary cilia hypothesis there is a suggestion of inflation, and there is also evidence that some of the included genes may be associated with bipolar disorder (particularly ATP6V1B2) and schizophrenia (particularly USO1).

The epistatic interactions identified between DISC1 and SYNE1, TRAF3IP1 and FEZ2 are of interest given the functions of the protein products of these genes in ciliogenesis. It would be of interest to assess different combinations of SNPs within these genes and to extend the analysis to other cohorts.

The time restrictions and lack of power in this study meant that it was not feasible to test SNPs with minor allele frequencies of less than 10%. This, in turn, means that no real conclusions can be drawn as to the lack of SNP x SNP, and hence gene x gene, interactions with other candidate genes and SNPs.


Chapter 6

Summary and Future Directions

6.1 Summary and Discussion of Main Findings

This thesis aimed to build a clearer picture of the involvement of the translocation identified in the original Scottish family ((1;11)(q42.1;q14.3)) and its disrupted gene, Disrupted in Schizophrenia 1 (DISC1), in the major psychiatric illnesses bipolar disorder and schizophrenia. This was to be achieved using two differing processes. First, through genetic analysis of the DISC1 and 11q14.3 regions to assess the presence of any common variations (SNPs) that may explain, in part, the heritability of these disorders, as well as to test genetic interaction between DISC1 and some of its interacting partners. Second, through assessment of the molecular function of the DISC1 protein in an attempt to identify a potential functional mechanism leading to the development of major psychiatric illness. A broad summary of this thesis can be seen in Figure 6.1.

6.1.1 The Discovery of DISC1

The Disrupted-in-Schizophrenia locus was discovered as a result of a (1;11)(q42.1;q14.3) translocation in a Scottish family, segregating with psychiatric illness. This identification of the gene and its reported association was described in Chapter 1; however, in light of the findings of this study and of others finding no evidence for DISC1 in major psychiatric illness, it seems prudent to re-evaluate the evidence here.

Figure 6.1: Summary of Thesis. The flow chart shows the two main subdivisions of the analyses conducted in this thesis (Function and Association) with a breakdown of the research conducted and their results. Broad date ranges for various components are shown in red.

The translocation between chromosome one and what was originally described as a C group chromosome was first described by Jacobs et al. (1970) as part of a cytogenetic study conducted in boys from a borstal in Scotland. The original description of the pedigree contains a total of 231 individuals (including partners), of which cytogenetic analysis was performed on 140. The family harbored three cytogenetic abnormalities: the (1;11)(q42.1;q14.3) translocation (found in 30 individuals), a Robertsonian translocation between two D group chromosomes (19 individuals) and a large ‘constriction’ of chromosome one (10 individuals). No phenotype information was presented as a part of this original study, in which the proband (who carried both the (1;11)(q42.1;q14.3) translocation and the chromosome one constriction) was described as a ‘physically normal 18-year old boy’.


The portion of the original pedigree that segregated the (1;11)(q42.1;q14.3) translocation was later analysed for linkage with psychiatric illness by St Clair et al. (1990). This linkage study included a total of 77 individuals: 34 with the (1;11) translocation and at least six with the Robertsonian translocation. The third cytogenetic abnormality is not mentioned in this later study (four of the individuals included in the study were reported to carry the chromosome one constriction by Jacobs et al. (1970)). An additional four individuals were found to carry the translocation by the time of this study; presumably these were individuals that were not karyotyped originally, though there is no mention of any additional cytogenetic analysis. Of note is that one individual designated a carrier of the (1;11) translocation was shown in the original pedigree to carry only the chromosome one constriction. Information regarding the psychiatric phenotypes of these individuals was obtained through medical records, and 30 of the individuals were diagnosed by a karyotype-blinded interview with a psychiatrist. The remaining individuals were diagnosed based solely on medical records (19 of these were deceased). A total of 28 family members were diagnosed with one of schizophrenia, schizoaffective disorder, major depression, generalised anxiety disorder, minor depressive disorder, alcoholism or adolescent conduct disorder. Five models for linkage across varying levels of penetrance were tested in this study, with a maximum LOD score of 3.3 in major psychiatric illness (including schizophrenia, major depression and schizoaffective disorder). The maximum LOD score obtained by the entire study was 4.3, when adolescent conduct disorder was added to the major psychiatric illness analysis. There was no account made for multiple testing of the 34 tests conducted as part of this study.
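For readers less familiar with linkage statistics, a LOD score is the base-10 logarithm of the ratio between the likelihood of the data under linkage and the likelihood under no linkage, so each unit increase corresponds to a ten-fold increase in the odds in favour of linkage. The following minimal sketch converts the two LOD scores quoted above into their implied odds; the function names are illustrative and the conversion makes no adjustment for the number of models tested.

```python
import math


def lod_to_odds(lod: float) -> float:
    """Likelihood ratio (odds in favour of linkage) implied by a LOD score."""
    return 10.0 ** lod


def odds_to_lod(odds: float) -> float:
    """LOD score corresponding to a given likelihood ratio."""
    return math.log10(odds)


# Maximum LOD scores reported by St Clair et al. (1990), as quoted in the text.
reported_lods = {
    "major psychiatric illness": 3.3,
    "major psychiatric illness plus adolescent conduct disorder": 4.3,
}

for phenotype, lod in reported_lods.items():
    print(f"{phenotype}: LOD {lod} implies odds of about {lod_to_odds(lod):,.0f} to 1")
```

Because these odds are uncorrected, the absence of any adjustment for the 34 tests conducted should be kept in mind when interpreting them.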

A second follow-up study was conducted ten years later by largely the same group (Blackwood et al., 2001). This second study conducted linkage analysis on 67 members of the family, of which 29 carried the translocation and 21 of those were diagnosed with either schizophrenia, bipolar disorder, recurrent major depression, minor depression or adolescent conduct disorder. Diagnosis was again made in a blinded fashion by interview with a psychiatrist according to the DSM-IV criteria, or by medical records where the patient was deceased. Five of the individuals that did not carry the translocation were diagnosed with minor depression, alcoholism or adolescent conduct disorder. The study conducted a series of 16 linkage analyses with various combinations of phenotype and penetrance. The analysis produced a maximum LOD score of 7.1 with schizophrenia, bipolar disorder and recurrent major depression. Again no account was made for multiple testing.

Evidence for significant LOD scores and an increase in LOD score for major psychiatric illness with the addition of new individuals is promising for the involvement of DISC1 with major psychiatric illness in this family. These results were the driving force behind the DISC1 research area. Additional analysis of association in other small cohorts of unrelated individuals also provided some suggestion of association (as was described in Chapter 1). However, a number of comments must be made about these data, particularly of the two original linkage analyses.

It is unclear, through the progression detailed above, why the (1;11)(q42.1;q14.3) translocation was chosen for further investigation rather than the other two abnormalities identified; as a result of this choice the majority of the original family was excluded from all analyses. It is also of concern that the diagnoses of at least some of the patients changed between the two linkage studies and that the second follow-up study contained fewer individuals than the first (29 carriers compared to the original 34) despite stating that nine new cases had been diagnosed during the previous 10 years. The ‘karyotype blinded’ diagnoses of the patients seem to have been made on more than one occasion by the same psychiatrists (who were also authors of the papers), which raises questions about just how ‘blind’ the second set of diagnoses were. The increase in LOD score from 3.3 to 7.1 for major psychiatric illness was promising, though it should be noted that different models of penetrance and gene frequency were used to obtain these scores. If the maximum LOD score for the family was originally identified to be 4.3 with the inclusion of the major psychiatric illnesses and adolescent conduct disorder, why was adolescent conduct disorder not included in any of the models tested in the second round of linkage analysis?

Additionally, all supporting evidence for association of DISC1 with major psychiatric illness comes from small underpowered studies and is largely inconsistent. In light of this evidence and in light of additional questions posed by Sullivan (2013), it seems that a re-evaluation of the involvement of DISC1 with these disorders is timely. A further 13 years have passed since the last assessment of this family. Surely there are additional individuals that can now be added to the linkage analysis, or perhaps it is time to sequence the exomes or genomes of some of this family to make sure there is not another explanation for the heritability of these disorders in the family.

6.1.1.1 DISC2

The focus of research into the Scottish translocation family is on the DISC1 gene. However, there is a second transcript identified within the breakpoint region known as DISC2. DISC2 is thought to be a non-coding RNA running antisense to DISC1 in the exon 9 region with a predicted 5′ end located within DISC1 intron 9 (Millar et al., 2000). There is some suggestion that it may have a role in regulating the expression of the DISC1 protein by a number of different mechanisms, but little is known about the specifics of this (Chubb et al., 2008; Millar et al., 2004). Due to its proximity to the translocation breakpoint, and its possible role in the regulation of DISC1, it is in itself a putative candidate in psychiatric illness.

6.1.2 Association Analysis

The literature surrounding the genetic association of DISC1 with major psychiatric illness, including bipolar disorder and schizophrenia, is extensive, as outlined in Chapters 1 and 3 of this thesis. As the volume of research in this area has rapidly grown over the last 20 years it has slowly become apparent that there are numerous inconsistencies in the findings between publications, with very few of the identified associations being replicable in alternate or distinct datasets. The majority of these findings have been limited to relatively small cohorts of individuals, including that of the association of rs11122324 with bipolar disorder in the SIBS cohort.

Given the lack of evidence for a SNP or group of SNPs in DISC1 being consistently associated with major psychiatric illness across several of these small scale studies, it was the first aim of this research to assess the association of DISC1 in a larger dataset. This analysis was to initially focus on the rs11122324 association but was then to be extended to include all available (genotyped) SNPs in the DISC1 gene region and those in the 11q14.3 region also.


6.1.2.1 rs11122324

The SNP found to be associated with bipolar disorder in the SIBS cohort (rs11122324) was not found to be associated with either bipolar disorder or schizophrenia in the data analysed here. The closest this SNP came to a significant association was in the combined GAIN and non-GAIN schizophrenia cohort, where it reached a p-value of 0.057. The trend for higher association with females identified in the SIBS cohort was not at all evident in the cohorts of this study. In fact, in all but the GAIN bipolar disorder data, the results for females only were further from significance than in the combined gender analyses.

In light of this result, along with the predicted thermodynamics of the affinity between this SNP and the putative miRNA, miR-575, pursuit of an hypothesis into functional interference of an alternately expressed ES (extra-short) isoform was discontinued.

6.1.2.2 DISC1 Region

Analysis of the DISC1 region as a whole revealed no significant associations. There were, however, a group of three SNPs (rs11122330, rs11122331 and rs1538979) that trended towards corrected significance, which provided some interesting findings when reviewed in light of the literature. It must be emphasised, however, that none of these SNPs actually reached corrected significance in the datasets analysed here. Two of the three SNPs (rs11122330 and rs1538979) had previously been reported as significant in association with major psychiatric illness as was described in Chapter 3.

The most interesting result from this section of work was with rs1538979, which, when added to the published literature, strengthened a protective association with schizophrenia and revealed a protective effect when the combined bipolar disorder and schizophrenia cohort was added to the available data from the literature.

The lack of significant association with SNPs in the DISC1 gene and major psychiatric illness is a result mirrored by the findings of a meta-analysis conducted after that of this study but prior to the completion of this thesis. The study conducted by Mathieson et al. (2012) is the most comprehensive meta-analysis of DISC1 and schizophrenia to date. With over 11000 cases and 15000 controls, it was more than four times larger than the study conducted here (3148 cases and 2725 controls) and actually includes the data used here. Additionally, an even larger (21000 cases and 38000 controls) GWAS of schizophrenia fails to find any support for DISC1, even though it did identify 22 regions that met genome-wide significance for association with schizophrenia (Ripke et al., 2013).

Do the results of this study, and those of Mathieson et al. (2012) and Ripke et al. (2013), then imply that DISC1 is not associated with major psychiatric illness? Perhaps, but it is also possible the genetic component of these disorders is too heterogeneous to detect in this manner. It can now be stated that it is unlikely that there are any common SNPs in DISC1 that are contributing to major psychiatric illness, a remarkable statement given the number of small studies undertaken over the last 24 years that have argued differently (St Clair et al., 1990; Gejman et al., 1993; Detera-Wadleigh et al., 1999; Blackwood et al., 2001; Ekelund et al., 2001; Curtis et al., 2003; Hennah et al., 2003; Hwu et al., 2003; Ekelund et al., 2004; Hodgkinson et al., 2004; Macgregor et al., 2004; Burdick et al., 2005; Callicott et al., 2005; Hamshere et al., 2005; Thomson et al., 2005; Hashimoto et al., 2006; Liu et al., 2006; Maeda et al., 2006; Chen et al., 2007; Palo et al., 2007; Qu et al., 2007; Wood et al., 2007; Kilpinen et al., 2008; Hennah & Porteous, 2009). This, however, does not exclude DISC1 as a candidate in major psychiatric illness.

The translocation found in the Scottish family is a rare, highly penetrant variant, so rare in fact that it is most likely unique to that single family. For this reason, other rare variants cannot be ruled out as having some influence in the development of bipolar disorder and schizophrenia. Though these rare variants in DISC1 may exist and contribute to the risk of major psychiatric illness, it is likely that any such variants are of minor effect (see Figure 1.1 in Chapter 1). Rare variants of major effect, such as the (1;11)(q42.1;q14.3) translocation, are certainly not a common occurrence. Given the incidence of major psychiatric illness in the general population and the interest in genetic contributions to these disorders, it is possible that the (1;11)(q42.1;q14.3) translocation is the only rare variant of high penetrance in major psychiatric illness. If such rare variants were a common cause of these disorders, surely more familial clusters should have been identified and investigated. Further, if these rare, highly penetrant variants do exist they will very likely only be relevant to individual families, which means that they will not explain a major contribution to these disorders at a population level (Need et al., 2012).

The search for rare variants of moderate to minor effect is difficult even with the advent of cheaper and higher throughput deep-sequencing of individuals. The tested populations of controls required to gain meaningful results from an association analysis of low penetrance rare variants need to be extensive in order to discriminate the many rare variations that do not contribute to disease from those that do.

There is increasing evidence as datasets grow in size that there are far more rare variations in the genome than was ever thought. One study that evaluated rare variation in almost 2500 individuals by deep sequencing found that 86% of the variants identified were rare (minor allele frequency less than 0.5%) (Tennessen et al., 2012), while a second larger study (14000 individuals) reports that 95% of the variation identified was rare across 202 genes (Nelson et al., 2012). Many of these have almost certainly been wrongly considered as disease causing (or loss of function alleles) in the past (Ng et al., 2008; Pelak et al., 2010). The ability to differentiate between rare variants that truly play a role in a disease state and those that are completely benign in spite of their potential deleterious appearance will be a major consideration for studies of this nature (MacArthur et al., 2012).

Due to these limitations, rare variants of moderate to minor effect within DISC1 cannot be excluded as contributing to major psychiatric illness; however, the same is true of any gene in any disorder. If this is the case, is the time and expense of pursuing specific rare variants of smaller effect size in DISC1 worthwhile? This is of particular interest given the questions that are now being raised about the initial evidence suggesting DISC1 as a candidate. Certainly there is some interesting biology that suggests DISC1 is a reasonable candidate in psychiatric illness (as was discussed in Chapter 1) and perhaps this alone is enough to continue research into these rare variants. Taken together, if it is only rare variants of moderate to minor effect that are now being considered as contributory in DISC1, is the biology alone enough to make DISC1 a better candidate in psychiatric illness than any other gene, other than the fact that it is called Disrupted in Schizophrenia?


6.1.2.3 11q14.3 Region

The results of the chromosome 11 region analysis were interesting, especially given the lack of any previous evidence for association with psychiatric illness beyond the translocation in the original Scottish family. Several SNPs reached corrected significance levels in the non-GAIN schizophrenia cohort across both males and females. Four of these SNPs achieved an increase in significance when the data were combined with the GAIN schizophrenia dataset (Debono et al., 2012).

These results remain to be confirmed in any additional studies and should be treated just as cautiously as the associations found with any of the published DISC1 region SNPs. This result was of particular interest, however, as it suggests that the chromosome 11 region of this translocation may well be just as involved in major psychiatric illness as the DISC1 region. For this reason it must be considered equally as a candidate region in the development of bipolar disorder and schizophrenia as its DISC1 counterpart.

All conclusions drawn with regard to the DISC1 region, both here and in the existing literature, including those related to the inability to exclude rare variants of moderate to minor effect, also hold true for the 11q14.3 region.

6.1.3 Biological Analysis

Regardless of the relevance of DISC1 to major psychiatric illness, the molecular function of the DISC1 protein is still of interest. To this end an investigation into the putative function of the DISC1 protein was conducted as the second major component of this thesis.

6.1.3.1 Proteins Interacting with DISC1

The function of the DISC1 protein is a subject that has been evaluated in the past (as was described in Chapter 1) and remains an ongoing avenue of interest in the wider world of DISC1 research. Due to this ongoing research, information on protein interaction partners of DISC1 from three published yeast two-hybrid screens (Morris et al., 2003; Millar et al., 2003; Camargo et al., 2007) is available. An additional unpublished dataset, also from a yeast two-hybrid analysis, was available from the Functional Genomics and Structural Biology Wellcome Trust Advanced Course of 2009. Evaluation of the published and unpublished data revealed an incomplete overlap in the proteins identified to interact with DISC1, suggesting that the list was not exhaustive. For this reason the screen of the WTAC was repeated, both by the 2010 course and, in parallel, as a part of this study.

When the results of the three unpublished analyses (that is, the two WTAC screens and the screen undertaken as a part of this research) were compiled, a total of 824 potential interactions were identified that were not identified as part of the Morris et al. (2003), Millar et al. (2003) or Camargo et al. (2007) studies. This is a large number of proteins to have not yet been reported, a result that may be due to the published analyses not being exhaustive, or it may imply that DISC1 generates a lot of false positive results when used as a bait in yeast two-hybrid screens. Although the 824 potential interactions reported above do include the single hit protein interactions (which are more likely to be false positives, as was discussed in Chapter 4, Section 4.3.1.4), there are a number of identified interactions that show multiple independent hits (suggesting that they are not library artifacts) that may not be biologically meaningful and as such are false positives of this screen. DISC1 is perhaps unusual in its large potential ‘’, a result that may reflect its predicted status as a scaffold protein or may be a result of the tendency of DISC1 to form aggregates.

Given the large number of interactions identified, subsequent analysis focused on a subset of proteins chosen based on an hypothesis of primary cilium construction and maintenance. These selected proteins were subjected to confirmation analysis in human cells by expression of epitope tagged exogenous protein. Unfortunately, confirmation of these interactions was not achieved due to issues regarding the ability to express the DISC1 protein in a way that it could be detected by western blot or assessed for localisation within cells.

Though the use of over-expressed exogenous protein is a reasonable first step in the confirmation of yeast two-hybrid interactions, ultimately the interactions need to be tested for temporal and spatial expression to determine if the interaction is biologically meaningful. Many of the ‘confirmed’ interactions in the published literature are confirmed only by this exogenous expression method, which means that the functional validation of many DISC1 interaction partners may need to be re-evaluated for biological meaning.


6.1.3.2 DISC1 Expression

A number of groups have reported difficulties in expressing DISC1, citing cytotoxicity with exogenous expression (Marley & von Zastrow, 2010) and protein aggregation (Morris et al., 2003; Brandon et al., 2004; Kamiya et al., 2005; Leliveld et al., 2008, 2009; Ottis et al., 2011), with the difficulty being described as a “major stumbling block in the DISC1 field” (Soares et al., 2011).

The issue in this study was not the isolation and expression of DISC1 per se but the consistent isolation and expression of DISC1. This lack of consistency meant that results could not be replicated to a sufficient degree to draw any real conclusions. It was eventually deemed that aggregation of DISC1 upon exogenous expression in cells was a likely reason for the lack of detection. Aggregation of DISC1 is a known issue, as was discussed in Chapter 4. It is also a complex one: a large number of self-associating regions and different quaternary protein structures have been identified, and the field has been concisely reviewed by Soares et al. (2011).

Aggregation was not immediately considered due to the sporadic results showing DISC1 expression. Due to the lower transfection efficiency of DISC1, the cytotoxicity present and the relatively large size of the DISC1 protein, a lot of time was spent trying to increase DISC1 expression to ‘detectable’ levels. In hindsight this may have been counterproductive. Even had aggregation been considered the main issue earlier, however, the outcome of this work may not have been any different given the available resources. Brief experimentation in lowering the levels of exogenous DISC1 introduced into the cells made it apparent that the protocol being used was not optimal for expressing proteins at low levels, if indeed there is a critical level below which the DISC1 protein does not aggregate.

6.1.4 Genetic Analysis of DISC1 Interaction Partners

As well as being used to assess the unknown biological functions of DISC1, the interaction partners identified, both in this screen and in the published literature, were tested for association and interaction with DISC1 in the context of major psychiatric illness. Attempts to replicate two SNP x SNP interactions found in the literature, involving the confirmed DISC1 interaction partners NDEL1 and FEZ1, did not show evidence for the associations.

6.1.4.1 Association of DISC1 Interactants

Results from the confirmed published interacting partners (those proteins identified by a published yeast two-hybrid screen and confirmed by a second independent test in the literature) and DISC1 revealed some inflation in the observed p-values when compared with the expected (λ < 1.1), with a maximum inflation factor of 1.34 being identified for males in the GAIN schizophrenia cohort. The addition of genes previously published and subsequently confirmed by isolation from the yeast two-hybrid screen conducted here did not add to the inflation effect.

Inflation was also seen when a subset of proposed primary cilia related genes (identified from the yeast two-hybrid screen conducted as part of this study) was analysed in the same way. With this subset of genes the inflation was most evident in the GAIN bipolar disorder cohort, with males reaching an inflation value of 1.33. Though this was the analysis that showed the highest inflation factor, the combined schizophrenia cohort and the combined schizophrenia and bipolar disorder cohort showed the highest mean χ² values of any analysis conducted, and visual inspection of the Q-Q plots also suggested evidence for inflation. The inadequacy of the inflation factor as a measure of inflation in this dataset (due to the unusual spread of data resulting in a relatively low median χ²) made it difficult to evaluate whether the levels of inflation seen in these cohorts were higher than those seen elsewhere. Further investigation found that the inflation seen in the males of the GAIN bipolar disorder cohort was contributed to by a group of three significant SNPs in ATP6V1B2. Two further SNPs in USO1 showed a protective effect in the combined schizophrenia cohort.
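The inflation factor discussed above is conventionally calculated as the ratio of the median observed χ² statistic to the median expected under the null for a 1 degree of freedom test (approximately 0.455). The following is a minimal sketch in Python under that assumption; the function names and the p-values are hypothetical illustrations and are not taken from the analyses of this thesis. It also shows how a small group of strongly associated SNPs can raise the mean χ² considerably while the median-based inflation factor moves far less.

```python
from statistics import NormalDist, mean, median

# Median of the chi-square distribution with 1 degree of freedom,
# i.e. the expected median statistic under the null (about 0.455).
NULL_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value into a 1 d.f. chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def inflation_factor(p_values):
    """Median observed chi-square divided by the expected null median."""
    return median(p_to_chi2(p) for p in p_values) / NULL_MEDIAN_CHI2


def mean_chi2(p_values):
    """Mean chi-square statistic, which is sensitive to a few large values."""
    return mean(p_to_chi2(p) for p in p_values)


# Hypothetical data: 100 evenly spaced (null-like) p-values,
# then the same set with three strongly associated SNPs added.
null_like = [(i + 0.5) / 100 for i in range(100)]
with_signals = null_like + [1e-5, 2e-5, 5e-5]

for label, pvals in (("null-like", null_like), ("with signals", with_signals)):
    print(label, round(inflation_factor(pvals), 3), round(mean_chi2(pvals), 3))
```

Comparing the two printed rows illustrates why the mean χ² and the Q-Q plot can suggest inflation that the median-based factor understates.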

Functionally, the gene sets used in this analysis appear to have enrichment that could be indicative of an involvement in primary cilia construction. While this was expected for the prey gene set (as the genes included were chosen on this basis), it was interesting that the published interactant set also showed enrichment for functions and processes such as protein and molecule binding, cell projections and microtubule organising, all roles that are important in primary cilia construction. There are of course other pathways which require such processes and the enrichment analysis may equally be a representation of these. It is difficult to assess enrichment analyses such as these in an unbiased manner, especially when a particular hypothesis has already been formed. Enrichment analysis, though a useful tool, must therefore always be interpreted together with molecular and functional analyses of a gene and its protein product, as on its own it is often far too broad to draw any real conclusions as to the true functions of a gene or gene set.

6.1.4.2 Association of Epistatic Interactions

Interaction at a genetic level was assessed between three DISC1 SNPs and selected SNPs from both the published interactants and the subset of preys chosen from the yeast two-hybrid study of this research project. Although only a relatively small number of SNPs were tested here (for reasons of time and power), there was some evidence that the SNPs in the genes SYNE1, TRAF3IP1 and FEZ2 interact with DISC1 at a genetic level. It must be emphasised that these interactions did not reach corrected significance in these analyses, but the trend of increasing significance seen (particularly for SYNE1 and FEZ2) with increasing dataset size is interesting and may be worth pursuing further in larger datasets.

The power to detect associations of this kind in a study of this size is low. Larger studies may be able to screen the interactions of SNPs with lower minor allele frequencies than were included here. This may be of particular interest if the emerging studies of rare variants in schizophrenia and bipolar disorder reveal any consistent results.


6.2 Future Directions

Major psychiatric illnesses such as bipolar disorder and schizophrenia are indeed very complex in nature. Although these disorders have high heritability, genetic studies of them, though important contributions to the field, are incomplete representations of the contributing factors. The effect of the environment and its interaction with genetics remains an important consideration of such studies and perhaps one that is more difficult to assess. With the relatively recent advances in knowledge of epigenetics, a third party comes to the table, which may add yet another layer of complexity to these already multifaceted disorders. The biology and molecular function of genetically implicated genes is also important, as this is what assists in the true understanding of how a disease progresses and ultimately provides avenues to pursue for treatment.

6.2.1 Association Analysis

Association analysis has been a powerful tool in candidate gene discovery, but it meets its match in complex diseases such as major psychiatric illness. The technique provides a robust mechanism to assess large numbers of variants in an efficient manner, which is ideal for the identification of common variants that contribute to disease. When attention is turned to rare variants, especially those of low penetrance, it has become clear that studies of this nature need to be vast, including tens if not hundreds of thousands of individuals.

Association studies also require that the participants are well phenotyped, something that seems to be achieved with reasonable accuracy and sufficient detail these days. In the quest to increase the size of cohorts, however, many psychiatric diagnoses are being combined (as was the case here, with bipolar disorder and schizophrenia). Should it perhaps be considered that the various forms of bipolar disorder and schizophrenia are actually distinct disorders with distinct genetic profiles and as such should be further separated rather than grouped together?

It is pleasing to see the emergence of negative association studies being published in DISC1 research, which has emphasised the importance of unbiased reporting of results in the literature. The fact that this has only occurred in recent years may be the result of a true bias in the reporting of results in earlier years. It could also be attributable to the relatively recent availability of larger sample sets, which are less likely to produce convincing false positive results. Either way, a continued effort by researchers and journals to provide an unbiased approach to publication should be made.

Association analysis can also be extended beyond the capacity in which it was applied here, to include analysis of other influences on the outcome of disease such as gene x environment interaction, copy number variation and other rare and de novo variation through the use of exome and genome sequencing.

6.2.1.1 Rare Variant Detection

This study was limited to common SNPs (MAF>1%) and as such no investigation of rare variation including single nucleotide variants, structural alterations or copy number variation was undertaken. These rare variants are emerging as a new force in the psychiatric illness field with the individual contribution of each rare variant being higher on average than that of common SNPs.
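As an illustration of the common variant threshold referred to above, the minor allele frequency of a biallelic SNP can be derived from genotype counts and compared with the 1% cut-off. The sketch below is in Python; the SNP labels and genotype counts are hypothetical, and it is not the filtering procedure that was used to prepare the datasets analysed in this thesis.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency of a biallelic SNP from genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq_a, 1.0 - freq_a)


def is_common(n_aa, n_ab, n_bb, threshold=0.01):
    """True if the SNP passes a MAF > threshold filter (default 1%)."""
    return minor_allele_frequency(n_aa, n_ab, n_bb) > threshold


# Hypothetical genotype counts (AA, AB, BB) for two illustrative SNPs.
counts = {"snp_common": (640, 320, 40), "snp_rare": (992, 8, 0)}
common = [snp for snp, c in counts.items() if is_common(*c)]
print(common)
```

A variant such as the second example would be excluded by this filter and would fall into the rare variant category discussed in this section.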

Copy number variations have been reported to account for approximately 13% of the genome (Stankiewicz & Lupski, 2010) and, given their relatively large size, they may account for more variation between individuals than do SNPs (Pang et al., 2010). For this reason their role in disease aetiology is of particular interest. Increases and decreases in copy number of genetic transcripts (or regions thereof) have been implicated in risk for complex genetic diseases including schizophrenia and bipolar disorder (reviewed by Giusti-Rodríguez & Sullivan (2013) and Malhotra & Sebat (2012)).

For this reason it is prudent to include capture of copy number variation along with SNPs in the generation of any future datasets collected for further study of these diseases. It is likely that these copy number variations fall into the category of rare variants and so will face the same difficulties as the rare single nucleotide variants discussed previously. It is possible, however, that their contribution to genetic load is sizeable and justifies continued investigation to the same level as SNP variation. Equally, as deep sequencing methods become cheaper and more and more individuals are sequenced, rare variant burden in genes may establish new candidates or provide further validation for current candidates.

6.2.1.2 Gene-Environment Interaction

Given the acceptance that genetics and heredity only account for a percentage of the risk for development of major psychiatric illness (as was described in Chapter 1), it is important that the environmental influences which certainly bridge the gap between genetics and phenotype are also considered. Although it may be the case that these environmental influences act independently of the increase in risk brought about by genetically inherited variation, it is likely that the two (genes and environment) interact to produce an increased risk profile in certain combinations. In its simplest form this can be described as a ‘genetic predisposition’ being triggered by an adverse life event, but in reality it is probably not quite as straightforward. The study of environmental influence is more difficult as it relies on accurate and honest recall and dissemination of information, often retrospectively from patients and their families, as well as adequate documentation of these events by the interviewer. However, if such information can be accurately obtained for large cohorts of patients for whom there is also genetic data available, then the interaction can be tested using the case-control methodology.

6.2.1.3 Mental Illness and Epigenetics

Though the description and assessment of epigenetic modification in relation to major psychiatric illness could form the basis of an entire thesis on its own, it is necessary to mention here very briefly that this is an avenue of research that must be considered in future study of these disorders.


It has been suggested that the mechanism by which gene x environment interactions occur is through the modifying effect of epigenetics (Abdolmaleky et al., 2004). If this is the case then it seems that epigenetics plays a role in the development of major psychiatric illness, and indeed there is evidence in support of this claim (reviewed by Mahgoub & Monteggia (2013)). Unfortunately, analysis of the epigenome cannot be retrospectively applied to pre-existing datasets, simply because there is no data to analyse. Methods to evaluate the epigenetic status of genes (or entire genomes) are now available and could be applied to future datasets, leading to an ability to further assess the association of epigenetics with schizophrenia and bipolar disorder.

Assessment of association with epigenetics in this way is not quite as simple as with conventional genetic association due to the difference in epigenetic modification between cell types within a single individual (Maunakea et al., 2010). This may mean that gaining a complete picture of the epigenetic modifications in patients with schizophrenia is not possible while the patients are still living owing to the need to test specific cell types within the brain.

The diverse processes by which epigenetic modification occurs, which include modifications to both DNA and the structure of chromatin within cells, complicate the situation further, and it will certainly be a number of years until the epigenetic profiles associated with a particular disease state are fully understood.

6.2.2 Molecular Function of DISC1

The addition of the data generated from the yeast two-hybrid screen undertaken as part of this study increased the knowledge base for the molecular function of DISC1 greatly, as it was the largest yeast two-hybrid screen conducted using DISC1 as a bait (when the WTAC data is included). The addition of these data did not elucidate a clearer function for the protein, but further supported its status as a scaffold protein which is involved in a large number of molecular pathways.

Due to the theme of this thesis being focused on an explanation of major psychiatric illness, the data was analysed with a bias towards functions that may explain a role for DISC1 in these disorders. This is admittedly not the most desirable means by which to undertake such an analysis but it was deemed appropriate given the scope of the thesis. As such the hypothesis of primary cilia construction and maintenance stood out as an interesting candidate function to further investigate.

As has been discussed earlier, this hypothesis was ultimately not investigated here due to the issues surrounding expression of the full length DISC1 protein. Logically then, the first step in the ongoing research of this hypothesis is to confirm the interactions identified in the yeast two-hybrid screen in human cells. Once these confirmations have been made, the role of each of the confirmed prey proteins in this pathway can be assessed.

An addition to the field of DISC1 research that would be of great benefit is the creation of a human cell line that stably and effectively expresses tagged DISC1 at levels that are similar to those found endogenously, hopefully overcoming the issues of protein aggregation due to over-expression of transient protein. Such a cell line could be used to test putative DISC1 interactions and to assess co-localisation of interacting proteins in a high throughput manner.

The hypothesis that this network of proteins is involved with the primary cilia was in fact confirmed for DISC1 by two papers published while the research for this thesis was ongoing (Marley & von Zastrow, 2010, 2012). The prey proteins chosen as part of this research, however, have not been confirmed to act in this pathway and could be tested for primary cilium involvement by knock down methods similar to those used in these previous studies.

Of course ongoing research should not be limited to this functional mechanism or those that have been published but should extend to include other putative functions that may or may not have relevance to the brain and therefore major psychiatric illness.


6.3 Concluding Comments

This thesis analysed the association of DISC1 and a region of chromosome 11 (11q14.3) with major psychiatric illness by case-control and meta-analysis of three publicly available datasets. Additionally it attempted to gain further insight into the molecular function of DISC1 by identification of proteins shown to interact with DISC1 in a yeast two-hybrid analysis followed by analysis of functional enrichment of the obtained interactant list. Finally, a subset of the interacting partners, as well as those confirmed in the literature, were assessed for genetic interaction with DISC1.

Results for the association analysis revealed some support for the chromosome 11 region and an association with schizophrenia, leading to the conclusion that it should be considered as a candidate region in future studies of the disease. The results for DISC1 failed to find evidence of an association with bipolar disorder, schizophrenia or a combination of the two. This result was consistent with recently emerged results from even larger meta-analyses of the gene. Functional enrichment of the yeast two-hybrid results led to the formation of an hypothesis that DISC1, and a subset of its interaction partners, have a role in the construction and maintenance of the primary cilium. Due to technical difficulties with the expression of the DISC1 protein in human cells this hypothesis could not be confirmed; however, evidence of a role for DISC1 in this process was published by another group during the course of this thesis—it remains to be seen if the subset of preys chosen here are also involved.

The only evidence for DISC1 being genetically involved in major psychiatric illness is the anecdotal evidence of the original translocation from which it gained its name. Low powered studies and marginally significant findings that are ‘interesting’ due to the hyperbole surrounding DISC1 have undoubtedly contributed to a publication bias in this field. These small studies provide an appearance of support for DISC1 perhaps simply because they were trying so hard to find it, whereas large studies that went about the analysis with perhaps a more neutral viewpoint show no support.


Given the large number of interacting partners there are obviously a number of functions for this protein, some of which have been investigated due to their interesting involvement in the brain. Again, has a bias formed? Is the fact that this protein has proposed roles in the development and function of the brain enough to supersede the apparent lack of genetic evidence and should this alone mean that DISC1 remains a candidate in major psychiatric illness? Alternatively, should functional analysis of the protein take on an unbiased and systematic approach of evaluating all possible functions regardless of whether or not that function can be interpreted to ‘cause’ mental illness?

The findings of this thesis cannot exclude DISC1 as a candidate in major psychiatric illness, though they do highlight the need for a concerted effort by the research community of DISC1 to publish or make available in some way all results related to the analysis of this gene and its protein product. This would remove any bias that may be present in the literature currently and would help to elucidate the true functions and disease associations for DISC1.

Given that the DISC1 locus has been studied so extensively and that the anecdotal evidence from the original Scottish family remains the only link between the locus and major psychiatric illness, perhaps it is time that the candidature of Disrupted in Schizophrenia in major psychiatric illness is truly re-evaluated.

Appendices

Due to the nature of the data involved and the length of the appendices for this thesis, they are included in digital form. Descriptions of each appendix are found in this section along with the file names to be found on the included disk (inside back cover).


Appendix A

List of Genotyped SNPs

Excel workbooks containing the lists of SNPs that were genotyped on the Affymetrix 6.0 chips can be found in Appendix A for both the DISC1 and 11q14.3 regions.

Included Files:

• DISC1 Genotyped SNPs
• Ch11 Genotyped SNPs


Appendix B

Association Analysis Raw Results

This appendix includes all of the raw data generated in the association analyses of both the DISC1 region and the chromosome 11q14.3 region. It also contains genotype call scatter plots for SNPs of interest from both sets of analyses.

Included Files:

• DISC1 Region Raw Results
• DISC1 Genotype Call Plots
  – GAIN BP rs1538979
  – GAIN SCZ rs1538979
  – non-GAIN SCZ rs1538979
  – GAIN BP rs11122330
  – GAIN SCZ rs11122330
  – non-GAIN SCZ rs11122330
  – GAIN BP rs11122331
  – GAIN SCZ rs11122331
  – non-GAIN SCZ rs11122331
• Ch11 Region Raw Results
• Ch11 Genotype Call Plots
  – GAIN SCZ rs2509382
  – non-GAIN SCZ rs2509382
  – GAIN SCZ rs11019229
  – non-GAIN SCZ rs11019229
  – GAIN SCZ rs35003084
  – non-GAIN SCZ rs35003084
  – GAIN SCZ rs12787172
  – non-GAIN SCZ rs12787172
  – GAIN SCZ rs1404531
  – non-GAIN SCZ rs1404531
  – non-GAIN SCZ rs7124944
  – non-GAIN SCZ rs3018365
  – non-GAIN SCZ rs1894134
  – non-GAIN SCZ rs7944181
  – non-GAIN SCZ rs7931883
  – non-GAIN SCZ rs12420663

Appendix C

Yeast Two-Hybrid Screen Raw Results

This appendix includes the raw results files generated from the yeast two-hybrid screen conducted in this study (i.e. not including the Wellcome Trust Advanced Courses analyses). Additionally there is a summary file detailing the lists of preys identified as a part of this study as well as from the screens conducted by the Wellcome Trust Advanced Courses of 2009 and 2010.

Included Files:

• Yeast Two-Hybrid Raw Results
• Summary Yeast Two-Hybrid Results


Appendix D

List of Primers

Appendix D contains details of all primers used in this study.

Included Files:

• Primers


Appendix E

Interaction Analysis Additional Results

This appendix contains a large number of files relevant to Chapter 5. Included are raw results of analyses as well as results that are not shown in the chapter itself in the interest of space and fluency. Within this appendix file, the results are further subdivided into sectional folders, using the same section headings as were used in Chapter 5.

Included Files:

• Validation of Published Epistatic Relationships
  – FEZ1 (rs12224788) x DISC1 (Ser704Cys)
  – NDEL1 (rs1391768) x DISC1 (Ser704Cys)
• Assessment of the Confirmed Interacting Gene Set
  – Confirmed Published Interactant Set Analysis
  – Confirmed Published and Screen Interactant Set Analysis
• Assessment of Epistasis - Confirmed Gene Set
  – GAIN Schizophrenia
  – non-GAIN Schizophrenia
  – Combined Schizophrenia
  – GAIN Bipolar Disorder
  – Combined Bipolar Disorder and Schizophrenia
• Assessment of Prey Subset
  – Prey Subset Interactant Set Analysis
• Assessment of Epistasis - Prey Subset Genes
  – GAIN Schizophrenia
  – non-GAIN Schizophrenia
  – Combined Schizophrenia
  – GAIN Bipolar Disorder
  – Combined Bipolar Disorder and Schizophrenia

References

ABDOLMALEKY, H.M., SMITH, C.L., FARAONE, S.V., SHAFA, R., STONE, W., GLATT, S.J. & TSUANG, M.T. (2004). Methylomics in psychiatry: Modulation of gene-environment interactions may be through DNA methylation. Am J Med Genet B Neuropsychiatr Genet, 127B, 51–9.

ABECASIS, G.R., CHERNY, S.S., COOKSON, W.O. & CARDON, L.R. (2002). Merlin–rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet, 30, 97–101.

ABELSON, J.F., KWAN, K.Y., O’ROAK, B.J., BAEK, D.Y., STILLMAN, A.A., MORGAN, T.M., MATHEWS, C.A., PAULS, D.L., RASIN, M.R., GUNEL, M., DAVIS, N.R., ERCAN-SENCICEK, A.G., GUEZ, D.H., SPERTUS, J.A., LECKMAN, J.F., DURE, L.S.T., KURLAN, R., SINGER, H.S., GILBERT, D.L., FARHI, A., LOUVI, A., LIFTON, R.P., SESTAN, N. & STATE, M.W. (2005). Sequence variants in SLITRK1 are associated with Tourette’s syndrome. Science, 310, 317–20.

ALBERTS, B., JOHNSON, A. & LEWIS, J. (2002). Molecular Biology of the Cell. Garland Science, New York, 4th edn.

ALBORGHETTI, M.R., FURLAN, A.S., SILVA, J.C., PAES LEME, A.F., TORRIANI, I.C.L. & KOBARG, J. (2010). Human FEZ1 protein forms a disulfide bond mediated dimer: implications for cargo transport. J Proteome Res, 9, 4595–603.

AMERICAN PSYCHIATRIC ASSOCIATION (2013). Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association, Washington, DC, 5th edn.


ANDERSEN, J.S., WILKINSON, C.J., MAYOR, T., MORTENSEN, P., NIGG, E.A. & MANN, M. (2003). Proteomic characterization of the human centrosome by protein correlation profiling. Nature, 426, 570–4.

AULCHENKO, Y.S., DE KONING, D.J. & HALEY, C. (2007). Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics, 177, 577–85.

AUSTIN, C.P., KY, B., MA, L., MORRIS, J.A. & SHUGHRUE, P.J. (2004). Expression of Disrupted-in-schizophrenia-1, a schizophrenia-associated gene, is prominent in the mouse hippocampus throughout brain development. Neuroscience, 124, 3–10.

AWADALLA, P., GAUTHIER, J., MYERS, R.A., CASALS, F., HAMDAN, F.F., GRIFFING, A.R., CÔTÉ, M., HENRION, E., SPIEGELMAN, D., TARABEUX, J., PITON, A., YANG, Y., BOYKO, A., BUSTAMANTE, C., XIONG, L., RAPOPORT, J.L., ADDINGTON, A.M., DELISI, J.L.E., KREBS, M.O., JOOBER, R., MILLET, B., FOMBONNE, E., MOTTRON, L., ZILVERSMIT, M., KEEBLER, J., DAOUD, H., MARINEAU, C., ROY-GAGNON, M.H., DUBÉ, M.P., EYRE-WALKER, A., DRAPEAU, P., STONE, E.A., LAFRENIÈRE, R.G. & ROULEAU, G.A. (2010). Direct measure of the de novo mutation rate in autism and schizophrenia cohorts. Am J Hum Genet, 87, 316–24.

AYALEW, M., LE-NICULESCU, H., LEVEY, D., JAIN, N., CHANGALA, B., PATE, L.S., WINIGER, E., BREIER, A., SHEKHAR, A., AMDUR, R., KOLLER, D., NURNBERGER, J., CORVIN, A., GEYER, M., TSUANG, M., SALOMON, D., SCHORK, N., FANOUS, A., O’DONOVAN, M. & NICULESCU, A. (2012). Convergent functional genomics of schizophrenia: from comprehensive understanding to genetic risk prediction. Mol Psychiatry, 17, 887–905.

BATTAGLIA, M. & BELLODI, L. (1996). Familial risks and reproductive fitness in schizophrenia. Schizophr Bull, 22, 191–5.

BAUER, M., UNUTZER, J., PINCUS, H.A. & LAWSON, W.B. (2002). Bipolar disorder. Ment Health Serv Res, 4, 225–229.


BENJAMINI, Y. & HOCHBERG, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289–300.

BERBARI, N.F., KIN, N.W., SHARMA, N., MICHAUD, E.J., KESTERSON, R.A. & YODER, B.K. (2011). Mutations in Traf3ip1 reveal defects in ciliogenesis, embryonic development, and altered cell size regulation. Dev Biol, 360, 66–76.

BLACKWOOD, D.H., FORDYCE, A., WALKER, M.T., ST CLAIR, D.M., PORTEOUS, D.J. & MUIR, W.J. (2001). Schizophrenia and affective disorders–cosegregation with a translocation at chromosome 1q42 that directly disrupts brain-expressed genes: clinical and P300 findings in a family. Am J Hum Genet, 69, 428–33.

BODMER, W. & BONILLA, C. (2008). Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet, 40, 695–701.

BRADSHAW, N.J., OGAWA, F., ANTOLIN-FONTES, B., CHUBB, J.E., CARLYLE, B.C., CHRISTIE, S., CLAESSENS, A., PORTEOUS, D.J. & MILLAR, J.K. (2008). DISC1, PDE4B, and NDE1 at the centrosome and synapse. Biochem Biophys Res Commun, 377, 1091–6.

BRANDON, N.J. (2007). Dissecting DISC1 function through protein-protein interactions. Biochem Soc Trans, 35, 1283–6.

BRANDON, N.J., HANDFORD, E.J., SCHUROV, I., RAIN, J.C., PELLING, M., DURAN-JIMENIZ, B., CAMARGO, L.M., OLIVER, K.R., BEHER, D., SHEARMAN, M.S. & WHITING, P.J. (2004). Disrupted in schizophrenia 1 and Nudel form a neurodevelopmentally regulated protein complex: implications for schizophrenia and other major neurological disorders. Mol Cell Neurosci, 25, 42–55.

BRANDON, N.J., MILLAR, J.K., KORTH, C., SIVE, H., SINGH, K.K. & SAWA, A. (2009). Understanding the role of DISC1 in psychiatric disease and during normal development. J Neurosci, 29, 12768–75.

BRENT, R. & FINLEY, R.L., JR (1997). Understanding gene and allele function with two-hybrid methods. Annu Rev Genet, 31, 663–704.


BURDICK, K.E., HODGKINSON, C.A., SZESZKO, P.R., LENCZ, T., EKHOLM, J.M., KANE, J.M., GOLDMAN, D. & MALHOTRA, A.K. (2005). DISC1 and neurocognitive function in schizophrenia. Neuroreport, 16, 1399–402.

BURDICK, K.E., KAMIYA, A., HODGKINSON, C.A., LENCZ, T., DEROSSE, P., ISHIZUKA, K., ELASHVILI, S., ARAI, H., GOLDMAN, D., SAWA, A. & MALHOTRA, A.K. (2008). Elucidating the relationship between DISC1, NDEL1 and NDE1 and the risk for schizophrenia: evidence of epistasis and competitive binding. Hum Mol Genet, 17, 2462–73.

CALLICOTT, J.H., STRAUB, R.E., PEZAWAS, L., EGAN, M.F., MATTAY, V.S., HARIRI, A.R., VERCHINSKI, B.A., MEYER-LINDENBERG, A., BALKISSOON, R., KOLACHANA, B., GOLDBERG, T.E. & WEINBERGER, D.R. (2005). Variation in DISC1 affects hippocampal structure and function and increases risk for schizophrenia. Proc Natl Acad Sci U S A, 102, 8627–32.

CAMARGO, L.M., COLLURA, V., RAIN, J.C., MIZUGUCHI, K., HERMJAKOB, H., KERRIEN, S., BONNERT, T.P., WHITING, P.J. & BRANDON, N.J. (2007). Disrupted in schizophrenia 1 interactome: evidence for the close connectivity of risk genes and a potential synaptic basis for schizophrenia. Mol Psychiatry, 12, 74–86.

CAPLEN, N., PARRISH, S., IMANI, F., FIRE, A. & MORGAN, R. (2001). Specific inhibition of gene expression by small double-stranded RNAs in invertebrate and vertebrate systems. Proc Natl Acad Sci U S A, 98, 9742–7.

CARLSON, C.S., EBERLE, M.A., RIEDER, M.J., YI, Q., KRUGLYAK, L. & NICKERSON, D.A. (2004). Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet, 74, 106–20.

CARVAJAL-CARMONA, L., CAZIER, J., JONES, A., HOWARTH, K., BRODERICK, P., PITTMAN, A., DOBBINS, S., TENESA, A., FARRINGTON, S., PRENDERGAST, J., THEODORATOU, E., BARNETSON, R., CONTI, D., NEWCOMB, P., HOPPER, J., JENKINS, M., GALLINGER, S., DUGGAN, D., CAMPBELL, H., KERR, D., CASEY, G., HOULSTON, R., DUNLOP, M. & TOMLINSON, I. (2011). Fine-mapping of colorectal cancer susceptibility loci at 8q23.3, 16q22.1 and 19q13.11: refinement of association signals and use of in silico analysis to suggest functional variation and unexpected candidate target genes. Hum Mol Genet, 20, 2879–88.

298 REFERENCES

CHANG,T.C.&MENDELL, J.T. (2007). microRNAs in vertebrate physiology and human disease. Annu Rev Genomics Hum Genet, 8, 215–39.

CHEN,J.,BARDES,E.E.,ARONOW,B.J.&JEGGA, A.G. (2009). ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res, 37, W305–11.

CHEN,Q.Y.,CHEN,Q.,FENG,G.Y.,LINDPAINTNER,K.,WANG,L.J.,CHEN,Z.X.,GAO,Z.S.,TANG,J.S.,HUANG,G.&HE, L. (2007). Case-control association study of Disrupted-in-schizophrenia-1 (DISC1) gene and schizophrenia in the Chinese population. J Psychiatr Res, 41, 428–34.

CHEUNG,Y.F.,KAN,Z.,GARRETT-ENGELE,P.,GALL,I.,MURDOCH,H.,BAILLIE,G.S.,CAMARGO,L.M.,JOHNSON,J.M.,HOUSLAY,M.D.&CASTLE, J.C. (2007). PDE4B5, a novel, super-short, brain-specific cAMP phosphodiesterase-4 variant whose isoform-specifying N-terminal region is identical to that of cAMP phosphodiesterase-4D6 (PDE4D6). J Pharmacol Exp Ther, 322, 600–9.

CHONGSUVIVATWONG, V. (2012). epicalc: Epidemiological calculator. R package version 2.15.1.0.

CHUBB,J.E.,BRADSHAW,N.J.,SOARES,D.C.,PORTEOUS,D.J.&MILLAR, J.K. (2008). The DISC locus in psychiatric illness. Mol Psychiatry, 13, 36–64.

CLAPCOTE,S.J.,LIPINA,T.V.,MILLAR,J.K.,MACKIE,S.,CHRISTIE,S.,OGAWA,F.,LERCH,J.P.,TRIMBLE,K.,UCHIYAMA,M.,SAKURABA,Y.,KANEDA,H.,SHIROISHI,T.,HOUSLAY,M.D.,HENKELMAN,R.M.,SLED,J.G.,GONDO,Y.,PORTEOUS,D.J.&RODER,J.C. (2007). Behavioral phenotypes of Disc1 missense mutations in mice. Neuron, 54, 387–402.

COHEN-WOODS,S.,CRAIG,I.W.&MCGUFFIN, P. (2013). The current state of play on the molecular genetics of depression. Psychol Med, 43, 673–87.

COMBARROS,O.,CORTINA-BORJA,M.,SMITH,A.D.&LEHMANN, D.J. (2009). Epistasis in sporadic Alzheimer’s disease. Neurobiol Aging, 30, 1333–49.

COONS,A.H.,CREECH,H.J.&JONES, R.N. (1941). Immunological properties of an antibody containing a fluorescent group. Experimental Biology and Medicine, 47, 200–202.

CORDELL, H.J. (2009). Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet, 10, 392–404.

CRADDOCK,N.&SKLAR, P. (2013). Genetics of bipolar disorder. The Lancet, 381, 1654–62.

CROSS-DISORDER GROUP OF THE PSYCHIATRIC GENOMICS CONSORTIUM (2013). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet, 45, 984–994.

CUNNINGTON,M.,SANTIBANEZ KOREF,M.,MAYOSI,B.,BURN,J.&KEAVNEY, B. (2010). Chromosome 9p21 SNPs associated with multiple disease phenotypes correlate with ANRIL expression. PLoS Genet., 6, e1000899.

CURTIS,D.,KALSI,G.,BRYNJOLFSSON,J.,MCINNIS,M.,O’NEILL,J.,SMYTH,C.,MOLONEY,E.,MURPHY,P.,MCQUILLIN,A.,PETURSSON,H.&GURLING, H. (2003). Genome scan of pedigrees multiply affected with bipolar disorder provides further support for the presence of a susceptibility locus on chromosome 12q23-q24, and suggests the presence of additional loci on 1p and 1q. Psychiatr Genet, 13, 77–84.

DATTA,S.R.,MCQUILLIN,A.,RIZIG,M.,BLAVERI,E.,THIRUMALAI,S.,KALSI,G.,LAWRENCE,J.,BASS,N.J.,PURI,V.,CHOUDHURY,K.,PIMM,J.,CROMBIE,C.,FRASER,G.,WALKER,N.,CURTIS,D.,ZVELEBIL,M.,PEREIRA,A.,KANDASWAMY,R.,ST CLAIR,D.&GURLING, H.M. (2010). A threonine to isoleucine missense mutation in the pericentriolar material 1 gene is strongly associated with schizophrenia. Mol Psychiatry, 15, 615–28.

DAWE,H.R.,ADAMS,M.,WHEWAY,G.,SZYMANSKA,K.,LOGAN,C.V.,NOEGEL,A.A.,GULL,K.&JOHNSON, C.A. (2009). Nesprin-2 interacts with meckelin and mediates ciliogenesis via remodelling of the actin cytoskeleton. J Cell Sci, 122, 2716–26.

DEANE,C.M.,SALWIŃSKI,Ł.,XENARIOS,I.&EISENBERG, D. (2002). Protein interactions: two methods for assessment of the reliability of high throughput observations. Molecular & Cellular Proteomics, 1, 349–56.

DEBONO,R.,TOPLESS,R.,MARKIE,D.,BLACK,M.A.&MERRIMAN, T.R. (2012). Analysis of the DISC1 translocation partner (11q14.3) in genetic risk of schizophrenia. Genes Brain Behav, 11, 859–63.

DELANEAU,O.,ZAGURY,J.&MARCHINI, J. (2013). Improved whole-chromosome phasing for disease and population genetic studies. Nat Methods, 10, 5–6.

DERKACH,A.,CHIANG,T.,GONG,J.,ADDIS,L.,DOBBINS,S.,TOMLINSON,I.,HOULSTON,R.,PAL,D.K.&STRUG, L.J. (2014). Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic. Bioinformatics, 30, 2179–88.

DETERA-WADLEIGH,S.D.,BADNER,J.A.,BERRETTINI,W.H.,YOSHIKAWA,T.,GOLDIN,L.R.,TURNER,G.,ROLLINS,D.Y.,MOSES,T.,SANDERS,A.R.,KARKERA,J.D.,ESTERLING,L.E.,ZENG,J.,FERRARO,T.N.,GUROFF,J.J.,KAZUBA,D.,MAXWELL,M.E.,NURNBERGER,J.,J.I.&GERSHON, E.S. (1999). A high-density genome scan detects evidence for a bipolar-disorder susceptibility locus on 13q32 and other potential loci on 1q32 and 18p11.2. Proc Natl Acad Sci U S A, 96, 5604–9.

DICK,D.M.,FOROUD,T.,FLURY,L.,BOWMAN,E.S.,MILLER,M.J.,RAU,N.L.,MOE,P.R.,SAMAVEDY,N.,EL-MALLAKH,R.,MANJI,H.,GLITZ,D.A.,MEYER,E.T.,SMILEY,C.,HAHN,R.,WIDMARK,C.,MCKINNEY,R.,SUTTON,L.,BALLAS,C.,GRICE,D.,BERRETTINI,W.,BYERLEY,W.,CORYELL,W.,DEPAULO,R.,MACKINNON,D.F.,GERSHON,E.S.,KELSOE,J.R.,MCMAHON,F.J.,MCINNIS,M.,MURPHY,D.L.,REICH,T.,SCHEFTNER,W.&NURNBERGER, J., J. I. (2003). Genomewide linkage analyses of bipolar disorder: a new sample of 250 pedigrees from the National Institute of Mental Health Genetics Initiative. Am J Hum Genet, 73, 107–14.

DOENCH,J.G.&SHARP, P.A. (2004). Specificity of microRNA target selection in translational repression. Genes Dev, 18, 504–11.

DOREY,J.M.,BEAUCHET,O.,THOMAS ANTERION,C.,ROUCH,I.,KROLAK-SALMON,P.,GAUCHER,J.,GONTHIER,R.&AKISKAL, H.S. (2008). Behavioral and psychological symptoms of dementia and bipolar spectrum disorders: review of the evidence of a relationship and treatment implications. CNS Spectr, 13, 796–803.

DOVE,S.L.&HOCHSCHILD, A. (1998). Conversion of the omega subunit of Escherichia coli RNA polymerase into a transcriptional activator or an activation target. Genes Dev, 12, 745–54.

DUAN,X.,CHANG,J.H.,GE,S.,FAULKNER,R.L.,KIM,J.Y.,KITABATAKE,Y.,LIU,X.B.,YANG,C.H.,JORDAN,J.D.,MA,D.K.,LIU,C.Y.,GANESAN,S.,CHENG,H.J.,MING,G.L.,LU,B.&SONG, H. (2007). Disrupted-in-Schizophrenia 1 regulates integration of newly generated neurons in the adult brain. Cell, 130, 1146–58.

EID,J.,FEHR,A.,GRAY,J.,LUONG,K.,LYLE,J.,OTTO,G.,PELUSO,P.,RANK,D.,BAYBAYAN,P.,BETTMAN,B.,BIBILLO,A.,BJORNSON,K.,CHAUDHURI,B.,CHRISTIANS,F.,CICERO,R.,CLARK,S.,DALAL,R.,DEWINTER,A.,DIXON,J.,FOQUET,M.,GAERTNER,A.,HARDENBOL,P.,HEINER,C.,HESTER,K.,HOLDEN,D.,KEARNS,G.,KONG,X.,KUSE,R.,LACROIX,Y.,LIN,S.,LUNDQUIST,P.,MA,C.,MARKS,P.,MAXHAM,M.,MURPHY,D.,PARK,I.,PHAM,T.,PHILLIPS,M.,ROY,J.,SEBRA,R.,SHEN,G.,SORENSON,J.,TOMANEY,A.,TRAVERS,K.,TRULSON,M.,VIECELI,J.,WEGENER,J.,WU,D.,YANG,A.,ZACCARIN,D.,ZHAO,P.,ZHONG,F.,KORLACH,J.&TURNER, S. (2009). Real-time DNA sequencing from single polymerase molecules. Science, 323, 133–8.

EKELUND,J.,HOVATTA,I.,PARKER,A.,PAUNIO,T.,VARILO,T.,MARTIN,R.,SUHONEN,J.,ELLONEN,P.,CHAN,G.,SINSHEIMER,J.S.,SOBEL,E.,JUVONEN,H.,ARAJARVI,R.,PARTONEN,T.,SUVISAARI,J.,LONNQVIST,J.,MEYER,J.&PELTONEN, L. (2001). Chromosome 1 loci in Finnish schizophrenia families. Hum Mol Genet, 10, 1611–7.

EKELUND,J.,HENNAH,W.,HIEKKALINNA,T.,PARKER,A.,MEYER,J.,LONNQVIST,J.&PELTONEN, L. (2004). Replication of 1q42 linkage in Finnish schizophrenia pedigrees. Mol Psychiatry, 9, 1037–41.

ELBASHIR,S.,HARBORTH,J.,LENDECKEL,W.,YALCIN,A.,WEBER,K.&TUSCHL,T. (2001). Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature, 411, 494–8.

ESQUELA-KERSCHER,A.&SLACK, F.J. (2006). Oncomirs - microRNAs with a role in cancer. Nat Rev Cancer, 6, 259–69.

EVANS,D.M.,SPENCER,C.C.A.,POINTON,J.J.,SU,Z.,HARVEY,D.,KOCHAN,G.,OPPERMANN,U.,OPPERMAN,U.,DILTHEY,A.,PIRINEN,M.,STONE,M.A.,APPLETON,L.,MOUTSIANAS,L.,MOUTSIANIS,L.,LESLIE,S.,WORDSWORTH,T.,KENNA,T.J.,KARADERI,T.,THOMAS,G.P.,WARD,M.M.,WEISMAN,M.H.,FARRAR,C.,BRADBURY,L.A.,DANOY,P.,INMAN,R.D.,MAKSYMOWYCH,W.,GLADMAN,D.,RAHMAN,P.,SPONDYLOARTHRITIS RESEARCH CONSORTIUM OF CANADA (SPARCC), MORGAN,A.,MARZO-ORTEGA,H.,BOWNESS,P.,GAFFNEY,K.,GASTON,J.S.H.,SMITH,M.,BRUGES-ARMAS,J.,COUTO,A.R.,SORRENTINO,R.,PALADINI,F.,FERREIRA,M.A.,XU,H.,LIU,Y.,JIANG,L.,LOPEZ-LARREA,C.,DÍAZ-PEÑA,R.,LÓPEZ-VÁZQUEZ,A.,ZAYATS,T.,BAND,G.,BELLENGUEZ,C.,BLACKBURN,H.,BLACKWELL,J.M.,BRAMON,E.,BUMPSTEAD,S.J.,CASAS,J.P.,CORVIN,A.,CRADDOCK,N.,DELOUKAS,P.,DRONOV,S.,DUNCANSON,A.,EDKINS,S.,FREEMAN,C.,GILLMAN,M.,GRAY,E.,GWILLIAM,R.,HAMMOND,N.,HUNT,S.E.,JANKOWSKI,J.,JAYAKUMAR,A.,LANGFORD,C.,LIDDLE,J.,MARKUS,H.S.,MATHEW,C.G.,MCCANN,O.T.,MCCARTHY,M.I.,PALMER,C.N.A.,PELTONEN,L.,PLOMIN,R.,POTTER,S.C.,RAUTANEN,A.,RAVINDRARAJAH,R.,RICKETTS,M.,SAMANI,N.,SAWCER,S.J.,STRANGE,A.,TREMBATH,R.C.,VISWANATHAN,A.C.,WALLER,M.,WESTON,P.,WHITTAKER,P.,WIDAA,S.,WOOD,N.W.,MCVEAN,G.,REVEILLE,J.D.,WORDSWORTH,B.P.,BROWN,M.A.,DONNELLY,P.,AUSTRALO-ANGLO-AMERICAN SPONDYLOARTHRITIS CONSORTIUM (TASC) & WELLCOME TRUST CASE CONTROL CONSORTIUM 2 (WTCCC2) (2011). Interaction between ERAP1 and HLA-B27 in ankylosing spondylitis implicates peptide handling in the mechanism for HLA-B27 in disease susceptibility. Nat Genet, 43, 761–7.

EWENS,W.J.&SPIELMAN, R. (1995). The transmission/disequilibrium test: history, subdivision, and admixture. Am J Hum Genet, 57, 455–64.

EYKELENBOOM,J.,BRIGGS,G.,BRADSHAW,N.,SOARES,D.,OGAWA,F.,CHRISTIE,S.,MALAVASI,E.,MAKEDONOPOULOU,P.,MACKIE,S.,MALLOY,M.,WEAR,M.,BLACKBURN,E.,BRAMHAM,J.,MCINTOSH,A.,BLACKWOOD,D.,MUIR,W.,PORTEOUS,D.&MILLAR, J. (2012). A t(1;11) translocation linked to schizophrenia and affective disorders gives rise to aberrant chimeric DISC1 transcripts that encode structurally altered, deleterious mitochondrial proteins. Hum. Mol. Genet., 21, 3374–86.

FANG,Y.&MACOOL, D. (2002). Development of a high-throughput yeast two-hybrid screening system to study protein–protein interactions in plants. Mol Genet Genomics., 267, 142–53.

FELDMAN,J.L.,GEIMER,S.&MARSHALL, W.F. (2007). The mother centriole plays an instructive role in defining cell geometry. PLoS Biol, 5, e149.

FIELDS,S.&SONG, O. (1989). A novel genetic system to detect protein-protein interactions. Nature, 340, 245–6.

FIRE,A.,XU,S.,MONTGOMERY,M.,KOSTAS,S.,DRIVER,S.&MELLO, C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature, 391, 806–11.

FISHER, R. (1925). Statistical Methods for Research Workers. Oliver and Boyd, 1st edn.

FOREMAN, L. (2012). RAD51 in Congenital Mirror Movement Disorder - Evidence of a Novel Function?. Honours dissertation, University of Otago.

FROMONT-RACINE,M.,RAIN,J.&LEGRAIN, P. (1997). Toward a functional analysis of the yeast genome through exhaustive two-hybrid screens. Nat Genet, 16, 277–82.

FUJITA,T.,IKUTA,J.,HAMADA,J.,OKAJIMA,T.,TATEMATSU,K.,TANIZAWA,K.&KURODA, S. (2004). Identification of a tissue-non-specific homologue of axonal fasciculation and elongation protein zeta-1. Biochem Biophys Res Commun, 313, 738–44.

GALL,J.G.&PARDUE, M.L. (1969). Formation and detection of RNA-DNA hybrid molecules in cytological preparations. Proc Natl Acad Sci U S A, 63, 378–83.

GEJMAN,P.V.,MARTINEZ,M.,CAO,Q.,FRIEDMAN,E.,BERRETTINI,W.H.,GOLDIN,L.R.,KOROULAKIS,P.,AMES,C.,LERMAN,M.A.&GERSHON, E.S. (1993). Linkage analysis of fifty-seven microsatellite loci to bipolar disorder. Neuropsychopharmacology, 9, 31–40.

GELLER,B.&LUBY, J. (1997). Child and adolescent bipolar disorder: a review of the past 10 years. J Am Acad Child Adolesc Psychiatry, 36, 1168–76.

GIETZ,R.D.,SCHIESTL,R.H.,WILLEMS,A.R.&WOODS, R.A. (1995). Studies on the transformation of intact yeast cells by the LiAc/SS-DNA/PEG procedure. Yeast, 11, 355–60.

GIRARD,S.L.,DION,P.A.&ROULEAU, G.A. (2012). Schizophrenia genetics: putting all the pieces together. Curr Neurol Neurosci Rep, 12, 261–66.

GIUSTI-RODRÍGUEZ,P.&SULLIVAN, P.F. (2013). The genomics of schizophrenia: update and implications. J Clin Invest, 123, 4557–63.

GORLOV,I.P.,GORLOVA,O.Y.,SUNYAEV,S.R.,SPITZ,M.R.&AMOS, C.I. (2008). Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms. Am J Hum Genet, 82, 100–12.

GOTTESMAN,I.&ERLENMEYER-KIMLING, L. (2001). Family and twin strategies as a head start in defining prodromes and endophenotypes for hypothetical early-interventions in schizophrenia. Schizophr Res, 51, 93–102.

GRAHAM,F.L.,SMILEY,J.,RUSSELL,W.C.&NAIRN, R. (1977). Characteristics of a human cell line transformed by DNA from human adenovirus type 5. J Gen Virol, 36, 59–74.

GREEN,E.K.,GROZEVA,D.,SIMS,R.,RAYBOULD,R.,FORTY,L.,GORDON-SMITH,K.,RUSSELL,E.,ST CLAIR,D.,YOUNG,A.H.,FERRIER,I.N.,KIROV,G.,JONES,I.,JONES,L.,OWEN,M.J.,O’DONOVAN,M.C.&CRADDOCK, N. (2011). DISC1 exon 11 rare variants found more commonly in schizoaffective spectrum cases than controls. Am J Med Genet B Neuropsychiatr Genet, 156B, 490–2.

GREEN,E.K.,GROZEVA,D.,FORTY,L.,GORDON-SMITH,K.,RUSSELL,E.,FARMER,A.,HAMSHERE,M.,JONES,I.R.,JONES,L.,MCGUFFIN,P.,MORAN,J.L.,PURCELL,S.,SKLAR,P.,OWEN,M.J.,O’DONOVAN,M.C.&CRADDOCK, N. (2013). Association at SYNE1 in both bipolar disorder and recurrent major depression. Mol Psychiatry, 18, 614–7.

GREEN,E.K.,REES,E.,WALTERS,J.T.R.,SMITH,K.G.,FORTY,L.,GROZEVA,D.,MORAN,J.L.,SKLAR,P.,RIPKE,S.,CHAMBERT,K.D.,GENOVESE,G.,MCCARROLL,S.A.,JONES,I.,JONES,L.,OWEN,M.J.,O’DONOVAN,M.C.,CRADDOCK,N.&KIROV, G. (2015). Copy number variation in bipolar disorder. Mol Psychiatry.

GUARNIERI,D.J.&DILEONE, R.J. (2008). MicroRNAs: a new class of gene regulators. Ann Med, 40, 197–208.

GUDBJARTSSON,D.F.,THORVALDSSON,T.,KONG,A.,GUNNARSSON,G.&INGOLFSDOTTIR, A. (2005). Allegro version 2. Nat Genet, 37, 1015–6.

GURLING,H.M.,CRITCHLEY,H.,DATTA,S.R.,MCQUILLIN,A.,BLAVERI,E.,THIRUMALAI,S.,PIMM,J.,KRASUCKI,R.,KALSI,G.,QUESTED,D.,LAWRENCE,J.,BASS,N.,CHOUDHURY,K.,PURI,V.,O’DALY,O.,CURTIS,D.,BLACKWOOD,D.,MUIR,W.,MALHOTRA,A.K.,BUCHANAN,R.W.,GOOD,C.D.,FRACKOWIAK,R.S.&DOLAN,R.J. (2006). Genetic association and brain morphology studies and the chromosome 8p22 pericentriolar material 1 (PCM1) gene in susceptibility to schizophrenia. Arch Gen Psychiatry, 63, 844–54.

GUSELLA,J.F.,WEXLER,N.S.,CONNEALLY,P.M.,NAYLOR,S.L.,ANDERSON,M.A.,TANZI,R.E.,WATKINS,P.C.,OTTINA,K.,WALLACE,M.R.&SAKAGUCHI, A.Y. (1983). A polymorphic DNA marker genetically linked to Huntington’s disease. Nature, 306, 234–8.

HALDANE, J.B.S. (1919). The combination of linkage values and the calculation of distances between the loci of linked factors. J Genet, 8, 299–309.

HAMSHERE,M.L.,BENNETT,P.,WILLIAMS,N.,SEGURADO,R.,CARDNO,A.,NORTON,N.,LAMBERT,D.,WILLIAMS,H.,KIROV,G.,CORVIN,A.,HOLMANS,P.,JONES,L.,JONES,I.,GILL,M.,O’DONOVAN,M.C.,OWEN,M.J.&CRADDOCK, N. (2005). Genomewide linkage scan in schizoaffective disorder: significant evidence for linkage at 1q42 close to DISC1, and suggestive evidence at 22q11 and 19p13. Arch Gen Psychiatry, 62, 1081–8.

HARTLEY,J.L.,TEMPLE,G.F.&BRASCH, M.A. (2000). DNA cloning using in vitro site-specific recombination. Genome Res, 10, 1788–95.

HASHIMOTO,R.,NUMAKAWA,T.,OHNISHI,T.,KUMAMARU,E.,YAGASAKI,Y.,ISHIMOTO,T.,MORI,T.,NEMOTO,K.,ADACHI,N.,IZUMI,A.,CHIBA,S.,NOGUCHI,H.,SUZUKI,T.,IWATA,N.,OZAKI,N.,TAGUCHI,T.,KAMIYA,A.,KOSUGA,A.,TATSUMI,M.,KAMIJIMA,K.,WEINBERGER,D.R.,SAWA,A.&KUNUGI, H. (2006). Impact of the DISC1 Ser704Cys polymorphism on risk for major depression, brain morphology and ERK signaling. Hum Mol Genet, 15, 3024–33.

HAYASHI-TAKAGI,A.,TAKAKI,M.,GRAZIANE,N.,SESHADRI,S.,MURDOCH,H.,DUNLOP,A.J.,MAKINO,Y.,SESHADRI,A.J.,ISHIZUKA,K.,SRIVASTAVA,D.P.,XIE,Z.,BARABAN,J.M.,HOUSLAY,M.D.,TOMODA,T.,BRANDON,N.J.,KAMIYA,A.,YAN,Z.,PENZES,P.&SAWA, A. (2010). Disrupted-in-Schizophrenia 1 (DISC1) regulates spines of the glutamate synapse via Rac1. Nat Neurosci, 13, 327–32.

HAYDEN,E.P.&NURNBERGER, J.I. (2006). Molecular genetics of bipolar disorder. Genes, Brain and Behavior, 5, 85–95.

HENNAH,W.&PORTEOUS, D. (2009). The DISC1 pathway modulates expression of neurodevelopmental, synaptogenic and sensory perception genes. PLoS ONE, 4, e4906.

HENNAH,W.,VARILO,T.,KESTILA,M.,PAUNIO,T.,ARAJARVI,R.,HAUKKA,J.,PARKER,A.,MARTIN,R.,LEVITZKY,S.,PARTONEN,T.,MEYER,J.,LONNQVIST,J.,PELTONEN,L.&EKELUND, J. (2003). Haplotype transmission analysis provides evidence of association for DISC1 to schizophrenia and suggests sex-dependent effects. Hum Mol Genet, 12, 3151–9.

HIKIDA,T.,JAARO-PELED,H.,SESHADRI,S.,OISHI,K.,HOOKWAY,C.,KONG,S.,WU,D.,XUE,R.,ANDRADE,M.,TANKOU,S.,MORI,S.,GALLAGHER,M.,ISHIZUKA,K.,PLETNIKOV,M.,KIDA,S.&SAWA, A. (2007). Dominant-negative DISC1 transgenic mice display schizophrenia-associated phenotypes detected by measures translatable to humans. Proc Natl Acad Sci U S A, 104, 14501–6.

HODGE, S.E. (2001). Model-free vs. model-based linkage analysis: a false dichotomy? Am J Med Genet, 105, 62–4.

HODGKINSON,C.A.,GOLDMAN,D.,JAEGER,J.,PERSAUD,S.,KANE,J.M.,LIPSKY,R.H.&MALHOTRA, A.K. (2004). Disrupted in schizophrenia 1 (DISC1): association with schizophrenia, schizoaffective disorder, and bipolar disorder. Am J Hum Genet, 75, 862–72.

HOLM, S. (1979). A simple sequentially rejective multiple test procedure. Scand J Statist, 6, 65–70.

HOWIE,B.,DONNELLY,P.&MARCHINI, J. (2009). A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet, 5, e1000529.

HUANG,W.,WANG,P.,LIU,Z.&ZHANG, L. (2009). Identifying disease associations via genome-wide association studies. BMC Bioinformatics, 10, S68.

HURT,J.A.,THIBODEAU,S.A.,HIRSH,A.S.,PABO,C.O.&JOUNG, J.K. (2003). Highly specific zinc finger proteins obtained by directed domain shuffling and cell-based selection. Proc Natl Acad Sci U S A, 100, 12271–6.

HWU,H.G.,LIU,C.M.,FANN,C.S.,OU-YANG,W.C.&LEE, S.F. (2003). Linkage of schizophrenia with chromosome 1q loci in Taiwanese families. Mol Psychiatry, 8, 445–52.

INGRAHAM,L.J.&KETY, S.S. (2000). Adoption studies of schizophrenia. Am J Med Genet, 97, 18–22.

INTERNATIONAL SCHIZOPHRENIA CONSORTIUM (2008). Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature, 455, 237–41.

JAARO-PELED, H. (2009). Chapter 9 - gene models of schizophrenia: DISC1 mouse models. In S. Akira, ed., Genetic Models of Schizophrenia, vol. 179 of Progress in Brain Research, 75–86, Elsevier.

JACOBS,P.,BRUNTON,M.,FRACKIEWICZ,A.,NEWTON,M.,COOK,P.&ROBSON,E. (1970). Studies on a family with three cytogenetic markers. Ann Hum Genet, 33, 325–336.

JAMES,P.,HALLADAY,J.&CRAIG, E. (1996). Genomic libraries and a host strain designed for highly efficient two-hybrid selection in yeast. Genetics, 144, 1425–1436.

JOHNSON,G.C.,ESPOSITO,L.,BARRATT,B.J.,SMITH,A.N.,HEWARD,J.,DI GENOVA,G.,UEDA,H.,CORDELL,H.J.,EAVES,I.A.,DUDBRIDGE,F.,TWELLS,R.C.,PAYNE,F.,HUGHES,W.,NUTLAND,S.,STEVENS,H.,CARR,P.,TUOMILEHTO-WOLF,E.,TUOMILEHTO,J.,GOUGH,S.C.,CLAYTON,D.G.&TODD, J.A. (2001). Haplotype tagging for the identification of common disease genes. Nat Genet, 29, 233–7.

JOUNG,J.K.,RAMM,E.I.&PABO, C.O. (2000). A bacterial two-hybrid selection system for studying protein-DNA and protein-protein interactions. Proc Natl Acad Sci U S A, 97, 7382–7.

JOYCE,P.R.,DOUGHTY,C.J.,WELLS,J.E.,WALSH,A.E.,ADMIRAAL,A.,LILL,M.&OLDS, R.J. (2004). Affective disorders in the first-degree relatives of bipolar probands: results from the South Island Bipolar Study. Compr Psychiatry, 45, 168–74.

KAMIYA,A.,KUBO,K.,TOMODA,T.,TAKAKI,M.,YOUN,R.,OZEKI,Y.,SAWAMURA,N.,PARK,U.,KUDO,C.,OKAWA,M.,ROSS,C.A.,HATTEN,M.E.,NAKAJIMA,K.&SAWA, A. (2005). A schizophrenia-associated mutation of DISC1 perturbs cerebral cortex development. Nat Cell Biol, 7, 1167–78.

KAMIYA,A.,TOMODA,T.,CHANG,J.,TAKAKI,M.,ZHAN,C.,MORITA,M.,CASCIO,M.B.,ELASHVILI,S.,KOIZUMI,H.,TAKANEZAWA,Y.,DICKERSON,F.,YOLKEN,R.,ARAI,H.&SAWA, A. (2006). DISC1-NDEL1/NUDEL protein interaction, an essential component for neurite outgrowth, is modulated by genetic variations of DISC1. Hum Mol Genet, 15, 3313–23.

KAMIYA,A.,TAN,P.L.,KUBO,K.,ENGELHARD,C.,ISHIZUKA,K.,KUBO,A.,TSUKITA,S.,PULVER,A.E.,NAKAJIMA,K.,CASCELLA,N.G.,KATSANIS,N.&SAWA, A. (2008). Recruitment of PCM1 to the centrosome by the cooperative action of DISC1 and BBS4: a candidate for psychiatric illnesses. Arch Gen Psychiatry, 65, 996–1006.

KANG,E.,BURDICK,K.,KIM,J.,DUAN,X.,GUO,J.,SAILOR,K.,JUNG,D.,GANESAN,S.,CHOI,S.,PRADHAN,D.,LU,B.,AVRAMOPOULOS,D.,CHRISTIAN,K.,MALHOTRA,A.,SONG,H.&MING, G.L. (2011). Interaction between FEZ1 and DISC1 in regulation of neuronal development and risk for schizophrenia. Neuron, 72, 559–71.

KIESEPPA,T.,PARTONEN,T.,HAUKKA,J.,KAPRIO,J.&LONNQVIST, J. (2004). High concordance of bipolar I disorder in a nationwide sample of twins. Am J Psychiatry, 161, 1814–21.

KILPINEN,H.,YLISAUKKO-OJA,T.,HENNAH,W.,PALO,O.M.,VARILO,T.,VANHALA,R.,NIEMINEN-VON WENDT,T.,VON WENDT,L.,PAUNIO,T.&PELTONEN, L. (2008). Association of DISC1 with autism and Asperger syndrome. Mol Psychiatry, 13, 187–96.

KIRKPATRICK,B.,XU,L.,CASCELLA,N.,OZEKI,Y.,SAWA,A.&ROBERTS, R.C. (2006). DISC1 immunoreactivity at the light and ultrastructural level in the human neocortex. J Comp Neurol, 497, 436–50.

KUHN,R.,SCHWENK,F.,AGUET,M.&RAJEWSKY, K. (1995). Inducible gene targeting in mice. Science, 269, 1427–9.

KVAJO,M.,MCKELLAR,H.,ARGUELLO,P.A.,DREW,L.J.,MOORE,H.,MACDERMOTT,A.B.,KARAYIORGOU,M.&GOGOS, J.A. (2008). A mutation in mouse Disc1 that models a schizophrenia risk allele leads to specific alterations in neuronal architecture and cognition. Proc Natl Acad Sci U S A, 105, 7076–81.

LAGOS-QUINTANA,M.,RAUHUT,R.,LENDECKEL,W.&TUSCHL, T. (2001). Identification of novel genes coding for small expressed RNAs. Science, 294, 853–8.

LANDER,E.S.&SCHORK, N.J. (1994). Genetic dissection of complex traits. Science, 265, 2037–48.

LAURSEN,T.M.&MUNK-OLSEN, T. (2010). Reproductive patterns in psychotic patients. Schizophr Res, 121, 234–40.

LE-NICULESCU,H.,PATEL,S.D.,BHAT,M.,KUCZENSKI,R.,FARAONE,S.V.,TSUANG,M.T.,MCMAHON,F.J.,SCHORK,N.J.,NURNBERGER,J.I.,JR.&NICULESCU,A.,III (2009). Convergent functional genomics of genome-wide association data for bipolar disorder: Comprehensive identification of candidate genes, pathways and mechanisms. Am J Med Genet B Neuropsychiatr Genet, 150B, 155–81.

LEANNA,C.A.&HANNINK, M. (1996). The reverse two-hybrid system: a genetic scheme for selection against specific protein/protein interactions. Nucleic Acids Res, 24, 3341–7.

LEE,S.H.,DECANDIA,T.R.,RIPKE,S.,YANG,J.,SCHIZOPHRENIA PSYCHIATRIC GENOME-WIDE ASSOCIATION STUDY CONSORTIUM (PGC-SCZ), INTERNATIONAL SCHIZOPHRENIA CONSORTIUM (ISC), MOLECULAR GENETICS OF SCHIZOPHRENIA COLLABORATION (MGS), SULLIVAN,P.F.,GODDARD,M.E.,KELLER,M.C.,VISSCHER,P.M.&WRAY, N.R. (2012). Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat Genet, 44, 247–50.

LELIVELD,S.R.,BADER,V.,HENDRIKS,P.,PRIKULIS,I.,SAJNANI,G.,REQUENA,J.R.&KORTH, C. (2008). Insolubility of disrupted-in-schizophrenia 1 disrupts oligomer-dependent interactions with nuclear distribution element 1 and is associated with sporadic mental disease. J Neurosci, 28, 3839–45.

LELIVELD,S.R.,HENDRIKS,P.,MICHEL,M.,SAJNANI,G.,BADER,V.,TROSSBACH,S.,PRIKULIS,I.,HARTMANN,R.,JONAS,E.,WILLBOLD,D.,REQUENA,J.R.&KORTH,C. (2009). Oligomer assembly of the C-terminal DISC1 domain (640-854) is controlled by self-association motifs and disease-associated polymorphism S704C. Biochemistry, 48, 7746–55.

LENGI,A.J.&CORL, B.A. (2007). Identification and characterization of a novel bovine stearoyl-CoA desaturase isoform with homology to human SCD5. Lipids, 42, 499–508.

LEVINSON,D.F.,DUAN,J.,OH,S.,WANG,K.,SANDERS,A.R.,SHI,J.,ZHANG,N.,MOWRY,B.J.,OLINCY,A.,AMIN,F.,CLONINGER,C.R.,SILVERMAN,J.M.,BUCCOLA,N.G.,BYERLEY,W.F.,BLACK,D.W.,KENDLER,K.S.,FREEDMAN,R.,DUDBRIDGE,F.,PE’ER,I.,HAKONARSON,H.,BERGEN,S.E.,FANOUS,A.H.,HOLMANS,P.A.&GEJMAN, P.V. (2011). Copy number variants in schizophrenia: confirmation of five previous findings and new evidence for 3q29 microdeletions and VIPR2 duplications. Am J Psychiatry, 168, 302–16.

LEWINSOHN,P.M.,CLARKE,G.N.,SEELEY,J.R.&ROHDE, P. (1994). Major depression in community adolescents: age at onset, episode duration, and time to recurrence. J Am Acad Child Adolesc Psychiatry, 33, 809–18.

LEWIS,B.P.,BURGE,C.B.&BARTEL, D.P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell, 120, 15–20.

LEWIS,D.A.&LEVITT, P. (2002). Schizophrenia as a disorder of neurodevelopment. Annu Rev Neurosci, 25, 409–32.

LEWONTIN, R.C. (1964). The interaction of selection and linkage. I. General considerations; heterotic models. Genetics, 49, 49–67.

LI,C.,INGLIS,P.N.,LEITCH,C.C.,EFIMENKO,E.,ZAGHLOUL,N.A.,MOK,C.A.,DAVIS,E.E.,BIALAS,N.J.,HEALEY,M.P.,HÉON,E.,ZHEN,M.,SWOBODA,P.,KATSANIS,N.&LEROUX, M.R. (2008a). An essential role for DYF-11/MIP-T3 in assembling functional intraflagellar transport complexes. PLoS Genet, 4, e1000044.

LI,M.,LI,C.&GUAN, W. (2008b). Evaluation of coverage variation of SNP chips for genome-wide association studies. Eur J Hum Genet, 16, 635–643.

LI,Q.&YU, K. (2008a). Improved correction for population stratification in genome-wide association studies by identifying hidden population structures. Genet Epidemiol., 32, 215–226.

LI,Q.&YU, K. (2008b). Improved correction for population stratification in genome-wide association studies by identifying hidden population structures. Genet Epidemiol., 32, 215–26.

LI,W.,ZHOU,Y.,JENTSCH,J.D.,BROWN,R.A.,TIAN,X.,EHNINGER,D.,HENNAH,W.,PELTONEN,L.,LONNQVIST,J.,HUTTUNEN,M.O.,KAPRIO,J.,TRACHTENBERG,J.T.,SILVA,A.J.&CANNON, T.D. (2007). Specific developmental disruption of disrupted-in-schizophrenia-1 function results in schizophrenia-related phenotypes in mice. Proc Natl Acad Sci U S A, 104, 18280–5.

LING,L.&GOEDDEL, D.V. (2000). MIP-T3, a novel protein linking tumor necrosis factor receptor-associated factor 3 to the microtubule network. J Biol Chem, 275, 23852–60.

LIPPERT,C.,LISTGARTEN,J.,DAVIDSON,R.I.,BAXTER,S.,POON,H.,POONG,H.,KADIE,C.M.&HECKERMAN, D. (2013). An exhaustive epistatic SNP association analysis on expanded Wellcome Trust data. Sci Rep, 3, 1099.

LIU,L.,ZHANG,D.,LIU,H.&ARENDT, C. (2013). Robust methods for population stratification in genome wide association studies. BMC Bioinformatics, 14, 132.

LIU,Y.,BLACKWOOD,D.H.,CAESAR,S.,DE GEUS,E.J.C.,FARMER,A.,FERREIRA,M.A.R.,FERRIER,I.N.,FRASER,C.,GORDON-SMITH,K.,GREEN,E.K.,GROZEVA,D.,GURLING,H.M.,HAMSHERE,M.L.,HEUTINK,P.,HOLMANS,P.A.,HOOGENDIJK,W.J.,HOTTENGA,J.J.,JONES,L.,JONES,I.R.,KIROV,G.,LIN,D.,MCGUFFIN,P.,MOSKVINA,V.,NOLEN,W.A.,PERLIS,R.H.,POSTHUMA,D.,SCOLNICK,E.M.,SMIT,A.B.,SMIT,J.H.,SMOLLER,J.W.,ST CLAIR,D.,VAN DYCK,R.,VERHAGE,M.,WILLEMSEN,G.,YOUNG,A.H.,ZANDBELT,T.,BOOMSMA,D.I.,CRADDOCK,N.,O’DONOVAN,M.C.,OWEN,M.J.,PENNINX,B.W.J.H.,PURCELL,S.,SKLAR,P.,SULLIVAN,P.F.&WELLCOME TRUST CASE-CONTROL CONSORTIUM (2011). Meta-analysis of genome-wide association data of bipolar disorder and major depressive disorder. Mol Psychiatry, 16, 2–4.

LIU,Y.L.,FANN,C.S.,LIU,C.M.,CHEN,W.J.,WU,J.Y.,HUNG,S.I.,CHEN,C.H.,JOU,Y.S.,LIU,S.K.,HWANG,T.J.,HSIEH,M.H.,OUYANG,W.C.,CHAN,H.Y.,CHEN,J.J.,YANG,W.C.,LIN,C.Y.,LEE,S.F.&HWU, H.G. (2006). A single nucleotide polymorphism fine mapping study of chromosome 1q42.1 reveals the vulnerability genes for schizophrenia, GNPAT and DISC1: Association with impairment of sustained attention. Biol Psychiatry, 60, 554–62.

MA,L.,LIU,Y.,KY,B.,SHUGHRUE,P.J.,AUSTIN,C.P.&MORRIS, J.A. (2002). Cloning and characterization of Disc1, the mouse ortholog of DISC1 (Disrupted-in-Schizophrenia 1). Genomics, 80, 662–72.

MACARTHUR,D.G.,BALASUBRAMANIAN,S.,FRANKISH,A.,HUANG,N.,MORRIS,J.,WALTER,K.,JOSTINS,L.,HABEGGER,L.,PICKRELL,J.K.,MONTGOMERY,S.B.,ALBERS,C.A.,ZHANG,Z.D.,CONRAD,D.F.,LUNTER,G.,ZHENG,H.,AYUB,Q.,DEPRISTO,M.A.,BANKS,E.,HU,M.,HANDSAKER,R.E.,ROSENFELD,J.A.,FROMER,M.,JIN,M.,MU,X.J.,KHURANA,E.,YE,K.,KAY,M.,SAUNDERS,G.I.,SUNER,M.M.,HUNT,T.,BARNES,I.H.A.,AMID,C.,CARVALHO-SILVA,D.R.,BIGNELL,A.H.,SNOW,C.,YNGVADOTTIR,B.,BUMPSTEAD,S.,COOPER,D.N.,XUE,Y.,ROMERO,I.G.,1000 GENOMES PROJECT CONSORTIUM,WANG,J.,LI,Y.,GIBBS,R.A.,MCCARROLL,S.A.,DERMITZAKIS,E.T.,PRITCHARD,J.K.,BARRETT,J.C.,HARROW,J.,HURLES,M.E.,GERSTEIN,M.B.&TYLER-SMITH, C. (2012). A systematic survey of loss-of-function variants in human protein-coding genes. Science, 335, 823–8.

MACGREGOR,S.,VISSCHER,P.M.,KNOTT,S.A.,THOMSON,P.,PORTEOUS,D.J.,MILLAR,J.K.,DEVON,R.S.,BLACKWOOD,D.&MUIR, W.J. (2004). A genome scan and follow-up study identify a bipolar disorder susceptibility locus on chromosome 1q42. Mol Psychiatry, 9, 1083–90.

MAEDA,K.,NWULIA,E.,CHANG,J.,BALKISSOON,R.,ISHIZUKA,K.,CHEN,H.,ZANDI,P.,MCINNIS,M.G.&SAWA, A. (2006). Differential expression of disrupted-in-schizophrenia (DISC1) in bipolar disorder. Biol Psychiatry, 60, 929–35.

MAHGOUB,M.&MONTEGGIA, L.M. (2013). Epigenetics and psychiatry. Neurotherapeutics, 10, 734–41.

MALHOTRA,D.&SEBAT, J. (2012). CNVs: harbingers of a rare variant revolution in psychiatric genetics. Cell, 148, 1223–41.

MANOLIO,T.A.,RODRIGUEZ,L.L.,BROOKS,L.,ABECASIS,G.,BALLINGER,D.,DALY,M.,DONNELLY,P.,FARAONE,S.V.,FRAZER,K.,GABRIEL,S.,GEJMAN,P.,GUTTMACHER,A.,HARRIS,E.L.,INSEL,T.,KELSOE,J.R.,LANDER,E.,MCCOWIN,N.,MAILMAN,M.D.,NABEL,E.,OSTELL,J.,PUGH,E.,SHERRY,S.,SULLIVAN,P.F.,THOMPSON,J.F.,WARRAM,J.,WHOLLEY,D.,MILOS,P.M.&COLLINS, F.S. (2007). New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nat Genet, 39, 1045–51.

MAO,Y.,GE,X.,FRANK,C.L.,MADISON,J.M.,KOEHLER,A.N.,DOUD,M.K.,TASSA,C.,BERRY,E.M.,SODA,T.,SINGH,K.K.,BIECHELE,T.,PETRYSHEN,T.L.,MOON,R.T.,HAGGARTY,S.J.&TSAI, L.H. (2009). Disrupted in schizophrenia 1 regulates neuronal progenitor proliferation via modulation of GSK3beta/beta-catenin signaling. Cell, 136, 1017–31.

MARKHAM,N.R.&ZUKER, M. (2005). DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res, 33, W577–81.

MARKIANOS,K.,DALY,M.J.&KRUGLYAK, L. (2001). Efficient multipoint linkage analysis through reduction of inheritance space. Am J Hum Genet, 68, 963–77.

MARLEY,A.&VON ZASTROW, M. (2010). DISC1 regulates primary cilia that display specific dopamine receptors. PLoS One, 5, e10902.

MARLEY,A.&VON ZASTROW, M. (2012). A simple cell-based assay reveals that diverse neuropsychiatric risk genes converge on primary cilia. PLoS One, 7, e46647.

MATA,I.,PEREZ-IGLESIAS,R.,ROIZ-SANTIANEZ,R.,TORDESILLAS-GUTIERREZ,D.,GONZALEZ-MANDLY,A.,BERJA,A.,VAZQUEZ-BARQUERO,J.L.&CRESPO-FACORRO, B. (2009). Additive effect of NRG1 and DISC1 genes on lateral ventricle enlargement in first episode schizophrenia. Neuroimage, 53, 1016–22.

MATHIESON,I.,MUNAFO,M.R.&FLINT, J. (2012). Meta-analysis indicates that common variants at the DISC1 locus are not associated with schizophrenia. Mol Psychiatry, 17, 634–641.

MAUNAKEA,A.K.,NAGARAJAN,R.P.,BILENKY,M.,BALLINGER,T.J.,D’SOUZA,C.,FOUSE,S.D.,JOHNSON,B.E.,HONG,C.,NIELSEN,C.,ZHAO,Y.,TURECKI,G.,DELANEY,A.,VARHOL,R.,THIESSEN,N.,SHCHORS,K.,HEINE,V.M.,ROWITCH,D.H.,XING,X.,FIORE,C.,SCHILLEBEECKX,M.,JONES,S.J.M.,HAUSSLER,D.,MARRA,M.A.,HIRST,M.,WANG,T.&COSTELLO, J.F. (2010). Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature, 466, 253–7.

MCINTYRE,R.S.,SOCZYNSKA,J.,BOTTAS,A.,BORDBAR,K.,KONARSKI,J.&KENNEDY,S. (2006). Anxiety disorders and bipolar disorder: a review. Bipolar Disord, 8, 665–76.

MEYER,K.D.&MORRIS, J.A. (2009). Disc1 regulates granule cell migration in the developing hippocampus. Hum Mol Genet, 18, 3286–97.

MIKLOWITZ,D.J.&CHANG, K.D. (2008). Prevention of bipolar disorder in at-risk children: theoretical assumptions and empirical foundations. Dev Psychopathol, 20, 881–97.

MILLAR,J.K.,WILSON-ANNAN,J.C.,ANDERSON,S.,CHRISTIE,S.,TAYLOR,M.S.,SEMPLE,C.A.,DEVON,R.S.,CLAIR,D.M.,MUIR,W.J.,BLACKWOOD,D.H.&PORTEOUS, D.J. (2000). Disruption of two novel genes by a translocation co-segregating with schizophrenia. Hum Mol Genet, 9, 1415–23.

MILLAR,J.K.,CHRISTIE,S.&PORTEOUS, D.J. (2003). Yeast two-hybrid screens implicate DISC1 in brain development and function. Biochem Biophys Res Commun, 311, 1019–25.

MILLAR,J.K.,JAMES,R.,BRANDON,N.J.&THOMSON, P.A. (2004). DISC1 and DISC2: discovering and dissecting molecular mechanisms underlying psychiatric illness. Ann Med, 36, 367–78.

MILLAR,J.K.,PICKARD,B.S.,MACKIE,S.,JAMES,R.,CHRISTIE,S.,BUCHANAN,S.R.,MALLOY,M.P.,CHUBB,J.E.,HUSTON,E.,BAILLIE,G.S.,THOMSON,P.A.,HILL,E.V.,BRANDON,N.J.,RAIN,J.C.,CAMARGO,L.M.,WHITING,P.J.,HOUSLAY,M.D.,BLACKWOOD,D.H.,MUIR,W.J.&PORTEOUS, D.J. (2005). DISC1 and PDE4B are interacting genetic factors in schizophrenia that regulate cAMP signaling. Science, 310, 1187–91.

MIYOSHI,K.,HONDA,A.,BABA,K.,TANIGUCHI,M.,OONO,K.,FUJITA,T.,KURODA,S.,KATAYAMA,T.&TOHYAMA, M. (2003). Disrupted-In-Schizophrenia 1, a candidate gene for schizophrenia, participates in neurite outgrowth. Mol Psychiatry, 8, 685–94.

MIYOSHI,K.,ASANUMA,M.,MIYAZAKI,I.,DIAZ-CORRALES,F.J.,KATAYAMA,T.,TOHYAMA,M.&OGAWA, N. (2004). DISC1 localizes to the centrosome by binding to kendrin. Biochem Biophys Res Commun, 317, 1195–9.

MIYOSHI,K.,KASAHARA,K.,MIYAZAKI,I.&ASANUMA, M. (2009). Lithium treatment elongates primary cilia in the mouse brain and in cultured cells. Biochem Biophys Res Commun, 388, 757–62.

MOENS,L.N.,DE RIJK,P.,REUMERS,J.,VAN DEN BOSSCHE,M.J.A.,GLASSEE,W.,DE ZUTTER,S.,LENAERTS,A.S.,NORDIN,A.,NILSSON,L.G.,MEDINA CASTELLO,I.,NORRBACK,K.F.,GOOSSENS,D.,VAN STEEN,K.,ADOLFSSON,R.&DEL-FAVERO,J. (2011). Sequencing of DISC1 pathway genes reveals increased burden of rare missense variants in schizophrenia patients from a northern Swedish population. PLoS One, 6, e23450.

MORRIS,J.A.,KANDPAL,G.,MA,L.&AUSTIN, C.P. (2003). DISC1 (Disrupted-In-Schizophrenia 1) is a centrosome-associated protein that interacts with MAP1A, MIPT3, ATF4/5 and NUDEL: regulation and loss of interaction with mutation. Hum Mol Genet, 12, 1591–608.

MORTENSEN, R.M. (1993). Double knockouts. Production of mutant cell lines in cardiovascular research. Hypertension, 22, 646–51.

MORTON, N.E. (1955). Sequential tests for the detection of linkage. Am J Hum Genet, 7, 277–318.

MOWRY,B.&GRATTEN, J. (2013). The emerging spectrum of allelic variation in schizophrenia: current evidence and strategies for the identification and functional characterization of common and rare variants. Mol Psychiatry, 18, 38–52.

MULLIS,K.B.&FALOONA, F.A. (1987). Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol, 155, 335–50.

MURDOCH,H.,MACKIE,S.,COLLINS,D.M.,HILL,E.V.,BOLGER,G.B.,KLUSSMANN,E.,PORTEOUS,D.J.,MILLAR,J.K.&HOUSLAY, M.D. (2007). Isoform-selective susceptibility of DISC1/phosphodiesterase-4 complexes to dissociation by elevated intracellular cAMP levels. J Neurosci, 27, 9513–24.

MYERS,R.,CASALS,F.,GAUTHIER,J.,HAMDAN,F.,KEEBLER,J.,BOYKO,A.,BUSTAMANTE,C.,PITON,A.,SPIEGELMAN,D.,HENRION,E.,ZILVERSMIT,M.,HUSSIN,J.,QUINLAN,J.,YANG,Y.,LAFRENIÈRE,R.,GRIFFING,A.,STONE,E.,ROULEAU,G.&AWADALLA, P. (2011). A population genetic approach to mapping neurological disorder genes using deep resequencing. PLoS Genet, 7, e1001318.

NAKATA,K.,LIPSKA,B.K.,HYDE,T.M.,YE,T.,NEWBURN,E.N.,MORITA,Y.,VAKKALANKA,R.,BARENBOIM,M.,SEI,Y.,WEINBERGER,D.R.&KLEINMAN,J.E. (2009). DISC1 splice variants are upregulated in schizophrenia and associated with risk polymorphisms. Proc Natl Acad Sci U S A, 106, 15873–8.

NEED,A.C.,GE,D.,WEALE,M.E.,MAIA,J.,FENG,S.,HEINZEN,E.L.,SHIANNA,K.V.,YOON,W.,KASPERAVICIUTE,D.,GENNARELLI,M.,STRITTMATTER,W.J.,BONVICINI,C.,ROSSI,G.,JAYATHILAKE,K.,COLA,P.A.,MCEVOY,J.P.,KEEFE,R.S.,FISHER,E.M.,ST JEAN,P.L.,GIEGLING,I.,HARTMANN,A.M.,MOLLER,H.J.,RUPPERT,A.,FRASER,G.,CROMBIE,C.,MIDDLETON,L.T.,ST CLAIR,D.,ROSES,A.D.,MUGLIA,P.,FRANCKS,C.,RUJESCU,D.,MELTZER,H.Y.&GOLDSTEIN, D.B. (2009). A genome-wide investigation of SNPs and CNVs in schizophrenia. PLoS Genet, 5, e1000373.

NEED,A.C.,MCEVOY,J.P.,GENNARELLI,M.,HEINZEN,E.L.,GE,D.,MAIA,J.M.,SHIANNA,K.V.,HE,M.,CIRULLI,E.T.,GUMBS,C.E.,ZHAO,Q.,CAMPBELL,C.R.,HONG,L.,ROSENQUIST,P.,PUTKONEN,A.,HALLIKAINEN,T.,REPO-TIIHONEN,E.,TIIHONEN,J.,LEVY,D.L.,MELTZER,H.Y.&GOLDSTEIN, D.B. (2012). Exome sequencing followed by large-scale genotyping suggests a limited role for moderately rare risk factors of strong effect in schizophrenia. Am J Hum Genet, 91, 303–12.

NELSON,M.R.,WEGMANN,D.,EHM,M.G.,KESSNER,D.,ST JEAN,P.,VERZILLI,C.,SHEN,J.,TANG,Z.,BACANU,S.A.,FRASER,D.,WARREN,L.,APONTE,J.,ZAWISTOWSKI,M.,LIU,X.,ZHANG,H.,ZHANG,Y.,LI,J.,LI,Y.,LI,L.,WOOLLARD,P.,TOPP,S.,HALL,M.D.,NANGLE,K.,WANG,J.,ABECASIS,G.,CARDON,L.R.,ZÖLLNER,S.,WHITTAKER,J.C.,CHISSOE,S.L.,NOVEMBRE,J.&MOOSER, V. (2012). An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science, 337, 100–4.

NG,P.C.,LEVY,S.,HUANG,J.,STOCKWELL,T.B.,WALENZ,B.P.,LI,K.,AXELROD,N.,BUSAM,D.A.,STRAUSBERG,R.L.&VENTER, J.C. (2008). Genetic variation in an individual human exome. PLoS Genet, 4, e1000160.

NICODEMUS,K.,KOLACHANA,B.,VAKKALANKA,R.,STRAUB,R.E.,GIEGLING,I.,EGAN,M.F.,RUJESCU,D.&WEINBERGER, D.R. (2007). Evidence for statistical epistasis between catechol-O-methyltransferase (COMT) and polymorphisms in RGS4, G72 (DAOA), GRM3, and DISC1: influence on risk of schizophrenia. Hum Genet, 120, 889–906.

NICODEMUS,K.K.,CALLICOTT,J.H.,HIGIER,R.G.,LUNA,A.,NIXON,D.C.,LIPSKA,B.K.,VAKKALANKA,R.,GIEGLING,I.,RUJESCU,D.,CLAIR,D.S.,MUGLIA,P.,SHUGART,Y.Y.&WEINBERGER, D.R. (2010). Evidence of statistical epistasis between DISC1, CIT and NDEL1 impacting risk for schizophrenia: biological validation with functional neuroimaging. Hum Genet, 127, 441–52.

NIWA,M.,KAMIYA,A.,MURAI,R.,KUBO,K.,GRUBER,A.J.,TOMITA,K.,LU,L.,TOMISATO,S.,JAARO-PELED,H.,SESHADRI,S.,HIYAMA,H.,HUANG,B.,KOHDA,K.,NODA,Y.,O’DONNELL,P.,NAKAJIMA,K.,SAWA,A.&NABESHIMA, T. (2010). Knockdown of DISC1 by in utero gene transfer disturbs postnatal dopaminergic maturation in the frontal cortex and leads to adult behavioral deficits. Neuron, 65, 480–9.

NURNBERGER,J.I.J.,BLEHAR,M.C.,KAUFMANN,C.A.,YORK-COOLER,C.,SIMPSON,S.G.,HARKAVY-FRIEDMAN,J.,SEVERE,J.B.,MALASPINA,D.&REICH, T. (1994). Diagnostic interview for genetic studies. Rationale, unique features, and training. NIMH genetics initiative. Arch Gen Psychiatry, 51, 849–59; (discussion 863–4).

OBREITER,M.,FISCHER,C.,CHANG-CLAUDE,J.&BECKMANN, L. (2005). SDMinP: a program to control the family wise error rate using step-down minP adjusted p-values. Bioinformatics, 21, 3183–4.

OLDENBURG,K.R.,VO,K.T.,MICHAELIS,S.&PADDON, C. (1997). Recombination-mediated PCR-directed plasmid construction in vivo in yeast. Nucleic Acids Res, 25, 451–2.

OLIVER, T. (2009). Investigating a putative genetic contribution to bipolar disorder. Honours dissertation, University of Otago.

OMORI,Y.,ZHAO,C.,SARAS,A.,MUKHOPADHYAY,S.,KIM,W.,FURUKAWA,T.,SENGUPTA,P.,VERAKSA,A.&MALICKI, J. (2008). Elipsa is an early determinant of ciliogenesis that links the IFT particle to membrane-associated small GTPase Rab8. Nat Cell Biol, 10, 437–44.

ONSTAD,S.,SKRE,I.,TORGERSEN,S.&KRINGLEN, E. (1991). Twin concordance for DSM-III-R schizophrenia. Acta Psychiatr Scand, 83, 395–401.

OTTIS,P.,BADER,V.,TROSSBACH,S.V.,KRETZSCHMAR,H.,MICHEL,M.,LELIVELD,S.R.&KORTH, C. (2011). Convergence of two independent mental disease genes on the protein level: recruitment of dysbindin to cell-invasive disrupted-in-schizophrenia 1 aggresomes. Biol Psychiatry, 70, 604–10.

OZEKI,Y.,TOMODA,T.,KLEIDERLEIN,J.,KAMIYA,A.,BORD,L.,FUJII,K.,OKAWA,M.,YAMADA,N.,HATTEN,M.E.,SNYDER,S.H.,ROSS,C.A.&SAWA, A. (2003). Disrupted-in-Schizophrenia-1 (DISC-1): mutant truncation prevents binding to NudE-like (NUDEL) and inhibits neurite outgrowth. Proc Natl Acad Sci U S A, 100, 289–94.

PALO,O.M.,ANTILA,M.,SILANDER,K.,HENNAH,W.,KILPINEN,H.,SORONEN,P.,TUULIO-HENRIKSSON,A.,KIESEPPA,T.,PARTONEN,T.,LONNQVIST,J.,PELTONEN,L.&PAUNIO, T. (2007). Association of distinct allelic haplotypes of DISC1 with psychotic and bipolar spectrum disorders and with underlying cognitive impairments. Hum Mol Genet, 16, 2517–28.

PANG,A.W.,MACDONALD,J.R.,PINTO,D.,WEI,J.,RAFIQ,M.A.,CONRAD,D.F.,PARK,H.,HURLES,M.E.,LEE,C.,VENTER,J.C.,KIRKNESS,E.F.,LEVY,S.,FEUK,L.&SCHERER, S.W. (2010). Towards a comprehensive structural variation map of an individual human genome. Genome Biol, 11, R52.

PARK,Y.U.,JEONG,J.,LEE,H.,MUN,J.Y.,KIM,J.H.,LEE,J.S.,NGUYEN,M.,HAN,S.,SUH,P.&PARK, S.K. (2010). Disrupted-in-schizophrenia 1 (DISC1) plays essential roles in mitochondria in collaboration with Mitofilin. Proc Natl Acad Sci U S A, 107, 17785–90.

PATTERSON,N.,PRICE,A.L.&REICH, D. (2006). Population structure and eigenanalysis. PLoS Genet, 2, e190.

PEARSON, K. (1895). Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.

PEARSON,T.A.&MANOLIO, T.A. (2008). How to interpret a genome-wide association study. JAMA, 299, 1335–44.

PEDERSEN,L.B.&ROSENBAUM, J.L. (2008). Intraflagellar transport (IFT) role in ciliary assembly, resorption and signalling. Curr Top Dev Biol, 85, 23–61.

PELAK,K.,SHIANNA,K.V.,GE,D.,MAIA,J.M.,ZHU,M.,SMITH,J.P.,CIRULLI,E.T.,FELLAY,J.,DICKSON,S.P.,GUMBS,C.E.,HEINZEN,E.L.,NEED,A.C.,RUZZO,E.K.,SINGH,A.,CAMPBELL,C.R.,HONG,L.K.,LORNSEN,K.A.,MCKENZIE,A.M.,SOBREIRA,N.L.M.,HOOVER-FONG,J.E.,MILNER,J.D.,OTTMAN,R.,HAYNES,B.F.,GOEDERT,J.J.&GOLDSTEIN, D.B. (2010). The characterization of twenty sequenced human genomes. PLoS Genet, 6, e1001111.

PETERMANN,R.,MOSSIER,B.,ARYEE,D.&KOVAR, H. (1998). A recombination based method to rapidly assess specificity of two-hybrid clones in yeast. Nucleic Acids Res, 26, 2252–3.

PHILLIPS, P.C. (2008). Epistasis–the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet, 9, 855–67.

PRABHU,S.&PE’ER, I. (2012). Ultrafast genome-wide scan for SNP-SNP interactions in common complex disease. Genome Res, 22, 2230–40.

PRICE,A.L.,PATTERSON,N.J.,PLENGE,R.M.,WEINBLATT,M.E.,SHADICK,N.A.&REICH, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet, 38, 904–9.

PRICE,A.L.,ZAITLEN,N.A.,REICH,D.&PATTERSON, N. (2010). New approaches to population stratification in genome-wide association studies. Nat Rev Genet, 11, 459–63.

PSYCHIATRIC GWAS CONSORTIUM BIPOLAR DISORDER WORKING GROUP (2011a). Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat Genet, 43, 977–83.

PSYCHIATRIC GWAS CONSORTIUM BIPOLAR DISORDER WORKING GROUP (2011b). Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat Genet, 43, 977–83.

PURCELL,S.,NEALE,B.,TODD-BROWN,K.,THOMAS,L.,FERREIRA,M.,BENDER,D.,MALLER,J.,SKLAR,P.,DE BAKKER,P.,DALY,M.&SHAM, P. (2007). PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet, 81, 559–75.

PURCELL,S.M.,MORAN,J.L.,FROMER,M.,RUDERFER,D.,SOLOVIEFF,N.,ROUSSOS,P.,O’DUSHLAINE,C.,CHAMBERT,K.,BERGEN,S.E.,KÄHLER,A.,DUNCAN,L.,STAHL,E.,GENOVESE,G.,FERNÁNDEZ,E.,COLLINS,M.O.,KOMIYAMA,N.H.,CHOUDHARY,J.S.,MAGNUSSON,P.K.E.,BANKS,E.,SHAKIR,K.,GARIMELLA,K.,FENNELL,T.,DEPRISTO,M.,GRANT,S.G.N.,HAGGARTY,S.J.,GABRIEL,S.,SCOLNICK,E.M.,LANDER,E.S.,HULTMAN,C.M.,SULLIVAN,P.F.,MCCARROLL,S.A.&SKLAR, P. (2014). A polygenic burden of rare disruptive mutations in schizophrenia. Nature, 506, 185–90.

QU,M.,TANG,F.,YUE,W.,RUAN,Y.,LU,T.,LIU,Z.,ZHANG,H.,HAN,Y.,ZHANG,D.&WANG, F. (2007). Positive association of the disrupted-in-schizophrenia-1 gene (DISC1) with schizophrenia in the Chinese Han population. Am J Med Genet B Neuropsychiatr Genet, 144B, 266–70.

R CORE TEAM (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

RADEMAKERS,R.,ERIKSEN,J.L.,BAKER,M.,ROBINSON,T.,AHMED,Z.,LINCOLN,S.J.,FINCH,N.,RUTHERFORD,N.J.,CROOK,R.J.,JOSEPHS,K.A.,BOEVE,B.F.,KNOPMAN,D.S.,PETERSEN,R.C.,PARISI,J.E.,CASELLI,R.J.,WSZOLEK,Z.K.,UITTI,R.J.,FELDMAN,H.,HUTTON,M.L.,MACKENZIE,I.R.,GRAFF-RADFORD,N.R.&DICKSON,D.W. (2008). Common variation in the miR-659 binding-site of GRN is a major risk factor for TDP43-positive frontotemporal dementia. Hum Mol Genet, 17, 3631–42.

RAGAN,C.,ZUKER,M.&RAGAN, M.A. (2011). Quantitative prediction of miRNA-mRNA interaction based on equilibrium concentrations. PLoS Comput Biol, 7, e1001090.

RHOADES,M.W.,REINHART,B.J.,LIM,L.P.,BURGE,C.B.,BARTEL,B.&BARTEL,D.P. (2002). Prediction of plant microRNA targets. Cell, 110, 513–20.

RICE,J.P.,SACCONE,N.L.&CORBETT, J. (2001). The lod score method. Adv Genet, 42, 99–113.

RIPKE,S.,O’DUSHLAINE,C.,CHAMBERT,K.,MORAN,J.L.,KÄHLER,A.K.,AKTERIN,S.,BERGEN,S.E.,COLLINS,A.L.,CROWLEY,J.J.,FROMER,M.,KIM,Y.,LEE,S.H.,MAGNUSSON,P.K.E.,SANCHEZ,N.,STAHL,E.A.,WILLIAMS,S.,WRAY,N.R.,XIA,K.,BETTELLA,F.,BORGLUM,A.D.,BULIK-SULLIVAN,B.K.,CORMICAN,P.,CRADDOCK,N.,DE LEEUW,C.,DURMISHI,N.,GILL,M.,GOLIMBET,V.,HAMSHERE,M.L.,HOLMANS,P.,HOUGAARD,D.M.,KENDLER,K.S.,LIN,K.,MORRIS,D.W.,MORS,O.,MORTENSEN,P.B.,NEALE,B.M.,O’NEILL,F.A.,OWEN,M.J.,MILOVANCEVIC,M.P.,POSTHUMA,D.,POWELL,J.,RICHARDS,A.L.,RILEY,B.P.,RUDERFER,D.,RUJESCU,D.,SIGURDSSON,E.,SILAGADZE,T.,SMIT,A.B.,STEFANSSON,H.,STEINBERG,S.,SUVISAARI,J.,TOSATO,S.,VERHAGE,M.,WALTERS,J.T.,MULTICENTER GENETIC STUDIES OF SCHIZOPHRENIA CONSORTIUM,LEVINSON,D.F.,GEJMAN,P.V.,KENDLER,K.S.,LAURENT,C.,MOWRY,B.J.,O’DONOVAN,M.C.,OWEN,M.J.,PULVER,A.E.,RILEY,B.P.,SCHWAB,S.G.,WILDENAUER,D.B.,DUDBRIDGE,F.,HOLMANS,P.,SHI,J.,ALBUS,M.,ALEXANDER,M.,CAMPION,D.,COHEN,D.,DIKEOS,D.,DUAN,J.,EICHHAMMER,P.,GODARD,S.,HANSEN,M.,LERER,F.B.,LIANG,K.Y.,MAIER,W.,MALLET,J.,NERTNEY,D.A.,NESTADT,G.,NORTON,N.,O’NEILL,F.A.,PAPADIMITRIOU,G.N.,RIBBLE,R.,SANDERS,A.R.,SILVERMAN,J.M.,WALSH,D.,WILLIAMS,N.M.,WORMLEY,B.,PSYCHOSIS ENDOPHENOTYPES INTERNATIONAL CONSORTIUM,ARRANZ,M.J.,BAKKER,S.,BENDER,S.,BRAMON,E.,COLLIER,D.,CRESPO-FACORRO,B.,HALL,J.,IYEGBE,C.,JABLENSKY,A.,KAHN,R.S.,KALAYDJIEVA,L.,LAWRIE,S.,LEWIS,C.M.,LIN,K.,LINSZEN,D.H.,MATA,I.,MCINTOSH,A.,MURRAY,R.M.,OPHOFF,R.A.,POWELL,J.,RUJESCU,D.,VAN OS,J.,WALSHE,M.,WEISBROD,M.,WIERSMA,D.,WELLCOME TRUST CASE CONTROL CONSORTIUM 2,DONNELLY,P.,BARROSO,I.,BLACKWELL,J.M.,BRAMON,E.,BROWN,M.A.,CASAS,J.P.,CORVIN,A.P.,DELOUKAS,P.,DUNCANSON,A.,JANKOWSKI,J.,MARKUS,H.S.,MATHEW,C.G.,PALMER,C.N.A.,PLOMIN,R.,RAUTANEN,A.,SAWCER,S.J.,TREMBATH,R.C.,VISWANATHAN,A.C.,WOOD,N.W.,SPENCER,C.C.A.,BAND,G.,BELLENGUEZ,C.,FREEMAN,C.,HELLENTHAL,G.,GIANNOULATOU,E.,PIRINEN,M.,PEARSON,R.D.,STRANGE,A.,SU,Z.,VUKCEVIC,D.,DONNELLY,P.,LANGFORD,C.,HUNT,S.E.,EDKINS,S.,GWILLIAM,R.,BLACKBURN,H.,BUMPSTEAD,S.J.,DRONOV,S.,GILLMAN,M.,GRAY,E.,HAMMOND,N.,JAYAKUMAR,A.,MCCANN,O.T.,LIDDLE,J.,POTTER,S.C.,RAVINDRARAJAH,R.,RICKETTS,M.,TASHAKKORI-GHANBARIA,A.,WALLER,M.J.,WESTON,P.,WIDAA,S.,WHITTAKER,P.,BARROSO,I.,DELOUKAS,P.,MATHEW,C.G.,BLACKWELL,J.M.,BROWN,M.A.,CORVIN,A.P.,MCCARTHY,M.I.,SPENCER,C.C.A.,BRAMON,E.,CORVIN,A.P.,O’DONOVAN,M.C.,STEFANSSON,K.,SCOLNICK,E.,PURCELL,S.,MCCARROLL,S.A.,SKLAR,P.,HULTMAN,C.M.&SULLIVAN, P.F. (2013). Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat Genet, 45, 1150–9.

RISCH, N.J. (2000). Searching for genetic determinants in the new millennium. Nature, 405, 847–56.

RYU,S.,WON,H.,OH,S.,KIM,J.,PARK,T.,CHO,E.,CHO,Y.,PARK,D.,LEE,Y.,KWON,J.&HONG, K. (2013). Genome-wide linkage scan of quantitative traits representing symptom dimensions in multiplex schizophrenia families. Psychiatry Res, 210, 756–60.

SANGER,F.,NICKLEN,S.&COULSON, A.R. (1977). DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A, 74, 5463–7.

SANTONI,F.A.,MAKRYTHANASIS,P.,NIKOLAEV,S.,GUIPPONI,M.,ROBYR,D.,BOTTANI,A.&ANTONARAKIS, S.E. (2014). Simultaneous identification and prioritization of variants in familial, de novo, and somatic genetic disorders with VariantMaster. Genome Res, 24, 349–55.

SATIR,P.,PEDERSEN,L.B.&CHRISTENSEN, S.T. (2010). The primary cilium at a glance. J Cell Sci, 123, 499–503.

SCHERER,W.F.,SYVERTON,J.T.&GEY, G.O. (1953). Studies on the propagation in vitro of poliomyelitis viruses. IV. Viral multiplication in a stable strain of human malignant epithelial cells (strain HeLa) derived from an epidermoid carcinoma of the cervix. J Exp Med, 97, 695–710.

SCHIZOPHRENIA PSYCHIATRIC GENOME-WIDE ASSOCIATION STUDY (GWAS) CONSORTIUM (2011). Genome-wide association study identifies five new schizophrenia loci. Nat Genet, 43, 969–76.

SCHIZOPHRENIA WORKING GROUP OF THE PSYCHIATRIC GENOMICS CONSORTIUM (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511, 421–7.

SCHUMACHER,J.,LAJE,G.,ABOU JAMRA,R.,BECKER,T.,MUHLEISEN,T.W.,VASILESCU,C.,MATTHEISEN,M.,HERMS,S.,HOFFMANN,P.,HILLMER,A.M.,GEORGI,A.,HEROLD,C.,SCHULZE,T.G.,PROPPING,P.,RIETSCHEL,M.,MCMAHON,F.J.,NOTHEN,M.M.&CICHON, S. (2009). The DISC locus and schizophrenia: evidence from an association study in a central European sample and from a meta-analysis across different European populations. Hum Mol Genet, 18, 2719–27.

SCHUROV,I.L.,HANDFORD,E.J.,BRANDON,N.J.&WHITING, P.J. (2004). Expression of disrupted in schizophrenia 1 (DISC1) protein in the adult and developing mouse brain indicates its role in neurodevelopment. Mol Psychiatry, 9, 1100–10.

SEGURADO,R.,DETERA-WADLEIGH,S.D.,LEVINSON,D.F.,LEWIS,C.M.,GILL,M.,NURNBERGER,J.I.,JR.,CRADDOCK,N.,DEPAULO,J.R.,BARON,M.,GERSHON,E.S.,EKHOLM,J.,CICHON,S.,TURECKI,G.,CLAES,S.,KELSOE,J.R.,SCHOFIELD,P.R.,BADENHOP,R.F.,MORISSETTE,J.,COON,H.,BLACKWOOD,D.,MCINNES,L.A.,FOROUD,T.,EDENBERG,H.J.,REICH,T.,RICE,J.P.,GOATE,A.,MCINNIS,M.G.,MCMAHON,F.J.,BADNER,J.A.,GOLDIN,L.R.,BENNETT,P.,WILLOUR,V.L.,ZANDI,P.P.,LIU,J.,GILLIAM,C.,JUO,S.H.,BERRETTINI,W.H.,YOSHIKAWA,T.,PELTONEN,L.,LONNQVIST,J.,NOTHEN,M.M.,SCHUMACHER,J.,WINDEMUTH,C.,RIETSCHEL,M.,PROPPING,P.,MAIER,W.,ALDA,M.,GROF,P.,ROULEAU,G.A.,DEL-FAVERO,J.,VAN BROECKHOVEN,C.,MENDLEWICZ,J.,ADOLFSSON,R.,SPENCE,M.A.,LUEBBERT,H.,ADAMS,L.J.,DONALD,J.A.,MITCHELL,P.B.,BARDEN,N.,SHINK,E.,BYERLEY,W.,MUIR,W.,VISSCHER,P.M.,MACGREGOR,S.,GURLING,H.,KALSI,G.,MCQUILLIN,A.,ESCAMILLA,M.A.,REUS,V.I.,LEON,P.,FREIMER,N.B.,EWALD,H.,KRUSE,T.A.,MORS,O.,RADHAKRISHNA,U.,BLOUIN,J.L.,ANTONARAKIS,S.E.&AKARSU, N. (2003). Genome scan meta-analysis of schizophrenia and bipolar disorder, Part III: Bipolar disorder. Am J Hum Genet, 73, 49–62.

SEMPLE,J.,PRIME,G.,WALLIS,L.,SANDERSON,C.&MARKIE, D. (2005). Two-hybrid reporter vectors for gap repair cloning. Biotechniques, 38, 927–34.

SHI,J.,LEVINSON,D.F.,DUAN,J.,SANDERS,A.R.,ZHENG,Y.,PE’ER,I.,DUDBRIDGE,F.,HOLMANS,P.A.,WHITTEMORE,A.S.,MOWRY,B.J.,OLINCY,A.,AMIN,F.,CLONINGER,C.R.,SILVERMAN,J.M.,BUCCOLA,N.G.,BYERLEY,W.F.,BLACK,D.W.,CROWE,R.R.,OKSENBERG,J.R.,MIREL,D.B.,KENDLER,K.S.,FREEDMAN,R.&GEJMAN, P.V. (2009). Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature, 460, 753–7.

SHIH,R.A.,BELMONTE,P.L.&ZANDI, P.P. (2004). A review of the evidence from family, twin and adoption studies for a genetic contribution to adult psychiatric disorders. Int Rev Psychiatry, 16, 260–83.

SHYN,S.I.,SHI,J.,KRAFT,J.B.,POTASH,J.B.,KNOWLES,J.A.,WEISSMAN,M.M.,GARRIOCK,H.A.,YOKOYAMA,J.S.,MCGRATH,P.J.,PETERS,E.J.,SCHEFTNER,W.A.,CORYELL,W.,LAWSON,W.B.,JANCIC,D.,GEJMAN,P.V.,SANDERS,A.R.,HOLMANS,P.,SLAGER,S.L.,LEVINSON,D.F.&HAMILTON, S.P. (2011). Novel loci for major depression identified by genome-wide association study of sequenced treatment alternatives to relieve depression and meta-analysis of three studies. Mol Psychiatry, 16, 202–15.

SLATKIN, M. (2008). Linkage disequilibrium–understanding the evolutionary past and mapping the medical future. Nat Rev Genet, 9, 477–85.

SMITH,E.N.,BLOSS,C.S.,BADNER,J.A.,BARRETT,T.,BELMONTE,P.L.,BERRETTINI,W.,BYERLEY,W.,CORYELL,W.,CRAIG,D.,EDENBERG,H.J.,ESKIN,E.,FOROUD,T.,GERSHON,E.,GREENWOOD,T.A.,HIPOLITO,M.,KOLLER,D.L.,LAWSON,W.B.,LIU,C.,LOHOFF,F.,MCINNIS,M.G.,MCMAHON,F.J.,MIREL,D.B.,MURRAY,S.S.,NIEVERGELT,C.,NURNBERGER,J.,NWULIA,E.A.,PASCHALL,J.,POTASH,J.B.,RICE,J.,SCHULZE,T.G.,SCHEFTNER,W.,PANGANIBAN,C.,ZAITLEN,N.,ZANDI,P.P.,ZÖLLNER,S.,SCHORK,N.J.&KELSOE, J.R. (2009). Genome-wide association study of bipolar disorder in European American and African American individuals. Mol Psychiatry, 14, 755–63.

SMITH,M.,WASMUTH,J.&MCPHERSON, J. (1989). Cosegregation of an 11q22.3-9p22 translocation with affective disorder: proximity of the dopamine D2 receptor gene relative to the translocation breakpoint. Am J Hum Genet, 45, A220 (0864).

SOARES,D.C.,CARLYLE,B.C.,BRADSHAW,N.J.&PORTEOUS, D.J. (2011). DISC1: Structure, function, and therapeutic potential for major mental illness. ACS Chem Neurosci, 2, 609–32.

SONG,W.,LI,W.,NOLTNER,K.,YAN,J.,GREEN,E.,GROZEVA,D.,JONES,I.R.,CRADDOCK,N.,LONGMATE,J.,FENG,J.&SOMMER, S.S. (2010). Identification of high risk DISC1 protein structural variants in patients with bipolar spectrum disorder. Neurosci Lett, 486, 136–40.

SOROKIN, S. (1962). Centrioles and the formation of rudimentary cilia by fibroblasts and smooth muscle cells. J Cell Biol, 15, 363–77.

SOROKIN, S.P. (1968). Reconstructions of centriole formation and ciliogenesis in mammalian lungs. J Cell Sci, 3, 207–30.

SOUTULLO,C.A.,CHANG,K.D.,DIEZ-SUAREZ,A.,FIGUEROA-QUINTANA,A.,ESCAMILLA-CANALES,I.&RAPADO-CASTRO,M.E.A. (2005). Bipolar disorder in children and adolescents: international perspective on epidemiology and phenomenology. Bipolar Disord, 7, 497–506.

SPIELMAN,R.S.&EWENS, W.J. (1996). The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet, 59, 983–9.

SPIELMAN,R.S.,MCGINNIS,R.E.&EWENS, W.J. (1993). Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet, 52, 506–16.

ST CLAIR,D.,BLACKWOOD,D.,MUIR,W.,CAROTHERS,A.,WALKER,M.,SPOWART,G.,GOSDEN,C.&EVANS, H.J. (1990). Association within a family of a balanced autosomal translocation with major mental illness. Lancet, 336, 13–6.

STANKIEWICZ,P.&LUPSKI, J.R. (2010). Structural variation in the human genome and its role in disease. Annu Rev Med, 61, 437–55.

STEFANSSON,H.,RUJESCU,D.,CICHON,S.,PIETILÄINEN,O.P.H.,INGASON,A.,STEINBERG,S.,FOSSDAL,R.,SIGURDSSON,E.,SIGMUNDSSON,T.,BUIZER-VOSKAMP,J.E.,HANSEN,T.,JAKOBSEN,K.D.,MUGLIA,P.,FRANCKS,C.,MATTHEWS,P.M.,GYLFASON,A.,HALLDORSSON,B.V.,GUDBJARTSSON,D.,THORGEIRSSON,T.E.,SIGURDSSON,A.,JONASDOTTIR,A.,JONASDOTTIR,A.,BJORNSSON,A.,MATTIASDOTTIR,S.,BLONDAL,T.,HARALDSSON,M.,MAGNUSDOTTIR,B.B.,GIEGLING,I.,MÖLLER,H.J.,HARTMANN,A.,SHIANNA,K.V.,GE,D.,NEED,A.C.,CROMBIE,C.,FRASER,G.,WALKER,N.,LONNQVIST,J.,SUVISAARI,J.,TUULIO-HENRIKSSON,A.,PAUNIO,T.,TOULOPOULOU,T.,BRAMON,E.,DI FORTI,M.,MURRAY,R.,RUGGERI,M.,VASSOS,E.,TOSATO,S.,WALSHE,M.,LI,T.,VASILESCU,C.,MÜHLEISEN,T.W.,WANG,A.G.,ULLUM,H.,DJUROVIC,S.,MELLE,I.,OLESEN,J.,KIEMENEY,L.A.,FRANKE,B.,GROUP,SABATTI,C.,FREIMER,N.B.,GULCHER,J.R.,THORSTEINSDOTTIR,U.,KONG,A.,ANDREASSEN,O.A.,OPHOFF,R.A.,GEORGI,A.,RIETSCHEL,M.,WERGE,T.,PETURSSON,H.,GOLDSTEIN,D.B.,NÖTHEN,M.M.,PELTONEN,L.,COLLIER,D.A.,ST CLAIR,D.&STEFANSSON, K. (2008). Large recurrent microdeletions associated with schizophrenia. Nature, 455, 232–6.

STEFANSSON,H.,OPHOFF,R.A.,STEINBERG,S.,ANDREASSEN,O.A.,CICHON,S.,RUJESCU,D.,WERGE,T.,PIETILÄINEN,O.P.H.,MORS,O.,MORTENSEN,P.B.,SIGURDSSON,E.,GUSTAFSSON,O.,NYEGAARD,M.,TUULIO-HENRIKSSON,A.,INGASON,A.,HANSEN,T.,SUVISAARI,J.,LONNQVIST,J.,PAUNIO,T.,BØRGLUM,A.D.,HARTMANN,A.,FINK-JENSEN,A.,NORDENTOFT,M.,HOUGAARD,D.,NORGAARD-PEDERSEN,B.,BÖTTCHER,Y.,OLESEN,J.,BREUER,R.,MÖLLER,H.J.,GIEGLING,I.,RASMUSSEN,H.B.,TIMM,S.,MATTHEISEN,M.,BITTER,I.,RÉTHELYI,J.M.,MAGNUSDOTTIR,B.B.,SIGMUNDSSON,T.,OLASON,P.,MASSON,G.,GULCHER,J.R.,HARALDSSON,M.,FOSSDAL,R.,THORGEIRSSON,T.E.,THORSTEINSDOTTIR,U.,RUGGERI,M.,TOSATO,S.,FRANKE,B.,STRENGMAN,E.,KIEMENEY,L.A.,GENETIC RISK AND OUTCOME IN PSYCHOSIS (GROUP),MELLE,I.,DJUROVIC,S.,ABRAMOVA,L.,KALEDA,V.,SANJUAN,J.,DE FRUTOS,R.,BRAMON,E.,VASSOS,E.,FRASER,G.,ETTINGER,U.,PICCHIONI,M.,WALKER,N.,TOULOPOULOU,T.,NEED,A.C.,GE,D.,YOON,J.L.,SHIANNA,K.V.,FREIMER,N.B.,CANTOR,R.M.,MURRAY,R.,KONG,A.,GOLIMBET,V.,CARRACEDO,A.,ARANGO,C.,COSTAS,J.,JÖNSSON,E.G.,TERENIUS,L.,AGARTZ,I.,PETURSSON,H.,NÖTHEN,M.M.,RIETSCHEL,M.,MATTHEWS,P.M.,MUGLIA,P.,PELTONEN,L.,ST CLAIR,D.,GOLDSTEIN,D.B.,STEFANSSON,K.&COLLIER, D.A. (2009). Common variants conferring risk of schizophrenia. Nature, 460, 744–7.

STURTEVANT, A.H. (1913). The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. J Exp Zool, 14, 43–59.

SUAREZ,B.K.,DUAN,J.,SANDERS,A.R.,HINRICHS,A.L.,JIN,C.H.,HOU,C.,BUCCOLA,N.G.,HALE,N.,WEILBAECHER,A.N.,NERTNEY,D.A.,OLINCY,A.,GREEN,S.,SCHAFFER,A.W.,SMITH,C.J.,HANNAH,D.E.,RICE,J.P.,COX,N.J.,MARTINEZ,M.,MOWRY,B.J.,AMIN,F.,SILVERMAN,J.M.,BLACK,D.W.,BYERLEY,W.F.,CROWE,R.R.,FREEDMAN,R.,CLONINGER,C.R.,LEVINSON,D.F.&GEJMAN, P.V. (2006). Genomewide linkage scan of 409 European-ancestry and African American families with schizophrenia: suggestive evidence of linkage at 8p23.3-p21.2 and 11p13.1-q14.1 in the combined sample. Am J Hum Genet, 78, 315–33.

SULLIVAN, P.F. (2013). Questions about DISC1 as a genetic risk factor for schizophrenia. Mol Psychiatry, 18, 1050–2.

SULLIVAN,P.F.,KENDLER,K.S.&NEALE, M. (2003). Schizophrenia as a complex trait: evidence from a meta-analysis of twin studies. Arch Gen Psychiatry, 60, 1187–92.

SUMMERTON,J.&WELLER, D. (1997). Morpholino antisense oligomers: design, preparation, and properties. Antisense Nucleic Acid Drug Dev, 7, 187–95.

SURPILI,M.J.,DELBEN,T.M.&KOBARG, J. (2003). Identification of proteins that interact with the central coiled-coil region of the human protein kinase NEK1. Biochemistry, 42, 15369–76.

TAYLOR,M.S.,DEVON,R.S.,MILLAR,J.K.&PORTEOUS, D.J. (2003). Evolutionary constraints on the Disrupted in Schizophrenia locus. Genomics, 81, 67–77.

TENNESSEN,J.A.,BIGHAM,A.W.,O’CONNOR,T.D.,FU,W.,KENNY,E.E.,GRAVEL,S.,MCGEE,S.,DO,R.,LIU,X.,JUN,G.,KANG,H.M.,JORDAN,D.,LEAL,S.M.,GABRIEL,S.,RIEDER,M.J.,ABECASIS,G.,ALTSHULER,D.,NICKERSON,D.A.,BOERWINKLE,E.,SUNYAEV,S.,BUSTAMANTE,C.D.,BAMSHAD,M.J.,AKEY,J.M.,BROAD GO,SEATTLE GO&NHLBI EXOME SEQUENCING PROJECT (2012). Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 337, 64–9.

THE 1000 GENOMES PROJECT CONSORTIUM (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491, 56–65.

THOMSON,P.A.,WRAY,N.R.,MILLAR,J.K.,EVANS,K.L.,HELLARD,S.L.,CONDIE,A.,MUIR,W.J.,BLACKWOOD,D.H.&PORTEOUS, D.J. (2005). Association between the TRAX/DISC locus and both bipolar disorder and schizophrenia in the Scottish population. Mol Psychiatry, 10, 657–68, 616.

THOMSON,P.A.,PARLA,J.S.,MCRAE,A.F.,KRAMER,M.,RAMAKRISHNAN,K.,YAO,J.,SOARES,D.C.,MCCARTHY,S.,MORRIS,S.W.,CARDONE,L.,CASS,S.,GHIBAN,E.,HENNAH,W.,EVANS,K.L.,REBOLINI,D.,MILLAR,J.K.,HARRIS,S.E.,STARR,J.M.,MACINTYRE,D.J.,GENERATION SCOTLAND,MCINTOSH,A.M.,WATSON,J.D.,DEARY,I.J.,VISSCHER,P.M.,BLACKWOOD,D.H.,MCCOMBIE,W.R.&PORTEOUS, D.J. (2014). 708 common and 2010 rare DISC1 locus variants identified in 1542 subjects: analysis for association with psychiatric disorder and cognitive traits. Mol Psychiatry, 19, 668–75.

TIAN,C.,GREGERSEN,P.K.&SELDIN, M.F. (2008). Accounting for ancestry: population substructure and genome-wide association studies. Hum Mol Genet, 17, R143–50.

TSUANG, M. (2000). Schizophrenia: genes and environment. Biol Psychiatry, 47, 210–20.

TSUI,L.C.,BUCHWALD,M.,BARKER,D.,BRAMAN,J.C.,KNOWLTON,R.,SCHUMM,J.W.,EIBERG,H.,MOHR,J.,KENNEDY,D.&PLAVSIC, N. (1985). Cystic fibrosis locus defined by a genetically linked polymorphic DNA marker. Science, 230, 1054–7.

TSUI,L.C.,BUETOW,K.&BUCHWALD, M. (1986). Genetic analysis of cystic fibrosis using linked DNA markers. Am J Hum Genet, 39, 720–8.

TURNER,S.,ARMSTRONG,L.L.,BRADFORD,Y.,CARLSON,C.S.,CRAWFORD,D.C.,CRENSHAW,A.T.,DE ANDRADE,M.,DOHENY,K.F.,HAINES,J.L.,HAYES,G.,JARVIK,G.,JIANG,L.,KULLO,I.J.,LI,R.,LING,H.,MANOLIO,T.A.,MATSUMOTO,M.,MCCARTY,C.A.,MCDAVID,A.N.,MIREL,D.B.,PASCHALL,J.E.,PUGH,E.W.,RASMUSSEN,L.V.,WILKE,R.A.,ZUVICH,R.L.&RITCHIE, M.D. (2011). Quality control procedures for genome-wide association studies. Curr Protoc Hum Genet, Chapter 1, Unit1.19.

UEKI,M.&CORDELL, H.J. (2012). Improved statistics for genome-wide interaction analysis. PLoS Genet, 8, e1002625.

VENTER,J.C.,ADAMS,M.D.,MYERS,E.W.,LI,P.W.,MURAL,R.J.,SUTTON,G.G.,SMITH,

H.O., YANDELL,M.,EVANS,C.A.,HOLT,R.A.,GOCAYNE,J.D.,AMANATIDES,P.,

BALLEW,R.M.,HUSON,D.H.,WORTMAN,J.R.,ZHANG,Q.,KODIRA,C.D.,ZHENG,

X.H., CHEN,L.,SKUPSKI,M.,SUBRAMANIAN,G.,THOMAS,P.D.,ZHANG,J.,GA-

BOR MIKLOS,G.L.,NELSON,C.,BRODER,S.,CLARK,A.G.,NADEAU,J.,MCKU-

SICK,V.A.,ZINDER,N.,LEVINE,A.J.,ROBERTS,R.J.,SIMON,M.,SLAYMAN,C.,

HUNKAPILLER,M.,BOLANOS,R.,DELCHER,A.,DEW,I.,FASULO,D.,FLANIGAN,

M., FLOREA,L.,HALPERN,A.,HANNENHALLI,S.,KRAVITZ,S.,LEVY,S.,MOBARRY,

C., REINERT,K.,REMINGTON,K.,ABU-THREIDEH,J.,BEASLEY,E.,BIDDICK,K.,

BONAZZI,V.,BRANDON,R.,CARGILL,M.,CHANDRAMOULISWARAN,I.,CHARLAB,

R., CHATURVEDI,K.,DENG,Z.,DI FRANCESCO,V.,DUNN,P.,EILBECK,K.,EVANGE-

LISTA,C.,GABRIELIAN,A.E.,GAN,W.,GE,W.,GONG,F.,GU,Z.,GUAN,P.,HEIMAN,

330 REFERENCES

T.J., HIGGINS,M.E.,JI,R.R.,KE,Z.,KETCHUM,K.A.,LAI,Z.,LEI,Y.,LI,Z.,LI,J.,

LIANG,Y.,LIN,X.,LU,F.,MERKULOV,G.V.,MILSHINA,N.,MOORE,H.M.,NAIK,A.K.,

NARAYAN,V.A.,NEELAM,B.,NUSSKERN,D.,RUSCH,D.B.,SALZBERG,S.,SHAO,W.,

SHUE,B.,SUN,J.,WANG,Z.,WANG,A.,WANG,X.,WANG,J.,WEI,M.,WIDES,R.,

XIAO,C.,YAN,C.,YAO,A.,YE,J.,ZHAN,M.,ZHANG,W.,ZHANG,H.,ZHAO,Q.,

ZHENG,L.,ZHONG,F.,ZHONG,W.,ZHU,S.,ZHAO,S.,GILBERT,D.,BAUMHUETER,S.,

SPIER,G.,CARTER,C.,CRAVCHIK,A.,WOODAGE,T.,ALI,F.,AN,H.,AWE,A.,BALD-

WIN,D.,BADEN,H.,BARNSTEAD,M.,BARROW,I.,BEESON,K.,BUSAM,D.,CARVER,

A., CENTER,A.,CHENG,M.L.,CURRY,L.,DANAHER,S.,DAVENPORT,L.,DESILETS,R.,

DIETZ,S.,DODSON,K.,DOUP,L.,FERRIERA,S.,GARG,N.,GLUECKSMANN,A.,HART,

B., HAYNES,J.,HAYNES,C.,HEINER,C.,HLADUN,S.,HOSTIN,D.,HOUCK,J.,HOW-

LAND,T.,IBEGWAM,C.,JOHNSON,J.,KALUSH,F.,KLINE,L.,KODURU,S.,LOVE,A.,

MANN,F.,MAY,D.,MCCAWLEY,S.,MCINTOSH,T.,MCMULLEN,I.,MOY,M.,MOY,

L., MURPHY,B.,NELSON,K.,PFANNKOCH,C.,PRATTS,E.,PURI,V.,QURESHI,H.,

REARDON,M.,RODRIGUEZ,R.,ROGERS,Y.H.,ROMBLAD,D.,RUHFEL,B.,SCOTT,R.,

SITTER,C.,SMALLWOOD,M.,STEWART,E.,STRONG,R.,SUH,E.,THOMAS,R.,TINT,

N.N., TSE,S.,VECH,C.,WANG,G.,WETTER,J.,WILLIAMS,S.,WILLIAMS,M.,WIND-

SOR,S.,WINN-DEEN,E.,WOLFE,K.,ZAVERI,J.,ZAVERI,K.,ABRIL,J.F.,GUIGO´ ,R.,

CAMPBELL,M.J.,SJOLANDER,K.V.,KARLAK,B.,KEJARIWAL,A.,MI,H.,LAZAREVA,

B., HATTON,T.,NARECHANIA,A.,DIEMER,K.,MURUGANUJAN,A.,GUO,N.,SATO,S.,

BAFNA,V.,ISTRAIL,S.,LIPPERT,R.,SCHWARTZ,R.,WALENZ,B.,YOOSEPH,S.,ALLEN,

D., BASU,A.,BAXENDALE,J.,BLICK,L.,CAMINHA,M.,CARNES-STINE,J.,CAULK,P.,

CHIANG,Y.H.,COYNE,M.,DAHLKE,C.,MAYS,A.,DOMBROSKI,M.,DONNELLY,M.,

ELY,D.,ESPARHAM,S.,FOSLER,C.,GIRE,H.,GLANOWSKI,S.,GLASSER,K.,GLODEK,

A., GOROKHOV,M.,GRAHAM,K.,GROPMAN,B.,HARRIS,M.,HEIL,J.,HENDERSON,

S., HOOVER,J.,JENNINGS,D.,JORDAN,C.,JORDAN,J.,KASHA,J.,KAGAN,L.,KRAFT,

C., LEVITSKY,A.,LEWIS,M.,LIU,X.,LOPEZ,J.,MA,D.,MAJOROS,W.,MCDANIEL,J.,

MURPHY,S.,NEWMAN,M.,NGUYEN,T.,NGUYEN,N.,NODELL,M.,PAN,S.,PECK,J.,

PETERSON,M.,ROWE,W.,SANDERS,R.,SCOTT,J.,SIMPSON,M.,SMITH,T.,SPRAGUE,

A., STOCKWELL,T.,TURNER,R.,VENTER,E.,WANG,M.,WEN,M.,WU,D.,WU,M.,

XIA,A.,ZANDIEH,A.&ZHU, X. (2001). The sequence of the human genome. Science, 291, 1304–51.

VIDALAIN,P.,BOXEM,M.,GE,H.,LI,S.&VIDAL, M. (2004). Increasing specificity in high-throughput yeast two-hybrid experiments. Methods, 32, 363–70.

VON MERING,C.,KRAUSE,R.,SNEL,B.,CORNELL,M.,OLIVER,S.,FIELDS,S.&BORK, P. (2002). Comparative assessment of large-scale data sets of protein–protein interactions. Nature, 417, 399–403.

WALSH,T.,MCCLELLAN,J.M.,MCCARTHY,S.E.,ADDINGTON,A.M.,PIERCE,S.B.,COOPER,G.M.,NORD,A.S.,KUSENDA,M.,MALHOTRA,D.,BHANDARI,A.,STRAY,S.M.,RIPPEY,C.F.,ROCCANOVA,P.,MAKAROV,V.,LAKSHMI,B.,FINDLING,R.L.,SIKICH,L.,STROMBERG,T.,MERRIMAN,B.,GOGTAY,N.,BUTLER,P.,ECKSTRAND,K.,NOORY,L.,GOCHMAN,P.,LONG,R.,CHEN,Z.,DAVIS,S.,BAKER,C.,EICHLER,E.E.,MELTZER,P.S.,NELSON,S.F.,SINGLETON,A.B.,LEE,M.K.,RAPOPORT,J.L.,KING,M.C. & SEBAT, J. (2008). Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science, 320, 539–43.

WAN,X.,YANG,C.,YANG,Q.,XUE,H.,FAN,X.,TANG,N.L.S.&YU, W. (2010). Boost: A fast approach to detecting gene-gene interactions in genome-wide case-control studies. Am J Hum Genet, 87, 325–40.

WANG,K.,LI,M.&HAKONARSON, H. (2010a). Annovar: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res, 38, e164.

WANG,K.S.,LIU,X.F.&ARAGAM, N. (2010b). A genome-wide meta-analysis identifies novel loci associated with schizophrenia and bipolar disorder. Schizophrenia Research, 124, 192–199.

WANG,Z.,JACOBS,K.B.,YEAGER,M.,HUTCHINSON,A.,SAMPSON,J.,CHATTERJEE,N.,ALBANES,D.,BERNDT,S.I.,CHUNG,C.C.,DIVER,W.R.,GAPSTUR,S.M.,TERAS,L.R.,HAIMAN,C.A.,HENDERSON,B.E.,STRAM,D.,DENG,X.,HSING,A.W.,VIRTAMO,J.,EBERLE,M.A.,STONE,J.L.,PURDUE,M.P.,TAYLOR,P.,TUCKER,M.&CHANOCK,S.J. (2012). Improved imputation of common and uncommon SNPs with a new reference set. Nat Genet, 44, 6–7.

WEI,W.H.,HEMANI,G.&HALEY, C.S. (2014). Detecting epistasis in human complex traits. Nat Rev Genet, 15, 722–33.

WELLCOME TRUST CASE-CONTROL CONSORTIUM (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–78.

WHEATLEY, D.N. (2005). Landmarks in the first hundred years of primary (9+0) cilium research. Cell Biol Int, 29, 333–9.

WILLOUR,V.L.,ZANDI,P.P.,HUO,Y.,DIGGS,T.L.,CHELLIS,J.L.,MACKINNON,D.F.,SIMPSON,S.G.,MCMAHON,F.J.,POTASH,J.B.,GERSHON,E.S.,REICH,T.,FOROUD,T.,NURNBERGER,J.I.,RAYMOND DEPAULO,J.&MCINNIS, M.G. (2003). Genome scan of the fifty-six bipolar pedigrees from the NIMH genetics initiative replication sample: Chromosomes 4, 7, 9, 18, 19, 20, and 21. Am. J. Med. Genet., 121B, 21–7.

WONG,K.A.,WILSON,J.,RUSSO,A.,WANG,L.,OKUR,M.,WANG,X.,MARTIN,N.,SCAPPINI,E.,CARNEGIE,G.&O’BRYAN, J. (2012). Intersectin (ITSN) family of scaffolds function as molecular hubs in protein interaction networks. PLoS ONE, 7, e36023.

WOOD,J.D.,BONATH,F.,KUMAR,S.,ROSS,C.A.&CUNLIFFE, V.T. (2009). Disrupted-in-schizophrenia 1 and neuregulin 1 are required for the specification of oligodendrocytes and neurones in the zebrafish brain. Hum Mol Genet, 18, 391–404.

WOOD,L.S.,PICKERING,E.H.&DECHAIRO, B.M. (2007). Significant support for DAO as a schizophrenia susceptibility locus: examination of five genes putatively associated with schizophrenia. Biol Psychiatry, 61, 1195–9.

XU,B.,ROOS,J.L.,LEVY,S.,VAN RENSBURG,E.J.,GOGOS,J.A.&KARAYIORGOU,M. (2008). Strong association of de novo copy number mutations with sporadic schizophrenia. Nat Genet, 40, 880–885.

XU,J.,WIESCH,D.G.&MEYERS, D.A. (1998). Genetics of complex human diseases: genome screening, association studies and fine mapping. Clin Exp Allergy, 28, 1–5; discussion 26–8.

XU,W.,COHEN-WOODS,S.,CHEN,Q.,NOOR,A.,KNIGHT,J.,HOSANG,G.,PARIKH,S.V.,DE LUCA,V.,TOZZI,F.,MUGLIA,P.,FORTE,J.,MCQUILLIN,A.,HU,P.,GURLING,H.M.D.,KENNEDY,J.L.,MCGUFFIN,P.,FARMER,A.,STRAUSS,J.&VINCENT,J.B. (2014). Genome-wide association study of bipolar disorder in Canadian and UK populations corroborates disease loci including SYNE1 and CSMD1. BMC Med Genet, 15, 2.

YANG,J.,WEEDON,M.,PURCELL,S.,LETTRE,G.,ESTRADA,K.,WILLER,C.,SMITH,A.,INGELSSON,E.,O’CONNELL,J.,MANGINO,M.,MÄGI,R.,MADDEN,P.,HEATH,A.,NYHOLT,D.,MARTIN,N.,MONTGOMERY,G.,FRAYLING,T.,HIRSCHHORN,J.,MCCARTHY,M.,GODDARD,M.,VISSCHER,P.&CONSORTIUM, G. (2011a). Genomic inflation factors under polygenic inheritance. Eur J Hum Genet, 19, 807–12.

YANG,J.,WEEDON,M.N.,PURCELL,S.,LETTRE,G.,ESTRADA,K.,WILLER,C.J.,SMITH,A.V.,INGELSSON,E.,O’CONNELL,J.R.,MANGINO,M.,MÄGI,R.,MADDEN,P.A.,HEATH,A.C.,NYHOLT,D.R.,MARTIN,N.G.,MONTGOMERY,G.W.,FRAYLING,T.M.,HIRSCHHORN,J.N.,MCCARTHY,M.I.,GODDARD,M.E.,VISSCHER,P.M.&GIANT CONSORTIUM (2011b). Genomic inflation factors under polygenic inheritance. Eur J Hum Genet, 19, 807–12.

YING,S.Y.,CHANG,D.C.&LIN, S.L. (2008). The microRNA (miRNA): overview of the RNA genes that modulate gene function. Mol Biotechnol, 38, 257–68.

YOUNG, K.H. (1998). Yeast two-hybrid: so many interactions, (in) so little time. Biol Reprod, 58, 302–11.

ZHOU,X.,GEYER,M.A.&KELSOE, J.R. (2008). Does disrupted-in-schizophrenia (DISC1) generate fusion transcripts? Mol Psychiatry, 13, 361–3.

ZHOU,X.,CHEN,Q.,SCHAUKOWITCH,K.,KELSOE,J.R.&GEYER, M.A. (2010). Insoluble DISC1-Boymaw fusion proteins generated by DISC1 translocation. Mol Psychiatry, 15, 669–72.

ZIMMERMAN, K.W. (1898). [Contributions to the knowledge of some glands and epithelia]. Archiv für mikroskopische Anatomie, 52, 552–706.
