bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Genomic dissection of bipolar disorder and schizophrenia including 28 subphenotypes
Douglas M Ruderfer1, Stephan Ripke2,3,4, Andrew McQuillin5, James Boocock6, Eli A Stahl7, Jennifer M Whitehead Pavlides8, Niamh Mullins9, Alexander W Charney10, Anil P S Ori11, Loes M Olde Loohuis11, Enrico Domenici12, Arianna Di Florio13, Sergi Papiol14,15, Janos L. Kalman14,15, Rolf Adolfsson16, Ingrid Agartz17,18,19, Esben Agerbo20,21,22, Huda Akil23, Diego Albani24, Margot Albus25, Martin Alda26,27, Madeline Alexander28, Judith Allardyce29, Ney Alliey-Rodriguez30, Thomas D Als31,22,32, Farooq Amin33,34, Adebayo Anjorin35, Maria J Arranz36,37, Swapnil Awasthi3, Silviu A Bacanu38, Judith A Badner39, Marie Baekvad-Hansen40, Steven Bakker 41, Gavin Band42, Jack D Barchas43, Ines Barroso44, Nicholas Bass45, Michael Bauer46, Bernhard T Baune47, Martin Begemann48, Celine Bellenguez42, Richard A Belliveau Jr4, Frank Bellivier49,50,51,52, Stephan Bender53,54, Judit Bene55,56, Sarah E Bergen57,4, Wade H Berrettini58, Elizabeth Bevilacqua4, Joanna M Biernacka59, Tim B Bigdeli38, Donald W Black60, Hannah Blackburn44, Jenefer M Blackwell61,62, Douglas HR Blackwood63, Carsten Bocker Pedersen20,22,21, Michael Boehnke64, Marco Boks41, Anders D Borglum32,65,22, Elvira Bramon66,37,67, Gerome Breen9,68, Matthew A Brown69, Richard Bruggeman70, Nancy G Buccola71, Randy L Buckner72,73,74, Monika Budde14, Brendan Bulik-Sullivan2,4, Suzannah J Bumpstead44, William Bunney75, Margit Burmeister76, Joseph D Buxbaum77,78,10,79, Jonas Bybjerg-Grauholm80, William Byerley81, Wiepke Cahn41, Guiqing Cai77,10, Murray J Cairns82,83,84, Dominique Campion85, Rita M Cantor6, Vaughan J Carr83,86, Noa Carrera87, Juan P Casas88,89, Miquel Casas90,91,92,93, Stanley V Catts94,83, Pablo Cervantes95, Kimberley D Chambert4, Raymond CK Chan96, Eric YH Chen97,98, Ronald YL Chen97, Wei Cheng99, Eric FC Cheung100, Siow Ann Chong101, Toni-Kim Clarke63, C Robert Cloninger102, David Cohen103, Nadine Cohen104, Jonathan R I Coleman9,68, David A Collier105,106, Paul Cormican107, William Coryell108, Nicholas Craddock87,109, David W Craig110, Benedicto Crespo-Facorro111,112, James J Crowley113, Cristiana Cruceanu114, David Curtis115,116, Piotr M Czerski117, Anders M Dale118, Mark J Daly2,119,4, Udo Dannlowski120, Ariel Darvasi121, Michael Davidson122, Kenneth L Davis10, Christiaan A de Leeuw123, Franziska Degenhardt124,125,125,124, Jurgen Del Favero126, Lynn E DeLisi127,128, Panos Deloukas44, Ditte Demontis32,31,22, J Raymond DePaulo129, Marta di Forti130, Dimitris Dikeos131, Timothy Dinan132, Srdjan Djurovic133,19, Amanda L Dobbyn7,134, Peter Donnelly135,42, Gary Donohoe 136,107, Elodie Drapeau10, Serge Dronov44, Jubao Duan30,137, Frank Dudbridge138, Audrey Duncanson139, Howard Edenberg140, Sarah Edkins44, Hannelore Ehrenreich48, Peter Eichhammer141, Torbjorn Elvsashagen142,19, Johan Eriksson143,144,145, Valentina Escott-Price87, Tonu Esko146,147,148,119, Laurent Essioux149, Bruno Etain150,49,51,52, Chun Chieh Fan151, Kai-How Farh2, Martilias S Farrell113, Matthew Flickinger64, Tatiana M Foroud152, Liz Forty29, Josef Frank153, Lude Franke154, Christine Fraser29, Robert Freedman155, Colin Freeman42, Nelson B Freimer11, Joseph I Friedman10, Menachem Fromer2,134,156,4, Mark A Frye157, Janice M Fullerton158,159, Katrin Gade160,14, Julie Garnham26, Helena A Gaspar9, Pablo V Gejman30,137, Giulio Genovese4, Lyudmila Georgieva 87, Claudia Giambartolomei6, Eleni Giannoulatou42, Ina Giegling161,162, Michael Gill163, Matthew Gillman44, Marianne Giortz Pedersen164, Paola Giusti-Rodriguez113, Stephanie Godard165, Fernando Goes129, Jacqueline I Goldstein2,119, Srihari Gopal166, Scott D Gordon167, Katherine Gordon-Smith168, Jacob Gratten8, Emma Gray44, Elaine K Green169, Melissa J Green86, Tiffany A Greenwood170, Maria Grigoroiu-Serbanescu171, Jakob Grove172,31,22,32, Weihua Guan173, Hugh Gurling5, Jose Guzman Parra174, Rhian Gwilliam44, Lieuwe de Haan175, Jeremy Hall63, Mei-Hua Hall176, Christian Hammer48, Naomi Hammond44, Marian L Hamshere87, Mark Hansen177, Thomas Hansen178,22, Vahram Haroutunian10,79,179, Annette M Hartmann161, Joanna Hauser117, Martin Hautzinger180, Urs Heilbronner14, Garrett Hellenthal42, Frans A Henskens181,83,182, Stefan Herms183,124,125, Maria Hipolito184, Joel N Hirschhorn146,147,119, Per Hoffmann124,183,125,185, Mads V Hollegaard186, David M Hougaard187,186, Hailiang Huang2,119, Laura Huckins7,134, Christina M Hultman57, Sarah E Hunt44, Masashi Ikeda188, Nakao Iwata188, Conrad Iyegbe37, Assen V Jablensky189,83,190,191, Stephane Jamain192,193, Janusz Jankowski194, Alagurevathi Jayakumar44, Inge Joa 195, Ian Jones29, Lisa A Jones168, Erik G Jonsson17,19, Antonio Julia196, Anders Jureus57, Anna K Kahler57, Rene S Kahn41,10, Luba Kalaydjieva197,198, Radhika Kandaswamy9, Sena Karachanak-Yankova199, Juha Karjalainen154, Robert Karlsson57, David Kavanagh87, Matthew C Keller200, Brian J Kelly82, John Kelsoe170, James L Kennedy201,202,203, Andrey Khrunin204, Yunjung Kim113, George Kirov87, Sarah Kittel-Schneider205, Janis Klovins206, Jo Knight201,202,203, Sarah V Knott168, James A Knowles207,208,209, Manolis Kogevinas210, Bettina Konte161, Eugenia Kravariti130, Vaidutis Kucinskas211, Zita Ausrele Kucinskiene211, Ralph Kupka41,212,213, Hana Kuzelova-Ptackova214, Mikael Landen57,215, Cordelia Langford44, Claudine Laurent216,28, Jacob Lawrence217, Stephen Lawrie63, William B Lawson184, Markus Leber218, Marion Leboyer49,192,219, Phil H Lee2,156,4, Jimmy Lee Chee Keong220,101, Sophie E Legge87, Todd Lencz221,222,223, Bernard Lerer224, Douglas F Levinson28, Shawn E Levy225, Cathryn M Lewis226,37,9, Jun Z Li227, Miaoxin Li228,97,98, Qingqin S Li166,166, Tao Li229, Kung-Yee Liang230, Jennifer Liddle44, Jeffrey Lieberman231, Svetlana Limborska204, Kuang Lin37, Don H Linszen232, Jolanta Lissowska233, Chunyu Liu234, Jianjun Liu235,236, Jouko Lonnqvist237, Carmel M Loughland82,83, Jan Lubinski238, bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Susanne Lucae239, Milan Macek Jr214, Donald J MacIntyre240,241, Patrik KE Magnusson57, Brion S Maher242, Pamela B Mahon129,243, Wolfgang Maier244, Anil K Malhotra222,223,245, Jacques Mallet246, Ulrik F Malt247, Hugh S Markus248, Sara Marsal196, Nicholas G Martin167,249, Ignacio Mata111,112, Christopher G Mathew250, Manuel Mattheisen32,31,17,251,32,252,22, Morten Mattingsdal19,253, Fermin Mayoral174, Owen T McCann44, Robert W McCarley127,128, Steven A McCarroll146,4, Mark I McCarthy254, Colm McDonald255, Susan L McElroy256, Peter McGuffin9, Melvin G McInnis257, Andrew M McIntosh258,63, James D McKay259, Francis J McMahon260, Helena Medeiros209, Sarah E Medland167, Sandra Meier153, Carin J Meijer175, Bela Melegh55,56, Ingrid Melle261,19, Fan Meng257,23, Raquelle I Mesholam-Gately127,262, Andres Metspalu148,263, Patricia T Michie83,264, Lili Milani148, Vihra Milanova265, Philip B Mitchell86, Younes Mokrab105, Grant W Montgomery266, Jennifer L Moran4, Gunnar Morken267,268, Derek W Morris269,107, Ole Mors32,22, Preben B Mortensen20,21,22, Bryan J Mowry8,270, Thomas W Mühleisen183,271, Bertram Müller-Myhsok272,239,273, Kieran C Murphy274, Robin M Murray130, Richard M Myers225, Inez Myin-Germeys275, Benjamin M Neale2,119,156,4, Mari Nelis148, Igor Nenadic276, Deborah A Nertney270, Gerald Nestadt129, Kristin K Nicodemus277, Caroline M Nievergelt170,278, Liene Nikitina-Zake206, Vishwajit Nimgaonkar279, Laura Nisenbaum280, Merete Nordentoft281, Annelie Nordin16, Markus M Nöthen124,125, Evaristus A Nwulia184, Eadbhard O'Callaghan282, Claire O'Donovan26, Colm O'Dushlaine4, F Anthony O'Neill283, Ketil J Oedegaard284,285, Sang-Yun Oh286, Ann Olincy155, Line Olsen178,22, Lilijana Oruc287, Jim Van Os37,275, Michael J Owen87,109, Sara A Paciga288, Colin N A Palmer289, Aarno Palotie290,156,4, Christos Pantelis291,83, George N Papadimitriou131, Elena Parkhomenko10, Carlos Pato292,209, Michele T Pato209, Tiina Paunio293,294, Richard Pearson42, Psychosis Endophenotypes International Consortium295, Diana O Perkins296, Roy H Perlis297,127, Amy Perry168, Tune H Pers298,147,119, Tracey L Petryshen299,127,4, Andrea Pfennig46, Marco Picchioni130, Olli Pietilainen290,294, Jonathan Pimm5, Matti Pirinen42, Robert Plomin106, Andrew J Pocklington87, Danielle Posthuma300,301,302, James B Potash129, Simon C Potter44, John Powell37, Alkes Price303,119, Ann E Pulver129, Shaun M Purcell134,243, Digby Quested304, Josep Antoni Ramos-Quiroga90,91,92,93, Henrik B Rasmussen178,22, Anna Rautanen42, Radhi Ravindrarajah44, Eline J Regeer305, Abraham Reichenberg10,79, Andreas Reif205, Mark A Reimers306, Marta Ribases91,92,93, John P Rice102, Alexander L Richards87,109, Michelle Ricketts44, Brien P Riley307, Fabio Rivas174, Margarita Rivera308,9, Joshua L Roffman72,74, Guy A Rouleau309,310, Panos Roussos7,78,134,311, Dan Rujescu161,162, Veikko Salomaa145, Cristina Sanchez-Mora91,92,93, Alan R Sanders30,137, Stephen J Sawcer312, Ulrich Schall82,83, Alan F Schatzberg28, William A Scheftner313, Peter R Schofield158,159, Nicholas J Schork314, Sibylle G Schwab315, Edward M Scolnick4, Laura J Scott64, Rodney J Scott316,83,84, Larry J Seidman127,262, Alessandro Serretti317, Pak C Sham228,97,98, Cynthia Shannon Weickert158,86, Tatyana Shehktman170, Jianxin Shi318, Paul D Shilling170, Engilbert Sigurdsson319, Jeremy M Silverman10,320, Kang Sim101, Claire Slaney26, Petr Slominsky204, Olav B Smeland321,261,19, Jordan W Smoller74,156,4, Hon-Cheong So97, Janet L Sobell322, Erik Soderman17, Christine Soholm Hansen40, Chris C A Spencer42, Anne T Spijker323, David St Clair324, Hreinn Stefansson325, Kari Stefansson325,326, Stacy Steinberg325, Elisabeth Stogmann 327, Eystein Stordal328,329, Amy Strange42, Richard E Straub330, John S Strauss331,202, Fabian Streit153, Eric Strengman332,41, Jana Strohmaier153, T Scott Stroup231, Zhan Su42, Mythily Subramaniam333, Jaana Suvisaari237, Dragan M Svrakic102, Jin P Szatkiewicz113, Szabolcs Szelinger334, Avazeh Tashakkori-Ghanbaria44, Srinivas Thirumalai335, Robert C Thompson257, Thorgeir E Thorgeirsson325, Draga Toncheva199, Paul A Tooney336,83,84, Sarah Tosato337, Timothea Toulopoulou338, Richard C Trembath250, Jens Treutlein153, Vassily Trubetskoy3, Gustavo Turecki339, Arne E Vaaler340, Helmut Vedder341, Eduard Vieta342, John Vincent331, Peter M Visscher8, Ananth C Viswanathan343,344, Damjan Vukcevic42, John Waddington345, Matthew Waller44, Dermot Walsh346, Muriel Walshe37, James TR Walters87, Dai Wang166, Qiang Wang229, Weiqing Wang7,10, Yunpeng Wang178,347, Stanley J Watson257, Bradley T Webb38, Thomas W Weickert158,86, Daniel R Weinberger348,330, Matthias Weisbrod54, Mark Weiser122, Thomas Werge349,178,22, Paul Weston44, Pamela Whittaker44, Sara Widaa44, Durk Wiersma350, Dieter B Wildenauer190, Nigel M Williams87, Stephanie Williams113, Stephanie H Witt153, Aaron R Wolen306, Emily HM Wong97, Nicholas W Wood351, Brandon K Wormley38, Wellcome Trust Case-Control Consortium 2 352, Jing Qin Wu83,84, Simon Xi353, Wei Xu354,355, Allan H Young356, Clement C Zai201,202, Peter Zandi357, Peng Zhang358, Xuebin Zheng235, Fritz Zimprich327, Sebastian Zollner257, Aiden Corvin163, Ayman H Fanous359, Sven Cichon124,183,125,271,185, Marcella Rietschel153, Elliot S Gershon360, Thomas G Schulze153,129,160,260,14, Alfredo B Cuellar-Barboza361, Andreas J Forstner183,362,125,124, Peter A Holmans29,87,109, John I Nurnberger363, Ole A Andreassen261,19, S Hong Lee8, Michael C O'Donovan87,109, Patrick F Sullivan113,57,296, Roel A Ophoff11,41, Naomi R Wray266,8, Pamela Sklar7,78,79,311*, Kenneth S Kendler364*
*Authors contributed equally to this work Affiliations are towards the end of the manuscript Corresponding author: Douglas M. Ruderfer ([email protected])
bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Abstract
Schizophrenia (SCZ) and bipolar disorder (BD) are highly heritable disorders that share a
significant proportion of common risk variation. Understanding the genetic factors underlying
the specific symptoms of these disorders will be crucial for improving diagnosis, intervention
and treatment. In case-control data consisting of 53,555 cases (20,129 BD, 33,426 SCZ) and
54,065 controls, we identified 114 genome-wide significant loci (GWS) when comparing all
cases to controls, of which 41 represented novel findings. Two genome-wide significant loci
were identified when comparing SCZ to BD and a third was found when directly incorporating
functional information. Regional joint association identified a genomic region of overlapping
association in BD and SCZ with disease-independent causal variants indicating a fourth region
contributing to differences between these disorders. Regional SNP-heritability analyses
demonstrated that the estimated heritability of BD based on the SCZ GWS regions was
significantly higher than that based on the average genomic region (91 regions, p = 1.2x10-6)
while the inverse was not significant (19 regions, p=0.89). Using our BD and SCZ GWAS we
calculated polygenic risk scores and identified several significant correlations with: 1) SCZ
subphenotypes: negative symptoms (SCZ, p=3.6x10-6) and manic symptoms (BD, p=2x10-5), 2)
BD subphenotypes: psychotic features (SCZ p=1.2x10-10, BD p=5.3x10-5) and age of onset (SCZ
p=7.9x10-4). Finally, we show that psychotic features in BD has significant SNP-heritability
2 (h snp=0.15, SE=0.06), and a significant genetic correlation with SCZ (rg=0.34) in addition there
is a significant sign test result between SCZ GWAS and a GWAS of BD cases contrasting those
with and without psychotic features (p=0.0038, one-side binomial test). For the first time, we
have identified specific loci pointing to a potential role of 4 genes (DARS2, ARFGEF2, DCAKD
and GATAD2A) that distinguish between BD and SCZ, providing an opportunity to understand bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
the biology contributing to clinical differences of these disorders. Our results provide the best
evidence so far of genomic components distinguishing between BD and SCZ that contribute
directly to specific symptom dimensions.
Introduction
Bipolar disorder (BD) and schizophrenia (SCZ) are severe psychiatric disorders and among the
leading causes of disability worldwide1. Both disorders have significant genetic components with
heritability estimates ranging from 60-80%2. A genetic-epidemiological genetic study
demonstrated a substantial overlap between these two disorders with a genetic correlation from
common variation near 0.6-0.7 and high relative risks (RR) among relatives of both BD and SCZ
patients (RRs for parent/offspring: BD/BD: 6.4, BD/SCZ: 2.4; SCZ/BD: 5.2, SCZ/SCZ: 9.9)3.
Despite shared genetics and symptomology, the current diagnostic systems4,5 represent BD and
SCZ as distinct categorical entities differentiated on the basis of their clinical presentation, with
BD characterized by predominant mood symptoms, mood-congruent delusions and an episodic
disease course and SCZ considered a prototypical psychotic disorder. Further, premorbid
cognitive impairment and reduced intelligence are more frequent and severe in SCZ than BD6.
The genetic contributors to these phenotypic distinctions have yet to be elucidated and could aid
in understanding the underlying biology of their unique clinical presentation.
While the shared genetic component is large, studies to date have identified key genetic
architecture differences between these two disorders. A polygenic risk score created from a case
only SCZ vs BD genome-wide association study (GWAS) significantly correlated with SCZ vs
BD diagnosis in an independent sample7, providing evidence that differences between the
disorders also have a genetic basis. An enrichment of rare, moderate to highly penetrant copy bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
number variants (CNVs) and de novo CNVs are seen in SCZ patients8–12, while, the involvement
of CNVs in BD is much less clear13. Although the role of de novo single nucleotide variants in
BD and SCZ has been investigated in only a handful of studies so far, enrichment in pathways
associated with the postsynaptic density has been reported for SCZ, but not BD14,15. Identifying
disorder-specific variants or quantifying the contribution of variation to specific symptom
dimensions remains an open question. For example, previous work by this group has
demonstrated that SCZ patients with greater manic symptoms had higher polygenic risk for BD7.
Here, we utilize the largest collection of genotyped samples of BD and SCZ along with 28
subphenotypes to assess variants and genomic regions that contribute differentially to the
disorders and to specific symptoms dimensions or subphenotypes within them.
Methods
Sample Description
SCZ samples are those analyzed previously16. BD samples are the newest collection from
Psychiatric Genomics Consortium Bipolar Disorder Working Group (Stahl et al. submitted). To
ensure independence of the data sets, individuals were excluded until no individual showed a
relatedness (pihat) value greater than 0.2 to any other individual in the collection, while
preferentially keeping the case over the control for case-control related pairs. In total 2,181 BD
cases, 1,604 SCZ cases and 27,308 controls were removed (most of which were previously
known), leaving 20,129 BD cases 33,426 SCZ cases and 54,065 controls for the final meta-
analysis. bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
For analyses directly comparing BD and SCZ, we matched cases from both phenotypes on
genotyping platform and ancestry, resulting in 15,270 BD cases versus 23,585 SCZ cases. In
other words, we were able to match 76% of BD cases and 71% of SCZ cases.
Sub-phenotype description
BD sub-phenotypes were collected by each study site using a combination of diagnostic
instruments, case records and participant interviews. Ascertainment details for each study site are
described in the supplementary data of the PGC Bipolar Working Group paper (Stahl et al.
submitted). The selection of phenotypes for collection by this group was determined by literature
searches in order to determine phenotypes with prior evidence for heritability. It was further
refined dependent on the availability of phenotype data across a range of study sites and the
consistency by which the phenotypes were defined. Schizophrenia subphenotypes are the same
as described previously but in a larger proportion of patients7.
Quality Control, Imputation, Association Analysis and Polygenic Risk Scoring
Quality control and imputation were performed on each of the study cohort datasets (n=81),
according to standards established by the Psychiatric Genomics Consortium (PGC). The quality
control parameters for retaining SNPs and subjects were: SNP missingness < 0.05 (before
sample removal); subject missingness (p < 0.02); autosomal heterozygosity deviation (| Fhet | <
0.2); SNP missingness < 0.02 (after sample removal); difference in SNP missingness between
cases and controls < 0.02; and SNP Hardy-Weinberg equilibrium (p > 10−6 in controls or p >
10−10 in cases). Genotype imputation was performed using the pre-phasing/imputation stepwise
approach implemented in IMPUTE217 / SHAPEIT18 (chunk size of 3 Mb and default bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
parameters). The imputation reference set consisted of 2,186 phased haplotypes from the full
1000 Genomes Project dataset (August 2012, 30,069,288 variants, release “v3.macGT1”). After
imputation, we used the best guess genotypes, for further robust relatedness testing and
population structure analysis. Here we required very high imputation quality (INFO > 0.8) and
low missingness (<1%) for further quality control. After linkage disequilibrium (LD) pruning (r2
< 0.02) and frequency filtering (MAF > 0.05), there were 14,473 autosomal SNPs in the data set.
Relatedness testing was done with PLINK19 and pairs of subjects with pihat > 0.2 were identified
and one member of each pair removed at random after preferentially retaining cases over
controls. Principal component estimation was done with the same collection of autosomal SNPs.
We tested the first 20 principal components for phenotype association (using logistic regression
with study indicator variables included as covariates) and evaluated their impact on the genome-
wide test statistics using λ. Thirteen principal components namely 1,2,3,4,5,6,7,8,10,12,15,18,20
were included in all association analyses (λ=1.45). Analytical steps were repeated for SCZ vs
BD analysis.
We performed four main association analyses, i.e. (i) GWAS of BD and SCZ as a single
combined case phenotype, as well as disorder-specific GWAS using independent control sets in
(ii) BD cases vs BD controls and (iii) SCZ cases vs SCZ controls, and (iv) association analysis of
SCZ cases vs BD cases.
Summary-data-based Mendelian Randomization (SMR)20
We used SMR as a statistical fine-mapping tool applied to the SCZ vs BD GWAS results to
identify loci with strong evidence of causality via gene expression. SMR analysis is limited to
significant (FDR < 0.05) cis SNP-expression quantitative trait loci (eQTLs) with MAF > 0.01. bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
eQTLs passing these thresholds were combined with GWAS results in the SMR test, with
significance (pSMR) reported at a Bonferroni-corrected threshold for each eQTL data set. The
eQTL architecture may differ between genes. Through LD, many SNPs can generate significant
associations with the same gene, but in some instances multiple SNPs may be independently
associated with the expression of a gene. After identification of significant SNP-expression-trait
association through the SMR test, a follow-up heterogeneity test aims to prioritize variants by
excluding regions for which there is conservative evidence for multiple causal loci (pHET < 0.05).
SMR analyses were conducted using eQTL data from whole peripheral blood21, dorsolateral
prefrontal cortex generated by the CommonMind Consortium8, and 11 brain sub-regions from
the GTEx consortium22.
Regional joint GWAS
Summary statistic Z-scores were calculated for each marker in each of the four main GWAS
results, using the logistic regression coefficient and its standard error. Rare SNPs (MAF < 0.01),
and SNPs with a low INFO score (< 0.3) in either dataset were removed. The causal variant
relationships between SCZ and BD were investigated using the Bayesian method software pw-
gwas (v0.2.1), with quasi-independent regions determined by estimate LD blocks in an analysis
of European individuals (n=1,702)23,24. Briefly, pw-gwas takes a Bayesian approach to determine
the probability of five independent models of association. (1) There is no causal variant in BD or
SCZ; (2) a causal variant in BD, but not SCZ (3); a causal variant in SCZ, but not BD; (4) a
shared causal variant influencing both BD and SCZ; (5) two causal variants where one influences
BD, and one influences SCZ. The posterior probability of each model is calculated using model bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
priors, estimated empirically within pw-gwas. Regions were considered to support a particular
model when the posterior probability of the model was greater than 0.5.
Regional SNP-heritability estimation
We calculated local SNP-heritability independently for SCZ and BD using the Heritability
Estimator from Summary Statistics (HESS) software25 for each of the independent regions
defined above. The sum of these regional estimates is the total SNP-heritability of the trait. To
calculate local SNP-heritability HESS requires reference LD matrices representative of the
population from which the GWAS samples were drawn. We utilized the 1000 genomes European
individuals as the reference panel26. Unlike pw-gwas23, HESS does not assume that only one
causal variant can be present in each region.
Results
GWAS
We performed association analysis of BD and SCZ as a combined phenotype, totaling 53,555
cases (20,129 BD, 33,426 SCZ) and 54,065 controls on 15.5 million dosages imputed from 1000
genomes phase 326. Logistic regression was performed controlling for 13 components of
ancestry, study sites and genotyping platform. One hundred and fourteen regions contained at
least one variant for which the p-value was lower than our genome-wide significance (GWS)
threshold of p < 5x10-8. Among these 114 loci, 41 had non-overlapping LD regions (r2 > 0.6)
with the largest and most recently performed single disease GWAS of SCZ16 and BD (Stahl et al.
submitted). Establishing independent controls (see Methods) allowed us to perform disorder- bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
specific GWAS in 20,129 BD cases vs 21,524 BD controls and 33,426 SCZ cases and 32,541
SCZ controls. Using these results, we compared effect sizes of these 114 loci across each
disorder independently (Figure 1a) showing that subsets of variants have larger effects in SCZ vs
BD or vice versa.
To identify loci with divergent effects on BD and SCZ, we performed an association analysis on
23,585 SCZ cases and 15,270 BD cases matched for shared ancestry and genotyping platform
(see Methods, Figure 1b Supplementary Figures 1-5, Supplementary Table 1). Two genome-
wide significant loci were identified, the most significant of which was rs56355601 located on
chromosome 1 at position 173,811,455 within an intron of DARS2. The second most significant
locus was a four base indel on chromosome 20 at position 47638976 in an intron of ARFGEF2.
For both variants, the minor allele frequency was higher in BD cases than SCZ cases and
disease-specific GWAS showed opposite directions of effect. We sought to identify additional
disease specific loci by incorporating expression information with association results to perform
fine-mapping and identify novel variants27–30. Here, we applied the summary-data-based
Mendelian randomization (SMR) method20 (see Methods) utilizing the cis-QTLs derived from
peripheral blood21, human dorsolateral prefrontal cortex (DLPFC)31 from the Common Mind
Consortium and 11 brain regions from the GTEx consortium22. We identified one SNP-probe
combination that surpassed the threshold for genome-wide significance in blood but was also the
most significant finding in brain. We found that SNP rs4793172 in gene DCAKD is associated
-6 with SCZ vs BD analysis (pGWAS = 2.8x10 ) and is an eQTL for probe ILMN 1811648 (peQTL =
-168 -6 -25 -5 2.9x10 ), resulting in pSMR = 4.1x10 in blood (peQTL = 2.9x10 , pSMR = 2.0x10 in DLFC,
-15 -5 and peQTL = 4.6x10 , pSMR = 6.0x10 in GTEx cerebellar hemisphere) (Supplementary Table 2, bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Supplementary Figure 6) and shows no evidence of heterogeneity (pHET =0.66) which implies
only a single causal variant in the region.
Regional joint association
We expanded our efforts to identify disorder specific genomic regions by jointly analyzing
independent GWAS results from BD and SCZ23. Among 1,702 regions genome-wide (see
Methods), 223 had a posterior probability of greater than 0.5 of having a causal variant in at least
one disorder. Of these, 132 best fit the model of a shared causal variant influencing both BD and
SCZ, 88 were most likely specific to SCZ, 3 demonstrated evidence of two independent variants
(with one impacting each of the two disorders) and zero were BD specific. Of note, the data
estimated prior probability of having a BD specific region was 0.1% compared to 15% for SCZ,
potentially a result of increased power from the larger SCZ sample size.
The 114 GWS SNPs from the combined BD and SCZ GWAS localized into 99 independent
regions, of which 78 (79%) were shared with a posterior probability of greater than 0.5. Sixty
regions had at least one GWS SNP in the independent SCZ GWAS, of which 30 (50%) are
shared and 8 regions contained a GWS SNP in the independent BD GWAS, of which 6 (75%)
are shared using the same definition. For the three regions showing evidence for independent
variants, two had highly non-overlapping association signals in the same region stemming from
independent variants. The third, on chromosome 19 presented a different scenario where
association signals were overlapping (Supplementary Figure 7). The most significant variant in
BD was rs111444407 (chr19:19358207, p = 8.67x10-10) and for SCZ was rs2315283
(chr19:19480575, p=4.41x10-7). After conditioning on the most significant variant in the other
disorder, the association signals of the most significant variant in BD and SCZ were largely bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
unchanged (BD rs111444407 =1.3x10-9, SCZ rs2315283 p=6.7x10-5). We further calculated the
probability of each variant in the region being causal for both BD and SCZ32 and found no
correlation (r= -0.00016). The most significant variants had the highest posterior probability of
being causal (SCZ: rs2315283, prob = 0.02, BD: rs111444407, prob = 0.16). Both variants most
significantly regulate the expression of GATAD2A in brain31 but in opposite directions
-15 -28 (rs111444407 peQTL = 6x10 , beta = 0.105; rs2315283 peQTL = 1.5x10 , beta = -0.11).
Regional SNP-heritability estimation
2 Across the genome, regional SNP-heritabilities (h snp) were estimated separately for SCZ and
BD25 and were found to be moderately correlated (r=0.25). We next defined risk regions as those
containing the most associated SNP for each GWS locus. In total, there were 101 SCZ risk
regions from the 105 autosomal GWS loci reported previously16 and 29 BD risk regions from 30
GWS loci reported in a companion paper (Stahl et al. submitted). Ten regions were risk regions
for both BD and SCZ comprising 33% of BD risk regions and 10% of SCZ risk regions. We
2 further stratified regional h snp by whether a region was a risk region in one disorder, none or
both (Figure 2). Since the discovery data for the regions overlapped with the data used for the
heritability estimation, we expected within-disorder analyses to show significant results. In risk
2 regions specific to SCZ (n=91) there was a significant increase in regional h snp in SCZ, as
expected (p = 1.1x10-22), but also in BD (p = 1.2x10-6). In risk regions specific to BD (n=19),
2 significantly increased regional h snp was observed in BD, as expected (p = 0.0007), but not in
2 SCZ (p = 0.89). Risk regions shared by both disorders had significantly higher h snp in both
disorders, as expected (BD p = 5.3x10-5, SCZ p = 0.006), compared to non-risk regions.
2 However, we observed a significant increase in BD h snp in shared risk regions compared to BD bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
2 risk regions (BD p = 0.003) but not SCZ h snp for shared risk regions compared to SCZ risk
regions (p = 0.62). Using a less stringent p-value threshold for defining risk regions (p < 5x10-6),
thereby substantially increasing the number of regions, resulted in similar results (Supplementary
2 Figure 8). Seven regions contributed to substantially higher h snp in SCZ compared to BD but no
region showed the inverse pattern. Of these regions, all but one was in the major
histocompatibility region (MHC), the sole novel region was chr10:104380410-106695047 with
2 2 regional h snp= 0.0019 in SCZ and h snp=0.00063 in BD.
Polygenic dissection of subphenotypes
Subphenotypes were collected for a subset of patients in both BD and SCZ (see Methods). For
SCZ, we had clinical quantitative measurements of manic, depressive, positive and negative
symptoms generated from factor analysis of multiple instruments as described previously7 but in
larger sample sizes (n=6908, 6907, 8259, 8355 respectively). For BD, 24 subphenotypes were
collected among nearly 13,000 cases in distinct categories including comorbidities, clinical
information such as rapid cycling and psychotic features as well as additional disease course data
such as age of onset and number of hospitalizations. For each BD and SCZ patient, we calculated
a polygenic risk score (PRS) using all SNPs, from each of the four main GWAS analyses
(BD+SCZ, BD, SCZ and SCZvsBD). We then used regression analysis including principal
components and site to assess the relationship between each subphenotype and the 4 PRS. We
applied a significance cutoff of p < 0.0004 based on Bonferroni correction for 112 tests. In total,
we identified 6 significant results after correction (Figure 3, Table 1). For BD PRS we see a
significant positive correlation between PRS and manic symptoms in SCZ cases as seen
previously7 (p=2x10-5, t=4.26) and psychotic features in BD patients (p=5.3x10-5, t=4.04). For bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
SCZ PRS, we see a significant increase in PRS for BD cases with versus without psychotic
features (p=1.2x10-10, t=6.45) and negative symptoms in SCZ patients (p=3.60x10-6, t=4.64). As
with the SCZ PRS, BD+SCZ PRS is also significantly associated with psychotic features in BD
(p=7.9x10-13, t=7.17) and negative symptoms in SCZ (p=1.5x10-5, t=4.33). While not surpassing
conservative correction, the next two most significant results are both indicative of a more severe
course in BD: increased BD+SCZ PRS with increased numbers of hospitalizations in BD cases
(p=4.2x10-4, t=3.53) and increased SCZ PRS with earlier onset of BD (p=7.9x10-4, t=-3.36). We
assessed the role of BD subtype on correlation between SCZ PRS and psychotic features and
identified significant correlation when restricted to only BD type I cases (BDI: 3,763 with
psychosis, 2,629 without, p=1.55x10-5, Supplementary Table 3).
For all 8 quantitative subphenotypes and 9 binary subphenotypes having at least 1,000 cases, we
performed a GWAS within cases to calculate heritability and genetic correlation with BD and
2 33 SCZ. Only two subphenotypes had significant h snp estimates using LD-score regression ,
2 2 psychotic features in BD (h snp=0.15, SE=0.06) and suicide attempt (h snp=0.25, SE=0.1). Only
psychotic features demonstrated significant genetic correlation with SCZ (rg=0.34, SE=0.13,
p=0.009). While the genetic correlation demonstrates a genome-wide relationship between
common variants contributing to SCZ and those contributing to psychotic features in BD cases,
we sought to assess whether this could be demonstrated among the most significantly associated
SCZ loci. Of the 105 autosomal genome-wide significant SCZ loci previously published16, 60
out of 100 variants in our dataset after QC demonstrated the same direction of effect for
psychotic features in BD (p=0.028, one-sided binomial-test).
bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Discussion
Here we present a genetic dissection of bipolar disorder and schizophrenia from over 100,000
genotyped subjects. As previously shown34, we found an extensive degree of genetic sharing
between these two disorders. We identified 114 genome-wide significant loci contributing to
both disorders of which 37 are novel to this analysis. Despite the high degree of sharing, we
identified several loci that significantly differentiated between the two disorders, having opposite
directions of effect, and polygenic components that significantly correlated from one disorder to
symptoms of the other.
Two GWS loci were identified from the case only SCZ versus BD analysis providing
opportunities to inform the underlying biological distinctions between BD and SCZ. The most
significant locus is in DARS2 (coding for the mitochondrial Aspartate-tRNA ligase) which is
highly expressed in the brain and significantly regulated by the most significant SNP rs56355601
-11 (peQTL=2.5x10 ). Homozygous mutations in DARS2 are responsible for leukoencephalopathy
with brainstem and spinal cord involvement and lactate elevation (LBSL), which was
characterized by neurological symptoms such as psychomotor developmental delay, cerebellar
ataxia and delayed mental development35. Interestingly, based on methylation analysis from the
prefrontal cortex of stress models (rats and monkeys) and from peripheral samples (in monkeys
and human newborns), DARS2, among others, has been suggested as a potential molecular
marker of early-life stress and vulnerability to psychiatric disorders36. The second most
significant locus maps to ARFGEF2, which codes for ADP Ribosylation Factor Guanine
Nucleotide Exchange Factor 2 (also known as BIG2), a protein involved in vesicular trafficking
from the trans-Golgi network. Mutations in ARFGEF2 have been shown to underlie an bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
autosomal recessive condition characterized by microcephaly and periventricular heterotopia, a
disorder caused by abnormal neural proliferation and migration37. Although not genome-wide
significant, the third most significant locus implicates ARNTL (Aryl Hydrocarbon Receptor
Nuclear Translocator Like), which is a core component of the circadian clock. ARNTL has been
previously hypothesized for relevance in bipolar disorder,38 although human genetic evidence is
limited39. Incorporating transcriptional data identified a third genome-wide significant finding in
DCAKD. The gene codes for Dephospho-CoA Kinase Domain Containing, a member of the
human postsynaptic density proteome from human neocortex40. In the mouse cortical
synaptoproteome DCAKD has been found to be among the proteins with the highest changes
between juvenile postnatal days and adult stage, which suggests a putative role in brain
development41,42.
We further assessed the contribution of regions of the genome to each disorder through joint
regional association and regional heritability estimation. These results point to two additional
loci that may contribute differentially to liability to BD and SCZ. The region on chr19 shows
overlapping association peaks that are driven by independent causal variants for each disorder.
Both variants significantly regulate the same gene GATAD2A but in opposite directions.
GATAD2A is a transcriptional repressor, which is targeted by MBD2 and is involved in
methylation-dependent gene silencing. The protein is part of the large NuRD (nucleosome
remodeling and deacetylase) complex, for which also HDAC1/2 are essential components. NurD
complex proteins have been associated to autism43. Their members, including GATAD2A, display
preferential expression in fetal brain development43 and in recent work has been implicated in
SCZ through open chromatin44. Further, p66a (mouse GATAD2A) was recently shown to bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
participate in memory preservation through long-lasting histone modification in hippocampal
memory-activated neurons45. The region on chromosome 10 appears to be shared across both
disorders; however, there are additional independent contributing variants to SCZ and not BD,
indicating another region of interest, although biological interpretation remains unknown.
More broadly, SNP-heritability appears to be consistently shared across regions and
chromosomes between these two disorders. Regions with GWS loci often explain higher
proportions of heritability as expected. When looking at the effect on heritability of the presence
of a GWS locus in the other disorder, we identified a significant increase in BD heritability for
regions containing a GWS locus for SCZ but no significant increase in SCZ heritability in
regions having a BD one. This result suggests a directionality to the genetic sharing of these
disorders with a larger proportion of BD loci being specific to BD. However, we cannot exclude
that the asymmetry of results may reflect less power of discovery for BD than SCZ. The degree
to which power and subphenotypes contribute to this result requires further examination.
We have now identified multiple genomic signatures that correlate between one disorder and a
clinical symptom in the other disorder, demonstrating that there are genetic components
underlying particular symptom dimensions within these disorders. As previously shown, we find
a significant positive correlation between PRS of BD and manic symptoms in SCZ. We also
demonstrate that BD cases with psychotic features carry a significantly higher SCZ PRS than BD
cases without psychotic features and this result is not driven by schizoaffective BD subtype.
Further, we show evidence that increased PRS is associated with more severe illness. This is true
for BD with psychotic features having increased SCZ PRS, earlier onset BD having higher SCZ bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
PRS and cases with higher BD+SCZ PRS having a larger number of hospitalizations. We
demonstrated that psychotic features within BD is an independently heritable trait and that GWS
loci for SCZ have a consistent direction of effect in psychotic features in BD, demonstrating the
potential to study psychosis more directly to identify variants contributing to that symptom
dimension. All in all, this work illustrates the utility of genetic data to dissect symptom
heterogeneity among correlated disorders and suggests that further work could potentially aid in
defining subgroups of patients for more personalized treatment.
bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Figures
a.
1.3 SCZ BD 1.2 OR 1.1 1.0 rs4702 rs1850 rs778353 rs989197 rs302315 rs760648 rs704370 rs571171 rs880446 rs151680 rs1023330 rs4958577 rs6694545 rs2910032 rs4470825 rs7785663 rs2890780 rs9922678 rs4377658 rs1198588 rs4298967 rs4812321 rs9636107 rs7907645 rs4906364 rs7185124 rs4604142 rs4380187 rs6434928 rs9607782 rs3858402 rs9834970 rs2239063 rs2514218 rs4481150 rs7776707 rs2414718 rs9695226 rs2905627 rs6927514 rs6065777 rs2358740 rs2339519 rs6534338 rs76432012 rs10745841 rs10469966 rs10043984 rs11587347 rs12691307 rs13438979 rs12753750 rs11693702 rs10132089 rs35768850 rs59979824 rs10035564 rs11624408 rs12474906 rs73060317 rs12541491 rs78676616 rs34076257 rs10149407 rs80256351 rs74804370 rs11191419 rs28681284 rs72936308 rs11210892 rs12293670 rs11693528 rs12668727 rs12902973 rs11223661 rs34440050 rs12704290 rs10791098 rs13135092 rs13233308 rs13169274 rs11688767 rs10764493 rs12190758 rs66515729 rs62263118 rs36074045 rs28735056 rs12805133 rs66885728 rs10265001 rs13231398 rs12751064 rs12202969 rs111547017 rs114385935 rs143809748 rs138079842 rs111444407 chr3_2522690_I chr7_2025096_I chr8_26191524_I chr2_97444432_I chr7_24747494_D chr6_84280274_D chr8_38291845_D chr12_57483523_I chr16_58643275_I chr2_208447631_D chr1_163646390_D chr2_194465711_D chr11_46350213_D chr10_104957618_I chr12_123742918_D
b.
Figure 1. a) Odds ratios (OR) from independent data sets of BD (blue) and SCZ (red) for each of
the 114 genome-wide significant variants in the BD and SCZ vs controls GWAS. b) Manhattan
plot for SCZ vs BD GWAS. bioRxiv preprint doi: https://doi.org/10.1101/173435; this version posted August 8, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
0.0020 GWS disorder NONE (n=1,559) BP (n=19) SCZ (n=91) BOTH (n=10)
0.0015 p=0.003
p=1.2x10-6 itability
r p=0.0007 0.0010